Project Gutenberg as an
Information Retrieval System
Kai Li
IST616 Final Assignment
2012.11
Introduction to Project Gutenberg
• The first digital library project in the world, initiated by the late Michael Hart in 1971.
• Project Gutenberg currently offers more than 41,000 public domain eBooks (in more than 50 languages) as well as other resources (like scientific data).
• Website: http://www.gutenberg.org/
Intended Audience and Functionalities
• Intended audience: eBook readers and general users.
• Functionalities: portal of the project, eBook repository and discovery system.
Mobile Site
• The website offers two interfaces, depending on the device one uses. Only the traditional non-mobile interface is examined in this presentation, owing to the limited scope of the assignment.
Indexing System
Issues of Indexing/Tag System
• There is a search box as well as a tag called “Search Catalog”;
– The search box is too small to be noticed;
– The “Search Catalog” tag actually leads users to a page that has no search box, only some browsing options;

• A number of tags are repeated on the left-hand bar and at the top of the page;
– For example, the tag “Book Categories”.
Means To Find a Book
• Searching
• Browsing
– By categories
Searching
Issues of Searching
• The results display differs from most interfaces one sees on the Internet, which may cause some difficulties for new users;
• Because there is no navigation mechanism and no function to refine the results by facets, it is extremely inconvenient to locate a resource when the result set is large.
Precision and Recall
• The website retrieves by string matching: the string entered by the user is matched against the full text of all the resources.
– Multiple words are combined with an “or” relationship.

• Because the index covers the full text, recall is higher than in traditional library catalogs; however, because the method is still string matching, precision is not very good.
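The trade-off described on this slide can be illustrated with a minimal sketch. It assumes a toy corpus and a hand-picked set of relevant documents; the names `DOCS`, `or_search`, and `precision_recall` are hypothetical and do not reflect how the Project Gutenberg site is actually implemented.

```python
# Hypothetical toy corpus: document id -> full text (not real Gutenberg data).
DOCS = {
    "d1": "the whale hunted by captain ahab",
    "d2": "a guide to hunting and fishing",
    "d3": "the prince of wales visits the coast",
    "d4": "whale songs recorded at sea",
}


def or_search(query, docs):
    """Return ids of documents whose text contains ANY query word as a substring."""
    words = query.lower().split()
    return {
        doc_id
        for doc_id, text in docs.items()
        if any(word in text.lower() for word in words)
    }


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = or_search("whale hunt", DOCS)
relevant = {"d1", "d4"}  # assumption: the user wants books about whales
precision, recall = precision_recall(retrieved, relevant)
print(sorted(retrieved), precision, recall)
```

In this sketch every relevant document is retrieved, so recall is high, but the "or" combination also pulls in the hunting guide, which lowers precision.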
Browsing
Issues of Browsing
• Three search tools are offered on this page; they belong on the search page rather than here.
• Only one criterion can be used to limit the resources at a time, and once a criterion is chosen there is no way to narrow the results further.
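The kind of step-by-step refinement that this slide finds missing could look roughly like the following sketch. The `Book` record, the sample `CATALOG`, and the `refine` helper are all hypothetical illustrations, not part of the Project Gutenberg website.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    language: str


# Made-up sample records for illustration only.
CATALOG = [
    Book("Moby Dick", "Melville, Herman", "en"),
    Book("Typee", "Melville, Herman", "en"),
    Book("Moby Dick", "Melville, Herman", "de"),
    Book("Faust", "Goethe, Johann Wolfgang von", "de"),
]


def refine(books, **criteria):
    """Keep only the books that match ALL given field=value criteria."""
    return [
        book
        for book in books
        if all(getattr(book, field) == value for field, value in criteria.items())
    ]


# First limit by one criterion, then narrow the same result by a second one.
by_author = refine(CATALOG, author="Melville, Herman")
by_author_and_language = refine(by_author, language="de")
print(len(by_author), by_author_and_language)
```

Because `refine` takes any list of books and returns a list of books, each result can be narrowed again, which is the behavior a faceted interface would offer.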
Categories/Classification
• There are two tiers of “classification” on this website:
– Subcategories: 23
• These subcategories are also called “bookshelves”, which is confusing.

– Bookshelves: 133
• These can be seen as a lower level than subcategories. However, not all bookshelves are linked to a subcategory.
Overall Evaluation
• Advantages:
– Mobile functionalities:
• Mobile site
• QR codes

• Disadvantages:
– Poorly organized and designed;
– Fails to display the full richness of the metadata on the website:
• LoC classification and subject headings

– The interface lacks communication with the users;
Thanks!


Editor's Notes

  1. The project has been accepting member-uploaded eBooks that are not protected by US copyright law.
  2. Because this website is also the main page of the whole project, the audience includes not only people who want to get the eBooks but also people who are interested in the project itself.
  3. The indexing system is actually very confusing. This slide lists some of the problems.
  4. The search results page: related bookshelves and subjects are displayed ahead of the books; books are ranked by popularity (number of downloads), but one can also choose to sort alphabetically or by release date.
  5. The interface was very unintuitive for me when I first used it. If a book does not rank high alphabetically, by popularity, or by release date, and the result set is large, it is almost impossible to find a specific book. Like traditional library catalogs, this interface does not support finding an unknown book very well.
  6. String matching cannot handle one word with multiple meanings, or different words with the same meaning.
  7. Methods: by author; by title; by language; by recently added; by popularity. One can also browse the website by LC classification (as well as LCSH). However, these are not listed on this page. LC classification can be found only from the book pages.
  8. Not all bookshelves can be linked to a subcategory. Moreover, some bookshelves contain materials in other languages that are not inside the above system, which indicates that the English classification scheme may not cover all the resources on the website.
  9. Many libraries and other parties have imported the metadata of Gutenberg eBooks into their local systems, which makes the issues of this website less important. But this is still a problem!
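The limitation raised in note 6 can be shown with a small sketch. The corpus and the `match` helper below are hypothetical and only illustrate plain substring matching in general, not the site's actual search code.

```python
# Hypothetical toy corpus: document id -> text (not real Gutenberg data).
DOCS = {
    "a": "repairing your automobile engine",
    "b": "the car and its driver",
    "c": "mercury poisoning in hat makers",
    "d": "mercury, the innermost planet",
}


def match(query, docs):
    """Return the sorted ids of documents whose text contains the query string."""
    return sorted(
        doc_id for doc_id, text in docs.items() if query.lower() in text.lower()
    )


# Different words, same meaning: "automobile" is missed by a search for "car".
synonym_miss = match("car", DOCS)

# One word, several meanings: both the element and the planet are returned.
polysemy_noise = match("mercury", DOCS)

print(synonym_miss, polysemy_noise)
```

The first query misses a relevant document and the second returns an unrelated one, which is why controlled vocabularies such as subject headings are useful alongside full-text matching.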