© Copyright 2013 LucidWorks
Solr Powered Libraries:
A survey of the world's knowledge bases
May 2, 2013
Presented by Erik Hatcher
Thursday, May 2, 13
Abstract
Apache Lucene and Solr search technologies have made information and knowledge vastly more searchable, findable, and accessible.
Because scholars and researchers are among the most demanding users of search systems, the problems implementers face are complex.
For example, many of the applications built on these technologies also thrive on deliberately designed-in serendipitous discovery, bringing to light previously unknown yet related and potentially interesting content.
Libraries and other public knowledge-sharing environments, such as Wikipedia, generally embrace "open source" and community contributions as core principles, which makes for a natural synergy with the power, features, and community-driven ecosystem provided by Lucene and Solr.
This talk will introduce you to several Solr-powered, library-related systems, detail how they work, and leave you with lessons learned that you can apply to your own applications.
2
Real Solar Powered Library!
•http://www.ktsm.com/news/texas-library-runs-sunshine
3
Card-carrying library geek
•Applied Research in Patacriticism (ARP)
- Rossetti Archive: http://www.rossettiarchive.org
- NINES: http://www.nines.org/
- Collex: http://www.collex.org
•Blacklight
- originated as an implementation of Solr Flare
•Presentations
- http://code4lib.org/conference: 2007, 2009, 2010, 2011, 2013
- Library of Congress: "Solr Powered Libraries" (2007)
»http://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=4113
- EBTI/CBETA Conference 2008
- Publication: “Library 2.0 Initiatives in Academic Libraries”
•Windsor Lucene Summit
•eIFL-FOSS
4
Rossetti Archive
5
NINES/Collex
6
Card catalog
•the original inverted index
7
http://commons.wikimedia.org/wiki/File:Copyright_Card_Catalog_Files.jpg
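To make the card catalog comparison concrete, here is a minimal Python sketch (not from the talk) of the structure Lucene builds: a map from each term to the sorted list of "cards" (documents) containing it. Tokenization is assumed to be a plain lowercase whitespace split; real Lucene analyzers do far more.

```python
from collections import defaultdict

def build_inverted_index(cards):
    """Map each lowercased term to the sorted list of card ids containing it."""
    index = defaultdict(set)
    for card_id, text in cards.items():
        for term in text.lower().split():
            index[term].add(card_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return card ids containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

cards = {
    1: "Moby Dick Melville",
    2: "Billy Budd Melville",
    3: "Moby Dick illustrated edition",
}
index = build_inverted_index(cards)
print(index["melville"])           # [1, 2]
print(search(index, "moby dick"))  # [1, 3]
```

Looking up a term is a single dictionary access, just as a card drawer is filed by heading rather than scanned book by book.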
•http://openlibrary.org/
- project of the Internet Archive
•Goal: "A (community editable) web page for every book"
8
dp.la - Digital Public Library of America
9
Lucene/Elasticsearch Powered
Wikimedia/Wikipedia/MediaWiki
•Solr powered: translation memory service, GeoData extension, etc.
•"heavily modified Lucene" currently powers the main site search
10
HathiTrust
• "partnership of major research institutions and libraries working to ensure
that the cultural record is preserved and accessible long into the future."
• 10.5M books, 12TB OCR+metadata, hundreds of languages
- "Books are different"
- http://code4lib.org/conference/2013/burton-west
• http://www.hathitrust.org/blogs/large-scale-search
- http://www.hathitrust.org/blogs/large-scale-search/too-many-words
- "org.apache.solr.common.SolrException: Impossible Exception"
- CommonGrams
- word segmentation: autoGeneratePhraseQueries="false"
• HathiTrust Research Center
- The infrastructure includes an entrance portal, search and collection-building tools (using
Blacklight), ... analysis algorithms that can be run against the HathiTrust public domain corpus
(more than 3 million volumes). In addition to the production services, the HTRC offers a
development “sandbox”. The sandbox runs against non-Google scanned content (about
260,000 volumes) and provides a test-bed for interested researchers to experiment with writing
their own algorithms for use in the HTRC infrastructure.
11
Smithsonian Institution
•http://collections.si.edu
•Many disparate data sources:
- 19 museums, 20 libraries, 14 archives, 1 National Zoo, 1 Astrophysical Observatory, research centers in Panama, Boston, New York, Maryland, and Virginia
•"Documents" of all varieties:
- Photographs, paintings, manuscripts, letters, postage stamps, scientific specimens, rockets, airplanes, postcards, sound recordings, posters, decorative arts, ceramics, maps, sculptures, publication papers, books, trade catalogs, etc.
•User tagging, negative/exclude filtering, DataImportHandler (DIH) SolrEntityProcessor
•http://bit.ly/13P41YJ
- http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-wang-steps-toward-open-government.pdf
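Negative filtering of user tags maps naturally onto Solr filter queries. The Python sketch below is an assumption about how such a request could be built, not the Smithsonian's code: the field name `user_tags` is hypothetical, and tag values are not escaped, so it assumes simple tags without quotes.

```python
from urllib.parse import urlencode

def build_filter_params(q, include=(), exclude=(), field="user_tags"):
    """Return Solr /select parameters as (name, value) pairs: one positive
    filter query (fq) per included tag and one negated fq per excluded tag."""
    params = [("q", q)]
    for value in include:
        params.append(("fq", f'{field}:"{value}"'))
    for value in exclude:
        # Purely negative filter: Solr treats it as "all documents except these".
        params.append(("fq", f'-{field}:"{value}"'))
    # Facet on the same field so users can see which tags remain.
    params += [("facet", "true"), ("facet.field", field)]
    return params

params = build_filter_params("airplanes", include=["WWII"], exclude=["replica"])
print("/select?" + urlencode(params))
```

Using a list of pairs rather than a dict lets `fq` repeat, which Solr expects for multiple filters. Solr's multi-select faceting uses a related but separate mechanism: tagging a filter with `{!tag=...}` and excluding it from a facet count with `{!ex=...}`.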
12
13
14
•Serials Solutions Summon
•http://www.serialssolutions.com/en/services/summon
•SaaS, single unified index, match & merge
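"Match & merge" here presumably means combining records for the same item that arrive from different content providers into a single entry in the unified index. The Python sketch below illustrates that general idea under a hypothetical match rule (normalised ISBN, else normalised title) and a hypothetical "first source wins" field policy; it is not Summon's actual algorithm.

```python
import re

def match_key(record):
    """Hypothetical match rule: normalised ISBN if present, else normalised title."""
    isbn = re.sub(r"[^0-9X]", "", record.get("isbn", "").upper())
    if isbn:
        return "isbn:" + isbn
    title = re.sub(r"[^a-z0-9 ]", "", record.get("title", "").lower())
    return "title:" + " ".join(title.split())

def match_and_merge(records):
    """Group records sharing a match key into one merged record.

    For each field the first source to supply it wins; every contributing
    source is remembered so the merged record can point back to all of them.
    """
    merged = {}
    for rec in records:
        target = merged.setdefault(match_key(rec), {"sources": []})
        target["sources"].append(rec["source"])
        for field, value in rec.items():
            if field != "source":
                target.setdefault(field, value)
    return list(merged.values())

records = [
    {"source": "A", "isbn": "978-0-14-243724-7", "title": "Moby-Dick"},
    {"source": "B", "isbn": "9780142437247", "title": "Moby Dick; or, The Whale", "subject": "Whaling"},
    {"source": "C", "title": "Billy Budd"},
]
for rec in match_and_merge(records):
    print(rec)
```

The two Moby-Dick records collapse into one entry that keeps source B's subject heading, while Billy Budd stays separate.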
15
Astrophysics Data System Labs
•Smithsonian, NASA, Harvard
•http://adslabs.org
16
http://code4lib.org/conference/2013/luker
•vufind.org
•Powers the main HathiTrust UI (currently) and many more sites
- see http://vufind.org/wiki/installation_status
17
18
• "Blacklight is an open source Ruby on Rails gem that provides a discovery interface for
any Solr index. Blacklight provides a default user interface which is customizable via the
standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous
data, allowing different information displays for different types of objects."
- http://projectblacklight.org
• Founded at the University of Virginia (2007): search.lib.virginia.edu
- UV-A solar radiation == blacklight
• Initial contributors: UVa, Stanford, JHU, WGBH
• University of Hull, United States Holocaust Memorial Museum, University of Wisconsin-Madison, Tufts, Australian gov't (Natural Resource Management), Penn State's ScholarSphere, Northwestern, New York Public Library, NCSU, Columbia University, Agriculture Network Information Center (USDA), alicelaw.org (American Legislative and Issue Campaign Exchange, a one-stop web-based public library of progressive state and local laws), and many more
• http://projecthydra.org/ uses Blacklight as UI component
19
searchworks at Stanford
20
Advanced search at Stanford's searchworks
21
searchworks:
Mapping Text Boxes to Solr query pieces
•http://code4lib.org/conference/2010/dushay_keck
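One way to map each advanced-search text box to its own Solr query piece is a nested dismax query per box, joined with a boolean operator. The Python sketch below assumes that approach: the `_query_` nested-query syntax and `$param` dereferencing inside local params are real Solr features, but the `title_qf`/`author_qf`-style parameter names are hypothetical and would need to be defined in solrconfig.xml or on the request. The talk linked above describes what searchworks actually does, which may differ.

```python
def advanced_query(fields, op="AND"):
    """Combine non-empty advanced-search boxes into one Solr q string.

    Each box becomes a nested dismax query whose qf/pf are dereferenced from
    request parameters named after the box (e.g. $title_qf), so field weights
    live in configuration rather than application code. op should be "AND"
    or "OR".
    """
    pieces = []
    for box, text in fields.items():
        text = text.strip()
        if not text:
            continue  # empty boxes contribute nothing
        # Escape backslashes first, then double quotes, for the quoted value.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        pieces.append(f'_query_:"{{!dismax qf=${box}_qf pf=${box}_pf}}{escaped}"')
    return f" {op} ".join(pieces)

q = advanced_query({"title": "moby dick", "author": "melville", "subject": ""})
print(q)
```

Keeping each box as its own dismax piece lets each box have its own field boosts while the user's text is still parsed leniently.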
22
•https://catalyst.library.jhu.edu/
23
Rock and Roll!
•\m/
24
Community and Resources
•code4lib:
- http://www.code4lib.org/
•HathiTrust folks
- http://www.hathitrust.org/blogs/large-scale-search
- http://robotlibrarian.billdueber.com/
•http://bighumanities.net/
- The Workshop on Big Humanities will be held in conjunction with the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), which will take place on 6-9 October 2013 in Silicon Valley, California, USA, and which provides a leading international forum for disseminating the latest research in the growing field of “big data”.
25
26
http://heatherbrewer.com/blog/2013/04/15/libraries-rock/
More related content

Trending

International Digital Library Initiatives
Rightscaling, engagement, learning: reconfiguring the library for a network e... (lisld)
Next Steps for IMLS's National Digital Platform (Trevor Owens)
Building the New Open Linked Library (Joel Richard)
Working collaboratively: scaling infrastructure, services, learning and innov... (lisld)
The Biodiversity Heritage Library and bibliographic citations: towards new u... (Trish Rose-Sandler)
Challenges and opportunities for academic libraries (lisld)
Intro to Linked Open Data in Libraries Archives & Museums. (Jon Voss)
Collections unbound: collection directions and the RLUK collective collection (lisld)
Documenting Ferguson: Building a community digital repository (Chris Freeland)
Connecting the Dots: Linking Digitized Collections Across Metadata Silos (OCLC)
Pilots & Partnerships: University Academic Computing and University Libraries... (Chris Freeland)
Using Europeana for learning & teaching: EMMA MOOC “Digital library in princ... (Getaneh Alemu)
Burke, "Discovery Tools - Changing the Nature of Collections in an Item-cente...
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and... (DeVonne Parks, CEM)
MichalkoLibrary Collaboration iSchoolShareable-1 (MRJPM)
Organizing a DPLA Service Hub in Missouri (Chris Freeland)
[[edit]] this GLAM (wittylama)
Documenting the Now: Supporting Scholarly Use & Preservation of Social Media ... (Chris Freeland)
Islandora and Omeka: Building U of T Digital Collections & Exhibits
Similar to Solr Powered Libraries

ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums (Jon Voss)
Providing First World Library services By using Koha, DSpace, vufind and Drupal (Nur Ahammad)
"In the Early Days of a Better Nation": Enhancing the power of metadata today...
Library discovery: past, present and some futures (lisld)
Agile resources on the open web …. a global digital library (Jisc)
Calhoun Rbms Rev June 2008 (Karen S Calhoun)
Charper.lawdi.20130531 (charper)
Open Source ILS Add-Ons (loriayre)
NISO Virtual Conference: Web-Scale Discovery Services: Transforming Access to...
Fuller Disclosure: Getting More Collections into the Network Flow (kramsey)
Data Publishing in Archaeozoology
Boundless Opportunity (Rachel Frick)
The Open Access Community, and OAIster
LOD/LAM Presentation (Hafabe)
Open access (1)
Virtual systems (jsutclif)
Oair du
Scholarship in a connected world: New ways to know, new ways to show (Derek Keats)
Networking Repositories, Optimizing Impact: Georgia Knowledge Repository Meeting (Karen S Calhoun)
Open access
More from Erik Hatcher

Ted Talk
Solr Payloads
it's just search
Lucene's Latest (for Libraries)
Solr Indexing and Analysis Tricks
Solr Query Parsing
"Solr Update" at code4lib '13 - Chicago
Query Parsing - Tips and Tricks
Solr 4
Solr Recipes
Lucene for Solr Developers
Introduction to Solr
Solr Flair
Introduction to Solr
Lucene for Solr Developers
Introduction to Solr
Rapid Prototyping with Solr
Lucene for Solr Developers
What's New in Solr 3.x / 4.0
Solr Application Development Tutorial
DMCC Future of Trade Web3 - Special Edition
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Solr Powered Libraries

  • 1. © Copyright 2013 LucidWorks Solr Powered Libraries: A survey of the world's knowledge bases May 2, 2013 Presented by Erik Hatcher Thursday, May 2, 13
  • 2. Abstract: Using Apache Lucene and Solr search technologies, information and knowledge have become vastly more searchable, findable, and accessible. Because scholars and researchers are some of the most demanding users of search systems, the problems implementers face are complex. For example, many of the applications built on these technologies also thrive on deliberately designed-in serendipitous discovery capabilities, bringing to light previously unknown yet related and potentially interesting content. Libraries and other public knowledge-sharing environments, such as Wikipedia, generally embrace open source and community-improved contributions as core principles, making a lovely synergy with the power, features, and community-driven ecosystem provided by Lucene and Solr. This talk will introduce you to several Solr-powered library systems, detail how they work, and leave you with lessons learned that you can apply to your own applications.
  • 3. Real Solar-Powered Library! http://www.ktsm.com/news/texas-library-runs-sunshine
  • 4. Card-carrying library geek
      - Applied Research in Patacriticism (ARP)
          - Rossetti Archive: http://www.rossettiarchive.org
          - NINES: http://www.nines.org/
          - Collex: http://www.collex.org
      - Blacklight: originated as an implementation of Solr Flare
      - Presentations
          - http://code4lib.org/conference: 2007, 2009, 2010, 2011, 2013
          - Library of Congress: "Solr Powered Libraries" (2007), http://www.loc.gov/today/cyberlc/feature_wdesc.php?rec=4113
          - EBTI/CBETA Conference 2008
          - Publication: “Library 2.0 Initiatives in Academic Libraries”
      - Windsor Lucene Summit
      - eIFL-FOSS
  • 5. Rossetti Archive
  • 7. Card catalog: the original inverted index (image: http://commons.wikimedia.org/wiki/File:Copyright_Card_Catalog_Files.jpg)
  • 8. Open Library (http://openlibrary.org/), a project of the Internet Archive. Goal: "A (community editable) web page for every book"
  • 9. dp.la: Digital Public Library of America (Lucene/ElasticSearch powered)
  • 10. Wikimedia/Wikipedia/MediaWiki
      - Solr powered: translation memory service, GeoData extension, etc.
      - A "heavily modified Lucene" currently powers the main site search
  • 11. HathiTrust
      - "partnership of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future."
      - 10.5M books, 12TB OCR+metadata, hundreds of languages
          - "Books are different": http://code4lib.org/conference/2013/burton-west
      - http://www.hathitrust.org/blogs/large-scale-search
          - http://www.hathitrust.org/blogs/large-scale-search/too-many-words
          - "org.apache.solr.common.SolrException: Impossible Exception"
          - CommonGrams
          - word segmentation: autoGeneratePhraseQueries="false"
      - HathiTrust Research Center (HTRC)
          - The infrastructure includes an entrance portal, search and collection-building tools (using Blacklight), ... analysis algorithms that can be run against the HathiTrust public domain corpus (more than 3 million volumes). In addition to the production services, the HTRC offers a development “sandbox”. The sandbox runs against non-Google-scanned content (about 260,000 volumes) and provides a test-bed for interested researchers to experiment with writing their own algorithms for use in the HTRC infrastructure.
  • 12. Smithsonian Institution: http://collections.si.edu
      - Many disparate data sources:
          - 19 museums, 20 libraries, 14 archives, 1 National Zoo, 1 Astrophysical Observatory, and research centers in Panama, Boston, New York, Maryland, and Virginia
      - "Documents" of all varieties:
          - Photographs, paintings, manuscripts, letters, postage stamps, scientific specimens, rockets, airplanes, postcards, sound recordings, posters, decorative arts, ceramics, maps, sculptures, publication papers, books, trade catalogs, etc.
      - User tagging, negative/exclude filtering, DIH SolrEntityProcessor
      - http://bit.ly/13P41YJ
          - http://www.basistech.com/pdf/events/open-source-search-conference/oss-2011-wang-steps-toward-open-government.pdf
  • 15. SerialsSolutions Summon (http://www.serialssolutions.com/en/services/summon): SaaS, single unified index, match & merge
  • 16. Astrophysics Data System Labs (Smithsonian, NASA, Harvard): http://adslabs.org; see also http://code4lib.org/conference/2013/luker
  • 17. VuFind (http://vufind.org): currently powers the main HathiTrust UI and many more sites; see http://vufind.org/wiki/installation_status
  • 19. Blacklight
      - "Blacklight is an open source Ruby on Rails gem that provides a discovery interface for any Solr index. Blacklight provides a default user interface which is customizable via the standard Rails (templating) mechanisms. Blacklight accommodates heterogeneous data, allowing different information displays for different types of objects." (http://projectblacklight.org)
      - Founded at the University of Virginia (2007): search.lib.virginia.edu
          - UV-A solar radiation == blacklight
      - Initial contributors: UVa, Stanford, JHU, WGBH
      - Also used at: University of Hull, United States Holocaust Memorial Museum, University of Wisconsin-Madison, Tufts, Australian government (Natural Resource Management), Penn State's ScholarSphere, Northwestern, New York Public Library, NCSU, Columbia University, Agriculture Network Information Center (USDA), alicelaw.org (American Legislative and Issue Campaign Exchange, a one-stop web-based public library of progressive state and local laws), and many more
      - http://projecthydra.org/ uses Blacklight as its UI component
  • 20. SearchWorks at Stanford
  • 21. Advanced search in Stanford's SearchWorks
  • 22. SearchWorks: mapping text boxes to Solr query pieces (http://code4lib.org/conference/2010/dushay_keck)
  • 24. Rock and Roll! m/
  • 25. Community and Resources
      - code4lib: http://www.code4lib.org/
      - HathiTrust folks
          - http://www.hathitrust.org/blogs/large-scale-search
          - http://robotlibrarian.billdueber.com/
      - http://bighumanities.net/: The Workshop on Big Humanities will be held in conjunction with the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), which will take place on 6-9 October 2013 in Silicon Valley, California, USA, and which provides a leading international forum for disseminating the latest research in the growing field of “big data”.
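The card-catalog slide calls the catalog "the original inverted index". As a minimal sketch of that idea (not how Lucene actually stores or compresses postings), the Python below maps each lowercased, whitespace-split term to the set of catalog cards containing it, then answers an AND query by intersecting those sets. The card texts are made-up sample data.

```python
from collections import defaultdict


def build_inverted_index(cards: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase term to the set of card IDs whose text contains it."""
    index: dict[str, set[int]] = defaultdict(set)
    for card_id, text in cards.items():
        for term in text.lower().split():
            index[term].add(card_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return IDs of cards containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


# Made-up catalog cards for illustration.
cards = {
    1: "Rossetti Dante Gabriel poems",
    2: "Rossetti Christina poems",
    3: "Dante Alighieri Inferno",
}
index = build_inverted_index(cards)
print(sorted(search_all(index, "rossetti poems")))  # [1, 2]
print(sorted(search_all(index, "dante")))  # [1, 3]
```

Like a drawer of subject cards, the index is built once up front, so each lookup touches only the cards filed under the query terms rather than scanning every record.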
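The HathiTrust slide lists CommonGrams among the techniques discussed in its large-scale search posts, where phrase queries on very common words are expensive. The sketch below imitates the index-time idea behind Lucene's CommonGramsFilter under simplifying assumptions: tokens are already split, the common-word set is hand-picked, and an underscore joins each bigram. Every single word is kept, and a bigram is added at the same position wherever either word of an adjacent pair is common, so a phrase like "the rain" can be matched through the rare term `the_rain` instead of the huge postings list for `the`.

```python
def common_grams(tokens: list[str], common: set[str], sep: str = "_") -> list[tuple[int, str]]:
    """Return (position, term) pairs: each unigram, plus a bigram at the same
    position whenever the token or its right-hand neighbour is a common word."""
    out: list[tuple[int, str]] = []
    for i, tok in enumerate(tokens):
        out.append((i, tok))
        if i + 1 < len(tokens) and (tok in common or tokens[i + 1] in common):
            out.append((i, f"{tok}{sep}{tokens[i + 1]}"))
    return out


COMMON = {"the", "in", "of"}  # hypothetical common-word list

for pos, term in common_grams("the rain in spain".split(), COMMON):
    print(pos, term)
# 0 the
# 0 the_rain
# 1 rain
# 1 rain_in
# 2 in
# 2 in_spain
# 3 spain
```

Pairs of rare words ("quick brown") get no bigram, which keeps index growth limited to the phrases that actually hurt query performance.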
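The Smithsonian slide mentions user tagging with negative/exclude filtering. In Solr, one common way to express an exclusion is a filter query (`fq`) prefixed with `-`, which drops matching documents from the results. The helper below is a hypothetical sketch: the core URL `http://localhost:8983/solr/collection1` and the `tags` field are assumptions, not details of the Smithsonian system, and it only builds the request URL without sending it.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical Solr core URL; adjust for a real deployment.
SOLR_CORE = "http://localhost:8983/solr/collection1"


def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes for Solr."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_select_url(q, include_tags=(), exclude_tags=(), field="tags", base=SOLR_CORE):
    """Build a /select URL with one fq per included or excluded tag."""
    params = [("q", q)]
    for tag in include_tags:
        params.append(("fq", f"{field}:{quote_term(tag)}"))
    for tag in exclude_tags:
        # A leading "-" turns the filter into an exclusion: matching docs are dropped.
        params.append(("fq", f"-{field}:{quote_term(tag)}"))
    return f"{base}/select?{urlencode(params)}"


url = build_select_url("airplanes", include_tags=["aviation"], exclude_tags=["postcards"])
print(parse_qs(urlparse(url).query)["fq"])
# ['tags:"aviation"', '-tags:"postcards"']
```

Keeping each tag in its own `fq` rather than folding it into `q` means the filters do not affect relevance scoring, and Solr can cache each filter independently.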
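The Summon slide mentions "match & merge" in a single unified index but gives no details of how records are matched. As a generic illustration only, and not Summon's actual algorithm, the sketch below groups records by a normalized key (hyphen-stripped ISBN when present, otherwise a lowercased, whitespace-collapsed title) and merges each group by keeping the first non-empty value of every field while recording all contributing sources. The key rules, field names, and sample records are hypothetical.

```python
def match_key(record: dict) -> str:
    """Hypothetical matching rule: ISBN digits if present, else a normalized title."""
    isbn = record.get("isbn", "").replace("-", "").strip()
    if isbn:
        return "isbn:" + isbn
    return "title:" + " ".join(record.get("title", "").lower().split())


def match_and_merge(records: list[dict]) -> list[dict]:
    """Group records sharing a match key and merge each group into one record."""
    merged: dict[str, dict] = {}
    for rec in records:
        target = merged.setdefault(match_key(rec), {"sources": []})
        target["sources"].append(rec["source"])
        for field, value in rec.items():
            # Keep the first non-empty value seen for each field.
            if field != "source" and value and not target.get(field):
                target[field] = value
    return list(merged.values())


records = [  # made-up sample records from two hypothetical sources
    {"source": "catalog_a", "isbn": "123-456-789-0", "title": "Poems"},
    {"source": "catalog_b", "isbn": "1234567890", "title": "Poems.", "subject": "Poetry"},
    {"source": "catalog_b", "title": "The  House of Life"},
]
for rec in match_and_merge(records):
    print(rec)
```

A real deployment would need fuzzier matching and smarter field precedence; "first non-empty value wins" is chosen here only because it is easy to follow.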
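The SearchWorks slide is about mapping advanced-search text boxes onto pieces of a Solr query. One way to do this, assumed here rather than taken from the talk, is to turn each non-empty box into a Solr nested query (`_query_:"{!edismax ...}"`) whose `qf` and `pf` are dereferenced from other request parameters, then join the pieces with a boolean operator. The box names and the `<box>_qf`/`<box>_pf` parameter naming are hypothetical.

```python
def escape_nested(text: str) -> str:
    """Escape backslashes and double quotes so text can sit inside _query_:"..."."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_advanced_query(boxes: dict[str, str], op: str = "AND") -> str:
    """Turn each non-empty text box into a nested edismax query and join them."""
    clauses = []
    for box, text in boxes.items():
        text = text.strip()
        if not text:
            continue  # empty boxes contribute nothing to the query
        # $<box>_qf and $<box>_pf are dereferenced from other request parameters.
        clauses.append(
            f'_query_:"{{!edismax qf=${box}_qf pf=${box}_pf}}{escape_nested(text)}"'
        )
    return f" {op} ".join(clauses)


q = build_advanced_query({"title": "blessed damozel", "author": "rossetti", "subject": ""})
print(q)
```

The request would also need to carry the referenced parameters, for example `title_qf` and `title_pf` listing the title fields and their boosts; those field lists are placeholders too. Keeping the field weights in separate parameters lets each text box be tuned without rewriting the query string.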