SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
About Solr
                         People as A Search Problem




Thursday, May 26, 2011
About Me


                    • Building websites since 1996, Java since
                      1997
                    • Prior web search experience
                    • Building and scaling eHarmony
                      products since 2002



Thursday, May 26, 2011
What is Jazzed


                    • Subscription Based
                      Dating Site
                    • Incubated by
                      eHarmony




Thursday, May 26, 2011
What is Jazzed


                     • Create a profile
                     • Search for others
                     • View their photos
                     • Privately
                       Communicate


Thursday, May 26, 2011
What is Jazzed


                     • Create a profile
                     • Search for others
                     • View their photos
                     • Privately
                       Communicate


Thursday, May 26, 2011
What is Jazzed


                     • Create a profile
                     • Search for others
                     • View their photos
                     • Privately
                       Communicate


Thursday, May 26, 2011
What is Jazzed


                     • Create a profile
                     • Search for others
                     • View their photos
                     • Privately
                       Communicate


Thursday, May 26, 2011
How is it different?


                    • Covers broader range of relationships
                    • Easy to get started
                    • Real profiles screened by machine and
                      humans
                    • Fast, effective search oriented tools



Thursday, May 26, 2011
Jazzed Stats

                    • Started Fall 2009
                    • Beta Summer 2010
                    • Launched October 2010
                    • 100,000s of Profiles
                    • 1,000s of Searches Daily


Thursday, May 26, 2011
Jazzed Architecture



                    • Event-driven SOA
                    • REST, JSON, EIP, Not-only-SQL
                    • Technology incubation




Thursday, May 26, 2011
Tech Stack


                    • Java 6, Spring 3, Jersey 1.1, JMS
                      (AQMP)
                    • RHEL 4, Oracle 11g, Voldemort 0.81,
                      Solr 1.4.1, NFS




Thursday, May 26, 2011
Thursday, May 26, 2011
Thursday, May 26, 2011
Not Covered


                    • Distributed Search
                    • Caching Strategies
                    • Data Import
                    • Analyzers/Tokenizers



Thursday, May 26, 2011
Why Lucene?

                    • Proven Solid IR library
                    • Prefer Open Source Solutions
                    • Not Only SQL
                    • Flexible Ranking
                    • Pluggable


Thursday, May 26, 2011
Why Solr


                    • Performant, Extensible, RESTful Service
                    • Configuration, Schema, Multicores
                    • Admin Interface
                    • Replication, Backups, Monitoring



Thursday, May 26, 2011
Open Source



                    • Strengthens Engineering Team
                    • Be apart of great community
                    • Not Brochure-ware




Thursday, May 26, 2011
Not Only SQL



                    • One solution does not fit all
                    • Prefer availability over consistency
                    • Horizontal Scaling over Vertical




Thursday, May 26, 2011
Flexible Ranking

                    • Query Strategies
                         • Boolean Algebra
                         • Vector Space Analysis
                         • Hybrids
                    • Extensive Function Support
                    • Index and Query Boosting


Thursday, May 26, 2011
...Oh My!


                    • Standard Plugins - Geospatial*,
                      Faceting, Spelling, MoreLikeThis
                    • Full Text with Highlighted Results
                    • Client agnostic



Thursday, May 26, 2011
Inevitable Question

                    • “Does it scale?”
                    • Solr POC Benchmark
                         • 10 Million profiles
                         • >200 queries/sec under 100ms 90th
                         • Default tuning until 5 million profiles


Thursday, May 26, 2011
Profile Service



                    • RESTful Hybrid Data Service
                    • Public, Private, Attributes
                    • Event Producer




Thursday, May 26, 2011
Profiles

                    • Mostly structured
                    • Categories - Eye Color, Desired
                      Ethnicity
                    • Dates - Birthdate
                    • Numbers - Coordinates, Age Range
                    • Text -Name, Headline


Thursday, May 26, 2011
Inverting People
                                            Term          Document
                                           MALE           1, 3, 5, 7, 9
                                          FEMALE         2, 4, 6, 8, 10
                    • Stored as an        HAIR_RED              8
                      inverted index     HAIR_BLOND        1, 2, 5, 6
                                          EYE_BLUE         1, 2, 3, 10
                    • Index random
                                         EYE_BROWN      4, 5, 6, 7, 8, 9
                      accessed by term       fun           1, 3, 7, 9
                                            funny          2, 4, 6, 10
                                            beach     1, 2, 3, 4, 5, 6, 7, 8


Thursday, May 26, 2011
Schema Design


                    • Single “Table”
                    • One-to-many = multi-value fields
                    • Individual vs Composite Fields
                         • copyTo and have both!



Thursday, May 26, 2011
Field considerations


                    • Stored or not
                    • Indexed or not
                    • Multivalued - desires fields
                    • Type



Thursday, May 26, 2011
Solr Types Used
                                                 The ‘t’ is for Trie
                    • tdate, tint, tfloat* - birthdate, loginAt
                    • text - all text
                    • string - id, non indexed text
                    • random - good for random sorts
                    • enum - for all enumerations


Thursday, May 26, 2011
Data Duplication


                    • By function - numberPhotos &
                      hasPhotos
                    • By relationship - hiddenBy & hidden
                    • By analysis - name & text



Thursday, May 26, 2011
Saving Profiles


                    • Updating is in memory operation
                    • No partial updates
                    • Commit means flush index changes
                    • Autocommit on maxDocs, maxTime or
                      both



Thursday, May 26, 2011
Why Also Voldemort


                    • Private profiles can not be stale
                    • Many fields not searchable or viewable
                      by others
                    • Isolate queries from fetch by id



Thursday, May 26, 2011
Querying


                    • Superset of Lucene
                    • Efficient Range Queries
                    • Multiple Query Handlers
                         • Dismax, Boost, Geo



Thursday, May 26, 2011
Recall vs Precision



                    • Focus on recall when corpus is small
                    • Precision once it is at critical mass




Thursday, May 26, 2011
Boolean Queries


                    • Default operator set to AND
                    • +gender:FEMALE +seeking:MALE
                      +eyeColor:EYE_BLUE +hairColor:
                      (HAIR_RED, HAIR_BLONDE)
                    • Sort order is important



Thursday, May 26, 2011
Hybrid Queries


                    • Default operator set to OR
                    • +gender:FEMALE +seeking:MALE
                      eyeColor:EYE_BLUE hairColor:
                      (HAIR_RED, HAIR_BLONDE)




Thursday, May 26, 2011
Why you’re lucky if you
                      like redheads

                    • Inverse Document
                      Frequency (IDF)  1.Blue eyed, redheads
                                       2.Blue eyed, blonds
                    • Rarer is favored
                                       3.Redheads
                      over more common
                                       4.Blonds
                    • More fields
                      matched = higher
                      ranking

Thursday, May 26, 2011
Boosting



                    • Query time by importance
                         • eyeColor:EYE_BLUE^2
                           hairColor:HAIR_BLOND




Thursday, May 26, 2011
Filter Fields

                                             id   hidden
                                             1    2, 4, 6
                    • Useful for roles and
                      other lists            2      1

                    • -hidden:(2 4 6)




Thursday, May 26, 2011
Filter Fields

                                             id    hidden
                                             1     2, 4, 6
                    • Useful for roles and
                      other lists            2       1

                    • -hidden:(2 4 6)        id   hiddenBy
                                             1       2
                    • -hiddenBy:1
                                             2       1
                                             4       1
                                             6       1

Thursday, May 26, 2011
Date Math



                    • Simplifies query preprocessing
                    • +birthDate:[NOW/DAY+1DAY-36YEAR
                      TO NOW/DAY-25YEAR]




Thursday, May 26, 2011
Date Math



                    • Simplifies query preprocessing
                    • +birthDate:[NOW/DAY+1DAY-36YEAR
                      TO NOW/DAY-25YEAR]

                          Between 25 and 35 years old



Thursday, May 26, 2011
Distance Searching




                    • lat, lon, distance
                    • SolrLocal by Patrick O’Leary
                    • Additional overhead ~90ms per query
                    • Superceded in Solr 3.1



Thursday, May 26, 2011
Testing Queries



                    • Log queries and ids returned
                    • Version your search strategies
                    • Improve one thing at a time




Thursday, May 26, 2011
Geo Service


                    • Read-mostly service
                    • Fields - Postal Code, Country,
                      State, Cities, Lat, Lon
                    • Usage - Registration
                      Validation, City Selection



Thursday, May 26, 2011
Operations



                    • Servlet container and filesystem
                    • Jetty 6, 64 Java 6 JVM
                    • 8G Heap -XX:+UseCompressedOops




Thursday, May 26, 2011
Operations


                    • Active/Passive
                    • Layer 7 Load balancing
                    • Nightly snapshots
                    • Eventually SolrCloud



Thursday, May 26, 2011
Multicore


                    • Run multiple schemas on the same
                    • Hot swappable for backwards
                      compatible changes
                    • private / public profiles



Thursday, May 26, 2011
Security


                     • No security provided
                     • At minimum secure      <delete>
                                                <query>*:*</query>
                       your UpdateHandler     </delete>


                     • Separate Cores



Thursday, May 26, 2011
Future

                    • Solr 3.1
                    • Mutual Matching
                    • Faceting / Guided Search
                    • Incorporating spelling
                    • Hierarchies, categories, better ranking
                      models


Thursday, May 26, 2011
Faceting

                    • Returns counts
                      with query
                      results
                    • Efficient
                    • Guides the user
                      toward precision


Thursday, May 26, 2011
Thank you
                         jtuberville@eharmony.com
                            Twitter: @jtuberville




Thursday, May 26, 2011

Weitere ähnliche Inhalte

Andere mochten auch

Tennis
TennisTennis
Tennisaritz
 
Updated: Preparing an investor presentation
Updated:  Preparing an investor presentationUpdated:  Preparing an investor presentation
Updated: Preparing an investor presentationMarty Kaszubowski
 
Maroon5
Maroon5Maroon5
Maroon5tanica
 
Tate Tyler - Designing the Search Experience
Tate Tyler - Designing the Search ExperienceTate Tyler - Designing the Search Experience
Tate Tyler - Designing the Search ExperienceLucidworks (Archived)
 
How The Guardian Embraced the Internet using Content, Search, and Open Source
How The Guardian Embraced the Internet using Content, Search, and Open SourceHow The Guardian Embraced the Internet using Content, Search, and Open Source
How The Guardian Embraced the Internet using Content, Search, and Open SourceLucidworks (Archived)
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Lucidworks (Archived)
 
Using Solr to find the Right Person for the Right Job
Using Solr to find the Right Person for the Right JobUsing Solr to find the Right Person for the Right Job
Using Solr to find the Right Person for the Right JobLucidworks (Archived)
 
Moving to Solr/Lucene Open Source Search
Moving to Solr/Lucene Open Source SearchMoving to Solr/Lucene Open Source Search
Moving to Solr/Lucene Open Source SearchLucidworks (Archived)
 
20101023 ie9 cache
20101023 ie9 cache20101023 ie9 cache
20101023 ie9 cache彰 村地
 
Gaiety Hotel - full version
Gaiety Hotel - full versionGaiety Hotel - full version
Gaiety Hotel - full versiondummypackages
 
I love you mommy
I love you mommyI love you mommy
I love you mommyNyiah
 
IE のサポート変更が Azure に及ぼす影響
IE のサポート変更が Azure に及ぼす影響IE のサポート変更が Azure に及ぼす影響
IE のサポート変更が Azure に及ぼす影響彰 村地
 
Cancer
CancerCancer
Cancertanica
 

Andere mochten auch (20)

Tennis
TennisTennis
Tennis
 
Updated: Preparing an investor presentation
Updated:  Preparing an investor presentationUpdated:  Preparing an investor presentation
Updated: Preparing an investor presentation
 
Maroon5
Maroon5Maroon5
Maroon5
 
Tate Tyler - Designing the Search Experience
Tate Tyler - Designing the Search ExperienceTate Tyler - Designing the Search Experience
Tate Tyler - Designing the Search Experience
 
How To Get The Justin Bieber Smile
How To Get The Justin Bieber SmileHow To Get The Justin Bieber Smile
How To Get The Justin Bieber Smile
 
How The Guardian Embraced the Internet using Content, Search, and Open Source
How The Guardian Embraced the Internet using Content, Search, and Open SourceHow The Guardian Embraced the Internet using Content, Search, and Open Source
How The Guardian Embraced the Internet using Content, Search, and Open Source
 
Short Presentation
Short PresentationShort Presentation
Short Presentation
 
Search Analytics What? Why? How?
Search Analytics What? Why? How?Search Analytics What? Why? How?
Search Analytics What? Why? How?
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
 
Using Solr to find the Right Person for the Right Job
Using Solr to find the Right Person for the Right JobUsing Solr to find the Right Person for the Right Job
Using Solr to find the Right Person for the Right Job
 
Creep
CreepCreep
Creep
 
Simbad marinela
Simbad marinelaSimbad marinela
Simbad marinela
 
Solr & Lucene at Etsy
Solr & Lucene at EtsySolr & Lucene at Etsy
Solr & Lucene at Etsy
 
Moving to Solr/Lucene Open Source Search
Moving to Solr/Lucene Open Source SearchMoving to Solr/Lucene Open Source Search
Moving to Solr/Lucene Open Source Search
 
20101023 ie9 cache
20101023 ie9 cache20101023 ie9 cache
20101023 ie9 cache
 
Gaiety Hotel - full version
Gaiety Hotel - full versionGaiety Hotel - full version
Gaiety Hotel - full version
 
I love you mommy
I love you mommyI love you mommy
I love you mommy
 
The Seven Deadly Sins of Solr
The Seven Deadly Sins of SolrThe Seven Deadly Sins of Solr
The Seven Deadly Sins of Solr
 
IE のサポート変更が Azure に及ぼす影響
IE のサポート変更が Azure に及ぼす影響IE のサポート変更が Azure に及ぼす影響
IE のサポート変更が Azure に及ぼす影響
 
Cancer
CancerCancer
Cancer
 

Ähnlich wie Jazeed about Solr - People as A Search Problem

Atlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide DeckAtlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide DeckAtlassian
 
JavaScript Intro
JavaScript IntroJavaScript Intro
JavaScript IntroEric Brown
 
Building Languages for the JVM - StarTechConf 2011
Building Languages for the JVM - StarTechConf 2011Building Languages for the JVM - StarTechConf 2011
Building Languages for the JVM - StarTechConf 2011Charles Nutter
 
Fred Spencer: Designing a Great UI
Fred Spencer: Designing a Great UIFred Spencer: Designing a Great UI
Fred Spencer: Designing a Great UIAxway Appcelerator
 
Building an experimentation framework
Building an experimentation frameworkBuilding an experimentation framework
Building an experimentation frameworkzsqr
 
Business of APIs Conference 2011 - Unicorns
Business of APIs Conference 2011 - UnicornsBusiness of APIs Conference 2011 - Unicorns
Business of APIs Conference 2011 - UnicornsMashery
 
Preparing and Researching Presentations
Preparing and Researching PresentationsPreparing and Researching Presentations
Preparing and Researching PresentationsAllThatMedia
 
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)Balanced Team
 
Sustainable Theming with Fusion - DCCO 2011
Sustainable Theming with Fusion - DCCO 2011Sustainable Theming with Fusion - DCCO 2011
Sustainable Theming with Fusion - DCCO 2011sheenadonnelly
 
Bonfire... How'd You Do That?! - AtlasCamp 2011
Bonfire... How'd You Do That?! - AtlasCamp 2011Bonfire... How'd You Do That?! - AtlasCamp 2011
Bonfire... How'd You Do That?! - AtlasCamp 2011Atlassian
 
P90 X Your Database!!
P90 X Your Database!!P90 X Your Database!!
P90 X Your Database!!Denish Patel
 
Skills & Training for Library Publishing
Skills & Training for Library PublishingSkills & Training for Library Publishing
Skills & Training for Library Publishingkimballs
 
Education 2.3 m erwin
Education 2.3 m erwinEducation 2.3 m erwin
Education 2.3 m erwinErwin Huang
 
AIIM Ottawa May 12 2011 Agenda
AIIM Ottawa May 12 2011 AgendaAIIM Ottawa May 12 2011 Agenda
AIIM Ottawa May 12 2011 AgendaCheryl McKinnon
 

Ähnlich wie Jazeed about Solr - People as A Search Problem (15)

Atlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide DeckAtlassian RoadTrip 2011 Slide Deck
Atlassian RoadTrip 2011 Slide Deck
 
JavaScript Intro
JavaScript IntroJavaScript Intro
JavaScript Intro
 
Building Languages for the JVM - StarTechConf 2011
Building Languages for the JVM - StarTechConf 2011Building Languages for the JVM - StarTechConf 2011
Building Languages for the JVM - StarTechConf 2011
 
Fred Spencer: Designing a Great UI
Fred Spencer: Designing a Great UIFred Spencer: Designing a Great UI
Fred Spencer: Designing a Great UI
 
Building an experimentation framework
Building an experimentation frameworkBuilding an experimentation framework
Building an experimentation framework
 
Business of APIs Conference 2011 - Unicorns
Business of APIs Conference 2011 - UnicornsBusiness of APIs Conference 2011 - Unicorns
Business of APIs Conference 2011 - Unicorns
 
Preparing and Researching Presentations
Preparing and Researching PresentationsPreparing and Researching Presentations
Preparing and Researching Presentations
 
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)
Lean UX Principles in Practice (Zach Larson on SideReel's iOS App)
 
Sustainable Theming with Fusion - DCCO 2011
Sustainable Theming with Fusion - DCCO 2011Sustainable Theming with Fusion - DCCO 2011
Sustainable Theming with Fusion - DCCO 2011
 
JSLOL
JSLOLJSLOL
JSLOL
 
Bonfire... How'd You Do That?! - AtlasCamp 2011
Bonfire... How'd You Do That?! - AtlasCamp 2011Bonfire... How'd You Do That?! - AtlasCamp 2011
Bonfire... How'd You Do That?! - AtlasCamp 2011
 
P90 X Your Database!!
P90 X Your Database!!P90 X Your Database!!
P90 X Your Database!!
 
Skills & Training for Library Publishing
Skills & Training for Library PublishingSkills & Training for Library Publishing
Skills & Training for Library Publishing
 
Education 2.3 m erwin
Education 2.3 m erwinEducation 2.3 m erwin
Education 2.3 m erwin
 
AIIM Ottawa May 12 2011 Agenda
AIIM Ottawa May 12 2011 AgendaAIIM Ottawa May 12 2011 Agenda
AIIM Ottawa May 12 2011 Agenda
 

Mehr von Lucidworks (Archived)

Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Lucidworks (Archived)
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and SolrLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchLucidworks (Archived)
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Lucidworks (Archived)
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...Lucidworks (Archived)
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCLucidworks (Archived)
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCLucidworks (Archived)
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCLucidworks (Archived)
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCLucidworks (Archived)
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKLucidworks (Archived)
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarLucidworks (Archived)
 

Mehr von Lucidworks (Archived) (20)

Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
The Data-Driven Paradigm
The Data-Driven ParadigmThe Data-Driven Paradigm
The Data-Driven Paradigm
 
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
 
What's new in solr june 2014
What's new in solr june 2014What's new in solr june 2014
What's new in solr june 2014
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DC
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLK
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinar
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 

Kürzlich hochgeladen

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Kürzlich hochgeladen (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Jazeed about Solr - People as A Search Problem

  • 1. About Solr People as A Search Problem Thursday, May 26, 2011
  • 2. About Me • Building websites since 1996, Java since 1997 • Prior web search experience • Building and scaling eHarmony products since 2002 Thursday, May 26, 2011
  • 3. What is Jazzed • Subscription Based Dating Site • Incubated by eHarmony Thursday, May 26, 2011
  • 4. What is Jazzed • Create a profile • Search for others • View their photos • Privately Communicate Thursday, May 26, 2011
  • 5. What is Jazzed • Create a profile • Search for others • View their photos • Privately Communicate Thursday, May 26, 2011
  • 6. What is Jazzed • Create a profile • Search for others • View their photos • Privately Communicate Thursday, May 26, 2011
  • 7. What is Jazzed • Create a profile • Search for others • View their photos • Privately Communicate Thursday, May 26, 2011
  • 8. How is it different? • Covers broader range of relationships • Easy to get started • Real profiles screened by machine and humans • Fast, effective search oriented tools Thursday, May 26, 2011
  • 9. Jazzed Stats • Started Fall 2009 • Beta Summer 2010 • Launched October 2010 • 100,000s of Profiles • 1,000s of Searches Daily Thursday, May 26, 2011
  • 10. Jazzed Architecture • Event-driven SOA • REST, JSON, EIP, Not-only-SQL • Technology incubation Thursday, May 26, 2011
  • 11. Tech Stack • Java 6, Spring 3, Jersey 1.1, JMS (AQMP) • RHEL 4, Oracle 11g, Voldemort 0.81, Solr 1.4.1, NFS Thursday, May 26, 2011
  • 14. Not Covered • Distributed Search • Caching Strategies • Data Import • Analyzers/Tokenizers Thursday, May 26, 2011
  • 15. Why Lucene? • Proven Solid IR library • Prefer Open Source Solutions • Not Only SQL • Flexible Ranking • Pluggable Thursday, May 26, 2011
  • 16. Why Solr • Performant, Extensible, RESTful Service • Configuration, Schema, Multicores • Admin Interface • Replication, Backups, Monitoring Thursday, May 26, 2011
  • 17. Open Source • Strengthens Engineering Team • Be apart of great community • Not Brochure-ware Thursday, May 26, 2011
  • 18. Not Only SQL • One solution does not fit all • Prefer availability over consistency • Horizontal Scaling over Vertical Thursday, May 26, 2011
  • 19. Flexible Ranking • Query Strategies • Boolean Algebra • Vector Space Analysis • Hybrids • Extensive Function Support • Index and Query Boosting Thursday, May 26, 2011
  • 20. ...Oh My! • Standard Plugins - Geospatial*, Faceting, Spelling, MoreLikeThis • Full Text with Highlighted Results • Client agnostic Thursday, May 26, 2011
  • 21. Inevitable Question • “Does it scale?” • Solr POC Benchmark • 10 Million profiles • >200 queries/sec under 100ms 90th • Default tuning until 5 million profiles Thursday, May 26, 2011
  • 22. Profile Service • RESTful Hybrid Data Service • Public, Private, Attributes • Event Producer Thursday, May 26, 2011
  • 23. Profiles • Mostly structured • Categories - Eye Color, Desired Ethnicity • Dates - Birthdate • Numbers - Coordinates, Age Range • Text -Name, Headline Thursday, May 26, 2011
  • 24. Inverting People Term Document MALE 1, 3, 5, 7, 9 FEMALE 2, 4, 6, 8, 10 • Stored as an HAIR_RED 8 inverted index HAIR_BLOND 1, 2, 5, 6 EYE_BLUE 1, 2, 3, 10 • Index random EYE_BROWN 4, 5, 6, 7, 8, 9 accessed by term fun 1, 3, 7, 9 funny 2, 4, 6, 10 beach 1, 2, 3, 4, 5, 6, 7, 8 Thursday, May 26, 2011
  • 25. Schema Design • Single “Table” • One-to-many = multi-value fields • Individual vs Composite Fields • copyTo and have both! Thursday, May 26, 2011
  • 26. Field considerations • Stored or not • Indexed or not • Multivalued - desires fields • Type Thursday, May 26, 2011
  • 27. Solr Types Used The ‘t’ is for Trie • tdate, tint, tfloat* - birthdate, loginAt • text - all text • string - id, non indexed text • random - good for random sorts • enum - for all enumerations Thursday, May 26, 2011
  • 28. Data Duplication • By function - numberPhotos & hasPhotos • By relationship - hiddenBy & hidden • By analysis - name & text Thursday, May 26, 2011
  • 29. Saving Profiles • Updating is in memory operation • No partial updates • Commit means flush index changes • Autocommit on maxDocs, maxTime or both Thursday, May 26, 2011
  • 30. Why Also Voldemort • Private profiles can not be stale • Many fields not searchable or viewable by others • Isolate queries from fetch by id Thursday, May 26, 2011
  • 31. Querying • Superset of Lucene • Efficient Range Queries • Multiple Query Handlers • Dismax, Boost, Geo Thursday, May 26, 2011
  • 32. Recall vs Precision • Focus on recall when corpus is small • Precision once it is at critical mass Thursday, May 26, 2011
  • 33. Boolean Queries • Default operator set to AND • +gender:FEMALE +seeking:MALE +eyeColor:EYE_BLUE +hairColor: (HAIR_RED, HAIR_BLONDE) • Sort order is important Thursday, May 26, 2011
  • 34. Hybrid Queries • Default operator set to OR • +gender:FEMALE +seeking:MALE eyeColor:EYE_BLUE hairColor: (HAIR_RED, HAIR_BLONDE) Thursday, May 26, 2011
  • 35. Why you’re lucky if you like redheads • Inverse Document Frequency (IDF) 1.Blue eyed, redheads 2.Blue eyed, blonds • Rarer is favored 3.Redheads over more common 4.Blonds • More fields matched = higher ranking Thursday, May 26, 2011
  • 36. Boosting • Query time by importance • eyeColor:EYE_BLUE^2 hairColor:HAIR_BLOND Thursday, May 26, 2011
  • 37. Filter Fields id hidden 1 2, 4, 6 • Useful for roles and other lists 2 1 • -hidden:(2 4 6) Thursday, May 26, 2011
  • 38. Filter Fields id hidden 1 2, 4, 6 • Useful for roles and other lists 2 1 • -hidden:(2 4 6) id hiddenBy 1 2 • -hiddenBy:1 2 1 4 1 6 1 Thursday, May 26, 2011
  • 39. Date Math • Simplifies query preprocessing • +birthDate:[NOW/DAY+1DAY-36YEAR TO NOW/DAY-25YEAR] Thursday, May 26, 2011
  • 40. Date Math • Simplifies query preprocessing • +birthDate:[NOW/DAY+1DAY-36YEAR TO NOW/DAY-25YEAR] Between 25 and 35 years old Thursday, May 26, 2011
  • 41. Distance Searching • lat, lon, distance • SolrLocal by Patrick O’Leary • Additional overhead ~90ms per query • Superceded in Solr 3.1 Thursday, May 26, 2011
  • 42. Testing Queries • Log queries and ids returned • Version your search strategies • Improve one thing at a time Thursday, May 26, 2011
  • 43. Geo Service • Read-mostly service • Fields - Postal Code, Country, State, Cities, Lat, Lon • Usage - Registration Validation, City Selection Thursday, May 26, 2011
  • 44. Operations • Servlet container and filesystem • Jetty 6, 64 Java 6 JVM • 8G Heap -XX:+UseCompressedOops Thursday, May 26, 2011
  • 45. Operations • Active/Passive • Layer 7 Load balancing • Nightly snapshots • Eventually SolrCloud Thursday, May 26, 2011
  • 46. Multicore • Run multiple schemas on the same • Hot swappable for backwards compatible changes • private / public profiles Thursday, May 26, 2011
  • 47. Security • No security provided • At minimum secure <delete> <query>*:*</query> your UpdateHandler </delete> • Separate Cores Thursday, May 26, 2011
  • 48. Future • Solr 3.1 • Mutual Matching • Faceting / Guided Search • Incorporating spelling • Hierarchies, categories, better ranking models Thursday, May 26, 2011
  • 49. Faceting • Returns counts with query results • Efficient • Guides the user toward precision Thursday, May 26, 2011
  • 50. Thank you jtuberville@eharmony.com Twitter: @jtuberville Thursday, May 26, 2011