you complete me

Anne Veling – June 5th, 2012 – Berlin Buzzwords
                 @anneveling
AGENDA
•   9292.nl Public Transport Site
•   Naive Address Autocompletion
•   Field Inspection Semantic Autocompletion
•   Conclusions
9292.NL
•   Largest public transport site of The Netherlands
•   1M travel advice requests per day!
•   Completely new site by Q42
     •   Linking to existing routing engine
     •   Moving from multiple input boxes to one
     •   Mobile applications for Windows, iPhone, Android
DATA
•   10M points
     •   Train and metro stations
     •   Bus stops
     •   Places of Interest
     •   Streets
     •   Street ranges
     •   Addresses
•   Highly ambiguous
     •   Streets / city names / POI
     •   Spelling mistakes
     •   No single order
NAIVE IMPLEMENTATION
•   One concatenated field in Lucene
•   Tune tokenizer/analyzer
•   Tune query analyzer
•   Tune weights


•   Syntax Only
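
The naive approach above can be sketched in a few lines. This is a minimal in-memory illustration, not the 9292.nl implementation: the `RECORDS` list, the field names, and the prefix-matching rule are all assumptions, and a plain Python list stands in for the Lucene index.

```python
import re

# Hypothetical sample records; the real index holds about 10M points.
RECORDS = [
    {"type": "street", "name": "Zeil", "city": "Etten-Leur"},
    {"type": "station", "name": "Etten-Leur", "city": "Etten-Leur"},
    {"type": "street", "name": "Leurse Dreef", "city": "Breda"},
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def concatenated(record):
    """Build the single searchable field from all parts of a record."""
    return tokenize(f"{record['name']} {record['city']}")


def naive_search(query, records=RECORDS):
    """Return records in which every query term is a prefix of some token."""
    terms = tokenize(query)
    if not terms:
        return []
    return [
        record
        for record in records
        if all(
            any(token.startswith(term) for token in concatenated(record))
            for term in terms
        )
    ]
```

Because the match is purely syntactic, a query such as `leur` hits the street, the station, and "Leurse Dreef" alike: nothing in the single field says which part of the address the user meant.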
[Chart: quality (y-axis, marked at 80% and 100%) plotted against effort (x-axis)]
FIELD INSPECTION
•   Taking advantage of
     •   Number of fields
     •   Speed of Lucene
•   Query Analysis
•   For each term, query in all fields
     •   Does it appear in that field? Count > 0?
     •   Use that information to do semantic interpretation
Example query: etten leur zeil
•   For each term: city? station? bus stop? street?
•   Interpretation: city:etten-leur street:zeil
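
A rough sketch of the field inspection idea, under stated assumptions: `FIELD_INDEX` is a hypothetical set of per-field token counts standing in for separate Lucene fields, and the rule "take the field with the highest count, then merge adjacent terms that land in the same field" is one simple way to turn the counts into an interpretation. The slides do not say how the real system resolves ambiguity.

```python
from collections import Counter

# Hypothetical per-field token counts, standing in for separate Lucene fields.
FIELD_INDEX = {
    "city": Counter({"etten": 3, "leur": 3, "breda": 5}),
    "station": Counter({"etten": 1, "leur": 1, "breda": 2}),
    "bus_stop": Counter({"markt": 4}),
    "street": Counter({"zeil": 2, "markt": 7}),
}


def inspect(query):
    """For each term, list the fields in which it occurs (count > 0)."""
    return [
        (term, {f: idx[term] for f, idx in FIELD_INDEX.items() if idx[term] > 0})
        for term in query.lower().split()
    ]


def interpret(query):
    """Assign each term to its highest-count field and merge neighbours.

    Terms found in no field get the field None, which a caller could
    treat as a signal to fall back to the naive search.
    """
    groups = []  # list of [field, [terms]]
    for term, counts in inspect(query):
        field = max(counts, key=counts.get) if counts else None
        if groups and groups[-1][0] == field:
            groups[-1][1].append(term)
        else:
            groups.append([field, [term]])
    # Joining with "-" only mirrors the notation used on the slide.
    return [(field, "-".join(terms)) for field, terms in groups]


print(interpret("etten leur zeil"))
# [('city', 'etten-leur'), ('street', 'zeil')]
```

Each term costs one count lookup per field, which is why the approach relies on the lookups being cheap.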
RESULTS
•   Implemented in Scala
•   Lucene RequestHandler in Solr
•   Ajax front-end
TUNING
•   Iterative Tuning
•   Using real user inputs from production log files
•   Regression Testing to track index/algorithm changes over time
     •   For how many test queries is the expected result
          • The top result?
          • In the top 5?
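
The two regression questions above translate directly into two rates. A minimal sketch, assuming the test set is a list of `(query, expected_result)` pairs taken from the logs and that `search` is any function returning a ranked list; both names are hypothetical.

```python
def regression_report(test_cases, search, k=5):
    """Measure how often the expected result is ranked first or within the top k."""
    top1 = 0
    topk = 0
    for query, expected in test_cases:
        results = search(query)
        if results[:1] == [expected]:
            top1 += 1
        if expected in results[:k]:
            topk += 1
    total = len(test_cases)
    return {
        "queries": total,
        "top1_rate": top1 / total if total else 0.0,
        "topk_rate": topk / total if total else 0.0,
    }


# Hypothetical stand-in for the real search, returning canned rankings.
CANNED = {
    "etten leur zeil": ["Zeil, Etten-Leur", "Station Etten-Leur"],
    "breda": ["Breda Centraal", "Breda"],
    "utrecht": [],
}
CASES = [
    ("etten leur zeil", "Zeil, Etten-Leur"),
    ("breda", "Breda"),
    ("utrecht", "Utrecht Centraal"),
]

print(regression_report(CASES, lambda q: CANNED.get(q, [])))
```

Storing the report for every index or algorithm change makes it possible to see whether a tuning step helped one group of queries at the cost of another.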
CONCLUSIONS
•   Very positive feedback
•   Iterative tuning based on actual user input from log files
     •   Regression test
•   Lucene is fast
     •   Entire type-ahead still within 40ms
•   But: partner currently evaluating naive-only approach
     •   sometimes good enough is good enough
•   Field Inspection will allow high quality selection
     •   With fallback to naive syntactic search
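
The fallback in the last bullet could look like the sketch below. It assumes two interchangeable search functions passed in by the caller (`semantic_search` and `naive_search` are hypothetical names) and a simple rule: use the naive results only when the semantic interpretation returns nothing. The actual switching rule is not described in the slides.

```python
def autocomplete(query, semantic_search, naive_search, limit=5):
    """Try the semantic search first; fall back to the naive one if it finds nothing."""
    results = semantic_search(query)
    if not results:
        results = naive_search(query)
    return results[:limit]


# Usage with stand-in search functions:
suggestions = autocomplete(
    "zeil",
    semantic_search=lambda q: [],             # no semantic interpretation found
    naive_search=lambda q: ["Zeil, Etten-Leur"],
)
print(suggestions)
```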
THANK YOU




@anneveling

Smart Autocompl... with Solr
