FIBEP World Media Intelligence Congress, 17-20 November 2015, Vienna
www.wmicongress.com
Session Title: How Infomedia upgraded their closed-source search engine to a fast, scalable and flexible open-source platform
Speaker: Kristian Schou, Infomedia & Charlie Hull, Flax
Twitter: @InfomediaDK @Flaxsearch Web: www.flax.co.uk
2015-11-19
About Infomedia
• Founded in 2003
• The leading Danish provider of media monitoring and media
analysis
• Largest and oldest Danish Media archive with access to
approximately 75 million searchable articles
@_FIBEP #_FIBEP #WMIC15 2015-11-19
About Flax
• Founded in 2001 in Cambridge, U.K.
• Independent, honest advice and analysis
• Expert design & development, Apache Solr committers
• Test-driven relevancy and performance tuning
• Custom training & mentoring for your staff
• Flexible support up to 24/7/365 with SLAs
• Some of our clients:
The situation at Infomedia in 2013
• Very old media monitoring system based on Verity
– Verity was put into production in 2001 at the company that would later become Infomedia!
• Slightly less old installation of Autonomy IDOL used for Infomedia's Media Archive
– Put into production at Infomedia in 2009/10
• Drawbacks:
– Verity at almost max capacity needing constant attention
– Old and complex workflow for receiving and processing articles
– Different platforms for monitoring and archive searches meant we were ‘bi-lingual’,
using two different query languages in-house.
– Verity no longer supported by the owning company (HP)
– Verity not scalable!
What to do?
• Different upgrade options explored throughout 2011-2012
– Upgrade everything to Autonomy IDOL?
– Switch to another commercial search engine?
– Go open-source?
• Recommendations and internal testing drew us to Apache Solr, an open-source enterprise search platform
• Advantages:
– Transparency (going from commercial to open-source)
– Rapid maturity of Solr – development moving very fast
– Large and active Solr Community
– Customizability
– Solr is known to be fast and highly scalable
– No license fees
Defining the project with Flax
• Infomedia searched for Solr expertise in Denmark/Scandinavia
– could not find an option that we were comfortable with
• Introduced to Flax through networking and recommendations
– Experience from similar upgrade projects with Gorkana and AAP
– Very impressed with Flax’s insight, knowledge and credentials
– Actual committer to Apache Solr
• Project began in autumn of 2013 with the goals of:
– Building a completely new search architecture to replace Verity and IDOL
– Defining Infomedia's own query language, IQL, owned and controlled by Infomedia
– Translating old monitoring queries (approx. 8,000) to this new IQL syntax
Replacing Verity
• Verity replaced by Flax Monitor
– Parses IQL to Lucene queries
– Runs on 2 servers
– Uses Luwak, Flax's 'stored search' library:
• Built on Apache Lucene (as is Solr)
• Also used by Bloomberg, Booz Allen Hamilton & others
• In use for 1 million stored searches (some 250k characters long) and 1 million stories a day
• 40x faster than Elasticsearch Percolator
• Open source at https://github.com/flaxsearch/luwak
Turning search upside down
[Diagram, built up over six slides. Conventional search: one query is run against many stored docs to give a result. Monitoring turns this around: each new doc is run against the stored queries.]
• Scale: 1 million queries, some 250k long, with complex rules; 1 million new documents a day; results within 5-100ms
• Step 1, Pre Query: the doc is used to select a subset (~200) of the stored queries
• Step 2, Search: the subset is run against the doc to give the result
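The two steps in the diagram can be illustrated with a toy sketch. This is not Luwak's API (Luwak is a Java library built on Lucene); the `StoredQuery` and `ToyMonitor` names, the term-set query model and the whitespace tokenizer are all assumptions made for illustration. The only idea taken from the slides is that a cheap pre-query step narrows the stored queries to a small subset before the full matching is run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """Toy stored query: all terms in all_of must occur, none in none_of."""
    query_id: str
    all_of: frozenset  # must be non-empty in this sketch
    none_of: frozenset = frozenset()


class ToyMonitor:
    """Illustrative stored-search monitor; hypothetical, not Luwak's API."""

    def __init__(self):
        self.queries = {}
        self.term_index = {}  # term -> ids of queries that require that term

    def register(self, query):
        self.queries[query.query_id] = query
        # Index the query under one of its required terms. A document must
        # contain that term to match, so the pre-query step never loses a hit.
        key = min(query.all_of)
        self.term_index.setdefault(key, set()).add(query.query_id)

    def presearch(self, doc_terms):
        """Step 1: select the subset of stored queries that could match."""
        candidates = set()
        for term in doc_terms:
            candidates |= self.term_index.get(term, set())
        return candidates

    def match(self, text):
        """Step 2: run only the candidate queries against the document."""
        doc_terms = set(text.lower().split())
        hits = []
        for query_id in sorted(self.presearch(doc_terms)):
            query = self.queries[query_id]
            if query.all_of <= doc_terms and not (query.none_of & doc_terms):
                hits.append(query_id)
        return hits


monitor = ToyMonitor()
monitor.register(StoredQuery("q1", frozenset({"solr", "search"})))
monitor.register(StoredQuery("q2", frozenset({"verity"}), frozenset({"solr"})))
monitor.register(StoredQuery("q3", frozenset({"copenhagen"})))
print(monitor.match("Infomedia moved its search from Verity to Solr"))
```

In this example the pre-query step discards q3 without evaluating it, and q2 is evaluated but rejected because the document contains an excluded term. With a million stored queries, skipping most of them per document is where the time is saved.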
Replacing Autonomy IDOL
• Autonomy IDOL replaced by Apache Solr
− Parses IQL to Lucene queries
− SolrCloud distributes the index & queries across several servers
− Setup: 75 million documents hosted on 8 servers, each with 6 cores, 24 GB memory and 125 GB storage
− This setup is doubled to have full redundancy
− Features added to standard Solr by Flax:
• Custom highlighting
• Framework to handle multiple languages
• Extended error logging
• Cluster management
• Performance enhancements for complex wildcard queries
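As a rough back-of-the-envelope check on the setup figures above, the per-server and per-cluster totals can be worked out as follows. The input numbers come from the slide; the assumption that documents are spread evenly across the 8 servers is ours, and the constant and function names are illustrative only.

```python
# Figures quoted on the slide.
TOTAL_DOCUMENTS = 75_000_000
SERVERS_PER_CLUSTER = 8
CORES_PER_SERVER = 6
MEMORY_GB_PER_SERVER = 24
STORAGE_GB_PER_SERVER = 125
CLUSTER_COPIES = 2  # "this setup is doubled" for full redundancy


def cluster_summary():
    """Derive per-server and total figures, assuming an even document spread."""
    return {
        # Assumption: SolrCloud shards hold roughly equal numbers of documents.
        "docs_per_server": TOTAL_DOCUMENTS // SERVERS_PER_CLUSTER,
        "total_servers": SERVERS_PER_CLUSTER * CLUSTER_COPIES,
        "cores_per_cluster": SERVERS_PER_CLUSTER * CORES_PER_SERVER,
        "memory_gb_per_cluster": SERVERS_PER_CLUSTER * MEMORY_GB_PER_SERVER,
        "storage_gb_per_cluster": SERVERS_PER_CLUSTER * STORAGE_GB_PER_SERVER,
    }


for name, value in cluster_summary().items():
    print(f"{name}: {value:,}")
```

Under that even-spread assumption, each server holds a little over 9 million documents, and the doubled setup comes to 16 servers in total.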
Benefits of the project
• Articles indexed and searchable within minutes of receiving them
• New, much smarter tools for constructing and comparing
monitoring queries
• The Flax Monitor is an extremely smart and performant monitoring
solution
• Huge benefits from defining the Infomedia Query Language, IQL
– Extremely enlightening and empowering process to analyze what we actually need from a
query language
– We fully understand and have documented how IQL works
– IQL is designed to match Infomedia’s demands and preferences
– We can revise and expand IQL as new needs and opportunities arise
– Not bound to any search platform. We can take it with us
Learnings/Where are we now?
• A challenging, complex, time-consuming but ultimately rewarding project
• The ripple effect – we have had to revisit and update a lot of legacy systems
• Customization is great, but can also mean more specification
• Open Source prevents lock-in but demands investment in education - otherwise it is still
just a magic box
• Flax's expert knowledge has been invaluable
• A successful migration
• More than 90% of Infomedia’s monitoring queries have been migrated to IQL with
practically no negative change in precision or recall
• The collaboration with Flax continues
• As Infomedia develops, so do new ideas and feature requests
• A customized open source platform also means continuous improvement
• Currently updating to Solr 5.3
• Still experimenting with different ways to scale our Solr installation
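The slides do not say how the change in precision and recall was measured. One plausible way to check a migrated query, sketched here under that caveat, is to run the legacy query and its IQL translation over the same articles and compare the sets of article IDs returned. The function name and the sample IDs are hypothetical, and treating the legacy hits as the reference is an assumption; it is only as good as the legacy query itself.

```python
def precision_recall(old_hits, new_hits):
    """Score a migrated query's hits against the legacy query's hits.

    Assumption: the legacy result set is treated as the reference.
    """
    old_hits, new_hits = set(old_hits), set(new_hits)
    shared = old_hits & new_hits
    # Precision: share of the new hits that the legacy query also returned.
    precision = len(shared) / len(new_hits) if new_hits else 1.0
    # Recall: share of the legacy hits that the new query still returns.
    recall = len(shared) / len(old_hits) if old_hits else 1.0
    return precision, recall


# Hypothetical article IDs returned by one query before and after migration.
legacy = {"a1", "a2", "a3", "a4"}
migrated = {"a2", "a3", "a4", "a5"}
print(precision_recall(legacy, migrated))
```

Here three of the four articles are shared, so both precision and recall come out at 0.75. A query pair scoring 1.0 on both returns exactly the same articles on the test set.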
Other lessons
• You can also keep your old query language
- Flax have written dtSearch & Verity parsers for Lucene
• Some of your old queries might not be working
- e.g. Verity doesn't always tell you when queries are broken!
• Open source can help future-proof your search
- and you have control of the software
• Engage with the open source community:
- User groups
- Mailing lists
- Contribute back if you can
Thanks for listening
- any questions?
Kristian Schou, Infomedia & Charlie Hull, Flax
@InfomediaDK @Flaxsearch Web: www.flax.co.uk
Something else you might like
Think outside the search box!
2DSearch is a patent-pending, radical alternative to traditional keyword
search. Instead of a one-dimensional search box, concepts are
expressed and manipulated as objects on a two-dimensional canvas.
So you spend less time worrying about Boolean strings, and more
time creating semantically transparent queries and effective search
strategies.
Sign up to gain early access at www.2dsearch.com

Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 

FIBEP Congress: How Infomedia Upgraded Search to Apache Solr

  • 1. FIBEP World Media Intelligence Congress, 17-20 November 2015, Vienna (www.wmicongress.com)
      Session title: How Infomedia upgraded their closed-source search engine to a fast, scalable and flexible open-source platform
      Date: 2015-11-19
      Speakers: Kristian Schou, Infomedia & Charlie Hull, Flax
      Twitter: @InfomediaDK @Flaxsearch
      Web: www.flax.co.uk
  • 2. About Infomedia
      • Founded in 2003
      • The leading Danish provider of media monitoring and media analysis
      • Largest and oldest Danish media archive, with access to approximately 75 million searchable articles
      @_FIBEP #_FIBEP #WMIC15, 2015-11-19
  • 3. About Flax
      • Founded in 2001 in Cambridge, U.K.
      • Independent, honest advice and analysis
      • Expert design & development, Apache Solr committers
      • Test-driven relevancy and performance tuning
      • Custom training & mentoring for your staff
      • Flexible support up to 24/7/365 with SLAs
      • Some of our clients: [not captured in the transcript]
  • 4. The situation at Infomedia in 2013
      • Very old media monitoring system based on Verity
      • Verity was put into production in 2001 at the company that would later become Infomedia!
      • Slightly less old installation of Autonomy IDOL used for Infomedia's Media Archive
      • Put into production at Infomedia in 2009/10
      • Drawbacks:
        – Verity at almost max capacity, needing constant attention
        – Old and complex workflow for receiving and processing articles
        – Different platforms for monitoring and archive searches meant we were 'bilingual', using two different query languages in-house
        – Verity no longer supported by the owning company (HP)
        – Verity not scalable!
  • 5. What to do?
      • Different upgrading options explored throughout 2011-2012
      • Upgrade everything to Autonomy IDOL?
      • Switch to another commercial search engine?
      • Go open-source?
      • Recommendations and internal testing drew us to Apache Solr, an open source enterprise search platform
      • Advantages:
        – Transparency (going from commercial to open-source)
        – Rapid maturity of Solr: development moving very fast
        – Large and active Solr community
        – Customizability
        – Solr is known to be fast and highly scalable
        – No license fees
  • 6. Defining the project with Flax
      • Infomedia searched for Solr expertise in Denmark/Scandinavia but could not find an option that we were comfortable with
      • Introduced to Flax through networking and recommendations
        – Experience from similar upgrade projects with Gorkana and AAP
        – Very impressed with Flax's insight, knowledge and credentials
        – Actual committer to Apache Solr
      • Project began in autumn of 2013 with the goals of:
        – Building a completely new search architecture to replace Verity and IDOL
        – Defining Infomedia's own query language, IQL, owned and controlled by Infomedia
        – Translating old monitoring queries (approx. 8,000) to this new IQL syntax
  • 7. Replacing Verity
      • Verity replaced by Flax Monitor
        – Parses IQL to Lucene queries
        – Runs on 2 servers
        – Uses Luwak, Flax's 'stored search' library:
          • Built on Apache Lucene (as is Solr)
          • Also used by Bloomberg, Booz Allen Hamilton & others
          • In use for 1 million stored searches (some 250k characters long) and 1 million stories per day
          • 40x faster than Elasticsearch Percolator
          • Open source at https://github.com/flaxsearch/luwak
  • 8. Turning search upside down [diagram; labels: Docs, Query, Result; Stored Queries, Query; $$$]
  • 9. Turning search upside down [diagram; labels: Docs, Query, Result; Stored Queries (1 million queries, some 250k long, complex rules), Query; 1 million new documents a day; within 5-100ms; $$$]
  • 10. Turning search upside down [diagram; labels: Docs, Query, Result; Stored Queries (1 million queries, some 250k long, complex rules), Query; 1 million new documents a day; within 5-100ms; $$$$$$]
  • 11. Turning search upside down [diagram; labels: Docs, Query, Result; Stored Queries (1 million queries, some 250k long, complex rules), Query; 1 million new documents a day; within 5-100ms; $$$$$$]
  • 12. Turning search upside down [diagram; labels: Docs, Query; Stored Queries (1 million queries, some 250k long, complex rules), Query; Doc (1 million new documents a day); step 1: Pre Query, Subset (~200)]
  • 13. Turning search upside down [diagram; labels: Docs, Query; Stored Queries (1 million queries, some 250k long, complex rules), Query; Doc (1 million new documents a day); step 1: Pre Query, Subset (~200); step 2: Search, Result]
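The two-step flow named on this slide (a pre-query that narrows the stored queries to a small subset, then a full search against only that subset) can be illustrated with a minimal Python sketch. This is not Luwak's API or the Flax Monitor implementation: the `StoredQuery` and `Monitor` classes, the all-of/none-of query model and the choice of one indexed term per query are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


def tokenize(text):
    """Lower-case a document and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass(frozen=True)
class StoredQuery:
    """Hypothetical stored query: every all_of term required, no none_of term allowed."""
    query_id: str
    all_of: frozenset
    none_of: frozenset = frozenset()

    def matches(self, doc_terms):
        return self.all_of <= doc_terms and not (self.none_of & doc_terms)


class Monitor:
    """Toy stored-search monitor: pre-query for candidates, then full match."""

    def __init__(self):
        self.queries = {}
        self.term_index = {}  # term -> ids of queries filed under that term

    def register(self, query):
        if not query.all_of:
            raise ValueError("a stored query needs at least one required term")
        self.queries[query.query_id] = query
        # File the query under one required term; it cannot match a
        # document that lacks this term, so it is safe to skip it then.
        key = min(query.all_of)
        self.term_index.setdefault(key, set()).add(query.query_id)

    def candidates(self, doc_terms):
        """Step 1 (pre-query): ids of queries that could possibly match."""
        ids = set()
        for term in doc_terms:
            ids |= self.term_index.get(term, set())
        return ids

    def match(self, text):
        """Step 2 (search): run only the candidate queries against the document."""
        doc_terms = tokenize(text)
        return sorted(
            qid for qid in self.candidates(doc_terms)
            if self.queries[qid].matches(doc_terms)
        )


monitor = Monitor()
monitor.register(StoredQuery("q1", frozenset({"solr", "search"})))
monitor.register(StoredQuery("q2", frozenset({"verity"}), frozenset({"solr"})))
monitor.register(StoredQuery("q3", frozenset({"football"})))

doc = "Infomedia moved its search from Verity to Solr"
print(monitor.candidates(tokenize(doc)))
print(monitor.match(doc))
```

Here `q3` is never evaluated because none of its terms occur in the document, which is the point of the pre-query step: the expensive full match runs on a small candidate set rather than on every stored query.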
  • 14. Replacing Autonomy IDOL
      • Autonomy IDOL replaced by Apache Solr
        – Parses IQL to Lucene queries
        – SolrCloud distributes the index & queries across several servers
        – Setup: 75 million documents hosted on 8 servers, with 6 cores, 24 GB memory and 125 GB storage per server
        – This setup is doubled to have full redundancy
        – Features added to standard Solr by Flax:
          • Custom highlighting
          • Framework to handle multiple languages
          • Extended error logging
          • Cluster management
          • Performance enhancements for complex wildcard queries
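The sizing figures on this slide can be turned into rough totals with a few lines of Python. This is a back-of-the-envelope sketch only: it assumes the 75 million documents are spread evenly over the 8 servers and that "doubled" means a second, identical set of 8 servers. The `SolrCluster` class and its field names are hypothetical and are not part of Solr.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolrCluster:
    """Hypothetical container for the per-server figures quoted on the slide."""
    documents: int
    servers: int
    cores_per_server: int
    memory_gb_per_server: int
    storage_gb_per_server: int
    copies: int = 2  # assumption: "doubled" = two identical sets of servers

    @property
    def docs_per_server(self):
        # Assumes an even spread of documents across servers.
        return self.documents / self.servers

    @property
    def total_servers(self):
        return self.servers * self.copies

    def total(self, per_server):
        """Scale a per-server figure to the whole redundant installation."""
        return per_server * self.total_servers


cluster = SolrCluster(
    documents=75_000_000,
    servers=8,
    cores_per_server=6,
    memory_gb_per_server=24,
    storage_gb_per_server=125,
)

print(f"documents per server: {cluster.docs_per_server:,.0f}")
print(f"servers in total:     {cluster.total_servers}")
print(f"cores in total:       {cluster.total(cluster.cores_per_server)}")
print(f"memory in total:      {cluster.total(cluster.memory_gb_per_server)} GB")
print(f"storage in total:     {cluster.total(cluster.storage_gb_per_server)} GB")
```

Under those assumptions each server holds a little under 9.4 million documents, and the redundant installation comes to 16 servers.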
  • 15. Benefits of the project
      • Articles indexed and searchable within minutes of receiving them
      • New, much smarter tools for constructing and comparing monitoring queries
      • The Flax Monitor is an extremely smart and performant monitoring solution
      • Huge benefits from defining the Infomedia Query Language, IQL
        – Extremely enlightening and empowering process to analyze what we actually need from a query language
        – We fully understand and have documented how IQL works
        – IQL is designed to match Infomedia's demands and preferences
        – We can revise and expand IQL as new needs and opportunities arise
        – Not bound to any search platform. We can take it with us
  • 16. Learnings/Where are we now?
      • A challenging, complex, time-consuming but ultimately rewarding project
      • The ripple effect: we have had to revisit and update a lot of legacy systems
      • Customization is great, but can also mean more specification
      • Open source prevents lock-in but demands investment in education, otherwise it is still just a magic box
      • Flax's expert knowledge has been invaluable
      • A successful migration
      • More than 90% of Infomedia's monitoring queries have been migrated to IQL with practically no negative change in precision or recall
      • The collaboration with Flax continues
      • As Infomedia develops, so do new ideas and feature requests
      • A customized open source platform also means continuous improvement
      • Currently updating to Solr 5.3
      • Still experimenting with different ways to scale our Solr installation
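One way to check a migrated query for changes in precision or recall is to compare the article IDs it returns with those returned by the original query on the same set of articles. The sketch below assumes the old engine's results are taken as the reference set, which is a simplification; the deck does not say how Infomedia measured this, and the function name and sample IDs are hypothetical.

```python
def precision_recall(baseline_hits, candidate_hits):
    """Compare a migrated query's hits with the original query's hits.

    baseline_hits:  article IDs matched by the old query (treated as relevant).
    candidate_hits: article IDs matched by the migrated query.
    An empty result set is scored as 1.0 by convention to avoid dividing by zero.
    """
    baseline = set(baseline_hits)
    candidate = set(candidate_hits)
    overlap = baseline & candidate
    precision = len(overlap) / len(candidate) if candidate else 1.0
    recall = len(overlap) / len(baseline) if baseline else 1.0
    return precision, recall


# Hypothetical article IDs for a single monitoring query.
old_hits = ["a1", "a2", "a3", "a4"]
new_hits = ["a2", "a3", "a4", "a5"]

precision, recall = precision_recall(old_hits, new_hits)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the old engine can itself be wrong (the deck notes that Verity does not always report broken queries), a drop in either figure is a prompt to inspect the differing articles rather than proof of a regression.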
  • 17. Other lessons
      • You can also keep your old query language
        – Flax have written dtSearch & Verity parsers for Lucene
      • Some of your old queries might not be working
        – e.g. Verity doesn't always tell you when queries are broken!
      • Open source can help future-proof your search
        – and you have control of the software
      • Engage with the open source community:
        – User groups
        – Mailing lists
        – Contribute back if you can
  • 18. Thanks for listening - any questions?
      Kristian Schou, Infomedia & Charlie Hull, Flax
      @InfomediaDK @Flaxsearch
      Web: www.flax.co.uk
  • 19. Something else you might like
      Think outside the search box! 2DSearch is a patent-pending, radical alternative to traditional keyword search. Instead of a one-dimensional search box, concepts are expressed and manipulated as objects on a two-dimensional canvas. So you spend less time worrying about Boolean strings, and more time creating semantically transparent queries and effective search strategies.
      Sign up to gain early access at www.2dsearch.com