Apache Solr Ratification




     cdevecchi@gmail.com
Solr – What is it?


• Apache Project
• Open source search engine based on Lucene
• XML/HTTP and JSON APIs
Features


•   Lemmatization

•   Hit highlighting

•   Dictionaries

•   Geosearch

•   Faceted Search

•   Caching

•   Index replication and database integration
Characteristics


•   Java: runs in Tomcat / JBoss / Jetty

•   Schema

•   SolrJ client

•   JMX statistics
Query

• Highlighting
   – Enabled per query (hl=true)


• Text Analysis
   – Uses dictionaries and a thesaurus
   – Relevancy searches
   – Spelling suggestions
   – Search by similarity (“More like this”)
   – Fuzzy (Damerau-Levenshtein distance)
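
The "Fuzzy" bullet names the Damerau-Levenshtein distance. As an illustration only, here is a minimal sketch of its restricted variant (optimal string alignment), which counts insertions, deletions, substitutions, and swaps of adjacent characters. The function name osa_distance is hypothetical, and this is not Solr's or Lucene's own implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = edits needed to turn the first i chars of a into the first j of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]
```

With this sketch, a swapped pair of letters such as "solr" versus "slor" counts as a single edit, which is why this distance suits typo-tolerant matching.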
Query


• Querying data
   – Words
   – Words by field
   – Sorted results (sort)


• Faceted Search
   – Categories
Query


• Faceted search: could the queries become a problem?
   – Example


   http://localhost:8983/solr/select?

   q=video&rows=0&facet=true&facet.field=inStock

   &facet.query=price:[*+TO+500]

   &facet.query=price:[500+TO+*]

   &facet.prefix=xx&facet.limit=5&facet.mincount=1
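
The request above can also be assembled programmatically. The sketch below uses only the Python standard library and the parameters shown on the slide; the helper name build_facet_url is hypothetical. Because facet.query appears twice, the parameters are kept as a list of pairs rather than a dictionary.

```python
from urllib.parse import urlencode


def build_facet_url(base: str, params: list) -> str:
    """Join a base URL and a list of (name, value) pairs into a query URL."""
    # urlencode escapes spaces as '+' and percent-encodes '[', ']', ':' and '*'
    return base + "?" + urlencode(params)


# Same parameters as the slide; a list of pairs keeps both facet.query entries.
facet_params = [
    ("q", "video"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.field", "inStock"),
    ("facet.query", "price:[* TO 500]"),
    ("facet.query", "price:[500 TO *]"),
    ("facet.prefix", "xx"),
    ("facet.limit", "5"),
    ("facet.mincount", "1"),
]

facet_url = build_facet_url("http://localhost:8983/solr/select", facet_params)
```

Setting rows=0, as the slide does, asks only for the facet counts and no documents.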
Data indexing




• Native Solr XML
• CSV
• Database (DataImportHandler, DIH)
• Rich Documents
• Crawler
Index




• Is the index growing larger than you expected?


• It can be tuned through:
   – Index segment size
   – Merging of index segments
Collections



• It is possible to create a separate index for each kind of document
Data Replication


• Master / Slave
   - Index
   - Config files
Sharding

• ZooKeeper
  – http://hadoop.apache.org/zookeeper/
SolrCaching


• Keeps searched documents in a cache
• Two implementations
   – solr.search.LRUCache (LRU = Least Recently Used, in memory)
   – solr.search.FastLRUCache (since version 1.4)
• How to use
   – filterCache
   – queryResultCache
   – documentCache (loads everything into memory)
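
To illustrate the "least recently used" policy named above, here is a minimal sketch of an LRU cache in Python. It shows the eviction idea only and is not Solr's LRUCache class; the class name, its methods, and the size parameter are assumptions made for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: when full, evicts the entry that was used longest ago."""

    def __init__(self, size: int):
        if size < 1:
            raise ValueError("size must be at least 1")
        self.size = size
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.size:
            self._entries.popitem(last=False)  # drop the least recently used

    def __contains__(self, key):
        return key in self._entries

    def __len__(self):
        return len(self._entries)
```

In this sketch a lookup refreshes an entry, so frequently repeated queries stay cached while rarely used ones are evicted first.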
Cluster – Carrot2



• Search Results Clustering Engine

• Search in many nodes




•   Live Demo

     – http://search.carrot2.org/stable/search
Crawling


• Apache Nutch
  – Searching, parsing, and parallel or distributed indexing
  – Many formats
     • E.g. plain text, HTML, XML, ZIP, .doc, JavaScript, RSS, PDF, etc.
  – Cluster
  – MapReduce
  – Distributed filesystem (via Hadoop)
Backup / Snapshot



• Triggered by scripts (solr-tools)

• Index snapshots

• Differential backups

   – $solr_data/yyyymmdd
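
The dated directory layout above can be sketched as follows. This is not the solr-tools scripts themselves: the function names are hypothetical, and the differential step rests on the simplifying assumption that a file whose name already exists in the previous snapshot has not changed.

```python
import os
from datetime import date
from typing import List, Optional


def snapshot_dir(solr_data: str, day: date) -> str:
    """Return the snapshot directory for a given day: <solr_data>/yyyymmdd."""
    return os.path.join(solr_data, day.strftime("%Y%m%d"))


def files_to_copy(index_dir: str, previous_snapshot: Optional[str]) -> List[str]:
    """List index files that are absent from the previous snapshot.

    Assumption: a file name already present in the previous snapshot is
    unchanged, so only new names need to be copied (differential backup).
    """
    already_saved = set()
    if previous_snapshot and os.path.isdir(previous_snapshot):
        already_saved = set(os.listdir(previous_snapshot))
    return sorted(name for name in os.listdir(index_dir) if name not in already_saved)
```

With no previous snapshot, files_to_copy returns every file in the index directory, which corresponds to a full backup.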
Architecture (Master/Slave)
Architecture (Distributed Index)
Indexing Tests



• Indexing tests
   • XML documents of about 7k each, with 111 fields


• 1.2 million docs in the index


• VM: 2 GB RAM, 2.33 GHz processor
Indexing Tests

(Chart; values as extracted: 90, 44)
Search Tests
QPS

(Chart; values as extracted: 61 0 37 5 38 38)
References




•   http://lucene.apache.org/solr/

•   http://wiki.apache.org/solr/

•   http://project.carrot2.org/

•   http://download.carrot2.org/head/manual/index.html#chapter.introduction

•   http://wiki.apache.org/solr/ZooKeeperIntegration

Editor's Notes

  13. Filter cache (used in three cases): (1) caches the results of the “fq” parameters; (2) caches facets; (3) sort, when it is set to true