SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Downloaden Sie, um offline zu lesen
Orange search engine 
Jean-Pierre Paris 
Orange France 
november 13th, 2014
agenda 
part 1 Orange French search engine 
part 2 why Elasticsearch? 
part 3 conclusion 
2 Orange French search engines and Elastisearch
Orange search engine 
4 millions 
~1 million 
8 bn docs 
FR 
3 Orange French search engines and Elastisearch 
80 persons 
~1000 servers 
3 datacenters
search engine response page 
§ one response page… 
§ with a lot of data sources 
§ and a lot of engines 
4 Orange French search engines and Elastisearch
vertical search engines 
5 Orange French search engines and Elastisearch
web search and web graph 
6 Orange French search engines and Elastisearch 
repris de Wikipedia
volume 
§ vertical search engines 
– 10m documents 
– 5 engines in 2014 
§ web graph 
– 8bn urls 
– 2bn internal vertices 
– 6bn leaf vertices 
– 100bn edges 
7 Orange French search engines and Elastisearch 
10GB 
13TB
agenda 
part 1 Orange French search engine 
part 2 why Elasticsearch? 
part 3 conclusion 
8 Orange French search engines and Elastisearch
our needs 
§ vertical search engines 
– adopt one common technology 
– lower maintenance cost 
– prepare future needs 
§ web graph 
– gain insight on large dataset 
– build analysis and visualization 
– test new technology with large volume 
9 Orange French search engines and Elastisearch
Elasticsearch responses 
§ rest interface 
§ near real time distributed indexing and distributed search 
§ native full text search 
– with a lot of different queries and wildcards 
§ facets… oups! aggregations! 
– values distribution on a specific criterion 
§ interactive mode while exploring a dataset 
– short query response time 
10 Orange French search engines and Elastisearch
hardware architecture 
… 
store store store 
… 
11 Orange French search engines and Elastisearch 
x30 
x30 
Elasticsearch cluster
indexing with ES v0.90 
§ performances 
– starting at 160 doc/s (1 injector, 4 ES 2cpus, 4GB) 
– with bulk 1000: 920 doc/s (1 injector, 4 ES 2cpus, 4GB) 
– 3 injectors: 570 doc/s * 3 = 1700 doc/s 
– 1 injector, 30 ES (8cpus, 16GB): 1700 doc/s 
– 30 injectors, 30 ES (8cpus, 16GB): 32,000 doc/s 
– 30 injectors, 60 ES (http-data) (8cpus, 16GB): 36,000 doc/s 
– 240 injectors, 60 ES (http-data) (8cpus, 16GB): 75,000 doc/s, then 
43,000 doc/s 
– 1bn docs in 5h (55,000 doc/s) 
12 Orange French search engines and Elastisearch
hardware architecture 
… 
… 
store 
http 
store 
http 
13 Orange French search engines and Elastisearch 
x30 
x30 
Elasticsearch cluster 
store 
http 
data data data
number of shards 
sec! 
1200 
1000 
800 
600 
400 
200 
0 
321 sec for 12 shards 
0 5 10 15 20 25 30 
14 Orange French search engines and Elastisearch 
#shards!
bulksize 
600 
500 
400 
300 
200 
100 
0 
278 sec for bulksize 1700 
0 1000 2000 3000 4000 5000 6000 7000 8000 
15 Orange French search engines and Elastisearch 
bulksize! 
sec!
searching 
§ performance 
– 2 req/s out of the box with 6.5TB index 
– OS cache is mandatory 
– 130 req/s in cache 
– lot of requests needed to load cache 
§ relevance 
– good for vertical engines 
– non significant in web graph experimentation 
16 Orange French search engines and Elastisearch
why Elasticsearch AND hadoop? 
§ simply use existing bridge 
– open-sourced by Elasticsearch 
§ ability to choose best technology 
– performance 
– expression power 
§ examples 
– compute and re-inject back links 
– distribute Elasticsearch injections 
17 Orange French search engines and Elastisearch
hardware architecture 
… 
… 
18 Orange French search engines and Elastisearch 
x30 
x30 
Elasticsearch cluster 
hadoop cluster 
x1 
http http http 
data 
hdfs 
hadoop 
hive 
pig 
master 
data 
hdfs 
hadoop 
hive 
pig 
data 
hdfs 
hadoop 
hive 
pig
agenda 
part 1 Orange French search engine 
part 2 why Elasticsearch? 
part 3 conclusion 
19 Orange French search engines and Elastisearch
conclusion 
§ vertical engines 
– migration decided 09/13 
– first set in production 01/14 
§ web graph 
– experimentation decided 09/13 
– 1bn docs indexed 12/13, significant queries 03/14 
§ professional community 
§ connectors to others technologies 
§ flexibility 
– production and experimentation 
– high volume 
20 Orange French search engines and Elastisearch
thanks! 
more infos http://blog.lemoteur.fr 
or @lemoteur

Weitere ähnliche Inhalte

Andere mochten auch

Bespoke service discovery with HAProxy and Marathon on Mesos
Bespoke service discovery with HAProxy and Marathon on MesosBespoke service discovery with HAProxy and Marathon on Mesos
Bespoke service discovery with HAProxy and Marathon on MesosBart Spaans
 
The moment my site got hacked
The moment my site got hackedThe moment my site got hacked
The moment my site got hackedMarko Heijnen
 
VarnaConf - Blue/Green Deployments with Docker, haproxy and Consul
VarnaConf - Blue/Green Deployments with Docker, haproxy and ConsulVarnaConf - Blue/Green Deployments with Docker, haproxy and Consul
VarnaConf - Blue/Green Deployments with Docker, haproxy and Consulzeridon
 
How we cooked Elasticsearch, Consul, HAproxy and DNS-recursor
How we cooked Elasticsearch, Consul, HAproxy and DNS-recursorHow we cooked Elasticsearch, Consul, HAproxy and DNS-recursor
How we cooked Elasticsearch, Consul, HAproxy and DNS-recursorOleg Tokarev
 
HAProxy scale out using open source
HAProxy scale out using open sourceHAProxy scale out using open source
HAProxy scale out using open sourceIngo Walz
 
Log analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaLog analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaAvinash Ramineni
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedBeyondTrees
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in NetflixDanny Yuan
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life琛琳 饶
 
Redis in Practice
Redis in PracticeRedis in Practice
Redis in PracticeNoah Davis
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaAmazee Labs
 
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerRunning High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerSematext Group, Inc.
 
Everything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to askEverything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to askCarlos Abalde
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseAlexandre Rafalovitch
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.Jurriaan Persyn
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaPrajal Kulkarni
 

Andere mochten auch (16)

Bespoke service discovery with HAProxy and Marathon on Mesos
Bespoke service discovery with HAProxy and Marathon on MesosBespoke service discovery with HAProxy and Marathon on Mesos
Bespoke service discovery with HAProxy and Marathon on Mesos
 
The moment my site got hacked
The moment my site got hackedThe moment my site got hacked
The moment my site got hacked
 
VarnaConf - Blue/Green Deployments with Docker, haproxy and Consul
VarnaConf - Blue/Green Deployments with Docker, haproxy and ConsulVarnaConf - Blue/Green Deployments with Docker, haproxy and Consul
VarnaConf - Blue/Green Deployments with Docker, haproxy and Consul
 
How we cooked Elasticsearch, Consul, HAproxy and DNS-recursor
How we cooked Elasticsearch, Consul, HAproxy and DNS-recursorHow we cooked Elasticsearch, Consul, HAproxy and DNS-recursor
How we cooked Elasticsearch, Consul, HAproxy and DNS-recursor
 
HAProxy scale out using open source
HAProxy scale out using open sourceHAProxy scale out using open source
HAProxy scale out using open source
 
Log analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and KibanaLog analysis using Logstash,ElasticSearch and Kibana
Log analysis using Logstash,ElasticSearch and Kibana
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
Redis in Practice
Redis in PracticeRedis in Practice
Redis in Practice
 
Logging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & KibanaLogging with Elasticsearch, Logstash & Kibana
Logging with Elasticsearch, Logstash & Kibana
 
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on DockerRunning High Performance and Fault Tolerant Elasticsearch Clusters on Docker
Running High Performance and Fault Tolerant Elasticsearch Clusters on Docker
 
Everything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to askEverything you always wanted to know about Redis but were afraid to ask
Everything you always wanted to know about Redis but were afraid to ask
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
 

Ähnlich wie Meetup Elasticsearch 13 novembre 2014

HOW TO SCALE FROM ZERO TO BILLIONS!
HOW TO SCALE FROM ZERO TO BILLIONS!HOW TO SCALE FROM ZERO TO BILLIONS!
HOW TO SCALE FROM ZERO TO BILLIONS!Maziyar PANAHI
 
Managing your black friday logs - Code Europe
Managing your black friday logs - Code EuropeManaging your black friday logs - Code Europe
Managing your black friday logs - Code EuropeDavid Pilato
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataData Con LA
 
Sunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform BenchmarkSunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform BenchmarkShay Hassidim
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC OsloDavid Pilato
 
DIY Netflow Data Analytic with ELK Stack by CL Lee
DIY Netflow Data Analytic with ELK Stack by CL LeeDIY Netflow Data Analytic with ELK Stack by CL Lee
DIY Netflow Data Analytic with ELK Stack by CL LeeMyNOG
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
Ibm power sales bootcamp
Ibm power sales bootcampIbm power sales bootcamp
Ibm power sales bootcampsolarisyougood
 
Spark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar VeliqiSpark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar VeliqiSpark Summit
 
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...Behar Veliqi
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsVoltDB
 
Arabidopsis Information Portal, Developer Workshop 2014, Introduction
Arabidopsis Information Portal, Developer Workshop 2014, IntroductionArabidopsis Information Portal, Developer Workshop 2014, Introduction
Arabidopsis Information Portal, Developer Workshop 2014, IntroductionJasonRafeMiller
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)Yuuki Takano
 
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Data Con LA
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Spark Summit
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...SignalFx
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultDataWorks Summit
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit
 
Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Community
 

Ähnlich wie Meetup Elasticsearch 13 novembre 2014 (20)

HOW TO SCALE FROM ZERO TO BILLIONS!
HOW TO SCALE FROM ZERO TO BILLIONS!HOW TO SCALE FROM ZERO TO BILLIONS!
HOW TO SCALE FROM ZERO TO BILLIONS!
 
Managing your black friday logs - Code Europe
Managing your black friday logs - Code EuropeManaging your black friday logs - Code Europe
Managing your black friday logs - Code Europe
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and Snappydata
 
Sunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform BenchmarkSunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
Sunx4450 Intel7460 GigaSpaces XAP Platform Benchmark
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC Oslo
 
DIY Netflow Data Analytic with ELK Stack by CL Lee
DIY Netflow Data Analytic with ELK Stack by CL LeeDIY Netflow Data Analytic with ELK Stack by CL Lee
DIY Netflow Data Analytic with ELK Stack by CL Lee
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Ibm power sales bootcamp
Ibm power sales bootcampIbm power sales bootcamp
Ibm power sales bootcamp
 
Spark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar VeliqiSpark Summit EU talk by Ruben Pulido and Behar Veliqi
Spark Summit EU talk by Ruben Pulido and Behar Veliqi
 
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
Arabidopsis Information Portal, Developer Workshop 2014, Introduction
Arabidopsis Information Portal, Developer Workshop 2014, IntroductionArabidopsis Information Portal, Developer Workshop 2014, Introduction
Arabidopsis Information Portal, Developer Workshop 2014, Introduction
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
 
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
Big Data Day LA 2015 - Big Data Day LA 2015 - Applying GeoSpatial Analytics u...
 
Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...Building a Business Logic Translation Engine with Spark Streaming for Communi...
Building a Business Logic Translation Engine with Spark Streaming for Communi...
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...Scaling ingest pipelines with high performance computing principles - Rajiv K...
Scaling ingest pipelines with high performance computing principles - Rajiv K...
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
Spark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar VeliqiSpark Summit EU talk by Ruben Pulido Behar Veliqi
Spark Summit EU talk by Ruben Pulido Behar Veliqi
 
Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data
 

Kürzlich hochgeladen

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Kürzlich hochgeladen (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Meetup Elasticsearch 13 novembre 2014

  • 1. Orange search engine Jean-Pierre Paris Orange France november 13th, 2014
  • 2. agenda part 1 Orange French search engine part 2 why Elasticsearch? part 3 conclusion 2 Orange French search engines and Elastisearch
  • 3. Orange search engine 4 millions ~1 million 8 bn docs FR 3 Orange French search engines and Elastisearch 80 persons ~1000 servers 3 datacenters
  • 4. search engine response page § one response page… § with a lot of data sources § and a lot of engines 4 Orange French search engines and Elastisearch
  • 5. vertical search engines 5 Orange French search engines and Elastisearch
  • 6. web search and web graph 6 Orange French search engines and Elastisearch repris de Wikipedia
  • 7. volume § vertical search engines – 10m documents – 5 engines in 2014 § web graph – 8bn urls – 2bn internal vertices – 6bn leaf vertices – 100bn edges 7 Orange French search engines and Elastisearch 10GB 13TB
  • 8. agenda part 1 Orange French search engine part 2 why Elasticsearch? part 3 conclusion 8 Orange French search engines and Elastisearch
  • 9. our needs § vertical search engines – adopt one common technology – lower maintenance cost – prepare future needs § web graph – gain insight on large dataset – build analysis and visualization – test new technology with large volume 9 Orange French search engines and Elastisearch
  • 10. Elasticsearch responses § rest interface § near real time distributed indexing and distributed search § native full text search – with a lot of different queries and wildcards § facets… oups! aggregations! – values distribution on a specific criterion § interactive mode while exploring a dataset – short query response time 10 Orange French search engines and Elastisearch
  • 11. hardware architecture … store store store … 11 Orange French search engines and Elastisearch x30 x30 Elasticsearch cluster
  • 12. indexing with ES v0.90 § performances – starting at 160 doc/s (1 injector, 4 ES 2cpus, 4GB) – with bulk 1000: 920 doc/s (1 injector, 4 ES 2cpus, 4GB) – 3 injectors: 570 doc/s * 3 = 1700 doc/s – 1 injector, 30 ES (8cpus, 16GB): 1700 doc/s – 30 injectors, 30 ES (8cpus, 16GB): 32,000 doc/s – 30 injectors, 60 ES (http-data) (8cpus, 16GB): 36,000 doc/s – 240 injectors, 60 ES (http-data) (8cpus, 16GB): 75,000 doc/s, then 43,000 doc/s – 1bn docs in 5h (55,000 doc/s) 12 Orange French search engines and Elastisearch
  • 13. hardware architecture … … store http store http 13 Orange French search engines and Elastisearch x30 x30 Elasticsearch cluster store http data data data
  • 14. number of shards sec! 1200 1000 800 600 400 200 0 321 sec for 12 shards 0 5 10 15 20 25 30 14 Orange French search engines and Elastisearch #shards!
  • 15. bulksize 600 500 400 300 200 100 0 278 sec for bulksize 1700 0 1000 2000 3000 4000 5000 6000 7000 8000 15 Orange French search engines and Elastisearch bulksize! sec!
  • 16. searching § performance – 2 req/s out of the box with 6.5TB index – OS cache is mandatory – 130 req/s in cache – lot of requests needed to load cache § relevance – good for vertical engines – non significant in web graph experimentation 16 Orange French search engines and Elastisearch
  • 17. why Elasticsearch AND hadoop? § simply use existing bridge – open-sourced by Elasticsearch § ability to choose best technology – performance – expression power § examples – compute and re-inject back links – distribute Elasticsearch injections 17 Orange French search engines and Elastisearch
  • 18. hardware architecture … … 18 Orange French search engines and Elastisearch x30 x30 Elasticsearch cluster hadoop cluster x1 http http http data hdfs hadoop hive pig master data hdfs hadoop hive pig data hdfs hadoop hive pig
  • 19. agenda part 1 Orange French search engine part 2 why Elasticsearch? part 3 conclusion 19 Orange French search engines and Elastisearch
  • 20. conclusion § vertical engines – migration decided 09/13 – first set in production 01/14 § web graph – experimentation decided 09/13 – 1bn docs indexed 12/13, significant queries 03/14 § professional community § connectors to others technologies § flexibility – production and experimentation – high volume 20 Orange French search engines and Elastisearch
  • 21. thanks! more infos http://blog.lemoteur.fr or @lemoteur