Using Elastic to search over 2.5B videos
Talk structure
● 4 steps to make user experience great again
● 4 patterns to simplify architecture and reduce costs
© 2016 Tubular Labs
Data size
● 2.5B documents
● Average doc size 2 KB, 4 TB total size
● 200M daily updates (~8% of the index)
● Constant indexing rate of 3k docs/s, with spikes
● Query rate of 1-3 requests/s (low concurrency)
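A quick back-of-the-envelope check of the figures above, as a sketch. The variable names are illustrative, and the per-second figure assumes updates are spread evenly over 24 hours, which the slide does not state.

```python
# Back-of-the-envelope check of the workload figures on this slide.
TOTAL_DOCS = 2_500_000_000      # 2.5B documents
DAILY_UPDATES = 200_000_000     # 200M updates per day
SECONDS_PER_DAY = 24 * 60 * 60

# Share of the index rewritten each day.
daily_update_fraction = DAILY_UPDATES / TOTAL_DOCS

# Mean update rate, assuming (hypothetically) an even spread over the day.
mean_updates_per_second = DAILY_UPDATES / SECONDS_PER_DAY

print(f"{daily_update_fraction:.0%} of the index per day")
print(f"~{mean_updates_per_second:,.0f} updates/s on average")
```

The mean comes out a little above 2.3k updates/s, which is in line with a steady 3k/s indexing rate that also has to absorb spikes.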
Hardware
Before
● 52 x c3.4xlarge
● 128 shards
● 16 cores per node
● ~3 shards per node
● 832 cores, 16 TB SSD, 1.5 TB RAM
After (25% bigger)
● 26 x c3.8xlarge
● 416 shards
● 32 cores per node
● 16 shards per node
● 832 cores, 16 TB SSD, 1.5 TB RAM
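The totals on this slide can be reproduced with a few lines, shown here as a sketch. The `Cluster` class is purely illustrative, and the shard counts are taken at face value (the slide does not say whether they include replicas).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    nodes: int
    cores_per_node: int
    shards: int

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    @property
    def shards_per_node(self) -> float:
        return self.shards / self.nodes


before = Cluster(nodes=52, cores_per_node=16, shards=128)
after = Cluster(nodes=26, cores_per_node=32, shards=416)

for name, cluster in (("before", before), ("after", after)):
    print(name, cluster.total_cores, round(cluster.shards_per_node, 1))
```

Both layouts have the same 832 cores; what changes is how many shards, and therefore how much parallel work per query, each node holds.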
Indexing
Optimize indexing
● Using bulk API
• 1 MB per batch (500 docs), should be 5k docs/s
• Recommended 5-15 MB
● Increasing refresh interval
• From 1 to 30 seconds
● Monitoring bulk.rejected
• Increased bulk.queueSize from 50 to 2000
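A minimal sketch of size-based batching for the bulk API, assuming documents are plain dicts with an `id` key. The function name `bulk_bodies`, the index name and the document fields are hypothetical; the output is the newline-delimited action/source format the bulk endpoint accepts (older Elasticsearch versions also expect a `_type` in the action line).

```python
import json
from typing import Dict, Iterable, Iterator


def bulk_bodies(docs: Iterable[Dict], index: str,
                max_bytes: int = 5 * 1024 * 1024) -> Iterator[str]:
    """Yield bulk request bodies of at most max_bytes each.

    A single document larger than max_bytes is still sent, alone in its batch.
    """
    lines, size = [], 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index, "_id": doc["id"]}})
        source = json.dumps(doc)
        # json.dumps emits ASCII by default; +2 accounts for the two newlines.
        pair_size = len(action) + len(source) + 2
        if lines and size + pair_size > max_bytes:
            yield "\n".join(lines) + "\n"
            lines, size = [], 0
        lines += [action, source]
        size += pair_size
    if lines:
        yield "\n".join(lines) + "\n"


docs = ({"id": i, "title": f"video {i}"} for i in range(1000))
batches = list(bulk_bodies(docs, index="videos", max_bytes=10_000))
print(len(batches), "batches")
```

Batching by bytes rather than by document count keeps request sizes stable when document sizes vary, which makes it easier to stay inside the recommended 5-15 MB range.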
Searching
Product view
Summary
Search results
Term aggregations
Before optimization
Goal
Problem
• Slow queries
Goal
• From 15 to 5 seconds for the 95th percentile
• Seeking a 3x improvement
Understand hardware utilization
• Run the heaviest query
• No bottlenecks (CPU, disk IO, network)
• Thread pool search.size 25
• Max search.active is 3
CPU utilization
• Know your concurrency
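One way to make "know your concurrency" concrete, as a sketch: it assumes each shard is searched by a single thread per query and that shards are spread evenly across nodes. The helper `busy_cores_per_node` is hypothetical; the inputs are the before and after hardware figures from the earlier slide.

```python
import math


def busy_cores_per_node(shards_per_node: float, cores_per_node: int,
                        concurrent_queries: int = 1) -> int:
    """Upper bound on busy cores per node, assuming one thread per shard per query."""
    threads = math.ceil(shards_per_node) * concurrent_queries
    return min(threads, cores_per_node)


before = busy_cores_per_node(shards_per_node=128 / 52, cores_per_node=16)
after = busy_cores_per_node(shards_per_node=416 / 26, cores_per_node=32)
print(before, after)
```

Under these assumptions a single heavy query keeps about 3 of 16 cores busy in the old layout, which matches the observed maximum of 3 active search threads, and 16 of 32 cores in the new one.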
Benchmarking # of shards
On a single 32-core node
More CPU per request results
15s to 7.5s
Search & Aggregations
• Searching and sorting are fast
• 8 term aggregations are slow
Aggregation impact
Check facet usage
● Talk to your product manager
● Low product usage
● Remove networks and claims aggregations
● Replace facets with filters
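Replacing a facet with a filter might look roughly like this. It is a sketch only: the field names (`title`, `network`, `category`, `language`) and the helper `build_search_body` are hypothetical, and the `bool` query with a `filter` clause assumes Elasticsearch 2.0 or later.

```python
from typing import Dict, List


def build_search_body(text: str, filters: Dict[str, str],
                      facet_fields: List[str]) -> dict:
    """Selected values become filters; only the remaining facets are aggregated."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"term": {field: value}}
                           for field, value in filters.items()],
            }
        },
        "aggs": {field: {"terms": {"field": field}} for field in facet_fields},
    }


body = build_search_body(
    "cats",
    filters={"network": "youtube"},          # formerly a facet, now a plain filter
    facet_fields=["category", "language"],   # aggregations that remain
)
```

The user can still narrow results by network, but the cluster no longer computes a bucket count for every network on every search.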
Removing two aggregations results
15s to 5.3s
Cardinality
● Reduce cardinality
● Going from 200M to 5M (channels to creators)
● Reducing # of topics from 5M to 500
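A toy illustration of the cardinality reduction, assuming the channel-to-creator mapping is resolved at index time. The lookup table and field names are made up for the example; the point is only that a terms aggregation generally has to track one bucket per distinct value.

```python
from collections import Counter

# Hypothetical lookup from a high-cardinality key (channel)
# to a lower-cardinality one (creator).
channel_to_creator = {
    "ch1": "creatorA", "ch2": "creatorA",
    "ch3": "creatorB", "ch4": "creatorB", "ch5": "creatorB",
}

videos = [{"channel": c} for c in ["ch1", "ch2", "ch3", "ch4", "ch5", "ch1"]]

# Denormalize the creator onto each document before indexing.
for video in videos:
    video["creator"] = channel_to_creator[video["channel"]]

by_channel = Counter(v["channel"] for v in videos)
by_creator = Counter(v["creator"] for v in videos)
print(len(by_channel), "channel buckets vs", len(by_creator), "creator buckets")
```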
Reducing cardinality results
15s to 4.4s
Split query and aggregations
● Searching and aggregating separately
● Using shard-level query cache
● Showing results in UI asynchronously
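Splitting one request into a hits-only request and an aggregations-only request could be sketched as follows. `split_request` and the example fields are hypothetical; the aggregations request uses `size: 0` on the assumption that the shard-level cache typically stores only requests that return no hits.

```python
import copy
from typing import Tuple


def split_request(body: dict) -> Tuple[dict, dict]:
    """Split one search body into a hits-only and an aggregations-only request."""
    hits_body = copy.deepcopy(body)
    hits_body.pop("aggs", None)      # result list: no aggregations

    aggs_body = copy.deepcopy(body)
    aggs_body["size"] = 0            # aggregations only: return no hits
    aggs_body.pop("sort", None)      # sorting is irrelevant without hits
    return hits_body, aggs_body


body = {
    "query": {"match_all": {}},
    "size": 20,
    "sort": [{"views": "desc"}],
    "aggs": {"category": {"terms": {"field": "category"}}},
}
hits_body, aggs_body = split_request(body)
```

The two bodies can then be sent independently, so the UI can render the result list as soon as it arrives and fill in the facets when the slower aggregation response comes back.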
Split query and aggregations results
15s to 4.0s
Performance gain
● From 15 to 4 seconds (<5 seconds)
● Overall improvement 3.7x
● What about costs?
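The improvement factor can be recomputed from the latencies on the result slides. This sketch assumes those figures are cumulative (each measured against the original 15 s baseline with the earlier steps already applied); the step labels are paraphrased from the slide titles.

```python
BASELINE_S = 15.0

# Latency after each step, as reported on the result slides.
steps = {
    "more shards per node": 7.5,
    "remove two aggregations": 5.3,
    "reduce cardinality": 4.4,
    "split query and aggregations": 4.0,
}

speedups = {name: BASELINE_S / latency for name, latency in steps.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

The last step works out to 15 / 4 = 3.75, which the slide reports as 3.7x.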
Architecture patterns
Part 2. Goals
● Reduce costs
● Improve reliability
● Simplify architecture
● Reduce variability in latency
Current flow
● Too many dependencies
● Expensive intermediate storage
Denormalization
● 90% of data is shared
● No extra calls from frontend
Partial updates with Update API (experimental)
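As a sketch of what a partial update can look like inside a bulk request: an `update` action line followed by a `doc` line that carries only the changed fields. The helper name, index name, document id and field are hypothetical.

```python
import json
from typing import Dict


def partial_update_lines(index: str, doc_id: str, changed_fields: Dict) -> str:
    """Return the two bulk lines for a partial update of one document."""
    action = {"update": {"_index": index, "_id": doc_id}}
    payload = {"doc": changed_fields}   # only the fields that changed
    return json.dumps(action) + "\n" + json.dumps(payload) + "\n"


print(partial_update_lines("videos", "abc123", {"views": 1024}), end="")
```

This keeps the client from having to fetch and resend the full document, although Elasticsearch still reindexes the whole document internally when it applies the update.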
“Partial” updates with parent-child relations (experimental)
Split data by hot/full (idea for future)
● Cheaper hardware on full
● Shard allocation filtering
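A sketch of how shard allocation filtering could express the hot/full split, assuming nodes are tagged with a custom attribute. The attribute name `box_type` and the values `hot` and `full` are assumptions for illustration; `index.routing.allocation.require.<attribute>` is the index-level setting that restricts where an index's shards may live.

```python
def allocation_settings(tier: str, attribute: str = "box_type") -> dict:
    """Index settings that pin an index's shards to nodes tagged with the given value."""
    return {"index.routing.allocation.require." + attribute: tier}


hot_settings = allocation_settings("hot")     # small, frequently queried index
full_settings = allocation_settings("full")   # complete index on cheaper nodes
print(hot_settings)
print(full_settings)
```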
Thank you

 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Tubular Labs - Using Elastic to Search Over 2.5B Videos

  • 1. Using Elastic to search over 2.5B videos
  • 2. Talk structure
      ● 4 steps to make user experience great again
      ● 4 patterns to simplify architecture and reduce costs
      © 2016 Tubular Labs
  • 3. Data size
      ● 2.5B documents
      ● Average document size 2 KB, 4 TB total
      ● 200M daily updates (~8% of the index)
      ● Constant indexing rate of 3k docs/s, with spikes
      ● Query rate of 1-3 requests/s (low concurrency)
  • 4. Hardware
      Before:
      ● 52 x c3.4xlarge
      ● 128 shards
      ● 16 cores per node
      ● ~3 shards per node
      ● 832 cores, 16 TB SSD, 1.5 TB RAM
      After (25% bigger):
      ● 26 x c3.8xlarge
      ● 416 shards
      ● 32 cores per node
      ● 16 shards per node
      ● 832 cores, 16 TB SSD, 1.5 TB RAM
  • 6. Optimize indexing
      ● Using the bulk API
          • 1 MB per batch (500 docs), should be 5k docs/s
          • Recommended batch size: 5-15 MB
      ● Increasing the refresh interval
          • From 1 to 30 seconds
      ● Monitoring bulk.rejected
          • Increased bulk.queueSize from 50 to 2000
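The batching advice on slide 6 can be sketched as follows. This is a minimal illustration, not the speakers' code: it only builds newline-delimited payloads in the shape the `_bulk` endpoint accepts and caps each one by byte size. The index name `videos`, the helper name `build_bulk_batches`, and the document fields are assumptions made for the example. Older Elasticsearch versions (such as those current in 2016) also expect a `_type` in the action line, which is omitted here.

```python
import json


def build_bulk_batches(docs, index, max_bytes=1_000_000):
    """Yield NDJSON payloads for the _bulk API.

    docs is an iterable of (doc_id, source_dict) pairs. Each payload stays
    within max_bytes unless a single document alone is larger than that.
    """
    batch, size = [], 0
    for doc_id, source in docs:
        # One action line followed by one source line, each newline-terminated.
        action = json.dumps({"index": {"_index": index, "_id": doc_id}})
        entry = (action + "\n" + json.dumps(source) + "\n").encode("utf-8")
        if batch and size + len(entry) > max_bytes:
            yield b"".join(batch)
            batch, size = [], 0
        batch.append(entry)
        size += len(entry)
    if batch:
        yield b"".join(batch)


# Body for an index settings update that raises the refresh interval to 30 seconds.
REFRESH_SETTINGS = {"index": {"refresh_interval": "30s"}}
```

The default of 1 MB matches the batch size quoted on the slide; passing `max_bytes=5_000_000` or more moves toward the recommended 5-15 MB range.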
  • 8. Product view (screen areas: Summary, Search results, Term aggregations)
  • 10. Goal
      ● Problem: slow queries
      ● Goal: from 15 to 5 seconds at the 95th percentile, a 3x improvement
  • 11. Understand hardware utilization
      ● Run the heaviest query
      ● No bottlenecks (CPU, disk IO, network)
      ● Thread pool search.size is 25
      ● Max search.active is 3
  • 12. CPU utilization
      ● Know your concurrency
  • 13. Benchmarking # of shards
      ● On a single 32-core node
  • 14. More CPU per request results
      ● 15s to 7.5s
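One way to read slides 11-14 is with a rough back-of-the-envelope model: if each query uses one search thread per shard on a node, then with only 3 concurrent searches the number of busy cores is bounded by shards per node times concurrency. The model and the function name `busy_cores` are assumptions for illustration; the slides give only the node sizes, shard counts, and the observed maximum of 3 active searches.

```python
def busy_cores(shards_per_node, concurrent_queries, cores_per_node):
    """Upper bound on busy cores per node, assuming one thread per shard per query."""
    return min(shards_per_node * concurrent_queries, cores_per_node)


# Before: ~3 shards per node on 16-core nodes, at most 3 concurrent searches.
before = busy_cores(3, 3, 16)
# After: 16 shards per node on 32-core nodes, same concurrency.
after = busy_cores(16, 3, 32)

print(before, "of 16 cores")  # 9 of 16 cores
print(after, "of 32 cores")   # 32 of 32 cores
```

Under this simplified model the original layout could not saturate a node at the observed concurrency, which is consistent with the "no bottlenecks" finding and with the gain from adding shards.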
  • 15. Search & Aggregations
      ● Searching and sorting is fast
      ● 8 term aggregations are slow
  • 16. Aggregation impact
  • 17. Check facet usage
      ● Talk to your product manager
      ● Low product usage
      ● Remove networks and claims aggregations
      ● Replace facets with filters
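Replacing a facet with a filter, as slide 17 suggests, might look like the sketch below: the value the user picks becomes a `term` filter, and no terms aggregation is requested for that field. The field names (`title`, `network`, `category`) and the helper `build_search_body` are hypothetical; the `bool` query with a `filter` clause is the Elasticsearch 2.x+ form, and older versions would express the same idea differently.

```python
def build_search_body(text, filters, agg_fields):
    """Build a search body that filters on some fields and aggregates on others."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                # Filters narrow the result set without computing per-value counts.
                "filter": [{"term": {field: value}} for field, value in filters.items()],
            }
        },
        # Only the facets that are still shown in the UI are aggregated.
        "aggs": {field: {"terms": {"field": field}} for field in agg_fields},
    }


body = build_search_body("cats", {"network": "youtube"}, ["category"])
```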
  • 18. Removing two aggregations results
      ● 15s to 5.3s
  • 19. Cardinality
      ● Reduce cardinality
      ● Going from 200M to 5M (channels to creators)
      ● Reducing # of topics from 5M to 500
  • 20. Reducing cardinality results
      ● 15s to 4.4s
  • 21. Split query and aggregations
      ● Searching and aggregating separately
      ● Using shard-level query cache
      ● Showing results in UI asynchronously
  • 22. Split query and aggregations results
      ● 15s to 4.0s
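The split described on slides 21-22 can be sketched as two request bodies built from the same query: one that returns hits and no aggregations, and one with `size` set to 0 that returns only aggregations, which is the kind of request a shard-level cache can serve. The helper names and the `send` callable are assumptions; `send` stands in for whatever client call actually issues the search, and how the cache is enabled depends on the Elasticsearch version and is not shown.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def split_request(query, aggs, page_size=20):
    """Return (hits_body, aggs_body) for the same query."""
    hits_body = {"query": copy.deepcopy(query), "size": page_size}
    aggs_body = {"query": copy.deepcopy(query), "size": 0, "aggs": copy.deepcopy(aggs)}
    return hits_body, aggs_body


def run_split(send, query, aggs, page_size=20):
    """Issue both requests concurrently; send is a caller-supplied callable."""
    hits_body, aggs_body = split_request(query, aggs, page_size)
    with ThreadPoolExecutor(max_workers=2) as pool:
        hits_future = pool.submit(send, hits_body)
        aggs_future = pool.submit(send, aggs_body)
        return hits_future.result(), aggs_future.result()
```

Because the two responses arrive independently, the UI can render the result list as soon as the first request returns and fill in the facet counts later.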
  • 23. Performance gain
      ● From 15 to 4 seconds (<5 seconds)
      ● Overall improvement 3.7x
      ● What about costs?
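The per-step results quoted on the slides can be turned into cumulative speedups against the 15-second baseline with a few lines of arithmetic. The latencies are taken from slides 14, 18, 20, and 22; the step labels and the function name are only for illustration.

```python
BASELINE_S = 15.0
STEPS = [
    ("more CPU per request", 7.5),
    ("removing two aggregations", 5.3),
    ("reducing cardinality", 4.4),
    ("splitting query and aggregations", 4.0),
]


def speedups(baseline, steps):
    """Return (step, speedup vs. baseline) pairs, rounded to two decimals."""
    return [(name, round(baseline / latency, 2)) for name, latency in steps]


for name, factor in speedups(BASELINE_S, STEPS):
    print(f"{name}: {factor}x")
```

The last step works out to 15 / 4.0 = 3.75, which the slide reports as 3.7x.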
  • 25. Part 2. Goals
      ● Reduce costs
      ● Improve reliability
      ● Simplify architecture
      ● Reduce variability in latency
  • 26. Current flow
      ● Too many dependencies
      ● Expensive intermediate storage
  • 27. Denormalization
      ● 90% of data is shared
      ● No extra calls from frontend
  • 28. Partial updates with Update API (experimental)
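Slide 28 gives no detail, so the following is only a sketch of what a partial update looks like in the bulk format: an `update` action line followed by a `doc` line carrying just the changed fields. The index name, document ID, field name `views`, and the helper name are assumptions. Note that Elasticsearch still reindexes the whole document internally when applying such an update.

```python
import json


def partial_update_action(index, doc_id, changed_fields):
    """Return the two NDJSON lines for one partial update in a _bulk request."""
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {"doc": changed_fields}  # only the fields that changed
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


payload = partial_update_action("videos", "v1", {"views": 1200})
```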
  • 29. “Partial” updates with parent-child relations (experimental)
  • 30. Split data by hot/full (idea for future)
      ● Cheaper hardware on full
      ● Shard allocation filtering
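The hot/full idea on slide 30 relies on shard allocation filtering, which pins an index to nodes that carry a matching custom attribute. A minimal sketch of the index settings involved is shown below; the attribute name `box_type`, the tier values, and the index names are assumptions, and the nodes themselves would have to be started with the corresponding attribute for the filter to have any effect.

```python
def allocation_settings(tier, attribute="box_type"):
    """Index settings that require shards to live on nodes tagged with the given tier."""
    return {f"index.routing.allocation.require.{attribute}": tier}


# Hypothetical layout: a small index of hot data on fast nodes,
# and the full index on cheaper hardware.
SETTINGS_BY_INDEX = {
    "videos_hot": allocation_settings("hot"),
    "videos_full": allocation_settings("full"),
}
```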