Go Big Quick
Jason Scheller
Platform & Content Analytics, Eikon
Pricing & Text Analytics Platform
• Mission - Ingest, enrich, store, analyze everything. Provide a
single platform for search and analytics capabilities over any
hosted content. Serve as a platform for future innovation.
• Content
• Twitter (~675 Tweets/sec, 15 days history)
• News (~40 articles/sec, 18 months history)
• Research (40 million docs, 3 million/year)
• Filings (29 million docs, 2.5 million/year)
• Trade data (500K RICs, 30K/sec, 10 years)
• Various metadata and derived content sets
Infrastructure
IBM Streams
30 servers
18 servers
86 TB
Where to start?
Data
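Max Shard
[Diagram: experiment to find the Max Shard size. Data is loaded into an Index with a single shard (Shard 0) and queried with JMeter.]
Max Shard
• Disk space
• Request load
• RAM usage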
Maximum Shard Size
• This same experiment will also give you the ratio of data to
index size, which is great for planning. Just make sure you’re
using your real analyzer settings.
• The rest is just math!
• Don’t forget to account for:
• Memory required to facet & sort
• Replica shards
• Data compression
Max Total Index Size / Max Shard Size = # Nodes
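The arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's spreadsheet: the function name, parameter names, and sample numbers are all hypothetical, and it assumes the slide's formula means one max-size shard per node. Memory needed for faceting and sorting is not modeled here.

```python
import math

def plan_shards(raw_data_gb, index_to_data_ratio, max_shard_gb, replicas=1):
    """Estimate shard counts from a measured max shard size.

    index_to_data_ratio is the index size divided by raw data size, as
    measured in the single-shard experiment (it already reflects
    compression and the real analyzer settings).
    """
    primary_index_gb = raw_data_gb * index_to_data_ratio
    primary_shards = math.ceil(primary_index_gb / max_shard_gb)
    # Each replica is a full extra copy of every primary shard.
    total_shard_copies = primary_shards * (1 + replicas)
    return {
        "primary_index_gb": primary_index_gb,
        "total_index_gb": primary_index_gb * (1 + replicas),
        "primary_shards": primary_shards,
        "total_shard_copies": total_shard_copies,
        # Assumption: one max-size shard per node, as the formula implies.
        "nodes_at_one_shard_each": total_shard_copies,
    }

# Hypothetical inputs: 10 TB raw data, index half the size of the data,
# 50 GB max shard, 1 replica.
plan = plan_shards(raw_data_gb=10_000, index_to_data_ratio=0.5,
                   max_shard_gb=50, replicas=1)
print(plan)
```

With these made-up inputs the plan comes to 100 primary shards and 200 shard copies in total. Hosting several smaller shards per node changes the node count but not the shard arithmetic.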
SPREADSHEET
But do I always use Max Shards?
ALLOCATION & HARDWARE
Cluster Allocation
• Elasticsearch will figure out which node should host which shard. Let it! It's
better than you at figuring this out and moving shards around.
• Well, mostly…
• Let’s say you have indices A – D, 4 shards each, 0 replicas, 4 nodes.
Elasticsearch might arrange your shards like this based on the size of each
shard.
[Diagram: shards A1–A4, B1–B4, C1–C4, and D1–D4 distributed across the four nodes]
Cluster Allocation
• But what about other considerations?
• Hot spotting
• Access frequency
• Connectivity for River-based ingestion
• Heterogeneous hardware
Cluster Allocation – Heterogeneous Hardware
• Suppose you know that indices A and B get queried 1000s of times per
second, but C and D are only hit about once a second. Maybe you bought some
better hardware to host A and B and don’t want to waste those machines on C
and D.
• Is this a good allocation?
[Diagram: four nodes labeled Slow HW, Slow HW, Fast HW, Fast HW, with the shards of A, B, C, and D spread across all of them]
• Not really. The slower machines will slow all queries to A & B. And I’m not
getting my money’s worth from that better hardware!
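Why do the slower machines slow all queries to A and B? A search fans out to every shard of the index and can only return once the slowest shard has answered. Here is a toy model of that effect; the node names and latency figures are made up for illustration and are not measurements from this cluster.

```python
def query_latency_ms(shard_nodes, node_latency_ms):
    """Latency of a fan-out query: it waits for the slowest shard's node."""
    return max(node_latency_ms[node] for node in shard_nodes)

# Hypothetical per-node response times in milliseconds.
node_latency_ms = {"slow1": 40, "slow2": 40, "fast1": 10, "fast2": 10}

# Index A with one shard on each node (the mixed allocation).
mixed = ["slow1", "slow2", "fast1", "fast2"]
# Index A with all four shards on the fast nodes.
fast_only = ["fast1", "fast1", "fast2", "fast2"]

print(query_latency_ms(mixed, node_latency_ms))      # 40
print(query_latency_ms(fast_only, node_latency_ms))  # 10
```

In this model a single shard on slow hardware is enough to drag the whole query down to the slow node's speed.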
Cluster Allocation – Heterogeneous Hardware
• Wouldn’t this be better?
• Shard allocation settings allow us to “control” which nodes host which indices
without ever naming specific machines or IPs.
[Diagram: the same four nodes (Slow HW, Slow HW, Fast HW, Fast HW), now with the A and B shards on the Fast HW nodes and the C and D shards on the Slow HW nodes]
Cluster Allocation – Heterogeneous Hardware
[Diagram: same layout as the previous slide, annotated with the settings below]
Node Settings (Slow HW nodes): node.hardware: slow
Node Settings (Fast HW nodes): node.hardware: fast
Index Settings (A & B): index.routing.allocation.require.hardware: fast
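The settings above can be illustrated with a small sketch. It models the require rule as a plain exact-match filter and builds, without sending, the index settings update for one index. The node names and the index name index_a are hypothetical, and real Elasticsearch filtering also supports things this sketch ignores, such as wildcards and include/exclude rules. Note that Elasticsearch 5.0 and later write custom node attributes as node.attr.hardware rather than node.hardware.

```python
import json

# Hypothetical node names; attribute values mirror the slide's node settings.
NODES = {
    "node1": {"hardware": "slow"},
    "node2": {"hardware": "slow"},
    "node3": {"hardware": "fast"},
    "node4": {"hardware": "fast"},
}

def eligible_nodes(nodes, require):
    """Return nodes carrying every required attribute (exact match only)."""
    return sorted(
        name for name, attrs in nodes.items()
        if all(attrs.get(key) == value for key, value in require.items())
    )

def require_request(index, attribute, value):
    """Build (method, path, body) for the settings update; sends nothing."""
    body = {f"index.routing.allocation.require.{attribute}": value}
    return "PUT", f"/{index}/_settings", json.dumps(body)

# Indices A and B require fast hardware; C and D have no requirement.
print(eligible_nodes(NODES, {"hardware": "fast"}))  # ['node3', 'node4']
print(eligible_nodes(NODES, {}))  # all four nodes
print(require_request("index_a", "hardware", "fast"))
```

Because the rule names an attribute rather than a machine, adding another fast node only requires giving it the same attribute value.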
Cluster Allocation – Heterogeneous Hardware
[Diagram: one Slow HW node and three Fast HW nodes hosting the same shards]
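• Is this OK? Sure, why not?! The require setting on A and B only keeps those
indices on fast nodes; nothing stops C and D from using fast nodes too.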
Cluster Allocation – Archive Example
• We can use the same feature for large data sets of a time-based feed. Say
we keep an index for all news ever. People are generally searching the
most recent 12 months, not the last 30 years.
[Diagram: a large grid of Slow HW nodes next to a group of eight Fast HW nodes]
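One way to apply this to an archive is sketched below. It assumes the news feed is split into one index per month, since allocation settings apply per index; the index naming scheme (news-YYYY.MM), the function names, and the sample dates are all hypothetical. Each index gets a hardware requirement based on its age, using the 12-month window from the slide.

```python
from datetime import date

def months_between(older, newer):
    """Whole calendar months from `older` to `newer`, ignoring the day."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)

def hardware_for(index_month, today, hot_months=12):
    """Recent indices go to fast hardware, older ones to slow hardware."""
    return "fast" if months_between(index_month, today) < hot_months else "slow"

def plan_archive(index_months, today, hot_months=12):
    """Map each monthly index name to the allocation setting it should get."""
    return {
        f"news-{month:%Y.%m}": {
            "index.routing.allocation.require.hardware":
                hardware_for(month, today, hot_months)
        }
        for month in index_months
    }

today = date(2015, 6, 15)  # sample date for illustration only
months = [date(1990, 1, 1), date(2014, 6, 1), date(2014, 7, 1), date(2015, 5, 1)]
for name, settings in plan_archive(months, today).items():
    print(name, settings)
```

Re-running a plan like this each month and applying the changed settings would let older indices move from the fast nodes to the slow ones as they age out of the window.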