Confidential Use Only – Do Not Share
David Phillips
Software Engineer
Facebook
Presto: Fast SQL on Everything
What is Presto?
• Open source distributed SQL query engine
• ANSI SQL compliant
• Originally developed by Facebook
• Used in production at many well known companies
Commercial Offerings
Notable Characteristics
• Adaptive multi-tenant system
• Run hundreds of concurrent queries on thousands of nodes
• Extensible, federated design
• Plugins provide connectors, functions, types, security
• Flexible design supports many different use cases
• High performance
• Many optimizations, code generation, long-lived JVM
Use Cases at Facebook
Interactive Analytics
• Facebook has a massive multi-tenant data warehouse
• Employees need to quickly analyze small data (~50GB-3TB)
• Visualizations, dashboards, notebooks, BI tools
• Clusters run 50-100 concurrent queries with diverse shapes
• Queries usually execute in seconds or minutes
• Users are latency sensitive
• Fast improves productivity, slow blocks their work
Batch ETL
• Populate and process data in the warehouse
• Jobs are scheduled using a workflow management system
• Similar to Azkaban or Airflow
• Manages dependencies between jobs
• Queries are typically written by data engineers
• More expensive in CPU and data volume than Interactive
• Throughput and efficiency more important than latency
A/B Testing
• Evaluate product changes via statistical hypothesis testing
• Results need to be available in hours (not days)
• Data must be complete and accurate
• Arbitrary slice and dice at interactive latency (~5-30s)
• Cannot pre-aggregate data, must compute results on the fly
• Producing results requires joining multiple large data sets
• Web interface generates restricted query shapes
App Analytics
• External-user facing custom reporting tools
• Facebook Analytics offers analytics to application developers
• Web interface generates small set of query shapes
• Highly selective queries over large aggregate data volumes
• Application developers can only access their own data
• Very strict latency requirements (~100ms-5s)
• Highly available, hundreds of concurrent queries
System Design
[Figure: Presto Architecture. Labeled components: Query and Results; a Coordinator containing Queue, Planner/Optimizer, Scheduler, Metadata API, and Data Location API; several Workers, each containing a Processor and a Data Source API; an External Storage System.]
Predicate Pushdown
• Engine provides connectors with a two part constraint:
1. Domain of values: ranges and nullability
2. “Black box” predicate for filtering
• Connectors report the domain they can guarantee
• Engine can elide redundant filtering
• Optimizer can make further use of this information
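The exchange above can be sketched in a few lines of Python. This is a toy model, not Presto's Java connector SPI: `Domain`, `PartitionedConnector`, and `run_query` are hypothetical names, a domain is reduced to one integer range plus nullability, and the "black box" predicate is left out. It shows a connector that prunes partitions reporting the domain it can guarantee, and the engine filtering only on what remains.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Domain:
    """Hypothetical domain: one closed integer range plus nullability."""
    low: Optional[int]    # None means unbounded below
    high: Optional[int]   # None means unbounded above
    null_allowed: bool

    def contains(self, value: Optional[int]) -> bool:
        if value is None:
            return self.null_allowed
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value > self.high:
            return False
        return True


class PartitionedConnector:
    """Toy connector partitioned by `day`: it can prune whole partitions,
    so it guarantees the domain on `day` and nothing else."""

    def __init__(self, partitions: Dict[int, List[dict]]):
        self.partitions = partitions

    def scan(self, constraint: Dict[str, Domain]) -> Tuple[List[dict], Dict[str, Domain]]:
        day_domain = constraint.get("day")
        rows = [
            row
            for day, part in self.partitions.items()
            if day_domain is None or day_domain.contains(day)
            for row in part
        ]
        guaranteed = {} if day_domain is None else {"day": day_domain}
        return rows, guaranteed


def run_query(connector: PartitionedConnector, constraint: Dict[str, Domain]):
    rows, guaranteed = connector.scan(constraint)
    # The engine elides filtering on columns the connector already guaranteed.
    remaining = {c: d for c, d in constraint.items() if guaranteed.get(c) != d}
    result = [r for r in rows if all(d.contains(r[c]) for c, d in remaining.items())]
    return result, sorted(remaining)
```

With a constraint on both `day` and `price`, only the `price` filter is left for the engine to evaluate.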
Data Layouts
• Optimizer takes advantage of physical layout of data
• Properties: partitioning, sorting, grouping, indexes
• Tables can have multiple layouts with different properties
• Layouts can have a subset of columns or data
• Optimizer chooses best layout for query
• Tune queries by adding new physical layouts
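As a rough illustration of layout selection, the sketch below picks among several layouts of one table. The `Layout` class and the scoring weights are assumptions made up for this example; the slides do not describe how the real optimizer ranks layouts.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass(frozen=True)
class Layout:
    """Hypothetical description of one physical layout of a table."""
    name: str
    columns: FrozenSet[str]               # a layout may hold a subset of columns
    partitioned_by: Optional[str] = None
    sorted_by: Optional[str] = None


def choose_layout(layouts: Iterable[Layout], needed_columns: Set[str], group_key: str) -> Layout:
    """Return the usable layout with the highest (made-up) score."""

    def score(layout: Layout) -> int:
        points = 0
        if layout.partitioned_by == group_key:
            points += 2   # assumed benefit: grouping needs no shuffle
        if layout.sorted_by == group_key:
            points += 1   # assumed benefit: rows for a group arrive together
        return points

    usable = [layout for layout in layouts if needed_columns <= layout.columns]
    if not usable:
        raise ValueError("no layout covers the required columns")
    return max(usable, key=score)
```

Adding a new layout to the list changes which one is chosen without touching the query, which is the tuning approach the last bullet describes.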
[Figure: two plans for the same query. Original plan without any data layout properties: Stages 0-4 built from Scan, Filter, Hash, partitioned-shuffle, LocalShuffle, LeftJoin, AggregatePartial, AggregateFinal, collecting-shuffle, and Output. Optimized plan using data layout properties: Stages 0-1 built from Scan, Filter, Hash, LocalShuffle, LeftJoin, Aggregate, collecting-shuffle, and Output.]
Pre-computing Hashes
• Computing hashes can be expensive
• Especially for strings or complex types
• Push computation to the lowest level of the plan tree
• Re-use for aggregations, joins, local or remote shuffles
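A small Python sketch of the idea, under simplifying assumptions: rows are dicts, the hash travels with the row in a hypothetical `$hash` field, and `CountingHasher` wraps `zlib.crc32` only so the number of hash computations is visible. The hash is computed once at the scan and then reused by both a shuffle and an aggregation.

```python
import zlib


class CountingHasher:
    """Stand-in for an expensive hash function; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, value: str) -> int:
        self.calls += 1
        return zlib.crc32(value.encode("utf-8"))


def scan_with_hash(rows, key, hasher):
    """Lowest level of the plan: attach the hash of `key` to every row once."""
    return [{**row, "$hash": hasher(row[key])} for row in rows]


def partition(rows, n):
    """Shuffle step: routes rows using the precomputed hash."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[row["$hash"] % n].append(row)
    return buckets


def count_by_key(rows, key):
    """Aggregation step: a tiny chained hash table keyed by the same hash."""
    table = {}  # precomputed hash -> list of [key value, count]
    for row in rows:
        chain = table.setdefault(row["$hash"], [])
        for entry in chain:
            if entry[0] == row[key]:
                entry[1] += 1
                break
        else:
            chain.append([row[key], 1])
    return {value: count for chain in table.values() for value, count in chain}
```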
Intra-node Parallelism
• Use multiple threads on a single node
• More efficient than parallelism across nodes
• Little latency overhead
• Efficiently share state (e.g., hash tables) between threads
• Needed due to skew or table transforms
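The state-sharing point can be illustrated with a join whose hash table is built once and then probed by several threads. This is only a structural sketch with hypothetical function names; CPython threads do not run CPU-bound work in parallel, so it does not demonstrate the speedup a JVM engine would see.

```python
from concurrent.futures import ThreadPoolExecutor


def build_hash_table(build_rows, key):
    """Build side: one table, created once."""
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    return table


def probe(table, probe_rows, key):
    """Probe side: look up each row in the shared table."""
    return [(p, b) for p in probe_rows for b in table.get(p[key], [])]


def parallel_join(build_rows, probe_rows, key, threads=4):
    table = build_hash_table(build_rows, key)   # read-only after this point
    chunks = [probe_rows[i::threads] for i in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Every thread probes the same table object; nothing is copied.
        parts = list(pool.map(lambda chunk: probe(table, chunk, key), chunks))
    return [pair for part in parts for pair in part]
```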
[Figure: Physical Execution Plan. Stage 0 and Stage 1 are split into tasks (Task 0, Task 1, Task 2, Task 3..n). A task runs Pipelines 0-2, built from Scan, Filter, Hash, LocalShuffle, HashBuild, LookupJoin, and HashAggregate operators. Pipeline 1 is parallelized across multiple threads.]
Stage Scheduling
• Two scheduling policies:
1. All-at-once: minimize latency
2. Phased: minimize resource usage
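The difference between the two policies can be sketched as the order in which stages are started. In this hypothetical model, `deps` maps each stage to the stages that should start before it (for example, a join's build side before its probe side). All-at-once starts everything in one wave, while phased starts stages in dependency order. Treating each wave as a strict step is a simplification for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def all_at_once(deps):
    """Start every stage immediately: lowest latency, all resources held at once."""
    return [sorted(deps)]


def phased(deps):
    """Start stages wave by wave, so later stages hold no resources while waiting."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

For `deps = {"build": set(), "probe": {"build"}, "output": {"probe"}}`, `all_at_once` gives a single wave and `phased` gives three.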
Split Scheduling
• Splits are enumerated as the query executes, not up front
• For Hive, both partition metadata and discovering files
• Start executing immediately
• Queries often finish early (LIMIT or interactive)
• Reduces metadata memory usage on coordinator
• Splits are assigned to worker with shortest queue
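A minimal sketch of both ideas, with hypothetical names: a generator enumerates splits lazily, so partitions that are never needed are never listed, and each split goes to the worker whose queue is currently shortest. The `partitions` mapping stands in for metastore and file-listing calls.

```python
from collections import deque
from itertools import islice


def enumerate_splits(partitions):
    """Lazily yield (partition, file) splits.

    `partitions` maps a partition name to a callable that lists its files;
    the callable runs only when that partition is actually reached."""
    for name, list_files in partitions.items():
        for path in list_files():
            yield (name, path)


def assign(splits, queues):
    """Append each split to the worker queue that is shortest right now."""
    for split in splits:
        worker = min(queues, key=lambda w: len(queues[w]))
        queues[worker].append(split)


# Usage: a query that stops after two splits (for example, a satisfied LIMIT)
# never lists the files of the second partition.
listed = []
partitions = {
    "ds=1": lambda: listed.append("ds=1") or ["f1", "f2"],
    "ds=2": lambda: listed.append("ds=2") or ["f3"],
}
queues = {"worker-1": deque(["busy", "busy"]), "worker-2": deque()}
assign(islice(enumerate_splits(partitions), 2), queues)
```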
Operating on Compressed Data
• Process dictionaries directly instead of values
• Shared dictionaries can be larger than rows
• Use heuristics to determine if speculation is working
• Hash table creation takes advantage of dictionaries
• Joins can produce dictionary encoded data
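To make the first bullet concrete, here is a small sketch using plain Python lists in place of Presto's block classes. `project_dictionary` is a hypothetical helper: it applies a function once per dictionary entry and reuses the indices, so the output stays dictionary encoded.

```python
def project_dictionary(dictionary, indices, fn):
    """Apply `fn` to each dictionary entry instead of each row.

    Returns a new dictionary plus the unchanged indices."""
    return [fn(entry) for entry in dictionary], indices


def decode(dictionary, indices):
    """Materialize the row values (only needed at the very end)."""
    return [dictionary[i] for i in indices]


dictionary = ["IN PERSON", "COD", "RETURN", "NONE"]
indices = [1, 0, 1, 2, 0, 2, 2, 1]
lowered, same_indices = project_dictionary(dictionary, indices, str.lower)
```

Here `str.lower` runs 4 times for 8 rows. When the dictionary has more entries than the page has rows, the same trick does more work than processing rows directly, which is why the slide mentions heuristics.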
Page Layout in Memory
Page 0
• partkey (LongBlock): 52470, 50600, 18866, 72387, 7429, 44077, 148102, 101228
• returnflag (RLEBlock): "F" x 8
• shipinstruct (DictionaryBlock): indices 1, 0, 1, 2, 0, 2, 2, 1
Page 1
• partkey (LongBlock): 164648, 35173, 139350, 40227, 87261, 184817, 153099
• returnflag (RLEBlock): "O" x 7
• shipinstruct (DictionaryBlock): indices 2, 2, 2, 0, 1, 3, 2
Dictionary
• 0: "IN PERSON"
• 1: "COD"
• 2: "RETURN"
• 3: "NONE"
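The three block encodings in the figure can be modeled in Python as follows. The class names mirror the labels in the figure, but the methods are a simplified assumption, not Presto's actual block interface. The data is Page 0 from the figure.

```python
class LongBlock:
    """Plain array of integer values."""
    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def get(self, position):
        return self.values[position]


class RLEBlock:
    """Run-length encoded: one value repeated `count` times."""
    def __init__(self, value, count):
        self.value, self.count = value, count

    def __len__(self):
        return self.count

    def get(self, position):
        if not 0 <= position < self.count:
            raise IndexError(position)
        return self.value


class DictionaryBlock:
    """Small integer indices pointing into a dictionary of distinct values."""
    def __init__(self, dictionary, indices):
        self.dictionary, self.indices = dictionary, list(indices)

    def __len__(self):
        return len(self.indices)

    def get(self, position):
        return self.dictionary[self.indices[position]]


def rows(page):
    """Materialize a page (a list of equally long blocks) into row tuples."""
    return [tuple(block.get(i) for block in page) for i in range(len(page[0]))]


DICTIONARY = ["IN PERSON", "COD", "RETURN", "NONE"]
page0 = [
    LongBlock([52470, 50600, 18866, 72387, 7429, 44077, 148102, 101228]),  # partkey
    RLEBlock("F", 8),                                                      # returnflag
    DictionaryBlock(DICTIONARY, [1, 0, 1, 2, 0, 2, 2, 1]),                 # shipinstruct
]
```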
Writer Scaling
• Write performance dominated by concurrency
• Too few writers causes the query to be slow
• Too many writers creates small files
• Expensive to read later (metadata, IO, latency)
• Inefficient for storage system
• Add writers as needed when producer buffers are full, as long as data written exceeds a configured threshold
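The scaling rule in the last bullet might look like the following decision function. The parameter names and the default threshold are hypothetical, and scaling the threshold by the current number of writers is an assumption of this sketch, not something the slide states.

```python
def should_add_writer(buffers_full: bool, bytes_written: int, writers: int,
                      min_bytes_per_writer: int = 32 * 1024 * 1024) -> bool:
    """Decide whether to add one more writer task.

    buffers_full: producers are blocked because the writers cannot keep up.
    bytes_written: total data written so far by all writers.
    min_bytes_per_writer: hypothetical configured threshold; requiring this
        much data per existing writer keeps small outputs from being spread
        across many small files.
    """
    enough_data = bytes_written >= writers * min_bytes_per_writer
    return buffers_full and enough_data
```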
Code Generation
• SQL → JVM bytecode → machine code
• Filter, project, sort comparators, aggregations
• Auto-vectorization, branch prediction, register use
• Eliminate virtual calls and allow inlining
• Profile each task independently based on data processed
• Avoid profile pollution across tasks and queries
• Profile can change during execution as data changes
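As a loose analogy in Python (Presto itself generates JVM bytecode), the sketch below evaluates the same filter expression two ways: by walking an expression tree for every row, and by generating source once and compiling it into a single function. The tuple-based expression format is invented for this example, and `generate` should only be fed trusted expression trees because it uses `eval`.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt, "<": operator.lt}


def interpret(expr, row):
    """Naive evaluation: re-walk the tree and dispatch on node type per row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "lit":
        return expr[1]
    op, left, right = expr
    return OPS[op](interpret(left, row), interpret(right, row))


def to_source(expr):
    """Translate the tree into a Python expression string."""
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "lit":
        return repr(expr[1])
    op, left, right = expr
    return f"({to_source(left)} {op} {to_source(right)})"


def generate(expr):
    """Compile the whole expression once; the result has no tree walking left."""
    code = compile(f"lambda row: {to_source(expr)}", "<generated>", "eval")
    return eval(code)


# price * 2 > 100
expr = (">", ("*", ("col", "price"), ("lit", 2)), ("lit", 100))
compiled_filter = generate(expr)
```

Both paths give the same answer for every row; the generated function simply skips the per-row dispatch.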
[Figure: CPU Time Improvements for Bytecode Generation. Bar chart of average CPU time (seconds, axis 0-7000) comparing Generated and Naïve code for Baseline, 1 Transform, 2 Transforms, and 3 Transforms.]
Fault Tolerance
• Node crash causes query failure
• In practice, failures are rare, even on large clusters
• Checkpointing or other recovery mechanisms have a cost
• Re-run failures rather than making everything expensive
• Limit runtime to a few hours to reduce waste and latency
• Clients retry on failure
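Client-side retry, as in the last bullet, can be as simple as the loop below. `QueryFailed` and the `run_query` callable are placeholders for whatever client library is in use; they are not part of any Presto API.

```python
import time


class QueryFailed(Exception):
    """Hypothetical error raised when a query dies, e.g. after a node crash."""


def run_with_retry(run_query, sql, attempts=3, backoff_seconds=0.0):
    """Re-run the whole query on failure instead of recovering partial state."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return run_query(sql)
        except QueryFailed as error:
            last_error = error
            if attempt + 1 < attempts:
                time.sleep(backoff_seconds * (attempt + 1))
    raise last_error
```

Because nothing is checkpointed, a retry repeats all the work, which is one reason to cap query runtime as the slide suggests.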
Presto: Fast SQL on Everything

  • 1. Confidential Use Only – Do Not Share David Phillips Software Engineer Facebook Presto: Fast SQL on Everything
  • 2. What is Presto? • Open source distributed SQL query engine • ANSI SQL compliant • Originally developed by Facebook • Used in production at many well known companies
  • 5. Notable Characteristics • Adaptive multi-tenant system • Run hundreds of concurrent queries on thousands of nodes • Extensible, federated design • Plugins provide connectors, functions, types, security • Flexible design supports many different use cases • High performance • Many optimizations, code generation, long-lived JVM
  • 6. Use Cases at Facebook
  • 7. Interactive Analytics • Facebook has a massive multi-tenant data warehouse • Employees need to quickly analyze small data (~50GB-3TB) • Visualizations, dashboards, notebooks, BI tools • Clusters run 50-100 concurrent queries w/ diverse shapes • Queries usually execute in seconds or minutes • Users are latency sensitive • Fast improves productivity, slow blocks their work
  • 8. Batch ETL • Populate and process data in the warehouse • Jobs are scheduled using a workflow management system • Similar to Azkaban or Airflow • Manages dependencies between jobs • Queries are typically written by data engineers • More expensive in CPU and data volume than Interactive • Throughput and efficiency more important than latency
  • 9. A/B Testing • Evaluate product changes via statistical hypothesis testing • Results need to be available in hours (not days) • Data must be complete and accurate • Arbitrary slice and dice at interactive latency (~5-30s) • Cannot pre-aggregate data, must compute results on the fly • Producing results requires joining multiple large data sets • Web interface generates restricted query shapes
  • 10. App Analytics • External-user facing custom reporting tools • Facebook Analytics offers analytics to application developers • Web interface generates small set of query shapes • Highly selective queries over large aggregate data volumes • Application developers can only access their own data • Very strict latency requirements (~100ms-5s) • Highly available, hundreds of concurrent queries
  • 12. Presto Architecture [diagram] • Coordinator: Queue, Planner/Optimizer, Scheduler, Metadata API, Data Location API • Workers: Processor, Data Source API • External Storage System • Query Results
  • 13. Predicate Pushdown • Engine provides connectors with a two part constraint: 1. Domain of values: ranges and nullability 2. “Black box” predicate for filtering • Connectors report the domain they can guarantee • Engine can elide redundant filtering • Optimizer can make further use of this information
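The two-part constraint on slide 13 can be illustrated with a small Python sketch. The names `Domain`, `connector_enforced_domain`, and `needs_engine_filter` are hypothetical and are not Presto's connector API. The sketch assumes a single integer column with one closed range, and a connector that can only prune whole fixed-size partitions, so the domain it guarantees may be wider than the one requested.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Domain:
    """Allowed values for one column: a closed range plus nullability."""
    low: int
    high: int
    null_allowed: bool = False

    def contains(self, value: Optional[int]) -> bool:
        if value is None:
            return self.null_allowed
        return self.low <= value <= self.high

    def is_subset_of(self, other: "Domain") -> bool:
        return (
            other.low <= self.low
            and self.high <= other.high
            and (other.null_allowed or not self.null_allowed)
        )


def connector_enforced_domain(requested: Domain, partition_size: int) -> Domain:
    """Hypothetical connector that prunes whole partitions only.

    It guarantees that every returned row lies in the partitions that
    overlap the requested range, which can be wider than the request.
    """
    low = (requested.low // partition_size) * partition_size
    high = (requested.high // partition_size + 1) * partition_size - 1
    return Domain(low, high, requested.null_allowed)


def needs_engine_filter(requested: Domain, guaranteed: Domain) -> bool:
    """The engine may skip its own filter only if the guarantee is at least as tight."""
    return not guaranteed.is_subset_of(requested)
```

With a partition size of 100, a request for 150 to 180 yields a guaranteed domain of 100 to 199, so the engine keeps its filter. A request for 100 to 199 is guaranteed exactly, so the filter is redundant and can be elided.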
  • 14. Data Layouts • Optimizer takes advantage of physical layout of data • Properties: partitioning, sorting, grouping, indexes • Tables can have multiple layouts with different properties • Layouts can have a subset of columns or data • Optimizer chooses best layout for query • Tune queries by adding new physical layouts
  • 15. Query plans with and without data layout properties [diagram] • Original plan without any data layout properties: Stages 0-4 connected by three partitioned shuffles and one collecting shuffle; operators Scan, Filter, Hash, LeftJoin, LocalShuffle, AggregatePartial, AggregateFinal, Output • Optimized plan using data layout properties: Stages 0-1 connected by one collecting shuffle; operators Scan, Filter, Hash, LeftJoin, LocalShuffle, Aggregate, Output
  • 16. Pre-computing Hashes • Computing hashes can be expensive • Especially for strings or complex types • Push computation to the lowest level of the plan tree • Re-use for aggregations, joins, local or remote shuffles
  • 17. Intra-node Parallelism • Use multiple threads on a single node • More efficient than parallelism across nodes • Little latency overhead • Efficiently share state (e.g., hash tables) between threads • Needed due to skew or table transforms
  • 18. Physical Execution Plan [diagram] • Stage 0 runs Task 0; Stage 1 runs Task 0, Task 1, Task 2, Task 3..n • Each task is split into Pipeline 0, Pipeline 1, and Pipeline 2 • Operators shown: Scan, Filter, Hash, HashBuild, LocalShuffle, LookupJoin, HashAggregate • Pipeline 1 is parallelized across multiple threads
  • 19. Stage Scheduling • Two scheduling policies: 1. All-at-once: minimize latency 2. Phased: minimize resource usage
  • 20. Split Scheduling • Splits are enumerated as the query executes, not up front • For Hive, both partition metadata and discovering files • Start executing immediately • Queries often finish early (LIMIT or interactive) • Reduces metadata memory usage on coordinator • Splits are assigned to worker with shortest queue
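The split scheduling bullets on slide 20 can be sketched in Python as follows. This is a simplified illustration, not Presto's scheduler: `enumerate_splits` and `assign_splits` are hypothetical names, a split is represented by a file path, and worker queues only grow here (a real worker drains its queue as it finishes splits).

```python
from typing import Dict, Iterable, Iterator, List


def enumerate_splits(files: Iterable[str]) -> Iterator[str]:
    """Lazily yield splits so scheduling can begin before enumeration finishes."""
    for path in files:
        yield path


def assign_splits(splits: Iterable[str], backlog: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign each split to the worker with the shortest queue.

    `backlog` maps a worker name to the number of splits already queued on it.
    Ties go to the worker listed first.
    """
    queue_len = dict(backlog)
    assigned: Dict[str, List[str]] = {worker: [] for worker in backlog}
    for split in splits:
        target = min(queue_len, key=queue_len.get)
        assigned[target].append(split)
        queue_len[target] += 1
    return assigned
```

Because `enumerate_splits` is a generator, a consumer that stops early (for example, once a LIMIT is satisfied) never pays for listing the remaining files.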
  • 21. Operating on Compressed Data • Process dictionaries directly instead of values • Shared dictionaries can be larger than rows • Use heuristics to determine if speculation is working • Hash table creation takes advantage of dictionaries • Joins can produce dictionary encoded data
  • 22. Page Layout in Memory [diagram] • Page 0: partkey (LongBlock): 52470, 50600, 18866, 72387, 7429, 44077, 148102, 101228; returnflag (RLEBlock): "F" x 8; shipinstruct (DictionaryBlock): indices 1 0 1 2 0 2 2 1 • Page 1: partkey (LongBlock): 164648, 35173, 139350, 40227, 87261, 184817, 153099; returnflag (RLEBlock): "O" x 7; shipinstruct (DictionaryBlock): indices 2 2 2 0 1 3 2 • Dictionary: 0: "IN PERSON", 1: "COD", 2: "RETURN", 3: "NONE"
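Processing a dictionary directly instead of the row values can be sketched in Python like this. The function name `project_dictionary_block` is hypothetical, and the size comparison is only a stand-in for the heuristics mentioned on slide 21, which the slides do not spell out.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def project_dictionary_block(
    dictionary: Sequence[T],
    indices: Sequence[int],
    fn: Callable[[T], R],
) -> List[R]:
    """Apply fn to a dictionary-encoded column.

    If the dictionary has no more entries than the block has rows,
    evaluate fn once per dictionary entry and look results up by index.
    Otherwise fall back to evaluating fn once per row.
    """
    if len(dictionary) <= len(indices):
        transformed = [fn(entry) for entry in dictionary]
        return [transformed[i] for i in indices]
    return [fn(dictionary[i]) for i in indices]
```

Using the shipinstruct column of Page 0 from slide 22 (a 4-entry dictionary and 8 indices), a projection such as `len` is evaluated 4 times rather than 8.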
  • 23. Writer Scaling • Write performance dominated by concurrency • Too few writers causes the query to be slow • Too many writers creates small files • Expensive to read later (metadata, IO, latency) • Inefficient for storage system • Add writers as needed when producer buffers are full, as long as data written exceeds a configured threshold
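The writer scaling rule in the last bullet of slide 23 can be sketched as a single decision function. This is a sketch under stated assumptions: the name `next_writer_count` and all of its parameters are hypothetical, and the threshold is interpreted as a minimum number of bytes per existing writer, which the slide does not specify.

```python
def next_writer_count(
    current_writers: int,
    producer_buffers_full: bool,
    bytes_written: int,
    min_bytes_per_writer: int,
    max_writers: int,
) -> int:
    """Return the writer count to use after one scheduling check.

    A writer is added only when upstream producers are blocked on full
    buffers AND the existing writers have already written enough data,
    which keeps small outputs from being spread over many small files.
    """
    enough_data = bytes_written >= current_writers * min_bytes_per_writer
    if producer_buffers_full and enough_data and current_writers < max_writers:
        return current_writers + 1
    return current_writers
```

Requiring both conditions addresses the two failure modes on the slide: full producer buffers signal that too few writers are slowing the query, while the data-written threshold guards against creating many small files.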
  • 24. Code Generation • SQL → JVM bytecode → machine code • Filter, project, sort comparators, aggregations • Auto-vectorization, branch prediction, register use • Eliminate virtual calls and allow inlining • Profile each task independently based on data processed • Avoid profile pollution across tasks and queries • Profile can change during execution as data changes
  • 25. CPU Time Improvements for Bytecode Generation [bar chart: average CPU time in seconds (axis 0-7000) for Baseline, 1 Transform, 2 Transforms, and 3 Transforms; series: Generated vs. Naïve]
  • 26. Fault Tolerance • Node crash causes query failure • In practice, failures are rare, even on large clusters • Checkpointing or other recovery mechanisms have a cost • Re-run failures rather than making everything expensive • Limit runtime to a few hours to reduce waste and latency • Clients retry on failure
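The "clients retry on failure" approach from slide 26 can be sketched as a small client-side loop. `QueryFailed` and `run_with_retry` are hypothetical names, not part of any Presto client library, and the sketch assumes the query is safe to re-run from scratch.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class QueryFailed(Exception):
    """Hypothetical error raised when a query fails, e.g. after a node crash."""


def run_with_retry(run_query: Callable[[], T], max_attempts: int = 3) -> T:
    """Re-run the whole query on failure instead of recovering partial state."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Exception = QueryFailed("query was never attempted")
    for _ in range(max_attempts):
        try:
            return run_query()
        except QueryFailed as error:
            last_error = error  # remember the failure and try again
    raise last_error
```

Each retry pays the full cost of the query again, which is why the slide pairs this approach with a runtime limit of a few hours.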