SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Downloaden Sie, um offline zu lesen
​Presto SQL Engine: what’s new?
​Strata Hadoop 2016 San Jose, CA
2
What is Presto?
100% open source distributed SQL query engine
Originally developed by Facebook
Key Differentiators:
Performance & Scale
Cross platform query capability, not only SQL on Hadoop
Apache licensed, hosted on GitHub
Certified distro & support from Teradata
3
Brief history of Presto
FALL 2012
6 developers
start Presto
development
FALL 2014
88 Releases
41 Contributors
3943 Commits
SPRING 2016
141 Releases
116 Contributors
6879 Commits
SPRING 2013
Presto rolled out
within Facebook
FALL 2013
Facebook open
sources Presto
FALL 2008
Facebook open
sources Hive
4
• Facebook
– Multiple production clusters (100s of nodes total)
- Massive 300PB Hadoop data warehouse
- Very large sharded MySQL installation
- Growing usage of Raptor SSD-based storage
– 1000s of internal daily active users
– 10-100s of concurrent queries
• Netflix
– Over 200-node production cluster on EC2
– Over 25 PB in S3 (Parquet format)
– Over 350 active users and 3K queries daily
Presto in Production
5
Presto Architecture
Data stream API
Worker
Data stream API
Worker
Coordinator
Metadata
API
Parser/
analyzer
Planner Scheduler
Worker
Client
Data location
API
Pluggable
6
Presto Extensibility – connectors
Parser/
analyzer
Planner
Worker
Data location API
HDFS/S3
NoSQL
DBMS
Custom
…
Metadata API
HDFS/S3
NoSQL
DBMS
Custom
…
Data stream API
HDFS/S3
NoSQL
DBMS
Custom
…
Scheduler
Coordinator
7
• Hadoop/Hive connector & file formats:
– HDFS & S3 + HCatalog
– ORC, RCFile, Parquet, SequenceFile, Text
• Open source data stores:
– MySQL & PostgreSQL (non-parallel)
– Cassandra
– Kafka
– Redis
• In development by community:
– MongoDB
– ElasticSearch
– HBase
Supported data sources & file formats
8
• In-memory processing
• Pipelined execution across nodes MPP-style
• Vectorized columnar processing
• Multithreaded execution keeps all CPU cores busy
• Presto is written in highly tuned Java
– Efficient flat-memory data structures (minimizes GC)
– Very careful coding of inner loops
– Runtime bytecode generation
• Optimized ORC & Parquet readers
• Excellent performance with interactive SQL analytics
Presto – Query Execution Performance
9
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expr [, ...]
[ FROM table1 [[ INNER | OUTER ] JOIN table2 ON (…)]
[ WHERE condition ]
[ GROUP BY expression [, ...] ]
[ HAVING condition]
[ UNION [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count | ALL ] ]
In addition:
• Windowing functions
• Statistical and approximate aggregate functions
• UNNEST, TABLESAMPLE
In development:
• Complex subqueries
• EXISTS, INTERSECT, EXCEPT
• ROLLUP, CUBE
ANSI SQL Support
10
• Cluster deployment models for Presto:
– on premise (appliance or commodity clusters)
– VM (OpenStack, etc.)
– cloud (Amazon, etc)
• Types of Hadoop deployments:
– on Hadoop/YARN cluster (all or subset of nodes)
– on a dedicated cluster
– mixed
Deployment models
11
Open source initiative
• Announced in June 2015 at Hadoop Summit
– Growing interest and adoption
• Collaboration with Facebook and Presto community
– Joint development, conference talks, meetups and webinars
• Major commitment from Teradata Labs:
– 20 full-time engineers
– Free and open source contributions
– Enterprise-ready distribution
"A special shout out goes to Teradata — which joined the Presto community this year
with a focus on enhancing enterprise features and providing support — for having
seven of our top 10 external contributors."
— Facebook
12
Implement Integrate Proliferate
• Installer
• Documentation
• Monitoring & Support
Tools
• Management Tool
Integration
• YARN Integration
• ODBC Driver
• JDBC Driver
• BI Certification
• Security
• Cloud features
Commercial Support
Phase 1 Phase 2 Phase 3
June 8, 2015 Q4 2015 2016
Expanding ANSI SQL Coverage
Teradata Contributions to Presto
13
Recent developments and roadmap
• Q1 release:
– Fully-featured ODBC & JDBC drivers
– Kerberos support
– DECIMAL support
• Later 2016:
– BI tools certification
– TPC-H and TPC-DS unmodified
– Spill to disk
14
BI Tools certifications
15
Presto
Connectors
Teradata Certified Community Supported
Teradata QueryGrid™ - Multi-System Analytics
Targets
Entry Points
TERADATA
DATABASE
ASTER
ANALYTICS
PRESTO
HADOOP
HIVE /
HDFS
HADOOP
OTHER
DATABASE
S
NOSQL
DATABASE
S
TERADATA
DATABASE
ASTER
ANALYTICS
PRESTO
HADOOP
Non-Relational DBsMulti-Genre
Advanced Analytics™
Integrated Data
Warehouses
3rd Party Relational
DBs
Multiple Hadoop SQL Query
Engines and Distributions
APACHE
KAFKA
APACHE
CASSANDRA
MYSQL POSTGRESQL PRESTO APIREDIS
16
Certified Distro: www.teradata.com/presto
Website: www.prestodb.io
Presto Users Group: www.groups.google.com/group/presto-users
GitHub:
www.github.com/prestodb/presto
www.github.com/Teradata/presto
www.github.com/prestodb
More information
17
www.teradata.com/presto

Weitere ähnliche Inhalte

Was ist angesagt?

Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and Future
DataWorks Summit
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
Taro L. Saito
 

Was ist angesagt? (20)

Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
Presto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and FuturePresto @ Facebook: Past, Present and Future
Presto @ Facebook: Past, Present and Future
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
 
Presto - SQL on anything
Presto  - SQL on anythingPresto  - SQL on anything
Presto - SQL on anything
 
Presto
PrestoPresto
Presto
 
Facebook Presto presentation
Facebook Presto presentationFacebook Presto presentation
Facebook Presto presentation
 
Presto updates to 0.178
Presto updates to 0.178Presto updates to 0.178
Presto updates to 0.178
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15
 
Presto in my_use_case
Presto in my_use_casePresto in my_use_case
Presto in my_use_case
 
20140120 presto meetup_en
20140120 presto meetup_en20140120 presto meetup_en
20140120 presto meetup_en
 
Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for Presto
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
 
Presto@Uber
Presto@UberPresto@Uber
Presto@Uber
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
 
Presto+MySQLで分散SQL
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQL
 
Presto: Distributed sql query engine
Presto: Distributed sql query engine Presto: Distributed sql query engine
Presto: Distributed sql query engine
 

Ähnlich wie Presto Strata Hadoop SJ 2016 short talk

WSO2 Quarterly Technical Update
WSO2 Quarterly Technical UpdateWSO2 Quarterly Technical Update
WSO2 Quarterly Technical Update
WSO2
 
Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
DataWorks Summit
 

Ähnlich wie Presto Strata Hadoop SJ 2016 short talk (20)

SQL Server 2019 CTP2.4
SQL Server 2019 CTP2.4SQL Server 2019 CTP2.4
SQL Server 2019 CTP2.4
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020Presto @ Zalando - Big Data Tech Warsaw 2020
Presto @ Zalando - Big Data Tech Warsaw 2020
 
How SQL Server 2016 SP1 Changes the Game
How SQL Server 2016 SP1 Changes the GameHow SQL Server 2016 SP1 Changes the Game
How SQL Server 2016 SP1 Changes the Game
 
Apache drill
Apache drillApache drill
Apache drill
 
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the CloudInteractive Analytics with the Starburst Presto + Alluxio stack for the Cloud
Interactive Analytics with the Starburst Presto + Alluxio stack for the Cloud
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
 
UNC Chapel Hill Ctc Retreat 2014 SAS Visual Analytics and Business Intelligence
UNC Chapel Hill Ctc Retreat 2014 SAS Visual Analytics and Business IntelligenceUNC Chapel Hill Ctc Retreat 2014 SAS Visual Analytics and Business Intelligence
UNC Chapel Hill Ctc Retreat 2014 SAS Visual Analytics and Business Intelligence
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
WSO2 Quarterly Technical Update
WSO2 Quarterly Technical UpdateWSO2 Quarterly Technical Update
WSO2 Quarterly Technical Update
 
Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?Open Source SQL for Hadoop: Where are we and Where are we Going?
Open Source SQL for Hadoop: Where are we and Where are we Going?
 

Mehr von kbajda

Mehr von kbajda (11)

Presto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 BostonPresto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 Boston
 
Presto Summit 2018 - 10 - Qubole
Presto Summit 2018  - 10 - QubolePresto Summit 2018  - 10 - Qubole
Presto Summit 2018 - 10 - Qubole
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Presto Summit 2018 - 08 - FINRA
Presto Summit 2018  - 08 - FINRAPresto Summit 2018  - 08 - FINRA
Presto Summit 2018 - 08 - FINRA
 
Presto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - LyftPresto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - Lyft
 
Presto Summit 2018 - 06 - Facebook Geospatial
Presto Summit 2018 - 06 - Facebook GeospatialPresto Summit 2018 - 06 - Facebook Geospatial
Presto Summit 2018 - 06 - Facebook Geospatial
 
Presto Summit 2018 - 05 - Uber Elasticsearch
Presto Summit 2018 - 05 - Uber ElasticsearchPresto Summit 2018 - 05 - Uber Elasticsearch
Presto Summit 2018 - 05 - Uber Elasticsearch
 
Presto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix ContainersPresto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix Containers
 
Presto Summit 2018 - 02 - LinkedIn
Presto Summit 2018  - 02 - LinkedInPresto Summit 2018  - 02 - LinkedIn
Presto Summit 2018 - 02 - LinkedIn
 
Presto Summit 2018 - 01 - Facebook Presto
Presto Summit 2018  - 01 - Facebook PrestoPresto Summit 2018  - 01 - Facebook Presto
Presto Summit 2018 - 01 - Facebook Presto
 
Presto Summit 2018 - 03 - Starburst CBO
Presto Summit 2018  - 03 - Starburst CBOPresto Summit 2018  - 03 - Starburst CBO
Presto Summit 2018 - 03 - Starburst CBO
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Kürzlich hochgeladen (20)

Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 

Presto Strata Hadoop SJ 2016 short talk

  • 1. ​Presto SQL Engine: what’s new? ​Strata Hadoop 2016 San Jose, CA
  • 2. 2 What is Presto? 100% open source distributed SQL query engine Originally developed by Facebook Key Differentiators: Performance & Scale Cross platform query capability, not only SQL on Hadoop Apache licensed, hosted on GitHub Certified distro & support from Teradata
  • 3. 3 Brief history of Presto FALL 2012 6 developers start Presto development FALL 2014 88 Releases 41 Contributors 3943 Commits SPRING 2016 141 Releases 116 Contributors 6879 Commits SPRING 2013 Presto rolled out within Facebook FALL 2013 Facebook open sources Presto FALL 2008 Facebook open sources Hive
  • 4. 4 • Facebook – Multiple production clusters (100s of nodes total) - Massive 300PB Hadoop data warehouse - Very large sharded MySQL installation - Growing usage of Raptor SSD-based storage – 1000s of internal daily active users – 10-100s of concurrent queries • Netflix – Over 200-node production cluster on EC2 – Over 25 PB in S3 (Parquet format) – Over 350 active users and 3K queries daily Presto in Production
  • 5. 5 Presto Architecture Data stream API Worker Data stream API Worker Coordinator Metadata API Parser/ analyzer Planner Scheduler Worker Client Data location API Pluggable
  • 6. 6 Presto Extensibility – connectors Parser/ analyzer Planner Worker Data location API HDFS/S3 NoSQL DBMS Custom … Metadata API HDFS/S3 NoSQL DBMS Custom … Data stream API HDFS/S3 NoSQL DBMS Custom … Scheduler Coordinator
  • 7. 7 • Hadoop/Hive connector & file formats: – HDFS & S3 + HCatalog – ORC, RCFile, Parquet, SequenceFile, Text • Open source data stores: – MySQL & PostgreSQL (non-parallel) – Cassandra – Kafka – Redis • In development by community: – MongoDB – ElasticSearch – HBase Supported data sources & file formats
  • 8. 8 • In-memory processing • Pipelined execution across nodes MPP-style • Vectorized columnar processing • Multithreaded execution keeps all CPU cores busy • Presto is written in highly tuned Java – Efficient flat-memory data structures (minimizes GC) – Very careful coding of inner loops – Runtime bytecode generation • Optimized ORC & Parquet readers • Excellent performance with interactive SQL analytics Presto – Query Execution Performance
  • 9. 9 [ WITH with_query [, ...] ] SELECT [ ALL | DISTINCT ] select_expr [, ...] [ FROM table1 [[ INNER | OUTER ] JOIN table2 ON (…)] [ WHERE condition ] [ GROUP BY expression [, ...] ] [ HAVING condition] [ UNION [ ALL | DISTINCT ] select ] [ ORDER BY expression [ ASC | DESC ] [, ...] ] [ LIMIT [ count | ALL ] ] In addition: • Windowing functions • Statistical and approximate aggregate functions • UNNEST, TABLESAMPLE In development: • Complex subqueries • EXISTS, INTERSECT, EXCEPT • ROLLUP, CUBE ANSI SQL Support
  • 10. 10 • Cluster deployment models for Presto: – on premise (appliance or commodity clusters) – VM (OpenStack, etc.) – cloud (Amazon, etc) • Types of Hadoop deployments: – on Hadoop/YARN cluster (all or subset of nodes) – on a dedicated cluster – mixed Deployment models
  • 11. 11 Open source initiative • Announced in June 2015 at Hadoop Summit – Growing interest and adoption • Collaboration with Facebook and Presto community – Joint development, conference talks, meetups and webinars • Major commitment from Teradata Labs: – 20 full-time engineers – Free and open source contributions – Enterprise-ready distribution "A special shout out goes to Teradata — which joined the Presto community this year with a focus on enhancing enterprise features and providing support — for having seven of our top 10 external contributors." — Facebook
  • 12. 12 Implement Integrate Proliferate • Installer • Documentation • Monitoring & Support Tools • Management Tool Integration • YARN Integration • ODBC Driver • JDBC Driver • BI Certification • Security • Cloud features Commercial Support Phase 1 Phase 2 Phase 3 June 8, 2015 Q4 2015 2016 Expanding ANSI SQL Coverage Teradata Contributions to Presto
  • 13. 13 Recent developments and roadmap • Q1 release: – Fully-featured ODBC & JDBC drivers – Kerberos support – DECIMAL support • Later 2016: – BI tools certification – TPC-H and TPC-DS unmodified – Spill to disk
  • 15. 15 Presto Connectors Teradata Certified Community Supported Teradata QueryGrid™ - Multi-System Analytics Targets Entry Points TERADATA DATABASE ASTER ANALYTICS PRESTO HADOOP HIVE / HDFS HADOOP OTHER DATABASE S NOSQL DATABASE S TERADATA DATABASE ASTER ANALYTICS PRESTO HADOOP Non-Relational DBsMulti-Genre Advanced Analytics™ Integrated Data Warehouses 3rd Party Relational DBs Multiple Hadoop SQL Query Engines and Distributions APACHE KAFKA APACHE CASSANDRA MYSQL POSTGRESQL PRESTO APIREDIS
  • 16. 16 Certified Distro: www.teradata.com/presto Website: www.prestodb.io Presto Users Group: www.groups.google.com/group/presto-users GitHub: www.github.com/prestodb/presto www.github.com/Teradata/presto www.github.com/prestodb More information