Query Anything - Data Engineer’s perspective
Kamil Bajda-Pawlikowski
Co-founder / CTO
@prestosql @starburstdata
Data Orchestration Summit
Nov 2019 @ Mountain View
Martin Traverso
Creator of Presto
Why Presto?
Community-driven open source project
High performance ANSI SQL engine
• Cost-Based Query Optimizer
• Proven scalability
• High concurrency
Separation of compute and storage
• Scale storage and compute independently
• No ETL or data integration necessary to get to insights
• SQL-on-anything
No vendor lock-in
• No Hadoop distro vendor lock-in
• No storage engine vendor lock-in
• No cloud vendor lock-in
Built for Performance
● MPP-style pipelined in-memory execution
● Multi-threaded multi-core execution
● Columnar and vectorized data processing
● Runtime query bytecode compilation
● Memory efficient data structures
● Optimized readers for columnar formats (ORC and Parquet)
● Predicate and column projection pushdown
● Cost-Based Optimizer
Presto: SQL-on-Anything
Deploy Anywhere, Query Anything
Example - Join multiple sources
SELECT
country,
approx_percentile(date_diff('year', birthdate, now()), array[0.25, 0.5, 0.75])
FROM
elasticsearch.default."movies: overview:space~ +fiction" movies
JOIN hive.default.views USING (movie_id)
JOIN mysql.default.users USING (user_id)
GROUP BY ROLLUP(country)
Per-country age distribution of people who watched space fiction movies
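To make the shape of that result concrete, here is a minimal Python sketch of the same logic over three small in-memory lists. Everything in it is an illustrative assumption: the sample rows, the function names, and the fixed `today` date are invented, IDs are assumed unique within `movies` and `users`, and the percentile is an exact nearest-rank calculation rather than Presto's approximate one. The `None` key plays the role of the grand-total row that `ROLLUP(country)` adds.

```python
import math
from collections import defaultdict
from datetime import date

# Hypothetical in-memory stand-ins for the three sources; all rows are made up.
space_fiction_movies = [{"movie_id": 1}, {"movie_id": 2}]  # Elasticsearch matches
views = [  # Hive: one row per viewing
    {"movie_id": 1, "user_id": 10},
    {"movie_id": 1, "user_id": 11},
    {"movie_id": 2, "user_id": 12},
    {"movie_id": 3, "user_id": 13},  # movie 3 is not space fiction
]
users = [  # MySQL
    {"user_id": 10, "country": "PL", "birthdate": date(1989, 11, 1)},
    {"user_id": 11, "country": "PL", "birthdate": date(1999, 12, 15)},
    {"user_id": 12, "country": "US", "birthdate": date(1979, 6, 30)},
    {"user_id": 13, "country": "US", "birthdate": date(2000, 1, 1)},
]


def age_in_years(birthdate, today):
    """Whole years elapsed between birthdate and today."""
    years = today.year - birthdate.year
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def percentile(sorted_values, p):
    """Exact nearest-rank percentile (an assumption; Presto's is approximate)."""
    rank = max(1, math.ceil(p * len(sorted_values)))
    return sorted_values[rank - 1]


def age_quartiles_by_country(movies, views, users, today):
    """Inner-join views to movies and users, then compute quartiles per country."""
    movie_ids = {m["movie_id"] for m in movies}
    users_by_id = {u["user_id"]: u for u in users}
    ages = defaultdict(list)
    for view in views:
        user = users_by_id.get(view["user_id"])
        if view["movie_id"] in movie_ids and user is not None:
            age = age_in_years(user["birthdate"], today)
            ages[user["country"]].append(age)  # per-country group
            ages[None].append(age)  # grand-total row, as ROLLUP would add
    return {
        country: [percentile(sorted(values), p) for p in (0.25, 0.5, 0.75)]
        for country, values in ages.items()
    }


result = age_quartiles_by_country(
    space_fiction_movies, views, users, today=date(2019, 11, 1)
)
```

In Presto the join runs inside the engine across the three connectors; the sketch only mirrors the grouping and percentile logic.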
Example - Join historical with recent data
CREATE VIEW visits AS
TABLE hive.visits_historical
UNION ALL
TABLE mysql.visits_recent;

SELECT city, count(*) total
FROM visits
GROUP BY city
ORDER BY total DESC;
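The same idea can be sketched in a few lines of Python, assuming two hypothetical in-memory lists stand in for the Hive and MySQL tables. The `visits()` function behaves like the view: it stores nothing and re-reads both sources on each call, keeping duplicates as `UNION ALL` does. All names and rows here are invented for illustration.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-ins for hive.visits_historical and mysql.visits_recent.
visits_historical = [{"city": "Warsaw"}, {"city": "Boston"}, {"city": "Warsaw"}]
visits_recent = [{"city": "Boston"}, {"city": "Warsaw"}, {"city": "Mountain View"}]


def visits():
    """View-like accessor: concatenates both sources, duplicates kept."""
    return chain(visits_historical, visits_recent)


def totals_by_city():
    """Count rows per city, highest total first."""
    counts = Counter(row["city"] for row in visits())
    return counts.most_common()


totals = totals_by_city()
```

Because the view is evaluated at query time, rows that move from the recent store to the historical one are still counted exactly once, provided the move itself does not duplicate them.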
Community
See more at our Wiki
Presto Software Foundation
“An independent, non-profit organization with the mission of supporting a community of passionate users and developers devoted to the advancement of the Presto distributed SQL query engine for big data.”
“It is dedicated to preserving the vision of high quality, performant, and dependable software.”
“Ensuring the project remains open, collaborative and independent for decades to come.”
Presto Community
● GitHub: https://github.com/prestosql
● Website: https://prestosql.io
● Blog: https://prestosql.io/blog
● Twitter: @prestosql
● Slack: https://prestosql.io/slack.html
○ #troubleshooting channel
○ #dev channel
Recent Improvements (last ~10 months)
● FETCH FIRST … WITH TIES syntax
● OFFSET syntax
● COMMENT ON <table> IS …
● [LEFT/RIGHT/FULL] JOIN LATERAL (…) ON
● IGNORE NULLS for window functions
● .* for ROW expressions
● Pass-through security (client-provided credentials)
● Impersonation for Hive Metastore
● Kerberos security improvements
● Support for Hadoop KMS
● Role-based security
● Secure query results in client API
● Current user security mode for views
● Support for Azure Data Lake
● Hive Bucketing V2
● Docker image
● Spill-to-disk improvements
● CLI output formats
● Syntax highlighting in CLI
● UUID type and functions
● format(), combinations() functions
● ORC bloom filters (non-legacy)
● Connector-provided view definitions
● Elasticsearch Connector
● Google Sheets Connector
● Amazon Kinesis Connector
● Apache Phoenix Connector
● LZ4/ZSTD support for ORC/Parquet
● More type mappings for various connectors
● Performance improvements for GCS and S3
● Performance improvements for UNNEST
… and more! https://prestosql.io/docs/current/release.htm
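As an illustration of the first item, FETCH FIRST … WITH TIES returns the first n rows of the ordered result plus any further rows whose ordering key equals that of the nth row. A minimal Python sketch of that semantics follows; the function name and sample data are hypothetical and not part of Presto.

```python
def fetch_first_with_ties(rows, n, key, reverse=False):
    """Return the first n rows by key, plus rows tied with the nth row."""
    ordered = sorted(rows, key=key, reverse=reverse)
    if n <= 0:
        return []
    if n >= len(ordered):
        return ordered
    cutoff = key(ordered[n - 1])
    result = ordered[:n]
    # Keep extending while the following rows tie with the last kept row.
    for row in ordered[n:]:
        if key(row) != cutoff:
            break
        result.append(row)
    return result


# Made-up (name, score) rows; "b" and "c" tie for second place.
scores = [("a", 90), ("b", 85), ("c", 85), ("d", 70)]
top_two = fetch_first_with_ties(scores, 2, key=lambda r: r[1], reverse=True)
```

A plain row limit would cut the result at two rows and drop one of the tied entries arbitrarily; the WITH TIES form keeps both.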
Starburst
© 2019
Enterprise edition
Founded by Presto committers:
● Many years of contributions to Presto
● Presto distribution for on-prem and cloud environments
● Supporting large customers in production
● Enterprise subscription add-ons (ODBC, Ranger, Sentry, Oracle, Teradata, K8S)
Notable features contributed:
● ANSI SQL syntax enhancements
● Execution engine improvements
● Security integrations
● Spill to disk
● Cost-Based Optimizer
https://www.starburstdata.com/presto-enterprise/
Starburst: SQL on Anything, Anywhere
Data Orchestration with caching, even with remote data
A dozen more orchestrated cloud data sources
Available Soon: Starburst Presto + Alluxio on
▪ AWS AMI pre-configured to speed up Presto queries using Alluxio caching
▪ Start in minutes: AWS CloudFormation Template to create a Presto Alluxio cluster
▪ Seamless Hive Metastore / AWS Glue integration, no location / path changes needed
▪ Tutorial: https://www.alluxio.io/products/aws/starburst-alluxio-cft-tutorial/
Administrative challenges
● Configuring and managing clusters
● Autotuning properties based on the hardware provisioned
● High Availability for Presto Coordinator
● Scaling cluster elastically based on query load
● Gracefully decommissioning Presto Workers to avoid killing queries
● Monitoring of hardware and software layers
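On graceful decommissioning: Presto documents a graceful shutdown for workers, triggered by a PUT of the JSON string "SHUTTING_DOWN" to the worker's /v1/info/state endpoint, after which the worker stops accepting new tasks and exits once its running tasks finish. The Python sketch below only builds such a request and does not send it. The host name and helper function are hypothetical, depending on the Presto version and security setup extra headers or credentials may be required, and the slides do not say whether the tooling discussed here uses this endpoint.

```python
import json
from urllib.request import Request


def build_graceful_shutdown_request(worker_base_url):
    """Build (but do not send) a graceful-shutdown request for one worker."""
    # The body is a JSON string literal, so the quotes are part of the payload.
    body = json.dumps("SHUTTING_DOWN").encode("utf-8")
    return Request(
        url=worker_base_url.rstrip("/") + "/v1/info/state",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical worker address, for illustration only.
request = build_graceful_shutdown_request("http://worker-0.example.internal:8080")
```

Sending it would be a call to `urllib.request.urlopen(request)` against a real worker; an orchestrator would then wait for the process to exit before removing the node.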
https://www.starburstdata.com/technical-blog/presto-on-kubernetes/
https://docs.starburstdata.com/latest/kubernetes.html
Presto on Kubernetes (K8S)
[Diagram components: Presto Coordinator Pod; Presto Worker Pods; Horizontal Pod Autoscaler (HPA); Presto Operator (K8s Operator); Presto Service; Hive Metastore Service Pod; Hadoop / Hive; RDBMS]
● RedHat OpenShift
● Google (GKE)
● Azure (AKS)
● Amazon (EKS)
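For a sense of how the Horizontal Pod Autoscaler could size the worker pool, the Kubernetes documentation gives its core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), bounded by the configured minimum and maximum. A minimal Python sketch of that rule follows. The function name, the choice of metric (for example average CPU utilization), and the numbers are assumptions; the slides do not state which metric is used, and the real HPA also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_workers(current_workers, current_metric, target_metric,
                    min_workers, max_workers):
    """Replica count from the basic HPA rule, clamped to the allowed range."""
    desired = math.ceil(current_workers * current_metric / target_metric)
    return max(min_workers, min(max_workers, desired))


# Hypothetical readings: 3 workers at 90% average CPU against a 60% target.
scale_up = desired_workers(3, 90, 60, min_workers=1, max_workers=10)
# 4 workers at 30% against the same target scale down.
scale_down = desired_workers(4, 30, 60, min_workers=2, max_workers=10)
```

Scaling down is where graceful worker decommissioning matters: removing a worker abruptly would fail the queries that still have tasks running on it.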
https://www.starburstdata.com/2019-nyc-presto-summit/
Thank You!
Twitter: @starburstdata @prestosql
Blog: www.starburstdata.com/technical-blog/
Newsletter: www.starburstdata.com/newsletter
