Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch and More!)

Kamil Bajda-Pawlikowski
Co-founder/CTO @ Starburst

Agenda
▪ Presto & Starburst
▪ Delta Lake Integration
▪ Data Platform Architecture
▪ Use Cases

What is Presto?
High-performance MPP SQL engine
• Interactive ANSI SQL queries
• Proven scalability
• High concurrency
Separation of compute & storage
• Scale storage & compute independently
• SQL-on-anything
• Federated queries
Community-driven open source project
Deploy anywhere
• Kubernetes
• Cloud
• On-premises

Presto Users
Facebook: 10,000+ nodes, 1,000s of users
Uber: 2,000+ nodes, 160K+ queries daily
LinkedIn: 500+ nodes, 200K+ queries daily
Lyft: 400+ nodes, 100K+ queries daily

Starburst
Our Platform
▪ ANSI SQL MPP query engine
▪ High concurrency
▪ Massive scale
▪ Enterprise-grade security
▪ On-prem or cloud
▪ Rapid time to insights
▪ Low cost of ownership
▪ 24x7 expert support

▪ Named Open Source Startup to Watch 2020
▪ 600% growth YoY
▪ 100+ enterprise customers
▪ NPS score 80+

Starburst Enterprise Presto
Performance
▪ From petabytes to exabytes: query data from disparate sources using SQL, with high concurrency
▪ Control your price/performance with the latest cost-based optimizer
▪ Caching available for frequently accessed data
Connectivity
▪ 30+ supported enterprise connectors
▪ High-performance parallel connectors for Oracle, Teradata, Snowflake and more
Security
▪ Kerberos & LDAP integration
▪ Global Security for fine-grained access control
▪ Data encryption
▪ Data masking
▪ Query auditing
Management
▪ Configuration
▪ Autoscaling
▪ High availability
▪ Monitoring
▪ Deploy anywhere
Support
▪ The largest team of Presto experts in the world
▪ Fully tested, stable releases, curated by the Presto creators
▪ Hot fixes & security patches
▪ 24x7, 365 days a year support: we've got your back

Starburst Customers
▪ Tech
▪ Retail
▪ Media & Telco
▪ Finance & Insurance
▪ Healthcare & Pharma
▪ Other

Why Delta Lake?
▪ ACID properties over data lake
▪ Open source table format
▪ Stored as Parquet files
▪ Object storage support
▪ Schema evolution
▪ Time travel feature
▪ Metadata & statistics
▪ Data skipping & z-ordering
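Several items in this list (ACID properties, time travel, metadata) rest on the Delta transaction log. A table's `_delta_log/` directory holds zero-padded, numbered JSON commit files. Each file contains newline-delimited actions such as `add` and `remove`, and replaying them in version order yields the set of live Parquet files. The sketch below is a simplification and makes several assumptions. The in-memory `COMMITS` dictionary and the `active_files` helper are hypothetical. Real `add` actions carry more fields, and real readers also load Parquet checkpoints instead of replaying every commit. Under those assumptions, it shows why time travel is cheap: an older snapshot is just a shorter prefix of the log.

```python
import json

# Hypothetical stand-in for _delta_log/: commit version -> newline-delimited
# JSON actions. Real "add" actions also carry size, partitionValues,
# modificationTime, dataChange and stats, which are omitted here.
COMMITS = {
    0: '{"add": {"path": "part-000.parquet"}}\n'
       '{"add": {"path": "part-001.parquet"}}',
    1: '{"add": {"path": "part-002.parquet"}}',
    2: '{"remove": {"path": "part-000.parquet"}}\n'
       '{"add": {"path": "part-003.parquet"}}',
}


def active_files(commits, as_of_version=None):
    """Replay add/remove actions in version order and return the live file set.

    Stopping at as_of_version reconstructs an older snapshot (time travel).
    """
    live = set()
    for version in sorted(commits):
        if as_of_version is not None and version > as_of_version:
            break
        for line in commits[version].splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


latest = active_files(COMMITS)                    # snapshot after commit 2
as_of_v1 = active_files(COMMITS, as_of_version=1)  # snapshot after commit 1
```

In this toy log, `part-000.parquet` disappears from the latest snapshot but is still visible as of version 1. Because a new version only appears once its commit file exists, readers never observe half of a commit.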

Native Presto Delta Lake Reader
Supports data skipping & dynamic filtering
Optimizes queries using file statistics
Supports reading the Delta transaction log
Native connector written from scratch
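Data skipping and dynamic filtering both come down to deciding, per data file, whether it can possibly contain matching rows. In Delta Lake, each `add` action can carry a JSON-encoded `stats` string with `minValues` and `maxValues` per column. The sketch below illustrates the general idea only and is not the Starburst connector's implementation. `DataFile`, `can_skip`, `prune` and `dynamic_filter_range` are hypothetical names. The dynamic filter is reduced to a single min/max range collected from the join's build side, although a real engine may also use sets of values.

```python
import json
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    stats: str  # JSON string shaped like the "stats" field of a Delta add action


def can_skip(file, column, lo, hi):
    """True if the file's [min, max] for `column` cannot overlap [lo, hi]."""
    stats = json.loads(file.stats)
    col_min = stats.get("minValues", {}).get(column)
    col_max = stats.get("maxValues", {}).get(column)
    if col_min is None or col_max is None:
        return False  # no statistics for this column: the file must be read
    return col_max < lo or col_min > hi


def prune(files, column, lo, hi):
    """Return only the paths of files that may contain matching rows."""
    return [f.path for f in files if not can_skip(f, column, lo, hi)]


def dynamic_filter_range(build_side_keys):
    """Collect the join keys seen on the build side into a range predicate
    that is then applied to the probe side's files."""
    return min(build_side_keys), max(build_side_keys)


FILES = [
    DataFile("a.parquet", json.dumps({"numRecords": 100,
                                      "minValues": {"customer_id": 1},
                                      "maxValues": {"customer_id": 99}})),
    DataFile("b.parquet", json.dumps({"numRecords": 100,
                                      "minValues": {"customer_id": 100},
                                      "maxValues": {"customer_id": 199}})),
    DataFile("c.parquet", json.dumps({"numRecords": 100})),  # no column stats
]

lo, hi = dynamic_filter_range([120, 150, 130])
files_to_read = prune(FILES, "customer_id", lo, hi)
```

Here the build side only produced keys 120 to 150, so `a.parquet` (max 99) is skipped without being opened. `c.parquet` must still be read because it has no statistics to rule it out.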

Native Delta Lake Reader Performance
Standard TPC-H benchmark:
▪ 2x average speedup across 22 queries
▪ 6x best query speedup
Feedback from customers:
▪ “What we have here is game changing for our industry. Especially now that the native Delta reader works as fast as it does. We have people lining up to now use this data”
▪ “We have queries that were running in 10 minutes that are now running in 47 seconds”
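For scale, the second customer quote implies the following reduction in wall-clock time. This is simple arithmetic on the quoted figures. It describes one customer workload and is not directly comparable to the TPC-H averages above.

```latex
\frac{10\ \text{min}}{47\ \text{s}} = \frac{600\ \text{s}}{47\ \text{s}} \approx 12.8\times
```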

Starburst Platform
Data Scientists, Data Analysts, Finance, Marketers
The Data Consumption Layer
Existing analytics tools
▪ Data Masking
▪ Global Security
▪ Column- and row-level permissions
▪ Query Auditing
▪ Fine-grained access control
▪ Data Encryption
Data Lakes, Relational Databases, NoSQL Stores, Publish/Subscribe
Azure Event Hub

Different SQL Technologies in Your Toolbelt
▪ Streaming ingestion
▪ Machine learning
▪ Data investigation
▪ Large batch jobs

▪ Fast federated queries
▪ High-concurrency SQL engine
▪ High-performance ad hoc reporting/analytics
▪ Optionality

Cloud Data Warehouse
▪ Rapid ad hoc reporting/analytics
▪ Fast, but everything must live in Snowflake (ETL/ELT is required)
▪ Vendor and data lock-in

Data Flow Diagram
Using a combination of Databricks and Starburst Presto to
bring a full data ingestion and analytical environment to life

Data Ingestion and Transformation
● Real-time ingestion of event data into Delta tables
● Customer and inventory data ingested every hour
● Modified customer information merged into a Delta Lake table
● Data marts created using streaming and batch data
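Merging modified customer information into a Delta table is an upsert. Rows whose key already exists are updated, and new keys are inserted. In Delta Lake this is typically expressed with `MERGE INTO ... WHEN MATCHED THEN UPDATE ... WHEN NOT MATCHED THEN INSERT`, which commits the result as a single transaction. The sketch below shows only those row-level semantics on plain Python dictionaries, assuming a single key column. The `merge_upsert` function and the sample rows are hypothetical and say nothing about how Delta rewrites files on storage.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_upsert(target: List[Row], updates: List[Row], key: str) -> List[Row]:
    """Update rows whose key matches, insert the rest; the input lists are not modified."""
    by_key = {row[key]: dict(row) for row in target}
    for row in updates:
        if row[key] in by_key:
            by_key[row[key]].update(row)  # matched: overwrite the changed columns
        else:
            by_key[row[key]] = dict(row)  # not matched: insert as a new row
    return list(by_key.values())


# Hypothetical hourly batch: customer 1 changed e-mail, customer 3 is new.
customers = [
    {"id": 1, "email": "a@old.example", "region": "US"},
    {"id": 2, "email": "b@example.com", "region": "EU"},
]
changes = [
    {"id": 1, "email": "a@new.example"},
    {"id": 3, "email": "c@example.com", "region": "US"},
]
merged = merge_upsert(customers, changes, key="id")
```

Because Python dictionaries keep insertion order, updated rows stay in place and new rows are appended at the end.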

Query-time Data Federation
● Single point of access to numerous data sources
● Query Delta Lake and federate with legacy databases as well as many NoSQL data stores
● Enforce table-, column- and row-level policies to ensure maximum data security
● Mask column data for different groups and users
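Row-level policies and column masking are applied at query time. Each group sees only the rows its filter admits, and masked columns are rewritten before results leave the engine. The following is a conceptual sketch, not Starburst's access-control configuration or API. The groups (`us_analysts`, `admins`), the policy tables and the `mask_email` rule are all hypothetical. Unknown groups are denied by default.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def mask_email(value: str) -> str:
    """Keep the first character of the user name and the domain."""
    user, _, domain = value.partition("@")
    return user[:1] + "***@" + domain


# Hypothetical policies: a row filter and a set of column masks per group.
ROW_FILTERS: Dict[str, Callable[[Row], bool]] = {
    "us_analysts": lambda row: row["region"] == "US",
    "admins": lambda row: True,
}
COLUMN_MASKS: Dict[str, Dict[str, Callable[[str], str]]] = {
    "us_analysts": {"email": mask_email},
    "admins": {},
}


def apply_policies(rows: List[Row], group: str) -> List[Row]:
    """Filter rows, then mask columns, according to the caller's group."""
    row_filter = ROW_FILTERS.get(group, lambda row: False)  # deny by default
    masks = COLUMN_MASKS.get(group, {})
    result = []
    for row in rows:
        if not row_filter(row):
            continue
        out = dict(row)
        for column, mask in masks.items():
            if column in out:
                out[column] = mask(str(out[column]))
        result.append(out)
    return result


rows = [
    {"id": 1, "region": "US", "email": "alice@example.com"},
    {"id": 2, "region": "EU", "email": "bob@example.com"},
]
analyst_view = apply_policies(rows, "us_analysts")
```

Keeping filters and masks in separate tables mirrors the distinction on the slide between row-level policies and column masking.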

Data Consumption & Analytics
BI Reporting Tools and SQL Query Tools
• Connect using a variety of BI and SQL tools, including Looker, Tableau, Power BI and DBeaver
• JDBC, ODBC and many client libraries, including Python, R and Java
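```sql
-- One query spanning three catalogs: Delta Lake, Snowflake and Elasticsearch
SELECT c.id, COUNT(*), SUM(active_seconds)
FROM delta.iot.events e                    -- event data in Delta Lake
JOIN snowflake.sales.customer c            -- customer data in Snowflake
  ON (e.customer_id = c.id)
WHERE e.event_date >= current_date
  AND c.region = 'US'
  AND c.id IN (
    SELECT l.customer_id                   -- web logs in Elasticsearch
    FROM elastic.web.logs l
    WHERE l.visit_date >= date '2020-01-01')
GROUP BY c.id;
```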
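To check what the federated query returns, its logic can be reproduced locally, with three attached in-memory SQLite databases standing in for the three Presto catalogs. This only mirrors the query's semantics. It does not reproduce Presto's distributed execution or connector pushdown. Several things here are assumptions. SQLite supports only `schema.table` names, so `delta.iot.events` becomes `delta.events`. `current_date` is replaced by a bound `:today` parameter so the result is deterministic. The table contents are made up.

```python
import sqlite3

# Three attached in-memory databases stand in for the Presto catalogs.
conn = sqlite3.connect(":memory:")
for catalog in ("delta", "snowflake", "elastic"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {catalog}")

conn.executescript("""
CREATE TABLE delta.events (customer_id INTEGER, event_date TEXT, active_seconds INTEGER);
CREATE TABLE snowflake.customer (id INTEGER, region TEXT);
CREATE TABLE elastic.logs (customer_id INTEGER, visit_date TEXT);

INSERT INTO delta.events VALUES
  (1, '2020-06-01', 30), (1, '2020-06-01', 45),
  (2, '2020-06-01', 10), (3, '2020-05-01', 99);
INSERT INTO snowflake.customer VALUES (1, 'US'), (2, 'US'), (3, 'US'), (4, 'EU');
INSERT INTO elastic.logs VALUES (1, '2020-02-01'), (3, '2020-03-01'), (2, '2019-12-31');
""")

# Same shape as the Presto query; ISO date strings compare correctly as text.
QUERY = """
SELECT c.id, COUNT(*), SUM(e.active_seconds)
FROM delta.events e
JOIN snowflake.customer c ON (e.customer_id = c.id)
WHERE e.event_date >= :today
  AND c.region = 'US'
  AND c.id IN (SELECT l.customer_id
               FROM elastic.logs l
               WHERE l.visit_date >= '2020-01-01')
GROUP BY c.id
ORDER BY c.id
"""

rows = conn.execute(QUERY, {"today": "2020-06-01"}).fetchall()
```

With this data, customer 2 has a recent event but no 2020 web visit, and customer 3 has a visit but no event on or after the given date. Only customer 1 survives, with two events totalling 75 active seconds.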

Thank You!
Try Presto with Delta:
www.starburstdata.com/delta-lake-reader
