Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
Taro L. Saito, Dongmin Yu
Arm Treasure Data
Presto Conference Tokyo 2019
June 11th, 2019
Reading Source Code of Presto
About Me: Taro L. Saito (Leo)
● Principal Software Engineer at Arm
Treasure Data
● Building distributed query engine service
● Living in the US for 4 years
● DBMS & Data Science Background
● Ph.D. in Computer Science
● OSS Projects around DBMS
● snappy-java: a compression library used
in Spark, Parquet, etc.
● sqlite-jdbc
● msgpack-java
■ MsgPack implementation for
Java
New Release from O’Reilly Japan
● “Designing Data-Intensive Applications”
● By Martin Kleppmann
● Techniques and concepts around distributed
data processing systems
● A Japanese translation will be available on July 18, 2019
● Pre-order at:
■ Amazon.co.jp
■ O’Reilly Japan
The Japanese translation of this definitive introduction to distributed data systems goes on sale next month
Today’s Goals
● Learn How To Start Reading Presto’s Source Code
● GitHub
■ prestosql: https://github.com/prestosql/presto
● Note: prestodb is an old repo maintained by Facebook
● Find Your Own Interests And Learn Where To Look:
● SQL on Everything
■ Using Presto as an SQL interface to your own data sources (connectors)
● Query Engine Core
■ Learn how to implement query engines
● Distributed Systems
■ Learn how to implement HTTP-based distributed systems
● Using Presto
■ presto clients, Presto’s REST protocol
● Extending Presto
■ e.g., Adding new UDFs
Presto: SQL On Everything
● ICDE 2019 Paper
● Architecture overview and the details of the system design
Navigating Code
Setting Up IntelliJ IDEA
● Learn Useful Shortcuts
● Source Code Navigation
● Shift × 2 (press Shift twice)
■ Search everything
● Go to declaration
■ Ctrl + Click
● Quick definition
■ Ctrl + Shift + I
● Find Usages of functions and classes
● Type Hierarchies
■ Ctrl + H
● Bookmarks
Type Hierarchy (Ctrl + H)
Bookmarks
Connector: SQL on Everything
● Presto Connectors (plug-ins)
● Enable processing SQL queries for
various data sources
● Implement presto-spi interfaces
● Connector interface
● presto-hive
● A full-fledged connector using
almost all SPI features
● Difficult to understand for beginners
● presto-base-jdbc
● Relatively easier connector to read
● Base of various DBMS adapters
■ presto-postgresql,
presto-mysql, etc.
presto-base-jdbc connector
Google Guice: Dependency Injection Library
● xxxModule classes define the bindings that are injected into constructors annotated with @Inject
Presto Coordinator Module
● You can learn what
classes are used for the
coordinator
Reading Data From Data Sources
● Record/Page based readers
● RecordCursor interface
● isNull
● getType(field)
● getXXX(field)
● Mapping to Presto Data Types
● boolean
● long
● double
● Slice (utf8 string)
● Object
■ array, map, etc.
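
The cursor contract listed above can be sketched in a few lines of plain Java. This is a minimal illustration, not the SPI itself: `SimpleCursor`, `InMemoryCursor`, and `CursorDemo` are hypothetical names, `getString` stands in for the Slice-returning getter, and the real `RecordCursor` interface has more methods than are shown here.

```java
import java.util.List;

// Hypothetical, simplified stand-in for the SPI's RecordCursor.
// The real interface has more methods and works with Presto Type and Slice objects.
interface SimpleCursor {
    boolean advanceNextPosition();   // move to the next row; false when exhausted
    boolean isNull(int field);
    long getLong(int field);
    double getDouble(int field);
    String getString(int field);     // stands in for the Slice (UTF-8 string) getter
}

// Cursor over rows held in memory, one Object[] per row.
class InMemoryCursor implements SimpleCursor {
    private final List<Object[]> rows;
    private int position = -1;

    InMemoryCursor(List<Object[]> rows) {
        this.rows = rows;
    }

    @Override
    public boolean advanceNextPosition() {
        position++;
        return position < rows.size();
    }

    @Override
    public boolean isNull(int field) {
        return rows.get(position)[field] == null;
    }

    @Override
    public long getLong(int field) {
        return ((Number) rows.get(position)[field]).longValue();
    }

    @Override
    public double getDouble(int field) {
        return ((Number) rows.get(position)[field]).doubleValue();
    }

    @Override
    public String getString(int field) {
        return (String) rows.get(position)[field];
    }
}

class CursorDemo {
    // Sums field 0 over all rows, skipping nulls, the way a caller would drain a cursor.
    static long sumFirstField(SimpleCursor cursor) {
        long total = 0;
        while (cursor.advanceNextPosition()) {
            if (!cursor.isNull(0)) {
                total += cursor.getLong(0);
            }
        }
        return total;
    }
}
```

The caller checks `isNull` before calling a typed getter, since primitive getters such as `getLong` cannot represent a missing value on their own.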
Example
● JdbcRecordCursor
● Steps
● Connect to JDBC
● Prepare Column Readers
● Build SQL to run with JDBC
● Read JDBC ResultSets
TupleDomain
● Build SELECT statements for
JDBC queries
● Presto provides:
● Projection
■ columns to select
● TupleDomain
● ColumnDomain
■ predicates
○ ==
○ <, <=, >=, >
○ in (....)
○ null / not null
○ all
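
How a projection plus per-column predicates turn into a SELECT statement can be sketched as follows. This is an assumption-laden illustration: `ColumnCondition` and `SelectBuilder` are hypothetical classes, each column carries a single textual condition rather than a real domain of ranges, and values are inlined as literals for brevity where production code would bind them as prepared-statement parameters and quote identifiers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, much-simplified stand-in for a per-column domain.
// Each column gets one textual condition; null means "all" (no restriction).
class ColumnCondition {
    final String sqlFragment;

    private ColumnCondition(String sqlFragment) {
        this.sqlFragment = sqlFragment;
    }

    static ColumnCondition equalTo(long v) { return new ColumnCondition("= " + v); }
    static ColumnCondition greaterThan(long v) { return new ColumnCondition("> " + v); }
    static ColumnCondition isNull() { return new ColumnCondition("IS NULL"); }
    static ColumnCondition all() { return new ColumnCondition(null); }

    static ColumnCondition in(long... values) {
        StringBuilder sb = new StringBuilder("IN (");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(values[i]);
        }
        return new ColumnCondition(sb.append(")").toString());
    }
}

class SelectBuilder {
    // Builds "SELECT <projection> FROM <table> [WHERE c1 AND c2 ...]".
    // Literals are inlined only to keep the sketch short.
    static String build(String table, List<String> projection, Map<String, ColumnCondition> constraint) {
        List<String> conjuncts = new ArrayList<>();
        for (Map.Entry<String, ColumnCondition> entry : constraint.entrySet()) {
            if (entry.getValue().sqlFragment != null) {
                conjuncts.add(entry.getKey() + " " + entry.getValue().sqlFragment);
            }
        }
        String sql = "SELECT " + String.join(", ", projection) + " FROM " + table;
        if (!conjuncts.isEmpty()) {
            sql += " WHERE " + String.join(" AND ", conjuncts);
        }
        return sql;
    }
}
```

Columns whose condition is "all" contribute nothing to the WHERE clause, so an unconstrained scan degenerates to a plain SELECT of the projected columns.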
Reading Column Data
● Convert External JDBC Results into Presto Column Data
Writing Column Data (PageSink)
● Page
● Presto’s internal in-memory data format
● Used for sending
intermediate query results
(table structure = relation)
● Page has multiple Blocks
■ columnar format
● Block
● column data of the same
type
● Values are addressed by position, starting at 0
● PageSink
● Receives Page
● appendPage(page)
● presto-base-jdbc
● Page -> insert into SQL
statements
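
The Page, Block, and PageSink relationship described above can be sketched with toy classes. Everything here is hypothetical (`SimpleBlock`, `SimplePage`, `InsertStatementSink`): real blocks are typed and compact, and a JDBC-backed sink would more plausibly batch rows through a prepared statement than build SQL strings. The sketch only shows the row-by-row walk over a columnar page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified Block: the values of one column, addressed by position.
class SimpleBlock {
    final Object[] values;

    SimpleBlock(Object... values) {
        this.values = values;
    }

    int getPositionCount() {
        return values.length;
    }
}

// Hypothetical simplified Page: one block per column (channel),
// all blocks holding the same number of positions.
class SimplePage {
    final SimpleBlock[] blocks;

    SimplePage(SimpleBlock... blocks) {
        this.blocks = blocks;
    }

    int getChannelCount() {
        return blocks.length;
    }

    int getPositionCount() {
        return blocks.length == 0 ? 0 : blocks[0].getPositionCount();
    }
}

// Sink that turns each row of an appended page into an INSERT statement string.
class InsertStatementSink {
    private final String table;
    final List<String> statements = new ArrayList<>();

    InsertStatementSink(String table) {
        this.table = table;
    }

    void appendPage(SimplePage page) {
        // Outer loop walks positions (rows); inner loop walks channels (columns).
        for (int position = 0; position < page.getPositionCount(); position++) {
            StringBuilder row = new StringBuilder();
            for (int channel = 0; channel < page.getChannelCount(); channel++) {
                if (channel > 0) {
                    row.append(", ");
                }
                Object value = page.blocks[channel].values[position];
                if (value == null) {
                    row.append("NULL");
                } else if (value instanceof String) {
                    row.append("'").append(value).append("'");
                } else {
                    row.append(value);
                }
            }
            statements.add("INSERT INTO " + table + " VALUES (" + row + ")");
        }
    }
}
```

Because the page is columnar, reassembling one row means reading the same position from every block, which is why the position loop is the outer one.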
Query Engine Core
Query Engine Core: Query Execution Flow
Query Engine Core: Parsing SQL
● ANTLR4 Grammar (SqlBase.g4)
● SQL-92 syntax
● Also used in Spark SQL
● SqlBaseLexer/Parser:
● Generated by ANTLR4
● SQL -> ANTLR parse tree
● SqlParser
● AstBuilder
■ Visitor pattern for ANTLR parse tree
■ Generates SQL tree for Presto: Statement
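
The visitor step above, which turns a parse tree into a separate syntax tree, can be sketched on a toy grammar. All types below are hypothetical and hand-written; ANTLR generates its own parse-tree and visitor classes, and Presto's tree is far richer than the two node kinds shown.

```java
// Parse-tree side: hand-written stand-ins for what a parser generator would emit.
interface ParseNode {
    <T> T accept(ParseVisitor<T> visitor);
}

interface ParseVisitor<T> {
    T visitNumber(NumberNode node);
    T visitAdd(AddNode node);
}

class NumberNode implements ParseNode {
    final String text;

    NumberNode(String text) {
        this.text = text;
    }

    @Override
    public <T> T accept(ParseVisitor<T> visitor) {
        return visitor.visitNumber(this);
    }
}

class AddNode implements ParseNode {
    final ParseNode left;
    final ParseNode right;

    AddNode(ParseNode left, ParseNode right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public <T> T accept(ParseVisitor<T> visitor) {
        return visitor.visitAdd(this);
    }
}

// AST side: the tree the rest of the engine would work with.
abstract class Expr {
}

class LongLiteral extends Expr {
    final long value;

    LongLiteral(long value) {
        this.value = value;
    }
}

class Add extends Expr {
    final Expr left;
    final Expr right;

    Add(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }
}

// Visitor that builds AST nodes bottom-up from parse-tree nodes.
class TinyAstBuilder implements ParseVisitor<Expr> {
    @Override
    public Expr visitNumber(NumberNode node) {
        return new LongLiteral(Long.parseLong(node.text));
    }

    @Override
    public Expr visitAdd(AddNode node) {
        return new Add(node.left.accept(this), node.right.accept(this));
    }
}
```

Keeping the syntax tree separate from the parse tree means later stages depend only on the tree classes, not on the grammar's generated types.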
Analyzer
● Traverse Statement
structure
● Resolve actual column
names and types in SQL
● Using Metadata (table
schema provider)
● e.g., find actual column
names accessed in
SELECT *
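
The `SELECT *` case above can be illustrated with a small resolver. This is a sketch under simplifying assumptions: `TableMetadata` and `SelectItemResolver` are hypothetical names, select items are plain strings, and only one table is in scope, whereas the real analyzer works on the Statement tree and also resolves types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a metadata provider: table name -> ordered column names.
class TableMetadata {
    private final Map<String, List<String>> columnsByTable = new HashMap<>();

    void addTable(String table, String... columns) {
        columnsByTable.put(table, Arrays.asList(columns));
    }

    List<String> getColumns(String table) {
        List<String> columns = columnsByTable.get(table);
        if (columns == null) {
            throw new IllegalArgumentException("Table does not exist: " + table);
        }
        return columns;
    }
}

class SelectItemResolver {
    // Replaces "*" with the table's columns and checks that named columns exist.
    static List<String> resolve(List<String> selectItems, String table, TableMetadata metadata) {
        List<String> tableColumns = metadata.getColumns(table);
        List<String> resolved = new ArrayList<>();
        for (String item : selectItems) {
            if (item.equals("*")) {
                resolved.addAll(tableColumns);
            } else if (tableColumns.contains(item)) {
                resolved.add(item);
            } else {
                throw new IllegalArgumentException("Column cannot be resolved: " + item);
            }
        }
        return resolved;
    }
}
```

Unknown tables and columns are rejected at this stage, before any plan is built.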
SqlQueryExecution
● Analyze
● Generates a logical
SQL plan (Plan)
● Apply logical plan
optimizers
● DistributedPlan
● Split query stages
into multiple tasks
● Assign worker nodes
to use
LocalExecutionPlanner
● Running at worker nodes
● Optimization
● Create a compiled operator (Java Byte Code)
● Example:
● Generates predicate/projection evaluation code during table scan
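
The idea of preparing predicate and projection evaluation once and reusing it for every row can be sketched with ordinary lambdas. This deliberately swaps in function composition for bytecode generation, so it shows only the shape of the technique; `FilterAndProject` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class FilterAndProject {
    // "Compiles" a filter and a projection once into a single row processor,
    // which is then reused for every batch of scanned rows.
    // Plain lambdas stand in for generated bytecode here.
    static <R, T> Function<List<R>, List<T>> compile(Predicate<R> filter, Function<R, T> projection) {
        return rows -> {
            List<T> out = new ArrayList<>();
            for (R row : rows) {
                if (filter.test(row)) {
                    out.add(projection.apply(row));
                }
            }
            return out;
        };
    }
}
```

Fusing the filter and the projection into one pass avoids materializing the filtered rows as an intermediate result.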
Further Reading: Anatomy of Presto
● By Dongmin Yu (Arm Treasure Data)
● https://www.slideshare.net/dongminyu/presto-anatomy
● How Presto generates bytecode for query processing
Using Presto
Using Presto: Presto REST Protocol (v1)
● POST /v1/statement
● body: SQL query text
● receive: QueryResults data with nextUri
● Headers
■ X-Presto-User, X-Presto-Schema,
X-Presto-Session, X-Presto-Client-Tags
● GET /v1/statement/(query_id)/(page token)
● nextUri, table data, query stage stats
● Keep reading until nextUri becomes null
● QueryResults model class
● Represented in JSON
■ Jackson JSON object mapper
● Error Handling
● Standard errors (e.g., SQL syntax errors)
■ 200: Error Response
■ 503: server slowdown; retry after 50-100 ms
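
The polling loop described above (follow nextUri until it becomes null, and treat an error carried in a 200 response as a failure) can be sketched without any HTTP code. `ResultPage` and `StatementClientSketch` are hypothetical; the `fetch` function is assumed to perform the GET, parse the JSON, and handle 503 retries on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, trimmed-down view of one QueryResults JSON document.
class ResultPage {
    final String nextUri;          // null on the last page
    final List<List<Object>> data; // rows in this page; may be null when no rows are ready
    final String error;            // non-null when the query failed

    ResultPage(String nextUri, List<List<Object>> data, String error) {
        this.nextUri = nextUri;
        this.data = data;
        this.error = error;
    }
}

class StatementClientSketch {
    // firstPage: the parsed response to POST /v1/statement.
    // fetch: performs GET on a nextUri and parses the body; transport, JSON
    // handling, and 503 retries are assumed to live inside it.
    static List<List<Object>> readAll(ResultPage firstPage, Function<String, ResultPage> fetch) {
        List<List<Object>> rows = new ArrayList<>();
        ResultPage page = firstPage;
        while (true) {
            if (page.error != null) {
                // Query-level errors arrive inside an otherwise successful response.
                throw new IllegalStateException("Query failed: " + page.error);
            }
            if (page.data != null) {
                rows.addAll(page.data);
            }
            if (page.nextUri == null) {
                return rows;
            }
            page = fetch.apply(page.nextUri);
        }
    }
}
```

Passing the fetch step in as a function keeps the loop testable with canned pages and leaves the choice of HTTP client open.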
Presto UDFs
● User-Defined Functions
● Mapping Java functions to SQL functions
● FunctionRegistry
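
Mapping Java functions to SQL function names can be sketched with a small reflection-based registry. The annotation `@SqlFunctionName` and the classes `StringFunctions` and `SimpleFunctionRegistry` are invented for this illustration and are not Presto's API; type checking, overloading, and null handling are all left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical annotation naming the SQL function that a static Java method implements.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SqlFunctionName {
    String value();
}

// Example holder class with one annotated function.
class StringFunctions {
    @SqlFunctionName("reverse_text")
    public static String reverseText(String s) {
        return new StringBuilder(s).reverse().toString();
    }
}

// Hypothetical registry: scans a class for annotated static methods and
// makes them callable by their SQL name.
class SimpleFunctionRegistry {
    private final Map<String, Method> functions = new HashMap<>();

    void register(Class<?> holder) {
        for (Method method : holder.getDeclaredMethods()) {
            SqlFunctionName name = method.getAnnotation(SqlFunctionName.class);
            if (name != null && Modifier.isStatic(method.getModifiers())) {
                method.setAccessible(true);
                functions.put(name.value(), method);
            }
        }
    }

    Object call(String sqlName, Object... args) {
        Method method = functions.get(sqlName);
        if (method == null) {
            throw new IllegalArgumentException("Function not registered: " + sqlName);
        }
        try {
            return method.invoke(null, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Failed to invoke " + sqlName, e);
        }
    }
}
```

Declaring the SQL name next to the Java method keeps the mapping in one place, so adding a function means adding one annotated method and registering its class.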
Presto As A Distributed System
Presto As A Distributed System Implementation
● Airlift
● Presto’s internal framework for
building REST services
● https://github.com/airlift/airlift
● REST API definitions
● xxxResource classes
● JAX-RS annotations
■ @Path, @GET, @POST
● JSON protocol (Jackson)
● HTTP Services
● coordinator/worker
● discovery service
● JMX - JSON server
● Utilities
● Guice extension
■ bootstrap, configuration
● logger, units
Summary
● Learned various flavors of Presto and the corresponding code locations
● SQL on Everything
■ presto connectors
● Query Engine Core
■ presto-main
● Distributed Systems
■ airlift modules
● Presto as a REST service (presto client)
■ query protocol
● Extending Presto
■ e.g., Adding new UDFs
● Enjoy Reading Presto’s Code For Your Own Interest!
Thank You!
Danke!
Merci!
谢谢!
ありがとう!
Gracias!
Kiitos!
