Reading The Source Code of Presto
1. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
Taro L. Saito, Dongmin Yu
Arm Treasure Data
Presto Conference Tokyo 2019
June 11th, 2019
About Me: Taro L. Saito (Leo)
● Principal Software Engineer at Arm Treasure Data
● Building a distributed query engine service
● Living in the US for 4 years
● DBMS & Data Science Background
● Ph.D. in Computer Science
● OSS Projects around DBMS
● snappy-java: a compression library used in Spark, Parquet, etc.
● sqlite-jdbc
● msgpack-java
■ MsgPack implementation for Java
New Release from O’Reilly Japan
● “Designing Data-Intensive Applications”
● By Martin Kleppmann
● Techniques and concepts around distributed data processing systems
● A Japanese translation will be available on July 18, 2019
● Pre-order at:
■ Amazon.co.jp
■ O’Reilly Japan
The Japanese translation of this definitive introduction to distributed data systems goes on sale next month
Today’s Goals
● Learn How To Start Reading Presto’s Source Code
● GitHub
■ prestosql: https://github.com/prestosql/presto
● Note: prestodb is an old repo maintained by Facebook
● Find Your Own Interests and Learn Where to Look:
● SQL on Everything
■ Using Presto as an SQL interface to your own data sources (connectors)
● Query Engine Core
■ Learn how to implement query engines
● Distributed Systems
■ Learn how to implement HTTP-based distributed systems
● Using Presto
■ Presto clients, Presto’s REST protocol
● Extending Presto
■ e.g., Adding new UDFs
Presto: SQL On Everything
● ICDE 2019 Paper
● Architecture overview and the details of the system design
Setting Up IntelliJ IDEA
● Learn Useful Shortcuts
● Source Code Navigation
● Press Shift twice
■ Search Everywhere
● Go to declaration
■ Ctrl + Click
● Quick definition
■ Ctrl + Shift + I
● Find Usages of functions and classes
● Type Hierarchies
■ Ctrl + H
● Bookmarks
Type Hierarchy (Ctrl + H)
Connector: SQL on Everything
● Presto Connectors (plug-ins)
● Enable SQL queries over various data sources
● Implement the presto-spi interfaces
● Connector interface
● presto-hive
● A full-fledged connector that uses almost all SPI features
● Difficult to understand for beginners
● presto-base-jdbc
● A relatively easy connector to read
● Base of various DBMS adapters
■ presto-postgresql, presto-mysql, etc.
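To make the plug-in idea concrete, here is a minimal sketch of how a catalog could be bound to a connector implementation. In Presto, each catalog properties file selects its connector with `connector.name`; everything else below (`ToyConnector`, `ToyConnectorFactory`, `ToyConnectorManager`, `MemoryTablesFactory`, the `tables` property) is hypothetical and far simpler than presto-spi, whose `Connector` also covers metadata, splits, readers, and writers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-ins for presto-spi's plug-in types.
interface ToyConnector {
    List<String> listTables();
}

interface ToyConnectorFactory {
    String getName();   // matched against the catalog's connector.name property
    ToyConnector create(String catalogName, Map<String, String> config);
}

// Toy "plug-in manager": factories register by name, and each catalog's
// properties pick one of them through the connector.name key.
class ToyConnectorManager {
    private final Map<String, ToyConnectorFactory> factories = new HashMap<>();
    private final Map<String, ToyConnector> catalogs = new HashMap<>();

    void addConnectorFactory(ToyConnectorFactory factory) {
        factories.put(factory.getName(), factory);
    }

    void createCatalog(String catalogName, Map<String, String> properties) {
        String connectorName = properties.get("connector.name");
        ToyConnectorFactory factory = factories.get(connectorName);
        if (factory == null) {
            throw new IllegalArgumentException("No factory for connector.name=" + connectorName);
        }
        catalogs.put(catalogName, factory.create(catalogName, properties));
    }

    ToyConnector getConnector(String catalogName) {
        return catalogs.get(catalogName);
    }
}

// In-memory connector: the "data source" is a comma-separated list of table names.
class MemoryTablesFactory implements ToyConnectorFactory {
    public String getName() {
        return "memory-tables";
    }

    public ToyConnector create(String catalogName, Map<String, String> config) {
        List<String> tables = List.of(config.getOrDefault("tables", "").split(","));
        return () -> tables;
    }
}
```

Registering a factory once and creating catalogs from property maps mirrors how one connector implementation can back several catalogs.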
presto-base-jdbc connector
Google Guice: Dependency Injection Library
● xxxModule classes define the bindings Guice uses to supply the arguments of constructors annotated with @Inject
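The mechanism behind Guice bindings can be shown without Guice itself. The sketch below is a toy, reflection-based injector and not Guice's API: `bind()` plays the role of a module binding, and `getInstance()` calls the `@Inject` constructor, resolving each parameter recursively and caching one instance per requested type (roughly a singleton scope). `Inject`, `ToyInjector`, `TableMetadata`, `FixedTableMetadata`, and `ToyAnalyzer` are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical marker annotation standing in for @Inject.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.CONSTRUCTOR)
@interface Inject {}

// Toy injector: the "module" is the set of bind() calls; each requested
// type is built once by calling its @Inject constructor with injected args.
class ToyInjector {
    private final Map<Class<?>, Class<?>> bindings = new HashMap<>();
    private final Map<Class<?>, Object> instances = new HashMap<>();

    <T> void bind(Class<T> type, Class<? extends T> implementation) {
        bindings.put(type, implementation);
    }

    <T> T getInstance(Class<T> type) {
        Object cached = instances.get(type);
        if (cached != null) {
            return type.cast(cached);
        }
        Class<?> implementation = bindings.getOrDefault(type, type);
        Constructor<?>[] constructors = implementation.getDeclaredConstructors();
        if (constructors.length == 0) {
            throw new IllegalStateException("No binding for " + type.getName());
        }
        // Prefer the constructor annotated with @Inject; fall back to the first one.
        Constructor<?> constructor = constructors[0];
        for (Constructor<?> c : constructors) {
            if (c.isAnnotationPresent(Inject.class)) {
                constructor = c;
            }
        }
        // Resolve every constructor parameter recursively through the bindings.
        Class<?>[] parameterTypes = constructor.getParameterTypes();
        Object[] args = new Object[parameterTypes.length];
        for (int i = 0; i < parameterTypes.length; i++) {
            args[i] = getInstance(parameterTypes[i]);
        }
        try {
            constructor.setAccessible(true);
            Object instance = constructor.newInstance(args);
            instances.put(type, instance);
            return type.cast(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot construct " + implementation.getName(), e);
        }
    }
}

interface TableMetadata {
    String describe(String table);
}

class FixedTableMetadata implements TableMetadata {
    @Inject
    FixedTableMetadata() {}

    public String describe(String table) {
        return table + "(id bigint, name varchar)";
    }
}

// A consumer that only declares what it needs; the injector decides the implementation.
class ToyAnalyzer {
    final TableMetadata metadata;

    @Inject
    ToyAnalyzer(TableMetadata metadata) {
        this.metadata = metadata;
    }
}
```

The payoff is the same one the xxxModule classes give Presto: `ToyAnalyzer` never names `FixedTableMetadata`, so swapping the implementation only changes the binding.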
Presto Coordinator Module
● You can learn which classes the coordinator uses
Reading Data From Data Sources
● Record/Page based readers
● RecordCursor interface
● isNull
● getType(field)
● getXXX(field)
● Mapping to Presto Data Types
● boolean
● long
● double
● Slice (utf8 string)
● Object
■ array, map, etc.
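A trimmed-down cursor shows the read loop: advance one row at a time, then read each field with a type-specific getter. `SimpleRecordCursor` and `InMemoryRecordCursor` are hypothetical. presto-spi's `RecordCursor` also exposes `getType(field)`, `getSlice(field)`, `getObject(field)`, and read statistics, and this sketch substitutes `String` for `Slice`.

```java
import java.util.List;

// Hypothetical, trimmed-down cursor modeled on the RecordCursor idea.
interface SimpleRecordCursor extends AutoCloseable {
    boolean advanceNextPosition();
    boolean isNull(int field);
    boolean getBoolean(int field);
    long getLong(int field);
    double getDouble(int field);
    String getString(int field);   // stand-in for getSlice (a UTF-8 Slice in Presto)

    @Override
    void close();
}

// Cursor over in-memory rows; each row is an Object[] indexed by field number.
class InMemoryRecordCursor implements SimpleRecordCursor {
    private final List<Object[]> rows;
    private int position = -1;     // starts before the first row

    InMemoryRecordCursor(List<Object[]> rows) {
        this.rows = rows;
    }

    public boolean advanceNextPosition() {
        position++;
        return position < rows.size();
    }

    private Object value(int field) {
        return rows.get(position)[field];
    }

    public boolean isNull(int field) { return value(field) == null; }
    public boolean getBoolean(int field) { return (Boolean) value(field); }
    public long getLong(int field) { return ((Number) value(field)).longValue(); }
    public double getDouble(int field) { return ((Number) value(field)).doubleValue(); }
    public String getString(int field) { return (String) value(field); }
    public void close() {}
}
```

A caller loops `while (cursor.advanceNextPosition())`, checks `isNull(field)` first, and only then calls the getter that matches the column's type.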
Example
● JdbcRecordCursor
● Steps
● Connect to JDBC
● Prepare Column Readers
● Build SQL to run with JDBC
● Read JDBC ResultSets
TupleDomain
● Build SELECT statements for JDBC queries
● Presto provides:
● Projection
■ columns to select
● TupleDomain
● ColumnDomain
■ predicates
○ ==
○ <, <=, >=, >
○ in (....)
○ null / not null
○ all
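The pushdown step can be sketched as turning the projection and per-column predicates into one parameterized SELECT. `ColumnPredicate` and `SelectBuilder` are hypothetical; presto-spi's `TupleDomain` and `Domain` model value sets as ranges plus a null flag and are richer than this. Double-quote identifier quoting and `?` placeholders are assumptions of the sketch (some databases, such as MySQL, quote identifiers with backticks).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, simplified per-column predicate: an SQL fragment (without the
// column name) plus the values to bind to its ? placeholders.
final class ColumnPredicate {
    final String sql;            // null means "all": no restriction
    final List<Object> params;

    private ColumnPredicate(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    static ColumnPredicate all() { return new ColumnPredicate(null, List.of()); }
    static ColumnPredicate isNull() { return new ColumnPredicate(" IS NULL", List.of()); }
    static ColumnPredicate isNotNull() { return new ColumnPredicate(" IS NOT NULL", List.of()); }

    // Covers ==, <, <=, >=, > from the slide (SQL spells == as =).
    static ColumnPredicate compare(String operator, Object value) {
        if (!List.of("=", "<", "<=", ">=", ">").contains(operator)) {
            throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
        return new ColumnPredicate(" " + operator + " ?", List.of(value));
    }

    static ColumnPredicate in(List<?> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("IN list must not be empty");
        }
        String marks = values.stream().map(v -> "?").collect(Collectors.joining(", "));
        return new ColumnPredicate(" IN (" + marks + ")", new ArrayList<>(values));
    }
}

class SelectBuilder {
    // ANSI-style identifier quoting, doubling any embedded quote character.
    static String quote(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    // Builds the SELECT and appends bind values to boundParams in placeholder order.
    static String build(String table, List<String> columns,
                        Map<String, ColumnPredicate> domains, List<Object> boundParams) {
        StringBuilder sql = new StringBuilder("SELECT ");
        sql.append(columns.stream().map(SelectBuilder::quote).collect(Collectors.joining(", ")));
        sql.append(" FROM ").append(quote(table));
        List<String> conjuncts = new ArrayList<>();
        for (Map.Entry<String, ColumnPredicate> entry : domains.entrySet()) {
            ColumnPredicate predicate = entry.getValue();
            if (predicate.sql == null) {
                continue;                          // "all": nothing to push down
            }
            conjuncts.add(quote(entry.getKey()) + predicate.sql);
            boundParams.addAll(predicate.params);
        }
        if (!conjuncts.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", conjuncts));
        }
        return sql.toString();
    }
}
```

Columns whose domain is `all` add no condition, so the remote database only receives the filters that can actually be expressed in SQL.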
Reading Column Data
● Convert External JDBC Results into Presto Column Data
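The conversion step can be sketched as a switch on the JDBC type code, applied to the value returned by `ResultSet.getObject(...)`. `JdbcValueConverter` is hypothetical and covers only a few types: integral types become `long`, DOUBLE becomes `double`, strings become UTF-8 bytes as a stand-in for `Slice`, and DATE becomes the number of days since 1970-01-01, which is how Presto represents DATE values internally.

```java
import java.nio.charset.StandardCharsets;
import java.sql.Date;
import java.sql.Types;

// Hypothetical converter from (JDBC type code, raw JDBC value) to the Java
// representation a Presto-style cursor would expose for that column.
class JdbcValueConverter {
    static Object toPrestoValue(int jdbcType, Object value) {
        if (value == null) {
            return null;                                            // SQL NULL stays null
        }
        switch (jdbcType) {
            case Types.BOOLEAN:
            case Types.BIT:
                return (Boolean) value;                             // boolean
            case Types.TINYINT:
            case Types.SMALLINT:
            case Types.INTEGER:
            case Types.BIGINT:
                return ((Number) value).longValue();                // integral types -> long
            case Types.DOUBLE:
                return ((Number) value).doubleValue();              // DOUBLE -> double
            case Types.CHAR:
            case Types.VARCHAR:
                return ((String) value).getBytes(StandardCharsets.UTF_8); // stand-in for a UTF-8 Slice
            case Types.DATE:
                return ((Date) value).toLocalDate().toEpochDay();  // DATE -> days since 1970-01-01
            default:
                throw new IllegalArgumentException("Unsupported JDBC type code: " + jdbcType);
        }
    }
}
```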
Writing Column Data (PageSink)
● Page
● Presto’s internal in-memory data format
● Used for sending intermediate query results (table structure = relation)
● Page has multiple Blocks
■ columnar format
● Block
● column data of the same type
● values at positions 0 until the position count
● PageSink
● Receives Pages
● appendPage(page)
● presto-base-jdbc
● Page -> INSERT INTO SQL statements
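The Page/Block layout and the JDBC sink can be sketched together: a page holds one block per column, and the sink pivots each position back into a row of INSERT parameters. `ToyBlock`, `ToyPage`, and `ToyJdbcPageSink` are hypothetical; real Blocks are typed (for example, fixed-width longs versus variable-width slices), and a JDBC-backed sink would typically bind each row to a `PreparedStatement` and execute it in batches instead of collecting lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical column block: values for positions 0 until getPositionCount().
class ToyBlock {
    private final Object[] values;

    ToyBlock(Object... values) { this.values = values; }

    int getPositionCount() { return values.length; }

    Object get(int position) { return values[position]; }
}

// Hypothetical page: one block per column (channel), all of equal length.
class ToyPage {
    private final ToyBlock[] blocks;

    ToyPage(ToyBlock... blocks) {
        for (ToyBlock block : blocks) {
            if (block.getPositionCount() != blocks[0].getPositionCount()) {
                throw new IllegalArgumentException("all blocks need the same position count");
            }
        }
        this.blocks = blocks;
    }

    int getPositionCount() { return blocks.length == 0 ? 0 : blocks[0].getPositionCount(); }

    int getChannelCount() { return blocks.length; }

    ToyBlock getBlock(int channel) { return blocks[channel]; }
}

// Sink sketch: every appended page becomes parameter rows for one INSERT template.
class ToyJdbcPageSink {
    final String insertSql;
    final List<List<Object>> batch = new ArrayList<>();   // rows a real sink would add to a JDBC batch

    ToyJdbcPageSink(String table, int columnCount) {
        String[] marks = new String[columnCount];
        Arrays.fill(marks, "?");
        insertSql = "INSERT INTO " + table + " VALUES (" + String.join(", ", marks) + ")";
    }

    void appendPage(ToyPage page) {
        for (int position = 0; position < page.getPositionCount(); position++) {
            List<Object> row = new ArrayList<>();
            for (int channel = 0; channel < page.getChannelCount(); channel++) {
                row.add(page.getBlock(channel).get(position));   // pivot columnar data back into a row
            }
            batch.add(row);
        }
    }
}
```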
Query Engine Core
Query Engine Core: Query Execution Flow
Query Engine Core: Parsing SQL
● ANTLR4 Grammar (SqlBase.g4)
● SQL-92 syntax
● Also used in Spark SQL
● SqlBaseLexer/Parser:
● Generated by ANTLR4
● SQL -> ANTLR parse tree
● SqlParser
● AstBuilder
■ Visitor over the ANTLR parse tree
■ Builds Presto’s own SQL syntax tree, rooted at Statement
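The visitor pattern that AstBuilder applies to the ANTLR parse tree is also how Presto's own AST is traversed (with `AstVisitor<R, C>` and `accept(visitor, context)`). Below is a hypothetical mini-AST in that shape with two visitors: one evaluates an arithmetic expression against a row, the other prints it back as SQL. All class names are made up for the sketch.

```java
import java.util.Map;

// Hypothetical mini-AST: every node accepts a visitor plus a caller-supplied context.
abstract class ToyNode {
    abstract <R, C> R accept(ToyAstVisitor<R, C> visitor, C context);
}

class ToyLongLiteral extends ToyNode {
    final long value;
    ToyLongLiteral(long value) { this.value = value; }
    <R, C> R accept(ToyAstVisitor<R, C> visitor, C context) { return visitor.visitLongLiteral(this, context); }
}

class ToyIdentifier extends ToyNode {
    final String name;
    ToyIdentifier(String name) { this.name = name; }
    <R, C> R accept(ToyAstVisitor<R, C> visitor, C context) { return visitor.visitIdentifier(this, context); }
}

class ToyArithmetic extends ToyNode {
    final char operator;   // '+' or '*'
    final ToyNode left;
    final ToyNode right;

    ToyArithmetic(char operator, ToyNode left, ToyNode right) {
        if (operator != '+' && operator != '*') {
            throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
        this.operator = operator;
        this.left = left;
        this.right = right;
    }

    <R, C> R accept(ToyAstVisitor<R, C> visitor, C context) { return visitor.visitArithmetic(this, context); }
}

interface ToyAstVisitor<R, C> {
    R visitLongLiteral(ToyLongLiteral node, C context);
    R visitIdentifier(ToyIdentifier node, C context);
    R visitArithmetic(ToyArithmetic node, C context);
}

// Evaluates an expression; the context is the current row (column name -> value).
class ToyEvaluator implements ToyAstVisitor<Long, Map<String, Long>> {
    public Long visitLongLiteral(ToyLongLiteral node, Map<String, Long> row) { return node.value; }

    public Long visitIdentifier(ToyIdentifier node, Map<String, Long> row) {
        Long value = row.get(node.name);
        if (value == null) throw new IllegalArgumentException("Unknown column: " + node.name);
        return value;
    }

    public Long visitArithmetic(ToyArithmetic node, Map<String, Long> row) {
        long l = node.left.accept(this, row);
        long r = node.right.accept(this, row);
        return node.operator == '+' ? l + r : l * r;
    }
}

// Formats the same tree back into SQL text, without changing any node class.
class ToyFormatter implements ToyAstVisitor<String, Void> {
    public String visitLongLiteral(ToyLongLiteral node, Void c) { return Long.toString(node.value); }
    public String visitIdentifier(ToyIdentifier node, Void c) { return node.name; }
    public String visitArithmetic(ToyArithmetic node, Void c) {
        return "(" + node.left.accept(this, c) + " " + node.operator + " " + node.right.accept(this, c) + ")";
    }
}
```

The design choice is that new operations (evaluation, formatting, analysis) become new visitor classes, while the node classes stay unchanged.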
Analyzer
● Traverses the Statement structure
● Resolves the actual column names and types in SQL
● Using Metadata (table schema provider)
● e.g., finds the actual column names accessed in SELECT *
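One analyzer task, expanding `SELECT *` and checking column names against table metadata, fits in a few lines. `ToySelectResolver` is hypothetical and much simpler than Presto's `Analyzer`: it handles a single table with no aliases and returns `name:type` strings instead of analysis objects. It assumes the caller passes insertion-ordered column maps so that `*` expands in schema order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical analyzer step: resolve a select list against table metadata.
class ToySelectResolver {
    // table name -> (column name -> type name); column maps are assumed insertion-ordered
    private final Map<String, Map<String, String>> schemas;

    ToySelectResolver(Map<String, Map<String, String>> schemas) {
        this.schemas = schemas;
    }

    // Returns "name:type" entries for the resolved output columns.
    List<String> resolve(String table, List<String> selectItems) {
        Map<String, String> columns = schemas.get(table);
        if (columns == null) {
            throw new IllegalArgumentException("Table not found: " + table);
        }
        List<String> resolved = new ArrayList<>();
        for (String item : selectItems) {
            if (item.equals("*")) {
                // "*" expands to every column of the table, in schema order.
                columns.forEach((name, type) -> resolved.add(name + ":" + type));
            } else if (columns.containsKey(item)) {
                resolved.add(item + ":" + columns.get(item));
            } else {
                throw new IllegalArgumentException("Column '" + item + "' cannot be resolved");
            }
        }
        return resolved;
    }
}
```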
SqlQueryExecution
● Analyze
● Generates a logical SQL plan (Plan)
● Applies logical plan optimizers
● DistributedPlan
● Splits query stages into multiple tasks
● Assigns the worker nodes to use
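How a stage becomes tasks on worker nodes can be approximated with round-robin split assignment. `ToyStageScheduler` is hypothetical and deliberately naive; Presto's scheduler also takes factors such as data locality and the number of splits already queued on each node into account.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scheduler sketch: one task per worker node for a stage,
// with the stage's input splits spread across the tasks round-robin.
class ToyStageScheduler {
    static Map<String, List<String>> assignSplits(List<String> workers, List<String> splits) {
        if (workers.isEmpty()) {
            throw new IllegalArgumentException("No worker nodes available");
        }
        Map<String, List<String>> tasks = new LinkedHashMap<>();
        for (String worker : workers) {
            tasks.put(worker, new ArrayList<>());   // one task per worker
        }
        for (int i = 0; i < splits.size(); i++) {
            String worker = workers.get(i % workers.size());
            tasks.get(worker).add(splits.get(i));
        }
        return tasks;
    }
}
```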
LocalExecutionPlanner
● Runs on worker nodes
● Optimization
● Creates compiled operators (Java bytecode)
● Example:
● Generates predicate/projection evaluation code during table scan
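Presto generates JVM bytecode for filters and projections, which is too much for a slide-sized example. The sketch below is a lambda-based stand-in, not bytecode generation: it only shows the separation between building specialized evaluation code once per query and running it in a tight loop over a column, here for a query like `SELECT x * 2 FROM t WHERE x > 10`. `ToyPageProcessor` is hypothetical.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;
import java.util.function.LongUnaryOperator;

// Hypothetical stand-in for a generated page processor: the filter and the
// projection are fixed when the processor is built, then reused for every row.
class ToyPageProcessor {
    private final LongPredicate filter;
    private final LongUnaryOperator projection;

    ToyPageProcessor(LongPredicate filter, LongUnaryOperator projection) {
        this.filter = filter;
        this.projection = projection;
    }

    // Scans one column: keeps values that pass the filter and emits their projections.
    long[] process(long[] column) {
        long[] out = new long[column.length];
        int count = 0;
        for (long value : column) {
            if (filter.test(value)) {
                out[count++] = projection.applyAsLong(value);
            }
        }
        return Arrays.copyOf(out, count);
    }
}
```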
Further Reading: Anatomy of Presto
● By Dongmin Yu (Arm Treasure Data)
● https://www.slideshare.net/dongminyu/presto-anatomy
● How Presto generates bytecode for query processing
Using Presto: Presto REST Protocol (v1)
● POST /v1/statement
● body: SQL query text
● Response: QueryResults data with nextUri
● Headers
■ X-Presto-User, X-Presto-Schema, X-Presto-Session, X-Presto-Client-Tags
● GET /v1/statement/(query_id)/(page token)
● nextUri, table data, query stage stats
● Keep reading until nextUri becomes null
● QueryResults model class
● Represented in JSON
■ Jackson JSON object mapper
● Error Handling
● Standard errors (e.g., SQL syntax errors)
■ 200: the error is reported inside the QueryResults response
■ 503 (server slowdown): retry after 50 ~ 100 ms
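The polling loop of the v1 protocol can be sketched with the JDK's `java.net.http` client (JDK 11 or later). `StatementTransport`, `HttpResponseData`, `JdkStatementTransport`, `ToyStatementClient`, and the retry cap are hypothetical; the transport interface exists only so the loop can be exercised without a server, and `nextUri` is extracted with a regular expression as a shortcut where a real client would map `QueryResults` with Jackson.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical transport abstraction so the loop can be tested with a fake server.
interface StatementTransport {
    HttpResponseData send(String method, URI uri, String body) throws IOException, InterruptedException;
}

final class HttpResponseData {
    final int status;
    final String body;

    HttpResponseData(int status, String body) {
        this.status = status;
        this.body = body;
    }
}

// Real transport over java.net.http, sending the X-Presto-User header.
class JdkStatementTransport implements StatementTransport {
    private final HttpClient client = HttpClient.newHttpClient();
    private final String user;

    JdkStatementTransport(String user) { this.user = user; }

    public HttpResponseData send(String method, URI uri, String body) throws IOException, InterruptedException {
        HttpRequest.Builder request = HttpRequest.newBuilder(uri).header("X-Presto-User", user);
        request = method.equals("POST") ? request.POST(HttpRequest.BodyPublishers.ofString(body)) : request.GET();
        HttpResponse<String> response = client.send(request.build(), HttpResponse.BodyHandlers.ofString());
        return new HttpResponseData(response.statusCode(), response.body());
    }
}

// Client loop: POST the SQL text, then GET each nextUri until none is returned.
class ToyStatementClient {
    private static final Pattern NEXT_URI = Pattern.compile("\"nextUri\"\\s*:\\s*\"([^\"]+)\"");
    private static final int MAX_BUSY_RETRIES = 20;   // hypothetical cap for this sketch

    static List<String> run(StatementTransport transport, URI server, String sql)
            throws IOException, InterruptedException {
        List<String> responses = new ArrayList<>();
        String method = "POST";
        URI uri = server.resolve("/v1/statement");
        int busyRetries = 0;
        while (uri != null) {
            HttpResponseData response = transport.send(method, uri, sql);
            if (response.status == 503) {
                if (++busyRetries > MAX_BUSY_RETRIES) {
                    throw new IOException("Server still busy after " + MAX_BUSY_RETRIES + " retries");
                }
                Thread.sleep(50 + (long) (Math.random() * 50));   // back off 50 ~ 100 ms, then retry
                continue;
            }
            if (response.status != 200) {
                throw new IOException("Unexpected HTTP status " + response.status);
            }
            busyRetries = 0;
            responses.add(response.body);                 // one QueryResults JSON document
            Matcher m = NEXT_URI.matcher(response.body);
            uri = m.find() ? URI.create(m.group(1)) : null;   // no nextUri: the query is finished
            method = "GET";
        }
        return responses;
    }
}
```

A real client would also inspect the `error` field of each `QueryResults` document, since failed queries still come back with HTTP 200, and would send the other `X-Presto-*` headers as needed.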
Presto UDFs
● User-Defined Functions
● Mapping Java functions to SQL functions
● FunctionRegistry
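The mapping from Java methods to SQL functions can be imitated with an annotation and reflection. `@SqlFunction`, `ToyFunctionRegistry`, and `MyStringFunctions` are hypothetical; Presto's own UDFs use annotations such as `@ScalarFunction` and `@SqlType` and are registered through the plug-in's `getFunctions()`, with type checking and overload resolution that this sketch omits.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical annotation giving a static Java method its SQL-visible name.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SqlFunction {
    String value();
}

// Toy registry: scans a class for annotated static methods and makes them
// callable by (case-insensitive) SQL name.
class ToyFunctionRegistry {
    private final Map<String, Method> functions = new HashMap<>();

    void addFunctions(Class<?> holder) {
        for (Method method : holder.getDeclaredMethods()) {
            SqlFunction annotation = method.getAnnotation(SqlFunction.class);
            if (annotation != null && Modifier.isStatic(method.getModifiers())) {
                functions.put(annotation.value().toLowerCase(Locale.ROOT), method);
            }
        }
    }

    Object call(String name, Object... args) {
        Method method = functions.get(name.toLowerCase(Locale.ROOT));
        if (method == null) {
            throw new IllegalArgumentException("Function not registered: " + name);
        }
        try {
            method.setAccessible(true);
            return method.invoke(null, args);   // static method: no receiver
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Call to " + name + " failed", e);
        }
    }
}

// Example UDF holder: plain static Java methods exposed under SQL names.
class MyStringFunctions {
    @SqlFunction("reverse_words")
    static String reverseWords(String text) {
        List<String> words = Arrays.asList(text.split(" "));
        Collections.reverse(words);   // Arrays.asList supports set(), which reverse() relies on
        return String.join(" ", words);
    }

    @SqlFunction("add_one")
    static long addOne(long value) {
        return value + 1;
    }
}
```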
Presto As A Distributed System
Presto As A Distributed System Implementation
● Airlift
● Presto’s internal framework for building REST services
● https://github.com/airlift/airlift
● REST API definitions
● xxxResource classes
● JAX-RS annotations
■ @Path, @GET, @POST
● JSON protocol (Jackson)
● HTTP Services
● coordinator/worker
● discovery service
● JMX - JSON server
● Utilities
● Guice extension
■ bootstrap, configuration
● logger, units
Summary
● Learned various flavors of Presto and the corresponding code locations
● SQL on Everything
■ presto connectors
● Query Engine Core
■ presto-main
● Distributed Systems
■ airlift modules
● Presto as a REST service (Presto client)
■ query protocol
● Extending Presto
■ e.g., Adding new UDFs
● Enjoy Reading Presto’s Code For Your Own Interest!