Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
1. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Taro L. Saito, Ph.D.
GitHub: @xerial
Arm Treasure Data
Airframe
Lightweight Building Blocks for Scala
Technologies Behind Treasure Data: Airframe Edition
October 17th, 2018
Plazma - TD Tech Talk
Airframe
● Lightweight Building Blocks for Scala
● Essential for building any application
● Used in production for 2+ years
● Based on my code collection since 2009
● Initially written in Java
● Gradually migrated to Scala
● Repackaged into wvlet.airframe in 2016
● For maintainability
● 18 Modules
● Simplifying your daily programming in Scala
Airframe
● Named From A Novel By Michael Crichton (1942-2008)
● The author of Jurassic Park
About Me: Taro L. Saito (Leo)
● An Engineer with Research Background
● Ph.D., University of Tokyo
● DBMS & Genome Science
● Developing Query Engines in TD
● Living in the US for 3+ years
● Bay Area, Silicon Valley
● Active OSS Developer
● airframe
● sqlite-jdbc
■ More than 1000 GitHub stars
● snappy-java
■ Compression library used in Spark, Parquet
● sbt-sonatype
■ Used in 2000+ Scala projects
● ...
Personal Goal of Today
● Collect 200 GitHub Stars
● keyword: Airframe + Scala
● https://github.com/wvlet/airframe
Major Goals
● Providing A Standard Toolkit For Building Reliable Services
● Removing Complexities In Application Development
● Providing Simplicity By Design
Simplicity By Design
● “Simplicity” by Philippe Dufour
● A watch made by a legendary watchmaker in Switzerland
● He builds every part of the watch himself
● Airframe
● Provides simplicity for application developers
Application Development with Airframe
● Bootstrap
● Parsing command-line options
● Reading configuration files
● Reading databases
● Object - Data Mapping
● Mapping data to objects (object mapping)
● Saving objects to files (serialization)
● Debugging
● Logging
● Collecting metrics
● Monitoring
● Building Services
● Creating service objects using dependency injection (DI)
18 Airframe Modules
● Bootstrap
● airframe-config: configuration loader
● airframe-opts: command-line option parser
● Object Serialization
● airframe-codec: encoder/decoder SPI + standard codecs
● airframe-msgpack: pure-Scala MessagePack implementation
● airframe-tablet: CSV/TSV/JSON/JDBC ResultSet <-> Object
● Monitoring & Debugging
● airframe-log: logging
● airframe-metrics: human-readable metrics for time, date, data size, etc.
● airframe-jmx: object metrics provider through JMX
● Building Service Objects
● airframe: dependency injection
● airframe-surface: object type inspector
● Misc
● airframe-control, airframe-jdbc, airframe-json, airframe-http, etc.
Configuring Applications (airframe-config)
● Embedding static configurations for all environments into a Docker image
● Merging YAML + external configurations + object default parameters
YAML (one section per environment):
    development:
      addr: api-dev.com
    production:
      addr: api.com
Config Object:
    case class ServerConfig(
      addr: String,
      port: Int = 8080,
      password: String
    )
[Diagram: YAML -> select env:production (addr: api.com) -> merge with credentials and local configurations and with object default parameters (e.g., port = 8080) -> object mapping -> immutable config object]
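The merge flow above can be sketched in plain Scala. This is only an illustration under stated assumptions, not the airframe-config API: `ConfigMergeSketch`, `yamlByEnv`, and `load` are hypothetical names, and the YAML is written out by hand as maps instead of being parsed.

```scala
object ConfigMergeSketch {
  // Mirrors the ServerConfig case class shown on the slide.
  final case class ServerConfig(addr: String, port: Int = 8080, password: String)

  // Hand-written stand-in for the parsed YAML, keyed by environment name (assumption).
  val yamlByEnv: Map[String, Map[String, String]] = Map(
    "development" -> Map("addr" -> "api-dev.com"),
    "production"  -> Map("addr" -> "api.com")
  )

  // Select one environment, overlay external settings (e.g., credentials),
  // then map the merged values onto the immutable case class.
  def load(env: String, external: Map[String, String]): ServerConfig = {
    val merged = yamlByEnv.getOrElse(env, Map.empty[String, String]) ++ external
    val base   = ServerConfig(addr = merged("addr"), password = merged("password"))
    // The default port (8080) is kept unless a value was supplied.
    merged.get("port").map(p => base.copy(port = p.toInt)).getOrElse(base)
  }
}
```

Calling `ConfigMergeSketch.load("production", Map("password" -> "secret"))` fills `addr` from the YAML section, `password` from the external settings, and `port` from the case class default.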
Reading And Saving Query Results
● Can we standardize this pattern?
[Diagram: RDBMS -> JDBC ResultSet -> (object mapping) -> Seq[A] -> (object serialization) -> external storage (cache) -> (object deserialization, reload from cache) -> Seq[A]]
Problem: Data Mapping is Everywhere
● How many data readers and object mappers do we need?
● How can we simplify this?
[Diagram: YAML -> YAML parser + object mapper -> config object; JDBC ResultSet -> object-relation mapper -> table object; JSON -> JSON parser + object mapper -> object]
Using MessagePack As An Intermediate Data Format
● Why MessagePack?
● Flexible enough to support conversion from various data formats
● Compact and efficient compared to JSON
● Makes it easy to build a schema-on-read object mapper
[Diagram: JDBC ResultSet, YAML, and JSON are packed into MessagePack; MessagePack is packed/unpacked to and from objects]
Airframe Codec: Pack/Unpack Interface
● MessageCodec[A]
● pack: Convert object A into MessagePack
● unpack: Convert MessagePack into object A
[Diagram: Input -> pack -> MessagePack -> unpack -> Output]
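As a rough illustration of the pack/unpack idea, here is a minimal sketch in plain Scala. It is not the actual MessageCodec[A] interface of airframe-codec: `Codec`, `Value`, `IntValue`, and `StringValue` are hypothetical names, and the small `Value` type stands in for real binary MessagePack.

```scala
object CodecSketch {
  // Simplified stand-in for MessagePack values (assumption; real MessagePack is binary).
  sealed trait Value
  final case class IntValue(v: Long)      extends Value
  final case class StringValue(v: String) extends Value

  // Hypothetical codec interface: pack A into the intermediate format, unpack it back.
  trait Codec[A] {
    def pack(a: A): Value
    def unpack(v: Value): Option[A]
  }

  object IntCodec extends Codec[Int] {
    def pack(a: Int): Value = IntValue(a.toLong)
    def unpack(v: Value): Option[Int] = v match {
      case IntValue(n) if n.isValidInt => Some(n.toInt)
      case _                           => None
    }
  }

  object StringCodec extends Codec[String] {
    def pack(a: String): Value = StringValue(a)
    def unpack(v: Value): Option[String] = v match {
      case StringValue(s) => Some(s)
      case _              => None
    }
  }
}
```

A round trip such as `IntCodec.unpack(IntCodec.pack(42))` returns the original value, while unpacking with a mismatched codec yields `None` in this sketch.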
Pre-defined Codecs in airframe-codec
● Primitive Codecs
● ByteCodec, CharCodec, ShortCodec, IntCodec, LongCodec
● FloatCodec, DoubleCodec
● StringCodec
● BooleanCodec
● TimeStampCodec
● Collection Codec
● ArrayCodec, SeqCodec, ListCodec, IndexedSeqCodec, MapCodec, etc.
● OptionCodec
● JsonCodec
● Java-specific Codec
● FileCodec, ZonedDateTimeCodec, ResultSetCodec, etc.
● Adding Custom Codecs
● Implement MessageCodec[X] interface
ObjectCodec[A]: Combination of Codecs
● Generate Complex Codecs From The Parameter List of Objects
● MessagePack based serializer/deserializer
    class A(
      port: Int,
      name: String,
      timeoutSec: Double
    )
[Diagram: ObjectCodec[A] combines IntCodec, StringCodec, and DoubleCodec to pack A into a MessagePack Array or Map, and to unpack it back]
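The composition can be sketched by hand as follows. This is an assumption-laden illustration, not Airframe's ObjectCodec: the library generates the combined codec from the parameter list, whereas here `codecOfA` is written manually, the `Value` type is a hypothetical stand-in for MessagePack, and only the array form is shown.

```scala
object ObjectCodecSketch {
  // Simplified stand-in for MessagePack values (assumption).
  sealed trait Value
  final case class IntValue(v: Long)           extends Value
  final case class DoubleValue(v: Double)      extends Value
  final case class StringValue(v: String)      extends Value
  final case class ArrayValue(vs: List[Value]) extends Value

  // Hypothetical codec interface.
  trait Codec[T] {
    def pack(t: T): Value
    def unpack(v: Value): Option[T]
  }

  val intCodec: Codec[Int] = new Codec[Int] {
    def pack(t: Int): Value = IntValue(t.toLong)
    def unpack(v: Value): Option[Int] = v match {
      case IntValue(n) if n.isValidInt => Some(n.toInt)
      case _                           => None
    }
  }
  val stringCodec: Codec[String] = new Codec[String] {
    def pack(t: String): Value = StringValue(t)
    def unpack(v: Value): Option[String] = v match {
      case StringValue(s) => Some(s)
      case _              => None
    }
  }
  val doubleCodec: Codec[Double] = new Codec[Double] {
    def pack(t: Double): Value = DoubleValue(t)
    def unpack(v: Value): Option[Double] = v match {
      case DoubleValue(d) => Some(d)
      case _              => None
    }
  }

  // Mirrors class A from the slide.
  final case class A(port: Int, name: String, timeoutSec: Double)

  // One element codec per constructor parameter, combined in parameter order.
  val codecOfA: Codec[A] = new Codec[A] {
    def pack(a: A): Value =
      ArrayValue(List(intCodec.pack(a.port), stringCodec.pack(a.name), doubleCodec.pack(a.timeoutSec)))
    def unpack(v: Value): Option[A] = v match {
      case ArrayValue(List(p, n, t)) =>
        for {
          port    <- intCodec.unpack(p)
          name    <- stringCodec.unpack(n)
          timeout <- doubleCodec.unpack(t)
        } yield A(port, name, timeout)
      case _ => None
    }
  }
}
```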
Power of Schema-On-Read Codec
● MessagePack values describe their own data types (self-describing)
● The way to deserialize data can be determined from the MessagePack types
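[Diagram: CSV and JDBC ResultSet columns are packed into MessagePack values (Int, Float, Boolean, String, Array, Map, Binary). IntCodec unpacks them into Scala.Int: “100” (string) via parseInt, 100 (bigint) via toInt, a Boolean as 0 or 1, 100 (int) as is; other inputs produce an error or Zero.of[Int]]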
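A minimal sketch of this schema-on-read behavior, assuming a simplified value type in place of MessagePack. `readInt` and `readIntOrZero` are hypothetical helpers rather than part of airframe-codec, and the exact conversion rules (for example, truncating floats) are assumptions based on the diagram.

```scala
object SchemaOnReadSketch {
  // Self-describing values: each one carries its own type (stand-in for MessagePack).
  sealed trait Value
  final case class IntValue(v: Long)        extends Value
  final case class FloatValue(v: Double)    extends Value
  final case class BooleanValue(v: Boolean) extends Value
  final case class StringValue(v: String)   extends Value

  // Decide how to read an Int by inspecting the type of the incoming value.
  def readInt(v: Value): Either[String, Int] = v match {
    case IntValue(n) if n.isValidInt => Right(n.toInt)
    case IntValue(n)                 => Left(s"out of Int range: $n")
    case FloatValue(d)               => Right(d.toInt) // truncates the fraction
    case BooleanValue(b)             => Right(if (b) 1 else 0)
    case StringValue(s) =>
      scala.util.Try(s.trim.toInt).toOption.toRight(s"not an integer: $s")
  }

  // Alternative policy: fall back to the zero value instead of reporting an error.
  def readIntOrZero(v: Value): Int = readInt(v).getOrElse(0)
}
```

The caller picks the policy: `readInt` surfaces conversion failures, while `readIntOrZero` corresponds to the zero-value fallback shown on the slide.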
Application 1/5: Treasure Data = MessagePack DBMS
● Fluentd -> MessagePack -> Treasure Data
● Automatically Generating Schema from Data
● Apply schema-on-read to provide table data to Presto/Hive/Spark, etc.
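[Diagram: Fluentd and the Mobile SDK send schema-free data as MessagePack; a table schema is generated from the data, and codecs (IntCodec, StringCodec) are generated from the schema for the table reader used by Presto, Hive, and Spark]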
Application 2/5: Data Transformation
● Airframe Codec Works As A Lightweight Embulk
[Diagram: List[A] <-> (pack/unpack) <-> MessagePack <-> (pack/unpack) <-> TSV, JSON, SQLite3]
Application 3/5: Taking Snapshots of Workflow Tasks
● Frequently Used for Data Analytic Pipelines
● Save Task Results As MessagePack (binary)
● Save the cost of re-computation
[Diagram: First run: the task computes Result: Seq[A] (e.g., 10 min), packs it into MessagePack, and saves it to storage. Later runs: load the snapshot from storage and unpack it instead of recomputing]
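The snapshot pattern can be sketched as below. This is a simplified illustration, not Airframe code: `SnapshotSketch.cached` is a hypothetical helper, and results are stored as newline-separated text instead of MessagePack, so elements must not contain newlines.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

object SnapshotSketch {
  // Run `compute` only when no snapshot exists; otherwise reload the stored result.
  def cached(snapshot: Path)(compute: => List[String]): List[String] =
    if (Files.exists(snapshot)) {
      // Later runs: load the snapshot and skip the computation.
      val text = new String(Files.readAllBytes(snapshot), UTF_8)
      if (text.isEmpty) Nil else text.split("\n", -1).toList
    } else {
      // First run: compute, then save the result for next time.
      val result = compute
      Files.write(snapshot, result.mkString("\n").getBytes(UTF_8))
      result
    }
}
```

Because `compute` is a by-name parameter, the expensive task body is evaluated only on the first run.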
Application 4/5: Scala.js RPC
● Scala.js
● Compiling Scala code into JavaScript for Web Browsers
● Model classes can be shared between Scala and Scala.js
● airframe-msgpack
● Adds a pure-Scala MessagePack implementation to support Scala.js
[Diagram: UserInfo on the Scala server side <-> (pack/unpack) <-> MessagePack over XML RPC <-> (pack/unpack) <-> UserInfo on the Scala.js client side]
Application 5/5: Airframe HTTP Web Service
● Mapping HTTP Requests and Responses to Method Call Arguments and Return Values
● Airframe HTTP: Building Low-Friction Web Services Over Finagle (Medium Blog)
[Diagram: Http Request -> pack -> MessagePack -> unpack to function arguments -> request handler method -> return value -> MessagePack -> unpack -> Http Response]
A Challenge: Type Erasure in Java Compiler (javac)
● Java class files (byte code) remove generic type information (type erasure)
[Diagram: class A(data: List[B]) -> javac compiler (type erasure) -> class A with data: List[java.lang.Object]; the class parameter type list needed to generate an ObjectCodec is lost (???)]
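Type erasure can be observed directly on the JVM. The sketch below uses only standard Java reflection; `ErasureSketch` and its members are names chosen for illustration.

```scala
object ErasureSketch {
  val ints: List[Int]       = List(1, 2)
  val strings: List[String] = List("a", "b")

  // At runtime both lists are instances of the same class; the element type is gone.
  def sameRuntimeClass(a: Any, b: Any): Boolean = a.getClass == b.getClass

  final class A(val data: List[String])

  // Field.getType reports only the erased type: List, without its element type.
  val erasedFieldType: String = classOf[A].getDeclaredField("data").getType.getName
}
```

Here `sameRuntimeClass(ints, strings)` holds, and `erasedFieldType` names `scala.collection.immutable.List` with no trace of `String`, which is why a codec generator cannot rely on the erased class alone.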
Airframe Surface
● Reading Type Signatures From ScalaSig
● The Scala compiler embeds Scala Type Signatures (ScalaSig) in class files
● Airframe Surface
● A library for inspecting object shapes
[Diagram: class A(data: List[B]) compiled by javac keeps only data: List[java.lang.Object] (type erasure); compiled by scalac, the class file also carries ScalaSig: data: List[B]. Surface.of[A] recovers data: List[B] (scala.reflect.runtime.universe.TypeTag)]
What is Dependency Injection (DI)?
● Many Confusing Articles
● Inversion of Control Containers and the Dependency Injection pattern. Martin Fowler (2004)
● StackOverflow, Wikipedia, …
● Many Frameworks
● Spring, Google Guice, Scaldi, Macwire, Grafter, Weld, etc.
● No-framework approaches also exist (Pure-Scala DI)
● Recent Definition:
● Dependency Injection is the process of creating the static, stateless graph of service objects, where each service is parameterised by its dependencies.
■ What is Dependency Injection? by Adam Warski (2018)
● However, it’s still difficult to understand what DI is
Simplifying DI with Airframe
● Airframe Usage
● import wvlet.airframe._
● Simple 3 Step DI
● bind
● design
● build
● To Fully Understand DI …
● Think about what you can simplify with DI
● Thinking about DI itself doesn’t make much sense
■ e.g., comparing Guice, Airframe, etc.
3 Things You Can Forget with Airframe DI
● 1. How to Build Service Objects
● 2. How to Manage Resource Lifecycle
● 3. How to Use DI Itself (!!)
Airframe Gives You A Focus On Application Development
1: Forget How to Build Service Objects
● When coding A and B
● You can focus on only direct dependencies
● You can forget about indirect dependencies
● Airframe DI builds A, B, and direct/indirect dependencies on your behalf.
[Diagram: dependency graph of A and B over DB Client, Connection Pool, DB, DB Monitor, Fluentd Logger, and HttpClient. Caption: You can forget this part (the indirect dependencies)]
Replacing Modules For Testing The Service
● In Airframe Design
● You can replace DB and FluentdLogger with in-memory implementations
● How A and B are built differs, but the same code can be used
[Diagram: Overriding Design for Testing. The same graph of A and B, with DB replaced by Memory DB and Fluentd Logger replaced by In-memory Logger]
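The idea of overriding a design for testing can be sketched without any framework. The code below is not the Airframe DI API: `Design`, `withLogger`, and `buildA` are hypothetical names, and the `FluentdLogger` here is a stand-in that only prints.

```scala
object DesignSketch {
  trait Logger { def log(msg: String): Unit }

  // Stand-in for a production logger (assumption: it only prints here).
  final class FluentdLogger extends Logger {
    def log(msg: String): Unit = println(s"[fluentd] $msg")
  }

  // In-memory replacement that records messages for assertions in tests.
  final class InMemoryLogger extends Logger {
    val messages = scala.collection.mutable.ListBuffer.empty[String]
    def log(msg: String): Unit = { messages += msg; () }
  }

  // Service A knows only its direct dependency, the Logger interface.
  final class ServiceA(logger: Logger) {
    def run(): Unit = logger.log("A started")
  }

  // Hypothetical "design": a recipe describing how each dependency is created.
  final case class Design(newLogger: () => Logger) {
    def withLogger(f: () => Logger): Design = copy(newLogger = f)
    def buildA(): ServiceA                 = new ServiceA(newLogger())
  }

  val production: Design = Design(() => new FluentdLogger)
}
```

A test can derive its own design with `production.withLogger(() => new InMemoryLogger)` while the code of `ServiceA` stays unchanged.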
2: Forget How to Manage Resource Lifecycle
● FILO := First-In Last-Out
● Airframe can add onStart and onShutdown lifecycle hooks when creating instances
● When closing sessions, onShutdown hooks are called in the reverse order of creation
● Dependencies form a DAG
● Dependencies are generated when creating new service objects
[Diagram: the same dependency graph of A and B, with nodes numbered 1 to 8; one node is marked as a shared resource]
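The FILO rule can be illustrated with a small session that records shutdown hooks. This is a sketch under assumptions, not Airframe's session implementation: `Session`, `register`, and `close` are hypothetical names.

```scala
object LifecycleSketch {
  import scala.collection.mutable.ListBuffer

  // Hypothetical session that remembers shutdown hooks in creation order.
  final class Session {
    private val shutdownHooks = ListBuffer.empty[() => Unit]
    val events: ListBuffer[String] = ListBuffer.empty[String]

    // Start a resource now and register its shutdown hook for later.
    def register(name: String): Unit = {
      events += s"start $name"
      shutdownHooks += (() => { events += s"shutdown $name"; () })
      ()
    }

    // Run the hooks in reverse order of registration (first in, last out).
    def close(): Unit = shutdownHooks.reverseIterator.foreach(hook => hook())
  }
}
```

Registering ConnectionPool, DBClient, and A in that order and then calling `close()` shuts them down as A, DBClient, ConnectionPool, so no resource is released while something that depends on it is still running.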
3: Forget How to Use DI
Summary: Reducing Code Complexity with Airframe DI
● You can effectively forget about:
● How to build service objects
● How to manage resources in FILO order
● How to use DI itself
[Diagram: the dependency graph of A and B over DB Client, Connection Pool, DB, DB Monitor, Fluentd Logger, and HttpClient, with nodes numbered 1 to 8]
Implementation Details
Debugging Applications: Airframe Log
● Airframe Log: A Modern Logging Library for Scala (Medium Blog)
● ANSI color, source code location support
Airframe Metrics
● Human Readable Data Format (Duration, DataSize, etc.)
● Handy Time Window String Support (Used in TD_INTERVAL)
Airframe JMX
● Checking the internal states of remote JVM processes
● JMX clients
● jconsole has a JMX metric monitor
● Airframe JMX -> DataDog -> Dashboard
Future Work
● Airframe Stream
● Stream Query Processing Engine for MessagePack
● Can be a query engine for various types of data through MessagePack
● Airframe Fluentd
● Metric objects -> MessagePack -> Fluentd
● and more ...
[Diagram: Input -> pack -> MessagePack -> Stream SQL query processing (filter/aggregation/join, etc.) -> MessagePack -> unpack]
Current State of Airframe
● Version 0.69 (As of October 2018)
● We already had 35+ releases in 2018
● Automated Release
● Cross building libraries for Scala 2.11, 2.12, 2.13, and Scala.js
● The ‘sbt release’ command used to take 3 hours
■ Sequential steps:
○ compile -> test -> package -> upload x 18 modules x 4 Scala versions
● Now a new version can be released in 10 minutes on Travis CI
● Blog
● 3 Tips for Maintaining Scala Projects
Summary
● Airframe
● Simplicity By Design
● 18 modules for simplifying application development
● Key Technologies
● MessagePack-based codec
● airframe-surface to inspect object shapes
● Dependency injection (DI)
● Think What Can Be Simplified
● How MessagePack can be used to simplify data transformation
● Airframe DI: 3 things you can forget
Don’t Forget to Add a GitHub Star!
wvlet/airframe