Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Introducing the Hub for Data Orchestration
Danny Linden, Chapter Lead Software Engineer (Ryte)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
6. Make sure the technical elements of your website are flawless.
Create engaging content that users will love.
Score top search rankings with the help of real Google data.
ryte.com
7. ● Founded in 2012
● Currently more than 70 employees and growing
● Over 1 million users
● Headquartered in Munich with offices in Austria & Ho Chi Minh City
More info: ryte.com
14. Parquet on AWS S3
Import Application on AWS ECS
AWS EMR (Hadoop) with Presto as distributed SQL engine
REST API on AWS ECS
15. Parquet on AWS S3
Import Application on AWS ECS
Alluxio + Presto as SQL engine
REST API on AWS ECS
17. Crawl Data on AWS S3
Crawler Application on AWS ECS
AWS EMR (Hadoop) with Presto as distributed SQL engine
REST API on AWS ECS
Post Processing with Spark
Parquet on AWS S3
18. Crawl Data on AWS S3
Crawler Application on AWS ECS
Presto as distributed SQL engine
REST API on AWS ECS
Post Processing with Spark
Parquet on AWS S3
20. Without Alluxio
“Your software is awesome but extremely slow since last week. It’s not fun to work with it currently.”
- Ryte Customer
“It takes forever to load some reports today, it’s frustrating”
- Ryte Customer
21. With Alluxio
“I don’t know what you did but the tool is much faster now 👍”
- Ryte Customer
“Awesome, feels faster than before and the new features save us a lot of time”
- Ryte Customer
28. ● How do we integrate data cross-product?
● How do we keep “data ownership” clear?
● How do we measure & monitor performance?
● How do we handle (breaking) changes?
● How do we avoid over-engineering the “MVP”?
30. How do we integrate data cross-product? Status quo:
SES: AWS S3, EMR Alluxio, EMR Presto, Metastore, REST API via Presto connector (HTTP)
WES: AWS S3, EMR Presto, Metastore, REST API via Presto connector (HTTP)
31. SES: AWS S3, EMR Alluxio, EMR Presto, Metastore, REST API via Presto connector (HTTP)
WES: AWS S3, EMR Presto, Metastore, REST API via Presto connector (HTTP)
JDBC
33. SES: AWS S3, EMR Alluxio, EMR Presto, Metastore, REST API via Presto connector (HTTP)
WES: AWS S3, EMR Presto, Metastore, REST API via Presto connector (HTTP)
34. SES: AWS S3, EMR Alluxio, EMR Presto, Metastore, REST API via Presto connector (HTTP)
WES: AWS S3, EMR Presto, REST API via Presto connector (HTTP)
35. SES: AWS S3, EMR Alluxio, EMR Presto, Metastore, REST API via Presto connector (HTTP)
WES: AWS S3, EMR Presto, Metastore, REST API via Presto connector (HTTP)
Presto Connector
Unknowns:
- When available
36. SES: AWS S3, EMR Alluxio, EMR Presto, Metastore, REST API via Presto connector (HTTP)
WES: AWS S3, EMR Presto, Metastore, REST API via Presto connector (HTTP)
Presto Connector
Unknowns:
- When available
AVAILABLE IN THE FUTURE
39. AWS Account 1: AWS S3, EMR Alluxio, EMR Presto, Metastore, REST API via Presto connector (HTTP)
AWS Account 2: AWS S3, EMR Presto, Metastore, REST API via Presto connector (HTTP)
JDBC
40. ● Easy to implement
● Low effort on the second system
● Consumer Driven Contracts to clarify the Schema between Teams
● Presto Resource Groups to manage Performance Impact
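The resource-group bullet can be illustrated with a minimal sketch. It assumes Presto's file-based resource group configuration (a JSON document with `rootGroups` and `selectors`). The group names, limits, and the `wes-jdbc` source value are hypothetical and not taken from the talk.

```python
import json


def build_resource_groups(cross_product_source: str, max_concurrent: int) -> dict:
    """Build a resource-groups config that caps queries from the other product."""
    return {
        "rootGroups": [
            {
                "name": "global",
                "softMemoryLimit": "100%",
                "hardConcurrencyLimit": 50,
                "maxQueued": 500,
                "subGroups": [
                    {
                        # Queries issued by the product that owns the cluster.
                        "name": "own_product",
                        "softMemoryLimit": "70%",
                        "hardConcurrencyLimit": 40,
                        "maxQueued": 400,
                    },
                    {
                        # Queries arriving from the other product (hypothetical cap).
                        "name": "cross_product",
                        "softMemoryLimit": "30%",
                        "hardConcurrencyLimit": max_concurrent,
                        "maxQueued": 100,
                    },
                ],
            }
        ],
        # Selectors are evaluated in order; the first match wins.
        "selectors": [
            {"source": cross_product_source, "group": "global.cross_product"},
            {"group": "global.own_product"},
        ],
    }


config = build_resource_groups("wes-jdbc", max_concurrent=5)
print(json.dumps(config, indent=2))
```

With a layout like this, a burst of cross-product queries would queue inside its own group and leave the owning product's share of the cluster alone.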
43. ● Poor “push-down” of aggregations leads to high traffic
● Not good at resource allocation
● Limited JDBC feature set
● Query performance not “owned”
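Why missing aggregation push-down causes traffic can be shown with a small simulation. This is an illustrative sketch, not the setup from the talk. The `REMOTE_ROWS` table and its values are made up, and the "network" is just a count of rows handed from the remote side to the local side.

```python
from collections import defaultdict

# Hypothetical remote table of (url, clicks) rows living in the other cluster.
REMOTE_ROWS = [
    ("a.example", 3),
    ("a.example", 5),
    ("b.example", 2),
    ("b.example", 4),
    ("b.example", 1),
]


def sum_by_key(rows):
    """Equivalent of SELECT key, SUM(value) ... GROUP BY key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def without_pushdown():
    # Every raw row crosses the network; the local engine aggregates.
    transferred = list(REMOTE_ROWS)
    return sum_by_key(transferred), len(transferred)


def with_pushdown():
    # The remote engine aggregates first; only one row per group is transferred.
    transferred = list(sum_by_key(REMOTE_ROWS).items())
    return dict(transferred), len(transferred)


result_a, rows_a = without_pushdown()
result_b, rows_b = with_pushdown()
print(result_a == result_b, rows_a, rows_b)
```

Both paths return the same totals, but the first transfers one row per source record and the second one row per group. On large tables that gap is the "high traffic" the slide refers to.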
45. Works for us
Benefits / Tradeoffs
● Product & team ownership of data & data schema
● Shared data layer to optimize costs
● No impact on write operations
● No latency impact cross-product
● Independent upgrades & configurations for individual workloads
● Schema changes managed by consumer-driven contracts
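A consumer-driven contract check can be sketched in a few lines. This is a minimal illustration under assumptions: the column names, types, and consumer names are hypothetical, and a real setup would read the schema from the metastore rather than from a literal dictionary.

```python
# Hypothetical schema published by the data-owning team: column name -> type.
PROVIDER_SCHEMA = {"url": "varchar", "clicks": "bigint", "crawl_date": "date"}

# Each consuming team declares only the columns it actually reads.
CONSUMER_CONTRACTS = {
    "reporting": {"url": "varchar", "clicks": "bigint"},
    "alerts": {"url": "varchar", "crawl_date": "date"},
}


def broken_contracts(provider_schema, contracts):
    """Return {consumer: [problems]} for contracts the schema no longer satisfies."""
    failures = {}
    for consumer, expected in contracts.items():
        problems = []
        for column, col_type in expected.items():
            actual = provider_schema.get(column)
            if actual is None:
                problems.append(f"missing column {column}")
            elif actual != col_type:
                problems.append(f"{column}: expected {col_type}, got {actual}")
        if problems:
            failures[consumer] = problems
    return failures


# The owning team would run this check before shipping a schema change.
proposed = dict(PROVIDER_SCHEMA)
del proposed["clicks"]
print(broken_contracts(proposed, CONSUMER_CONTRACTS))
```

Because each consumer lists only what it reads, the owning team can still add or change unrelated columns freely. Only a change that touches a declared column shows up as a failure.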
46. ● Clear “ownership” of data is as important as it is for infrastructure & software
● Consider going through multiple iterations when you build new infrastructure
● Don’t overcomplicate it: simplification helps
● Data orchestration & management systems like Alluxio can reduce your stress
● Keep going
47. Danny Linden @ Ryte
Chapter Lead Engineering
E-Mail: d.linden@ryte.com
linkedin.com/in/danny-linden/
Twitter: @CodingDanny
jobs.ryte.com