Alluxio Day x APAC Modern Data Stack
September 22, 2022
For more on Alluxio Day: https://www.alluxio.io/alluxio-day/
For more Alluxio events: https://alluxio.io/events/
Speaker: Shiyan Xu (Founding Member, Onehouse & Apache Hudi PMC)
Apache Hudi’s open-source community is active and healthy. This talk gives an overview of major community-driven features, followed by a deep dive into two of them, the metastore and the table management service, both driven by ByteDance, to illustrate Hudi’s platform vision.
5. Slack Community
Total Slack Members: 519 → 712 → 1.8k (+157% YoY)
Active Members: 40 → 61 → 221 (+262% YoY)
Engagement Rate: 7.7% → 8.5% → 12% (+3.5pts YoY)
Notable newcomers in Q1 2022 include:
6. Community-Driven RFCs
● RFC-51: Support Change-Data-Capture (Alibaba)
○ Capture and produce all changes made to the table
○ Implement Debezium-like semantics
● RFC-42: Consistent hashing index (Alibaba)
○ Overcome throttling issues for large tables
● RFC-8: Record-level indexing mechanisms (Uber)
○ Improve indexing by directly mapping record keys to files
● RFC-48: Reduce write amplification with Log-Compaction (Uber)
○ Balance write amplification and I/O efficiency
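To make "Debezium-like semantics" concrete, here is a minimal sketch of what a change feed for a table can look like: one event per changed record key, carrying an operation code plus the before and after images. It is a toy diff of two in-memory snapshots, not Hudi's implementation of RFC-51. The function name `diff_snapshots`, the event field names, and the sample data are assumptions made for illustration; only the `c`/`u`/`d` operation codes and the before/after pairing follow Debezium's event envelope.

```python
from typing import Any, Dict, Iterator, Optional

Row = Dict[str, Any]


def diff_snapshots(before: Dict[str, Row], after: Dict[str, Row]) -> Iterator[Dict[str, Any]]:
    """Yield one Debezium-style change event per record key that differs.

    Hypothetical helper: compares two table snapshots keyed by record key.
    """
    for key in sorted(before.keys() | after.keys()):
        old: Optional[Row] = before.get(key)
        new: Optional[Row] = after.get(key)
        if old == new:
            continue  # unchanged record, no event
        if old is None:
            op = "c"  # create (insert)
        elif new is None:
            op = "d"  # delete
        else:
            op = "u"  # update
        yield {"op": op, "key": key, "before": old, "after": new}


# Two made-up versions of the same table.
v1 = {"id1": {"fare": 10.0}, "id2": {"fare": 7.5}}
v2 = {"id1": {"fare": 12.0}, "id3": {"fare": 3.0}}

events = list(diff_snapshots(v1, v2))
for event in events:
    print(event)
```

A downstream consumer can replay such events in order to rebuild the newer snapshot from the older one, which is what makes the before/after pairing useful.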
7. Community-Driven RFCs (cont’d)
● RFC-61: Snapshot view management (Shopee)
● RFC-52: Secondary index to improve query performance (Alibaba)
● RFC-60: Optimized storage layout for cloud object stores (AWS)
● RFC-12: Efficient migration of large Parquet tables to Hudi (AWS/Onehouse)
8. MetaServer & Table Management Service
MetaServer (RFC-36)
● Centralize the metadata management for tables
● Atomic sync: storage <-> metastore
TMS (RFC-43)
● Centralize the orchestration for running table services
● Decouple compute from the writer cluster
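The idea of decoupling table-service compute from the writer can be sketched as a small producer/consumer setup: the writer only registers a request, and a separate worker executes it. This is a toy illustration under stated assumptions, not the design in RFC-43. The names `ServiceRequest` and `ToyTableManagementService`, the `instant` field, and the in-process queue are all hypothetical stand-ins for a real service.

```python
import queue
import threading
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceRequest:
    table: str
    action: str   # e.g. "compaction" or "clustering"
    instant: str  # hypothetical identifier for the scheduled action


class ToyTableManagementService:
    """Runs table services on its own worker thread, away from the writer."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[ServiceRequest]" = queue.Queue()
        self.completed: List[ServiceRequest] = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, request: ServiceRequest) -> None:
        # The writer returns immediately; it does not execute the service.
        self._pending.put(request)

    def _run(self) -> None:
        while True:
            request = self._pending.get()
            # A real service would launch a compaction or clustering job here.
            self.completed.append(request)
            self._pending.task_done()

    def wait_idle(self) -> None:
        # Block until every submitted request has been processed.
        self._pending.join()


tms = ToyTableManagementService()
tms.submit(ServiceRequest("trips", "compaction", "001"))
tms.submit(ServiceRequest("trips", "clustering", "002"))
tms.wait_idle()
print([r.action for r in tms.completed])
```

Because the writer only enqueues work, its own resources are not consumed by compaction or clustering, and one service can orchestrate requests for many tables.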