SlideShare ist ein Scribd-Unternehmen logo
1 von 22
Downloaden Sie, um offline zu lesen
Apache Hudi:
Community-Driven
Development
Shiyan Xu
Engineering Manager @ Onehouse, Apache Hudi PMC
Agenda
● Hudi: quick overview - 1 min
● Community-driven features - 5 min
● MetaServer & Table Management Service - 18 min
● Closing - 1 min
Hudi: Typical Use-Cases
1800+
Slack Members
225+
Contributors
1000+
GH Engagers
20+
Committers
Pre-installed on 5 cloud providers
Diverse PMC/Committers
1M DLs/month
(400% YoY)
800B+
Records/Day
(from even just 1
customer!)
Marquee OSS Users
Community Stats
- Slack Community
Notable newcomers in Q1 - 2022 include:
Total Slack Members:
Engagement Rate:
1.8k
12%
221
Active Members:
40
61
7.7%
8.5%
519
712
+157% YoY
+3.5pts YoY
+262% YoY
● RFC-51: Support Change-Data-Capture (Alibaba)
○ Capture and produce all changes made to the table
○ Implement debezium-like semantics
● RFC-42: Consistent hashing index (Alibaba)
○ Overcome throttling issues for large tables
● RFC-8: Record-level indexing mechanisms (Uber)
○ Improve indexing by direct mapping record keys to files
● RFC-48: Reduce write amplification with Log-Compaction (Uber)
○ Balance write amp and I/O efficiency
Community-Driven RFCs
Community-Driven RFCs (cont’d)
● RFC-61: Snapshot view management (Shopee)
● RFC-52: Secondary index to improve query performance (Alibaba)
● RFC-60: Optimized storage layout for cloud object stores (AWS)
● RFC-12: Efficient migration of large parquet tables to Hudi
(AWS/Onehouse)
MetaServer &
Table Management Service
MetaServer (RFC-36)
● Centralize the metadata management for tables
● Atomic sync: storage <-> metastore
TMS (RFC-43)
● Centralize the orchestration for running table services
● Decouple compute from the writer cluster
MetaServer
MetaServer: Storage
● Table
○ Path, Schema, properties, format, table type, etc
● Partition
○ Path, parameters
● Timeline
○ Instant: timestamp, action type, state
○ Instant metadata: updated partitions, files, write
statistics
● Snapshot
○ File name, size, creation timestamp
MetaServer: Write
MetaServer: Read
MetaServer: Event Bus
MetaServer: Other features
● Lock-provider
○ Storage supports Check-and-Set (CAS)
○ Supports multi-writer
● Schema evolution
○ Server stores schemas with versions and bind to
commits
○ Detect conflicts during commit
● Concurrency management
○ Conflict detection strategies: timestamp, table,
partition, filegroup, file
Table Management Service (TMS)
TMS: Plan Generator
TMS: Job Management
TMS: Stateless Design
MetaServer &
Table Management Service
Proposers: Xiaoyu Geng (ByteDance), Zhaojing Yu (ByteDance)
Reviewers: Vinoth (Onehouse), Prasanna (OneHouse), Shiyan Xu (OneHouse)
Community engagements
● Submit RFCs
● Weekly sync meetings
● RFC reviews
● Implementation discussions & planning
Engage With Our Community
User Docs : https://hudi.apache.org
Technical Wiki : https://cwiki.apache.org/confluence/display/HUDI
Github : https://github.com/apache/hudi/
Twitter : https://twitter.com/apachehudi
Mailing list(s) : dev-subscribe@hudi.apache.org (send an empty email to subscribe)
dev@hudi.apache.org (actual mailing list)
Slack : https://join.slack.com/t/apache-hudi/signup
Who are using
Thanks!
Questions?

Weitere ähnliche Inhalte

Ähnlich wie Apache Hudi: Community-Driven Development

Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
Cloudera, Inc.
 

Ähnlich wie Apache Hudi: Community-Driven Development (20)

PostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolPostgreSQL as a Strategic Tool
PostgreSQL as a Strategic Tool
 
MyHeritage backend group - build to scale
MyHeritage backend group - build to scaleMyHeritage backend group - build to scale
MyHeritage backend group - build to scale
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
Dutch Lotus User Group 2009 - Domino Tuning Presentation
Dutch Lotus User Group 2009 - Domino Tuning PresentationDutch Lotus User Group 2009 - Domino Tuning Presentation
Dutch Lotus User Group 2009 - Domino Tuning Presentation
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Presto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talkPresto Strata Hadoop SJ 2016 short talk
Presto Strata Hadoop SJ 2016 short talk
 
Mongo db admin_20110329
Mongo db admin_20110329Mongo db admin_20110329
Mongo db admin_20110329
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions YugaByte DB Internals - Storage Engine and Transactions
YugaByte DB Internals - Storage Engine and Transactions
 
MACHBASE_NEO
MACHBASE_NEOMACHBASE_NEO
MACHBASE_NEO
 
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
 
Bquery Reporting & Analytics Architecture
Bquery Reporting & Analytics ArchitectureBquery Reporting & Analytics Architecture
Bquery Reporting & Analytics Architecture
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
Presto – Today and Beyond – The Open Source SQL Engine for Querying all Data...
 
Slack in the Age of Prometheus
Slack in the Age of PrometheusSlack in the Age of Prometheus
Slack in the Age of Prometheus
 
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
Smartsheet’s Transition to Snowflake and Databricks: The Why and Immediate Im...
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander ZaitsevMigration to ClickHouse. Practical guide, by Alexander Zaitsev
Migration to ClickHouse. Practical guide, by Alexander Zaitsev
 
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
 

Mehr von Alluxio, Inc.

Mehr von Alluxio, Inc. (20)

Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
AI Infra Day | Hands-on Lab: CV Model Training with PyTorch & Alluxio on Kube...
 
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...AI Infra Day | The Generative AI Market  And Intel AI Strategy and Product Up...
AI Infra Day | The Generative AI Market And Intel AI Strategy and Product Up...
 
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ MetaAI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
AI Infra Day | Composable PyTorch Distributed with PT2 @ Meta
 
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber ScaleAI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
AI Infra Day | Model Lifecycle Management Quality Assurance at Uber Scale
 
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWSAlluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
Alluxio Monthly Webinar | Efficient Data Loading for Model Training on AWS
 

Kürzlich hochgeladen

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 

Kürzlich hochgeladen (20)

OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 

Apache Hudi: Community-Driven Development

  • 2. Agenda ● Hudi: quick overview - 1 min ● Community-driven features - 5 min ● MetaServer & Table Management Service - 18 min ● Closing - 1 min
  • 4. 1800+ Slack Members 225+ Contributors 1000+ GH Engagers 20+ Committers Pre-installed on 5 cloud providers Diverse PMC/Committers 1M DLs/month (400% YoY) 800B+ Records/Day (from even just 1 customer!) Marquee OSS Users Community Stats
  • 5. - Slack Community Notable newcomers in Q1 - 2022 include: Total Slack Members: Engagement Rate: 1.8k 12% 221 Active Members: 40 61 7.7% 8.5% 519 712 +157% YoY +3.5pts YoY +262% YoY
  • 6. ● RFC-51: Support Change-Data-Capture (Alibaba) ○ Capture and produce all changes made to the table ○ Implement debezium-like semantics ● RFC-42: Consistent hashing index (Alibaba) ○ Overcome throttling issues for large tables ● RFC-8: Record-level indexing mechanisms (Uber) ○ Improve indexing by direct mapping record keys to files ● RFC-48: Reduce write amplification with Log-Compaction (Uber) ○ Balance write amp and I/O efficiency Community-Driven RFCs
  • 7. Community-Driven RFCs (cont’d) ● RFC-61: Snapshot view management (Shopee) ● RFC-52: Secondary index to improve query performance (Alibaba) ● RFC-60: Optimized storage layout for cloud object stores (AWS) ● RFC-12: Efficient migration of large parquet tables to Hudi (AWS/Onehouse)
  • 8. MetaServer & Table Management Service MetaServer (RFC-36) ● Centralize the metadata management for tables ● Atomic sync: storage <-> metastore TMS (RFC-43) ● Centralize the orchestration for running table services ● Decouple compute from the writer cluster
  • 10. MetaServer: Storage ● Table ○ Path, Schema, properties, format, table type, etc ● Partition ○ Path, parameters ● Timeline ○ Instant: timestamp, action type, state ○ Instant metadata: updated partitions, files, write statistics ● Snapshot ○ File name, size, creation timestamp
  • 14. MetaServer: Other features ● Lock-provider ○ Storage supports Check-and-Set (CAS) ○ Supports multi-writer ● Schema evolution ○ Server stores schemas with versions and bind to commits ○ Detect conflicts during commit ● Concurrency management ○ Conflict detection strategies: timestamp, table, partition, filegroup, file
  • 19. MetaServer & Table Management Service Proposers: Xiaoyu Geng (ByteDance), Zhaojing Yu (ByteDance) Reviewers: Vinoth (Onehouse), Prasanna (OneHouse), Shiyan Xu (OneHouse) Community engagements ● Submit RFCs ● Weekly sync meetings ● RFC reviews ● Implementation discussions & planning
  • 20. Engage With Our Community User Docs : https://hudi.apache.org Technical Wiki : https://cwiki.apache.org/confluence/display/HUDI Github : https://github.com/apache/hudi/ Twitter : https://twitter.com/apachehudi Mailing list(s) : dev-subscribe@hudi.apache.org (send an empty email to subscribe) dev@hudi.apache.org (actual mailing list) Slack : https://join.slack.com/t/apache-hudi/signup