Kafka File System Design
Li Zhitao (李志涛)
lizhitao@meituan.com
Mobile Backend Team, Platform Business Department
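Outline
• Kafka at a glance
• How a topic's partitions are laid out on disk
• How partition files are stored
• Segment file structure within a partition
• How to quickly locate a segment file within a partition
• How to find a msg chunk in a segment file
• Results in practice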
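Kafka at a glance
[Architecture diagram: Producer 1 and Producer 2; Broker 1 to Broker 4; consumer group push-group with Consumer 2 and Consumer 3; Zookeeper. Partitions shown on the brokers: topic1/part-0 and /part-4, topic1/part-1 and /part-5, topic1/part-2 and /part-6, topic1/part-3 and /part-7.]
Key features
• Scalable architecture
• High throughput
• Automatic load balancing across consumers
• Multiple replicas across the cluster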
How a topic's partitions are laid out on disk
Topic name: report_push
report_push
  report_push-0
  report_push-1
  report_push-2
  report_push-3
• In Kafka's file storage, one topic has multiple partitions, and each partition gets its own directory.
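The directory naming shown above can be sketched in a few lines. This is only an illustration: the helper name partition_dir and the root path /tmp/kafka-logs are hypothetical, and the <topic>-<partition> pattern is taken from the listing on the slide.

```python
from pathlib import Path


def partition_dir(log_root: str, topic: str, partition: int) -> Path:
    """Return the directory assumed to hold one partition: <log_root>/<topic>-<partition>."""
    return Path(log_root) / f"{topic}-{partition}"


# Hypothetical log root; the topic name and partition count follow the slide.
dirs = [partition_dir("/tmp/kafka-logs", "report_push", p) for p in range(4)]
print([d.name for d in dirs])
```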
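How partition files are stored
[Figure: a partition of 100 GB or more is stored in blocks as segment file-0, segment file-1, segment file-2, ..., each 500 MB. The numbers above the segments (1, 501, 1001, 1501) are record numbers; the numbers below are file sizes.]
• Each partition (directory) behaves like one huge file that is split evenly into multiple segment files of equal size. The number of messages in each segment file is not necessarily equal, however. This layout lets old segment files be deleted quickly.
• Each partition only needs to support sequential reads and writes. The lifetime of a segment file is determined by broker configuration parameters.
• Summary: the main goal is to improve disk utilization and message-processing performance.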
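A small simulation may help show why equally sized segments end up holding different numbers of messages. It is a sketch under simplifying assumptions: the function name roll_segments is hypothetical, offsets start at 0, and a new segment is started as soon as the next message would push the current one past the size cap.

```python
def roll_segments(message_sizes, segment_bytes):
    """Group consecutive messages into segments capped at segment_bytes.

    Returns a list of (base_offset, message_count, total_bytes) tuples.
    """
    segments = []
    base_offset, count, used = 0, 0, 0
    for offset, size in enumerate(message_sizes):
        # Start a new segment when the next message would overflow the cap.
        if count > 0 and used + size > segment_bytes:
            segments.append((base_offset, count, used))
            base_offset, count, used = offset, 0, 0
        count += 1
        used += size
    if count > 0:
        segments.append((base_offset, count, used))
    return segments


# Made-up message sizes in bytes, with a 1000-byte cap instead of 500 MB.
sizes = [300, 300, 300, 100, 100, 100, 100, 100, 500, 400]
segments = roll_segments(sizes, segment_bytes=1000)
for base, count, used in segments:
    print(f"base offset {base}: {count} messages, {used} bytes")
```

Because a whole segment is one file, dropping old data amounts to deleting the oldest files rather than rewriting one large file.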
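Segment file structure within a partition
This section describes what a segment file consists of. The design of the on-disk storage structure is one of the most important measures of a production message queue's performance, and it is the core part that best reflects the engineering quality of a message queue. Here we look inside the segment file.
A segment file has two parts: a segment data file and a segment index file. The two files map one to one and always appear as a pair.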
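Index-to-data file mapping within a segment
[Figure: 00000000000000000000.index and 00000000000000000000.log side by side. The .log file holds message-chunk1 through message-chunk8. The .index file holds four entries that record the physical positions of message-chunk1, message-chunk3, message-chunk6, and message-chunk8; each entry points to the message position in the .log file.]
File naming rule: the first segment of a partition starts at 0. Each later segment file is named after the last offset (message count) of the preceding segment plus 1.
The index is a sparse index. It does not store metadata for every record. Instead, the accumulated size of one or more messages is compared against a threshold, and an index entry is written only when that total exceeds the threshold. The default threshold is 4096 bytes.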
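The threshold rule can be sketched as follows. This is a simplified illustration, not Kafka's code: the function name build_sparse_index is hypothetical, the chunk sizes are made up, and indexing the very first chunk is an assumption that follows the figure, where message-chunk1 has an entry.

```python
INDEX_INTERVAL_BYTES = 4096  # default threshold quoted on the slide


def build_sparse_index(chunks, interval_bytes=INDEX_INTERVAL_BYTES):
    """Build a sparse index from (offset, size_in_bytes) pairs given in log order.

    Returns [(offset, position)]: one entry for the first chunk, then one
    each time more than interval_bytes have been written since the last entry.
    """
    index = []
    position = 0           # byte position of the current chunk in the log file
    bytes_since_entry = 0  # bytes appended since the last index entry
    for offset, size in chunks:
        if not index or bytes_since_entry > interval_bytes:
            index.append((offset, position))
            bytes_since_entry = 0
        position += size
        bytes_since_entry += size
    return index


# Ten made-up chunks of 1500 bytes each, with offsets 0..9.
sparse_index = build_sparse_index([(offset, 1500) for offset in range(10)])
print(sparse_index)
```

With 1500-byte chunks, three chunks have to accumulate before the 4096-byte threshold is passed, so only every third chunk gets an entry.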
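Segment file structure within a partition: index file
00000000000000000000.index
Index file layout:
Each entry records a relative message number in the corresponding log file and its physical byte position, 8 bytes in total.
• 4 bytes: offset, the message number expressed as the current segment file offset minus the last segment file offset
• 4 bytes: position, the physical byte address within the corresponding segment file
………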
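The 8-byte entry can be packed and unpacked with Python's struct module. This sketch rests on two assumptions that the slide does not state: the integers are big-endian, and the relative offset is the absolute offset minus the segment's starting offset, as the deck's segment-lookup slide describes. The names encode_entry and decode_entries are hypothetical.

```python
import struct

# Two 4-byte signed integers: relative offset, then position.
# Big-endian byte order is an assumption.
ENTRY = struct.Struct(">ii")


def encode_entry(offset, base_offset, position):
    """Pack one index entry, storing the offset relative to base_offset."""
    return ENTRY.pack(offset - base_offset, position)


def decode_entries(raw, base_offset):
    """Unpack every 8-byte entry in raw and restore absolute offsets."""
    return [(base_offset + rel, pos) for rel, pos in ENTRY.iter_unpack(raw)]


raw = encode_entry(368969, base_offset=368769, position=4597)
print(len(raw), decode_entries(raw, base_offset=368769))
```

Storing a 4-byte relative offset instead of the full 8-byte offset is what keeps each entry at 8 bytes.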
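Segment file structure within a partition: data file
00000000000000000000.log
Record layout of a msg chunk in the data file:
• 8 bytes: offset
• 4 bytes: chunk size
• 4 bytes: CRC32
• 1 byte: "magic"
• 1 byte: "attributes"
• 4 bytes: key length
• K bytes: key (optional)
• 4 bytes: payload length
• value bytes: message payload
……….
[Figure: chunk data containing message1 through message6. Each message inside a msg chunk has the structure index, record size, data; the per-message index within a chunk starts at 1.]
A message chunk may contain several messages, but all messages in the same chunk share a single offset (the ordinal of the msg chunk within the partition). As a result, if a consumer fails after processing only part of a chunk and fetches again, it receives the whole chunk again, and the messages it had already processed are consumed a second time.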
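The field list above can be turned into a small encoder and decoder. This is a sketch, not Kafka's implementation. It assumes big-endian integers, a key length of -1 for a missing key, and a CRC32 computed over every byte that follows the CRC field; the function names are hypothetical.

```python
import struct
import zlib


def encode_record(offset, payload, key=None, magic=0, attributes=0):
    """Serialize one record using the field order listed on the slide."""
    if key is None:
        key_part = struct.pack(">i", -1)  # assumption: -1 marks "no key"
    else:
        key_part = struct.pack(">i", len(key)) + key
    body = (struct.pack(">bb", magic, attributes) + key_part
            + struct.pack(">i", len(payload)) + payload)
    # Assumption: the CRC covers everything after the CRC field itself.
    crc = zlib.crc32(body) & 0xFFFFFFFF
    message = struct.pack(">I", crc) + body
    return struct.pack(">qi", offset, len(message)) + message


def decode_record(buf, pos=0):
    """Parse one record at pos; return (fields, position of the next record)."""
    offset, size = struct.unpack_from(">qi", buf, pos)
    start = pos + 12  # skip the 8-byte offset and the 4-byte size
    end = start + size
    crc, magic, attributes, key_len = struct.unpack_from(">Ibbi", buf, start)
    if zlib.crc32(buf[start + 4:end]) & 0xFFFFFFFF != crc:
        raise ValueError("CRC mismatch")
    cursor = start + 10  # past CRC, magic, attributes, key length
    key = None
    if key_len >= 0:
        key = bytes(buf[cursor:cursor + key_len])
        cursor += key_len
    (payload_len,) = struct.unpack_from(">i", buf, cursor)
    cursor += 4
    payload = bytes(buf[cursor:cursor + payload_len])
    fields = {"offset": offset, "magic": magic, "attributes": attributes,
              "key": key, "payload": payload}
    return fields, end


record = encode_record(5, b"hello", key=b"k")
fields, next_pos = decode_record(record)
print(fields, next_pos)
```

The returned next position is where a sequential scan would continue, which is how a reader walks a log file record by record.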
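Sparse index example from databases
A sparse index keeps only one key-pointer pair per storage block of the data file. It takes less space than a dense index, but looking up the record for a given value takes longer. A sparse index on a search key can be used only when the data file is sorted by that key, whereas a dense index can be built on any search key. As Figure 2 shows, the key in each pair is the value of the first record in its data block.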
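How to quickly locate a segment file within a partition
A topic has several partitions, and each partition is divided into multiple segment files. Only one file, the current one, is being written; all the others are read-only. When a file is full (that is, it reaches the configured size), Kafka switches files: it creates a new current file for writing, and the old current file becomes read-only. Files are named after their starting offset.
As an example, suppose the 0-0 partition of the topic report_push contains these files:
• 00000000000000000000.index
• 00000000000000000000.log
• 00000000000000368769.index
• 00000000000000368769.log
• 00000000000000737337.index
• 00000000000000737337.log
• 00000000000001105814.index
• 00000000000001105814.log
………………..
Here 00000000000000000000.index is the first file, with starting offset 0. The second file, 00000000000000368769.index, has starting offset 368769. Likewise, the third file, 00000000000000737337.index, has starting offset 737337. Because the files are named and sorted by starting offset, serving a consumer that wants to pull data from a given offset is simple: binary-search the file list with the requested offset to find the right file, subtract the file's starting offset from the absolute offset to get a relative offset, and start transferring data.
Continuing the example, suppose a consumer wants to fetch data starting at message position 368969. A binary search for 368969 locates the file 00000000000000368769.log (368969 lies between 368769 and 737337). A binary search of the index file then determines the maximum size of the data to read.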
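The lookup described above can be sketched with Python's bisect module. The helper names segment_name, base_offset_of, and locate_segment are hypothetical; the base offsets and the target offset come from the example on this slide.

```python
from bisect import bisect_right


def segment_name(base_offset, suffix="log"):
    """Format a base offset as a 20-digit, zero-padded file name."""
    return f"{base_offset:020d}.{suffix}"


def base_offset_of(file_name):
    """Recover the base offset from a segment file name."""
    return int(file_name.split(".")[0])


def locate_segment(base_offsets, target):
    """Binary-search sorted base offsets for the segment holding target.

    Returns (base_offset, relative_offset).
    """
    i = bisect_right(base_offsets, target) - 1
    if i < 0:
        raise ValueError("target precedes the first segment")
    return base_offsets[i], target - base_offsets[i]


names = [segment_name(b) for b in (0, 368769, 737337, 1105814)]
bases = sorted(base_offset_of(n) for n in names)
base, relative = locate_segment(bases, 368969)
print(segment_name(base), relative)
```

For offset 368969 the search lands on the segment that starts at 368769, giving a relative offset of 200.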
Example
How to find a msg chunk in a segment file
00000000000000000000.index (offset, position)
1, 0
3, 4597
6, 9807
8, 12345
00000000000000000000.log (message-chunkN: position)
message-chunk1: 0 (offset = 1)
message-chunk2: 2039
message-chunk3: 4597
message-chunk4: 6830
message-chunk5: 7912
message-chunk6: 9807
message-chunk7: 1108
message-chunk8: 12345 (offset = 8)
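Looking up a chunk through the sparse index can be sketched as a binary search for the last entry at or before the target, followed by a forward scan of the log from that position. The entries below are the four pairs from the slide's index file; the function name lookup is hypothetical and the scan itself is left out.

```python
from bisect import bisect_right

# (offset, position) pairs copied from the slide's index file.
INDEX = [(1, 0), (3, 4597), (6, 9807), (8, 12345)]


def lookup(index, target_offset):
    """Return the last (offset, position) entry whose offset is <= target_offset."""
    offsets = [offset for offset, _ in index]
    i = bisect_right(offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first index entry")
    return index[i]


# Offset 7 has no entry of its own, so the search falls back to offset 6;
# a reader would then scan the log forward from byte 9807.
nearest_offset, start_position = lookup(INDEX, 7)
print(nearest_offset, start_position)
```

The sparse index trades a short sequential scan for a much smaller index, which is the same trade-off the database example describes.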
Results in practice
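Kafka's file system structure: summary
Characteristics of an efficient file system
• One large file is split into multiple small file segments.
• With many small segments, it is easy to purge or delete fully consumed files on a schedule, which reduces disk usage.
• The index is mapped entirely into memory and operated on directly, which avoids the extra I/O caused by the segment file being swapped out to disk.
• From the index metadata, the maximum number of msg chunks a consumer pulls in each batch can be determined.
• Index entries store offsets relative to the previous segment file's offset, which saves space.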
References
Study of the kafka-0.8.1-src source code
http://kafka.apache.org/ 
http://blog.csdn.net/lizhitao
Thank you!
Any questions?
