built by LinkedIn to process and store high-volume activity stream data, but it's really a general-purpose messaging system...

at its heart, it's a pub-sub messaging system...
It starts with a broker
Publishers connect to the broker
and send their messages,
So we connect some consumers and they can pull messages.

note: when consumers connect, they'll receive all messages for a topic, not just those sent since they connected; more on that later...
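a minimal sketch of that flow, assuming the modern Java kafka-clients API and an example topic name (the original talk used an earlier client, but the pub-sub shape is the same):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class PubSubSketch {
        public static void main(String[] args) {
            // publisher: connect to the broker and send a message to a topic
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                producer.send(new ProducerRecord<>("activity-stream", "page-view /home"));
            }

            // consumer: connect and pull messages from the same topic
            Properties c = new Properties();
            c.put("bootstrap.servers", "localhost:9092");
            c.put("group.id", "example-group");
            c.put("auto.offset.reset", "earliest");
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(Collections.singletonList("activity-stream"));
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1)))
                    System.out.println(record.value());
            }
        }
    }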
but it's also distributed, which is to say...
we can have multiple brokers in multiple places and aggregate them together...

internally we can also partition within topics to allow parallel consumption, but that's for another talk...
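purely for illustration, here's how a client might point at several brokers and create a partitioned topic, assuming the Java AdminClient API; the broker addresses and partition/replica counts are made up:

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import java.util.Collections;
    import java.util.Properties;

    public class ClusterSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // several brokers, possibly in different places; clients can bootstrap from any of them
            props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // a topic split into 6 partitions (for parallel consumption), each copied to 2 brokers
                NewTopic topic = new NewTopic("activity-stream", 6, (short) 2);
                admin.createTopics(Collections.singleton(topic)).all().get();
            }
        }
    }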
before we get into what makes it particularly different (persistence), it's useful to understand some of the engineering decisions behind how it works.

performance is interesting because the behaviour of disks and memory has informed the way kafka has been built to embrace disk persistence
research from an ACM paper

values/sec is the number of four-byte integer values read per second from a 1-billion-long (4 GB) array, on disk and in memory

kafka uses the OS's default page caching rather than a custom in-memory store:
given all disk writes/reads will be cached anyway, we avoid paying the caching overhead of objects within the JVM

rather than maintaining everything in memory and flushing when necessary, everything is written immediately

configurable flushing determines how much data is at risk

similar to varnish
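a minimal illustration of that write path, using plain java.nio rather than kafka's own internals: appending hands the bytes to the OS page cache straight away, and the (configurable) fsync is what bounds the data at risk. the file name and flush point here are made up:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class WritePathSketch {
        public static void main(String[] args) throws Exception {
            try (FileChannel log = FileChannel.open(Paths.get("activity-stream-0.log"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                // "persisted" here means handed to the OS: the write lands in the page cache immediately
                log.write(ByteBuffer.wrap("a message".getBytes(StandardCharsets.UTF_8)));
                // an explicit flush (fsync) is what forces it to disk; how often you do this is the
                // configurable trade-off between throughput and how much data is at risk
                log.force(true);
            }
        }
    }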
it starts with a topic, a textual name for the messages contained within; we use it to describe how to deserialize the message bytes

so we send a message to the topic, what happens?
kafka creates a file
and it persists the message, which is to say it hands it off to the OS to write

files are just sets of bytes, nothing clever

internally it abstracts the collection of message bytes into a MessageSet, which is then backed by a file

so what does each message look like...
so, with 9 bytes of overhead per message, our payload is n − 9 bytes for a message of n bytes on disk

with a 91-byte payload we have a 100-byte message,

which means our next message would start at offset 100
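a worked sketch of that arithmetic, assuming the early on-disk format's 9 bytes of overhead were a 4-byte length, a 1-byte magic/version byte and a 4-byte CRC ahead of the payload; the exact field breakdown is an assumption here, the offset arithmetic is the point:

    import java.nio.ByteBuffer;
    import java.util.zip.CRC32;

    public class MessageLayoutSketch {
        public static void main(String[] args) {
            byte[] payload = new byte[91];          // 91-byte payload
            CRC32 crc = new CRC32();
            crc.update(payload);

            // assumed layout: 4-byte length + 1-byte magic + 4-byte CRC + payload = payload + 9
            ByteBuffer entry = ByteBuffer.allocate(4 + 1 + 4 + payload.length);
            entry.putInt(1 + 4 + payload.length);   // length of what follows (96)
            entry.put((byte) 0);                    // magic: format version
            entry.putInt((int) crc.getValue());     // checksum of the payload
            entry.put(payload);

            long offset = 0;
            long nextOffset = offset + entry.capacity();
            System.out.println(nextOffset);         // 100: where the next message starts
        }
    }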
and we can see our offsets at the bottom...
so we have the offsets, which let us send all messages to consumers, not just those that were sent after they connected...
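a small sketch of what that looks like from the consumer side, again assuming the modern Java client: the consumer can seek to any offset it likes, including the very beginning of the partition. the topic and partition here are examples:

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import java.util.Collections;
    import java.util.Properties;

    public class ReplaySketch {
        public static void main(String[] args) {
            Properties c = new Properties();
            c.put("bootstrap.servers", "localhost:9092");
            c.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(c)) {
                TopicPartition tp = new TopicPartition("activity-stream", 0);
                consumer.assign(Collections.singletonList(tp));
                consumer.seekToBeginning(Collections.singletonList(tp)); // everything, not just new messages
                // or: consumer.seek(tp, 100);  // resume from a specific offset we remembered
            }
        }
    }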
it's up to the consumer to remember what it has consumed, but that means you can easily re-consume an entire set of messages, which is very useful when integrating with long-term storage like HDFS...

a quick look at the way that works
our input to the hadoop job is a token file that specifies the offset to read from, the topic, etc.

having read the token, the mapper connects and consumes messages from the given offset

the mapper outputs two sets of data: the mapped output, such as the message payloads, and an updated token file with the last read offset

this is the key: successful completion of the job produces both the output data and new metadata for the next run

which means that if the job fails we can re-run it and it'll consume again from the last successfully consumed offset
the newly created output becomes the next input
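stripped right down (and ignoring the real hadoop APIs), the token/offset cycle looks something like this; all the file names and the consumeFrom helper are invented for illustration:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    public class TokenCycleSketch {
        // placeholder for "connect to kafka and pull everything from this offset onwards"
        static List<String> consumeFrom(String topic, long offset) { return List.of(); }

        public static void main(String[] args) throws IOException {
            Path token = Paths.get("token.txt");               // input: topic + last consumed offset
            String[] parts = Files.readString(token, StandardCharsets.UTF_8).trim().split(",");
            String topic = parts[0];
            long offset = Long.parseLong(parts[1]);

            List<String> messages = consumeFrom(topic, offset);       // the "mapper" consuming
            Files.write(Paths.get("output.txt"), messages);           // output 1: the payloads
            long nextOffset = offset + messages.size();                // simplified offset bookkeeping
            Files.writeString(token.resolveSibling("token.next.txt"), // output 2: the updated token
                    topic + "," + nextOffset, StandardCharsets.UTF_8);
            // only if the whole job succeeds does the new token become the next run's input,
            // so a failed run simply re-consumes from the old offset
        }
    }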
and this is why kafka is an interesting messaging system

suitable for batch and realtime