Kafka
A little introduction
Pub-Sub Messaging System
Distributed
Performance
Disk/Memory Performance
[Chart: read values/second (log scale, 1 to 1000M) for Disk, SSD, and Memory, comparing random access with sequential access. Source: http://queue.acm.org/detail.cfm?id=1563874]
Disk/Memory Performance
[Same chart, annotated: sequential disk read is faster than random memory read. Source: http://queue.acm.org/detail.cfm?id=1563874]
Persistent
Length    Magic Value   Checksum   Payload
4 bytes   1 byte        4 bytes    n bytes
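
As a rough illustration of the layout above, the following Python sketch frames and parses messages with a 4-byte length, a 1-byte magic value, a 4-byte checksum, and an n-byte payload. Several details are assumptions rather than anything stated on the slide: big-endian byte order, a CRC32 checksum computed over the payload only, and a length field that counts the bytes following it. The names encode_message and decode_messages are hypothetical.

```python
import struct
import zlib

# Assumed wire layout: length (4 bytes), magic (1 byte), checksum (4 bytes), big-endian.
HEADER = struct.Struct(">IBI")


def encode_message(payload: bytes, magic: int = 0) -> bytes:
    """Frame a payload as length + magic + checksum + payload."""
    checksum = zlib.crc32(payload) & 0xFFFFFFFF
    # Assumption: the length field counts everything after itself.
    length = 1 + 4 + len(payload)
    return HEADER.pack(length, magic, checksum) + payload


def decode_messages(log: bytes):
    """Yield (byte offset, payload) for each message in a run of framed messages."""
    offset = 0
    while offset < len(log):
        length, _magic, checksum = HEADER.unpack_from(log, offset)
        end = offset + 4 + length
        payload = log[offset + HEADER.size:end]
        if zlib.crc32(payload) & 0xFFFFFFFF != checksum:
            raise ValueError(f"corrupt message at offset {offset}")
        yield offset, payload
        offset = end


# A 91-byte payload plus the 9-byte header gives a 100-byte message,
# so the second message starts at byte offset 100.
log = encode_message(b"x" * 91) + encode_message(b"hello")
offsets = [offset for offset, _ in decode_messages(log)]
```

Because every message carries its own length, a reader can start at any known message offset and walk forward without an index.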
Token (Offset: 0, Broker: kafka.local, Topic: Testing) -> Input -> MR Job
MR Job -> Output -> Token (Offset: 130098, Broker: kafka.local, Topic: Testing)
MR Job -> Output -> Sequence File
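
The token flow in this diagram could be sketched in Python roughly as follows. This is not the actual Hadoop job: run_batch, fake_fetch, the JSON token format, and the file names are all hypothetical, and the broker is simulated by an in-memory list whose messages are assumed to carry a 9-byte header each. The point it tries to show is that the updated token is written last, so it only exists when the run completed.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the messages a broker holds for one topic.
LOG = [b"a" * 91, b"b" * 41]


def fake_fetch(broker, topic, offset):
    """Return payloads stored at or after `offset` and the offset just past the last one."""
    position, payloads = 0, []
    for payload in LOG:
        if position >= offset:
            payloads.append(payload.decode())
        position += 9 + len(payload)  # assumed 9-byte header per message
    return payloads, position


def run_batch(token_path: Path, out_dir: Path, fetch) -> Path:
    """Consume from the offset in the token file; write the output and a new token."""
    token = json.loads(token_path.read_text())
    messages, next_offset = fetch(token["broker"], token["topic"], token["offset"])

    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "payloads.txt").write_text("\n".join(messages))

    # Written last: if anything above fails, the old token is still the
    # newest one and a re-run starts from the same offset.
    new_token_path = out_dir / "token.json"
    new_token_path.write_text(json.dumps(dict(token, offset=next_offset)))
    return new_token_path


work = Path(tempfile.mkdtemp())
first_token = work / "token.json"
first_token.write_text(json.dumps({"offset": 0, "broker": "kafka.local", "topic": "Testing"}))
second_token = run_batch(first_token, work / "run1", fake_fetch)
```

Passing second_token to the next call of run_batch continues from where the first run stopped, which is the sense in which the output of one run becomes the input of the next.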
Useful Things


• http://incubator.apache.org/kafka/
• https://github.com/pingles/clj-kafka


Editor's notes

  2. Built by LinkedIn to process and store high-volume activity stream data, but it's really a general-use messaging system...
  3. At its heart, it's a pub-sub messaging system...
  4. It starts with a broker.
  5. Publishers connect to the broker
  6. and send their messages.
  7. So we connect some consumers and they can pull messages.
     Note that when they connect, they'll receive all messages for a topic, not just those sent since they connected; more on that later...
  8. But it's also distributed, which is to say...
  9. we can have multiple brokers in multiple places and aggregate them together...
     Internally we can also partition within topics to allow parallel consumption, but that's for another talk...
  10. Before we get into what makes it particularly different (persistence), it's useful to understand some of the engineering decisions behind how it works.
     Performance is interesting because the behaviour of disks and memory has informed the way Kafka has been built to embrace disk persistence.
  11-14. Research from an ACM Queue article.
     Values/second is the number of four-byte integer values read per second from a 1-billion-long (4 GB) array on disk or in memory.
     Kafka uses the OS's default page caching rather than custom in-memory stores. Given that all disk writes and reads will be cached anyway, this means we can avoid paying the caching overhead of objects within the JVM.
     Rather than maintaining everything in memory and flushing when necessary, everything is written immediately.
     Configurable flushing determines how much data is at risk.
     Similar to Varnish.
  16. It starts with a topic, a text description for the messages contained within. We use it to describe how to deserialize the message bytes.
  17. So we send a message to the topic; what happens?
  18. Kafka creates a file and persists the message, which is to say it hands it off to the OS to write.
     Files are just sets of bytes, nothing clever.
     Internally it abstracts the collection of message bytes into a message set, which is then backed by a file.
     So what does each message look like...
  19. So, our message length is n + 9 bytes.
     With a 91-byte payload we have a 100-byte message,
     which means our next message would start at offset 100.
  20. And we can see our offsets at the bottom...
  21. So we have the offsets, which let us send all messages to consumers, not just those that were sent after they connected...
  22. It's up to the consumer to remember what they've consumed, but that means you can re-consume an entire set of messages easily, which is very useful when integrating with long-term storage like HDFS...
     Quick look at the way it works.
  23. Our input to the Hadoop job is a token file that specifies the offset to read from, the topic, etc.
     Having read the token, the mapper connects and consumes messages from the given offset.
     The mapper outputs two sets of data: the mapped output, such as the message payloads, and an updated token file with the last read offset.
     This is the key: successful completion of the job results in new metadata for the next run as well as the output data.
     It means that if the job fails we can re-run it and it'll consume from the last consumed offset.
  24. The newly created output becomes the next input.
  25. And this is why Kafka is an interesting messaging system:
     suitable for batch and realtime.
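
The persistence notes above (everything is written straight to a file, the OS page cache does the caching, and a configurable flush bounds how much data is at risk) might look something like the following Python sketch. It is an illustration only, not Kafka's implementation: the AppendLog class, its flush_every parameter, and the file name are hypothetical.

```python
import os
import tempfile


class AppendLog:
    """Append-only file; caching is left to the OS page cache."""

    def __init__(self, path: str, flush_every: int = 100):
        self._path = path
        self._file = open(path, "ab")
        self._end = os.path.getsize(path)  # byte offset of the next append
        self._flush_every = flush_every    # hypothetical knob: appends between fsyncs
        self._unsynced = 0

    def append(self, record: bytes) -> int:
        """Write a record and return the byte offset at which it starts."""
        offset = self._end
        self._file.write(record)
        self._file.flush()                 # hand the bytes to the OS immediately
        self._end += len(record)
        self._unsynced += 1
        if self._unsynced >= self._flush_every:
            os.fsync(self._file.fileno())  # force the page cache out to disk
            self._unsynced = 0
        return offset

    def read_from(self, offset: int) -> bytes:
        """Return every byte stored at or after `offset`."""
        with open(self._path, "rb") as reader:
            reader.seek(offset)
            return reader.read()

    def close(self) -> None:
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "testing.log")
log = AppendLog(path, flush_every=2)
first = log.append(b"hello")
second = log.append(b"world")
```

A larger flush_every means fewer fsync calls but more records that exist only in the page cache if the machine loses power. Because the log never discards what a consumer has read, any consumer that remembers an offset can call read_from again to re-consume from that point.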