SlideShare ist ein Scribd-Unternehmen logo
1 von 11
Michael Kehoe
Senior Site Reliability Engineer
LinkedIn
LinkedIn’s Big Data Pipeline
with Kafka, Hadoop and
Couchbase
3
$ whoami
Michael Kehoe
• Sr Site Reliability Engineer
(SRE)
• Member of CBVT
• B.E. (Electrical Engineering)
from
the University of Queensland,
Australia
4
Kafka @ LinkedIn
• Kafka was created by LinkedIn
• Kafka is a publish-subscribe
system as a distributed commit
log
• Processes 500+ TB/ day (~500
billion messages)
5
LinkedIn’s use of Kafka
• Monitoring
• Pub-Sub Messaging
• Analytics
• Building block for (log) distributed application
• Samza
• Espresso
• Pinot
Kafka to Hadoop (Analytics)
6
Use Case
• LinkedIn tracks data to better understand how members use our
products
• Information such as which page got viewed and which content got
clicked on are sent into a Kafka cluster in each data center
• Some of these events are all centrally collected and pushed onto
our Hadoop grid for analysis and daily report generation
7
Couchbase @ LinkedIn
• About 80 separate services with one or more clusters in multiple data
centers
• Up to ~70 servers in a cluster
• Single & Multi-tenant clusters
8
Hadoop to Couchbase
• Our primary use-case for Hadoop  Couchbase is for building
(warming) / restoring Couchbase buckets
• LinkedIn built it’s own in-house solution to work with our ETL
processes etc
Jobs Cluster
9
Clusters & Numbers
• Used for read-scaling, > 150k QPS, 27 node clusters
• We use Hadoop to pre-build data by partition
• Couchbase average latency is 2-3ms
• 99th percentile is ~8 - 12ms
Questions?
10
Thank You
©2014 LinkedIn Corporation. All Rights Reserved.

Weitere ähnliche Inhalte

Was ist angesagt?

Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
HostedbyConfluent
 
The Migration to Event-Driven Microservices (Adam Bellemare, Flipp) Kafka Sum...
The Migration to Event-Driven Microservices (Adam Bellemare, Flipp) Kafka Sum...The Migration to Event-Driven Microservices (Adam Bellemare, Flipp) Kafka Sum...
The Migration to Event-Driven Microservices (Adam Bellemare, Flipp) Kafka Sum...
confluent
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
confluent
 
Real-Time Dynamic Data Export Using the Kafka Ecosystem
Real-Time Dynamic Data Export Using the Kafka EcosystemReal-Time Dynamic Data Export Using the Kafka Ecosystem
Real-Time Dynamic Data Export Using the Kafka Ecosystem
confluent
 

Was ist angesagt? (20)

Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
Kafka at the core of an AIOps pipeline | Sunanda Kommula, Selector.ai and Ala...
 
How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...
How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...
How to Discover, Visualize, Catalog, Share and Reuse your Kafka Streams (Jona...
 
Azure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft
Azure Cosmos DB Kafka Connectors | Abinav Rameesh, MicrosoftAzure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft
Azure Cosmos DB Kafka Connectors | Abinav Rameesh, Microsoft
 
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
 
Apache Kafka: Past, Present and Future
Apache Kafka: Past, Present and FutureApache Kafka: Past, Present and Future
Apache Kafka: Past, Present and Future
 
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
Streaming Data in the Cloud with Confluent and MongoDB Atlas | Robert Walters...
 
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...
 
Death of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projectsDeath of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projects
 
Developing custom transformation in the Kafka connect to minimize data redund...
Developing custom transformation in the Kafka connect to minimize data redund...Developing custom transformation in the Kafka connect to minimize data redund...
Developing custom transformation in the Kafka connect to minimize data redund...
 
0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche
0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche
0-330km/h: Porsche's Data Streaming Journey | Sridhar Mamella, Porsche
 
Kafka Summit NYC 2017 - Achieving Predictability and Compliance with BNY Mell...
Kafka Summit NYC 2017 - Achieving Predictability and Compliance with BNY Mell...Kafka Summit NYC 2017 - Achieving Predictability and Compliance with BNY Mell...
Kafka Summit NYC 2017 - Achieving Predictability and Compliance with BNY Mell...
 
Bootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source ToolsBootstrap SaaS startup using Open Source Tools
Bootstrap SaaS startup using Open Source Tools
 
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHow a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
 
The Migration to Event-Driven Microservices (Adam Bellemare, Flipp) Kafka Sum...
The Migration to Event-Driven Microservices (Adam Bellemare, Flipp) Kafka Sum...The Migration to Event-Driven Microservices (Adam Bellemare, Flipp) Kafka Sum...
The Migration to Event-Driven Microservices (Adam Bellemare, Flipp) Kafka Sum...
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
 
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
Streaming data in the cloud with Confluent and MongoDB Atlas | Robert Waters,...
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
 
Real-Time Dynamic Data Export Using the Kafka Ecosystem
Real-Time Dynamic Data Export Using the Kafka EcosystemReal-Time Dynamic Data Export Using the Kafka Ecosystem
Real-Time Dynamic Data Export Using the Kafka Ecosystem
 
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kaf...
 
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
 

Andere mochten auch

How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the day
Pooja Tangi
 

Andere mochten auch (19)

SRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level TalentSRECon USA 2016: Growing your Entry Level Talent
SRECon USA 2016: Growing your Entry Level Talent
 
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedInCouchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
Couchbase Connect 2016: Monitoring Production Deployments The Tools – LinkedIn
 
SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016
 
Couchbase Connect 2016
Couchbase Connect 2016Couchbase Connect 2016
Couchbase Connect 2016
 
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
APRICOT 2017: Trafficshifting: Avoiding Disasters & Improving Performance at ...
 
Using SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production SystemsUsing SaltStack to Auto Triage and Remediate Production Systems
Using SaltStack to Auto Triage and Remediate Production Systems
 
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedInReducing MTTR and False Escalations: Event Correlation at LinkedIn
Reducing MTTR and False Escalations: Event Correlation at LinkedIn
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
Software reliability tools and common software errors
Software reliability tools and common software errorsSoftware reliability tools and common software errors
Software reliability tools and common software errors
 
How TPM saves the day
How TPM saves the dayHow TPM saves the day
How TPM saves the day
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
 
Software Reliability Engineering
Software Reliability EngineeringSoftware Reliability Engineering
Software Reliability Engineering
 
Software reliability growth model
Software reliability growth modelSoftware reliability growth model
Software reliability growth model
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and Couchbase
 
Feedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potentialFeedback loops: How SREs benefit and what is needed to realize their potential
Feedback loops: How SREs benefit and what is needed to realize their potential
 
Load balancing in the SRE way
Load balancing in the SRE wayLoad balancing in the SRE way
Load balancing in the SRE way
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxini
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 

Ähnlich wie Couchbase Meetup Jan 2016

LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...
kawamuray
 

Ähnlich wie Couchbase Meetup Jan 2016 (20)

CA Technologies Customer Presentation
CA Technologies Customer PresentationCA Technologies Customer Presentation
CA Technologies Customer Presentation
 
Data Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDCData Stream Processing for Beginners with Kafka and CDC
Data Stream Processing for Beginners with Kafka and CDC
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Ruslan Belkin And Sean Dawson on LinkedIn's Network Updates Uncovered
Ruslan Belkin And Sean Dawson on LinkedIn's Network Updates UncoveredRuslan Belkin And Sean Dawson on LinkedIn's Network Updates Uncovered
Ruslan Belkin And Sean Dawson on LinkedIn's Network Updates Uncovered
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
Unlocking big data with Hadoop + MySQL
Unlocking big data with Hadoop + MySQLUnlocking big data with Hadoop + MySQL
Unlocking big data with Hadoop + MySQL
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Music streams
Music streamsMusic streams
Music streams
 
Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks Customer Feedback Analytics for Starbucks
Customer Feedback Analytics for Starbucks
 
Microservices Journey Summer 2017
Microservices Journey Summer 2017Microservices Journey Summer 2017
Microservices Journey Summer 2017
 
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL DatabaseModern ETL: Azure Data Factory, Data Lake, and SQL Database
Modern ETL: Azure Data Factory, Data Lake, and SQL Database
 
LinkedIn Infrastructure (analytics@webscale, at fb 2013)
LinkedIn Infrastructure (analytics@webscale, at fb 2013)LinkedIn Infrastructure (analytics@webscale, at fb 2013)
LinkedIn Infrastructure (analytics@webscale, at fb 2013)
 
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
Billions of Messages in Real Time: Why Paypal & LinkedIn Trust an Engagement ...
 
LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...LINE's messaging service architecture underlying more than 200 million monthl...
LINE's messaging service architecture underlying more than 200 million monthl...
 
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
 
MySQL en el mundo real. Evolución desde la compra por Oracle
MySQL en el mundo real. Evolución desde la compra por OracleMySQL en el mundo real. Evolución desde la compra por Oracle
MySQL en el mundo real. Evolución desde la compra por Oracle
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 

Mehr von Michael Kehoe

Mehr von Michael Kehoe (17)

eBPF Workshop
eBPF WorkshopeBPF Workshop
eBPF Workshop
 
eBPF Basics
eBPF BasicseBPF Basics
eBPF Basics
 
Code Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart wayCode Yellow: Helping operations top-heavy teams the smart way
Code Yellow: Helping operations top-heavy teams the smart way
 
QConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready ApplicationsQConSF 2018: Building Production-Ready Applications
QConSF 2018: Building Production-Ready Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
AllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortemsAllDayDevops: What the NTSB teaches us about incident management & postmortems
AllDayDevops: What the NTSB teaches us about incident management & postmortems
 
Linux Container Basics
Linux Container BasicsLinux Container Basics
Linux Container Basics
 
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet DropsPapers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
Papers We Love Sept. 2018: 007: Democratically Finding The Cause of Packet Drops
 
What the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortemsWhat the NTSB teaches us about incident management & postmortems
What the NTSB teaches us about incident management & postmortems
 
PyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python ApplicationsPyBay 2018: Production-Ready Python Applications
PyBay 2018: Production-Ready Python Applications
 
Helping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart wayHelping operations top-heavy teams the smart way
Helping operations top-heavy teams the smart way
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
Building Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSFBuilding Production-Ready Microservices: DevopsExchangeSF
Building Production-Ready Microservices: DevopsExchangeSF
 
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
SF Chaos Engineering Meetup: Building Disaster Recovery via Resilience Engine...
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
SRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREsSRECon-Europe-2017: Networks for SREs
SRECon-Europe-2017: Networks for SREs
 
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scaleVelocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Couchbase Meetup Jan 2016

  • 1.
  • 2. Michael Kehoe Senior Site Reliability Engineer LinkedIn LinkedIn’s Big Data Pipeline with Kafka, Hadoop and Couchbase
  • 3. 3 $ whoami Michael Kehoe • Sr Site Reliability Engineer (SRE) • Member of CBVT • B.E. (Electrical Engineering) from the University of Queensland, Australia
  • 4. 4 Kafka @ LinkedIn • Kafka was created by LinkedIn • Kafka is a publish-subscribe system as a distributed commit log • Processes 500+ TB/ day (~500 billion messages)
  • 5. 5 LinkedIn’s use of Kafka • Monitoring • Pub-Sub Messaging • Analytics • Building block for (log) distributed application • Samza • Espresso • Pinot
  • 6. Kafka to Hadoop (Analytics) 6 Use Case • LinkedIn tracks data to better understand how members use our products • Information such as which page got viewed and which content got clicked on are sent into a Kafka cluster in each data center • Some of these events are all centrally collected and pushed onto our Hadoop grid for analysis and daily report generation
  • 7. 7 Couchbase @ LinkedIn • About 80 separate services with one or more clusters in multiple data centers • Up to ~70 servers in a cluster • Single & Multi-tenant clusters
  • 8. 8 Hadoop to Couchbase • Our primary use-case for Hadoop  Couchbase is for building (warming) / restoring Couchbase buckets • LinkedIn built it’s own in-house solution to work with our ETL processes etc
  • 9. Jobs Cluster 9 Clusters & Numbers • Used for read-scaling, > 150k QPS, 27 node clusters • We use Hadoop to pre-build data by partition • Couchbase average latency is 2-3ms • 99th percentile is ~8 - 12ms
  • 11. ©2014 LinkedIn Corporation. All Rights Reserved.