SlideShare ist ein Scribd-Unternehmen logo
1 von 25
MyHeritage and Kafka

Author: Ran Levy
Feb 2014
Agenda

• MyHeritage use cases

• Possible solutions
• Kafka overview
• Actual implementation @MyHeritage
• Summary
Use cases

•

Two major use case:

– Indexing to SuperSearch and Record Matching.
– Stats reporting to BI.
Use case 1

•

Indexing to SuperSearch and Record Matching
Use case 1 – con’t

•

Custom and non-scalable solution that involved changes processing and
updating SuperSearch (SOLR over Lucene).

•

Required solution should support:
– Continuous mode.
– High throughput.
– Scaling up.
– Repeating the process from some point.
– Guaranteed order of processed items.
– Reliable.
– Multiple consumers.
Use case 2

•

Statistics reporting to BI system
Use case 2 – con’t

•

Required solution should support:
•
•
•
•

High scale (~500GB of data / day).
Scale up – few hundred millions per day.
Repeating the process from some point.
Multiple consumers.
Agenda

 MyHeritage use cases

•

Possible solutions

•

Kafka overview

•

Actual implementation @MyHeritage

•

Summary
Possible Solutions

•

So what we have considered ….
– DB

•

Queues
Possible Solutions

•

Key point about queues
– Messages are deleted after consumed.
– Messages are duplicated to support multiple readers.
Agenda

 MyHeritage use cases

 Possible solutions
•

Kafka overview

•

Actual implementation @MyHeritage

•

Summary
Kafka Overview

•

A high throughput distributed messaging system

–
–
–
–
–

Fast
Scalable
Durable
Distributed by design
Simplicity (over functionality)
Kafka Overview

•

Fast (very fast) – both for producer and consumer

Reference: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
Kafka Overview

•

Main entities
– Producer – push data.
– Consumer – pull data.
– Brokers – load balance producers by partition.
– Topic – feeds of messages belongs to the same logical category.
Kafka Overview – some internals

•

Communication between the clients and the servers is done with a simple,
high-performance TCP protocol.

•

For each topic, the Kafka cluster maintains a partitioned log which is a
commit-log (appends only).
Kafka Overview – some internals

•

Messages stay on disk when consumed, deleted after defined TTL.

•

The partitions of the log are distributed over the servers in the Kafka cluster
with each server handling data and requests for a share of the partitions.

•

Each partition is replicated across a configurable number of servers for fault
tolerance.
Agenda

 MyHeritage use cases
 Possible solutions
 Kafka overview
•

Actual implementation @MyHeritage

•

Summary
High Level Overview

…

Daemons

Family Tree
changes Topic

Family Tree
changes Topic

part 1

part 1

part 2

part 2
DRBD
replica
Of
Broker
2

part 32

Consumers

Activity Topic

Indexing

part 1

part 1

RecordMatching

part 2

part 2

…

part 32

…

Face recog.

Broker 2

…

Web

Broker 1

…

Producers

Logstash reader

part 32

part 32

Activity Topic

DRBD
replica
Of
Broker
1
Kafka @Myheritage - producers

App
App
Module
App
Module
Module

Subscriber
Dispatch event

Events
System

Notify

Subscriber
EventLogger
Subscriber

Activity
Manage
r

ILogWrite
Kafka @Myheritage - producers

Topic

BrokersConfig

IStats
KafkaWriter

ISelector

ILogger

ISerializer
Kafka @Myheritage - producers

App
App
Module
App
Module
Module

Subscriber
Dispatch event

Events
System

Notify

Subscriber
EventLogger
Subscriber

KafkaWriter
(if failed) Attempt 2nd
broker

Broker

Attempt 1st broker

Broker
Kafka @Myheritage – Consumers (Indexing)
1 Per consumer
type, reader per
partition

KafkaWatermark
Get/update watermark

Broker 1

EventProcessor
EventProcessor
EventProcessor
Broker 2

Add event to queue

IndexingQueue
Fetch work

IndexingWorkers
IndexingWorkers
IndexingWorkers

Update item

SOLR
Agenda

 MyHeritage use cases

 Possible solutions
 Kafka overview
 Actual implementation @MyHeritage
•

Summary
Summary

Kafka is very fast and scalable system, that
is extensively used at MyHeritage, and you
would want to consider it for high scale
systems you are using.
Thank you and questions

ranl@myheritage.com

Weitere ähnliche Inhalte

Was ist angesagt?

How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...HostedbyConfluent
 
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, ConfluentIntroducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, ConfluentHostedbyConfluent
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...HostedbyConfluent
 
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...confluent
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxyconfluent
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancingconfluent
 
Upleveling Brownfield Integration
Upleveling Brownfield IntegrationUpleveling Brownfield Integration
Upleveling Brownfield IntegrationAsanka Abeyweera
 
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHow a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHostedbyConfluent
 
Relational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudRelational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudHossein Riasati
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGwen (Chen) Shapira
 
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...confluent
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIconfluent
 
Kafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless EnvironmentsKafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless Environmentsconfluent
 
Monitoring docker container and dockerized applications
Monitoring docker container and dockerized applicationsMonitoring docker container and dockerized applications
Monitoring docker container and dockerized applicationsAnanth Padmanabhan
 
Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!Paolo Castagna
 
Samza tech talk_2015 - strata
Samza tech talk_2015 - strataSamza tech talk_2015 - strata
Samza tech talk_2015 - strataYi Pan
 
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyEasily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyStreamNative
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentRakuten Group, Inc.
 
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...Flink Forward
 

Was ist angesagt? (19)

How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
How Zillow Unlocked Kafka to 50 Teams in 8 months | Shahar Cizer Kobrinsky, Z...
 
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, ConfluentIntroducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
Introducing Confluent labs Parallel Consumer client | Anthony Stubbes, Confluent
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
A Marriage of Lambda and Kappa: Supporting Iterative Development of an Event ...
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxy
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
 
Upleveling Brownfield Integration
Upleveling Brownfield IntegrationUpleveling Brownfield Integration
Upleveling Brownfield Integration
 
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, GlooHow a Data Mesh is Driving our Platform | Trey Hicks, Gloo
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
 
Relational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the CloudRelational cloud, A Database-as-a-Service for the Cloud
Relational cloud, A Database-as-a-Service for the Cloud
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service mesh
 
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa (...
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
 
Kafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless EnvironmentsKafka for Real-Time Event Processing in Serverless Environments
Kafka for Real-Time Event Processing in Serverless Environments
 
Monitoring docker container and dockerized applications
Monitoring docker container and dockerized applicationsMonitoring docker container and dockerized applications
Monitoring docker container and dockerized applications
 
Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!
 
Samza tech talk_2015 - strata
Samza tech talk_2015 - strataSamza tech talk_2015 - strata
Samza tech talk_2015 - strata
 
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyEasily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
 
Achieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environmentAchieving scale and performance using cloud native environment
Achieving scale and performance using cloud native environment
 
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
Flink Forward Berlin 2018: Andrew Torson - "Using a sharded Akka distributed ...
 

Ähnlich wie MyHeritage Kakfa use cases - Feb 2014 Meetup

Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpJosé Román Martín Gil
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaJay Kreps
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleApache Kafka TLV
 
Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache KafkaSaumitra Srivastav
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexApache Apex
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?Jagadish Venkatraman
 
MyHeritage backend group - build to scale
MyHeritage backend group - build to scaleMyHeritage backend group - build to scale
MyHeritage backend group - build to scaleRan Levy
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaRicardo Bravo
 
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...StreamNative
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...Timothy McCormick
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexityPaolo Platter
 
Megastore: Providing scalable and highly available storage
Megastore: Providing scalable and highly available storageMegastore: Providing scalable and highly available storage
Megastore: Providing scalable and highly available storageNiels Claeys
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.Data Con LA
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of stateYoni Farin
 

Ähnlich wie MyHeritage Kakfa use cases - Feb 2014 Meetup (20)

Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUpStrimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
Strimzi - Where Apache Kafka meets OpenShift - OpenShift Spain MeetUp
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Timesten Architecture
Timesten ArchitectureTimesten Architecture
Timesten Architecture
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache Kafka
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?
 
MyHeritage backend group - build to scale
MyHeritage backend group - build to scaleMyHeritage backend group - build to scale
MyHeritage backend group - build to scale
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...The Evolution of Trillion-level Real-time Messaging System in BIGO  - Puslar ...
The Evolution of Trillion-level Real-time Messaging System in BIGO - Puslar ...
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexity
 
Megastore: Providing scalable and highly available storage
Megastore: Providing scalable and highly available storageMegastore: Providing scalable and highly available storage
Megastore: Providing scalable and highly available storage
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
 

Kürzlich hochgeladen

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Kürzlich hochgeladen (20)

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

MyHeritage Kakfa use cases - Feb 2014 Meetup

  • 1. MyHeritage and Kafka Author: Ran Levy Feb 2014
  • 2. Agenda • MyHeritage use cases • Possible solutions • Kafka overview • Actual implementation @MyHeritage • Summary
  • 3. Use cases • Two major use case: – Indexing to SuperSearch and Record Matching. – Stats reporting to BI.
  • 4. Use case 1 • Indexing to SuperSearch and Record Matching
  • 5. Use case 1 – con’t • Custom and non-scalable solution that involved changes processing and updating SuperSearch (SOLR over Lucene). • Required solution should support: – Continuous mode. – High throughput. – Scaling up. – Repeating the process from some point. – Guaranteed order of processed items. – Reliable. – Multiple consumers.
  • 6. Use case 2 • Statistics reporting to BI system
  • 7. Use case 2 – con’t • Required solution should support: • • • • High scale (~500GB of data / day). Scale up – few hundred millions per day. Repeating the process from some point. Multiple consumers.
  • 8. Agenda  MyHeritage use cases • Possible solutions • Kafka overview • Actual implementation @MyHeritage • Summary
  • 9. Possible Solutions • So what we have considered …. – DB • Queues
  • 10. Possible Solutions • Key point about queues – Messages are deleted after consumed. – Messages are duplicated to support multiple readers.
  • 11. Agenda  MyHeritage use cases  Possible solutions • Kafka overview • Actual implementation @MyHeritage • Summary
  • 12. Kafka Overview • A high throughput distributed messaging system – – – – – Fast Scalable Durable Distributed by design Simplicity (over functionality)
  • 13. Kafka Overview • Fast (very fast) – both for producer and consumer Reference: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
  • 14. Kafka Overview • Main entities – Producer – push data. – Consumer – pull data. – Brokers – load balance producers by partition. – Topic – feeds of messages belongs to the same logical category.
  • 15. Kafka Overview – some internals • Communication between the clients and the servers is done with a simple, high-performance TCP protocol. • For each topic, the Kafka cluster maintains a partitioned log which is a commit-log (appends only).
  • 16. Kafka Overview – some internals • Messages stay on disk when consumed, deleted after defined TTL. • The partitions of the log are distributed over the servers in the Kafka cluster with each server handling data and requests for a share of the partitions. • Each partition is replicated across a configurable number of servers for fault tolerance.
  • 17. Agenda  MyHeritage use cases  Possible solutions  Kafka overview • Actual implementation @MyHeritage • Summary
  • 18. High Level Overview … Daemons Family Tree changes Topic Family Tree changes Topic part 1 part 1 part 2 part 2 DRBD replica Of Broker 2 part 32 Consumers Activity Topic Indexing part 1 part 1 RecordMatching part 2 part 2 … part 32 … Face recog. Broker 2 … Web Broker 1 … Producers Logstash reader part 32 part 32 Activity Topic DRBD replica Of Broker 1
  • 19. Kafka @Myheritage - producers App App Module App Module Module Subscriber Dispatch event Events System Notify Subscriber EventLogger Subscriber Activity Manage r ILogWrite
  • 20. Kafka @Myheritage - producers Topic BrokersConfig IStats KafkaWriter ISelector ILogger ISerializer
  • 21. Kafka @Myheritage - producers App App Module App Module Module Subscriber Dispatch event Events System Notify Subscriber EventLogger Subscriber KafkaWriter (if failed) Attempt 2nd broker Broker Attempt 1st broker Broker
  • 22. Kafka @Myheritage – Consumers (Indexing) 1 Per consumer type, reader per partition KafkaWatermark Get/update watermark Broker 1 EventProcessor EventProcessor EventProcessor Broker 2 Add event to queue IndexingQueue Fetch work IndexingWorkers IndexingWorkers IndexingWorkers Update item SOLR
  • 23. Agenda  MyHeritage use cases  Possible solutions  Kafka overview  Actual implementation @MyHeritage • Summary
  • 24. Summary Kafka is very fast and scalable system, that is extensively used at MyHeritage, and you would want to consider it for high scale systems you are using.
  • 25. Thank you and questions ranl@myheritage.com