MyHeritage Kafka use cases - Feb 2014 Meetup

MyHeritage and Kafka

Author: Ran Levy
Feb 2014

Agenda

• MyHeritage use cases

• Possible solutions
• Kafka overview
• Actual implementation @MyHeritage
• Summary

Use cases

• Two major use cases:
– Indexing to SuperSearch and Record Matching.
– Stats reporting to BI.

Use case 1

• Indexing to SuperSearch and Record Matching

Use case 1 – cont’d

• The existing solution was custom and non-scalable; it involved processing
changes and updating SuperSearch (SOLR over Lucene).

• Required solution should support:
– Continuous mode.
– High throughput.
– Scaling up.
– Repeating the process from some point.
– Guaranteed order of processed items.
– Reliability.
– Multiple consumers.

Use case 2

• Statistics reporting to BI system

Use case 2 – cont’d

• Required solution should support:
– High scale (~500GB of data / day).
– Scale up – a few hundred million per day.
– Repeating the process from some point.
– Multiple consumers.
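
To get a feel for these numbers, the daily totals can be turned into average per-second rates. This is a rough sketch only: the 500GB/day figure is from the slide, while the 300 million messages/day value is an assumed stand-in for "a few hundred million", and it assumes that count refers to messages. Peak rates would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(bytes_per_day: float, messages_per_day: float) -> dict:
    """Convert daily totals into average per-second rates."""
    return {
        "mb_per_second": bytes_per_day / SECONDS_PER_DAY / 1e6,
        "messages_per_second": messages_per_day / SECONDS_PER_DAY,
        "avg_message_bytes": bytes_per_day / messages_per_day,
    }


# 500 GB/day comes from the slide; 300 million/day is an assumption
# used only to make the arithmetic concrete.
rates = average_rates(bytes_per_day=500e9, messages_per_day=300e6)
for name, value in rates.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions the sustained average is a few megabytes and a few thousand messages per second.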

Possible Solutions

• So what have we considered?
– DB
– Queues

Possible Solutions

• Key points about queues:
– Messages are deleted after they are consumed.
– Messages are duplicated to support multiple readers.
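
The two queue properties above can be illustrated with a toy model. This is a minimal sketch, not any particular queue product: `FanOutQueue` and the reader names are hypothetical.

```python
from collections import deque


class FanOutQueue:
    """Toy queue broker: one private queue per registered reader."""

    def __init__(self, readers):
        self._queues = {name: deque() for name in readers}

    def publish(self, message):
        # The message is duplicated once per reader.
        for q in self._queues.values():
            q.append(message)

    def consume(self, reader):
        # Consuming removes the message; it cannot be read again.
        q = self._queues[reader]
        return q.popleft() if q else None

    def stored_copies(self):
        return sum(len(q) for q in self._queues.values())


broker = FanOutQueue(["indexing", "record_matching"])
broker.publish("tree-change-1")
broker.publish("tree-change-2")
copies_before = broker.stored_copies()  # 2 messages x 2 readers

first = broker.consume("indexing")
second = broker.consume("indexing")
third = broker.consume("indexing")  # nothing left, so no replay is possible
```

Storage grows with the number of readers, and a reader cannot repeat the process from an earlier point, which conflicts with the requirements listed for both use cases.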

Kafka Overview

• A high-throughput distributed messaging system:
– Fast
– Scalable
– Durable
– Distributed by design
– Simplicity (over functionality)

Kafka Overview

• Fast (very fast) – both for producer and consumer

Reference: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

Kafka Overview

• Main entities:
– Producer – pushes data.
– Consumer – pulls data.
– Brokers – load balance producers by partition.
– Topic – a feed of messages that belong to the same logical category.
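
How a producer might spread messages over a topic's partitions can be sketched as follows. This is an illustration under assumptions: `Topic`, `partition_for`, and the key names are hypothetical, CRC32 is used only because it is a stable hash (it is not the hash real Kafka clients use), and 32 partitions mirrors the "part 32" label in the deck's diagram.

```python
import zlib

NUM_PARTITIONS = 32  # assumption, mirrors "part 1 ... part 32" in the diagram


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Topic:
    """Toy topic: a named list of partitions, each an ordered list."""

    def __init__(self, name: str, num_partitions: int = NUM_PARTITIONS):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        p = partition_for(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p


topic = Topic("family-tree-changes")
for step in ("create", "rename", "delete"):
    topic.produce("tree-17", step)
topic.produce("tree-42", "create")

p = partition_for("tree-17")
tree_17_events = [v for k, v in topic.partitions[p] if k == "tree-17"]
```

Because every message with the same key lands in the same partition, the order of events for that key is preserved, which is one way to meet a "guaranteed order of processed items" requirement.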

Kafka Overview – some internals

• Communication between the clients and the servers is done with a simple,
high-performance TCP protocol.

• For each topic, the Kafka cluster maintains a partitioned log, which is a
commit log (append-only).
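
An append-only log with offsets can be sketched in a few lines. This is a simplified in-memory model (`PartitionLog` is a hypothetical name), not how Kafka stores data on disk.

```python
class PartitionLog:
    """Toy append-only log for one partition."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Add a record at the end and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
offsets = [log.append(event) for event in ("e0", "e1", "e2", "e3")]

latest = log.read(offset=2)   # continue from where a reader left off
replayed = log.read(offset=0)  # or repeat the process from an earlier point
```

Records are never modified in place, so a reader only needs to remember an offset to resume or replay.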

Kafka Overview – some internals

• Messages stay on disk after they are consumed and are deleted after a
defined TTL.

• The partitions of the log are distributed over the servers in the Kafka
cluster, with each server handling data and requests for a share of the
partitions.

• Each partition is replicated across a configurable number of servers for
fault tolerance.
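
The retention behavior in the first point can be sketched as follows: reading does not delete anything, each consumer tracks its own offset, and only age removes messages. `RetainedLog` and the consumer names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
class RetainedLog:
    """Toy log whose records expire by age, not by consumption."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._records = []  # (offset, timestamp, payload)
        self._next_offset = 0

    def append(self, payload, now: float) -> int:
        offset = self._next_offset
        self._records.append((offset, now, payload))
        self._next_offset += 1
        return offset

    def read_from(self, offset: int) -> list:
        # Reading never removes records.
        return [(o, p) for o, _, p in self._records if o >= offset]

    def expire(self, now: float) -> None:
        # Drop only the records older than the TTL.
        self._records = [r for r in self._records if now - r[1] < self.ttl]


log = RetainedLog(ttl_seconds=100)
log.append("e0", now=0)
log.append("e1", now=50)
log.append("e2", now=120)

# Each consumer keeps its own position in the same stored data.
consumer_offsets = {"indexing": 0, "bi": 2}
for_indexing = log.read_from(consumer_offsets["indexing"])
for_bi = log.read_from(consumer_offsets["bi"])

log.expire(now=130)  # "e0" is now older than the TTL
after_expiry = log.read_from(0)
```

Unlike the queue model, multiple consumers share a single copy of the data.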

High Level Overview

[Diagram: Producers (Web, Daemons, Logstash reader, …) write to Broker 1 and
Broker 2. Each broker is drawn with the Family Tree changes Topic and the
Activity Topic, each with partitions part 1, part 2, … part 32. The diagram
also shows a DRBD replica of Broker 1 and a DRBD replica of Broker 2.
Consumers (Indexing, RecordMatching, Face recog., …) read from the brokers.]


Kafka @MyHeritage – producers

[Diagram: App modules dispatch events to the Events System, which notifies
its Subscribers; one of the subscribers is EventLogger. Other labels in the
diagram: ActivityManager, ILogWrite, KafkaWriter, Topic, BrokersConfig,
IStats, ISelector, ILogger, ISerializer.]

[Diagram: the same flow from App modules through the Events System to the
EventLogger subscriber, which uses KafkaWriter. KafkaWriter attempts the 1st
broker and, if that fails, attempts the 2nd broker.]
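
The failover behavior in the producer diagram (attempt the 1st broker, then the 2nd if that fails) can be sketched as below. This is not the actual KafkaWriter code: `write_with_failover`, `AllBrokersFailed`, the broker addresses, and the injected `send` function are all hypothetical, and a fake `send` is used so the example runs without a Kafka cluster.

```python
from typing import Callable, Sequence


class AllBrokersFailed(Exception):
    """Raised when no broker in the list accepted the message."""


def write_with_failover(
    message: bytes,
    brokers: Sequence[str],
    send: Callable[[str, bytes], None],
) -> str:
    """Try each broker in order; return the one that accepted the message."""
    errors = []
    for broker in brokers:
        try:
            send(broker, message)
            return broker
        except ConnectionError as exc:
            errors.append(f"{broker}: {exc}")
    raise AllBrokersFailed("; ".join(errors))


# Fake transport for illustration: the first broker is down.
delivered = []


def fake_send(broker: str, message: bytes) -> None:
    if broker == "broker-1:9092":
        raise ConnectionError("connection refused")
    delivered.append((broker, message))


used = write_with_failover(
    b'{"event": "tree-change"}',
    ["broker-1:9092", "broker-2:9092"],
    fake_send,
)
```

Passing the transport in as a function keeps the retry logic testable on its own.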

Kafka @MyHeritage – Consumers (Indexing)

[Diagram: EventProcessors (labelled "1 per consumer type, reader per
partition") read from Broker 1 and Broker 2 and get/update their watermark
through KafkaWatermark. Each EventProcessor adds events to the IndexingQueue;
IndexingWorkers fetch work from the queue and update items in SOLR.]
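
The consumer flow in the diagram (read from the stored watermark, queue the events, advance the watermark) can be sketched as below. All names are hypothetical stand-ins: `WatermarkStore` plays the role of KafkaWatermark, a plain list plays the role of a partition, and `queue.Queue` plays the role of the IndexingQueue. Advancing the watermark right after enqueueing is an assumption; the deck does not say when the real system updates it.

```python
import queue


class WatermarkStore:
    """Next offset to read, per (consumer type, partition)."""

    def __init__(self):
        self._marks = {}

    def get(self, consumer: str, partition: int) -> int:
        return self._marks.get((consumer, partition), 0)

    def update(self, consumer: str, partition: int, next_offset: int) -> None:
        self._marks[(consumer, partition)] = next_offset


def process_partition(consumer, partition, log, marks, work_queue) -> int:
    """Enqueue every event past the watermark; return how many were read."""
    start = marks.get(consumer, partition)
    for offset in range(start, len(log)):
        work_queue.put(log[offset])
        marks.update(consumer, partition, offset + 1)
    return len(log) - start


partition_log = ["e0", "e1", "e2"]
marks = WatermarkStore()
indexing_queue = queue.Queue()

first_run = process_partition("indexing", 0, partition_log, marks, indexing_queue)
partition_log.append("e3")
second_run = process_partition("indexing", 0, partition_log, marks, indexing_queue)

# A different consumer type has its own watermark and starts from the beginning.
matching_queue = queue.Queue()
matching_run = process_partition("record_matching", 0, partition_log, marks, matching_queue)

# Workers would fetch work from the queue and update the search index.
queued = [indexing_queue.get() for _ in range(indexing_queue.qsize())]
```

Resetting a watermark to an earlier offset is how such a design could repeat the process from some point.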

Summary

Kafka is a very fast and scalable system that
is used extensively at MyHeritage, and it is
worth considering for the high-scale systems
you work with.

Thank you and questions

ranl@myheritage.com
