The document discusses the challenges of handling billions of events per minute at Twitter. It describes Twitter's event log pipeline architecture using microservices, event aggregation with Scribe, and event processing with MapReduce. The pipeline was later modified to use streaming technologies to reduce latency, including Apache Beam and Google Cloud services. Latency for different dataset types ranges from seconds to minutes. Future challenges include scaling further for volume and spikes while maintaining fast failure recovery times.
3. Event Log Pipeline
● User interactions generate events
● Events are grouped as Datasets
● Datasets for Data processing & Analytics
4. Challenges
● Scale
● Resource requirement
Events per minute: 1B → 5B
Datasets: ~500 → ~800
9. Event Aggregation: Problem
● Per-aggregator scaling challenges
● No multi-tenancy for datasets
● Ecosystem integration
● Difficult to maintain and improve
10. Event Aggregation
● Moved to open source Flume
● Per-aggregator improvements
● Microbatch, memory management, ...
● Optimize resource usage
● Streaming at network speed
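The micro-batching idea named on this slide can be sketched as below. This is only an illustration under stated assumptions, not Flume's or Twitter's implementation: the `MicroBatcher` class, its `max_events` and `max_age_sec` limits, and the `sink` callback are hypothetical names introduced here.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Buffers events and hands them to a sink in batches (hypothetical sketch)."""

    def __init__(
        self,
        sink: Callable[[List[bytes]], None],
        max_events: int = 1000,       # assumed size limit per batch
        max_age_sec: float = 1.0,     # assumed time limit per batch
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.sink = sink
        self.max_events = max_events
        self.max_age_sec = max_age_sec
        self.clock = clock
        self._buffer: List[bytes] = []
        self._first_ts: Optional[float] = None

    def add(self, event: bytes) -> None:
        """Append one event; flush when the batch is full or too old."""
        now = self.clock()
        if not self._buffer:
            self._first_ts = now  # age is measured from the first buffered event
        self._buffer.append(event)
        too_big = len(self._buffer) >= self.max_events
        too_old = now - self._first_ts >= self.max_age_sec
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Send the buffered events as one batch and start a fresh buffer."""
        if self._buffer:
            self.sink(self._buffer)
            self._buffer = []
            self._first_ts = None
```

Bounding the batch by both size and age caps the memory held per aggregator while keeping the added latency predictable.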
11. Event Aggregation
● Memory spikes before and after
12. Event Aggregation
● Data tiers with priorities
● Dataset groups
● Aggregator groups
● Dynamic scaling
13. Event Aggregation
● Scaling Aggregation
Diagram labels: Dataset 1, Dataset 2, Dataset 3; Aggregator Group 1, Aggregator Group 2; Service Discovery; Dataset Storage
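One way to picture the relationship between datasets, aggregator groups, and service discovery is the sketch below. It is an assumption-laden illustration, not the actual design: the `AggregatorGroup` and `ServiceDiscovery` classes, the `tier` convention, and all host and dataset names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AggregatorGroup:
    name: str
    tier: int  # assumed convention: lower number = higher priority tier
    hosts: List[str] = field(default_factory=list)


class ServiceDiscovery:
    """Maps each dataset to the aggregator group that serves it (hypothetical)."""

    def __init__(self) -> None:
        self._groups: Dict[str, AggregatorGroup] = {}
        self._dataset_to_group: Dict[str, str] = {}

    def register_group(self, group: AggregatorGroup) -> None:
        self._groups[group.name] = group

    def assign(self, dataset: str, group_name: str) -> None:
        """Place a dataset in a group, isolating it from other groups."""
        if group_name not in self._groups:
            raise KeyError(f"unknown aggregator group: {group_name}")
        self._dataset_to_group[dataset] = group_name

    def resolve(self, dataset: str) -> List[str]:
        """Return the hosts a client should send this dataset's events to."""
        group = self._groups[self._dataset_to_group[dataset]]
        return list(group.hosts)

    def scale(self, group_name: str, hosts: List[str]) -> None:
        """Dynamic scaling: replace a group's host list without touching others."""
        self._groups[group_name].hosts = list(hosts)
```

Because clients look up hosts per dataset, one group can be resized or can absorb a misbehaving dataset without affecting the datasets assigned to another group.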
14. Event Processing: Problem
● Thousands of process jobs
● Multitenancy
● Long tail reducers
● Unpredictable run time (traffic surge)
16. Event Processing
● Apache Tez based processor
● Data tier processors
● Dynamic Hash based partition
Diagram: Raw Events → Partition, Partition, Partition
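The slide does not say how the dynamic hash-based partitioning works. A minimal sketch, assuming "dynamic" means the partition count is sized to the input volume and events are then spread by a stable hash of a key, could look like this. The function names and the `target_per_partition` parameter are hypothetical, and this is plain Python rather than Tez code.

```python
import math
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (key, payload); shape assumed for illustration


def choose_partition_count(num_events: int, target_per_partition: int) -> int:
    """Pick enough partitions that each holds roughly the target event count."""
    return max(1, math.ceil(num_events / target_per_partition))


def partition_events(
    events: List[Event], target_per_partition: int
) -> Tuple[int, Dict[int, List[Event]]]:
    """Spread events across partitions using a stable hash of the key."""
    n = choose_partition_count(len(events), target_per_partition)
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for key, payload in events:
        # crc32 is stable across runs, unlike Python's salted built-in hash()
        idx = zlib.crc32(key.encode("utf-8")) % n
        partitions[idx].append((key, payload))
    return n, dict(partitions)
```

Sizing the partition count to the input is one way to address long-tail reducers and traffic surges: a larger hour gets more partitions instead of slower ones.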
17. Migration
● Transparent migration
● More than 5 Billion events per minute
● Per dataset scaling
● Reliable and fault tolerant framework
18. Migration
● Transparent migration
● More than 5 Billion events per minute
● Per dataset scaling
● Reliable and fault tolerant framework
But... Latency!
19. Near real time event streaming
● Analytics on real time user interactions
● End to end latency of minutes
● Data storage with real time insertion
24. Latency
Event latency by dataset type:
Dataset Type | p90 | p95 | p99
Dataset 1 (very small datasets) | 0.42 sec | 0.48 sec | 0.59 sec
Dataset 2 (Smaller datasets) | 2.6 sec | 2.8 sec | 8.4 sec
Dataset 3 (Large datasets) | 22.1 sec | 29.9 sec | 67.7 sec
Dataset 4 (Large datasets 5 min micro batch) | 6.3 min | 6.5 min | 7.2 min
● Promising results
● Independent pipelines
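For readers unfamiliar with the p90/p95/p99 columns, the sketch below shows one common way to compute such percentiles from raw latency samples, using the nearest-rank method. The deck does not say which method or tooling produced its numbers; the `percentile` helper and the sample values are made up for illustration.

```python
from typing import List


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be an integer in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Made-up latencies in seconds, not measurements from the talk
latencies = [0.31, 0.35, 0.36, 0.38, 0.40, 0.41, 0.42, 0.44, 0.47, 0.58]
summary = {f"p{p}": percentile(latencies, p) for p in (90, 95, 99)}
```

A p99 far above the p90, as in the Dataset 2 and Dataset 3 rows, indicates a small fraction of events that take much longer than the rest.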
25. New challenges and future
● Scale for volume and spikes
● Stream processing and ingestion at scale
● Faster catch up on failures
● Change Event Log Pipeline to Event Streaming Pipelines
Follow @TwitterEng
@TwitterCareers
What are events, and what does the Event Log Pipeline look like? User interactions generate events.
These flow through our Event Log Pipeline, which produces datasets.
These datasets are available for data processing and data analytics.
What was our challenge?
We used to see year-over-year (YoY) growth. We had projected growth from 1B events per minute to 5B events per minute.
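To put the projected growth in per-second terms, here is a quick back-of-the-envelope conversion. Only the 1B and 5B per-minute figures come from the talk; the helper name is hypothetical.

```python
def events_per_second(events_per_minute: float) -> float:
    """Convert a per-minute event rate to a per-second rate."""
    return events_per_minute / 60


current = events_per_second(1_000_000_000)    # about 16.7 million events per second
projected = events_per_second(5_000_000_000)  # about 83.3 million events per second
growth_factor = projected / current           # 5x
```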
High-level architecture:
Clients emit events from more than 100K instances.
The Event Aggregation framework runs on about 3K instances and is built on Scribe.
Once the aggregators generate raw datasets, these are further curated and optimized in the processor,
which generates hourly batches.
Per-aggregator scaling challenge: stress testing showed a cap on how many events an aggregator can handle because of a CPU bottleneck.
There was no multi-tenancy. A bad dataset could cause problems to