1
Processing
“BIG-DATA”
In Real Time
Yanai Franchi, Tikal
2
Two years ago...
3
4
Vacation to Barcelona
5
After a Long Travel Day
6
Going to a Salsa Club
7
Best Salsa Club
NOW
● Good Music
● Crowded –
Now!
8
Same Problem in “gogobot”
9
10
gogobot checkin
Heat Map Service
Let's Develop
“Gogobot Checkins Heat-Map”
11
Key Notes
● Collector Service - Collects checkins as text addresses
– We need to use GeoLocation Service
● Upon elapsed interval, the last locations list will be
displayed as Heat-Map in GUI.
● Web Scale service – tens of thousands of checkins/second all over the
world (imaginary, but let's do it for the exercise).
● Accuracy – Sample data, NOT critical data.
– Proportionately representative
– Data volume is large enough to compensate for data loss.
12
Heat-Map Context
Text-Address
Checkins Heat-Map
Service
Gogobot System
Gogobot
Micro Service
Gogobot
Micro Service
Gogobot
Micro Service
Geo Location
Service
Get-GeoCode(Address)
Heat-Map
Last Interval Locations
13
Database
Persist Checkin
Intervals
Processing
Checkins
Read
Text Address
Check-in #1
Check-in #2
Check-in #3
Check-in #4
Check-in #5
Check-in #6
Check-in #7
Check-in #8
Check-in #9
...
Simulate Checkins with a File
Plan A
GET Geo
Location
Geo Location
Service
14
Tons of Addresses
Arriving Every Second
15
Architect - First Reaction...
16
Second Reaction...
17
Developer
First
Reaction
18
Second
Reaction
19
Problems ?
● Tedious: You spend time configuring where to send
messages, deploying workers, and deploying
intermediate queues.
● Brittle: There's little fault-tolerance.
● Painful to scale: Partitioning of running workers is
complicated.
20
What We Want ?
● Horizontal scalability
● Fault-tolerance
● No intermediate message brokers!
● Higher level abstraction than message
passing
● “Just works”
● Guaranteed data processing (not in this
case)
21
Apache Storm
✔Horizontal scalability
✔Fault-tolerance
✔No intermediate message brokers!
✔Higher level abstraction than message
passing
✔“Just works”
✔Guaranteed data processing
22
Anatomy of Storm
23
What is Storm ?
● CEP - Open source and distributed realtime
computation system.
– Makes it easy to reliably process unbounded streams of tuples
– Doing for realtime processing what Hadoop did for batch processing.
● Fast - 1M Tuples/sec per node.
– It is scalable, fault-tolerant, guarantees your data will be
processed, and is easy to set up and operate.
24
Streams
Tuple Tuple Tuple Tuple Tuple Tuple
Unbounded sequence of tuples
25
Spouts
Tuple
Tuple
Sources of Streams
Tuple Tuple
26
Bolts
Tuple
TupleTuple
Processes input streams and produces
new streams
Tuple
TupleTupleTuple
Tuple TupleTuple
27
Storm Topology
Network of spouts and bolts
Tuple
TupleTuple
TupleTuple TupleTuple
Tuple TupleTupleTuple
Tuple
Tuple
Tuple
Tuple TupleTupleTuple
28
Guarantee for Processing
● Storm guarantees the full processing of a tuple by
tracking its state
● In case of failure, Storm can re-process it.
● Source tuples with full “acked” trees are removed
from the system
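One way to picture the tracking: Storm's acker keeps a single 64-bit value per source tuple and XORs into it the id of every tuple in the tree, once when the tuple is anchored and once when it is acked. When the value returns to zero, the tree is fully processed. The sketch below imitates that idea in plain Java. The class and method names (`AckTracker`, `anchored`, `acked`, `isPending`) are hypothetical and are not Storm's API.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: tracks tuple trees the way an XOR-based acker could. */
class AckTracker {
    // source tuple id -> XOR of the ids that were anchored but not yet acked
    private final Map<Long, Long> pending = new HashMap<>();

    /** Registers a tuple (the source itself or a descendant) as part of the tree. */
    void anchored(long sourceId, long tupleId) {
        pending.merge(sourceId, tupleId, (a, b) -> a ^ b);
    }

    /** Acks a tuple; returns true when the whole tree of sourceId is complete. */
    boolean acked(long sourceId, long tupleId) {
        long value = pending.getOrDefault(sourceId, 0L) ^ tupleId;
        if (value == 0L) {
            pending.remove(sourceId);   // fully processed: forget the source tuple
            return true;
        }
        pending.put(sourceId, value);
        return false;
    }

    /** True while the source tuple still has un-acked descendants. */
    boolean isPending(long sourceId) {
        return pending.containsKey(sourceId);
    }
}
```

A source tuple that is still pending when its timeout expires is the one a spout would be asked to replay.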
29
Tasks (Bolt/Spout Instance)
Spouts and bolts execute as
many tasks across the cluster
30
Stream Grouping
When a tuple is emitted, which task
(instance) does it go to?
31
Stream Grouping
● Shuffle grouping: pick a random task
● Fields grouping: consistent hashing on a subset of
tuple fields
● All grouping: send to all tasks
● Global grouping: pick task with lowest id
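The four groupings can be sketched as functions from a tuple to the receiving task index. This is a simplified illustration in plain Java, not Storm's implementation: `GroupingSketch` and its method names are hypothetical, and the fields grouping here uses a plain hash modulo the task count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Illustrative only: how each grouping could choose the receiving task(s). */
class GroupingSketch {
    /** Shuffle grouping: any task may receive the tuple. */
    static int shuffle(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    /** Fields grouping: equal values of the grouping fields always reach the same task. */
    static int fields(List<?> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    /** All grouping: every task receives a copy. */
    static int[] all(int numTasks) {
        int[] tasks = new int[numTasks];
        for (int i = 0; i < numTasks; i++) {
            tasks[i] = i;
        }
        return tasks;
    }

    /** Global grouping: the task with the lowest id receives everything. */
    static int global(int[] taskIds) {
        return Arrays.stream(taskIds).min().getAsInt();
    }
}
```

Fields grouping is what lets a stateful bolt keep all tuples of one key (for example one city) in a single task's memory.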
32
Tasks, Executors, Workers
Task Task Task
Worker Process
Spout /
Bolt
Spout /
Bolt
Spout /
Bolt
=
Executor Thread
JVM
Executor Thread
33
Bolt B Bolt B
Worker Process
Executor
Spout A
Executor
Node
Supervisor
Bolt C Bolt C
Executor
Bolt B Bolt B
Worker Process
Executor
Spout A
Executor
Node
Supervisor
Bolt C Bolt C
Executor
34
Nimbus
Supervisor Supervisor
Supervisor Supervisor
Supervisor Supervisor
Upload/Rebalance
Heat-Map Topology
ZooKeeper
Nodes
Storm Architecture
Master Node
(similar to Hadoop JobTracker)
NOT critical
for running topology
35
Nimbus
Supervisor Supervisor
Supervisor Supervisor
Supervisor Supervisor
Upload/Rebalance
Heat-Map Topology
ZooKeeper
Storm Architecture
Used For Cluster Coordination
A few
nodes
36
Nimbus
Supervisor Supervisor
Supervisor Supervisor
Supervisor Supervisor
Upload/Rebalance
Heat-Map Topology
ZooKeeper
Storm Architecture
Run Worker Processes
37
Assembling Heatmap Topology
38
HeatMap Input/Output Tuples
● Input Tuples: Timestamp and Text Address:
– (9:00:07 PM, “287 Hudson St New York NY 10013”)
● Output Tuple: Time interval, and a list of points for
it:
– (9:00:00 PM to 9:00:15 PM,
List((40.719,-73.987),(40.726,-74.001),(40.719,-73.987)))
39
Checkins
Spout
Geocode
Lookup
Bolt
Heatmap
Builder
Bolt
Persistor
Bolt
(9:01 PM @ 287 Hudson st)
(9:01 PM, (40.736, -74.354))
Heat Map
Storm
Topology
(9:00 PM – 9:15 PM, List((40.73, -74.34),
(51.36, -83.33), (69.73, -34.24)))
Upon
Elapsed Interval
40
Checkins Spout
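// Storm 1.x package names; releases before 1.0 use backtype.storm.* instead.
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Date;
import java.util.List;
import java.util.Map;
import org.apache.commons.io.IOUtils;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class CheckinsSpout extends BaseRichSpout {
    // We hold state. No need for thread safety: Storm calls a spout
    // instance from a single thread.
    private List<String> sampleLocations;
    private int nextEmitIndex;
    private SpoutOutputCollector outputCollector;

    @Override
    public void open(Map map, TopologyContext topologyContext,
                     SpoutOutputCollector spoutOutputCollector) {
        this.outputCollector = spoutOutputCollector;
        this.nextEmitIndex = 0;
        try (InputStream in =
                 ClassLoader.getSystemResourceAsStream("sample-locations.txt")) {
            sampleLocations = IOUtils.readLines(in, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException("Cannot read sample-locations.txt", e);
        }
    }

    // Called iteratively by Storm
    @Override
    public void nextTuple() {
        String address = sampleLocations.get(nextEmitIndex);
        String checkin = new Date().getTime() + "@ADDRESS:" + address;
        outputCollector.emit(new Values(checkin));
        nextEmitIndex = (nextEmitIndex + 1) % sampleLocations.size();
    }

    // Declare output fields
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("str"));
    }
}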
41
Geocode Lookup Bolt
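// Storm 1.x package names; releases before 1.0 use backtype.storm.* instead.
import java.util.Map;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class GeocodeLookupBolt extends BaseBasicBolt {
    private LocatorService locatorService;

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        locatorService = new GoogleLocatorService();
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
        // Input looks like "<epoch millis>@ADDRESS:<text address>"
        String str = tuple.getStringByField("str");
        String[] parts = str.split("@ADDRESS:", 2);
        Long time = Long.valueOf(parts[0]);
        String address = parts[1];
        // Get geocode, create DTO
        LocationDTO locationDTO = locatorService.getLocation(address);
        String city = locationDTO.getCity();
        outputCollector.emit(new Values(city, time, locationDTO));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer fieldsDeclarer) {
        fieldsDeclarer.declare(new Fields("city", "time", "location"));
    }
}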
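The string format shared by the spout and this bolt (`<epoch millis>@ADDRESS:<text address>`) can be exercised without Storm. Below is a minimal sketch in plain Java; `CheckinParser` and its nested `Checkin` type are hypothetical helpers, not part of the original code.

```java
/** Illustrative only: formats and parses the check-in string used between spout and bolt. */
class CheckinParser {
    static final String SEPARATOR = "@ADDRESS:";

    /** Parsed check-in: epoch milliseconds plus the free-text address. */
    static final class Checkin {
        final long time;
        final String address;

        Checkin(long time, String address) {
            this.time = time;
            this.address = address;
        }
    }

    /** Builds the string the spout emits. */
    static String format(long epochMillis, String address) {
        return epochMillis + SEPARATOR + address;
    }

    /** Splits at the first separator only, so an address may itself contain '@'. */
    static Checkin parse(String raw) {
        int idx = raw.indexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("Missing separator: " + raw);
        }
        long time = Long.parseLong(raw.substring(0, idx));
        String address = raw.substring(idx + SEPARATOR.length());
        return new Checkin(time, address);
    }
}
```

Splitting on the full separator keeps the `ADDRESS:` prefix out of the address that is sent to the geocoding service.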
42
Tick Tuple – Repeating Mantra
43
Two Streams to Heat-Map Builder
On tick tuple, we flush our Heat-Map
Checkin 1 Checkin 4 Checkin 5 Checkin 6
HeatMap-
Builder Bolt
44
Tick Tuple in Action
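// Storm 1.x package names; releases before 1.0 use backtype.storm.* instead.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.Constants;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;

public class HeatMapBuilderBolt extends BaseBasicBolt {
    // Holds the latest intervals
    private Map<String, List<LocationDTO>> heatmaps = new HashMap<>();

    @Override
    public Map<String, Object> getComponentConfiguration() {
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60);  // tick interval in seconds
        return conf;
    }

    @Override
    public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
        if (isTickTuple(tuple)) {
            // Emit accumulated intervals
        } else {
            // Add check-in info to the current interval in the Map
        }
    }

    // Tick tuples arrive from Storm's system component on the tick stream
    private boolean isTickTuple(Tuple tuple) {
        return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
            && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("time-interval", "city", "locationsList"));
    }
}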
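The two branches that the slide leaves as comments (add a check-in to its interval, flush on a tick) can be sketched without Storm. This is an assumption-laden illustration: `HeatMapAccumulator` is a hypothetical class, points are plain `double[]{lat, lng}` pairs instead of `LocationDTO`, and the per-city split of the real bolt is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: buckets check-in points by time interval and flushes on demand. */
class HeatMapAccumulator {
    private final long intervalMillis;
    // interval start (epoch millis) -> points collected in that interval
    private Map<Long, List<double[]>> buckets = new HashMap<>();

    HeatMapAccumulator(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Start of the interval that contains the timestamp. */
    long intervalStart(long epochMillis) {
        return epochMillis - Math.floorMod(epochMillis, intervalMillis);
    }

    /** Regular tuple: file the point under its interval. */
    void add(long epochMillis, double lat, double lng) {
        buckets.computeIfAbsent(intervalStart(epochMillis), k -> new ArrayList<>())
               .add(new double[] {lat, lng});
    }

    /** Tick tuple: hand back everything collected so far and start afresh. */
    Map<Long, List<double[]>> flush() {
        Map<Long, List<double[]>> flushed = buckets;
        buckets = new HashMap<>();
        return flushed;
    }
}
```

In the bolt, each entry returned by `flush()` would become one emitted tuple.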
45
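Persistor Bolt
// Imports assume Storm 1.x, Jackson 2 and Jedis; adjust to the versions in use.
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;
import redis.clients.jedis.Jedis;

public class PersistorBolt extends BaseBasicBolt {
    // Both fields are created in prepare(), which the slide does not show
    private ObjectMapper objectMapper;
    private Jedis jedis;

    @Override
    public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
        Long timeInterval = tuple.getLongByField("time-interval");
        String city = tuple.getStringByField("city");
        String locationsList;
        try {
            locationsList = objectMapper.writeValueAsString(
                    tuple.getValueByField("locationsList"));
        } catch (JsonProcessingException e) {
            throw new RuntimeException("Cannot serialize locations list", e);
        }
        String dbKey = "checkins-" + timeInterval + "@" + city;
        jedis.setex(dbKey, 3600 * 24, locationsList);  // persist in Redis for 24h
        jedis.publish("location-key", dbKey);           // publish in Redis channel for debugging
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No output fields: this version of the bolt does not emit tuples
    }
}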
46
Shuffle Grouping
Shuffle Grouping
Check-in #1
Check-in #2
Check-in #3
Check-in #4
Check-in #5
Check-in #6
Check-in #7
Check-in #8
Check-in #9
...
Sample Checkins File
Read
Text Addresses
Transforming the Tuples
Checkins
Spout
Geocode
Lookup
Bolt
Heatmap
Builder
Bolt
Database
Persistor
Bolt
Get Geo
Location
Geo Location
Service
Field Grouping(city)
Group by city
47
Heat Map Topology
// Storm 1.x package names; releases before 1.0 use backtype.storm.* instead.
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class LocalTopologyRunner {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = buildTopology();
        StormSubmitter.submitTopology(
                "local-heatmap", new Config(), builder.createTopology());
    }

    private static TopologyBuilder buildTopology() {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("checkins", new CheckinsSpout());
        builder.setBolt("geocode-lookup", new GeocodeLookupBolt())
               .shuffleGrouping("checkins");
        // Group by city so that each city is handled by one builder task
        builder.setBolt("heatmap-builder", new HeatMapBuilderBolt())
               .fieldsGrouping("geocode-lookup", new Fields("city"));
        builder.setBolt("persistor", new PersistorBolt())
               .shuffleGrouping("heatmap-builder");
        return builder;
    }
}
48
It's NOT Scaled
49
50
Scaling the Topology
public class LocalTopologyRunner {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = buildTopology();
        Config conf = new Config();
        conf.setNumWorkers(2);  // set no. of workers
        StormSubmitter.submitTopology(
                "local-heatmap", conf, builder.createTopology());
    }

    private static TopologyBuilder buildTopology() {
        TopologyBuilder builder = new TopologyBuilder();
        // The last argument of setSpout/setBolt is the parallelism hint
        builder.setSpout("checkins", new CheckinsSpout(), 4);
        // setNumTasks: increase tasks for future growth
        builder.setBolt("geocode-lookup", new GeocodeLookupBolt(), 8)
               .shuffleGrouping("checkins").setNumTasks(64);
        builder.setBolt("heatmap-builder", new HeatMapBuilderBolt(), 4)
               .fieldsGrouping("geocode-lookup", new Fields("city"));
        builder.setBolt("persistor", new PersistorBolt(), 2)
               .shuffleGrouping("heatmap-builder").setNumTasks(4);
        return builder;
    }
}
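The numbers on this slide can be read as follows: the parallelism hint is the number of executors (threads), `setNumTasks` fixes how many task instances those executors share, and the executors are divided among the worker JVMs. Below is a small arithmetic sketch, assuming an even spread and ignoring Storm's own system executors (for example ackers). `ParallelismSketch` and its methods are hypothetical helper names.

```java
/** Illustrative only: back-of-the-envelope numbers for the scaled topology. */
class ParallelismSketch {
    /** Tasks each executor (thread) runs when tasks are spread evenly; rounds up. */
    static int tasksPerExecutor(int numTasks, int numExecutors) {
        return (numTasks + numExecutors - 1) / numExecutors;
    }

    /** Executors each worker (JVM) hosts when executors are spread evenly; rounds up. */
    static int executorsPerWorker(int totalExecutors, int numWorkers) {
        return (totalExecutors + numWorkers - 1) / numWorkers;
    }

    public static void main(String[] args) {
        int totalExecutors = 4 + 8 + 4 + 2;  // parallelism hints from the slide
        System.out.println("geocode-lookup tasks/executor: " + tasksPerExecutor(64, 8));
        System.out.println("persistor tasks/executor: " + tasksPerExecutor(4, 2));
        System.out.println("executors/worker: " + executorsPerWorker(totalExecutors, 2));
    }
}
```

Keeping more tasks than executors leaves room to raise the executor count later with a rebalance, without resubmitting the topology.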
51
Database
Storm Heat-Map
Topology
Persist Checkin
Intervals
GET Geo
Location
Check-in #1
Check-in #2
Check-in #3
Check-in #4
Check-in #5
Check-in #6
Check-in #7
Check-in #8
Check-in #9
...
Read
Text Address
Sample Checkins File
Recap – Plan A
Geo Location
Service
52
We have
something working
53
Add Kafka Messaging
54
Plan B -
Kafka Spout&Bolt to HeatMap
Geocode
Lookup
Bolt
Heatmap
Builder
Bolt
Kafka
Checkins
Spout
Database
Persistor
Bolt
Geo Location
Service
Read
Text Addresses
Checkin
Kafka
Topic
Publish
Checkins
Locations
Topic
Kafka
Locations
Bolt
55
56
They all are Good
But not for all use-cases
57
Kafka
A little introduction
58
59
60
61
Pub-Sub Messaging System
62
63
64
65
66
Stateless Broker &
Doesn't Fear the File System
67
68
69
70
Topics
● Logical collections of partitions (the physical files).
● A broker contains some of the partitions for a topic
71
A partition is Consumed by
Exactly One Group's Consumer
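The rule can be illustrated with a toy assignment: within one consumer group every partition gets exactly one owner, and when a consumer leaves, its partitions are redistributed among the rest. The round-robin strategy and the `GroupAssignmentSketch` class below are assumptions made for illustration; Kafka's own assignment strategies differ in detail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: deals partitions out to the consumers of one group. */
class GroupAssignmentSketch {
    /** Round-robin assignment, so each partition has exactly one owner in the group. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("A group needs at least one consumer");
        }
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String consumer : consumers) {
            owned.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            owned.get(owner).add(partition);
        }
        return owned;
    }
}
```

A second group reading the same topic would get its own, independent assignment, which is how the same stream can feed several applications.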
72
Distributed &
Fault-Tolerant
73
Broker 1 Broker 2 Broker 3
ZooKeeper
Consumer 1 Consumer 2
Producer 1 Producer 2
74
Broker 1 Broker 2 Broker 3 Broker 4
ZooKeeper
Consumer 1 Consumer 2
Producer 1 Producer 2
75
Broker 1 Broker 2 Broker 3 Broker 4
ZooKeeper
Consumer 1 Consumer 2
Producer 1 Producer 2
76
Broker 1 Broker 2 Broker 3 Broker 4
ZooKeeper
Consumer 1 Consumer 2
Producer 1 Producer 2
77
Broker 1 Broker 2 Broker 3 Broker 4
ZooKeeper
Consumer 1 Consumer 2
Producer 1 Producer 2
78
Broker 1 Broker 2 Broker 3 Broker 4
ZooKeeper
Consumer 1 Consumer 2
Producer 1 Producer 2
79
Broker 1 Broker 2 Broker 3 Broker 4
ZooKeeper
Consumer 1 Consumer 2
Producer 1 Producer 2
80
Broker 1 Broker 2 Broker 3
ZooKeeper
Consumer 1 Consumer 2
Producer 1 Producer 2
81
Broker 1 Broker 2 Broker 3
ZooKeeper
Consumer 1 Consumer 2
Producer 1 Producer 2
82
Broker 1 Broker 2 Broker 3
ZooKeeper
Consumer 1 Consumer 2
Producer 1 Producer 2
83
Broker 1 Broker 2 Broker 3
ZooKeeper
Consumer 1
Producer 1 Producer 2
84
Broker 1 Broker 2 Broker 3
ZooKeeper
Consumer 1
Producer 1 Producer 2
85
Broker 1 Broker 2 Broker 3
ZooKeeper
Consumer 1
Producer 1 Producer 2
86
Performance Benchmark
3 Brokers
3 Producers
3 Consumers
Cheap Machines
• “Up to 2 million writes/sec on 3 cheap machines”
• Using 3 producers on 3 different machines, 3x async replication
• Only 1 producer/machine because NIC already saturated
• End-to-End Latency is about 10ms for 99.9%
• Sustained throughput as stored data grows
87
88
Add Kafka to our Topology
public class LocalTopologyRunner {
    ...
    private static TopologyBuilder buildTopology() {
        ...
        // Kafka spout: consumes checkins from the checkins topic
        builder.setSpout("checkins", new KafkaSpout(kafkaConfig), 4);
        ...
        // Kafka bolt: publishes to the locations topic
        builder.setBolt("kafkaProducer", new KafkaOutputBolt(
                "localhost:9092",
                "kafka.serializer.StringEncoder",
                "locations-topic"))
               .shuffleGrouping("persistor");
        return builder;
    }
}
89
Checkin HTTP
Reactor
Publish
Checkins
Database
Checkin
Kafka
Topic
Consume Checkins
Storm Heat-Map
Topology
Locations
Kafka
Topic
Publish
Interval Key
Persist Checkin
Intervals
Geo Location
Service
GET Geo
Location
Text-Address
90
Demo
91
Summary
When You go out to Salsa Club...
● Good Music
● Crowded
92
More Conclusions..
● BigData – Also refers to the Velocity of data (not only the
Volume of data)
● Storm – Great for real-time BigData processing.
Complementary to Hadoop batch jobs.
● Kafka – Great messaging for logs/events data; it serves
as a good “source” for a Storm spout
93
Thanks

Weitere ähnliche Inhalte

Was ist angesagt?

Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScyllaDB
 
Anatomy of an action
Anatomy of an actionAnatomy of an action
Anatomy of an actionGordon Chung
 
Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3Rob Skillington
 
Writing Applications for Scylla
Writing Applications for ScyllaWriting Applications for Scylla
Writing Applications for ScyllaScyllaDB
 
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...NoSQLmatters
 
Beyond the DSL - Unlocking the power of Kafka Streams with the Processor API
Beyond the DSL - Unlocking the power of Kafka Streams with the Processor APIBeyond the DSL - Unlocking the power of Kafka Streams with the Processor API
Beyond the DSL - Unlocking the power of Kafka Streams with the Processor APIconfluent
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and MetricsRicardo Lourenço
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxData
 
How Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferHow Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferScyllaDB
 
SignalFx: Making Cassandra Perform as a Time Series Database
SignalFx: Making Cassandra Perform as a Time Series DatabaseSignalFx: Making Cassandra Perform as a Time Series Database
SignalFx: Making Cassandra Perform as a Time Series DatabaseDataStax Academy
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemC4Media
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...Chester Chen
 
[232]mist 고성능 iot 스트림 처리 시스템
[232]mist 고성능 iot 스트림 처리 시스템[232]mist 고성능 iot 스트림 처리 시스템
[232]mist 고성능 iot 스트림 처리 시스템NAVER D2
 
Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing FrameworksSirKetchup
 
Self-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingSelf-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingVasia Kalavri
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon
 
An Introduction to Priam
An Introduction to PriamAn Introduction to Priam
An Introduction to PriamJason Brown
 

Was ist angesagt? (20)

Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
 
Anatomy of an action
Anatomy of an actionAnatomy of an action
Anatomy of an action
 
Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3
 
Mario on spark
Mario on sparkMario on spark
Mario on spark
 
Writing Applications for Scylla
Writing Applications for ScyllaWriting Applications for Scylla
Writing Applications for Scylla
 
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
Ted Dunning – Very High Bandwidth Time Series Database Implementation - NoSQL...
 
Beyond the DSL - Unlocking the power of Kafka Streams with the Processor API
Beyond the DSL - Unlocking the power of Kafka Streams with the Processor APIBeyond the DSL - Unlocking the power of Kafka Streams with the Processor API
Beyond the DSL - Unlocking the power of Kafka Streams with the Processor API
 
Scaling Up Logging and Metrics
Scaling Up Logging and MetricsScaling Up Logging and Metrics
Scaling Up Logging and Metrics
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
 
How Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferHow Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and Safer
 
SignalFx: Making Cassandra Perform as a Time Series Database
SignalFx: Making Cassandra Perform as a Time Series DatabaseSignalFx: Making Cassandra Perform as a Time Series Database
SignalFx: Making Cassandra Perform as a Time Series Database
 
Mantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing SystemMantis: Netflix's Event Stream Processing System
Mantis: Netflix's Event Stream Processing System
 
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK... SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
SF Big Analytics 2019112: Uncovering performance regressions in the TCP SACK...
 
[232]mist 고성능 iot 스트림 처리 시스템
[232]mist 고성능 iot 스트림 처리 시스템[232]mist 고성능 iot 스트림 처리 시스템
[232]mist 고성능 iot 스트림 처리 시스템
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing Frameworks
 
Self-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processingSelf-managed and automatically reconfigurable stream processing
Self-managed and automatically reconfigurable stream processing
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
 
An Introduction to Priam
An Introduction to PriamAn Introduction to Priam
An Introduction to Priam
 

Andere mochten auch

What Java Can Learn From JavaScript
What Java Can Learn From JavaScriptWhat Java Can Learn From JavaScript
What Java Can Learn From JavaScriptsogrady
 
Text Mining for Second Screen
Text Mining for Second ScreenText Mining for Second Screen
Text Mining for Second ScreenIvan Demin
 
Der Nobelpreis geht an: Vitamin C
Der Nobelpreis geht an: Vitamin CDer Nobelpreis geht an: Vitamin C
Der Nobelpreis geht an: Vitamin CDr Rath
 
Il ricatto è online: Cryptolocker, il virus che rapisce la privacy
Il ricatto è online: Cryptolocker, il virus che rapisce la privacyIl ricatto è online: Cryptolocker, il virus che rapisce la privacy
Il ricatto è online: Cryptolocker, il virus che rapisce la privacynetWork S.a.s
 
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Chris Shillum
 
Text and Data Mining
Text and Data MiningText and Data Mining
Text and Data MiningCrossref
 
Visual data mining with HeatMiner
Visual data mining with HeatMinerVisual data mining with HeatMiner
Visual data mining with HeatMinerCloudNSci
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With RJahnab Kumar Deka
 

Andere mochten auch (9)

What Java Can Learn From JavaScript
What Java Can Learn From JavaScriptWhat Java Can Learn From JavaScript
What Java Can Learn From JavaScript
 
Text Mining for Second Screen
Text Mining for Second ScreenText Mining for Second Screen
Text Mining for Second Screen
 
Der Nobelpreis geht an: Vitamin C
Der Nobelpreis geht an: Vitamin CDer Nobelpreis geht an: Vitamin C
Der Nobelpreis geht an: Vitamin C
 
Semantische Systeme 3 0
Semantische Systeme 3 0Semantische Systeme 3 0
Semantische Systeme 3 0
 
Il ricatto è online: Cryptolocker, il virus che rapisce la privacy
Il ricatto è online: Cryptolocker, il virus che rapisce la privacyIl ricatto è online: Cryptolocker, il virus che rapisce la privacy
Il ricatto è online: Cryptolocker, il virus che rapisce la privacy
 
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
Presentation from ALA Midwinter 2014 on Elsevier's new Text and Data Mining P...
 
Text and Data Mining
Text and Data MiningText and Data Mining
Text and Data Mining
 
Visual data mining with HeatMiner
Visual data mining with HeatMinerVisual data mining with HeatMiner
Visual data mining with HeatMiner
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With R
 

Ähnlich wie Processing Big Data in Real-Time - Yanai Franchi, Tikal

Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Faster Workflows, Faster
Faster Workflows, FasterFaster Workflows, Faster
Faster Workflows, FasterKen Krugler
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesOleksii Diagiliev
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify StoryNeville Li
 
High Performance Systems Without Tears - Scala Days Berlin 2018
High Performance Systems Without Tears - Scala Days Berlin 2018High Performance Systems Without Tears - Scala Days Berlin 2018
High Performance Systems Without Tears - Scala Days Berlin 2018Zahari Dichev
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseVictoriaMetrics
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Altinity Ltd
 
Exploring Parallel Merging In GPU Based Systems Using CUDA C.
Exploring Parallel Merging In GPU Based Systems Using CUDA C.Exploring Parallel Merging In GPU Based Systems Using CUDA C.
Exploring Parallel Merging In GPU Based Systems Using CUDA C.Rakib Hossain
 
QNIBTerminal: Understand your datacenter by overlaying multiple information l...
QNIBTerminal: Understand your datacenter by overlaying multiple information l...QNIBTerminal: Understand your datacenter by overlaying multiple information l...
QNIBTerminal: Understand your datacenter by overlaying multiple information l...QNIB Solutions
 
Scientific Applications of The Data Distribution Service
Scientific Applications of The Data Distribution ServiceScientific Applications of The Data Distribution Service
Scientific Applications of The Data Distribution ServiceAngelo Corsaro
 
유연하고 확장성 있는 빅데이터 처리
유연하고 확장성 있는 빅데이터 처리유연하고 확장성 있는 빅데이터 처리
유연하고 확장성 있는 빅데이터 처리NAVER D2
 
Webinar: Using Control Theory to Keep Compactions Under Control
Webinar: Using Control Theory to Keep Compactions Under ControlWebinar: Using Control Theory to Keep Compactions Under Control
Webinar: Using Control Theory to Keep Compactions Under ControlScyllaDB
 
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Daniel Lemire
 

Ähnlich wie Processing Big Data in Real-Time - Yanai Franchi, Tikal (20)

Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Faster Workflows, Faster
Faster Workflows, FasterFaster Workflows, Faster
Faster Workflows, Faster
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Processing Big Data in Real-Time - Yanai Franchi, Tikal

  • 1. Processing “BIG-DATA” In Real Time. Yanai Franchi, Tikal
  • 2. Two years ago...
  • 5. After a Long Travel Day
  • 6. Going to a Salsa Club
  • 7. Best Salsa Club NOW ● Good Music ● Crowded – Now!
  • 8. Same Problem in “gogobot”
  • 10. gogobot checkin Heat Map Service. Let's Develop “Gogobot Checkins Heat-Map”
  • 11. Key Notes ● Collector Service - Collects check-ins as text addresses – We need to use a GeoLocation Service ● Upon elapsed interval, the last locations list will be displayed as a Heat-Map in the GUI. ● Web-scale service – tens of thousands of check-ins per second all over the world (imaginary, but let's do it for the exercise). ● Accuracy – Sample data, NOT critical data. – Proportionately representative – Data volume is large enough to compensate for data loss.
  • 12. Heat-Map Context [diagram: Gogobot micro services inside the Gogobot System send text-address check-ins to the Heat-Map Service; the Heat-Map Service calls Get-GeoCode(Address) on the Geo Location Service; output: Heat-Map of the last interval's locations]
  • 13. Plan A: Simulate Checkins with a File [diagram: a file of check-ins (Check-in #1, Check-in #2, ...) → read text address → processing check-ins, with a GET of the geo location from the Geo Location Service → persist check-in intervals → Database]
  • 15. Architect - First Reaction...
  • 19. Problems? ● Tedious: You spend time configuring where to send messages, deploying workers, and deploying intermediate queues. ● Brittle: There's little fault-tolerance. ● Painful to scale: Partitioning the running workers is complicated.
  • 20. What We Want? ● Horizontal scalability ● Fault-tolerance ● No intermediate message brokers! ● Higher level abstraction than message passing ● “Just works” ● Guaranteed data processing (not in this case)
  • 21. Apache Storm ✔ Horizontal scalability ✔ Fault-tolerance ✔ No intermediate message brokers! ✔ Higher level abstraction than message passing ✔ “Just works” ✔ Guaranteed data processing
  • 23. What is Storm? ● CEP - Open source, distributed realtime computation system. – Makes it easy to reliably process unbounded streams of tuples – Doing for realtime processing what Hadoop did for batch processing. ● Fast - 1M tuples/sec per node. – It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
  • 24. Streams: Unbounded sequence of tuples
  • 26. Bolts: Process input streams and produce new streams
  • 27. Storm Topology: Network of spouts and bolts
  • 28. Guarantee for Processing ● Storm guarantees the full processing of a tuple by tracking its state. ● In case of failure, Storm can re-process it. ● Source tuples with fully “acked” trees are removed from the system.
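The tracking idea on this slide can be sketched in plain Java. Storm's acker is documented as keeping a running XOR of the 64-bit ids in a tuple tree, which returns to zero once every emitted tuple has also been acked. The class below is a simplified single-threaded model of that idea, not Storm's own code; `AckTrackerSketch` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of XOR-based ack tracking (illustrative, not Storm's classes). */
class AckTrackerSketch {
    // Root (spout) tuple id -> XOR of every tuple id emitted or acked so far.
    private final Map<Long, Long> pending = new HashMap<>();

    /** A spout emitted a new root tuple. */
    void emitRoot(long rootId, long tupleId) {
        pending.put(rootId, tupleId);
    }

    /** A bolt emitted a child tuple anchored to the given root. */
    void anchor(long rootId, long childId) {
        pending.merge(rootId, childId, (a, b) -> a ^ b);
    }

    /** A bolt acked a tuple. Returns true once the whole tree is processed. */
    boolean ack(long rootId, long tupleId) {
        Long current = pending.get(rootId);
        if (current == null) {
            return false; // unknown root, or tree already completed
        }
        long updated = current ^ tupleId;
        if (updated == 0L) {
            pending.remove(rootId); // fully acked tree leaves the system
            return true;
        }
        pending.put(rootId, updated);
        return false;
    }

    boolean isPending(long rootId) {
        return pending.containsKey(rootId);
    }
}
```

Each id is XORed in twice (once when emitted, once when acked), so the value only reaches zero when nothing is outstanding. A real implementation would also time out trees that never complete so the spout can replay them.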
  • 29. Tasks (Bolt/Spout Instance): Spouts and bolts execute as many tasks across the cluster
  • 30. Stream Grouping: When a tuple is emitted, which task (instance) does it go to?
  • 31. Stream Grouping ● Shuffle grouping: pick a random task ● Fields grouping: consistent hashing on a subset of tuple fields ● All grouping: send to all tasks ● Global grouping: pick task with lowest id
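A minimal plain-Java sketch of how each grouping could choose target tasks. It is an illustration only: `GroupingSketch` and its methods are hypothetical names, and the fields grouping here uses a simple hash modulo the task count as a stand-in for the hashing the slide mentions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.Random;

/** Illustrative task selection for the four groupings (not Storm's implementation). */
class GroupingSketch {
    /** Shuffle grouping: pick a random task index. */
    static int shuffle(int numTasks, Random random) {
        return random.nextInt(numTasks);
    }

    /** Fields grouping: equal key fields always map to the same task index. */
    static int fields(int numTasks, Object... keyFields) {
        return Math.floorMod(Objects.hash(keyFields), numTasks);
    }

    /** All grouping: every task receives a copy of the tuple. */
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            targets.add(i);
        }
        return targets;
    }

    /** Global grouping: the task with the lowest id receives everything. */
    static int global(List<Integer> taskIds) {
        return Collections.min(taskIds);
    }
}
```

The property that matters for the heat map is the one in `fields`: all check-ins for the same city land on the same task, so that task can hold the city's in-memory state without coordination.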
  • 32. Tasks, Executors, Workers [diagram: a worker process is a JVM; an executor is a thread inside it; each executor runs one or more tasks of a spout or bolt]
  • 33. [diagram: two nodes, each with a Supervisor and a worker process whose executors run Spout A, Bolt B (two tasks) and Bolt C (two tasks)]
  • 34. Storm Architecture [diagram: Nimbus, ZooKeeper nodes, six Supervisors; the Heat-Map topology is uploaded/rebalanced through Nimbus] Nimbus: master node (similar to Hadoop JobTracker), NOT critical for a running topology
  • 35. Storm Architecture: ZooKeeper is used for cluster coordination; a few nodes
  • 36. Storm Architecture: Supervisors run worker processes
  • 38. HeatMap Input/Output Tuples ● Input tuples: timestamp and text address: – (9:00:07 PM, “287 Hudson St New York NY 10013”) ● Output tuple: time interval and a list of points for it: – (9:00:00 PM to 9:00:15 PM, List((40.719,-73.987),(40.726,-74.001),(40.719,-73.987)))
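Mapping an input timestamp to its output interval is a rounding-down step. The sketch below assumes timestamps in epoch milliseconds and the 15-second interval shown in the example; `IntervalSketch` is a hypothetical helper, not code from the talk.

```java
/** Rounds a timestamp down to the start of its fixed-size interval. */
class IntervalSketch {
    static long intervalStart(long timestampMillis, long intervalMillis) {
        // floorMod keeps the result correct even for negative timestamps.
        return timestampMillis - Math.floorMod(timestampMillis, intervalMillis);
    }

    static long intervalEnd(long timestampMillis, long intervalMillis) {
        return intervalStart(timestampMillis, intervalMillis) + intervalMillis;
    }
}
```

With a 15,000 ms interval, a check-in 7 seconds past the minute falls in the interval that starts on the minute and ends 15 seconds later, matching the 9:00:07 PM example.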
  • 39. Heat Map Storm Topology [diagram: Checkins Spout → (9:01 PM @ 287 Hudson st) → Geocode Lookup Bolt → (9:01 PM, (40.736, -74.354)) → Heatmap Builder Bolt → upon elapsed interval: (9:00 PM – 9:15 PM, List((40.73, -74.34), (51.36, -83.33), (69.73, -34.24))) → Persistor Bolt]
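  • 40. Checkins Spout
      import java.io.IOException;
      import java.nio.charset.StandardCharsets;
      import java.util.List;
      import java.util.Map;
      import org.apache.commons.io.IOUtils;
      // Storm 1.0+ package names; releases before 1.0 use backtype.storm.* instead.
      import org.apache.storm.spout.SpoutOutputCollector;
      import org.apache.storm.task.TopologyContext;
      import org.apache.storm.topology.OutputFieldsDeclarer;
      import org.apache.storm.topology.base.BaseRichSpout;
      import org.apache.storm.tuple.Fields;
      import org.apache.storm.tuple.Values;

      public class CheckinsSpout extends BaseRichSpout {
          // We hold state; no need for thread safety.
          private List<String> sampleLocations;
          private int nextEmitIndex;
          private SpoutOutputCollector outputCollector;

          @Override
          public void open(Map map, TopologyContext topologyContext,
                           SpoutOutputCollector spoutOutputCollector) {
              this.outputCollector = spoutOutputCollector;
              this.nextEmitIndex = 0;
              try {
                  sampleLocations = IOUtils.readLines(
                      ClassLoader.getSystemResourceAsStream("sample-locations.txt"),
                      StandardCharsets.UTF_8);
              } catch (IOException e) {
                  throw new RuntimeException("Cannot read sample locations", e);
              }
          }

          // Called repeatedly by Storm.
          @Override
          public void nextTuple() {
              String address = sampleLocations.get(nextEmitIndex);
              String checkin = System.currentTimeMillis() + "@ADDRESS:" + address;
              outputCollector.emit(new Values(checkin));
              nextEmitIndex = (nextEmitIndex + 1) % sampleLocations.size();
          }

          // Declare output fields.
          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("str"));
          }
      }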
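  • 41. Geocode Lookup Bolt
      import java.util.Map;
      // Storm 1.0+ package names; releases before 1.0 use backtype.storm.* instead.
      import org.apache.storm.task.TopologyContext;
      import org.apache.storm.topology.BasicOutputCollector;
      import org.apache.storm.topology.OutputFieldsDeclarer;
      import org.apache.storm.topology.base.BaseBasicBolt;
      import org.apache.storm.tuple.Fields;
      import org.apache.storm.tuple.Tuple;
      import org.apache.storm.tuple.Values;

      // LocatorService, GoogleLocatorService and LocationDTO are the talk's own classes (not shown).
      public class GeocodeLookupBolt extends BaseBasicBolt {
          private LocatorService locatorService;

          @Override
          public void prepare(Map stormConf, TopologyContext context) {
              locatorService = new GoogleLocatorService();
          }

          @Override
          public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
              String str = tuple.getStringByField("str");
              // The spout emits "<millis>@ADDRESS:<address>"; split on the full
              // separator so the address does not keep the "ADDRESS:" prefix.
              String[] parts = str.split("@ADDRESS:", 2);
              Long time = Long.valueOf(parts[0]);
              String address = parts[1];
              // Get geocode, create DTO.
              LocationDTO locationDTO = locatorService.getLocation(address);
              String city = locationDTO.getCity();
              outputCollector.emit(new Values(city, time, locationDTO));
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer fieldsDeclarer) {
              fieldsDeclarer.declare(new Fields("city", "time", "location"));
          }
      }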
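The check-in string format used between the spout and this bolt is `<millis>@ADDRESS:<address>`. Below is a small, Storm-free sketch of parsing it defensively; `CheckinParseSketch` and its nested `Checkin` class are hypothetical names introduced for illustration.

```java
/** Parses "<epochMillis>@ADDRESS:<address>" check-in strings (illustrative helper). */
class CheckinParseSketch {
    static final String SEPARATOR = "@ADDRESS:";

    /** Simple value holder for a parsed check-in. */
    static final class Checkin {
        final long timeMillis;
        final String address;

        Checkin(long timeMillis, String address) {
            this.timeMillis = timeMillis;
            this.address = address;
        }
    }

    static Checkin parse(String raw) {
        // Use the first separator only, so an '@' inside the address is preserved.
        int idx = raw.indexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("Missing separator in check-in: " + raw);
        }
        long time = Long.parseLong(raw.substring(0, idx));
        String address = raw.substring(idx + SEPARATOR.length());
        return new Checkin(time, address);
    }
}
```

Rejecting malformed input with an exception is one choice; since the talk treats check-ins as sample data, a bolt could equally log and drop such tuples.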
  • 42. Tick Tuple – Repeating Mantra
  • 43. Two Streams to Heat-Map Builder [diagram: check-ins (Checkin 1, Checkin 4, Checkin 5, Checkin 6) and tick tuples both flow into the HeatMap-Builder Bolt] On tick tuple, we flush our Heat-Map
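  • 44. Tick Tuple in Action
      import java.util.List;
      import java.util.Map;
      // Storm 1.0+ package names; releases before 1.0 use backtype.storm.* instead.
      import org.apache.storm.Config;
      import org.apache.storm.Constants;
      import org.apache.storm.topology.BasicOutputCollector;
      import org.apache.storm.topology.OutputFieldsDeclarer;
      import org.apache.storm.topology.base.BaseBasicBolt;
      import org.apache.storm.tuple.Fields;
      import org.apache.storm.tuple.Tuple;

      public class HeatMapBuilderBolt extends BaseBasicBolt {
          // Holds the latest intervals.
          private Map<String, List<LocationDTO>> heatmaps;

          @Override
          public Map<String, Object> getComponentConfiguration() {
              Config conf = new Config();
              // Tick interval: Storm sends this bolt a tick tuple every 60 seconds.
              conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60);
              return conf;
          }

          @Override
          public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
              if (isTickTuple(tuple)) {
                  // Emit accumulated intervals
              } else {
                  // Add check-in info to the current interval in the Map
              }
          }

          private boolean isTickTuple(Tuple tuple) {
              return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
                  && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("time-interval", "city", "locationsList"));
          }
      }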
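The two commented branches in `execute` can be illustrated without Storm. The sketch below accumulates points per interval and, when a tick arrives, hands back only the intervals that have already ended. It is an assumption about what the elided code might do, not the author's implementation; it ignores the per-city key for brevity, and `HeatMapAccumulatorSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Accumulates points per time interval and flushes finished intervals on a tick. */
class HeatMapAccumulatorSketch {
    private final long intervalMillis;
    // Interval start (epoch millis) -> points seen in that interval.
    private final Map<Long, List<double[]>> byInterval = new TreeMap<>();

    HeatMapAccumulatorSketch(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    /** Non-tick branch: add a check-in to its interval. */
    void add(long timestampMillis, double lat, double lng) {
        long start = timestampMillis - Math.floorMod(timestampMillis, intervalMillis);
        byInterval.computeIfAbsent(start, k -> new ArrayList<>()).add(new double[] {lat, lng});
    }

    /** Tick branch: remove and return every interval that ended at or before nowMillis. */
    Map<Long, List<double[]>> flush(long nowMillis) {
        Map<Long, List<double[]>> finished = new TreeMap<>();
        Iterator<Map.Entry<Long, List<double[]>>> it = byInterval.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, List<double[]>> entry = it.next();
            if (entry.getKey() + intervalMillis <= nowMillis) {
                finished.put(entry.getKey(), entry.getValue());
                it.remove();
            }
        }
        return finished;
    }
}
```

Keeping the still-open interval in memory until a later tick avoids emitting a partial heat map for it.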
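  • 45. Persistor Bolt
      // Needs java.io.IOException, java.util.Map, the Storm bolt imports,
      // redis.clients.jedis.Jedis and com.fasterxml.jackson.databind.ObjectMapper
      // (Jackson 2 is an assumption; the slide does not show where objectMapper comes from).
      public class PersistorBolt extends BaseBasicBolt {
          private Jedis jedis;
          private ObjectMapper objectMapper;

          @Override
          public void prepare(Map stormConf, TopologyContext context) {
              jedis = new Jedis("localhost"); // assumed Redis host
              objectMapper = new ObjectMapper();
          }

          @Override
          public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
              Long timeInterval = tuple.getLongByField("time-interval");
              String city = tuple.getStringByField("city");
              try {
                  String locationsList = objectMapper.writeValueAsString(
                      tuple.getValueByField("locationsList"));
                  String dbKey = "checkins-" + timeInterval + "@" + city;
                  jedis.setex(dbKey, 3600 * 24, locationsList); // persist in Redis for 24h
                  jedis.publish("location-key", dbKey); // publish in Redis channel for debugging
              } catch (IOException e) {
                  throw new RuntimeException("Cannot serialize locations list", e);
              }
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              // Terminal bolt: emits nothing.
          }
      }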
  • 46. Transforming the Tuples [diagram: Sample Checkins File (Check-in #1, Check-in #2, ...) → read text addresses → Checkins Spout → shuffle grouping → Geocode Lookup Bolt (gets geo location from the Geo Location Service) → fields grouping on city (group by city) → Heatmap Builder Bolt → shuffle grouping → Persistor Bolt → Database]
  • 47. Heat Map Topology
      // Storm 1.0+ package names; releases before 1.0 use backtype.storm.* instead.
      import org.apache.storm.Config;
      import org.apache.storm.StormSubmitter;
      import org.apache.storm.topology.TopologyBuilder;
      import org.apache.storm.tuple.Fields;

      public class LocalTopologyRunner {
          public static void main(String[] args) throws Exception {
              TopologyBuilder builder = buildTopology();
              StormSubmitter.submitTopology(
                  "local-heatmap", new Config(), builder.createTopology());
          }

          private static TopologyBuilder buildTopology() {
              TopologyBuilder builder = new TopologyBuilder();
              builder.setSpout("checkins", new CheckinsSpout());
              builder.setBolt("geocode-lookup", new GeocodeLookupBolt())
                  .shuffleGrouping("checkins");
              builder.setBolt("heatmap-builder", new HeatMapBuilderBolt())
                  .fieldsGrouping("geocode-lookup", new Fields("city"));
              builder.setBolt("persistor", new PersistorBolt())
                  .shuffleGrouping("heatmap-builder");
              return builder;
          }
      }
  • 50. Scaling the Topology
    public class LocalTopologyRunner {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = buildTopology();
            Config conf = new Config();
            // Set no. of workers (the slide's callout shows conf.setNumWorkers(20))
            conf.setNumWorkers(2);
            StormSubmitter.submitTopology(
                "local-heatmap", conf, builder.createTopology());
        }

        private static TopologyBuilder buildTopology() {
            TopologyBuilder builder = new TopologyBuilder();
            // The trailing integer argument is the parallelism hint
            builder.setSpout("checkins", new CheckinsSpout(), 4);
            // Increase tasks for the future: more tasks than executors
            builder.setBolt("geocode-lookup", new GeocodeLookupBolt(), 8)
                   .shuffleGrouping("checkins").setNumTasks(64);
            builder.setBolt("heatmap-builder", new HeatMapBuilderBolt(), 4)
                   .fieldsGrouping("geocode-lookup", new Fields("city"));
            builder.setBolt("persistor", new PersistorBolt(), 2)
                   .shuffleGrouping("heatmap-builder").setNumTasks(4);
            return builder;
        }
    }
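The numbers on this slide can be checked with a little arithmetic. The sketch below takes the parallelism hints (executors) and task counts from the code above and assumes that, where `setNumTasks` is not called, each executor runs exactly one task. `ParallelismMath` and its methods are hypothetical helpers, and any system executors (such as ackers) are ignored.

```java
/** Arithmetic on the slide's parallelism settings; helper names are hypothetical. */
final class ParallelismMath {
    // Order: checkins, geocode-lookup, heatmap-builder, persistor.
    static final int[] EXECUTORS = {4, 8, 4, 2};
    // Task counts: 64 and 4 are set explicitly, the others default to the executor count.
    static final int[] TASKS = {4, 64, 4, 4};

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) total += v;
        return total;
    }

    /** Tasks each executor runs, assuming the tasks divide evenly among executors. */
    static int tasksPerExecutor(int executors, int tasks) {
        return tasks / executors;
    }

    public static void main(String[] args) {
        System.out.println("executors: " + sum(EXECUTORS));                       // 18
        System.out.println("tasks: " + sum(TASKS));                               // 76
        System.out.println("geocode tasks/executor: " + tasksPerExecutor(8, 64)); // 8
    }
}
```

Because geocode-lookup already has 64 tasks, the number of executors running them has room to grow from 8 toward 64 later on, which is the "increase tasks for the future" point of the slide.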
  • 51. Recap – Plan A (diagram): Sample Checkins File → read text address → Storm Heat-Map Topology (GET geo location from the Geo Location Service) → persist checkin intervals → Database
  • 54. Plan B - Kafka Spout & Bolt to HeatMap (diagram): publish checkins → Checkin Kafka Topic → Kafka Checkins Spout (reads text addresses) → Geocode Lookup Bolt (Geo Location Service) → Heatmap Builder Bolt → Persistor Bolt (Database) → Kafka Locations Bolt → Locations Topic
  • 56. They are all good, but not for all use cases
  • 66. Stateless Broker & Doesn't Fear the File System
  • 70. Topics
    ● Logical collections of partitions (the physical files).
    ● A broker contains some of the partitions for a topic.
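To make "a topic is a collection of partitions" concrete, here is a minimal in-memory sketch in which each partition is an append-only list and a message is addressed by its partition and its offset within that partition. `TopicSketch` is a hypothetical class, and choosing the partition by key hash is an assumed policy for illustration. Real partitions are files spread across brokers, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for a topic: a fixed set of append-only partitions. */
final class TopicSketch {
    private final List<List<String>> partitions = new ArrayList<>();

    TopicSketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Pick a partition from the message key (assumed policy: hash modulo count). */
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    /** Append to the key's partition and return the message's offset within it. */
    long append(String key, String message) {
        List<String> log = partitions.get(partitionFor(key));
        log.add(message);
        return log.size() - 1;
    }

    /** Read one message by partition and offset; reading does not remove it. */
    String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }
}
```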
  • 71. Within a consumer group, each partition is consumed by exactly one consumer
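The rule can be sketched as an assignment function: every partition of a topic is handed to exactly one consumer in the group, and the assignment is recomputed when the group changes. The round-robin policy, the class `GroupAssignmentSketch`, and the consumer names below are assumptions for illustration. Kafka's own assignment strategies may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Round-robin partition assignment within one consumer group (illustrative only). */
final class GroupAssignmentSketch {
    /** Returns, for each consumer, the partitions it owns; each partition has one owner. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String consumer : consumers) {
            owned.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            owned.get(owner).add(p);
        }
        return owned;
    }
}
```

With four partitions and two consumers, each consumer owns two partitions. If one consumer leaves, calling `assign` again with the remaining consumer gives it all four.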
  • 73–85. Cluster diagram (animation frames): Producer 1 and Producer 2, Brokers 1–3, ZooKeeper, Consumer 1 and Consumer 2. A Broker 4 appears in slides 74–79 and is absent again from slide 80; Consumer 2 is absent in slides 83–85.
  • 86. Performance Benchmark: 3 brokers, 3 producers, 3 consumers, cheap machines
  • 87.
    ● “Up to 2 million writes/sec on 3 cheap machines”
    ● Using 3 producers on 3 different machines, 3x async replication
    ● Only 1 producer/machine because the NIC was already saturated
    ● End-to-end latency is about 10 ms for 99.9%
    ● Sustained throughput as stored data grows
  • 88. Add Kafka to our Topology
    public class LocalTopologyRunner {
        ...
        private static TopologyBuilder buildTopology() {
            ...
            // Kafka Spout: replaces the file-based checkins spout
            builder.setSpout("checkins", new KafkaSpout(kafkaConfig), 4);
            ...
            // Kafka Bolt: publishes to the locations topic after the persistor
            builder.setBolt("kafkaProducer",
                    new KafkaOutputBolt(
                        "localhost:9092",
                        "kafka.serializer.StringEncoder",
                        "locations-topic"))
                   .shuffleGrouping("persistor");
            return builder;
        }
    }
  • 89. (diagram): Text-Address → Checkin HTTP Reactor → publish checkins → Checkin Kafka Topic → consume checkins → Storm Heat-Map Topology (GET geo location from the Geo Location Service; persist checkin intervals to the Database) → publish interval key → Locations Kafka Topic
  • 91. Summary: When you go out to a salsa club...
    ● Good music
    ● Crowded
  • 92. More Conclusions
    ● BigData: also refers to the velocity of data (not only its volume).
    ● Storm: great for real-time BigData processing; complementary to Hadoop batch jobs.
    ● Kafka: great messaging for logs/events data; serves as a good “source” for a Storm spout.