Storm overview & integration
3. WHAT IS IT GOOD FOR?
Real-time analytics
Online machine learning
Continuous computation
Distributed RPC
ETL (Extract, Transform, Load)
…
6. PRIMITIVES
Tuple: an ordered list of named values (Field 1 / Value 1, Field 2 / Value 2, … Field 5 / Value 5)
Stream: an unbounded sequence of tuples (Tuple, Tuple, Tuple, …)
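To make the two primitives concrete, here is a minimal plain-Java model. It is not Storm's API. A tuple is modeled as an ordered list of values addressed by field name, and a stream as a lazily generated, potentially unbounded sequence of such tuples. The names `TupleModel`, `SimpleTuple`, `FIELDS` and `stream` are hypothetical. In Storm itself, field names are declared with `Fields` and values are emitted with `Values`.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical plain-Java model of Storm's two primitives; this is not Storm's API.
public class TupleModel {

    // Field names shared by every tuple of the example stream.
    public static final List<String> FIELDS = List.of("id", "payload");

    // A tuple: an ordered list of values, each addressed by its field name.
    public record SimpleTuple(List<String> fields, List<Object> values) {
        public SimpleTuple {
            if (fields.size() != values.size()) {
                throw new IllegalArgumentException("fields and values must have the same length");
            }
            fields = List.copyOf(fields);
            values = List.copyOf(values);
        }

        public Object get(String field) {
            int index = fields.indexOf(field);
            if (index < 0) {
                throw new IllegalArgumentException("unknown field: " + field);
            }
            return values.get(index);
        }
    }

    // A stream: a lazily generated, potentially unbounded sequence of tuples.
    public static Stream<SimpleTuple> stream() {
        return Stream.iterate(0L, n -> n + 1)
                .map(n -> new SimpleTuple(FIELDS, List.<Object>of(n, "value-" + n)));
    }

    public static void main(String[] args) {
        // Only the first three tuples of the infinite stream are materialized.
        List<SimpleTuple> firstThree = stream().limit(3).collect(Collectors.toList());
        for (SimpleTuple t : firstThree) {
            System.out.println(t.get("id") + " -> " + t.get("payload"));
        }
    }
}
```

Because the stream is lazy, consumers decide how much of it to read. This mirrors the fact that a Storm stream never "ends".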
13. FAILURES
Task died: failed tuples are replayed
Acker task died: the related tuples time out and are replayed
Spout task died: the source replays them, e.g. pending messages are placed back on the queue
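The replay behaviour can be sketched from the source's side with a plain-Java model of a queue-backed spout. The sketch assumes the source keeps every emitted message in a pending map until it is acked. A fail, or a timeout (what happens when the acker task dies), puts the message back on the queue. `ReplayingSource` and its methods are hypothetical. They mirror Storm's `nextTuple`/`ack`/`fail` spout callbacks but are not the `ISpout` interface.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of a queue-backed spout; it mirrors Storm's nextTuple/ack/fail
// callbacks but is not Storm's ISpout interface.
public class ReplayingSource {

    // A message that has been emitted but not yet acked, with its timeout deadline.
    private record Pending(String message, long deadlineMillis) {}

    private final Deque<String> queue = new ArrayDeque<>();
    private final Map<Long, Pending> pending = new LinkedHashMap<>();
    private final long timeoutMillis;
    private long nextId = 0;

    public ReplayingSource(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    public void offer(String message) {
        queue.addLast(message);
    }

    // Emit the next queued message and keep it pending until it is acked or failed.
    public Optional<Long> nextTuple(long nowMillis) {
        String message = queue.pollFirst();
        if (message == null) {
            return Optional.empty();
        }
        long id = nextId++;
        pending.put(id, new Pending(message, nowMillis + timeoutMillis));
        return Optional.of(id);
    }

    // The tuple tree completed: the message is done and can be forgotten.
    public void ack(long id) {
        pending.remove(id);
    }

    // A task failed the tuple: place the message back on the front of the queue.
    public void fail(long id) {
        Pending p = pending.remove(id);
        if (p != null) {
            queue.addFirst(p.message());
        }
    }

    // Messages not acked before their deadline (e.g. because the acker died) are replayed.
    public void expireTimedOut(long nowMillis) {
        Iterator<Map.Entry<Long, Pending>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Pending p = it.next().getValue();
            if (p.deadlineMillis() <= nowMillis) {
                queue.addLast(p.message());
                it.remove();
            }
        }
    }

    public int queued() {
        return queue.size();
    }

    public int pendingCount() {
        return pending.size();
    }
}
```

This in-memory model cannot cover the third case, a dead spout task. There, replay depends on the external queue or broker redelivering messages that were never acknowledged.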
14. WHAT DO I HAVE TO DO?
Inform Storm about new links in the tuple tree (anchoring)
Inform Storm when you are finished with a tuple
Every tuple must be either acked or failed
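Storm's documentation describes how the acker tracks a whole tuple tree with a single 64-bit value. Every new link gets a random ID that is XORed in when the tuple is emitted with an anchor and XORed in again when that tuple is acked. The value therefore returns to zero once every link has been acked. The plain-Java model below only illustrates that idea; `XorAckerModel` and its methods are hypothetical and are not Storm's internal API. In a real bolt the two notifications are the anchored `collector.emit(input, values)` and `collector.ack(input)` (or `collector.fail(input)`).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical model of the XOR bookkeeping the Storm acker keeps per spout tuple;
// not Storm's internal API.
public class XorAckerModel {

    // Root (spout tuple) id -> XOR of all edge ids seen so far.
    private final Map<Long, Long> ackVals = new HashMap<>();
    private final Random random = new Random();

    // Random non-zero 64-bit id for a new link in the tuple tree.
    private long newEdgeId() {
        long id;
        do {
            id = random.nextLong();
        } while (id == 0L);
        return id;
    }

    private void xorInto(long rootId, long edgeId) {
        ackVals.merge(rootId, edgeId, (current, e) -> current ^ e);
    }

    // The spout emits a tuple: tracking of its tree starts with one edge.
    public long emitRoot(long rootId) {
        long edgeId = newEdgeId();
        xorInto(rootId, edgeId);
        return edgeId;
    }

    // A bolt emits a tuple anchored to the tree: "inform about new links in tree".
    public long anchor(long rootId) {
        long edgeId = newEdgeId();
        xorInto(rootId, edgeId);
        return edgeId;
    }

    // A bolt acks a tuple: "inform when finished with a tuple".
    public void ack(long rootId, long edgeId) {
        xorInto(rootId, edgeId);
    }

    // The value is 0 once every edge id has been XORed in twice (emitted and acked);
    // a premature 0 would need a random collision, which is negligibly unlikely.
    public boolean isComplete(long rootId) {
        return ackVals.getOrDefault(rootId, 0L) == 0L;
    }
}
```

This also shows why every tuple must be acked or failed. A single missing ack leaves a non-zero value, so the tree never completes and eventually times out.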
20. TRANSFORMATION
ORIGINAL
{
  id: "df45er87c78df",
  sender: "Info",
  destination: "39345123456",
  parts: 2,
  price: 100,
  client: "Demo",
  time: "2014-06-02 14:47:58",
  country: "IT",
  network: "Wind",
  type: "SMS",
  …
}
COMPUTED
{
  client: "Demo",
  type: "SMS",
  country: "IT",
  network: "Wind",
  bucket: "2014-06-02 14:45:00",
  traffic: 2,
  expenses: 200
}
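The mapping from the ORIGINAL to the COMPUTED record can be checked with a small worked example. The sample values suggest, though the slide does not state, three rules. Traffic is the number of parts. Expenses are parts × price. The bucket is the event time rounded down to the quarter hour. Under those rules, 2 parts at a price of 100 give traffic 2 and expenses 200, and 14:47:58 falls into the 14:45:00 bucket. `QuarterHourTransform` and its methods are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical per-event transformation matching the ORIGINAL -> COMPUTED example.
public class QuarterHourTransform {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Round a timestamp down to the start of its 15-minute bucket.
    public static String quarterBucket(String time) {
        LocalDateTime t = LocalDateTime.parse(time, FORMAT);
        LocalDateTime hour = t.truncatedTo(ChronoUnit.HOURS);
        int quarterStart = (t.getMinute() / 15) * 15;
        return hour.plusMinutes(quarterStart).format(FORMAT);
    }

    // Assumption: traffic counts message parts.
    public static long traffic(long parts) {
        return parts;
    }

    // Assumption: the price is charged per part.
    public static long expenses(long parts, long price) {
        return parts * price;
    }

    public static void main(String[] args) {
        System.out.println(quarterBucket("2014-06-02 14:47:58")); // 2014-06-02 14:45:00
        System.out.println(traffic(2));                           // 2
        System.out.println(expenses(2, 100));                     // 200
    }
}
```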
21. CODE
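// Package names as in Storm 0.9.x; Storm 1.0+ moved them under org.apache.storm.*
import backtype.storm.tuple.Fields;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Sum;
TridentTopology topology = new TridentTopology();
TridentState tridentState = topology
    // Read raw CoreEvents messages from Kafka
    .newStream("CoreEvents", buildKafkaSpout())
    .parallelismHint(4)
    // Parse the raw message bytes into typed fields
    .each(
        new Fields("bytes"),
        new CoreEventMessageParser(),
        new Fields("time", "client", "network", "country", "type", "parts", "price"))
    // Round the event time down to its quarter-hour bucket
    .each(
        new Fields("time"),
        new QuarterTimeBucket(),
        new Fields("bucket"))
    // Hypothetical step (not on the slide): derive traffic = parts and
    // expenses = parts * price, which the projection below requires
    .each(
        new Fields("parts", "price"),
        new TrafficExpensesCalculator(),
        new Fields("traffic", "expenses"))
    .project(new Fields("bucket", "client", "network", "country", "type", "traffic", "expenses"))
    .groupBy(new Fields("bucket", "client", "network", "country", "type"))
    // Trident's built-in Sum reads only its first input field ("traffic");
    // summing both fields needs a custom CombinerAggregator
    .persistentAggregate(getStateFactory(),
        new Fields("traffic", "expenses"),
        new Sum(),
        new Fields("trafficExpenses"))
    .parallelismHint(8);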
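What `groupBy` followed by `persistentAggregate(..., new Sum(), ...)` computes can be modeled in plain Java. The model is a map from the group key (bucket, client, network, country, type) to running totals, updated as tuples arrive. This is a conceptual sketch only. `GroupedSumModel`, `Key` and `Totals` are hypothetical. The model sums both traffic and expenses, which in Trident would need a combiner that reads both fields. It also ignores the batching and transactional guarantees a Trident state provides.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of groupBy + persistentAggregate with a sum.
public class GroupedSumModel {

    // Group key, matching the groupBy fields in the topology.
    public record Key(String bucket, String client, String network, String country, String type) {}

    // Running totals stored per key.
    public record Totals(long traffic, long expenses) {
        Totals plus(Totals other) {
            return new Totals(traffic + other.traffic, expenses + other.expenses);
        }
    }

    // Stand-in for the persistent state behind getStateFactory().
    private final Map<Key, Totals> state = new HashMap<>();

    // Fold one projected tuple into the stored totals for its group.
    public void update(Key key, long traffic, long expenses) {
        state.merge(key, new Totals(traffic, expenses), Totals::plus);
    }

    public Totals get(Key key) {
        return state.getOrDefault(key, new Totals(0, 0));
    }

    public static void main(String[] args) {
        GroupedSumModel model = new GroupedSumModel();
        Key key = new Key("2014-06-02 14:45:00", "Demo", "Wind", "IT", "SMS");
        model.update(key, 2, 200);
        model.update(key, 1, 100);
        System.out.println(model.get(key).traffic());  // 3
        System.out.println(model.get(key).expenses()); // 300
    }
}
```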
23. TUNING STORAGE
1st Issue - Storage
Random access: 1,500 writes/s limit
Staged approach: 30,000 writes/s limit
No locks: each worker writes in isolation
Scalable: each worker has its own stage
The main table stays well indexed
Reads are not affected
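One reading of the staged approach is sketched below as an in-memory model, under that assumption only. Each worker appends rows to its own stage without touching shared data. A separate step periodically merges the staged rows into the main table in bulk, so the main table receives large batched updates rather than random single-row writes. The slide does not show the real implementation. `StagedWriter`, `Row`, `stage`, `mergeIntoMain` and `read` are hypothetical, and a database version would use one staging table per worker instead of a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of per-worker staging followed by a bulk merge.
// Single-threaded: it shows the data flow only, not real concurrency control.
public class StagedWriter {

    public record Row(String key, long traffic, long expenses) {}

    // One private stage per worker, so workers never write to shared data.
    private final List<List<Row>> stages = new ArrayList<>();
    // The "main table": aggregated totals per key, stored as {traffic, expenses}.
    private final Map<String, long[]> mainTable = new HashMap<>();

    public StagedWriter(int workers) {
        for (int i = 0; i < workers; i++) {
            stages.add(new ArrayList<>());
        }
    }

    // Called for one worker only; appends to that worker's own stage.
    public void stage(int worker, Row row) {
        stages.get(worker).add(row);
    }

    // Periodic bulk step: fold every staged row into the main table, then clear the stages.
    public void mergeIntoMain() {
        for (List<Row> stage : stages) {
            for (Row row : stage) {
                long[] totals = mainTable.computeIfAbsent(row.key(), k -> new long[2]);
                totals[0] += row.traffic();
                totals[1] += row.expenses();
            }
            stage.clear();
        }
    }

    // Readers only see the main table; staged rows are invisible until merged.
    public long[] read(String key) {
        return mainTable.getOrDefault(key, new long[2]).clone();
    }
}
```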
25. TUNING TOPOLOGY
2nd Issue - Serialization
Chart: Raw/s, Expanded/s and Writes/s (y-axis 0 to 160,000) for sizes of 200 KB, 1 MB, 4 MB, 8 MB, 16 MB and 24 MB; throughput is plateauing