4. What’s going on
• Mainframes are obsolete, replaced by clusters of commodity hardware
• 10 GbE (10 Gb/s) links are the new standard
• RESTful APIs are everywhere
• Everybody wants to visit Paxos island
• Firehoses do not only carry water
• Asynchronous, non-blocking functional programming is taught at primary school
• NoSQL is the new way to store data at scale
• API management startups are rising (and raising)
• Hadoop keywords boost your LinkedIn profile by 2000%
• Public clouds are responsible for more than 50% of global Internet traffic
• … and counting …
5. A Possible Deployment
Source: http://dev.datasift.com/blog/high-scalability
Note: the diagram dates from 2009, so it is probably partially or even completely outdated today.
7. Batch Processing
[Diagram: timeline from t1 to t5. Batch 1 and Batch 2 each start processing and later become ready to be served; Batch 3 then starts processing. Queries can only read the latest completed batch (data from t1, later data from t3), so every answer has a data gap up to the present.]
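The data gap in the timeline can be quantified with a small sketch. It assumes integer timestamps and a hypothetical `Batch` record holding the cutoff of the data the batch contains (`data_until`) and the time it becomes ready to be served (`ready_at`). A query is answered from the newest ready batch, and the gap is the distance between the query time and that batch's cutoff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Batch:
    """Hypothetical batch record used only for this illustration."""
    name: str
    data_until: int  # timestamp of the last data included in the batch (e.g. t1)
    ready_at: int    # when processing finished and the batch can be served


def latest_served_batch(batches: list[Batch], query_time: int) -> Optional[Batch]:
    """Return the most recent batch that is ready to be served at query_time."""
    ready = [b for b in batches if b.ready_at <= query_time]
    return max(ready, key=lambda b: b.data_until, default=None)


def data_gap(batches: list[Batch], query_time: int) -> Optional[int]:
    """How much recent data a query at query_time cannot see (None if no batch is ready)."""
    batch = latest_served_batch(batches, query_time)
    if batch is None:
        return None
    return query_time - batch.data_until
```

For example, with batch 1 covering data up to 10 and ready at 14, and batch 2 covering data up to 20 and ready at 24, a query at 22 is still served by batch 1, so its gap (12) keeps growing until batch 2 is ready.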
8. Batch Processing in Detail
• Leave some time for data to finish uploading
• Run the batch with data from yesterday (processing time)
• Load the results into a data store
• Notify the retrieval system that a new batch is ready to be served
• A new batch starts every granularity period
• Meanwhile, queries can only get data from the day before yesterday?
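The steps above can be sketched as a small daily job. Everything here is hypothetical: the two-hour `GRACE_PERIOD`, the per-key count used as the batch computation, the dict standing in for the data store, and the `notify` callback standing in for the retrieval system.

```python
from datetime import date, datetime, time, timedelta
from typing import Callable, Iterable

# Hypothetical: how long to wait for late uploads after the day ends.
GRACE_PERIOD = timedelta(hours=2)


def batch_can_start(day: date, now: datetime) -> bool:
    """The batch for `day` may start once the day is over plus the grace period."""
    end_of_day = datetime.combine(day + timedelta(days=1), time.min)
    return now >= end_of_day + GRACE_PERIOD


def run_daily_batch(
    day: date,
    events: Iterable[tuple[str, date]],
    store: dict[date, dict[str, int]],
    notify: Callable[[date], None],
) -> None:
    """Process one day of events, load the results, then notify the retrieval system."""
    # Processing step: count events per key for the given day only.
    counts: dict[str, int] = {}
    for key, event_day in events:
        if event_day == day:
            counts[key] = counts.get(key, 0) + 1
    # Load step: publish the results in the (in-memory) data store.
    store[day] = counts
    # Notify step: the retrieval side can now serve this batch.
    notify(day)
```

Loading before notifying is the important ordering: the retrieval system should never be told about a batch that is not yet in the store.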
9. Realtime Query
• Interactive query
• REST-like request/response query type
and
• Query the latest version of the data
• "Latest" meaning n seconds ago, with n known and fixed
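A realtime query with a known, fixed n can be sketched as a request/response handler that checks the age of the data it serves. The `RealtimeView` class, its `refresh`/`handle_get` methods, the `GET /counts/<key>` route mentioned in a comment, and the choice to answer 503 when the view is older than n seconds are all assumptions for illustration; the clock is injectable so the staleness bound can be tested.

```python
import time
from typing import Callable


class RealtimeView:
    """Hypothetical in-memory view whose answers are at most `max_staleness` seconds old."""

    def __init__(self, max_staleness: float, clock: Callable[[], float] = time.monotonic):
        self.max_staleness = max_staleness  # the known, fixed n
        self.clock = clock
        self.counts: dict[str, int] = {}
        self.as_of = clock()

    def refresh(self, new_counts: dict[str, int]) -> None:
        """Replace the view with fresher data and record when that happened."""
        self.counts = dict(new_counts)
        self.as_of = self.clock()

    def handle_get(self, key: str) -> dict:
        """Request/response lookup, e.g. behind a hypothetical GET /counts/<key>."""
        age = self.clock() - self.as_of
        if age > self.max_staleness:
            # Refuse rather than silently serve data older than n seconds.
            return {"status": 503, "error": "view older than n seconds"}
        return {"status": 200, "key": key, "value": self.counts.get(key, 0), "age_s": age}
```

Returning an error once the bound is exceeded keeps the "n known and fixed" promise explicit; serving stale data with its age attached would be an equally valid design if clients can tolerate it.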
10. Hybrid Approach
[Diagram: timeline from t1 to t4. Batch 1 (snapshot at t1) and Batch 2 (snapshot at t2) each start processing and later become ready to be served. Complementary data is kept alongside each batch, so a query reads the t1 snapshot AND complementary data, then, once Batch 2 is ready, the t2 snapshot AND complementary data.]
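The hybrid query can be sketched by treating the batch result as a snapshot and adding the complementary data recorded after that snapshot. The functions below are hypothetical, use a per-key event count as the query, and assume every event carries a timestamp so that the snapshot part and the complementary part neither overlap nor leave a hole.

```python
def build_batch_view(events: list[tuple[str, int]], snapshot: int) -> dict[str, int]:
    """Batch side: count events per key up to and including the snapshot time."""
    view: dict[str, int] = {}
    for key, ts in events:
        if ts <= snapshot:
            view[key] = view.get(key, 0) + 1
    return view


def hybrid_query(key: str, batch_view: dict[str, int], snapshot: int,
                 recent_events: list[tuple[str, int]]) -> int:
    """Snapshot value AND complementary data (events strictly after the snapshot)."""
    complementary = sum(1 for k, ts in recent_events if k == key and ts > snapshot)
    return batch_view.get(key, 0) + complementary
```

Switching from the t1 snapshot to the t2 snapshot does not change the answer; it only shrinks the complementary part that the realtime side has to keep.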