2. Messaging
Messaging is a form of communication to
exchange information
Messaging is the means to
distribute/share/seek information in software
systems
Alan Kay on Messaging
http://c2.com/cgi/wiki?AlanKayOnMessaging
3. Messaging
● Enables us to build distributed systems
Producer -> Messaging -> Consumer, Consumer
5. Type of processing
● Consider a scenario - counting ERRORs in
access log
● Largely a batch execution
cat access.log | grep ERROR |
sed -e 's|.* \(http://.*\)|\1|g' |
sort | uniq -c > errors.txt
cat -> grep -> sed -> uniq
6. Type of processing
● Replacing 'batch' with 'real time' processing
tail -f access.log
cat access.log | grep ERROR |
sed -e 's|.* \(http://.*\)|\1|g' |
sort | uniq -c > errors.txt
Is this expression still valid?
7. Type of processing
tail -f access.log | grep ERROR |
sed -e 's|.* \(http://.*\)|\1|g' |
sort | uniq -c > errors.txt
Batch Processors
grep -> stdout -> sed -> uniq (batch)
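The pipeline above stalls because sort and uniq -c must read all of their input before they emit anything, and tail -f never ends. A minimal sketch of an incremental counter that emits a running count per ERROR line instead; the function name and the log format (URL as the last whitespace-separated field) are assumptions made for illustration:

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def running_error_counts(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (url, count so far) for every ERROR line as it arrives."""
    counts: Counter = Counter()
    for line in lines:
        if "ERROR" not in line:
            continue  # plays the role of `grep ERROR`
        url = line.split()[-1]  # assumption: the URL is the last field
        counts[url] += 1
        yield url, counts[url]
```

Because the function is a generator, it works the same on a finished file and on an endless stream of lines.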
8. Messaging
● What about processing logs from different
systems?
● What about distributing processing on
multiple systems?
● Can I tap into one of the pipes without
stopping the system?
● How can I add to already running pipeline?
9. Introducing...
Messaging as a Platform
● You never have to write code to exchange
messages
● You only write processes (sed, grep etc.) and
string them together; the platform can even
provide a rich set of built-ins
● Process definitions remain unchanged
even when the nature of the processing
changes!
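One way to picture "process definitions remain unchanged" is to write each process as a plain function on a single message and let the platform decide how messages reach it. A rough sketch under that assumption; `only_errors`, `last_field` and `run` are hypothetical names, not part of any existing platform:

```python
from typing import Callable, Iterable, Iterator, List, Optional

# A process maps one message to one message, or to None to drop it.
Process = Callable[[str], Optional[str]]


def only_errors(message: str) -> Optional[str]:
    return message if "ERROR" in message else None


def last_field(message: str) -> Optional[str]:
    return message.split()[-1]


def run(processes: List[Process], messages: Iterable[str]) -> Iterator[str]:
    """String processes together and apply them to each message in turn."""
    for message in messages:
        current: Optional[str] = message
        for process in processes:
            current = process(current)
            if current is None:
                break  # message was filtered out
        if current is not None:
            yield current
```

The same `[only_errors, last_field]` list can be fed a list read from a file (batch) or a generator that follows a live log (real time); only the source of `messages` changes.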
11. MaaP - Messaging Services
Provides
● Allows access from multiple hosts
○ Enables distributed processing
● Buffering capacity where producer and consumer
throughput differ
● Transactions and Durability
○ Enables failure recovery and fault tolerance
● Ordering, Retries ...
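As an illustration of buffering, ordering and retries, here is a small in-memory sketch. It is not durable (a real messaging service would persist messages), and the class name `RetryingQueue` and its `max_attempts` parameter are invented for this example:

```python
from collections import deque
from typing import Callable, Deque, List


class RetryingQueue:
    """FIFO buffer; a message is removed only after it has been handled."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.pending: Deque[str] = deque()
        self.dead: List[str] = []  # messages that exhausted their retries
        self.max_attempts = max_attempts

    def publish(self, message: str) -> None:
        self.pending.append(message)

    def drain(self, handler: Callable[[str], None]) -> None:
        while self.pending:
            message = self.pending[0]  # peek, do not remove yet
            for _attempt in range(self.max_attempts):
                try:
                    handler(message)
                    break  # success: stop retrying
                except Exception:
                    continue  # failure: try again
            else:
                self.dead.append(message)  # every attempt failed
            self.pending.popleft()
```

Messages are handled strictly in publish order, and a message that keeps failing is set aside in `dead` rather than blocking the queue forever.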
12. MaaP - Architecture
Next we will need Message Brokers to enable
dynamic routing
Wrappers to enable Message Service
Producer Process -> Broker -> Q, Q, Q, Q -> Consumer Processes
13. MaaP - Brokers
● Manage the routes from producers to
consumer queues
● Consumers can join or opt out anytime
● Can optionally enable consumers to consume
messages from the beginning, from some point
in the past, or only the live messages
● Enables consumers to move between
different messaging systems
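The join, opt-out and replay behaviour listed above can be sketched with a broker that keeps a log of everything published. This is a toy with a single route and in-memory storage; `Broker`, `join`, `leave` and `from_offset` are names chosen for this example only:

```python
from typing import Callable, Dict, List, Optional


class Broker:
    """Keeps every published message so late consumers can replay them."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.consumers: Dict[str, Callable[[str], None]] = {}

    def publish(self, message: str) -> None:
        self.log.append(message)
        for deliver in list(self.consumers.values()):
            deliver(message)

    def join(
        self,
        name: str,
        deliver: Callable[[str], None],
        from_offset: Optional[int] = None,
    ) -> None:
        """from_offset=None: live only; 0: from the beginning; n: from message n."""
        if from_offset is not None:
            for message in self.log[from_offset:]:
                deliver(message)  # replay past messages first
        self.consumers[name] = deliver

    def leave(self, name: str) -> None:
        self.consumers.pop(name, None)
```

A consumer that joins with `from_offset=0` sees the full history before live traffic, while one that joins without an offset only sees messages published after it joined.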
14. MaaP - Architecture
A message batching system that dumps
messages according to rules to enable batch
processing
Q -> Periodic Batch -> HDFS
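A rough sketch of rule-based batching, assuming the simplest possible rule (flush once a fixed number of messages has accumulated). The `sink` callable stands in for whatever writes a batch to HDFS; `Batcher` and `max_size` are hypothetical names:

```python
from typing import Callable, List


class Batcher:
    """Collects messages and hands them to `sink` one batch at a time."""

    def __init__(self, max_size: int, sink: Callable[[List[str]], None]) -> None:
        self.max_size = max_size
        self.sink = sink
        self.buffer: List[str] = []

    def add(self, message: str) -> None:
        self.buffer.append(message)
        if len(self.buffer) >= self.max_size:  # the batching rule
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sink(list(self.buffer))  # pass a copy, then start afresh
            self.buffer.clear()
```

A periodic rule, as in the diagram, could be approximated by calling `flush()` from a timer instead of (or in addition to) the size check.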
15. MaaP - Architecture
A mechanism to pump processed data back to
event-based processing
Batch process complete notification
HDFS -> Pull -> Listener -> Push -> Broker
16. MaaP - Architecture
And finally a Process Manager
● that accepts the process binaries and
distributes them to the hosts allocated to the
platform.
● that balances load based on the computation
capacity available
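One simple way a process manager might balance load is to place each process on the host with the most free capacity. The sketch below assumes capacity and process cost are abstract integer units; the function name `place` and the greedy largest-first strategy are illustrative choices, not something the slides specify:

```python
from typing import Dict


def place(processes: Dict[str, int], capacity: Dict[str, int]) -> Dict[str, str]:
    """Greedily assign each process to the host with the most free capacity."""
    free = dict(capacity)  # work on a copy of the capacities
    placement: Dict[str, str] = {}
    # Place the most expensive processes first.
    for name, cost in sorted(processes.items(), key=lambda item: -item[1]):
        host = max(free, key=lambda h: free[h])
        if free[host] < cost:
            raise ValueError(f"no host has capacity for {name}")
        free[host] -= cost
        placement[name] = host
    return placement
```

For example, `place({"grep": 2, "sed": 1, "count": 3}, {"h1": 4, "h2": 3})` puts `count` and `sed` on `h1` and `grep` on `h2`.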
18. MaaP - Uses
● Log processing
○ Real time event filtering, routed to ->
○ Aggregation (batched) and then ->
○ Near real-time monitoring
● Product Feed Processing and Aggregation
○ Crawlers, emails and FTP supply feeds: routed to ->
○ Extract product info (batched) to ->
○ Pipeline to update the store and invalidate
caches in near real time
19. @ Flipkart
A lot of messages and a lot of solutions
● Work Queues (RabbitMQ)
● SOA via Restbus
● Event Replication across systems
○ Notification (cache updates)
● Events to batch processing and back
○ MySQL <-> Hadoop <-> MySQL (analytics)
We need to connect them! That's all :P