10. MapReduce Jobs
• Challenge: How to monitor real-time bandwidth usage? No developed protocol
could be utilized(HTTP GET/POST TCP/IP both requires server-cli ack
communication)
• Solution: Map data into (time,data)
• time resolution 1 second, then averageByKey
1 second 1 second
18. I would recommend…
JSON
If your app is
Lag-critical
Light-sized data
Avro
If your app is
Data-heavy
real-time critical
Protobuf
If your app is
Heavily
replying on
Google
Services
Need Perfect
documentatio
n
19. About me
• University of Southern California
• MS Electrical Engineering
Before Insight At Insight
Basic MapReduce Spark, Kafka, Redis
Compression Serialization
Linux C AWS, Bash, tmux…
Basic front-end Full Stack Dev
Think Alone Communication
20. Avro vs. Protobuf
• Why Avro serialization is slightly smaller than Protobuf?
• Avro schema has both attribution name and type.
• Protobuf tags each record with name tag and type. (1 byte more per record)
• Schema Evolution?
• Avro must keep the most recent version(order matters, field matters), or runtime risk
• Protobuf may decode with previous schema without runtime error, overall more flexible.
• Optional Feature?
• Protobuf: decode with validation for required
• Avro: null in a union to indicate optional