4. Examples
● Recommendations
○ Amazon
○ Netflix
● Ad Targeting
○ Hulu
○ YouTube
● Fraud Detection
○ Visa
○ JPMC
● Spam
○ Gmail
● Search Personalization
○ Google
5. Overall Requirements
● React to events in near real time.
○ Low latency reads/writes.
○ Event-driven analysis (not just batch).
● Web scale: hundreds of millions of users.
○ High throughput reads/writes.
● Reliable.
○ Distributed, fault tolerant, graceful degradation.
● Flexible.
○ Evolvable schema.
○ Support ad-hoc experimentation and analyses.
12. User-centric Data Model
<column>
<name>email</name>
<description>Email address</description>
<schema>"string"</schema>
</column>
Cells have Avro schemas for evolvable storage and retrieval.
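As a rough illustration of the layout above, the sketch below keeps one row per user and checks each written cell against the schema declared for its column. Everything here (`COLUMNS`, `PYTHON_TYPES`, `put`, `get`, the in-memory `table`) is a hypothetical stand-in, not the WibiData or Avro API, and it only handles Avro primitive type names.

```python
# Hypothetical in-memory stand-in for a user-centric table:
# one row per user, one declared column per attribute.
COLUMNS = {
    "email": {"description": "Email address", "schema": "string"},
}

# Assumption: only a few Avro primitive type names are needed for this sketch.
PYTHON_TYPES = {"string": str, "long": int, "double": float}


def put(table, user_id, column, value):
    """Write one cell, checking the value against the column's declared schema."""
    schema = COLUMNS[column]["schema"]
    if not isinstance(value, PYTHON_TYPES[schema]):
        raise TypeError(f"{column} expects {schema}, got {type(value).__name__}")
    table.setdefault(user_id, {})[column] = value


def get(table, user_id, column):
    """Read one cell; a missing row or cell reads as None."""
    return table.get(user_id, {}).get(column)


table = {}
put(table, "alex", "email", "alex@example.com")
```

Reads and writes touch a single user's row, which is what makes per-user access cheap in this model.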
14. Analyzing Data: Producers
● produce() generates derived data for a single row:
○ recommend
○ profile
○ classify
○ etc.
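A minimal sketch of the per-row contract described above, assuming a producer is simply a function that reads one row and returns new column values for that same row. The names `produce_profile`, `run_producer`, and the `game_count` column are invented for illustration and are not part of any real API.

```python
def produce_profile(row):
    """Derive data from one user's row only; returns new column values."""
    games = row.get("games", [])
    return {"game_count": len(games)}


def run_producer(table, produce):
    """Apply a producer to each row independently and store its output in that row."""
    for row in table.values():
        row.update(produce(row))


users = {
    "alex": {"games": ["MiniGolf Pro", "Extreme Pond Fishing"]},
    "bob": {"games": ["Kitten Krash"]},
}
run_producer(users, produce_profile)
```

Because each call sees only one row, the same function could run in a batch job over all users or on demand for a single user.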
15. Analyzing Data: Gatherers
● gather() aggregates data across all rows.
○ build association rules for collaborative filtering.
○ train classifier models.
○ compute prior probabilities for events.
○ etc.
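One way to picture a gatherer is as a scan over every row followed by a combine step, in the spirit of map and reduce. The sketch below computes prior probabilities for events under that assumption; `gather_event_priors`, the `events` column, and the sample rows are hypothetical.

```python
from collections import Counter


def gather_event_priors(table):
    """Scan every row, count events, and normalize the counts into probabilities."""
    counts = Counter()
    for row in table.values():            # per-row step: emit this user's events
        counts.update(row.get("events", []))
    total = sum(counts.values())          # cross-row step: combine all counts
    if total == 0:
        return {}
    return {event: n / total for event, n in counts.items()}


# Hypothetical sample rows, invented for illustration.
event_rows = {
    "alex": {"events": ["click", "view", "view"]},
    "bob": {"events": ["view"]},
}
priors = gather_event_priors(event_rows)
```

Unlike a producer, the result belongs to no single user: it summarizes the whole table.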
16. Example: Ad Targeting
User  | Games                                | Interests | Recommended Ads
Alex  | MiniGolf Pro, Extreme Pond Fishing   |           |
Bob   | Kitten Krash                         |           |
Carol | Apples Everywhere, Underground Racer |           |

Game              | Categories
MiniGolf Pro      | Golf, Sports
Kitten Krash      | Cats, Racing
Apples Everywhere | Puzzles
17. Example: Ad Targeting
User  | Games                                | Interests    | Recommended Ads
Alex  | MiniGolf Pro, Extreme Pond Fishing   | Golf, Sports |
Bob   | Kitten Krash                         |              |
Carol | Apples Everywhere, Underground Racer |              |

Producer (fills in Interests from the game categories):

Game              | Categories
MiniGolf Pro      | Golf, Sports
Kitten Krash      | Cats, Racing
Apples Everywhere | Puzzles
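The step on this slide can be sketched as a producer that looks up each of a user's games in the Game → Categories table and writes the result to Interests. The table contents come from the slide; the function name, column names, and the choice to skip games that have no entry are assumptions.

```python
GAME_CATEGORIES = {
    "MiniGolf Pro": ["Golf", "Sports"],
    "Kitten Krash": ["Cats", "Racing"],
    "Apples Everywhere": ["Puzzles"],
}


def produce_interests(row):
    """Collect the categories of the user's games, in order, without duplicates."""
    interests = []
    for game in row["games"]:
        # Assumption: games missing from the table contribute no interests.
        for category in GAME_CATEGORIES.get(game, []):
            if category not in interests:
                interests.append(category)
    return {"interests": interests}


alex = {"games": ["MiniGolf Pro", "Extreme Pond Fishing"]}
alex.update(produce_interests(alex))
```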
18. Example: Ad Targeting
User  | Games                                | Interests    | Recommended Ads
Alex  | MiniGolf Pro, Extreme Pond Fishing   | Golf, Sports | ESPN.com
Bob   | Kitten Krash                         |              |
Carol | Apples Everywhere, Underground Racer |              |

Producer (fills in Recommended Ads from the category advertisements):

Category | Advertisement
Golf     | ESPN.com
Animals  | Petco.com
Racing   | Nascar.com
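A second producer can turn Interests into Recommended Ads by looking each interest up in the Category → Advertisement table. The table contents are taken from the slide; `produce_recommended_ads` and the column names are hypothetical, and interests without a matching ad are simply skipped.

```python
CATEGORY_ADS = {
    "Golf": "ESPN.com",
    "Animals": "Petco.com",
    "Racing": "Nascar.com",
}


def produce_recommended_ads(row):
    """Map each interest to its advertisement, skipping interests with no entry."""
    ads = []
    for interest in row.get("interests", []):
        ad = CATEGORY_ADS.get(interest)
        if ad is not None and ad not in ads:
            ads.append(ad)
    return {"recommended_ads": ads}


alex_row = {"interests": ["Golf", "Sports"]}
alex_row.update(produce_recommended_ads(alex_row))
```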
19. Example: Ad Targeting
User  | Games                                | Interests    | Recommended Ads
Alex  | MiniGolf Pro, Extreme Pond Fishing   | Golf, Sports | ESPN.com
Bob   | Kitten Krash                         |              |
Carol | Apples Everywhere, Underground Racer |              |

Producer (fills in Recommended Ads from the category advertisements):

Category | Advertisement
Golf     | ESPN.com
Animals  | Petco.com
Racing   | Nascar.com

Wait, where did this come from?
20. Example: Gathering Associations
User  | Games                                | Interests    | Clicked Ads
Alex  | MiniGolf Pro, Extreme Pond Fishing   | Golf, Sports |
Bob   | Kitten Krash                         |              |
Carol | Apples Everywhere, Underground Racer |              |
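One plausible way to gather associations from a table like this is to count, across all users, how often each interest category co-occurs with each clicked ad, then keep the most-clicked ad per category. This is a sketch of that idea only, not necessarily the method used in the talk. The slide shows no Clicked Ads values, so the sample rows below are invented for illustration.

```python
from collections import Counter, defaultdict


def gather_associations(table):
    """Count (category, clicked ad) pairs over all rows; keep the top ad per category."""
    pair_counts = defaultdict(Counter)
    for row in table.values():
        for category in row.get("interests", []):
            pair_counts[category].update(row.get("clicked_ads", []))
    # Categories whose users clicked nothing are left out of the result.
    return {
        category: ads.most_common(1)[0][0]
        for category, ads in pair_counts.items()
        if ads
    }


# Hypothetical click data, not taken from the slides.
click_rows = {
    "u1": {"interests": ["Golf", "Sports"], "clicked_ads": ["ESPN.com"]},
    "u2": {"interests": ["Golf"], "clicked_ads": ["ESPN.com", "Petco.com"]},
    "u3": {"interests": ["Cats"], "clicked_ads": ["Petco.com"]},
    "u4": {"interests": ["Puzzles"], "clicked_ads": []},
}
associations = gather_associations(click_rows)
```

The output has the same shape as a Category → Advertisement lookup table, so a gatherer of this kind could supply the table that a per-user producer reads.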
28. Final Thoughts
● A user-centric data storage model has great advantages:
○ Fast per-user reads and writes.
○ Already pivoted by your most common analysis.
● HBase provides fast, reliable random access and scans.
○ Billions of rows, millions of columns.
○ Integrates well with MapReduce for analysis.
● Build scalable personalized applications with WibiData.
○ Check out www.wibidata.com
Garrett Wu | gwu@odiago.com