Building Personalized Applications at Scale

Garrett Wu
Director of Engineering
Odiago, Inc.
Personalized Applications
Examples

● Recommendations
   ○ Amazon
   ○ Netflix
● Ad Targeting
   ○ Hulu
   ○ YouTube
● Fraud Detection
   ○ Visa
   ○ JPMC
● Spam
   ○ Gmail
● Search Personalization
   ○ Google
Overall Requirements

● React to events in near real time.
   ○ Low latency reads/writes.
   ○ Event-driven analysis (not just batch).
● Web scale: hundreds of millions of users.
   ○ High throughput reads/writes.
● Reliable.
   ○ Distributed, fault tolerant, graceful degradation.
● Flexible.
   ○ Evolvable schema.
   ○ Support ad-hoc experimentation and analyses.
Data Flow
Datastore Requirements

1. Random writes.
2. Analysis (MapReduce).
3. Random reads.
Data Model Requirements

 1. Write user-centric data.
     ○ "Bob bought the Hunger Games book."
     ○ "Sally viewed product page X."
 2. Query user-centric data.
     ○ "What were Jim's most recent 5 purchases?"
     ○ "What are Sue's top 3 recommendations?"

Given everything we know about John:
   ● Transactions.
   ● Tweets.
   ● Likes.
... recommend, classify, predict, cluster, profile.
User-centric Data Model

<column>
  <name>email</name>
  <description>Email address</description>
  <schema>"string"</schema>
</column>

Cells have Avro schemas for evolvable storage and retrieval.
User-centric Data Model




 ● 3-D storage with timestamps: each cell is addressed by row, column, and timestamp.
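
To make the three dimensions concrete, here is a minimal in-memory sketch. The `UserTable` class, its method names, and the sample purchases are hypothetical and are not the WibiData or HBase API; they only illustrate how keeping a timestamp per cell supports a query such as "most recent N purchases".

```python
class UserTable:
    """Toy in-memory table: row key -> column -> {timestamp: value}."""

    def __init__(self):
        self._rows = {}

    def put(self, row, column, timestamp, value):
        # Each (row, column, timestamp) triple addresses its own cell,
        # so a new write adds a version instead of replacing older ones.
        self._rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get_latest(self, row, column, n=1):
        # Return up to n (timestamp, value) pairs, newest first.
        cells = self._rows.get(row, {}).get(column, {})
        return sorted(cells.items(), key=lambda cell: cell[0], reverse=True)[:n]


# Hypothetical sample data for illustration only.
table = UserTable()
table.put("jim", "purchases", 1001, "book")
table.put("jim", "purchases", 1002, "headphones")
table.put("jim", "purchases", 1003, "coffee maker")
recent = table.get_latest("jim", "purchases", n=2)
```

Because all versions for one user live under one row key, the "most recent" query touches a single row.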
Analyzing Data: Producers




 ● produce() generates derived data for a single row:
    ○ recommend
    ○ profile
    ○ classify
    ○ etc.
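
As a rough illustration of the per-row contract, the sketch below assumes a producer is simply a function from one row to a set of derived columns. `profile_producer`, `run_producer`, and the `purchase_count` column are made-up names, not the actual WibiData interface.

```python
def profile_producer(row):
    # Hypothetical produce(): reads one row and returns derived columns for it.
    return {"purchase_count": len(row.get("purchases", []))}


def run_producer(table, produce):
    # Each row is processed independently of every other row, so the work
    # can be spread across machines or triggered by a single user's event.
    return {key: {**row, **produce(row)} for key, row in table.items()}


users = {
    "bob": {"purchases": ["Hunger Games book"]},
    "sally": {"views": ["product page X"]},
}
profiled = run_producer(users, profile_producer)
```

The producer never looks at another user's row, which is what lets it run either in batch or on a single row when an event arrives.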
Analyzing Data: Gatherers




● gather() aggregates data across all rows.
   ○ build association rules for collaborative filtering.
   ○ train classifier models.
   ○ compute prior probabilities for events.
   ○ etc.
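
One way to picture a gatherer is as a per-row partial result plus a merge step. The sketch below computes prior probabilities of event types under that assumption. The function names, the `events` column, and the split into `gather` and `combine` are illustrative choices, not the WibiData API.

```python
from collections import Counter
from functools import reduce


def gather(row):
    # Per-row partial result: how many events of each type this user has.
    return Counter(event_type for event_type, _detail in row["events"])


def combine(left, right):
    # Merging partial counts is associative, so rows can be combined in any order.
    return left + right


def prior_probabilities(table):
    totals = reduce(combine, (gather(row) for row in table.values()), Counter())
    n = sum(totals.values())
    return {event_type: count / n for event_type, count in totals.items()}


users = {
    "bob": {"events": [("purchase", "Hunger Games book")]},
    "sally": {"events": [("view", "product page X")]},
}
priors = prior_probabilities(users)
```

Unlike a producer, the result here depends on every row in the table, which is why this kind of job fits a batch scan.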
Example: Ad Targeting

User    Games                                  Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Game                Categories
MiniGolf Pro        Golf, Sports
Kitten Krash        Cats, Racing
Apples Everywhere   Puzzles
Example: Ad Targeting

User    Games                                  Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing     Golf, Sports
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

[Producer]

Game                Categories
MiniGolf Pro        Golf, Sports
Kitten Krash        Cats, Racing
Apples Everywhere   Puzzles
Example: Ad Targeting

User    Games                                  Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing     Golf, Sports   ESPN.com
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

[Producer]

Category   Advertisement
Golf       ESPN.com
Animals    Petco.com
Racing     Nascar.com
Example: Ad Targeting

User    Games                                  Interests      Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing     Golf, Sports   ESPN.com
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

[Producer]

Category   Advertisement
Golf       ESPN.com
Animals    Petco.com
Racing     Nascar.com

Wait, where did this come from?
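
The two producer steps in this example can be sketched as a pair of per-row functions. The lookup tables are copied from the slides. The function names, column names, and the choice to skip games or interests with no table entry are assumptions. Only Alex's row is filled in on the slides, so that is the only row reproduced here.

```python
# Lookup tables as shown on the slides.
GAME_CATEGORIES = {
    "MiniGolf Pro": ["Golf", "Sports"],
    "Kitten Krash": ["Cats", "Racing"],
    "Apples Everywhere": ["Puzzles"],
}
CATEGORY_ADS = {
    "Golf": "ESPN.com",
    "Animals": "Petco.com",
    "Racing": "Nascar.com",
}


def interests_producer(row):
    # Derive interests from the games in this one row; games without a
    # category entry are skipped (an assumption, the slides do not say).
    interests = []
    for game in row["games"]:
        for category in GAME_CATEGORIES.get(game, []):
            if category not in interests:
                interests.append(category)
    return {"interests": interests}


def ads_producer(row):
    # Derive recommended ads from the interests already written to this row.
    ads = []
    for interest in row["interests"]:
        ad = CATEGORY_ADS.get(interest)
        if ad is not None and ad not in ads:
            ads.append(ad)
    return {"recommended_ads": ads}


alex = {"games": ["MiniGolf Pro", "Extreme Pond Fishing"]}
alex.update(interests_producer(alex))
alex.update(ads_producer(alex))
```

Chaining works because the second producer reads a column that the first one wrote back to the same row.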
Example: Gathering Associations

User    Games                                  Interests      Clicked Ads
Alex    MiniGolf Pro, Extreme Pond Fishing     Golf, Sports
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer
Example: Gathering Associations

[Figure: Map stage followed by Reduce stage]
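
The slides for this step are diagrams, so the following is only a guess at the shape of the computation: a map step that emits an (interest, ad) pair for each click in a row, and a reduce step that sums the pairs. The click data is invented for illustration (the Clicked Ads column is empty in the slides), and all function and column names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_row(row):
    # Map: emit ((interest, ad), 1) for every interest/clicked-ad pair in one row.
    for interest, ad in product(row["interests"], row["clicked_ads"]):
        yield (interest, ad), 1


def reduce_counts(key, values):
    # Reduce: total the occurrences of one (interest, ad) pair across all rows.
    return key, sum(values)


def gather_associations(rows):
    # Stand-in for the shuffle phase: group mapped values by key.
    grouped = defaultdict(list)
    for row in rows:
        for key, value in map_row(row):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Invented click data, not taken from the slides.
rows = [
    {"interests": ["Golf", "Sports"], "clicked_ads": ["ESPN.com"]},
    {"interests": ["Golf"], "clicked_ads": ["ESPN.com"]},
    {"interests": ["Racing"], "clicked_ads": ["Nascar.com"]},
]
counts = gather_associations(rows)
```

Counts like these could then be turned into a Category to Advertisement table by keeping the most-clicked ad per interest.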
Final Thoughts

● A user-centric data storage model has great advantages:
   ○ Fast per-user reads and writes.
   ○ Already pivoted by your most common analysis.
● HBase provides fast, reliable random-access and scans.
   ○ Billions of rows, millions of columns.
   ○ Integrates well with MapReduce for analysis.


● Build scalable personalized applications with WibiData.
   ○ Check out www.wibidata.com




Garrett Wu | gwu@odiago.com
