ANALYZING LARGE-SCALE USER DATA

SPEAKER: Aaron Kimball, CTO, WibiData


Friday, July 27, 2012

Analyzing Large-Scale User Data with Hadoop and HBase

Aaron Kimball – CTO, WibiData

We can now collect more data than at any time in history.

Yesterday’s engineering challenge: Fitting the problem into the hardware.

Today’s constrained resource is understanding.

How do we best apply data
…to better serving our …

The best products are user-centric
• Intuitive UI
• Continuously learning
   – Guided search
   – Smarter recommendations
• More effective service

What are we building

Requirements
1. Understand the user population
2. Respond to users in real time
3. Support graceful data evolution

Large-scale data science is …
• What does a user look like?
   – What data is available about the user?
   – Which features are important?
   – Which features are correlated?
• How do I model this in MapReduce?
• How do I serve results in a timely manner?

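One way to approach the MapReduce question on this slide is to key every raw event by user in the map step and fold each user's events into a feature in the reduce step. The following is a minimal in-memory sketch, not Hadoop code. The event format `(user_id, event_type)` and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_event(event):
    """Map step: emit (user_id, (event_type, 1)) for one raw event."""
    user_id, event_type = event  # hypothetical event format
    yield user_id, (event_type, 1)


def reduce_user(user_id, values):
    """Reduce step: sum the counts per event type for one user."""
    counts = defaultdict(int)
    for event_type, n in values:
        counts[event_type] += n
    return user_id, dict(counts)


def run_mapreduce(events):
    """Run map, shuffle (group by key), and reduce entirely in memory."""
    grouped = defaultdict(list)
    for event in events:
        for key, value in map_event(event):
            grouped[key].append(value)
    return dict(reduce_user(key, values) for key, values in grouped.items())


events = [("u1", "view"), ("u2", "view"), ("u1", "record"), ("u1", "view")]
features = run_mapreduce(events)
```

Here `features` holds one small feature dictionary per user, which is the kind of per-user summary the questions above ask for.
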
Tools of the trade
• Store all data about a user in one place
• Support real-time get/put, as well as MapReduce

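The two bullets above can be pictured as a table with one row per user that supports both single-row access and a full scan for batch work. The sketch below is a hypothetical in-memory stand-in, not the HBase client API. The `UserTable` class and the `family:qualifier` column names are assumptions chosen for illustration.

```python
class UserTable:
    """Hypothetical in-memory table with one row per user."""

    def __init__(self):
        self._rows = {}

    def put(self, user_id, column, value):
        """Real-time write of a single cell in one user's row."""
        self._rows.setdefault(user_id, {})[column] = value

    def get(self, user_id, column, default=None):
        """Real-time read of a single cell from one user's row."""
        return self._rows.get(user_id, {}).get(column, default)

    def scan(self):
        """Batch access: yield every row in key order, as a batch job would see it."""
        for user_id in sorted(self._rows):
            yield user_id, dict(self._rows[user_id])


table = UserTable()
table.put("u1", "info:name", "Ada")
table.put("u1", "history:last_show", "News at 9")
table.put("u2", "info:name", "Bo")

name = table.get("u1", "info:name")
users_with_history = [uid for uid, row in table.scan() if "history:last_show" in row]
```

Keeping everything about a user in a single row means a real-time lookup touches one key, while a batch job can still walk all rows through `scan()`.
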
Tools of the trade
• Use complex data types to model complex data
• Support extended data models over time

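Extending a data model over time usually means that old and new records must stay readable side by side. One common way to get there is to give every added field a default value. The sketch below assumes a hypothetical schema format, a plain dict of field name to default, and is not the talk's actual mechanism.

```python
# Hypothetical schemas: field name -> default value.
SCHEMA_V1 = {"name": ""}
SCHEMA_V2 = {"name": "", "favorite_genres": []}  # field added later, with a default


def read_record(stored, reader_schema):
    """Read a stored record through a schema.

    Fields missing from the stored record take the schema default;
    stored fields unknown to the schema are ignored.
    """
    record = {}
    for field, default in reader_schema.items():
        value = stored.get(field, default)
        # Copy lists so callers never share the schema's default object.
        record[field] = list(value) if isinstance(value, list) else value
    return record


old_record = {"name": "Ada"}  # written before the new field existed
new_record = {"name": "Bo", "favorite_genres": ["drama"]}

upgraded = read_record(old_record, SCHEMA_V2)
downgraded = read_record(new_record, SCHEMA_V1)
```

A reader on the newer schema sees the old record with an empty genre list, and a reader still on the older schema simply does not see the added field.
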
Tools of the trade
• Abstract computational model away from MapReduce
• Support computation over all users… or one user at a time

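One reading of these bullets is that an analyst writes a function over a single user's row, and the framework decides whether to run it for one user on demand or for every user in a batch. The sketch below assumes that reading. The function names and the `event_counts` field are hypothetical, and a plain dict stands in for the user table.

```python
def favorite_event(row):
    """Per-user computation: the most frequent event type in one user's row."""
    counts = row.get("event_counts", {})
    if not counts:
        return None
    # Sort by count (descending), then by name, so ties resolve deterministically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0][0]


def run_for_user(table, user_id, fn):
    """Apply the per-user function to a single row (the real-time path)."""
    return fn(table[user_id])


def run_for_all(table, fn):
    """Apply the same function to every row (the batch path)."""
    return {user_id: fn(row) for user_id, row in table.items()}


table = {
    "u1": {"event_counts": {"view": 5, "record": 2}},
    "u2": {"event_counts": {"record": 3, "view": 3}},
    "u3": {},
}

one = run_for_user(table, "u1", favorite_event)
everyone = run_for_all(table, favorite_event)
```

The per-user function never mentions map or reduce, so the same logic can back both a single lookup and a full pass over the user population.
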
For set-top boxes

Libraries
Device and User Analysis

• Viewing/recording history
• Personalized offers and recommendations
• Analysis for product roadmap
• Tech support portal
• Improved reports for advertisers

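To make the step from viewing history to recommendations concrete, here is a deliberately simple sketch that suggests unseen titles from the genres a user watches most. The catalog, the titles, and the ranking rule are all invented for illustration and are not taken from the talk.

```python
from collections import Counter


def recommend(history, catalog, limit=2):
    """Suggest unseen titles from the genres a user has watched most often."""
    genre_counts = Counter(catalog[title] for title in history if title in catalog)
    seen = set(history)
    candidates = [
        title
        for title in catalog
        if title not in seen and genre_counts[catalog[title]] > 0
    ]
    # Rank by how often the user watched the genre, then by title for stable output.
    candidates.sort(key=lambda title: (-genre_counts[catalog[title]], title))
    return candidates[:limit]


# Hypothetical catalog: title -> genre.
catalog = {
    "Space Docs": "documentary",
    "Ocean Docs": "documentary",
    "Forest Docs": "documentary",
    "Laugh Hour": "comedy",
    "Night Jokes": "comedy",
    "Court Drama": "drama",
}
history = ["Space Docs", "Ocean Docs", "Laugh Hour"]

suggestions = recommend(history, catalog)
```

A production recommender would use far richer signals, but the shape is the same: a per-user history goes in and a ranked list of offers comes out.
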
The future
• More personalization
• Adaptive UIs (self-arranging dashboards)
• Targeted content, ads
• More effective customer service

Conclusions
• Applications are becoming increasingly user-centric
• Data drives this capability, but harnessing it requires a new distributed architecture

www.wibidata.com
Aaron Kimball – aaron@wibidata.com

Weitere ähnliche Inhalte

Ähnlich wie Analyzing Large User Data with Hadoop and HBase

An Analytics Toolkit Tour
An Analytics Toolkit TourAn Analytics Toolkit Tour
An Analytics Toolkit TourRory Winston
 
THE TRILLION ROW SPREADSHEET(tm) from Structure:Data 2012
THE TRILLION ROW SPREADSHEET(tm) from Structure:Data 2012THE TRILLION ROW SPREADSHEET(tm) from Structure:Data 2012
THE TRILLION ROW SPREADSHEET(tm) from Structure:Data 2012Gigaom
 
SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:
SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:  SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:
SPONSORED WORKSHOP by Amplidata from Structure:Data 2012: Gigaom
 
BIG DATA: AN AUGMENTED INTELLIGENCE FOR STRATEGIC DECISION MAKING from Struct...
BIG DATA: AN AUGMENTED INTELLIGENCE FOR STRATEGIC DECISION MAKING from Struct...BIG DATA: AN AUGMENTED INTELLIGENCE FOR STRATEGIC DECISION MAKING from Struct...
BIG DATA: AN AUGMENTED INTELLIGENCE FOR STRATEGIC DECISION MAKING from Struct...Gigaom
 
Lecture 4: Social Web Personalization (2012)
Lecture 4: Social Web Personalization (2012)Lecture 4: Social Web Personalization (2012)
Lecture 4: Social Web Personalization (2012)Lora Aroyo
 
Best Practices - Seeqnce - 23/24-02-2012
Best Practices - Seeqnce - 23/24-02-2012Best Practices - Seeqnce - 23/24-02-2012
Best Practices - Seeqnce - 23/24-02-2012Youssef Chaker
 
REALIZING REAL-TIME VALUE ON THE REAL-TIME WEB from Structure:Data 2012
REALIZING REAL-TIME VALUE ON THE REAL-TIME WEB from Structure:Data 2012REALIZING REAL-TIME VALUE ON THE REAL-TIME WEB from Structure:Data 2012
REALIZING REAL-TIME VALUE ON THE REAL-TIME WEB from Structure:Data 2012Gigaom
 
Alabfi em-20120624
Alabfi em-20120624Alabfi em-20120624
Alabfi em-20120624zepheiraorg
 
E assessment - ZAWF at Aberdeen Workshop
 E assessment - ZAWF at Aberdeen Workshop E assessment - ZAWF at Aberdeen Workshop
E assessment - ZAWF at Aberdeen WorkshopDigit Class
 
eXo Software Factory Overview
eXo Software Factory OvervieweXo Software Factory Overview
eXo Software Factory OverviewArnaud Héritier
 
Enhancing AT through ID Techniques
Enhancing AT through ID TechniquesEnhancing AT through ID Techniques
Enhancing AT through ID Techniquesnorthavorange
 
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMETHE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMEGigaom
 
Drupal campmanila 2012 (Responsive Web in Drupal with Omega Theme)
Drupal campmanila 2012 (Responsive Web in Drupal with Omega Theme)Drupal campmanila 2012 (Responsive Web in Drupal with Omega Theme)
Drupal campmanila 2012 (Responsive Web in Drupal with Omega Theme)Rick. Bahague
 
Enhancing AT through ID techniques handouts
Enhancing AT through ID techniques handoutsEnhancing AT through ID techniques handouts
Enhancing AT through ID techniques handoutsnorthavorange
 
Final Year Project Guidance
Final Year Project GuidanceFinal Year Project Guidance
Final Year Project GuidanceVarad Meru
 
My fire st petersburg 27 june 2012 (d hladky)
My fire st petersburg 27 june 2012 (d hladky)My fire st petersburg 27 june 2012 (d hladky)
My fire st petersburg 27 june 2012 (d hladky)AI4BD GmbH
 
Guerrilla Usability Testing for Agile/Lean
Guerrilla Usability Testing for Agile/LeanGuerrilla Usability Testing for Agile/Lean
Guerrilla Usability Testing for Agile/LeanInteractive Agency
 
The state of drupal 8 - Drupalcamp Gent
The state of drupal 8  - Drupalcamp GentThe state of drupal 8  - Drupalcamp Gent
The state of drupal 8 - Drupalcamp Gentswentel
 
Choosing a backend for your mobile app? Don’t roll the dice!
Choosing a backend for your mobile app? Don’t roll the dice!Choosing a backend for your mobile app? Don’t roll the dice!
Choosing a backend for your mobile app? Don’t roll the dice!Codemotion
 
Cinemappy: a Context-aware Mobile App for Movie Recommendations boosted by DB...
Cinemappy: a Context-aware Mobile App for Movie Recommendations boosted by DB...Cinemappy: a Context-aware Mobile App for Movie Recommendations boosted by DB...
Cinemappy: a Context-aware Mobile App for Movie Recommendations boosted by DB...Vito Ostuni
 

Ähnlich wie Analyzing Large User Data with Hadoop and HBase (20)

An Analytics Toolkit Tour
An Analytics Toolkit TourAn Analytics Toolkit Tour
An Analytics Toolkit Tour
 
THE TRILLION ROW SPREADSHEET(tm) from Structure:Data 2012
THE TRILLION ROW SPREADSHEET(tm) from Structure:Data 2012THE TRILLION ROW SPREADSHEET(tm) from Structure:Data 2012
THE TRILLION ROW SPREADSHEET(tm) from Structure:Data 2012
 
SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:
SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:  SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:
SPONSORED WORKSHOP by Amplidata from Structure:Data 2012:
 
BIG DATA: AN AUGMENTED INTELLIGENCE FOR STRATEGIC DECISION MAKING from Struct...
BIG DATA: AN AUGMENTED INTELLIGENCE FOR STRATEGIC DECISION MAKING from Struct...BIG DATA: AN AUGMENTED INTELLIGENCE FOR STRATEGIC DECISION MAKING from Struct...
BIG DATA: AN AUGMENTED INTELLIGENCE FOR STRATEGIC DECISION MAKING from Struct...
 
Lecture 4: Social Web Personalization (2012)
Lecture 4: Social Web Personalization (2012)Lecture 4: Social Web Personalization (2012)
Lecture 4: Social Web Personalization (2012)
 
Best Practices - Seeqnce - 23/24-02-2012
Best Practices - Seeqnce - 23/24-02-2012Best Practices - Seeqnce - 23/24-02-2012
Best Practices - Seeqnce - 23/24-02-2012
 
REALIZING REAL-TIME VALUE ON THE REAL-TIME WEB from Structure:Data 2012
REALIZING REAL-TIME VALUE ON THE REAL-TIME WEB from Structure:Data 2012REALIZING REAL-TIME VALUE ON THE REAL-TIME WEB from Structure:Data 2012
REALIZING REAL-TIME VALUE ON THE REAL-TIME WEB from Structure:Data 2012
 
Alabfi em-20120624
Alabfi em-20120624Alabfi em-20120624
Alabfi em-20120624
 
E assessment - ZAWF at Aberdeen Workshop
 E assessment - ZAWF at Aberdeen Workshop E assessment - ZAWF at Aberdeen Workshop
E assessment - ZAWF at Aberdeen Workshop
 
eXo Software Factory Overview
eXo Software Factory OvervieweXo Software Factory Overview
eXo Software Factory Overview
 
Enhancing AT through ID Techniques
Enhancing AT through ID TechniquesEnhancing AT through ID Techniques
Enhancing AT through ID Techniques
 
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUMETHE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
THE 3V’S OF BIG DATA: VARIETY, VELOCITY, and VOLUME
 
Drupal campmanila 2012 (Responsive Web in Drupal with Omega Theme)
Drupal campmanila 2012 (Responsive Web in Drupal with Omega Theme)Drupal campmanila 2012 (Responsive Web in Drupal with Omega Theme)
Drupal campmanila 2012 (Responsive Web in Drupal with Omega Theme)
 
Enhancing AT through ID techniques handouts
Enhancing AT through ID techniques handoutsEnhancing AT through ID techniques handouts
Enhancing AT through ID techniques handouts
 
Final Year Project Guidance
Final Year Project GuidanceFinal Year Project Guidance
Final Year Project Guidance
 
My fire st petersburg 27 june 2012 (d hladky)
My fire st petersburg 27 june 2012 (d hladky)My fire st petersburg 27 june 2012 (d hladky)
My fire st petersburg 27 june 2012 (d hladky)
 
Guerrilla Usability Testing for Agile/Lean
Guerrilla Usability Testing for Agile/LeanGuerrilla Usability Testing for Agile/Lean
Guerrilla Usability Testing for Agile/Lean
 
The state of drupal 8 - Drupalcamp Gent
The state of drupal 8  - Drupalcamp GentThe state of drupal 8  - Drupalcamp Gent
The state of drupal 8 - Drupalcamp Gent
 
Choosing a backend for your mobile app? Don’t roll the dice!
Choosing a backend for your mobile app? Don’t roll the dice!Choosing a backend for your mobile app? Don’t roll the dice!
Choosing a backend for your mobile app? Don’t roll the dice!
 
Cinemappy: a Context-aware Mobile App for Movie Recommendations boosted by DB...
Cinemappy: a Context-aware Mobile App for Movie Recommendations boosted by DB...Cinemappy: a Context-aware Mobile App for Movie Recommendations boosted by DB...
Cinemappy: a Context-aware Mobile App for Movie Recommendations boosted by DB...
 

Mehr von Gigaom

Structure 2014 - The strategic value of the cloud - Joe Weinman
Structure 2014 - The strategic value of the cloud - Joe WeinmanStructure 2014 - The strategic value of the cloud - Joe Weinman
Structure 2014 - The strategic value of the cloud - Joe WeinmanGigaom
 
Structure 2014 - The right and wrong way to scale - Rackspace
Structure 2014 - The right and wrong way to scale - RackspaceStructure 2014 - The right and wrong way to scale - Rackspace
Structure 2014 - The right and wrong way to scale - RackspaceGigaom
 
Structure 2014 - The future of cloud computing survey results
Structure 2014 - The future of cloud computing survey resultsStructure 2014 - The future of cloud computing survey results
Structure 2014 - The future of cloud computing survey resultsGigaom
 
Structure 2014 - Launchpad Competition
Structure 2014 - Launchpad CompetitionStructure 2014 - Launchpad Competition
Structure 2014 - Launchpad CompetitionGigaom
 
Structure 2014 - Disrupting the data center - Intel sponsor workshop
Structure 2014 - Disrupting the data center - Intel sponsor workshopStructure 2014 - Disrupting the data center - Intel sponsor workshop
Structure 2014 - Disrupting the data center - Intel sponsor workshopGigaom
 
Structure 2014 - Cloud trends - Battery
Structure 2014 - Cloud trends - BatteryStructure 2014 - Cloud trends - Battery
Structure 2014 - Cloud trends - BatteryGigaom
 
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...Gigaom
 
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...Gigaom
 
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit BendovStructure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit BendovGigaom
 
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...Gigaom
 
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA, Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA, Gigaom
 
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari GesherStructure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari GesherGigaom
 
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris HaddadStructure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris HaddadGigaom
 
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...Gigaom
 
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrathStructure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrathGigaom
 
Structure Data 2014: IS VIDEO BIG DATA?, Steve Russell
Structure Data 2014: IS VIDEO BIG DATA?, Steve RussellStructure Data 2014: IS VIDEO BIG DATA?, Steve Russell
Structure Data 2014: IS VIDEO BIG DATA?, Steve RussellGigaom
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteGigaom
 
How Data is Remaking E-commerce - from Roadmap 2013
How Data is Remaking E-commerce - from Roadmap 2013How Data is Remaking E-commerce - from Roadmap 2013
How Data is Remaking E-commerce - from Roadmap 2013Gigaom
 
25 Favorite Experiences in Tech - from Roadmap 2013
25 Favorite Experiences in Tech - from Roadmap 201325 Favorite Experiences in Tech - from Roadmap 2013
25 Favorite Experiences in Tech - from Roadmap 2013Gigaom
 
How Moore’s Law is Influencing Design - from Roadmap 2013
How Moore’s Law is Influencing Design - from Roadmap 2013How Moore’s Law is Influencing Design - from Roadmap 2013
How Moore’s Law is Influencing Design - from Roadmap 2013Gigaom
 

Mehr von Gigaom (20)

Structure 2014 - The strategic value of the cloud - Joe Weinman
Structure 2014 - The strategic value of the cloud - Joe WeinmanStructure 2014 - The strategic value of the cloud - Joe Weinman
Structure 2014 - The strategic value of the cloud - Joe Weinman
 
Structure 2014 - The right and wrong way to scale - Rackspace
Structure 2014 - The right and wrong way to scale - RackspaceStructure 2014 - The right and wrong way to scale - Rackspace
Structure 2014 - The right and wrong way to scale - Rackspace
 
Structure 2014 - The future of cloud computing survey results
Structure 2014 - The future of cloud computing survey resultsStructure 2014 - The future of cloud computing survey results
Structure 2014 - The future of cloud computing survey results
 
Structure 2014 - Launchpad Competition
Structure 2014 - Launchpad CompetitionStructure 2014 - Launchpad Competition
Structure 2014 - Launchpad Competition
 
Structure 2014 - Disrupting the data center - Intel sponsor workshop
Structure 2014 - Disrupting the data center - Intel sponsor workshopStructure 2014 - Disrupting the data center - Intel sponsor workshop
Structure 2014 - Disrupting the data center - Intel sponsor workshop
 
Structure 2014 - Cloud trends - Battery
Structure 2014 - Cloud trends - BatteryStructure 2014 - Cloud trends - Battery
Structure 2014 - Cloud trends - Battery
 
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
Structure Data 2014: HOW MICRODATA CAN SAY A LOT ABOUT MACROECONOMICS, David ...
 
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
Structure Data 2014: QLIK SPONSOR WORKSHOP: ANALYTICS THE WAY NATURE INTENDED...
 
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit BendovStructure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
Structure Data 2014: FIVE MYTHS ABOUT BIG DATA, Amit Bendov
 
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
Structure Data 2014: AMID BILLIONS OF METRICS, YOUR SOFTWARE IS TRYING TO TEL...
 
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA, Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
Structure Data 2014: SISENSE SPONSOR WORKSHOP: ON BEER, CHIPS AND DATA,
 
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari GesherStructure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
Structure Data 2014: INVERTING 80/20: BEYOND BESPOKE BIG DATA, Ari Gesher
 
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris HaddadStructure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
Structure Data 2014: TRACKING A SOCCER GAME WITH BIG DATA, Chris Haddad
 
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
Structure Data 2014: TECH AGAINST HUMAN TRAFFICKING AND ILLICIT NETWORKS, Jus...
 
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrathStructure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
Structure Data 2014: DATA DRIVEN DESIGN AT FORMULA ONE SPEED, Geoff McGrath
 
Structure Data 2014: IS VIDEO BIG DATA?, Steve Russell
Structure Data 2014: IS VIDEO BIG DATA?, Steve RussellStructure Data 2014: IS VIDEO BIG DATA?, Steve Russell
Structure Data 2014: IS VIDEO BIG DATA?, Steve Russell
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
 
How Data is Remaking E-commerce - from Roadmap 2013
How Data is Remaking E-commerce - from Roadmap 2013How Data is Remaking E-commerce - from Roadmap 2013
How Data is Remaking E-commerce - from Roadmap 2013
 
25 Favorite Experiences in Tech - from Roadmap 2013
25 Favorite Experiences in Tech - from Roadmap 201325 Favorite Experiences in Tech - from Roadmap 2013
25 Favorite Experiences in Tech - from Roadmap 2013
 
How Moore’s Law is Influencing Design - from Roadmap 2013
How Moore’s Law is Influencing Design - from Roadmap 2013How Moore’s Law is Influencing Design - from Roadmap 2013
How Moore’s Law is Influencing Design - from Roadmap 2013
 

Analyzing Large User Data with Hadoop and HBase

  • 1. ANALYZING LARGE-SCALE USER DATA. Speaker: Aaron Kimball, CTO, WibiData. Friday, July 27, 2012
  • 3. Analyzing Large-Scale User Data with Hadoop and HBase. Aaron Kimball – CTO, WibiData
  • 4. We can now collect more data than at any time in history.
  • 5. Yesterday’s engineering challenge: Fitting the problem into the hardware.
  • 6. Today’s constrained resource is understanding.
  • 7. How do we best apply data … to better serving our
  • 8. The best products are user-
      • Intuitive UI
      • Continuously learning
          – Guided search
          – Smarter recommendations
      • More effective service
  • 9–18. What are we building
  • 19. Requirements: 1. Understand the user population
  • 20. Requirements: 2. Respond to users in real time
  • 21. Requirements: 3. Support graceful data evolution
  • 22. Large-scale data science is
      • What does a user look like?
          – What data is available about the user?
          – Which features are important?
          – Which features are correlated?
      • How do I model this in MapReduce?
      • How do I serve results in a timely
  • 24. Tools of the trade
      • Store all data about a user in one place
      • Support real-time get/put, as well as MapReduce
  • 25. Tools of the trade
      • Use complex data types to model complex data
      • Support extended data models over time
  • 26. Tools of the trade
      • Abstract computational model away from MapReduce
      • Support computation over all users… or one user at a time
  • 34–35. [image]: for set-top boxes. Viewing/recording history
  • 36–37. [image]: for set-top boxes. Libraries; Device and User Analysis; Viewing/recording history; Personalized offers and recommendations
  • 38–39. [image]: for set-top boxes. Libraries; Device and User Analysis; Viewing/recording history; Personalized offers and recommendations; Analysis for product roadmap
  • 40–41. [image]: for set-top boxes. Libraries; Device and User Analysis; Viewing/recording history; Personalized offers and recommendations; Analysis for product roadmap; Tech support portal
  • 42. [image]: for set-top boxes. Libraries; Device and User Analysis; Viewing/recording history; Personalized offers and recommendations; Analysis for product roadmap; Tech support portal; Improved reports for advertise
  • 43. The future
      • More personalization
      • Adaptive UIs (self arranging dashboards)
      • Targeted content, ads
      • More effective customer service
  • 44. Conclusions
      • Applications are becoming increasingly user-centric
      • Data drives this capability, but harnessing it requires a new distributed architecture
  • 45. www.wibidata.com / Aaron Kimball – aaron@wibidata.com
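
The "Tools of the trade" slides (24 to 26) describe three ideas: keep all data about a user in one row, support both real-time get/put and batch processing, and write the computation once so it can run over one user or over all users. The sketch below illustrates those ideas with a toy in-memory table. It is not HBase, MapReduce, or any WibiData API; the names UserTable, favorite_channel, for_one_user and for_all_users, the "views" column, and the "channel" field are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterator, List, Tuple

Cell = Tuple[int, Any]        # (timestamp, value)
Row = Dict[str, List[Cell]]   # column name -> cells, newest first


class UserTable:
    """Toy in-memory stand-in for a wide-row table keyed by user ID.

    Hypothetical class for illustration; a real deployment would use a
    distributed store rather than a Python dict.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, List[Cell]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def put(self, user_id: str, column: str, timestamp: int, value: Any) -> None:
        """Real-time write: add one timestamped cell to the user's row."""
        cells = self._rows[user_id][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, user_id: str, column: str) -> List[Cell]:
        """Real-time read: all cells in one column of one user, newest first."""
        return self.row(user_id).get(column, [])

    def row(self, user_id: str) -> Row:
        """Everything stored about one user (empty dict if unknown)."""
        stored = self._rows.get(user_id, {})
        return {column: list(cells) for column, cells in stored.items()}

    def scan(self) -> Iterator[Tuple[str, Row]]:
        """Batch read: iterate over every user row in key order."""
        for user_id in sorted(self._rows):
            yield user_id, self.row(user_id)


def favorite_channel(row: Row) -> Any:
    """Per-user computation: the most frequently viewed channel.

    Older records that predate the (assumed) "channel" field are counted
    as "unknown" instead of failing, a simple way to tolerate a data
    model that grows over time.
    """
    counts: Dict[str, int] = {}
    for _timestamp, event in row.get("views", []):
        channel = event.get("channel", "unknown")
        counts[channel] = counts.get(channel, 0) + 1
    if not counts:
        return None
    # Sorting first makes ties resolve alphabetically.
    return max(sorted(counts), key=lambda name: counts[name])


def for_one_user(table: UserTable, user_id: str, fn: Callable[[Row], Any]) -> Any:
    """Run the per-user function for a single user (the serving path)."""
    return fn(table.row(user_id))


def for_all_users(table: UserTable, fn: Callable[[Row], Any]) -> Dict[str, Any]:
    """Run the same per-user function over every user (the batch path)."""
    return {user_id: fn(row) for user_id, row in table.scan()}


if __name__ == "__main__":
    table = UserTable()
    table.put("u1", "views", 1, {"channel": "news"})
    table.put("u1", "views", 2, {"channel": "sports"})
    table.put("u1", "views", 3, {"channel": "sports"})
    table.put("u2", "views", 1, {"title": "old record without a channel"})
    print(for_one_user(table, "u1", favorite_channel))
    print(for_all_users(table, favorite_channel))
```

The point of the design is that favorite_channel only ever sees one user's row, so the same function can serve a single request or be applied across the whole table; in a cluster setting the batch path would presumably be a distributed job rather than a local loop.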