
Winning with Big Data: Secrets of the Successful Data Scientist

9,892 views

A new class of professionals, called data scientists, has emerged to address the Big Data revolution. In this talk, I discuss nine skills for munging, modeling, and visualizing Big Data. Then I present a case study that applies these skills: the analysis of billions of call records to predict customer churn at a North American telecom.

http://en.oreilly.com/datascience/public/schedule/detail/15316

Published in: Technology

  1. WINNING WITH BIG DATA: Secrets of the Successful Data Scientist. Making Data Work, June 9, 2010. Michael Driscoll, @dataspora
  2. WHY DATA MATTERS
  3. THE INDUSTRIAL AGE OF DATA
  4. WHAT IS BIG DATA? Data that is distributed.
  5. WHAT IS DATA SCIENCE?
  6. NINE WAYS TO WIN
  7. 1. CHOOSE THE RIGHT TOOL. You don’t need a chainsaw to cut butter.
  8. 2. COMPRESS EVERYTHING. mysqldump -u myuser -pmypass sourceDB | gzip | ssh mike@dataspora.com "cat - | gunzip | mysql -u myuser -pmypass targetDB" The world is IO-bound.
  9. 3. SPLIT UP YOUR DATA. Split, apply, combine. See Hadley Wickham’s paper at http://had.co.nz/plyr/plyr-intro-090510.pdf
  10. 4. WORK WITH SAMPLES. perl -ne "print if (rand() < 0.01)" data.csv > sample.csv Big Data is heavy, samples are light.
  11. 5. USE STATISTICS
  12. 6. COPY FROM OTHERS. git clone git://github.com/kevinweil/hadoop-lzo Use open source.
  13. 7. ESCAPE CHART TYPOLOGIES. Charts are compositions, not containers.
  14. 8. USE COLOR WISELY. Color can enhance or insult.
  15. 9. TELL A STORY. People are listening.
  16. ONE SUCCESS STORY
  17. WHY DO TELCO CUSTOMERS LEAVE? Sign up, leave. Goal: “less churn.”
  18. DATA: BILLIONS OF CALLS … and millions of callers.
  19. DOES CALL QUALITY MATTER? … a difference, but not significant.
  20. WHAT ABOUT SOCIAL NETWORKS? Hmmm...
  21. BUILD THE CALL GRAPH … but is it predictive?
  22. EVOLUTION OF A CALL GRAPH: April
  23. EVOLUTION OF A CALL GRAPH: May
  24. EVOLUTION OF A CALL GRAPH: June
  25. EVOLUTION OF A CALL GRAPH: July
  26. 700% INCREASE IN CHURN when a cancellation occurs in a call network.
  27. THANKS! QUESTIONS? Michael Driscoll, twitter @dataspora, http://www.dataspora.com/blog
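
The "split, apply, combine" idea on slide 9 can be illustrated without any particular library. The following is a minimal Python sketch, not the plyr API from the linked paper; the function name `split_apply_combine` and the toy call records are assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(records, key, apply):
    """Group records by key(record), run apply on each group, return a dict of results."""
    groups = defaultdict(list)
    # Split: route every record to the group named by its key.
    for record in records:
        groups[key(record)].append(record)
    # Apply + combine: summarize each group and collect the summaries.
    return {k: apply(group) for k, group in groups.items()}


# Hypothetical toy data standing in for call records.
calls = [
    {"caller": "alice", "minutes": 3.0},
    {"caller": "bob", "minutes": 2.0},
    {"caller": "alice", "minutes": 5.0},
]

avg_minutes = split_apply_combine(
    calls,
    key=lambda r: r["caller"],
    apply=lambda group: mean(r["minutes"] for r in group),
)
print(avg_minutes)
```

Because each group is summarized independently, the apply step is the part that could be spread across machines when the data is distributed.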
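
The Perl one-liner on slide 10 keeps each line independently with probability 0.01. A rough Python equivalent is sketched below; the function name `sample_lines`, the `seed` parameter, and the synthetic rows are illustrative assumptions, not part of the talk.

```python
import random


def sample_lines(lines, rate=0.01, seed=None):
    """Yield each line independently with probability `rate` (Bernoulli sampling)."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    for line in lines:
        if rng.random() < rate:
            yield line


# Hypothetical input: synthetic CSV rows stand in for data.csv.
rows = (f"{i},value{i}\n" for i in range(100_000))
sample = list(sample_lines(rows, rate=0.01, seed=42))
print(len(sample))  # close to 1,000; the exact count depends on the seed
```

Since the input is consumed as a stream, the full file never has to fit in memory, which is the point of "samples are light".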
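
Slides 21 to 26 ask whether the call graph predicts churn. The talk does not show how the comparison was computed, so the following is only a sketch of one plausible approach: build an undirected graph from call pairs, then compare later churn among customers who had an earlier canceller among their contacts against everyone else. All names and the toy data are hypothetical.

```python
from collections import defaultdict


def build_call_graph(calls):
    """Return an undirected adjacency map from (caller, callee) pairs."""
    graph = defaultdict(set)
    for a, b in calls:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def churn_rates(graph, churned_before, churned_after):
    """Return (exposed_rate, unexposed_rate) of later churn.

    A customer is "exposed" if at least one contact cancelled earlier.
    """
    counts = {True: [0, 0], False: [0, 0]}  # exposed? -> [customers, churners]
    for customer, neighbors in graph.items():
        if customer in churned_before:
            continue  # already gone, cannot churn again
        exposed = bool(neighbors & churned_before)
        counts[exposed][0] += 1
        counts[exposed][1] += customer in churned_after

    def rate(total, churners):
        return churners / total if total else 0.0

    return rate(*counts[True]), rate(*counts[False])


# Hypothetical toy data: "a" cancels first; "b" and "g" cancel later.
graph = build_call_graph([("a", "b"), ("a", "c"), ("d", "e"), ("e", "f"), ("g", "h")])
exposed, unexposed = churn_rates(graph, churned_before={"a"}, churned_after={"b", "g"})
print(f"relative increase: {(exposed - unexposed) / unexposed:.0%}")  # 150% on this toy data
```

The toy numbers are unrelated to the 700% figure on slide 26; they only show the shape of the comparison.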
