Data Science and Culture

1,637 views

Hiring data scientists and deploying Hadoop is not enough. Your company needs a data driven culture, based on values such as honesty, democracy, creativity and strategy. Your company also needs good data engineering and good experimentation practices.

Published in: Data & Analytics

  1. 1. Data Science & Culture (Or how to stop worrying and love data driven culture) Ícaro Medeiros Data Science Forum São Paulo, Jun 2017
  2. 2. Inspired by (not limited to) refs
  3. 3. Big Data http://www.kdnuggets.com/2017/02/origins-big-data.html ✦ Fundamental blocks: advances in CS, e.g. distributed systems, databases, massive AI, etc. ✦ Fuzzy concept, ill-defined ✦ Popularized by Gartner (hype-fueled consulting firm)
  4. 4. ✦ Big Data no longer considered an emerging technology (pervasive in industry) ✦ Entered Trough of Disillusionment in 2013 https://knowledgeimmersion.wordpress.com/2016/06/22/disillusionment-of-big-data/
  5. 5. http://www.mikelnino.com/2016/03/chronology-big-data.html Chronology of antecedents
  6. 6. Data science ✦ Statistics (late 19th century) ✦ Computer Science (1950s) ✦ Machine Learning (1950s) ✦ Data Mining (1990s) ✦ Data Science (2010s) https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century yet another hyped term
  7. 7. Beware: controversy ✦ Data science is not all science ✴ It’s getting more and more engineering-like, a practice ✴ Data storytelling is a creative endeavor ✦ Hyper-inflated expectations, misunderstood concepts and a rush to get value: a dangerous recipe
  8. 8. A new hope (or hype): “machine learning” vs “big data” on Google Trends https://trends.google.com/trends/explore?date=today%2012-m&geo=US&q=machine%20learning,big%20data
  9. 9. Hype: not that bad ✦ Haters gonna hate, i.e. don’t fully hate the hype ✴ More practitioners = faster evolution of tech and processes ✴ Highly skilled professionals and innovation ✦ Academics sometimes look for difficult unwanted problems ✴ Industry is more pragmatic, especially in tech https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science
  10. 10. What we need… ✦ Forget about Big Data pokémons ✴ OH so in Big Data we don’t need people to think about schemas? ✦ Forget about misunderstood business expectations ✴ OH in deep learning we don’t need people to train models? ✦ You need PEOPLE ✴ Collaborating with shared values ✴ Awesome in tech but more importantly: CREATIVE
  11. 11. Culture: shared values and practices
  12. 12. Good people ✦ People are more important than ideas ✴ A mediocre team will screw up a good idea ✴ Give a mediocre idea to a great team: they will fix it or rethink it ✦ A good lab: different kinds of autonomous thinkers ✴ Why hire smart people if they can't fix what’s broken? ✦ Prefer a heterogeneous and complementary team instead of looking for unicorns
  13. 13. The mythical 10x professional https://twitter.com/icaromedeiros/status/838968884023668737
  14. 14. Good communication ✦ Honesty, excellence, originality and self-criticism (values) ✦ Communication structure ≠ organizational structure ✦ Be ready to hear the truth ✴ Sincerity is only valuable if people are open and willing to give up on ideas that will not work ✦ Braintrust: Leave ego and Jobs outside the door
  15. 15. Power to the people! ✦ Product quality is everyone’s responsibility ✴ Don’t ask permission to take responsibility ✦ Passion and excellence versus autonomy ✦ Good things might overshadow the bad ✴ People struggle to bring up bad things to avoid being called “complainers”
  16. 16. Rebels http://qaspire.com/2017/05/19/sketchnote-what-rebels-want-from-their-boss/
  17. 17. Destroy data silos! ✦ Without information about data there is no science ✦ Software and data should be collective property within the company ✦ Knowledge management matters ✦ Communication between areas must be enforced
  18. 18. Data portals ✦ Self-service platforms to publish datasets ✴ Descriptions, schemas, samples, relations between datasets, etc. ✦ Open Data initiatives, mostly governments ✦ OSS platforms: CKAN, Airbnb’s Dataportal ✦ Examples: data.gov.uk, dados.gov.br, etc.
  19. 19. “When it comes to creative inspiration, job titles and hierarchy are meaningless”
  20. 20. Data storytelling ✦ Explain what the numbers say in clear, layman’s terms ✦ Make hidden premises clear ✴ Outside data insights ✦ Convince others about actions ✴ Decreases insights-to-value interval ✦ From data to knowledge https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs
  21. 21. What is creativity ✦ Unexpected connections of concepts and ideas ✦ It's a marathon, it needs rhythm ✦ Creativity must start somewhere, and there’s power in healthy feedback within an iterative process
  22. 22. Visual communication ✦ Clean, straightforward graphs > visually appealing ones ✴ Choose dataviz libs wisely ✦ “Don’t make me think” ✦ The right graph for the right audience ✴ Prefer a language everyone understands
  23. 23. Visual communication 101
  24. 24. Stats are not enough https://www.autodeskresearch.com/publications/samestats
  25. 25. Stats are not enough https://www.autodeskresearch.com/publications/samestats
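
The "same stats" point of the two slides above can be reproduced in a few lines. The sketch below swaps in Anscombe's quartet, a classic and much smaller example of the same effect; it is not the data behind the linked Autodesk page. It uses only the Python standard library, and the helper names `pearson` and `summarize` are made up for illustration.

```python
from statistics import mean

# Anscombe's quartet (1973): four small datasets built to share summary statistics.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def summarize(xs, ys):
    """Return (mean of x, mean of y, correlation) for one dataset."""
    return mean(xs), mean(ys), pearson(xs, ys)


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, (mx, my, r) in SUMMARIES.items():
        print(f"{name:>3}: mean_x={mx:.2f} mean_y={my:.2f} r={r:.2f}")
```

The four summaries agree to two decimal places, yet a scatter plot of each dataset shows a line, a curve, an outlier-driven fit and a single leverage point. Plot before trusting the numbers.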
  26. 26. Strategy
  27. 27. Avoid egotrip data science ✦ “OH my cluster has 10 Petabytes, I’m awesome” ✦ Fancy ML algorithms are not the goal ✦ The most important V in Big Data is value https://twitter.com/amyhoy/status/847097034536554497
  28. 28. KPI versus HiPPO ✦ Tech adoption per se is meaningless ✴ Slide-driven Big Data ✴ KPIs should grow from Big Data and data insights initiatives ✦ Poorly defined goals -> bad decisions ✦ Define viable but ambitious goals ✦ Data beats opinion
  29. 29. Set goal, plan and GO! ✦ Business questions can't be like “OH we want to detect things related to millennials” ✦ Clear goals must be set, with actionable metrics ✦ Balance perfect models versus time-to-market ✦ Brad Bird: “Sometimes, as a director, you’re guiding. Sometimes you’re letting the car drive” https://hbr.org/2017/02/how-chief-data-officers-can-get-their-companies-to-collect-clean-data
  30. 30. The process ✦ The process is not the goal ✴ It has no agenda or taste, it’s just a tool ✦ Quality is the best business plan ✦ Agile is a mindset: not only kanbans or scrum ✦ If the model will become operational, mix scientists and engineers from the start
  31. 31. Build vs Buy ✦ If you buy and your core business is not techie, you can be illiterate in tech ✴ Benchmark before buying ✴ Accelerate results and boost internal knowledge ✦ If you build and have a good-enough techie culture, you’re more or less good to go ✴ Assess pros and cons consciously ✦ If you surf the tech hype AND build good systems you’re awesome
  32. 32. https://twitter.com/Doug_Laney/status/847452219641356288 When data goes to vendors…
  33. 33. http://www.louisdorard.com/machine-learning-canvas/
  35. 35. Big Data vs Great Data ✦ If your logical models do not make sense ✦ If your most frequent queries are slow ✦ If you have string-only databases ✦ If you have unused expensive data ✦ Maybe your data lake is a swamp
  36. 36. “The data is a mess” ✦ First step: accelerate human understanding of data ✴ Metadata, context, hidden assumptions ✦ Datasets might serve multiple purposes ✴ Define rationale and context ✴ Data portals and understandable datasets > Dashboards https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science https://medium.com/airbnb-engineering/democratizing-data-at-airbnb-852d76c51770
  37. 37. Data lost in translation ✦ Heterogeneous and siloed databases (and people) ✦ Rethink ESB (microservices network) ✦ State-of-the-art: data workflow ✴ Luigi, Airflow (open source), almost every big tech vendor ✴ Transparency, reusability, reproducibility, traceability ✴ Automation and monitoring all the way! https://hbr.org/2016/12/why-youre-not-getting-value-from-your-data-science
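
The data-workflow idea on the slide above (tasks with explicit dependencies, run in order, with a trace of what ran) can be sketched without any workflow engine. This is not the Luigi or Airflow API; it is a toy stand-in built on the standard-library `graphlib` module (Python 3.9+), and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_tables": {"extract_orders", "extract_customers"},
    "train_model": {"join_tables"},
    "publish_report": {"train_model"},
}


def run_pipeline(dependencies, actions=None):
    """Run every task after its dependencies and return the execution log."""
    actions = actions or {}
    log = []
    # static_order() yields tasks so that dependencies always come first;
    # it raises graphlib.CycleError if the graph contains a cycle.
    for task in TopologicalSorter(dependencies).static_order():
        actions.get(task, lambda: None)()  # placeholder for the real work
        log.append(task)                   # the log gives basic traceability
    return log


RUN_LOG = run_pipeline(DEPENDENCIES)

if __name__ == "__main__":
    print(" -> ".join(RUN_LOG))
```

Real engines add what this sketch leaves out: scheduling, retries, persisted state and monitoring.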
  38. 38. Beyond relational models ✦ Not all data problems fit well in traditional SQL or DW models ✴ Key-value, columnar, graph-based, inverted index, etc. ✦ Models are a framework for problem-solving ✴ Not the ultimate answer ✴ There’s no one-size-fits-all model
  39. 39. Do not forget fluency ✦ Check the company lingua franca ✦ Make it easy for critical decision-makers ✴ Ad hoc SQL queries? ✴ Dashboards? ✴ Reports?
  41. 41. Experiments ✦ Missions to discover facts towards understanding ✴ They don’t fail, any result produces new information ✴ If the initial theory was wrong: good ✴ With new facts you can reformulate the question ✦ Get more modeling questions asked more often ✦ Iterative data science
  42. 42. Product experimentation (A/B) ✦ Product experimentation should be hypothesis-driven (not feature-driven) ✦ Define the proper exposed population ✴ No new users, no heavy users only, no early adopters ✦ Understanding effect is essential https://medium.com/airbnb-engineering/4-principles-for-making-experimentation-count-7a5f1a5268a
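
One common way to check the effect of an A/B experiment like the one described above is a two-proportion z-test on conversion counts. The slides do not prescribe a test, so treat this as one possible sketch under simple assumptions (independent users, one fixed sample, large enough counts for the normal approximation). The function name and the numbers in the usage lines are illustrative only.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert equally.

    conv_* are conversion counts and n_* are exposed users per variant.
    Assumes the pooled rate is strictly between 0 and 1.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF evaluated at |z|, written with the error function.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)


if __name__ == "__main__":
    # Made-up counts: 100/1000 conversions for A, 150/1000 for B.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z={z:.2f}, p={p:.4f}")
```

A small p-value only says the difference is unlikely under "no effect"; whether the effect matters for the product is still a hypothesis-driven judgment.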
  43. 43. 5 stages of A/B tests https://www.linkedin.com/pulse/ab-testing-which-do-i-pick-sahar-heidari
  44. 44. Some other quick tips ✦ Focus on outcomes (not algorithms or methods) ✦ Design the right metric and evaluation ✦ Good experiments don't produce obvious insights ✦ Mix of data and intuition https://twitter.com/mrdatascience/status/869957499662860288
  45. 45. Being data driven ✦ Be BAYESIAN - uncertainty is everywhere ✦ Be CURIOUS - keep learning ✦ Be AGILE - Fail fast, not too fast: evidence comes first https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
  46. 46. Being data driven ✦ Be TRUTHFUL - don’t torture data to please opinions ✦ Be HELPFUL - work across silos, support democracy ✦ Be WISE - know when to be analytical or intuitive https://www.reaktor.com/blog/culture-eats-data-science-for-breakfast/
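
To make "Be BAYESIAN" concrete, here is a minimal sketch that treats a conversion rate as uncertain instead of as a single number. It assumes a uniform Beta(1, 1) prior and binomial data, which the slides do not specify, and estimates by simulation how likely variant B is to beat variant A. The function name and the counts are hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta posteriors."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = 0
    for _ in range(draws):
        # With a uniform prior, the posterior is Beta(1 + successes, 1 + failures).
        sample_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        sample_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += sample_b > sample_a
    return wins / draws


if __name__ == "__main__":
    # Made-up counts: 100/1000 conversions for A, 150/1000 for B.
    print(f"P(B > A) is roughly {prob_b_beats_a(100, 1000, 150, 1000):.3f}")
```

The output is a probability, not a verdict, which keeps the remaining uncertainty visible to whoever makes the decision.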
  47. 47. Take-away message: With the right people, Democracy, Creativity, Strategy, Big Great Data™ and Experiments, there's a good chance of doing great SCIENCE
  48. 48. Ícaro Medeiros Data Scientist icaromedeiros