from_physics_to_data_science

  1. FROM PHYSICS TO DATA SCIENCE. Martina Pugliese, 17 December 2015, Scotland Data Science & Technology
  2. An outline of what we will discuss. THE PARTS:
     ➤ ABOUT ME, MY JOB, MY BACKGROUND
     ➤ WHAT (I LEARNED) IT MEANS TO DO DATA SCIENCE
     ➤ WHAT IS DATA SCIENCE AND ITS (AMBIGUOUS) RELATIONSHIP TO RESEARCH
  3. WHO AM I? Why am I here? What do I want?
  4. THE BORING BACKGROUND
     ➤ I did a Bachelor’s degree in Physics. I thought I wanted to do particle physics.
     ➤ Then I did a Master’s degree in Physics (Statistical Mechanics). I studied the evolution of the Influenza virus.
     [Figure: four plot panels from the simulations; axis labels S, E, βM, pM]
     Numerical model (using a genetic algorithm) simulating how the pathogen creates new variants.
  5. THE BORING BACKGROUND
     ➤ Then I did a PhD in Physics. I explored how Natural Language evolves in time.
     [Figure: four plot panels, one per verb (burn, dwell, hide, sing); axes labelled I and fsum, one of them logarithmic]
     Verbs changing inflection in time: hide became irregular, sing stayed irregular, burn stayed regular, dwell oscillates.
     Data Mining & Simulations
  6. THE BORING BACKGROUND
     ➤ I wanted a job in industry as a Data Scientist, so … I did a bootcamp in London, S2DS, working on a commercial DS problem [1]
     Physics gave me:
     ➤ the ability to model reality (mathematically)
     ➤ a brain trained to deal with data
     ➤ ideas about lots more things to study
     ➤ the scientific method to carry out experiments
  7. DATA SCIENCE: Trend or Hype? What do you mean by “science”?
  8. “The key word in ‘Data Science’ is not Data, it is Science.” - Jeff Leek
  9. DATA SCIENCE: A BABY COME OF AGE?
     [Figure: NGram Viewer data]
     There’s lots of talk these days about several buzzwords containing “data”, but the science of extracting information out of raw data is much older than some think.
  10. A WEE BIT OF HISTORY
      ➤ The ’60s: Data Analysis bashfully starts branching out of Statistics as an empirical science [1]
      ➤ The ’70s: Establishing the idea of converting data into knowledge
      ➤ The ’80s: G. Piatetsky-Shapiro founds the KDD (Knowledge Discovery in Databases) conferences
      ➤ The ’90s: Companies have lots of data on customers! The term Data Science is first used in a conference name [2]
      ➤ The 2000s: Academic endeavours to define the field [3]. Statistical models (the “irrelevant theory”) vs. Algorithms
      ➤ The 2010s: the BOOM! The “sexiest job of the 21st century” [4]. Big Data is the new innovation [5]. Growth in “analytics” and Data Science educational programs [6]. Data Science in Business should be called “Decision Science” [7]
  11. But today, this is what’s happening: [Figure: Intel, What happens in an Internet Minute? 2012]
  12. So the need arose for (many more) specialised people in industry, to understand this dirty, varied, large data and leverage it to provide solutions.
      The data we agree to give to the services we use (social networks, apps …) is used to sell us tailored experiences.
      There is a saying in Italian which translates as “I know you like my pockets”. It should now become something like “I know you like your phone”.
      Where to get all these people from?
      DS academic programs Research on the rise ???
  13. PhD? No thanks … or maybe Yes
  14. The ugly fact: research has no room for all PhD graduates.
      [Figure: growth of PhD graduates in S&E fields over time vs. growth of research positions [8]]
      The academic bottleneck comes after the PhD.
      PhDs do not have real “transferable” skills (The Economist, [8])
  15. Is this alone a reason to move PhDs into industry? NO
      A PhD is an academic qualification. It is meant to train people for research.
      And for the new challenges ahead, we need lots of scientists to study new solutions: climate change, the ageing population, sustainable energy sources, the human brain, data science algorithms …
      Does it mean access to PhD programs should change? MAYBE
  16. Can we suggest Academia and industry should cooperate more? CERTAINLY
      Considering them as separate worlds does not help.
      Google cooperates with (and hires from) Academia a lot:
      ➤ They’re shaping the innovation landscape
      ➤ They’re contributing to “traditional” academic research (Quantum Annealing, [9])
      ➤ They’re pushing the current borders of AI (deep learning, anyone?)
  17. THE (OBVIOUS) DISADVANTAGES OF A PHD GRADUATE
      ➤ The “overqualified and inexperienced” curse
      ➤ The “age” and “expectations” problems
      THE (NOT-SO-OBVIOUS) ADVANTAGES OF A PHD GRADUATE
      ➤ Research trains you to sustain and cope with failure
      ➤ You know how to quickly learn new stuff alone (I’d argue this is the best skill to have today)
      ➤ You have a long history of communicating your findings
      [Image: www.phdcomics.com]
  18. THE STUFF YOU (DEFINITELY) NEED: hints on where to find it
  19. I believe the single most important skill one needs in this role is the ability to learn quickly, together with the passion for doing so.
  20. BUT PRACTICALLY SPEAKING…
      ➤ Mathematics & Statistics foundations
        This is the brain training you need to understand it all. I won’t list everything needed because it wouldn’t make sense, but in short: Linear Algebra (matrix operations); Probability Theory, the concepts; Graph Theory, the concepts; proficiency with Calculus and Mathematical Methods; Statistical Tests and Techniques …
      ➤ Machine Learning
        You need to be able to understand an algorithm on pen and paper, otherwise it’s just pushing a button on an ML library. With practice you learn which algorithm to choose for which problem and how to assess its performance. As for libraries, it depends, but scikit-learn is great and very well documented, including the Maths behind the algorithms, so it’s a great resource.
  21. BUT PRACTICALLY SPEAKING…
      ➤ Programming
        It’s essential to code quickly and produce reusable, robust scripts. I have a thing for Python. I also use R sometimes for stats analyses. Proficiency with shell commands helps a lot to save time. For numerical simulations, something like C++ is very useful. Basics of web development and of the software development process also help.
      ➤ Data visualisation tools
        Visualisations help you and others around you understand information. I use Python libraries for simple things, but the beauty of D3 is unbeatable.
      ➤ Big Data Technologies
        This is the bit about which there’s lots of talk these days. Having analytical skills also means you learn the technologies (Hadoop/Spark/Mahout…) with practice.
  22. RIGHT, BUT WHAT EXACTLY DO YOU DO? Tell me about your job!
  23. Mallzee is the fashion app for everyone
      ➤ You swipe a product right (like) or left (dislike)
      ➤ You can create your own style feeds
      ➤ You can search for specific products and favourite brands
      ➤ You can buy products
      We have millions of “swipes” plus user data
  24. WHAT I DO IN MY JOB
      Follow the DS mantra:
      [Diagram labels: Exploratory Analyses, Model, Data pre-processing, Product, Insights, Model Validation]
      takes a long time… [8]
      produce visualisations
      produce software
  25. THE ROLE CONSISTS OF SEVERAL THINGS
      ➤ Understand user behaviour in all parts of the app
      ➤ Predict what drives retention/usage
      ➤ Analyse numerical data on swipes to see what’s hot this season
      ➤ Improve the product with tailored-to-you features
      ➤ Computer Vision to see which image features perform best, for what sorts and for whom
      ➤ Measure all indicators across the business
      ➤ Recommendations
  26. THE REFERENCES
      ➤ [1] Something I wrote for S2DS
      ➤ [1] Tukey, The Future of Data Analysis
      ➤ [2] Data Science, Classification and Related Methods, Kobe, Japan, 1996
      ➤ [3] Leo Breiman, Statistical Modeling: The Two Cultures
      ➤ [4] HBR, Data Scientist: The Sexiest Job of the 21st Century
      ➤ [5] McKinsey, Big Data, the next frontier for innovation
      ➤ [6] KDNuggets, the boom in analytics education
      ➤ [7] TechCrunch, Why Decision Science matters
      ➤ [8] Nature Biotechnology, The missing piece to changing the university culture
      ➤ [8] The Economist, The disposable academic
      ➤ [9] What is the computational value of finite range tunnelling?
      ➤ [8] NY Times, the “Janitor work” is key hurdle for insight
      ➤ [8] M. Loukides, What is Data Science?
      ➤ [9] The Edison European Project
  27. Thanks! … and a special thanks to W. Kandinsky
