
Bayesian reasoning

1. Introduction and how to get into Data
2. Data Engineering and skills needed
3. Comparison of Data Analytics for static and real-time streaming data
4. Bayesian Reasoning for Data

  1. “DATA IN THE WILD” – BEGINNER STEPS INTO DATA. MARTA FAJLHAUER, GSTATS, BSC. DATA ANALYST AT BRIGHTBLUE CONSULTING; PROFESSIONAL FELLOW OF THE ROYAL STATISTICAL SOCIETY; POSTGRADUATE STUDENT AT QUEEN MARY UNIVERSITY OF LONDON.
  2. Outline:
     • What I learned from analysing 250 profiles of my LinkedIn connections working in Data Science
     • What I learned during my work in Data Engineering
     • What I learn when I work in Data Analytics
     • Bayesian reasoning for social media
     • Curiosity, understanding, asking questions, looking for answers on business and personal questions
  3. I want to work in Data Science (£75,000 - £100,000). Procurement / IT Service Desk / Threat Intel Librarian / Audit / PMO / Corporate / Business System / Business / Technical / Analyst; Data / Analytics Consultant; Analytics and Business Intelligence; Analytical Storyteller; AI and Advanced Analytics; Econometrician; Statistician; Mathematician; Software / Cloud / Mathematical / Data / Linux Operation / System / Service / Marketing / Backend / Blockchain / Splunk / Oracle / Machine Learning / AI Engineer; Data and Software / System / Enterprise / Data Solution / Cloud Architect; Lead Software Crafter; Software / Full Stack / Software Developer; Cloud / AI / Computer Vision / Machine Learning Consultant; Applied Machine Learning Scientist; Deep Learning Specialist; Enterprise Data Strategy; Machine Learning / AI / Robotics / Researcher; Big Data Developer; Oracle DBA. DevOps -> Machine Learning -> R -> Python -> Deep Learning -> NLP -> AI -> Advanced Statistics
  4. 241 profiles: 86 Data Scientists (27 PhD and 13 BSc), 64 Data Analysts (1 PhD and 35 BSc), 64 Engineers.
  5. Data Scientists:
     • Computer Science or Mathematics background
     • Others in every single category
     • Mathematics for Data Analytics and Computer Science for Data Engineering
  6. Less than 20% computer science; 60% with a degree in computer science. But…
     • Lead Software Crafter: BSc Health Science
     • DevOps: BSc Applied Linguistics
     • Marketing Engineer: English Literature
     • Senior Analytics Consultant: BSc Music
     • Software Engineer: Public Relations
     • Data Engineer: Anthropology
     • Data Manager: BSc Arts
     • Cloud Consultant: Advanced Aeronautical Engineering
     • Data Engineer: Public Health
  7. You need to choose what you want to specialise in. They are all called doctors, but does that mean one can do the work of another? Does it mean that one is more important than another? No. It means that each decided to concentrate on a specific thing after an exploration stage. EBOV virus for a charity helping people in Africa. Crime data mining using USA census data.
  8. DATA ENGINEERING – FIRST JOB: “DATA SOMETHING”
  9. IT Ops and Security. Machine data. Real-time visibility. Forwarding data in real time. Collect and visualise. Forward data in real time to indexes. Scales from a single server to a distributed deployment. Accepts any text data as input, parses the data into events, stores events in indexes, searches and reports.
  10. Daily work and lessons learned:
     • Writing configuration files <TCP / UDP, SSL, HEC>
     • Setting up receiving ports on indexers, adding inputs to forwarders
     • Compressing the feed to save money on data pre-processing from Hadoop clusters
     • Lesson 0: where the coffee machine is.
     • Lesson 1: not many girls in Data Engineering work: the only girl, the only non-technical person.
     • Lesson 2: Stack Overflow and Google are my best friends.
     • Lesson 3: how to set up a Splunk image in a Docker container.
     • Lesson 4: setting up a distributed, global deployment. It is very important to set the time and time zone properly so you can correlate across multiple sources, and to set up alerts in case of anomalies.
     • Lesson 5: data encryption and different levels of access are very important in finance. Regex, Bash, Linux.
     • Dashboards and automatic pivots using Splunk's Search Processing Language (SPL).
  11. There is no time to check every detail carefully: the analytics of this kind of data is completely different from that of static data. In a static .csv file you can check whether data are missing, visualise all the details and understand the data, but with real-time rolling data it is completely different. You have already set up dashboards to concentrate on the most important bits. In Splunk, you can set up an alert. When you deal with this kind of data you do not concentrate on the statistics behind it; you only choose, from a selection, the algorithm that you think will best meet the conditions. With static data you think about R^2, coefficients and much more.
  12. Read code written by someone else, modify the elements for your own purpose, then write your own code.
     • In languages like R, it is sometimes much more efficient to use a package already in the system.
     • When you set up a loop over millions of records, first check that the loop gives the expected output and runs smoothly on a smaller dataset. Once you have checked that, remember to add a loop counter so you can track progress, and set up automatic saving of the output.
  13. DATA ANALYTICS: algorithms, R&D, statistical thinking
  14. Lessons from Data Analytics:
     • Lesson 1: do not rely completely on statistical knowledge without thinking about whether correlation really implies causation (not only in regression).
     • Whatever you can, plot it to visualise the data.
     • R, Python, Excel, SAS, whatever works for the given purpose: you choose.
     • Different models for different kinds of data.
     • With smaller, static datasets you may have much more fun from an analytics point of view than with rolling real-time data coming from different sources.
  15. Bayesian Reasoning for Social Data: Sherlock Holmes and Watson
  16. Prior, likelihood, posterior:
     • It's July, and mostly sunny <- prior. Predict: mostly sunny.
     • Someone carries an umbrella <- likelihood. Predict: rainy.
     • What if this is a country where you carry an umbrella on hot days? What if you carry an umbrella only when it's raining?
     • Update belief <- posterior.
  17. If an absent-minded professor takes his umbrella into a classroom, there's a probability of 1/4 that he'll absent-mindedly leave it there. One day, he sets off with his umbrella, teaches in three classrooms, and comes back to his office... without his umbrella. What's the probability he left the umbrella in the first classroom? The probabilities of leaving it in the first, second and third classroom are 16/64, 12/64 and 9/64, so the answer is 16/(16+12+9) = 16/37 ≈ 43%. Equivalently: P(left in the first classroom, given that he left it somewhere) = P(left it in the first classroom and he left it somewhere) / P(he left it somewhere) = (1/4) / (1 − 27/64) = 16/37.
  18. Bayes' theorem: $P(\text{Truth} \mid \text{Data}) = \frac{P(\text{Data} \mid \text{Truth})\, P(\text{Truth})}{P(\text{Data})}$
     • $P(\text{Truth})$ = the prior = what we believe in
     • $P(\text{Data} \mid \text{Truth})$ = the likelihood = the data collected to confirm our belief
     • $P(\text{Truth} \mid \text{Data})$ = the posterior = the updated belief
     • $P(\text{Truth} \mid \text{Data}) \propto P(\text{Truth})\, P(\text{Data} \mid \text{Truth})$, that is: updated belief ∝ prior belief × the data collected
  19. $\text{posterior} \propto \text{prior} \times \text{likelihood}$. ROI, customer retention, losing an umbrella: all are based on some previous belief.
  20. Why might we prefer Bayesian to classical approaches to the data?
     • The problem of small n and large p
     • Limited influence over which features are selected in classical approaches
     • The power to decide which coefficients go into the model, and how strongly
  21. Why we are so different yet so similar: no two people are exactly alike, and no two people are exactly different. Preferences.
  22. Bayesian versus classical answers:
     • Bayesian statistics allows you to be subjective, to better connect the real world with the data.
     • P-values and confidence intervals vs the posterior distribution <all outcomes and their probabilities>.
     • The answers we look for do not match the answers from classical models.
     • Important question: what is the probability of an event when the p-value is less than 0.005?
     • A is better than B with a p-value of 0.001. A is more expensive.
     • You have the predicted probability of the quality guarantee in hand, and the expected prices on the market.
     • Bayesian methods support complex decision-making under uncertainty.
  23. Bayesian methods provide tradeoffs between speed and generality.
  24. Don't know the priors? Are you sure? Multiple module analysis with different levels of priors.
  25. Factors to consider:
     • Business rules influencing decisions
     • Movement of needs depending on price
     • We need to think about competitors, the situation on the market, and the prices of other products within the store
  26. Marketing Mix Modelling. We try to measure the return on investment by media type. We have cross-sectional units: regions, markets, trade areas, channels, brands, competitor brands. Another dimension is time: the series can be weekly or monthly, with at least 5 years of monthly data and 2 years of weekly data. The dependent variable has to be units, not currency, because of price elasticity.
  27. Further reading:
     • The Theory That Would Not Die
     • Bayesian Methods for Hackers: http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
     • Think Bayes, Bayesian Statistics in Python: https://greenteapress.com/wp/think-bayes/
     • Statistical Computing for Scientists and Engineers: https://www.zabaras.com/statistical-computing-2017
     • Chris Bishop, Introduction to Bayesian Inference: http://videolectures.net/mlss09uk_bishop_ibi/?q=mlss+2009
     • Statistical Rethinking. Ebook: http://xcelab.net/rmpubs/rethinking/Statistical_Rethinking_sample.pdf Videos: https://www.youtube.com/watch?v=oy7Ks3YfbDg&list=PLDcUM9US4XdM9_N6XUUFrhghGJ4K25bFc
  28. MARTA FAJLHAUER. Email: fajlhauermarta@gmail.com LinkedIn: https://www.linkedin.com/in/martafajlhauer/
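
The absent-minded professor problem on slide 17 can be checked by enumerating where the umbrella could have been left. The Python sketch below is not from the slides; the function name `umbrella_posterior` and its parameters are illustrative, and it assumes, as the slide states, that the umbrella is left behind with probability 1/4 in each classroom it is carried into.

```python
from fractions import Fraction


def umbrella_posterior(p_leave=Fraction(1, 4), n_rooms=3):
    """Return P(left in room k | umbrella was left somewhere) for each room.

    Assumption (illustrative): the same leave-behind probability applies
    in every room the professor enters while still holding the umbrella.
    """
    # P(left in room k) = P(kept it in all earlier rooms) * P(left it in room k)
    left_in = [(1 - p_leave) ** k * p_leave for k in range(n_rooms)]
    # P(left somewhere) is the sum over rooms, i.e. 1 - P(kept it everywhere)
    p_lost = sum(left_in)
    # Condition on the observation that he came back without the umbrella
    return [p / p_lost for p in left_in]


posterior = umbrella_posterior()
for room, p in enumerate(posterior, start=1):
    print(f"classroom {room}: {p} (about {float(p):.1%})")
```

Using exact fractions keeps the result comparable with the slide: the first entry is 16/37, the roughly 43% quoted there.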

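The umbrella example on slide 16 can also be put into numbers. The Python sketch below is an illustration only: the function `posterior_rain` and every probability in it are made-up assumptions, not figures from the talk. It shows how the same observation (someone carries an umbrella) moves the belief about rain very differently depending on how likely people are to carry umbrellas on sunny days.

```python
def posterior_rain(prior_rain, p_umbrella_given_rain, p_umbrella_given_sun):
    """Bayes' rule for P(rain | someone carries an umbrella)."""
    # Numerator: prior * likelihood for the "rain" hypothesis
    joint_rain = prior_rain * p_umbrella_given_rain
    # Same product for the competing "sunny" hypothesis
    joint_sun = (1 - prior_rain) * p_umbrella_given_sun
    # Dividing by the total probability of the data normalises the posterior
    return joint_rain / (joint_rain + joint_sun)


# Hypothetical prior: in July it rains on 20% of days.
PRIOR_RAIN = 0.2

# Country A (assumed): umbrellas are carried almost only when it rains.
rain_only = posterior_rain(PRIOR_RAIN, 0.9, 0.05)

# Country B (assumed): umbrellas are also used as sunshades on hot days.
sunshade = posterior_rain(PRIOR_RAIN, 0.9, 0.6)

print(f"umbrellas only for rain: P(rain | umbrella) = {rain_only:.2f}")
print(f"umbrellas as sunshades:  P(rain | umbrella) = {sunshade:.2f}")
```

With these assumed numbers the umbrella is strong evidence for rain in the first country and weak evidence in the second, which is the point of the two "what if" questions on the slide.
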
Editor's notes

  • Structure of the talk.
  • Statement: “I want to work in Data Science” based on salary
    Explosion of information.
    First conference and where my friends work.
