
Statistical Programming with JavaScript

Data Architect at Ovo Energy
15 Apr 2016

Statistical Programming with JavaScript

  1. STATISTICAL PROGRAMMING IN JAVASCRIPT David Simons @SwamWithTurtles
  2. slides: www.tinyurl.com/stats-js
  3. demos: swamwithturtles.github.io/js-statistics code: github.com/SwamWithTurtles/js-statistics
  4. WHO AM I? Freelance Software Developer @SwamWithTurtles Java and JavaScript Afraid of goats?
  5. WHO AM I? DATA NERD
  6. CONTENTS • THEORY • CASE STUDIES • JAVASCRIPT APPLICATION • WHAT IS DATA? • GAINING INSIGHTS • RANDOMNESS • LEARNING THROUGH SIMULATION • Reward: What shape is the internet?
  7. Data
  8. BEHIND THE HOOD • API • DB • ADMIN INTERFACE • SCHEDULED TASKS • 3RD PARTY APIS
  9. SO… WHAT DATA WAS THERE?
  10. WHAT DATA WAS THERE? • Counts of lists (e.g. brands, products etc.) • Stock levels and prices of products • Days an item has been out of stock
  11. WHAT DATA WAS THERE? • Non-functional data • Numbers of users • Performance for users • Performance of third party APIs • Robustness of system (Uptime, status codes, frequency of errors)
  12. THE LESSON? THERE IS DATA EVERYWHERE
  13. What is data?
  14. What is good data?
  15. WHAT DATA SHOULD I CARE ABOUT? • Data you get repeatedly • Data you can extract ‘information’ from • Normally this means numerical data, though NLP is getting big! • Data that answers valuable questions
  16. Gaining Insights
  17. A dataset:
      Identification WIND CEILING TEMP DEWPT RHX
      USAF NCDC Date HrMn I Type QCP Dir Q I Spd Q Hgt Q I I Temp Q Dewpt Q RHx
      865300,99999,19860401,0000,4,FM-12, ,110,1,N, 7.2,1,22000,1,C,N, 21.6,1, 19.2,1, 86,
      865300,99999,19860401,0300,4,FM-12, ,110,1,N, 5.1,1,22000,1,C,N, 19.4,1, 18.5,1, 95,
      865300,99999,19860401,0600,4,FM-12, ,070,1,N, 7.2,1,03600,1,C,N, 19.2,1, 999.9,9,999,
      865300,99999,19860401,0900,4,FM-12, ,070,1,N, 6.2,1,00120,1,C,N, 19.2,1, 18.9,1, 98,
      865300,99999,19860401,1200,4,FM-12, ,070,1,N, 7.7,1,03600,1,C,N, 21.6,1, 18.3,1, 82,
      865300,99999,19860401,1500,4,FM-12, ,040,1,N, 9.8,1,03600,1,C,N, 23.0,1, 18.8,1, 77,
      865300,99999,19860401,1800,4,FM-12, ,030,1,N, 6.2,1,03600,1,C,N, 19.6,1, 19.0,1, 96,
      865300,99999,19860401,2100,4,FM-12, ,050,1,N, 6.7,1,03600,1,C,N, 19.0,1, 18.7,1, 98,
      865300,99999,19860402,0000,4,FM-12, ,340,1,N, 7.2,1,03600,1,C,N, 20.0,1, 19.4,1, 96,
      865300,99999,19860402,0300,4,FM-12, ,360,1,N, 4.1,1,03600,1,C,N, 19.4,1, 19.1,1, 98,
      865300,99999,19860402,0600,4,FM-12, ,999,1,C, 0.0,1,03600,1,C,N, 19.2,1, 18.9,1, 98,
      865300,99999,19860402,0900,4,FM-12, ,999,1,C, 0.0,1,00210,1,C,N, 19.0,1, 18.7,1, 98,
      865300,99999,19860402,1200,4,FM-12, ,200,1,N, 2.6,1,00210,1,C,N, 20.4,1, 20.1,1, 98,
      865300,99999,19860402,1500,4,FM-12, ,210,1,N, 5.1,1,00750,1,C,N, 23.2,1, 19.3,1, 79,
      865300,99999,19860402,1800,4,FM-12, ,200,1,N, 3.1,1,00750,1,C,N, 26.4,1, 18.4,1, 62,
      865300,99999,19860402,2100,4,FM-12, ,999,1,C, 0.0,1,22000,1,C,N, 26.2,1, 17.1,1, 57,
      865300,99999,19860403,0000,4,FM-12, ,140,1,N, 4.1,1,22000,1,C,N, 19.2,1, 17.0,1, 87,
      865300,99999,19860403,0300,4,FM-12, ,999,1,C, 0.0,1,22000,1,C,N, 15.8,1, 15.2,1, 96,
      865300,99999,19860403,0600,4,FM-12, ,999,1,C, 0.0,1,22000,1,C,N, 15.4,1, 14.0,1, 91,
      865300,99999,19860403,1200,4,FM-12, ,060,1,N, 5.1,1,22000,1,C,N, 21.0,1, 19.8,1, 93,
      865300,99999,19860403,1500,4,FM-12, ,060,1,N, 4.1,1,00900,1,C,N, 24.8,1, 21.3,1, 81,
      865300,99999,19860403,1800,4,FM-12, ,050,1,N, 7.7,1,09000,1,C,N, 28.0,1, 21.4,1, 67,
      865300,99999,19860403,2100,4,FM-12, ,040,1,N, 5.1,1,09000,1,C,N, 25.4,1, 21.4,1, 79,
      865300,99999,19860404,0000,4,FM-12, ,060,1,N, 6.2,1,03600,1,C,N, 22.2,1, 21.3,1, 95,
      865300,99999,19860404,0300,4,FM-12, ,050,1,N, 5.1,1,09000,1,C,N, 21.0,1, 20.7,1, 98,
      865300,99999,19860404,0600,4,FM-12, ,060,1,N, 6.2,1,22000,1,C,N, 20.2,1, 19.9,1, 98,
      865300,99999,19860404,1200,4,FM-12, ,040,1,N, 5.1,1,00120,1,C,N, 20.4,1, 19.5,1, 95,
      865300,99999,19860404,1500,4,FM-12, ,020,1,N, 7.7,1,00420,1,C,N, 24.2,1, 20.4,1, 79,
      865300,99999,19860404,1800,4,FM-12, ,250,1,N, 4.1,1,00750,1,C,N, 25.6,1, 20.7,1, 74,
      865300,99999,19860404,2100,4,FM-12, ,250,1,N, 5.1,1,00750,1,C,N, 23.6,1, 20.4,1, 82,
      865300,99999,19860405,0000,4,FM-12, ,180,1,N, 6.2,1,00420,1,C,N, 20.2,1, 19.6,1, 96,
  18. Summary statistics
  19. SUMMARY STATISTICS • A statistic is a function of the data we have inputted • It aims to capture information about the values to make them more understandable
  20. THE FAMOUS ONE: • Mean (‘average’) • Sum all of the data and divide by the number of items • Gives a sense of ‘size’
  21. Group 1: Group 2:
  22. OTHER STATISTICS • “Location” • Mean, Mode, Median • “Spread” • Standard Deviation • “Shape” • Skew, Kurtosis
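
As a rough illustration of the three families above, here is a minimal sketch with one or two statistics from each: mean and median for location, standard deviation for spread, skewness for shape. The function names and the two sample groups are assumptions made for this example, not code from the talk, and the population (divide by n) forms are used throughout.

```javascript
// Location: arithmetic mean (sum of the data divided by the number of items).
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Location: median (middle value of the sorted data).
function median(xs) {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Spread: population standard deviation (divides by n, not n - 1).
function standardDeviation(xs) {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map(x => (x - m) ** 2)));
}

// Shape: population skewness (third standardised moment).
function skewness(xs) {
  const m = mean(xs);
  const sd = standardDeviation(xs);
  return mean(xs.map(x => ((x - m) / sd) ** 3));
}

// Two hypothetical groups with the same mean but very different spread.
const group1 = [9, 10, 10, 10, 11];
const group2 = [0, 5, 10, 15, 20];
console.log(mean(group1), standardDeviation(group1));
console.log(mean(group2), standardDeviation(group2));
```

Both groups share a mean, which is why a location statistic on its own can hide how different two datasets are.
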
  23. DEMO
  24. Distributions
  25. What is a random variable?
  26. Discrete Variables: Can be any of a list of values, each with its own probability • HEADS 0.5, TAILS 0.5 • 2: 1/36, 3: 2/36, 4: 3/36, 5: 4/36, 6: 5/36, 7: 6/36, 8: 5/36, 9: 4/36, 10: 3/36, 11: 2/36, 12: 1/36
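
The second set of probabilities on this slide matches the total shown by two fair six-sided dice. Assuming that is what the slide depicts, the distribution can be rebuilt by enumerating all 36 equally likely outcomes; the function name is made up for this sketch.

```javascript
// Enumerate every (first die, second die) pair and count each total.
function twoDiceDistribution() {
  const counts = {};
  for (let a = 1; a <= 6; a++) {
    for (let b = 1; b <= 6; b++) {
      counts[a + b] = (counts[a + b] || 0) + 1;
    }
  }
  return counts; // counts out of 36 equally likely outcomes
}

const dist = twoDiceDistribution();
for (const total of Object.keys(dist)) {
  console.log(`${total}: ${dist[total]}/36`);
}
```
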
  27. This makes sense: X = Result of a coin flip • HEADS 0.5, TAILS 0.5 • But: X won’t always have the same value
  28. RANDOM VARIABLES • X = Result of a coin flip • HEADS 0.5, TAILS 0.5 • X is a Random Variable • This is its distribution
  29. DEMO…
  30. Continuous: A numerical variable that can be any number (sometimes within a range) • height • weight • Math.random()
  31. HOW DO WE DEFINE THE DISTRIBUTION? • Math.random() • height
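
For a continuous variable, any single exact value has essentially zero probability, so one common way to describe the distribution is to count how often samples land in each interval. The sketch below does this for `Math.random()`; the `histogram` helper, the ten bins and the sample size are all choices made for this illustration.

```javascript
// Count how many of n samples fall into each of `bins` equal-width buckets on [0, 1).
function histogram(sample, bins, n) {
  const counts = new Array(bins).fill(0);
  for (let i = 0; i < n; i++) {
    // Math.min guards against rounding pushing a value into a non-existent bin.
    const bin = Math.min(bins - 1, Math.floor(sample() * bins));
    counts[bin] += 1;
  }
  return counts;
}

const counts = histogram(Math.random, 10, 100000);
// Share of samples per bin; each should be close to 0.1 for a uniform variable.
console.log(counts.map(c => (c / 100000).toFixed(3)));
```
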
  32. DEMO
  33. ERRR… SO WHAT?
  34. • When we do data analysis, we’re really looking at the range of values a random variable can be… • … and asking questions about its distribution.
  35. IMAGINE… YOU’RE AN AUDITOR
  36. AUDITING A LEDGER • Make a list of all ingoing and outgoing transactions • These are random variables. • What is their distribution? Does it deviate from what we expect?
  37. BENFORD’S LAW http://www.journalofaccountancy.com/Issues/1999/May/nigrini
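
Benford's law says that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with 1. A minimal sketch of the auditing check described above compares observed leading-digit shares with that expectation. The ledger values and function names below are hypothetical, invented for the example.

```javascript
// Expected share of leading digit d under Benford's law: log10(1 + 1/d).
function benfordExpected(d) {
  return Math.log10(1 + 1 / d);
}

// Leading non-zero digit of a non-zero amount (sign ignored).
function leadingDigit(amount) {
  const digits = Math.abs(amount).toExponential(); // e.g. "1.2005e+3"
  return Number(digits[0]);
}

// Observed share of each leading digit; index d holds the share for digit d.
function leadingDigitShares(amounts) {
  const counts = new Array(10).fill(0);
  const nonZero = amounts.filter(a => a !== 0);
  nonZero.forEach(a => { counts[leadingDigit(a)] += 1; });
  return counts.map(c => c / nonZero.length);
}

// Hypothetical ledger entries, for illustration only.
const ledger = [1200.5, -180, 19.99, 2450, 310, -1075, 47.2, 9.5, 130, 0.012];
const observed = leadingDigitShares(ledger);
for (let d = 1; d <= 9; d++) {
  console.log(d, observed[d].toFixed(2), benfordExpected(d).toFixed(3));
}
```

A real audit would need far more than ten entries and a proper goodness-of-fit test before reading anything into a deviation.
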
  38. DESIGNING INTUITIVE USER INPUTS
  39. OUR TASK… • Designing a system that tries to understand what happens under financial system “shocks” • So: a user would input a shock, its impacts would propagate and we would see our bottom line.
  40. OUR FIRST ATTEMPT • Shock ‘sliders’ that scaled linearly • 0% • 25% BOOM • 90% BUST
  41. DISTRIBUTION OF FINANCIAL CHANGES
  42. SO… • Shock ‘sliders’ that scaled linearly • 0% • 8% BOOM • 105% BUST • Change that happens with 75% chance • Change that happens with 10% chance
  43. Randomness
  44. MAKING RANDOM VARIABLES
  45. SOME WARNINGS • Exactly what randomness means is a fuzzy question. • These numbers are not ‘cryptographically’ random.
  46. JAVASCRIPT’S ENTRY TO RANDOMNESS: Math.random() • Different runtimes can implement it differently. • V8 implements Multiply-With-Carry: • Take a sequence of ‘seed’ values • Iteratively perform modular arithmetic-based operations • Extend the initial seed values to a longer sequence.
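
To make the three steps above concrete, here is a small sketch of a multiply-with-carry generator built from two 16-bit streams. It is not V8's source: the multipliers 36969 and 18000 are Marsaglia's classic published constants, and the seeds and the `createMwc` name are assumptions for this example. Like `Math.random()`, it is not suitable for cryptographic use.

```javascript
// Two 16-bit multiply-with-carry streams combined into one 32-bit output.
// Multipliers are Marsaglia's classic constants (an assumption, not V8's code).
// Seeds must be non-zero, otherwise a stream stays stuck at zero.
function createMwc(seedZ, seedW) {
  let z = seedZ >>> 0;
  let w = seedW >>> 0;
  return function next() {
    // Low 16 bits hold the current value, high 16 bits hold the carry.
    z = (36969 * (z & 0xffff) + (z >>> 16)) >>> 0;
    w = (18000 * (w & 0xffff) + (w >>> 16)) >>> 0;
    const combined = (((z << 16) >>> 0) + (w & 0xffff)) >>> 0;
    return combined / 4294967296; // scale the 32-bit integer to [0, 1)
  };
}

const rng = createMwc(12345, 67890);
console.log(rng(), rng(), rng());
```

Because the sequence is fully determined by the seeds, two generators created with the same seeds produce identical output, which is handy for repeatable tests.
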
  47. BUT… WHAT ABOUT OTHER DISTRIBUTIONS?
  48. THE SHORT ANSWER Math.random()= f( )
  49. THE SHORT ANSWER = HEADS 0.5 TAILS 0.5 =
  50. WHAT’S THE FUNCTION? jStat: beta, centralF, cauchy, chi-squared, exponential, gamma, inverse gamma, kumaraswamy, lognormal, normal, pareto, student t, uniform, weibull, binomial, negative binomial, hypergeometric, poisson, triangular OR
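
One standard way to get other distributions from a uniform number is to pass it through a function: a threshold for a coin flip, or the inverse of the cumulative distribution function for a continuous variable. The sketch below shows both by hand; the function names are hypothetical, and a library such as jStat (named on the slide) packages samplers like these for many distributions.

```javascript
// Coin flip: split [0, 1) into two halves of equal probability.
function coinFlip(u) {
  return u < 0.5 ? 'HEADS' : 'TAILS';
}

// Exponential(rate) via the inverse CDF: x = -ln(1 - u) / rate.
function exponential(u, rate) {
  return -Math.log(1 - u) / rate;
}

// Feed uniform numbers from Math.random() through each function.
const flips = Array.from({ length: 5 }, () => coinFlip(Math.random()));
const waits = Array.from({ length: 5 }, () => exponential(Math.random(), 2));
console.log(flips, waits);
```

Taking `u` as a parameter, instead of calling `Math.random()` inside, keeps each function deterministic and easy to test.
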
  51. USING RANDOMNESS
  52. Why would I want to use RANDOMNESS?
  53. STUBBED TEST DATA • Avoid coupling yourself to specific test implementations • Spin-up life-like environments for load testing
  54. NON-DETERMINISTIC ALGORITHMS • Modelling underlying or random data • Solving a problem that is expensive or impossible to solve perfectly
  55. PITFALLS
  56. CHOOSING THE DISTRIBUTION • What if a ‘uniform’ distribution isn’t enough? • What if we want random data that isn’t just numbers?
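
One possible answer to both questions, sketched under assumptions: pick from a list of non-numeric values with unequal weights. The `weightedPick` helper and the country weights below are invented for illustration, for example as stubbed test data for a fake user record.

```javascript
// Pick one item from `choices` ([{ value, weight }, ...]) with probability
// proportional to its weight. `u` is a uniform number in [0, 1).
function weightedPick(choices, u = Math.random()) {
  const total = choices.reduce((sum, c) => sum + c.weight, 0);
  let threshold = u * total;
  for (const choice of choices) {
    if (threshold < choice.weight) return choice.value;
    threshold -= choice.weight;
  }
  return choices[choices.length - 1].value; // guard against rounding
}

// Hypothetical stub data: a fake user's country, skewed towards 'UK'.
const countries = [
  { value: 'UK', weight: 6 },
  { value: 'FR', weight: 3 },
  { value: 'DE', weight: 1 },
];
console.log(weightedPick(countries));
```
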
  57. EXAMPLE: SOCIAL NETWORK
  58. EXAMPLE: SOCIAL NETWORK • 11 Traversals
  59. DEMO
  60. Barabasi-Albert Random Model
  61. BARABASI-ALBERT RANDOM MODEL • Start with two linked objects • Add one new object at a time • Link that object to one existing object, with already ‘popular’ objects more likely to be chosen.
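
The three steps above can be sketched in a few lines. This is an illustrative implementation, not the code from the talk's demo: the function names are assumptions, and "popular" is taken to mean "has more links", so each existing object is chosen with probability proportional to its current number of links.

```javascript
// Grow a network by preferential attachment.
// `random` is injectable so runs can be made repeatable; defaults to Math.random.
function barabasiAlbert(nodeCount, random = Math.random) {
  const edges = [[0, 1]];   // start with two linked objects
  const endpoints = [0, 1]; // each node appears once per link it has
  for (let node = 2; node < nodeCount; node++) {
    // A uniform pick from `endpoints` favours nodes with more links.
    const target = endpoints[Math.floor(random() * endpoints.length)];
    edges.push([node, target]);
    endpoints.push(node, target);
  }
  return edges;
}

// Number of links per node.
function degrees(edges, nodeCount) {
  const degree = new Array(nodeCount).fill(0);
  edges.forEach(([a, b]) => { degree[a] += 1; degree[b] += 1; });
  return degree;
}

const edges = barabasiAlbert(1000);
const degree = degrees(edges, 1000);
console.log('most linked node has', Math.max(...degree), 'links');
```

Keeping a flat `endpoints` array trades memory for simplicity: a uniform pick from it is automatically weighted by link count, so no cumulative weights need to be maintained.
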
  62. THIS MODELS… • Academic Citations • Actor filmographies • Spread of Infectious diseases • Social Networks
  63. CONTENTS • THEORY • CASE STUDIES • JAVASCRIPT APPLICATION • WHAT IS DATA? • GAINING INSIGHTS • RANDOMNESS • LEARNING THROUGH SIMULATION • Reward: What shape is the internet?
  64. We’re OUT of TIME
  65. • WHAT IS DATA? Data is any information we collect. Not all data is valuable. • GAINING INSIGHTS: Seeing trends in lots of numbers is hard. Summary statistics and charts help us unpick their meaning. • RANDOMNESS: Data can be treated as random ‘realisations’ from a backing distribution. • SIMULATION: Making random variables is easy, and can be done in different shapes for different purposes.
  66. LIBRARIES WE USED • GENERAL LIBRARIES: Knockout.js, Require.js, Bootstrap • DATA MANIPULATION: Lodash, jStat • DATA IMPORT: Papa Parse • CHARTING: D3, Chart.js
  67. THANK YOU David Simons @SwamWithTurtles