Almost every application needs data to function, and if you don't know how to be nice to your data, things will start to go wrong. This talk aims to convince JavaScript developers that they do need to care about statistics, and then shows how to do so. We look at some theory, plenty of case studies, and real-world advice for dealing with a range of scenarios.
The talk touches on the entire data life cycle. We'll dive into data modelling, how the shape and size of your data affect your architecture, and how to build these architectures using JavaScript. Once the data is in the front-end, we'll touch on the wide range of libraries that allow your code to react based on the data, and the wrappers on top that aid visualisation and readability.
6. CONTENTS
THEORY, CASE STUDIES, JAVASCRIPT APPLICATION
WHAT IS DATA?
GAINING INSIGHTS
RANDOMNESS
LEARNING THROUGH SIMULATION
Reward: What shape is the internet?
11. WHAT DATA WAS THERE?
• Counts of lists (e.g. brands, products etc.)
• Stock levels and prices of products
• Days an item has been out of stock
12. WHAT DATA WAS THERE?
• Non-functional data
• Numbers of users
• Performance for users
• Performance of third-party APIs
• Robustness of system (uptime, status codes, frequency of errors)
13. THERE IS DATA EVERYWHERE
THE LESSON?
16. WHAT DATA SHOULD I CARE ABOUT?
• Data you get repeatedly
• Data you can extract ‘information’ from
• Normally this means numerical data, though NLP is getting big!
• Data that answers valuable questions
20. SUMMARY STATISTICS
• A statistic is a function of the data we have input
• It aims to capture information about the values to make them more understandable
21. THE FAMOUS ONE:
• Mean (‘average’)
• Sum all of the data and divide by the number of items
• Gives a sense of ‘size’
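The recipe above (sum everything, divide by the count) can be sketched in a few lines of JavaScript. This is only a minimal illustration; the `stockLevels` array is made-up example data, not figures from the talk.

```javascript
// Arithmetic mean: sum all values, then divide by how many there are.
function mean(values) {
  if (values.length === 0) {
    throw new Error('mean() needs at least one value');
  }
  const total = values.reduce((sum, value) => sum + value, 0);
  return total / values.length;
}

// Hypothetical stock levels for five products (invented for illustration).
const stockLevels = [12, 0, 7, 30, 1];
console.log(mean(stockLevels)); // 10
```

Throwing on an empty list is a design choice here: silently returning `NaN` tends to hide missing data further down the line.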
27. Discrete Variables
Can be any of a list of values, each with its own probability
Coin flip:
HEADS  0.5
TAILS  0.5
Total of two dice:
2   1/36
3   2/36
4   3/36
5   4/36
6   5/36
7   6/36
8   5/36
9   4/36
10  3/36
11  2/36
12  1/36
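The second table above matches the totals of two fair six-sided dice. As a rough check, the sketch below rebuilds it by enumerating all 36 equally likely outcomes. The function name `twoDiceDistribution` is made up for this example.

```javascript
// Build the distribution of the total of two fair six-sided dice by
// enumerating all 36 equally likely (first, second) outcomes.
function twoDiceDistribution() {
  const counts = new Map();
  for (let first = 1; first <= 6; first++) {
    for (let second = 1; second <= 6; second++) {
      const total = first + second;
      counts.set(total, (counts.get(total) || 0) + 1);
    }
  }
  return counts; // Map of total -> number of outcomes (out of 36)
}

const diceCounts = twoDiceDistribution();
for (const [total, count] of diceCounts) {
  console.log(`${total}: ${count}/36`);
}
```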
28. This makes sense:
X = Result of a coin flip
HEADS  0.5
TAILS  0.5
But: X won’t always have the same value
29. RANDOM VARIABLES
X = Result of a coin flip
X is a Random Variable
HEADS  0.5
TAILS  0.5
This is its distribution
37. AUDITING A LEDGER
• Make a list of all ingoing and outgoing transactions
• These are random variables.
• What is their distribution? Does it deviate from what we expect?
38. BENFORD’S LAW
http://www.journalofaccountancy.com/Issues/1999/May/nigrini
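Benford's law says that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so about 30% of values start with 1. The sketch below shows one way a ledger's leading digits could be compared with that expectation. The `ledger` amounts and all function names are invented for illustration, and a real audit would need far more transactions plus a proper statistical test.

```javascript
// Expected share of values whose leading digit is `digit` under Benford's law.
function benfordExpected(digit) {
  return Math.log10(1 + 1 / digit);
}

// Leading non-zero digit of a non-zero number, ignoring its sign.
function leadingDigit(amount) {
  const digits = Math.abs(amount).toExponential(); // e.g. 425 -> "4.25e+2"
  return Number(digits[0]);
}

// Observed share of each leading digit (index 1 to 9) in a list of amounts.
// Zero amounts have no leading digit and are skipped.
function leadingDigitShares(amounts) {
  const counts = new Array(10).fill(0);
  const nonZero = amounts.filter((amount) => amount !== 0);
  for (const amount of nonZero) {
    counts[leadingDigit(amount)] += 1;
  }
  return counts.map((count) => count / nonZero.length);
}

// Hypothetical ledger: positive = ingoing, negative = outgoing.
const ledger = [120.5, -1830, 19.99, 2400, 310, 1.2, 0.15, 98];
const observed = leadingDigitShares(ledger);
for (let digit = 1; digit <= 9; digit++) {
  const expected = benfordExpected(digit).toFixed(3);
  console.log(`${digit}: observed ${observed[digit].toFixed(3)}, expected ${expected}`);
}
```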
39. DESIGNING INTUITIVE USER INPUTS
40. OUR TASK…
• Designing a system that tries to understand what happens under financial system “shocks”
• So: a user would input a shock, its impacts would propagate and we would see our bottom line.
41. OUR FIRST ATTEMPT
• Shock ‘sliders’ that scaled linearly
[Slider figure with labels: 0%, 25%, BOOM, 90%, BUST]
42. DISTRIBUTION OF FINANCIAL CHANGES
43. SO…
• Shock ‘sliders’ that scaled linearly
[Slider figure with labels: 0%, 8%, BOOM, 105%, BUST]
[Annotations: “Change that happens with 75% chance”, “Change that happens with 10% chance”]
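One possible reading of the annotations above is that each slider position stands for how likely a change is, and the system looks up how large a change that corresponds to. The sketch below illustrates that idea with an empirical quantile. It is an assumption about the approach, not the talk's actual implementation: the `history` values and the `changeForChance` name are invented for this example.

```javascript
// Size of change that is met or exceeded in at least `chance` of the
// observed periods (an empirical quantile, nearest-rank method).
function changeForChance(historicalChanges, chance) {
  const sorted = [...historicalChanges].sort((a, b) => b - a); // largest first
  const rank = Math.max(1, Math.ceil(chance * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical history of absolute changes, as fractions (0.08 = 8%).
const history = [0.02, 0.05, 0.08, 0.10, 0.12, 0.20, 0.35, 0.60, 1.05, 0.09];

console.log(changeForChance(history, 0.75)); // a common, small change
console.log(changeForChance(history, 0.10)); // a rare, large change
```

With a mapping like this, equal steps on the slider correspond to equal steps in likelihood rather than equal steps in size.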
46. SOME WARNINGS
• Exactly what randomness means is a fuzzy question.
• These numbers are not ‘cryptographically’ random.
47. JAVASCRIPT’S ENTRY TO RANDOMNESS
Math.random()
• Different runtimes can implement it differently.
• V8 implemented Multiply-With-Carry (newer V8 versions have since switched to xorshift128+):
• Take a sequence of ‘seed’ values
• Iteratively perform modular arithmetic-based operations
• Extend the initial seed values to a longer sequence.
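To make the three steps concrete, here is a small multiply-with-carry style generator written in plain JavaScript. It is an illustrative sketch only: the multipliers are values commonly quoted for 16-bit multiply-with-carry generators, the `createMwcGenerator` name is made up, and this is not claimed to be the exact code of V8 or any other runtime.

```javascript
// A multiply-with-carry style generator. Each 32-bit state word keeps a
// 16-bit value in its low half and the carry in its high half.
// Seeds should not both be zero, or the output stays at zero.
function createMwcGenerator(seed0, seed1) {
  let state0 = seed0 >>> 0;
  let state1 = seed1 >>> 0;
  return function next() {
    // Multiply the low 16 bits, add the carry, keep the result modulo 2^32.
    state0 = (18030 * (state0 & 0xffff) + (state0 >>> 16)) >>> 0;
    state1 = (30903 * (state1 & 0xffff) + (state1 >>> 16)) >>> 0;
    // Combine the two words into one unsigned 32-bit integer...
    const combined = (((state0 << 16) >>> 0) + (state1 & 0xffff)) >>> 0;
    // ...and scale it into [0, 1), the same range Math.random() returns.
    return combined / 4294967296;
  };
}

// The same seeds always produce the same sequence.
const nextRandom = createMwcGenerator(1, 2);
console.log(nextRandom(), nextRandom(), nextRandom());
```

Being fully determined by its seeds is exactly why a generator like this is not ‘cryptographically’ random.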
48. BUT… WHAT ABOUT OTHER DISTRIBUTIONS?
49. THE SHORT ANSWER
Math.random() = f( [figure] )
50. THE SHORT ANSWER
[figure] = HEADS 0.5, TAILS 0.5 = [figure]
51. WHAT’S THE FUNCTION?
jStat: beta, centralF, cauchy, chi-squared, exponential, gamma, inverse gamma, kumaraswamy, lognormal, normal, pareto, student t, uniform, weibull, binomial, negative binomial, hypergeometric, poisson, triangular
OR
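For simple cases the function that turns a uniform number into another distribution can be written by hand. The sketch below shows two standard constructions: a coin flip made by splitting [0, 1) in half, and an exponential sample made by inverse transform sampling. The function names and the optional `random` parameter (useful for testing) are choices made for this example rather than part of any library.

```javascript
// Coin flip: split [0, 1) into two halves of equal probability.
function flipCoin(random = Math.random) {
  return random() < 0.5 ? 'HEADS' : 'TAILS';
}

// Exponential distribution via inverse transform sampling:
// if U is uniform on [0, 1), then -ln(1 - U) / rate is exponential(rate).
function sampleExponential(rate, random = Math.random) {
  return -Math.log(1 - random()) / rate;
}

console.log(flipCoin());
console.log(sampleExponential(2)); // mean of many samples approaches 1 / rate
```

For distributions whose inverse has no neat closed form, reaching for a statistics library is usually the more practical route.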
53. WHY WOULD I WANT TO USE RANDOMNESS?
54. STUBBED TEST DATA
• Avoid coupling yourself to specific test implementations
• Spin-up life-like environments for load testing
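A rough sketch of what randomly generated test data could look like follows. It uses a tiny seeded generator so that a failing run can be replayed. The record shape, the brand names, and every function name here are hypothetical and chosen only for illustration.

```javascript
// Tiny seeded generator so a failing test can be replayed with the same data.
// (A linear congruential generator: fine for test data, not for security.)
function createSeededRandom(seed) {
  let state = seed >>> 0;
  return function random() {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // scale into [0, 1)
  };
}

// Hypothetical product record with made-up fields and value ranges.
function randomProduct(random) {
  const brands = ['Acme', 'Globex', 'Initech'];
  return {
    brand: brands[Math.floor(random() * brands.length)],
    price: Math.round(random() * 10000) / 100, // 0.00 to 100.00
    stockLevel: Math.floor(random() * 50),     // 0 to 49
    daysOutOfStock: Math.floor(random() * 30), // 0 to 29
  };
}

// Generate `count` products; the same seed always gives the same list.
function randomProducts(count, seed) {
  const random = createSeededRandom(seed);
  return Array.from({ length: count }, () => randomProduct(random));
}

console.log(randomProducts(3, 42));
```

Logging the seed alongside a test failure is what keeps random test data debuggable.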
55. NON-DETERMINISTIC ALGORITHMS
• Modelling underlying or random data
• Solving a problem that is expensive or impossible to solve perfectly
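A classic toy example of trading exactness for random sampling is estimating pi by Monte Carlo simulation. It is not taken from the talk; it is included here only as a small, self-contained illustration of the idea, with `estimatePi` as a made-up name.

```javascript
// Estimate pi by throwing random points at the unit square and counting
// how many land inside the quarter circle of radius 1. The fraction inside
// approximates pi / 4.
function estimatePi(samples, random = Math.random) {
  let inside = 0;
  for (let i = 0; i < samples; i++) {
    const x = random();
    const y = random();
    if (x * x + y * y <= 1) inside += 1;
  }
  return (4 * inside) / samples;
}

console.log(estimatePi(1000000)); // close to Math.PI, but different each run
```

More samples give a better answer, which is the usual trade-off with this family of algorithms: accuracy is bought with compute time.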
57. CHOOSING THE DISTRIBUTION
• What if a ‘uniform’ distribution isn’t enough?
• What if we want random data that isn’t just numbers?
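One way to get non-numeric random data with a non-uniform shape is a weighted choice over a list of values. The sketch below assumes made-up HTTP status codes and weights, and `weightedChoice` is a name invented for this example.

```javascript
// Pick one item, where each item's chance is proportional to its weight.
function weightedChoice(items, weights, random = Math.random) {
  const totalWeight = weights.reduce((sum, weight) => sum + weight, 0);
  let threshold = random() * totalWeight;
  for (let i = 0; i < items.length; i++) {
    threshold -= weights[i];
    if (threshold < 0) return items[i];
  }
  return items[items.length - 1]; // guard against floating-point round-off
}

// Hypothetical mix of responses from a third-party API.
const statusCodes = ['200', '404', '500'];
const statusWeights = [90, 8, 2];
console.log(weightedChoice(statusCodes, statusWeights));
```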
61. Barabasi-Albert Random Model
62. BARABASI-ALBERT RANDOM MODEL
• Start with two linked objects
• Add one new object at a time
• Link that object to one existing object, with already ‘popular’ objects more likely to be chosen.
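The three steps above can be sketched as follows. This is a minimal version that assumes ‘popular’ means ‘has many links’ and that each new object adds exactly one link. The names `growNetwork` and `countLinks` are made up for this example.

```javascript
// Grow a network one node at a time. Each new node links to one existing
// node, chosen with probability proportional to its current link count.
// Assumes nodeCount >= 2.
function growNetwork(nodeCount, random = Math.random) {
  const edges = [[0, 1]]; // start with two linked objects
  // Every edge contributes both of its ends, so a node with k links appears
  // k times in this list. A uniform pick from it is therefore a
  // popularity-weighted pick.
  const endpoints = [0, 1];
  for (let newNode = 2; newNode < nodeCount; newNode++) {
    const target = endpoints[Math.floor(random() * endpoints.length)];
    edges.push([newNode, target]);
    endpoints.push(newNode, target);
  }
  return edges;
}

// Number of links per node, indexed by node id.
function countLinks(edges, nodeCount) {
  const links = new Array(nodeCount).fill(0);
  for (const [from, to] of edges) {
    links[from] += 1;
    links[to] += 1;
  }
  return links;
}

const network = growNetwork(1000);
const links = countLinks(network, 1000);
console.log('most linked node has', Math.max(...links), 'links');
```

Running this a few times typically shows a handful of early nodes collecting far more links than the rest.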
63. THIS MODELS…
• Academic Citations
• Actor filmographies
• Spread of Infectious diseases
• Social Networks
64. CONTENTS
THEORY, CASE STUDIES, JAVASCRIPT APPLICATION
WHAT IS DATA?
GAINING INSIGHTS
RANDOMNESS
LEARNING THROUGH SIMULATION
Reward: What shape is the internet?
66.
• Data is any information we collect. Not all data is valuable.
• Seeing trends in lots of numbers is hard. Summary statistics and charts help us unpick their meaning.
• Data can be treated as random ‘realisations’ from a backing distribution.
• Making random variables is easy, and can be done in different shapes for different purposes.
WHAT IS DATA?, GAINING INSIGHTS, RANDOMNESS, SIMULATION
67. LIBRARIES WE USED
GENERAL LIBRARIES: Knockout.js, Require.js, Bootstrap
DATA MANIPULATION: Lodash, jStat
DATA IMPORT: Papa Parse
CHARTING: D3, Chart.js
68. THANK YOU
David Simons
@SwamWithTurtles