Applied Bayesian Inference with
PyMC
@MrSantoni
Which color will sell more?
[Figure: two mock product pages, Page A and Page B, each titled "A Tea Pot" with placeholder text and a BUY button]
Observed rate on each page: #buy / N
• What if N is small?
• How large must N be for 90% confidence?
• What if N is different on A and B?
Bayesian Inference
Probability:
• Frequentist: frequency
• Bayesian: belief
Claim: we think Bayesian
[Figure: confidence in "no-bugs" after test 1, test 2, test 3]
Bayesian Inference = update your prior belief with new evidence
The Developer View
Statistical Problem
def frequentist(): return 80%
def bayesian(): return [a distribution over 0% to 100%]
How to?
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
Closed-form solution: available for toy examples, rarely for realistic cases
PyMC
• Perform Bayesian Inference
• Markov Chain Monte Carlo techniques
• A.k.a. Probabilistic Programming
Show me the code!
Example A/B test
Only one difference between A and B
Assume there is:
• p_a: probability of clicking BUY when landing on A
• p_b: probability of clicking BUY when landing on B
How to compute p_a and p_b?
Page A
– N_a visitors
– C_a BUY clicks on page A
Page B
– N_b visitors
– C_b BUY clicks on page B
Frequentist:
C_a / N_a
BUT:
Observed frequency does not necessarily equal p_a
Bayesian:
Infer true frequency from observed data
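The gap between the observed frequency and the true probability can be seen in a small simulation. This is only a sketch using NumPy (not PyMC); the true rate of 0.05, the two sample sizes, the number of repetitions, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
p_true = 0.05                   # assumed "true" BUY probability


def observed_frequencies(n_visitors, n_experiments=1000):
    """Repeat the experiment many times; return C / N for each repetition."""
    clicks = rng.binomial(n_visitors, p_true, size=n_experiments)
    return clicks / float(n_visitors)


freq_small = observed_frequencies(50)
freq_large = observed_frequencies(1500)

# The observed frequency scatters around p_true, more widely when N is small
print('N=50:   min %.3f, max %.3f' % (freq_small.min(), freq_small.max()))
print('N=1500: min %.3f, max %.3f' % (freq_large.min(), freq_large.max()))
```

Every repetition uses the same p_true, yet C / N differs from run to run. That spread is the uncertainty the Bayesian approach tries to make explicit.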
Bayesian Workflow
1. Define prior
2. Fit to observations
3. Get posteriors
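The three steps can be sketched without any sampling library by evaluating everything on a grid of candidate values for p. This is a hand-rolled grid approximation in NumPy, not how PyMC works; the grid resolution is an assumption, and the counts (68 clicks in 1500 visits) are the ones from the simulated example in these slides.

```python
import numpy as np

# Data: 68 BUY clicks out of 1500 visitors
n_visitors, n_clicks = 1500, 68

# 1. Define prior: uniform over a grid of candidate values for p
p_grid = np.linspace(0.001, 0.999, 999)
prior = np.ones_like(p_grid) / p_grid.size

# 2. Fit to observations: Bernoulli log-likelihood of the data for each p
log_likelihood = (n_clicks * np.log(p_grid)
                  + (n_visitors - n_clicks) * np.log(1 - p_grid))

# 3. Get posterior: prior times likelihood, normalised to sum to 1
log_post = np.log(prior) + log_likelihood
posterior = np.exp(log_post - log_post.max())  # subtract max to avoid underflow
posterior /= posterior.sum()

p_mode = p_grid[posterior.argmax()]
print('Most probable p on the grid: %.3f' % p_mode)
```

A grid works for one unknown but becomes impractical as the number of unknowns grows, which is where MCMC comes in.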
from pymc import Uniform, rbernoulli, Bernoulli, MCMC
from matplotlib import pyplot as plt
p_A_true = 0.05
N = 1500
occurrences = rbernoulli(p_A_true, N)
print 'Click-BUY:'
print occurrences.sum()
print 'Observed frequency:'
print occurrences.sum() / float(N)
Click-BUY:
68
Observed frequency:
0.0453333333333
Clicking BUY
Bernoulli distribution
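$$P(\text{click}) = \begin{cases} p, & \text{click} = 1 \\ 1 - p, & \text{click} = 0 \end{cases}$$
[Figure: bar chart of the Bernoulli probabilities for click=1 and click=0]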
p_A = Uniform('p_A', lower=0, upper=1)
[Figure: uniform prior density of p_A between 0 and 1]
print p_A.random()
print p_A.value
array(0.906086144982998)
array(0.906086144982998)
print p_A.random()
print p_A.value
array(0.285313846133313)
array(0.285313846133313)
p_A = Uniform('p_A', lower=0, upper=1)
obs = Bernoulli('obs', p_A, value=occurrences, observed=True)
mcmc = MCMC([p_A, obs])
mcmc.sample(20000, 1000)
print mcmc.trace('p_A')[:]
[------- 20% ] 4053 of 20000 complete in 0.5 sec
[------------- 36% ] 7315 of 20000 complete in 1.0 sec
[-----------------53% ] 10627 of 20000 complete in 1.5 sec
[-----------------69%------ ] 13939 of 20000 complete in 2.0 sec
[-----------------81%----------- ] 16376 of 20000 complete in 2.5 sec
[-----------------96%---------------- ] 19342 of 20000 complete in 3.0 sec
[-----------------100%-----------------] 20000 of 20000 complete in 3.1 sec
[ 0.04656576 0.04656576 0.04656576 ..., 0.03803667 0.03803667
0.03803667]
plt.figure(figsize=(8, 7))
plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', normed=True)
plt.xlabel('Probability of clicking BUY')
plt.ylabel('Density')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
plt.legend()
plt.savefig('p_A_hist_N_%s.png' % N)
plt.show()
90% confidence that p_A is between X and Y?
import numpy as np
p_A_samples = mcmc.trace('p_A')[:]
# the 5th and 95th percentiles bound the central 90% of the posterior samples
lower_bound = np.percentile(p_A_samples, 5)
upper_bound = np.percentile(p_A_samples, 95)
print 'There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound)
There is 90% probability that p_A is between
0.0373019596856 and 0.0548052806892
What if N_a is lower?
from pymc import Uniform, rbernoulli, Bernoulli, MCMC
from matplotlib import pyplot as plt
p_A_true = 0.05
N = 50
occurrences = rbernoulli(p_A_true, N)
print 'Click-BUY:'
print occurrences.sum()
print 'Observed frequency:'
print occurrences.sum() / float(N)
Click-BUY:
2
Observed frequency:
0.04
p_A = Uniform('p_A', lower=0, upper=1)
obs = Bernoulli('obs', p_A, value=occurrences, observed=True)
mcmc = MCMC([p_A, obs])
mcmc.sample(20000, 1000)
print mcmc.trace('p_A')[:]
[----- 14% ] 2874 of 20000 complete in 0.5 sec
[----------- 30% ] 6035 of 20000 complete in 1.0 sec
[-----------------47% ] 9440 of 20000 complete in 1.5 sec
[-----------------63%---- ] 12775 of 20000 complete in 2.0 sec
[-----------------81%---------- ] 16203 of 20000 complete in 2.5 sec
[-----------------100%-----------------] 20000 of 20000 complete in 3.0 sec
[ 0.06240723 0.06240723 0.06240723 ..., 0.01864419 0.01864419
0.01864419]
plt.figure(figsize=(8, 7))
plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', normed=True)
plt.xlabel('Probability of clicking BUY')
plt.ylabel('Density')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
plt.legend()
plt.savefig('p_A_hist_N_%s.png' % N)
plt.show()
90% confidence that p_A is between X and Y?
import numpy as np
p_A_samples = mcmc.trace('p_A')[:]
# the 5th and 95th percentiles bound the central 90% of the posterior samples
lower_bound = np.percentile(p_A_samples, 5)
upper_bound = np.percentile(p_A_samples, 95)
print 'There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound)
There is 90% probability that p_A is between
0.0160966147705 and 0.114655284797
[Figure: posterior histograms of p_A for N_a = 1500 and N_a = 50]
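The two intervals can be cross-checked without MCMC. For this particular toy model (uniform prior, Bernoulli observations) the posterior is known in closed form: Beta(1 + C, 1 + N - C) for C clicks in N visits. The sketch below assumes NumPy only and draws from that Beta directly; the seed and number of draws are arbitrary, and the results should land close to the MCMC intervals, not exactly on them.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed


def credible_interval_90(n_clicks, n_visitors, n_draws=200000):
    """5th and 95th percentiles of the Beta(1 + C, 1 + N - C) posterior."""
    draws = rng.beta(1 + n_clicks, 1 + n_visitors - n_clicks, size=n_draws)
    return np.percentile(draws, 5), np.percentile(draws, 95)


# Click counts taken from the two simulated runs in the slides
lo_large, hi_large = credible_interval_90(68, 1500)
lo_small, hi_small = credible_interval_90(2, 50)

print('N=1500: %.4f to %.4f' % (lo_large, hi_large))
print('N=50:   %.4f to %.4f' % (lo_small, hi_small))
```

The observed frequencies are similar in both runs (about 4.5% and 4%), but the interval for N = 50 is several times wider.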
Does the red have a larger probability of being clicked?
from pymc import Uniform, rbernoulli, Bernoulli, MCMC, deterministic
from matplotlib import pyplot as plt
p_A_true = 0.05
p_B_true = 0.04
N_A = 1500
N_B = 750
occurrences_A = rbernoulli(p_A_true, N_A)
occurrences_B = rbernoulli(p_B_true, N_B)
print 'Observed frequency:'
print 'A'
print occurrences_A.sum() / float(N_A)
print 'B'
print occurrences_B.sum() / float(N_B)
Observed frequency:
A
0.0533333333333
B
0.0413333333333
p_A = Uniform('p_A', lower=0, upper=1)
p_B = Uniform('p_B', lower=0, upper=1)
@deterministic
def delta(p_A=p_A, p_B=p_B):
    return p_A - p_B
obs_A = Bernoulli('obs_A', p_A, value=occurrences_A, observed=True)
obs_B = Bernoulli('obs_B', p_B, value=occurrences_B, observed=True)
mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)
[----- 14% ] 3561 of 25000 complete in 0.5 sec
[--------- 25% ] 6332 of 25000 complete in 1.0 sec
[------------ 33% ] 8454 of 25000 complete in 1.5 sec
[--------------- 41% ] 10499 of 25000 complete in 2.0 sec
[-----------------50% ] 12602 of 25000 complete in 2.5 sec
[-----------------59%-- ] 14780 of 25000 complete in 3.0 sec
[-----------------67%----- ] 16883 of 25000 complete in 3.5 sec
[-----------------75%-------- ] 18954 of 25000 complete in 4.0 sec
[-----------------83%----------- ] 20877 of 25000 complete in 4.5 sec
[-----------------91%-------------- ] 22924 of 25000 complete in 5.0 sec
[-----------------100%-----------------] 25000 of 25000 complete in 5.5 sec
p_A_samples = mcmc.trace('p_A')[:]
p_B_samples = mcmc.trace('p_B')[:]
delta_samples = mcmc.trace('delta')[:]
plt.subplot(3,1,1)
plt.xlim(0, 0.1)
plt.hist(p_A_samples, bins=35, histtype='stepfilled', normed=True, color='blue', label='Posterior of p_A')
plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A (unknown)')
plt.xlabel('Probability of clicking BUY via A')
plt.legend()
plt.subplot(3,1,2)
plt.xlim(0, 0.1)
plt.hist(p_B_samples, bins=35, histtype='stepfilled', normed=True, color='green', label='Posterior of p_B')
plt.vlines(p_B_true, 0, 90, linestyle='--', label='True p_B (unknown)')
plt.xlabel('Probability of clicking BUY via B')
plt.legend()
plt.subplot(3,1,3)
plt.xlim(0, 0.1)
plt.hist(delta_samples, bins=35, histtype='stepfilled', normed=True, color='red', label='Posterior of delta')
plt.vlines(p_A_true - p_B_true, 0, 90, linestyle='--', label='True delta (unknown)')
plt.xlabel('p_A - p_B')
plt.legend()
plt.savefig('A_and_B.png')
plt.show()
p_A > p_B: how confident are we?
print 'Probability that p_A > p_B:'
print (delta_samples > 0).mean()
Probability that p_A > p_B:
0.8919
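The same question can be cross-checked without MCMC, because a uniform prior with Bernoulli data gives a Beta(1 + C, 1 + N - C) posterior for each page. The sketch below assumes NumPy only. The click counts are the ones implied by the observed frequencies above (0.0533 × 1500 = 80 and 0.0413 × 750 = 31); the seed and number of draws are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
n_draws = 200000

# Counts implied by the observed frequencies: 80 / 1500 and 31 / 750
clicks_a, visitors_a = 80, 1500
clicks_b, visitors_b = 31, 750

# Uniform prior + Bernoulli data gives a Beta(1 + C, 1 + N - C) posterior
p_a_draws = rng.beta(1 + clicks_a, 1 + visitors_a - clicks_a, size=n_draws)
p_b_draws = rng.beta(1 + clicks_b, 1 + visitors_b - clicks_b, size=n_draws)

# Same estimator as (delta_samples > 0).mean(): fraction of draws with A ahead
delta_draws = p_a_draws - p_b_draws
prob_a_better = (delta_draws > 0).mean()
print('Probability that p_A > p_B: %.3f' % prob_a_better)
```

The result should be close to the MCMC estimate. Small differences are expected because both numbers are Monte Carlo estimates.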
Same test with N_A = 1500 and N_B = 200 instead of N_B = 750:
print 'Probability that p_A > p_B:'
print (delta_samples > 0).mean()
Probability that p_A > p_B:
0.73455
MCMC
mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)  # 25000 iterations, 5000 burn-in
• Posterior P(p_A, p_B, delta | obs_A, obs_B) as samples
• Metropolis-Hastings algorithm
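To give an idea of what a Metropolis-Hastings sampler does, here is a minimal random-walk Metropolis sketch for a single click probability, written in plain NumPy. It is not PyMC's implementation. The starting point, step size, seed, and the counts (80 clicks in 1500 visits) are assumptions chosen for illustration.

```python
import numpy as np


def log_posterior(p, n_clicks, n_visitors):
    """Unnormalised log posterior: uniform prior on (0, 1) + Bernoulli data."""
    if p <= 0.0 or p >= 1.0:
        return -np.inf  # outside the support of the prior
    return n_clicks * np.log(p) + (n_visitors - n_clicks) * np.log(1.0 - p)


def metropolis(n_clicks, n_visitors, n_iter=25000, burn=5000, step=0.01, seed=3):
    rng = np.random.default_rng(seed)
    p_current = 0.5
    lp_current = log_posterior(p_current, n_clicks, n_visitors)
    trace = np.empty(n_iter)
    for i in range(n_iter):
        # Symmetric random-walk proposal around the current value
        p_new = p_current + rng.normal(0.0, step)
        lp_new = log_posterior(p_new, n_clicks, n_visitors)
        # Accept with probability min(1, posterior ratio)
        if np.log(rng.random()) < lp_new - lp_current:
            p_current, lp_current = p_new, lp_new
        trace[i] = p_current
    return trace[burn:]  # drop the burn-in iterations


samples = metropolis(n_clicks=80, n_visitors=1500)
print('Posterior mean of p: %.4f' % samples.mean())
```

Rejected proposals repeat the current value in the trace, which is why runs of identical numbers appear in the printed traces. The burn-in discards the early iterations, before the chain has moved from its starting point into the high-probability region.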
Open the black box
mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
mcmc.sample(25000, 5000)
from pymc.Matplot import plot as mcplot
mcplot(mcmc)
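Diagnostic plots of this kind are usually read for two things: whether the trace has settled, and how strongly successive samples are correlated. The sketch below computes the second quantity by hand in NumPy. The trace is a stand-in (a synthetic AR(1) series with coefficient 0.9), not output from the model above.

```python
import numpy as np


def autocorrelation(trace, lag):
    """Sample autocorrelation of a 1-D trace at the given lag."""
    x = np.asarray(trace, dtype=float) - np.mean(trace)
    if lag == 0:
        return 1.0
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


# Stand-in for an MCMC trace: an AR(1) series with known correlation 0.9
rng = np.random.default_rng(4)
trace = np.empty(20000)
trace[0] = 0.0
noise = rng.normal(size=trace.size)
for i in range(1, trace.size):
    trace[i] = 0.9 * trace[i - 1] + noise[i]

for lag in (1, 5, 20):
    print('lag %2d: %.2f' % (lag, autocorrelation(trace, lag)))
```

High autocorrelation at large lags means the chain carries less independent information than its length suggests, so more iterations or thinning may be needed.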
PyMC
• Easy to interpret results
– confidence, no p-values!
• No crazy math
• Computationally expensive
Thank you
@MrSantoni
marcosantoni@hotmail.it
Back
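Serie A 13/14
(FTHG/FTAG = full-time home/away goals, FTR = full-time result; HTHG/HTAG/HTR = the same at half-time; H = home win, D = draw, A = away win)
Date HomeTeam AwayTeam FTHG FTAG FTR HTHG HTAG HTR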
24/08/2013 Sampdoria Juventus 0 1 A 0 0 D
24/08/2013 Verona Milan 2 1 H 1 1 D
25/08/2013 Cagliari Atalanta 2 1 H 1 1 D
25/08/2013 Inter Genoa 2 0 H 0 0 D
25/08/2013 Lazio Udinese 2 1 H 2 0 H
25/08/2013 Livorno Roma 0 2 A 0 0 D
25/08/2013 Napoli Bologna 3 0 H 2 0 H
25/08/2013 Parma Chievo 0 0 D 0 0 D
25/08/2013 Torino Sassuolo 2 0 H 1 0 H
26/08/2013 Fiorentina Catania 2 1 H 2 1 H
31/08/2013 Chievo Napoli 2 4 A 2 2 D
31/08/2013 Juventus Lazio 4 1 H 2 1 H
01/09/2013 Atalanta Torino 2 0 H 0 0 D
01/09/2013 Bologna Sampdoria 2 2 D 1 1 D
01/09/2013 Catania Inter 0 3 A 0 1 A
01/09/2013 Genoa Fiorentina 2 5 A 0 3 A
01/09/2013 Milan Cagliari 3 1 H 2 1 H
01/09/2013 Roma Verona 3 0 H 0 0 D
01/09/2013 Sassuolo Livorno 1 4 A 0 1 A
01/09/2013 Udinese Parma 3 1 H 1 0 H
14/09/2013 Inter Juventus 1 1 D 0 0 D
14/09/2013 Napoli Atalanta 2 0 H 0 0 D
14/09/2013 Torino Milan 2 2 D 0 0 D
15/09/2013 Fiorentina Cagliari 1 1 D 0 0 D
https://datahub.io/dataset/italian-football-data-serie-a-b
Win-rate
Did it change?
Bayesian Workflow
1. Define Prior
2. Fit to observations
3. Get Posteriors
Winning a Match
Bernoulli distribution
$$P(w) = \begin{cases} p, & w = 1 \\ 1 - p, & w = 0 \end{cases}$$
[Figure: bar chart of the Bernoulli probabilities for Win (w=1) and Lose (w=0)]
𝑝: switchpoint?
Model the switchpoint
$$p = \begin{cases} p_1, & t < \tau \\ p_2, & t \ge \tau \end{cases}$$
Goal -> infer 𝑝1, 𝑝2, 𝜏, 𝑝
Bayesian Workflow
1. Define Prior
2. Fit to observations
3. Get Posteriors
Let’s model this
• Goal: infer the unknown p1, p2, TAU
• First step of Bayesian inference: assign a prior probability to the different possible values of p
• What would be a good prior for p1, p2? Use uniform:
– p1 ~ Uniform(0,1)
– p2 ~ Uniform(0,1)
– TAU ~ DiscreteUniform(1, 38)
• P(TAU=k) = 1/38 for all k
from pymc import Uniform, DiscreteUniform, deterministic, Bernoulli, Model, MCMC
p_1 = Uniform('p_1', lower=0, upper=1)
p_2 = Uniform('p_2', lower=0, upper=1)
tau = DiscreteUniform('tau', lower=1, upper=38)
print 'Random output: ', tau.random(), tau.random(), tau.random()
Random output: 14 24 33
import numpy as np

@deterministic
def p_(tau=tau, p_1=p_1, p_2=p_2, num_matches=38):
    # concatenate p_1 and p_2 based on tau
    out = np.empty(num_matches)
    out[:tau] = p_1   # matches before the switchpoint
    out[tau:] = p_2   # matches from the switchpoint on
    return out
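For a model this small, the posterior of the switchpoint can also be computed exactly, which helps to show what the sampler is estimating. With uniform priors on p_1 and p_2, each segment's win probability integrates out to a Beta function, leaving a score for every candidate tau. The sketch below assumes NumPy and the standard library only, and it is not what PyMC does internally. The season is made up and deliberately extreme.

```python
import numpy as np
from math import lgamma


def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def tau_posterior(wins):
    """P(tau | data) for tau = 1..len(wins), with p_1, p_2 ~ Uniform(0, 1)
    integrated out and a uniform prior on tau."""
    wins = np.asarray(wins)
    n = wins.size
    log_post = np.empty(n)
    for tau in range(1, n + 1):
        w1 = wins[:tau].sum()   # wins before the switchpoint
        w2 = wins[tau:].sum()   # wins from the switchpoint on
        l1 = tau - w1
        l2 = (n - tau) - w2
        # Marginal likelihood of each segment under a uniform prior on p
        log_post[tau - 1] = log_beta(w1 + 1, l1 + 1) + log_beta(w2 + 1, l2 + 1)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()


# Made-up, deliberately extreme season: 20 matches without a win, then 18 wins
wins = np.array([0] * 20 + [1] * 18)
posterior = tau_posterior(wins)
best_tau = int(posterior.argmax()) + 1
print('Most probable switchpoint: %d' % best_tau)
```

The indexing follows the same convention as `p_`: the first tau matches use p_1 and the rest use p_2. With real, noisier results the posterior over tau would be spread over many values instead of concentrating on one.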
Load Data
import pandas as pd
# parse_date and compute_extra_columns are helper functions not shown in the slides
team = 'Milan'
df = pd.read_csv('serie_a.csv', parse_dates=['Date'], date_parser=parse_date)
matches = df[(df.HomeTeam == team) | (df.AwayTeam == team)]
matches = matches.set_index(['Date'])
matches = compute_extra_columns(matches, team)
# some pandas manipulations occur here
matches['Win'] = …  # 1 if Milan won, 0 otherwise
Fit the Model
observed_matches = Bernoulli('obs', p=p_, value=matches[['Win']], observed=True)
model = Model([observed_matches, p_1, p_2, tau])
mcmc = MCMC(model)
mcmc.sample(40000, 10000)
p_1_samples = mcmc.trace('p_1')[:]
p_2_samples = mcmc.trace('p_2')[:]
tau_samples = mcmc.trace('tau')[:]
print p_1_samples[:10]
print p_2_samples[:10]
print tau_samples[:10]
[ 0.42067236 0.42067236 0.42067236 0.43900391 0.43900391 0.43900391
0.43900391 0.43900391 0.43900391 0.43900391]
[ 0.49213381 0.49213381 0.49213381 0.56072562 0.79863176 0.79863176
0.67416932 0.68382528 0.6069458 0.60062698]
[10 10 24 35 35 35 35 27 27 27]
plt.figure(figsize=(14.5, 10))
ax = plt.subplot(311)
ax.set_autoscaley_on(False)
plt.hist(p_1_samples, histtype='stepfilled', alpha=0.85, label='posterior of p_1', color='#A60628', normed=True, bins=30)
plt.legend(loc='upper left')
ax = plt.subplot(312)
plt.hist(p_2_samples, histtype='stepfilled', alpha=0.85, label='posterior of p_2', color='#7A68A6', normed=True, bins=30)
plt.legend(loc='upper left')
ax = plt.subplot(313)
plt.hist(tau_samples, histtype='stepfilled', alpha=0.85, label='posterior of tau', color='#467821', normed=True, bins=30)
plt.legend(loc='upper left')
plt.show()
Expected Win Probability
num_matches = 38
N = tau_samples.shape[0]
expected_p_per_match = np.zeros(num_matches)
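for match in range(num_matches):
    # posterior samples where this match falls before the switchpoint use p_1
    ix = match < tau_samples
    p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
    # median of the posterior samples for this match
    expected_p_per_match[match] = np.percentile(p_samples_match, 50)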
Compute Confidence Bounds
lower_p_per_match = np.zeros(num_matches)
upper_p_per_match = np.zeros(num_matches)
for match in range(num_matches):
    ix = match < tau_samples
    p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
    # central 90% interval of the posterior samples for this match
    lower_p_per_match[match] = np.percentile(p_samples_match, 5)
    upper_p_per_match[match] = np.percentile(p_samples_match, 95)
Bayesian inference returns a distribution. What have we gained? We see the uncertainty in our
estimates: the wider the distribution, the less certain our posterior belief.

A Secure and Reliable Document Management System is Essential.docx
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 

Bayesian A/B Testing with PyMC

  • 1. Applied Bayesian Inference with PyMC @MrSantoni
  • 2. Which color will sell more? [Mockup: Page A and Page B, two versions of the same "A Tea Pot" product page, each with placeholder text and a BUY button]
  • 3. [Mockup: Page A and Page B side by side, each labelled with its observed rate #buy / N]
  • 4. • What if N is small? • How large must N be for 90% confidence? • What if N is different on A and B?
  • 6. Probability: Frequentist = frequency; Bayesian = belief. Claim: we think Bayesian
  • 7. Claim: we think Bayesian. [Figure: confidence that the code has no bugs after test 1, test 2, test 3]
  • 8. Bayesian Inference = update your beliefs: prior belief + new evidence
  • 9. The Developer View of a Statistical Problem: def frequentist(): return 80% (a single number); def bayesian(): return a distribution over 0% to 100%
  • 11. How to? $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$ Closed-form solution: Toy Examples vs. Realistic Cases [scale from 0% to 100%]
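As a small worked instance of the formula on this slide, the sketch below updates a belief that some code has no bugs each time a test passes. The starting prior of 0.2 and the assumption that buggy code still passes a given test half of the time are illustrative values, not figures from the talk, and `update_belief` is a hypothetical helper name.

```python
def update_belief(prior, p_pass_if_no_bugs=1.0, p_pass_if_bugs=0.5):
    """Bayes' rule: P(no bugs | test passed) from P(no bugs) = prior."""
    # P(test passed), summed over both hypotheses (no bugs / bugs)
    evidence = p_pass_if_no_bugs * prior + p_pass_if_bugs * (1.0 - prior)
    return p_pass_if_no_bugs * prior / evidence


belief = 0.2  # assumed prior belief that the code has no bugs
for test_number in range(1, 4):
    belief = update_belief(belief)
    print("after test %d: P(no bugs) = %.4f" % (test_number, belief))
```

Each passing test raises the belief, but it never reaches exactly 1, which matches the idea of updating a prior with new evidence.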
  • 12. PyMC
  • 13. PyMC • Perform Bayesian Inference • Markov Chain Monte Carlo techniques • A.k.a. Probabilistic Programming
  • 14. Show me the code!
  • 16. Only one difference between A and B [Mockup: Page A and Page B product pages, each with a BUY button]
  • 17. [Mockup: Page A and Page B product pages, each with a BUY button]
  • 18. Assume there is: p_a, the probability of clicking BUY when landing on A; p_b, the probability of clicking BUY when landing on B. How to compute p_a and p_b?
  • 19. Page A: N_a visitors, C_a BUY clicks. Page B: N_b visitors, C_b BUY clicks.
  • 20. Frequentist: C_a / N_a BUT: Observed frequency does not necessarily equal p_a
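To illustrate why the observed frequency need not equal p_a, here is a minimal simulation using only the Python standard library. The true probability of 0.05 mirrors the value used later in the talk; the helper `observed_frequency`, the seed, and the sample sizes are assumptions chosen for the illustration.

```python
import random


def observed_frequency(p_true, n, rng):
    """Simulate n visitors who each click BUY with probability p_true."""
    clicks = sum(1 for _ in range(n) if rng.random() < p_true)
    return clicks / float(n)


rng = random.Random(0)  # fixed seed so the run is repeatable
p_true = 0.05           # assumed true click probability

for n in (50, 1500, 100000):
    freq = observed_frequency(p_true, n, rng)
    print("N = %6d  observed frequency = %.4f" % (n, freq))
```

With few visitors the observed frequency can sit far from the true value; it only settles close to it as N grows.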
  • 21. Bayesian: Infer true frequency from observed data
  • 22. [Mockup: Page A product page with a BUY button]
  • 23. Bayesian Workflow 1. Define prior 2. Fit to observations 3. Get posteriors
  • 24.
      from pymc import Uniform, rbernoulli, Bernoulli, MCMC
      from matplotlib import pyplot as plt

      p_A_true = 0.05
      N = 1500
      occurrences = rbernoulli(p_A_true, N)

      print 'Click-BUY:'
      print occurrences.sum()
      print 'Observed frequency:'
      print occurrences.sum() / float(N)

    Output:
      Click-BUY:
      68
      Observed frequency:
      0.0453333333333
  • 25. Clicking BUY: Bernoulli distribution. $P(\text{click}) = \begin{cases} p & \text{click} = 1 \\ 1 - p & \text{click} = 0 \end{cases}$ [Bar chart: probabilities of click=1 and click=0 for a given $p$]
  • 26. [Figure: uniform density of p_A on the interval 0 to 1]
      p_A = Uniform('p_A', lower=0, upper=1)

      print p_A.random()
      print p_A.value

    Output:
      array(0.906086144982998)
      array(0.906086144982998)

      print p_A.random()
      print p_A.value

    Output:
      array(0.285313846133313)
      array(0.285313846133313)
  • 27.
      p_A = Uniform('p_A', lower=0, upper=1)
      obs = Bernoulli('obs', p_A, value=occurrences, observed=True)
  • 28.
      p_A = Uniform('p_A', lower=0, upper=1)
      obs = Bernoulli('obs', p_A, value=occurrences, observed=True)

      mcmc = MCMC([p_A, obs])
      mcmc.sample(20000, 1000)
      print mcmc.trace('p_A')[:]

    Output:
      [-----------------100%-----------------] 20000 of 20000 complete in 3.1 sec
      [ 0.04656576 0.04656576 0.04656576 ..., 0.03803667 0.03803667 0.03803667]
  • 29.
      plt.figure(figsize=(8, 7))
      plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', normed=True)
      plt.xlabel('Probability of clicking BUY')
      plt.ylabel('Density')
      plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
      plt.legend()
      plt.savefig('p_A_hist_N_%s.png' % N)
      plt.show()
  • 30. 90% confidence that P is between X and Y?
      import numpy as np

      p_A_samples = mcmc.trace('p_A')[:]
      lower_bound = np.percentile(p_A_samples, 5)
      upper_bound = np.percentile(p_A_samples, 95)
      print 'There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound)

    Output:
      There is 90% probability that p_A is between 0.0373019596856 and 0.0548052806892
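For this particular model the interval can be cross-checked without MCMC. A Uniform(0, 1) prior combined with Bernoulli observations yields a Beta(1 + clicks, 1 + misses) posterior, so the sketch below samples that Beta distribution directly with the standard library. This is a swapped-in closed-form technique, not what PyMC does internally; the 68 clicks out of 1500 visitors come from slide 24, while the helper names, the seed, and the nearest-rank percentile are assumptions.

```python
import random


def beta_posterior_samples(clicks, visitors, n_samples, rng):
    """Draw from the posterior of p under a Uniform(0, 1) prior."""
    a = 1 + clicks             # prior pseudo-count plus observed clicks
    b = 1 + visitors - clicks  # prior pseudo-count plus observed non-clicks
    return [rng.betavariate(a, b) for _ in range(n_samples)]


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list (q in 0..100)."""
    idx = int(round(q / 100.0 * (len(sorted_values) - 1)))
    return sorted_values[idx]


rng = random.Random(42)  # fixed seed so the run is repeatable
samples = sorted(beta_posterior_samples(68, 1500, 20000, rng))
lower_bound = percentile(samples, 5)
upper_bound = percentile(samples, 95)
print("There is 90%% probability that p_A is between %.4f and %.4f"
      % (lower_bound, upper_bound))
```

The bounds should land close to the MCMC-based ones on the slide, since both approaches describe the same posterior.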
  • 31. What if N_a is lower?
  • 32.
      from pymc import Uniform, rbernoulli, Bernoulli, MCMC
      from matplotlib import pyplot as plt

      p_A_true = 0.05
      N = 50
      occurrences = rbernoulli(p_A_true, N)

      print 'Click-BUY:'
      print occurrences.sum()
      print 'Observed frequency:'
      print occurrences.sum() / float(N)

    Output:
      Click-BUY:
      2
      Observed frequency:
      0.04
  • 33.
      p_A = Uniform('p_A', lower=0, upper=1)
      obs = Bernoulli('obs', p_A, value=occurrences, observed=True)

      mcmc = MCMC([p_A, obs])
      mcmc.sample(20000, 1000)
      print mcmc.trace('p_A')[:]

    Output:
      [-----------------100%-----------------] 20000 of 20000 complete in 3.0 sec
      [ 0.06240723 0.06240723 0.06240723 ..., 0.01864419 0.01864419 0.01864419]
  • 34.
      plt.figure(figsize=(8, 7))
      plt.hist(mcmc.trace('p_A')[:], bins=35, histtype='stepfilled', normed=True)
      plt.xlabel('Probability of clicking BUY')
      plt.ylabel('Density')
      plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A')
      plt.legend()
      plt.savefig('p_A_hist_N_%s.png' % N)
      plt.show()
  • 35. 90% confidence that P is between X and Y?
      import numpy as np

      p_A_samples = mcmc.trace('p_A')[:]
      lower_bound = np.percentile(p_A_samples, 5)
      upper_bound = np.percentile(p_A_samples, 95)
      print 'There is 90%% probability that p_A is between %s and %s' % (lower_bound, upper_bound)

    Output:
      There is 90% probability that p_A is between 0.0160966147705 and 0.114655284797
  • 36. [Figure: posterior of p_A for N_a = 1500 vs. N_a = 50]
  • 37. Does the red one have a larger probability of being clicked? [Mockup: Page A and Page B product pages, each with a BUY button]
  • 38.
      from pymc import Uniform, rbernoulli, Bernoulli, MCMC, deterministic
      from matplotlib import pyplot as plt

      p_A_true = 0.05
      p_B_true = 0.04
      N_A = 1500
      N_B = 750
      occurrences_A = rbernoulli(p_A_true, N_A)
      occurrences_B = rbernoulli(p_B_true, N_B)

      print 'Observed frequency:'
      print 'A'
      print occurrences_A.sum() / float(N_A)
      print 'B'
      print occurrences_B.sum() / float(N_B)

    Output:
      Observed frequency:
      A
      0.0533333333333
      B
      0.0413333333333
  • 39.
      p_A = Uniform('p_A', lower=0, upper=1)
      p_B = Uniform('p_B', lower=0, upper=1)

      @deterministic
      def delta(p_A=p_A, p_B=p_B):
          return p_A - p_B

      obs_A = Bernoulli('obs_A', p_A, value=occurrences_A, observed=True)
      obs_B = Bernoulli('obs_B', p_B, value=occurrences_B, observed=True)

      mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
      mcmc.sample(25000, 5000)

    Output:
      [-----------------100%-----------------] 25000 of 25000 complete in 5.5 sec
  • 40.
      p_A_samples = mcmc.trace('p_A')[:]
      p_B_samples = mcmc.trace('p_B')[:]
      delta_samples = mcmc.trace('delta')[:]
  • 41.
      plt.subplot(3,1,1)
      plt.xlim(0, 0.1)
      plt.hist(p_A_samples, bins=35, histtype='stepfilled', normed=True,
               color='blue', label='Posterior of p_A')
      plt.vlines(p_A_true, 0, 90, linestyle='--', label='True p_A (unknown)')
      plt.xlabel('Probability of clicking BUY via A')
      plt.legend()

      plt.subplot(3,1,2)
      plt.xlim(0, 0.1)
      plt.hist(p_B_samples, bins=35, histtype='stepfilled', normed=True,
               color='green', label='Posterior of p_B')
      plt.vlines(p_B_true, 0, 90, linestyle='--', label='True p_B (unknown)')
      plt.xlabel('Probability of clicking BUY via B')
      plt.legend()

      plt.subplot(3,1,3)
      plt.xlim(0, 0.1)
      plt.hist(delta_samples, bins=35, histtype='stepfilled', normed=True,
               color='red', label='Posterior of delta')
      plt.vlines(p_A_true - p_B_true, 0, 90, linestyle='--', label='True delta (unknown)')
      plt.xlabel('p_A - p_B')
      plt.legend()

      plt.savefig('A_and_B.png')
      plt.show()
  • 42.
  • 43. How confident are we that p_A > p_B?
      print 'Probability that p_A > p_B:'
      print (delta_samples > 0).mean()

    Output:
      Probability that p_A > p_B:
      0.8919
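The same question can be approximated without PyMC by sampling each page's closed-form Beta posterior (Uniform prior plus Bernoulli data) and counting how often the sample for A exceeds the one for B. This is a sketch of an alternative technique, not the talk's method. The counts of 80 clicks in 1500 visits and 31 clicks in 750 visits are back-calculated from the observed frequencies on slide 38; `prob_a_beats_b`, the sample count, and the seed are assumptions.

```python
import random


def prob_a_beats_b(clicks_a, n_a, clicks_b, n_b, n_samples=50000, seed=0):
    """Monte Carlo estimate of P(p_A > p_B) under independent Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        # One posterior draw per page: Beta(1 + clicks, 1 + non-clicks)
        p_a = rng.betavariate(1 + clicks_a, 1 + n_a - clicks_a)
        p_b = rng.betavariate(1 + clicks_b, 1 + n_b - clicks_b)
        if p_a > p_b:
            wins += 1
    return wins / float(n_samples)


prob = prob_a_beats_b(80, 1500, 31, 750)
print("Probability that p_A > p_B: %.4f" % prob)
```

Because the two pages are modelled independently, unequal visitor counts need no special handling; the page with fewer visitors simply contributes a wider posterior.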
  • 44. [Figure: posteriors for N_A = 1500, N_B = 750 vs. N_A = 1500, N_B = 200]
  • 45.
      print 'Probability that p_A > p_B:'
      print (delta_samples > 0).mean()

    Output:
      Probability that p_A > p_B:
      0.73455
  • 46. MCMC
  • 47.
      mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
      mcmc.sample(25000, 5000)
    Returns the posterior P(p_A, p_B, delta | obs_A, obs_B) as samples: 25000 iterations, 5000 burn-in, Metropolis-Hastings algorithm
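To make the sampling step less opaque, here is a minimal random-walk Metropolis sampler for a single click probability, written with the standard library only. It is a teaching sketch under stated assumptions (Gaussian proposals with a fixed step of 0.01, a start value of 0.5, a Uniform(0, 1) prior), not PyMC's actual implementation; the data of 68 clicks in 1500 visits come from slide 24.

```python
import math
import random


def log_posterior(p, clicks, visitors):
    """Unnormalised log posterior: Uniform(0, 1) prior times Bernoulli likelihood."""
    if p <= 0.0 or p >= 1.0:
        return float("-inf")  # the prior has zero density outside (0, 1)
    return clicks * math.log(p) + (visitors - clicks) * math.log(1.0 - p)


def metropolis(clicks, visitors, iterations, burn, step=0.01, seed=0):
    """Return the post-burn-in trace of a random-walk Metropolis chain."""
    rng = random.Random(seed)
    p = 0.5
    current = log_posterior(p, clicks, visitors)
    trace = []
    for i in range(iterations):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, clicks, visitors)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn:  # discard the burn-in samples
            trace.append(p)
    return trace


trace = metropolis(clicks=68, visitors=1500, iterations=20000, burn=1000)
posterior_mean = sum(trace) / len(trace)
print("posterior mean of p_A: %.4f" % posterior_mean)
```

Rejected proposals repeat the current value in the trace, which is why the PyMC traces on the slides show runs of identical numbers.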
  • 48. Open the black box
      mcmc = MCMC([p_A, p_B, obs_A, obs_B, delta])
      mcmc.sample(25000, 5000)

      from pymc.Matplot import plot as mcplot
      mcplot(mcmc)
  • 49.
  • 50.
  • 51.
  • 52. PyMC • Easy to interpret results – confidence, no p-values! • No crazy math • Computationally expensive
  • 53.
  • 55. Back
  • 57.
      Date        HomeTeam    AwayTeam   FTHG  FTAG  FTR  HTHG  HTAG  HTR
      24/08/2013  Sampdoria   Juventus   0     1     A    0     0     D
      24/08/2013  Verona      Milan      2     1     H    1     1     D
      25/08/2013  Cagliari    Atalanta   2     1     H    1     1     D
      25/08/2013  Inter       Genoa      2     0     H    0     0     D
      25/08/2013  Lazio       Udinese    2     1     H    2     0     H
      25/08/2013  Livorno     Roma       0     2     A    0     0     D
      25/08/2013  Napoli      Bologna    3     0     H    2     0     H
      25/08/2013  Parma       Chievo     0     0     D    0     0     D
      25/08/2013  Torino      Sassuolo   2     0     H    1     0     H
      26/08/2013  Fiorentina  Catania    2     1     H    2     1     H
      31/08/2013  Chievo      Napoli     2     4     A    2     2     D
      31/08/2013  Juventus    Lazio      4     1     H    2     1     H
      01/09/2013  Atalanta    Torino     2     0     H    0     0     D
      01/09/2013  Bologna     Sampdoria  2     2     D    1     1     D
      01/09/2013  Catania     Inter      0     3     A    0     1     A
      01/09/2013  Genoa       Fiorentina 2     5     A    0     3     A
      01/09/2013  Milan       Cagliari   3     1     H    2     1     H
      01/09/2013  Roma        Verona     3     0     H    0     0     D
      01/09/2013  Sassuolo    Livorno    1     4     A    0     1     A
      01/09/2013  Udinese     Parma      3     1     H    1     0     H
      14/09/2013  Inter       Juventus   1     1     D    0     0     D
      14/09/2013  Napoli      Atalanta   2     0     H    0     0     D
      14/09/2013  Torino      Milan      2     2     D    0     0     D
      15/09/2013  Fiorentina  Cagliari   1     1     D    0     0     D
    Source: https://datahub.io/dataset/italian-football-data-serie-a-b
  • 59. Bayesian Workflow 1. Define Prior 2. Fit to observations 3. Get Posteriors
  • 60. Winning a Match: Bernoulli distribution. $P(w) = \begin{cases} p & w = 1 \\ 1 - p & w = 0 \end{cases}$ [Bar chart: probabilities of Win (w=1) and Lose (w=0) for a given $p$]
  • 62. Model the switchpoint: $p = \begin{cases} p_1 & t < \tau \\ p_2 & t \ge \tau \end{cases}$ Goal: infer $p_1$, $p_2$, $\tau$, $p$
  • 63. Bayesian Workflow 1. Define Prior 2. Fit to observations 3. Get Posteriors
  • 64. Let’s model this
      • Goal: infer the unknown p1, p2, TAU
      • First step of Bayesian inference: assign a prior probability to the different possible values of p
      • What would be a good prior for p1, p2? Use uniform:
          – p1 ~ Uniform(0,1)
          – p2 ~ Uniform(0,1)
          – TAU ~ DiscreteUniform(1, 38)
      • P(TAU=k) = 1/38 for all k
  • 65.
      import numpy as np
      from pymc import Uniform, DiscreteUniform, deterministic, Bernoulli, Model, MCMC

      p_1 = Uniform('p_1', lower=0, upper=1)
      p_2 = Uniform('p_2', lower=0, upper=1)
      tau = DiscreteUniform('tau', lower=1, upper=38)
      print 'Random output: ', tau.random(), tau.random(), tau.random()

    Output:
      Random output: 14 24 33

      @deterministic
      def p_(tau=tau, p_1=p_1, p_2=p_2, num_matches=38):
          # concatenate p_1 and p_2 based on tau
          out = np.empty(num_matches)
          out[:tau] = p_1
          out[tau:] = p_2
          return out
  • 66. Load Data
      import pandas as pd

      df = pd.read_csv('serie_a.csv', parse_dates=['Date'], date_parser=parse_date)
      matches = df[(df.HomeTeam == 'Milan') | (df.AwayTeam == 'Milan')]
      matches = matches.set_index(['Date'])
      matches = compute_extra_columns(matches, team)
      # some pandas manipulations occur here
      matches['Win'] = …  # 1 if Milan won, 0 otherwise
  • 67. Fit the Model
      observed_matches = Bernoulli('obs', p=p_, value=matches[['Win']], observed=True)
      model = Model([observed_matches, p_1, p_2, tau])
      mcmc = MCMC(model)
      mcmc.sample(40000, 10000)

      p_1_samples = mcmc.trace('p_1')[:]
      p_2_samples = mcmc.trace('p_2')[:]
      tau_samples = mcmc.trace('tau')[:]

      print p_1_samples[:10]
      print p_2_samples[:10]
      print tau_samples[:10]

    Output:
      [ 0.42067236 0.42067236 0.42067236 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391 0.43900391]
      [ 0.49213381 0.49213381 0.49213381 0.56072562 0.79863176 0.79863176 0.67416932 0.68382528 0.6069458 0.60062698]
      [10 10 24 35 35 35 35 27 27 27]
  • 68.
      plt.figure(figsize=(14.5, 10))

      ax = plt.subplot(311)
      ax.set_autoscaley_on(False)
      plt.hist(p_1_samples, histtype='stepfilled', alpha=0.85,
               label='posterior of p_1', color='#A60628', normed=True, bins=30)
      plt.legend(loc='upper left')

      ax = plt.subplot(312)
      plt.hist(p_2_samples, histtype='stepfilled', alpha=0.85,
               label='posterior of p_2', color='#7A68A6', normed=True, bins=30)
      plt.legend(loc='upper left')

      ax = plt.subplot(313)
      plt.hist(tau_samples, histtype='stepfilled', alpha=0.85,
               label='posterior of tau', color='#467821', normed=True, bins=30)
      plt.legend(loc='upper left')

      plt.show()
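For a switchpoint model this small, the posterior over tau can also be computed exactly as a cross-check on the MCMC result. With Uniform(0, 1) priors, integrating out p_1 and p_2 leaves a Beta function of the win and loss counts on each side of tau, so every candidate tau can be scored directly. The sketch below assumes the same slicing convention as the `p_` function (the first tau matches use p_1). The 38-match win sequence is invented for illustration and is not Milan's real record, and `tau_posterior` is a hypothetical helper.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def tau_posterior(wins):
    """Exact posterior P(tau = k | wins) for k = 1..len(wins)."""
    n = len(wins)
    log_weights = []
    for tau in range(1, n + 1):
        before, after = wins[:tau], wins[tau:]
        w1, w2 = sum(before), sum(after)
        # Marginal likelihood of each segment under a Uniform(0, 1) prior
        log_weights.append(log_beta(1 + w1, 1 + len(before) - w1)
                           + log_beta(1 + w2, 1 + len(after) - w2))
    top = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical season: a low win rate for 20 matches, then a higher one
wins = ([0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
        + [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1])
post = tau_posterior(wins)
best_tau = post.index(max(post)) + 1  # list index 0 corresponds to tau = 1
print("most probable tau: %d (probability %.3f)" % (best_tau, max(post)))
```

The uniform prior on tau is constant across candidates, so it cancels in the normalisation and does not appear in the code.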
  • 69.
  • 70. Expected Win Probability
      num_matches = 38
      N = tau_samples.shape[0]
      expected_p_per_match = np.zeros(num_matches)
      for match in range(num_matches):
          ix = match < tau_samples
          p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
          expected_p_per_match[match] = np.percentile(p_samples_match, 50)
  • 71.
  • 72. Compute Confidence Bounds
      lower_p_per_match = np.zeros(num_matches)
      upper_p_per_match = np.zeros(num_matches)
      for match in range(num_matches):
          ix = match < tau_samples
          p_samples_match = np.concatenate([p_1_samples[ix], p_2_samples[~ix]])
          lower_p_per_match[match] = np.percentile(p_samples_match, 5)
          upper_p_per_match[match] = np.percentile(p_samples_match, 95)
  • 73. Bayesian returns a distribution. What have we gained? We see uncertainty in our estimates. The wider the distribution, the less certain our posterior belief should be.

Editor's notes

  1. Imagine you are building an e-commerce website and have to choose a color.
  2. Set up an experiment.
  3. Interpretation of probability. Frequentist: probability is the long-run frequency of an event. This is difficult to apply in other scenarios, e.g. presidential elections (which happen only once). Bayesian: probability is a measure of belief or confidence in an event occurring. Assigning a belief of 0 to an event means certainty that it will NOT occur.
  4. You look for bugs in your code. You are starting to believe that there may be no bugs in this code. If you think this way, then congratulations: you are already thinking Bayesian!
  5. Bayesian inference is simply updating your beliefs after considering new evidence.
  6. A Python library for performing Bayesian analysis that is undaunted by the mathematical monster we have created. The code is not random; it is probabilistic in the sense that we create probability models using programming variables as the model’s components.
  7. We go through a simple example to understand some basic features of PyMC
  8. Only one difference between A and B: any change in dynamics can be attributed to that change
  9. No need to be same number on A or on B
  10. Observed frequency ≠ true frequency (probability). The two agree only for large numbers (law of large numbers).
  11. Only one difference between A and B: any change in dynamics can be attributed to that change
  12. Define a model (random variables) with prior probabilities, i.e. our prior belief. Fit it to the dataset. Compute the posterior probabilities.
  13. A random variable that takes the value 1 with success probability p and the value 0 with failure probability 1-p. What is the value of p?
  14. A random variable: its value is not determined.
  15. obs: the observations of clicking BUY. It is a random variable but, unlike p_A, we have observed its value. Setting the argument observed to True means the value should not be changed.
  16. Only one difference between A and B: any change in dynamics can be attributed to that change
  17. N_A > N_B, so the posterior of p_B is flatter. Most of the posterior of p_A - p_B is above 0, so we are confident that p_A > p_B.
  18. If this probability is too low, one can try to get more samples from B (to make it less flat).
  19. Fitting a model means characterizing its posterior distribution somehow. The MCMC sampler randomly updates the values of p_A, p_B, delta over a specified number of iterations (iter). The burn parameter specifies a sufficiently large number of iterations for the algorithm to converge.
  20. Recommended: a nice intro to Bayesian inference and probabilistic programming that assumes NO prior knowledge of Bayesian inference and probability. HOW TO: probability applied to real examples.
  21. Was there a change in the win rate?
  22. Define a model (random variables) with prior probabilities, i.e. our prior belief. Fit it to the dataset. Compute the posterior probabilities.
  23. A random variable that takes the value 1 with success probability p and the value 0 with failure probability 1-p. What is the value of p?
  24. What is the value of p? It seems to increase at some point during the observations.
  25. Let’s assume that on some day TAU during the observation period the parameter p suddenly jumps to a higher value. So, we really have two p parameters: one for the period before TAU, and one for the rest of the observation period
  26. Define a model (random variables) with prior probabilities, i.e. our prior belief. Fit it to the dataset. Compute the posterior probabilities.