We use Iterative Supervised Clustering as a simple building block for exploring Pinterest's content. But simplicity can unlock great power, and with this building block we show the shocking result of how hard it is to replicate data science conclusions. This leads us to ask: When is Data Science a House of Cards?
10. Tool: Pros / Cons
Cluster algorithms (SVM, K-Means, Spectral)
Pros:
• Considers all users
• Accurate
Cons:
• Tough to communicate
• Definitions change over time
User experience studies
Pros:
• Deep knowledge
• Captures the immeasurable
Cons:
• Costly
• Considers few users
Domain expert hypothesis
Pros:
• Human interpretable
Cons:
• Inaccurate
27. Iteration2
42% of domains left
[Figure: three cluster panels (Cluster 1, Cluster 2, Cluster 3), each showing Pin creates and Repins. Levels (Pin creates / Repins), in extraction order: Few / Many, Few / Some, Few / Many.]
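The shrinking "domains left" percentage suggests a peel-off loop: a rule claims some of the remaining domains, those domains are set aside, and the next iteration works on what is left. The slides do not show that loop, so the following is only a minimal sketch under that assumption. `peel_off`, the `Domain` record, and the two toy rules with their thresholds are hypothetical, and the clustering step that inspires each rule is left out.

```python
from collections import namedtuple

# Hypothetical record; only the attribute names n_pin_creates and n_repins
# come from the slides.
Domain = namedtuple("Domain", ["name", "n_pin_creates", "n_repins"])


def peel_off(domains, rules):
    """Apply (label, rule) pairs in order; each rule claims domains from what is left."""
    remaining = list(domains)
    total = len(remaining)
    labeled = {}
    for label, rule in rules:
        labeled[label] = [d for d in remaining if rule(d)]
        remaining = [d for d in remaining if not rule(d)]
        pct_left = 100.0 * len(remaining) / total if total else 0.0
        print(f"{label}: {pct_left:.0f}% of domains left")
    return labeled, remaining


# Toy rules with made-up thresholds, for illustration only.
rules = [
    ("no activity", lambda d: d.n_pin_creates == 0 and d.n_repins == 0),
    ("few creates, many repins", lambda d: d.n_pin_creates <= 5 and d.n_repins >= 100),
]

domains = [
    Domain("a.example", 0, 0),
    Domain("b.example", 3, 250),
    Domain("c.example", 40, 60),
    Domain("d.example", 0, 0),
]

labeled, remaining = peel_off(domains, rules)
```

Because each rule only sees the domains that earlier rules did not claim, the order of the rules matters in this sketch.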
28. Iteration2
Pinterest specials
Description
Domains with few Pins, but these Pins thrive in the Pinterest ecosystem
Calculation
    def detect_pinterest_specials(domain_engagement):
        # Repins per Pin create; max(1.0, ...) avoids division by zero.
        ratio = domain_engagement.n_repins / max(
            1.0, float(domain_engagement.n_pin_creates))
        # X and Y are thresholds that the slide does not specify.
        return domain_engagement.n_pin_creates <= X and ratio >= Y
Examples
Fashion and impulse sites
[Figure: Pinterest specials. Pin creates: Few; Repins: Many.]
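To make the Pinterest specials rule concrete, here is a small self-contained sketch. The thresholds X and Y are not given on the slide, so `max_pin_creates=10` and `min_repin_ratio=20.0` are placeholder values, and `DomainEngagement` is a hypothetical stand-in for whatever object carries `n_pin_creates` and `n_repins`.

```python
from collections import namedtuple

# Hypothetical container; only the two attribute names come from the slide.
DomainEngagement = namedtuple("DomainEngagement", ["n_pin_creates", "n_repins"])


def detect_pinterest_specials(domain_engagement,
                              max_pin_creates=10,     # stands in for X (assumed value)
                              min_repin_ratio=20.0):  # stands in for Y (assumed value)
    # Repins per Pin create; max(1.0, ...) avoids division by zero.
    ratio = domain_engagement.n_repins / max(
        1.0, float(domain_engagement.n_pin_creates))
    return (domain_engagement.n_pin_creates <= max_pin_creates
            and ratio >= min_repin_ratio)


boutique = DomainEngagement(n_pin_creates=4, n_repins=400)       # few creates, ratio 100
publisher = DomainEngagement(n_pin_creates=500, n_repins=20000)  # ratio 40, too many creates
inactive = DomainEngagement(n_pin_creates=0, n_repins=0)         # ratio 0
```

With these placeholder thresholds only `boutique` qualifies: `publisher` has a high ratio but too many Pin creates, and `inactive` has no repins at all.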
29. Iteration3
33% of domains left
[Figure: three cluster panels (Cluster 1, Cluster 2, Cluster 3), each showing Pin creates and Repins. Levels (Pin creates / Repins), in extraction order: Few / Few, Few / Some, Few / Many.]
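30. Iteration3
Steady growth
Description
Active Pin creates and steady growth throughout the year
Calculation
    import numpy as np

    def detect_steady_growth(domain_engagement):
        # Slope of a least-squares line through the monthly repin counts.
        (growth_rate, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins)),
            domain_engagement.monthly_repins, 1)
        # Assumption: months_pins_created is not defined on the slide; taken here
        # as the number of months with at least one Pin create.
        months_pins_created = sum(n > 0 for n in domain_engagement.monthly_pin_creates)
        # X and Y are thresholds that the slide does not specify.
        return months_pins_created >= X and growth_rate >= Y
Examples
Recipe and DIY sites
[Figure: Steady growth. Pin creates: Some; Repins: Many.]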
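The growth rules rest on one idea: fit a straight line to a monthly series with `np.polyfit(..., 1)` and read off the slope. The sketch below shows that step in isolation on made-up monthly counts; `monthly_growth_rate` is a hypothetical helper name, not something from the slides.

```python
import numpy as np


def monthly_growth_rate(series):
    """Slope of the least-squares line through a monthly series (units per month)."""
    slope, _intercept = np.polyfit(np.arange(len(series)), series, 1)
    return float(slope)


# Made-up series for illustration only.
rising = [100, 120, 140, 160, 180, 200]   # gains 20 repins every month
flat = [150, 150, 150, 150, 150, 150]     # no trend
fading = [200, 170, 150, 110, 90, 60]     # shrinking month over month

rising_rate = monthly_growth_rate(rising)
flat_rate = monthly_growth_rate(flat)
fading_rate = monthly_growth_rate(fading)
```

A positive slope above some threshold is what the steady growth rule looks for, while a negative slope on both repins and Pin creates is the signature a churning-style rule would test.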
31. Iteration4
25% of domains left
[Figure: three cluster panels (Cluster 1, Cluster 2, Cluster 3), each showing Pin creates and Repins. Levels (Pin creates / Repins), in extraction order: Few / Some, Many / Some, Few / Some.]
32. Iteration4
Slow growth
Description
Similar to steady growth, but not as fast
Calculation
    import numpy as np

    def detect_steady_growth(domain_engagement):
        # Slope of a least-squares line through the monthly repin counts.
        (growth_rate, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins)),
            domain_engagement.monthly_repins, 1)
        # months_pins_created, X and Y are not defined on the slide.
        return months_pins_created >= X and growth_rate >= Y
Examples
Slightly lower-quality recipe and DIY sites
[Figure: Slow growth. Pin creates: Few; Repins: Many.]
33. Iteration5
Churning
Description
Slowly fades through the year
Calculation
    import numpy as np

    def detect_churning(domain_engagement):
        # Linear trend of monthly repins, skipping the first two months.
        (repin_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_repins[2:],
            1)
        # Linear trend of monthly Pin creates over the same months.
        (pin_create_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_pin_creates[2:],
            1)
        # Churning: both series trend downward.
        return repin_growth < 0 and pin_create_growth < 0
Examples
Fashion sale and click bait sites
[Figure: Churning. Pin creates: Few; Repins: Many.]
34. Iteration6
Yearly
Description
Slowly fade through the year
Calculation
    def detect_churning(domain_engagement):
        (repin_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_repins[2:],
            1)
        (pin_create_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2),
            domain_engagement.monthly_pin_creates[2:],
            1)
        return repin_growth < 0 and pin_create_growth < 0
Examples
Seasonal fashion, such as snow boots
[Figure: Yearly. Pin creates: Few; Repins: Many.]
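35. Iteration7
Late bloomer
Description
Peaks mid-year
Calculation
    import numpy as np

    def detect_late_bloomer(domain_engagement):
        # Combined monthly activity (repins + Pin creates), skipping the first two months.
        combined = [r + p for (r, p) in zip(domain_engagement.monthly_repins[2:],
                                            domain_engagement.monthly_pin_creates[2:])]
        # Fit a parabola; the leading coefficient gives the concavity.
        (concavity, pin_growth, intercept) = np.polyfit(
            range(len(domain_engagement.monthly_repins) - 2), combined, 2)
        # Negative concavity: the fitted curve opens downward.
        return concavity < 0
Examples
Blogs that get off to a slow start
[Figure: Pinterest late bloomer. Pin creates: Few; Repins: Many.]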
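The late bloomer rule relies on the sign of the leading coefficient of a degree-2 `np.polyfit`. A short sketch on made-up series shows how that sign separates a hump-shaped year from an accelerating one; `concavity` is a hypothetical helper name used only for this illustration.

```python
import numpy as np


def concavity(series):
    """Leading coefficient of the least-squares parabola through a series."""
    a, _b, _c = np.polyfit(np.arange(len(series)), series, 2)
    return float(a)


# Made-up monthly activity for illustration only.
mid_year_peak = [10, 30, 60, 90, 100, 90, 60, 30, 10, 5]  # rises, then falls
accelerating = [t * t for t in range(10)]                  # exactly t**2, opens upward

peak_concavity = concavity(mid_year_peak)
accelerating_concavity = concavity(accelerating)
```

A negative coefficient only says that the fitted curve bends downward. It does not by itself locate the peak in the middle of the year, so a stricter rule might also check where the fitted maximum falls.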
48. Baseline clusters
Existing clusters as our baseline
Columns: Results e, Results l, Results d, Results m, Results z, Results b, Results k
Dark content
Pinterest specials
Steady growth
Slow growth
Churning
Yearly
Late bloomer
49. Baseline clusters
90% matches
Columns: Results e, Results l, Results d, Results m, Results z, Results b, Results k
Dark content: Unpopular (95%), Trailing (90%)
Pinterest specials: Trailing (100%), Viral on Pinterest (98%), Pin creates drop off (97%)
Steady growth: Increasing repins (94%), Continuous growth (94%)
Slow growth: (none)
Churning: (none)
Yearly: (none)
Late bloomer: (none)
50. Baseline clusters
50% matches
Columns: Results e, Results l, Results d, Results m, Results z, Results b, Results k
Dark content: Unpopular (95%), Trailing (90%), Original Pinny (84%)
Pinterest specials: Trailing (100%), Minimal original Pins (66%), Viral on Pinterest (98%), Pin creates drop off (97%)
Steady growth: Pinterest viral content (62%), Other (53%), Original Pinny (51%), Viral on the internet (69%), Increasing repins (94%), Continuous growth (94%), Suspected Save button high Pin creates (73%)
Slow growth: Pinterest viral content (55%), Original Pinny (82%), Viral on the internet (65%), Increasing repins (65%), Continuous growth (86%), Suspected Save button high Pin creates (51%)
Churning: Original Pinny (68%), Viral on the internet (53%)
Yearly: Original Pinny (71%)
Late bloomer: Original Pinny (71%), Continuous growth (55%), Suspected Save button high Pin creates (59%)
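The slides do not define how the match percentages are computed. One plausible reading, offered purely as an assumption, is the share of a baseline cluster's domains that also fall inside another analyst's cluster. The sketch below computes that share; `match_percentage` and the domain names are hypothetical.

```python
def match_percentage(baseline_cluster, result_cluster):
    """Percentage of the baseline cluster's domains that also appear in result_cluster.

    Assumed definition; the slides do not state how matches are scored.
    """
    baseline_cluster = set(baseline_cluster)
    if not baseline_cluster:
        return 0.0
    overlap = baseline_cluster & set(result_cluster)
    return 100.0 * len(overlap) / len(baseline_cluster)


# Made-up cluster memberships for illustration only.
baseline_steady_growth = {"a.example", "b.example", "c.example", "d.example"}
analyst_increasing_repins = {"a.example", "b.example", "c.example", "z.example"}

score = match_percentage(baseline_steady_growth, analyst_increasing_repins)
is_90_match = score >= 90.0
is_50_match = score >= 50.0
```

Under this reading, three of the four baseline domains reappear in the analyst's cluster, so the pair would count as a match at the 50% level but not at the 90% level.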
54. Baseline clusters
Conceptually similar clusters, but not related in implementation
Columns: Results e, Results l, Results d, Results m, Results z, Results b, Results k
Yearly: Seasonal, Throwback, Seasonal, Annual
Steady growth: Gaining popularity, Increasing repins, Continuous growth, High engagement
Pinterest specials: Initial flurry, Minimal original Pins, Viral on Pinterest, Pin create drop off, Unpopular domains with good content