Using machine learning to determine drivers of bounce and conversion
Velocity 2016 Santa Clara
Pat Meenan
@patmeenan
Tammy Everts
@tameverts
What we did
(and why we did it)
Get the code
https://github.com/WPO-Foundation/beacon-ml
Deep learning
weights
Random forest
Lots of random decision trees
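The "lots of random decision trees" idea can be sketched in plain Python. This is a toy illustration of bagging plus majority voting, not the talk's actual model (which uses scikit-learn's RandomForestClassifier, shown later); each "tree" here is just a one-feature threshold stump fit on a bootstrap sample.

```python
import random

def train_stump(X, y):
    """Fit a tiny 'tree' (a one-feature threshold) on a bootstrap sample."""
    idx = [random.randrange(len(X)) for _ in X]      # sample rows with replacement
    f = random.randrange(len(X[0]))                  # pick a random feature
    t = sum(X[i][f] for i in idx) / len(idx)         # threshold at the sample mean
    maj = lambda ys: max(set(ys), key=ys.count) if ys else 0
    above = maj([y[i] for i in idx if X[i][f] > t])  # majority label above threshold
    below = maj([y[i] for i in idx if X[i][f] <= t])
    return lambda row, f=f, t=t, a=above, b=below: a if row[f] > t else b

def forest_predict(stumps, row):
    """Majority vote across all the trees."""
    votes = [s(row) for s in stumps]
    return max(set(votes), key=votes.count)

random.seed(0)
# Toy data: the label depends only on feature 0; feature 1 is pure noise.
X = [[v, random.random()] for v in [0.1, 0.2, 0.3, 0.7, 0.8, 0.9] * 10]
y = [1 if row[0] > 0.5 else 0 for row in X]
stumps = [train_stump(X, y) for _ in range(51)]
acc = sum(forest_predict(stumps, r) == l for r, l in zip(X, y)) / len(X)
print(acc)  # high training accuracy even though each stump is weak
```

Individually the stumps are weak and noisy; averaged over many bootstrap samples and random feature choices, the vote is much stronger, which is the property the talk relies on.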
Vectorizing the data
• Everything needs to be numeric
• Strings converted to several inputs as yes/no (1/0)
• e.g. device manufacturer
• “Apple” would be a discrete input
• Watch out for input explosion (UA string)
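The string-to-1/0 expansion above (one-hot encoding) can be sketched in a few lines of plain Python; the category values are illustrative, not the beacon's real columns:

```python
def one_hot(values):
    """Expand one string column into several 1/0 columns (one per category)."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

devices = ["Apple", "Samsung", "Apple", "LG"]
cols, rows = one_hot(devices)
print(cols)   # ['Apple', 'LG', 'Samsung']
print(rows)   # [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

This is also where the input-explosion warning bites: a high-cardinality column like the raw UA string would expand into thousands of mostly-zero inputs.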
Balancing the data
• 3% conversion rate
• 97% accurate by always guessing no
• Subsample the data for 50/50 mix
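The 50/50 subsampling step can be sketched as follows (a minimal version: keep every row of the rare class and an equal-sized random sample of the common class):

```python
import random

def balance(rows, labels):
    """Subsample so both classes are equally represented."""
    pos = [i for i, l in enumerate(labels) if l == 1]
    neg = [i for i, l in enumerate(labels) if l == 0]
    n = min(len(pos), len(neg))
    random.seed(1)
    keep = random.sample(pos, n) + random.sample(neg, n)
    random.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]

# ~3% positives, like the conversion rate in the talk
labels = [1] * 3 + [0] * 97
rows = [[i] for i in range(100)]
bx, by = balance(rows, labels)
print(len(by), sum(by))  # 6 3 -> a 50/50 mix
```

Without this step a model that always predicts "no conversion" scores 97% accuracy while learning nothing.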
Validation data
• Train on 80% of the data
• Validate on 20% to prevent overfitting
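A minimal sketch of the 80/20 hold-out split (scikit-learn's `train_test_split` does the same job; this just makes the mechanics explicit):

```python
import random

def split(rows, labels, train_frac=0.8):
    """Shuffle, then hold out the last 20% of rows for validation."""
    idx = list(range(len(rows)))
    random.seed(42)
    random.shuffle(idx)
    cut = int(len(idx) * train_frac)
    tr, va = idx[:cut], idx[cut:]
    return ([rows[i] for i in tr], [labels[i] for i in tr],
            [rows[i] for i in va], [labels[i] for i in va])

rows = [[i] for i in range(1000)]
labels = [i % 2 for i in range(1000)]
x_train, y_train, x_val, y_val = split(rows, labels)
print(len(x_train), len(x_val))  # 800 200
```

The validation 20% is never shown to the optimizer, so a gap between training and validation accuracy flags overfitting.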
Smoothing the data
ML works best on normally distributed data
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_val = scaler.transform(x_val)
Input/output relationships
• SSL highly correlated with conversions
• Long sessions highly correlated with not bouncing
• Remove correlated features from training
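Removing give-away features can be sketched with NumPy: drop any column whose correlation with the target is suspiciously high. The feature names and threshold here are illustrative assumptions, not the talk's actual values:

```python
import numpy as np

def drop_correlated(X, names, target, threshold=0.9):
    """Drop features whose |correlation| with the target exceeds threshold."""
    keep = [j for j in range(X.shape[1])
            if abs(np.corrcoef(X[:, j], target)[0, 1]) < threshold]
    return X[:, keep], [names[j] for j in keep]

rng = np.random.default_rng(0)
target = rng.integers(0, 2, 200).astype(float)
X = np.column_stack([
    target + rng.normal(0, 0.05, 200),   # nearly the label itself (leaky)
    rng.normal(0, 1, 200),               # an uninformative feature
])
X2, kept = drop_correlated(X, ["leaky", "noise"], target)
print(kept)  # ['noise'] -- the leaky column is removed
```

Leaving such features in lets the model "predict" the outcome from something that is effectively the outcome, drowning out the performance signals the talk is after.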
Training deep learning
model = Sequential()
model.add(...)
model.compile(optimizer='adagrad',
              loss='binary_crossentropy',
              metrics=["accuracy"])
model.fit(x_train, y_train,
          nb_epoch=EPOCH_COUNT,
          batch_size=32,
          validation_data=(x_val, y_val),
          verbose=2,
          shuffle=True)
Training random forest
clf = RandomForestClassifier(n_estimators=FOREST_SIZE,
                             criterion='gini',
                             max_depth=None,
                             min_samples_split=2,
                             min_samples_leaf=1,
                             min_weight_fraction_leaf=0.0,
                             max_features='auto',
                             max_leaf_nodes=None,
                             bootstrap=True,
                             oob_score=False,
                             n_jobs=12,
                             random_state=None,
                             verbose=2,
                             warm_start=False,
                             class_weight=None)
clf.fit(x_train, y_train)
Feature importances
clf.feature_importances_
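`clf.feature_importances_` is one weight per input column; pairing those weights with the column names and sorting gives the rankings the findings below are built on. The names and numbers here are stand-ins for illustration, not the study's actual values:

```python
# Stand-in for clf.feature_importances_ (hypothetical values).
importances = [0.02, 0.31, 0.11, 0.56]
names = ["dns_lookup", "dom_ready", "num_scripts", "timers_loaded"]

ranked = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
for name, w in ranked:
    print(f"{name}: {w:.2f}")
```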
What we learned
What’s in our beacon?
• Top-level – domain, timestamp, SSL
• Session – start time, length (in pages), total load time
• User agent – browser, OS, mobile ISP
• Geo – country, city, organization, ISP, network speed
• Bandwidth
• Timers – base, custom, user-defined
• Custom metrics
• HTTP headers
• Etc.
Conversion rate
Conversion rate
Bounce rate
Bounce rate
Finding 1
Number of scripts was a predictor… but not in the way we expected
Number of scripts per page (median)
Finding 2
When entire sessions were more complex, they converted less
Finding 3
Sessions that converted had 38% fewer images than sessions that didn’t
Number of images per page (median)
Finding 4
DOM ready was the greatest indicator of bounce rate
DOM ready (median)
Finding 5
Full load time was the second greatest indicator of bounce rate
timers_loaded (median)
Finding 6
Mobile-related measurements weren’t meaningful predictors of conversions
Conversions
Finding 7
Some conventional metrics were (almost) meaningless, too
Feature importance (rank out of 93)
• DNS lookup: 79
• Start render: 69
Takeaways
1. YMMV
2. Do this with your own data
3. Gather your RUM data
4. Run the machine learning against it
Thanks!
Recently, Google partnered with SOASTA to train a machine-learning model on a large sample of real-world performance, conversion, and bounce data. In this talk at Velocity Santa Clara, Pat Meenan of Google and Tammy Everts of SOASTA offer an overview of the resulting model—able to predict the impact of performance work and other site metrics on conversion and bounce rates.
