Building an experimentation framework for web apps



                                  Zhi-Da Zhong
                                  zz@etsy.com



Tuesday, July 26, 2011
About the talk
                               Why
                               What
                            Framework
                            Break / hack
                            Tech Details
                             Test design
                              Analysis


Why?



Questions


                         “What will happen if I do X”?

                             “Is X better than Y?”




The future
                                  &
                         alternate universes
                              (We’re bad at those.)




Then what?



Experiments



                               Try it out.
                         Data beats speculation.




Experiments



                         Try different alternatives
                           on different people.




Which is better?



                                 vs.




Not a great experiment




Web apps



Front end experiments


                         • Layout, colors, images, copy, ...
                         • No functional changes
                         • Impact can be surprisingly high




A little more complex...



                         • Multipage flows
                         • Functionality changes




Backend experiments



                         • Why not?
                         • Algorithms, architectures, batch processes, ...




The Etsy search backend

                         • New algorithm
                         • New RPC protocol
                         • New result data structure
                         • New Solr trunk snapshot

                         [Diagram: the web app's search() dispatches to searchA() or searchB(), which query search cluster A and search cluster B respectively]




DB re-architecture



                         • Postgres => Sharded MySQL
                         • Multiple experiments




Whole new features

                               New pages
                                   +
                              New DB tables
                                   +
                              New batch jobs
                                   +
                                   ...


Not just 2 variants



                         • A/B/C... tests
                         • Multi-variate tests




Caveats


                         • Content not under your control
                         • Price tests?
                         • Hard-to-measure/quantify things
                         • Long term impact?




Other tests



                         • Internal users testing
                         • Whitelisted user testing




Opt-in experiments




Complementary techniques


                         • Observed/recorded testing
                            - show different people the same thing

                         • Side-by-side testing
                            - show each person 2 alternatives




Side by side testing




How



A common approach


                         • JS-based
                         • Non-techie UI
                         • “No IT!”
                         • “Designed For Marketers, By Marketers”




Our approach


                         • The developer is the user
                         • Code as configuration
                         • An integral part of the dev process




Developer as the user



                         • The builder of the feature writes the test
                         • Not just a marketing tool




Code as config

                         • Simplicity
                         • Expressivity
                         • Quality
                         • Version => complete system state
                          •   Revision history




Part of the dev process



                            Every change is an experiment!




What does it look like?



Default => Experiment => (new) Default




To add a new feature...

                         + $config['new_search'] = array(
                         +    'enabled' => 'off'
                         + );


                             function search() {
                         +     if ($cfg->isEnabled('new_search')) {
                         +       return do_new_search();
                         +     }

                                 // existing stuff
                             }




Deploy that



Now we go crazy...


                         function do_new_search() {
                           // exciting new stuff
                           // that might or might not work
                           // but we can deploy it anyway
                           // since it’s flagged off
                         }




Internal user testing

                             $config['new_search'] = array(
                         +      'enabled' => 'rampup',
                         +      'rampup' => array(
                         +        'admin' => true
                                )
                             );




Whitelists

                             $config['new_search'] = array(
                                'enabled' => 'rampup',
                                'rampup' => array(
                         +        'whitelist' => array('zhida'),
                                  'admin' => true
                                )
                             );




Opt-in experiments

                             $config['new_search'] = array(
                                'enabled' => 'rampup',
                                'rampup' => array(
                         +        'group' => 12345,
                                  'admin' => true
                                )
                             );




A/B

                             $config['new_search'] = array(
                                'enabled' => 'rampup',
                                'rampup' => array(
                         +        'percent' => 1.5,
                                  'admin' => true
                                )
                             );




If it works...


                             $config['new_search'] = array(
                         +      'enabled' => 'on'
                             );




Order matters



                         Whitelist / Blacklist > Internal > Opt-in > Random
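
The precedence on this slide could be enforced by a flag check along the following lines. This is a minimal sketch, not the talk's actual implementation: `isEnabled()` is written as a free function that takes the config and a user array, and `bucket()`, the `blacklist` key, and the user fields (`name`, `is_admin`, `groups`) are assumptions made for illustration.

```php
<?php
// Hypothetical sketch of evaluating a flag config shaped like the slides' examples.

// Map (flag, user) to a stable number in [0, 100). Assumed helper, not from the talk.
function bucket(string $flag, string $userId): float {
    $h = hexdec(substr(md5($flag . ':' . $userId), 0, 8)); // first 32 bits of the digest
    return $h / 0x100000000 * 100;
}

function isEnabled(array $config, string $flag, array $user): bool {
    $c = $config[$flag] ?? ['enabled' => 'off'];
    if ($c['enabled'] === 'on') {
        return true;
    }
    if ($c['enabled'] !== 'rampup') {
        return false;
    }
    $r = $c['rampup'] ?? [];

    // 1. Whitelist / blacklist take precedence over everything else.
    if (in_array($user['name'], $r['blacklist'] ?? [], true)) {
        return false;
    }
    if (in_array($user['name'], $r['whitelist'] ?? [], true)) {
        return true;
    }
    // 2. Internal users.
    if (!empty($r['admin']) && !empty($user['is_admin'])) {
        return true;
    }
    // 3. Opt-in group membership.
    if (isset($r['group']) && in_array($r['group'], $user['groups'] ?? [], true)) {
        return true;
    }
    // 4. Random percentage, stable per user.
    return isset($r['percent']) && bucket($flag, $user['name']) < $r['percent'];
}

$config = [
    'new_search' => [
        'enabled' => 'rampup',
        'rampup'  => ['whitelist' => ['zhida'], 'admin' => true, 'percent' => 1.5],
    ],
];
```

Because each rule returns as soon as it has an answer, a blacklisted staff member stays out and a whitelisted user gets in regardless of the percentage.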




The framework



As easy as...


                         1. Pick a variant
                         2. Do what it says
                         3. Log the event
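
The three steps above can be sketched in a few lines. Everything named here (`$test`, `pickVariant()`, `runSearch()`, the `algorithm` property) is hypothetical and chosen for illustration; it is not the API of the library referenced later in the talk.

```php
<?php
// Hypothetical test definition: variant name => key-value properties.
$test = [
    'key'      => 'search_test',
    'variants' => [
        'control'   => ['algorithm' => 'old'],
        'treatment' => ['algorithm' => 'new'],
    ],
];

// Stable pick: the same subject always lands in the same variant of this test.
function pickVariant(array $test, string $subjectId): string {
    $names = array_keys($test['variants']);
    $h = hexdec(substr(md5($test['key'] . ':' . $subjectId), 0, 8));
    return $names[$h % count($names)];
}

function runSearch(array $test, string $subjectId, callable $log): string {
    $name  = pickVariant($test, $subjectId);      // 1. Pick a variant
    $props = $test['variants'][$name];
    $result = $props['algorithm'] === 'new'       // 2. Do what it says
        ? 'new results'
        : 'old results';
    $log($test['key'], $name, $subjectId);        // 3. Log the event
    return $result;
}

// A trivial in-memory logger standing in for a real event sink.
$events = [];
$log = function (string $testKey, string $variantKey, string $subjectKey) use (&$events): void {
    $events[] = "$testKey.$variantKey\t$subjectKey";
};

echo runSearch($test, 'user42', $log), "\n";
```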




What's in a test?



Variants


                         • Key-value pairs
                          •   interpreted by the app

                         • Name
                          •   mostly for logging




SubjectIdProvider

                         function getID()

                         • Why?
                          •   hashing and other selectors
                          •   logging
                         • Types of subjects
                          •   Users...but not always
                          •   Different groups of users - sellers vs buyers, etc.
                          •   Different ways to identify them - signed in vs signed out
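
One way to read this slide is as a one-method interface with a separate implementation per kind of subject. The sketch below assumes that reading; the class names, the `u:` / `v:` prefixes, and the `uaid` cookie name are invented for illustration.

```php
<?php
// Only getID() comes from the slide; everything else is an assumption.
interface SubjectIdProvider {
    public function getID(): string;
}

// Signed-in users: the account ID is stable across browsers and devices.
class UserIdProvider implements SubjectIdProvider {
    private $userId;

    public function __construct(int $userId) {
        $this->userId = $userId;
    }

    public function getID(): string {
        return 'u:' . $this->userId;
    }
}

// Signed-out visitors: fall back to a browser-scoped cookie value.
class VisitorIdProvider implements SubjectIdProvider {
    private $cookies;

    public function __construct(array $cookies) {
        $this->cookies = $cookies;
    }

    public function getID(): string {
        // Visitors without the cookie all share one ID here; a real system
        // would presumably issue the cookie before reaching this point.
        return 'v:' . ($this->cookies['uaid'] ?? 'unknown');
    }
}
```

Prefixing the ID with its type keeps a signed-in user 42 and a visitor whose cookie happens to read "42" from colliding in hashing and in logs.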



Selectors



                         function select($subjectID) => Variant Name




Combining multiple selectors

                         • OR
                          •   breaks blacklists

                         • AND
                          •   breaks whitelists

                         • Sequence
                          •   works!




Selector sequence



                         • Defines an ordering
                         • Returns A/B/C/... or <don't care>
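
A selector sequence can be sketched as a loop that asks each selector in turn and stops at the first one with an opinion. This is an illustrative reading of the slide: selectors are modeled as plain callables, `null` stands in for `<don't care>`, and `listSelector()` and `selectInSequence()` are hypothetical names.

```php
<?php
// A selector is any callable: subject ID => variant name, or null for <don't care>.

// Returns $variant for listed subjects and has no opinion about anyone else.
function listSelector(array $ids, string $variant): callable {
    return function (string $id) use ($ids, $variant): ?string {
        return in_array($id, $ids, true) ? $variant : null;
    };
}

function selectInSequence(array $selectors, string $subjectId, string $default): string {
    foreach ($selectors as $select) {
        $variant = $select($subjectId);
        if ($variant !== null) {
            return $variant; // the first selector with an opinion wins
        }
    }
    return $default;
}

$selectors = [
    listSelector(['mallory'], 'control'),            // blacklist: pinned to control
    listSelector(['zhida', 'mallory'], 'treatment'), // whitelist
    function (string $id): ?string {                 // placeholder for a random selector
        return null;
    },
];
```

Here `mallory` appears on both lists and still gets `control`, because the blacklist sits earlier in the sequence. That ordering is what a plain OR or AND of the same selectors cannot express.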




Loggers



                         function log($testKey, $variantKey, $subjectKey)




More => better

                         • More data
                         • More ways to track
                          •   access logs
                          •   3P analytics
                          •   custom
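
If several tracking channels share the `log($testKey, $variantKey, $subjectKey)` signature from the previous slide, one event can be fanned out to all of them. The interface and class names below are assumptions for illustration, and `MemoryLogger` stands in for a real sink such as an access log or an analytics client.

```php
<?php
// Hypothetical logger interface built around the slide's log() signature.
interface TestLogger {
    public function log(string $testKey, string $variantKey, string $subjectKey): void;
}

// Collects events in memory; a stand-in for a real destination.
class MemoryLogger implements TestLogger {
    public $lines = [];

    public function log(string $testKey, string $variantKey, string $subjectKey): void {
        $this->lines[] = "$testKey\t$variantKey\t$subjectKey";
    }
}

// Forwards every event to each wrapped logger, so adding a channel is one line.
class FanOutLogger implements TestLogger {
    private $loggers;

    public function __construct(TestLogger ...$loggers) {
        $this->loggers = $loggers;
    }

    public function log(string $testKey, string $variantKey, string $subjectKey): void {
        foreach ($this->loggers as $logger) {
            $logger->log($testKey, $variantKey, $subjectKey);
        }
    }
}
```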




Access log augmentation


                         • Apache note
                         • Lots of log analysis tools
                          •   grep
                          •   $$
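
A sketch of the Apache note approach, under stated assumptions: PHP's `apache_note()` is only available when PHP runs as an Apache module, so the call is guarded, and the note name `ab` and the `test.variant` value format are invented for this example.

```php
<?php
// Records the chosen variant as an Apache note so it can be written to the access log.
// Returns the value that was (or would have been) recorded.
function noteVariant(string $testKey, string $variantKey): string {
    $value = $testKey . '.' . $variantKey;
    if (function_exists('apache_note')) {
        // Under mod_php this attaches the value to the current request.
        apache_note('ab', $value);
    }
    return $value;
}

// On the Apache side, a note is exposed to mod_log_config as %{NAME}n, e.g.:
//   LogFormat "%h %l %u %t \"%r\" %>s %b \"%{ab}n\"" ab_combined
// After that, counting requests per variant can be as simple as grep.

noteVariant('search_test', 'variantC');
```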




3P Analytics

                         • Quick to start
                         • May be cheap
                         • Volume?
                         • Lag time?
                         • Flexibility / customization?



3P Analytics - how


                         • Custom variables
                          •   take note of number & size limits

                         • Custom segments
                         • Canned metrics




3P Analytics - example

                         <script type="text/javascript">
                            var pageTracker = _gat._getTracker("UA-1234567-8");
                            pageTracker._initData();
                            pageTracker._setCustomVar(2, "AB", "search_test.variantC", 3);
                            pageTracker._trackPageview();
                         </script>




Our own event tracking

                         • HTML beacons
                         • Hadoop
                         • Cloud

                         [Diagram: HTML and JS send event beacons to the web app, which writes an event log; Hadoop processes the log into results]




Break / hack
                          https://github.com/etsy/ab




Building on top of the core API


Test builders

                         • Capture common patterns
                          •   feature ramp ups
                          •   opt-in experiments

                         • Help with test design
                          •   weight equalization
                          •   multivariate testing




Automatic Dispatchers

                         • Separate dispatching and work
                         • Work with components that have well-defined
                           invocation APIs
                         • Define a particular level of granularity
                         • Feel like magic



Dispatcher example - MVC


                         • View dispatch
                         • Controller dispatch
                         • Spring framework, etc.




Selector Registry


                         • Reuse
                         • Clarity
                         • Documentation

                         $selectorReg = array(
                            'staff' => 'InternalUserSelector',
                            'whitelist' => 'WhitelistSelector',
                            'percent' => 'WeightedSelector'
                         );




Randomized Selector



What does it mean?


                         • Independent of subject attributes
                         • Independent of other tests
                         • Independent of (coarse-grained) time




Persistence


                         • Better experience
                         • Better data
                         • Multi-part tests
                         • ...but not forever




Ramping up/down


                         • Vary group sizes
                         • Reduce risk
                         • Distribute load




Persistence + Ramping

                         • Minimize inconsistency
                         • Ramping up
                          •   Should just add people to the treatment group

                         • Ramping down
                          •   Should just remove part of the treatment group




rand()

                         • Explicit persistence
                          •   Cookie
                          •   DB

                         • Scaling
                         • Maintenance



Hashing


                         variant = H(id)

                          •   Persistence
                          •   Attribute independence
                          •   Test independence?


                         variant = H(test id, id)

                          •   Persistence
                          •   Attribute independence
                          •   Test independence
                          •   What else? Weights!


                         h = H(test id, id)
                         variant = P(h, weights)

Partitioning

                         [Figure: the hash range from 0 to 1, split by a partition at .5 into variant A (below the partition) and variant B (above it)]

Ramping up

                         [Figure: the same hash range with the partition moved from .5 to .7, enlarging A]

Which hash function?


                         • MD5/SHA-256/...
                         • Test it!
                         • But be careful...
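
The hashing and partitioning slides can be sketched as two small functions. This is one possible reading, not the talk's implementation: `hashToUnit()` and `partition()` are hypothetical names, SHA-256 is just one of the candidates listed above, and only the first 32 bits of the digest are used. `array_key_last()` needs PHP 7.3 or later, and the weights array is assumed to be non-empty.

```php
<?php
// h = H(test id, id), scaled to [0, 1).
// Including the test ID decorrelates a subject's assignments across tests.
function hashToUnit(string $testId, string $subjectId): float {
    $hex = substr(hash('sha256', $testId . ':' . $subjectId), 0, 8);
    return hexdec($hex) / 0x100000000;
}

// variant = P(h, weights): walk the cumulative weights until h falls inside one.
function partition(float $h, array $weights): string {
    $total = array_sum($weights);
    $upper = 0.0;
    foreach ($weights as $name => $weight) {
        $upper += $weight / $total;
        if ($h < $upper) {
            return $name;
        }
    }
    // Guard against floating-point rounding at the very top of the range.
    return array_key_last($weights);
}

$h = hashToUnit('search_test', 'u:42');
$before = partition($h, ['A' => 0.5, 'B' => 0.5]);
$after  = partition($h, ['A' => 0.7, 'B' => 0.3]); // ramp A from 50% to 70%
```

Because the hash of a given subject never changes, moving the partition from .5 to .7 only adds subjects to A; nobody already in A is moved out, which is the property the "Persistence + Ramping" slide asks for. A quick uniformity check over a batch of IDs is one inexpensive way to follow the "Test it!" advice.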




A/B + opt-in


                         • Need to separate the groups for analysis
                         • Solution: use more than 2 variants!
                          •   Act according to variant properties
                          •   Track by variant name
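
As a sketch of the "more than 2 variants" idea: give opted-in users a variant of their own that carries the same properties as the treatment but a different name, so the application behaves identically while the logs keep the groups apart. The variant names, the `new_search` property, and `chooseVariant()` are assumptions for illustration.

```php
<?php
// Two variants behave the same way but are tracked under different names.
$variants = [
    'control'   => ['new_search' => false],
    'treatment' => ['new_search' => true],
    'opt_in'    => ['new_search' => true], // same properties, separate name
];

// $h is the subject's hash position in [0, 1), as in the partitioning slides.
function chooseVariant(bool $optedIn, float $h): string {
    if ($optedIn) {
        return 'opt_in'; // self-selected users never enter the random split
    }
    return $h < 0.5 ? 'treatment' : 'control';
}

$name  = chooseVariant(true, 0.9);
$props = $variants[$name]; // act on the properties, log the name
```

Analysis can then compare `treatment` against `control` without the self-selected `opt_in` group skewing either side.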




Analysis



...   Confidence interval ... something something
                                     ... Binomial ... blah blah ...




Confidence Intervals



                         • How sure are we?
                         • What if it were random?




Binomial experiments



                             HT HTTT HT H H
                             T HT HTT H HT H
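
The slides do not say which formulas were used, so the following is a sketch using two standard textbook tools rather than the talk's own method: a normal-approximation (Wald) interval for a single rate, and a pooled two-proportion z-score for comparing two variants. Both function names are hypothetical.

```php
<?php
// Normal-approximation (Wald) interval for a success rate; z = 1.96 gives roughly 95%.
function conversionInterval(int $successes, int $trials, float $z = 1.96): array {
    $p  = $successes / $trials;
    $se = sqrt($p * (1 - $p) / $trials);
    return [$p - $z * $se, $p + $z * $se];
}

// z-score for the difference between two observed rates, using a pooled standard error.
// An absolute value above about 1.96 is unlikely to be chance alone at the 5% level.
function twoProportionZ(int $sA, int $nA, int $sB, int $nB): float {
    $pA = $sA / $nA;
    $pB = $sB / $nB;
    $pooled = ($sA + $sB) / ($nA + $nB);
    $se = sqrt($pooled * (1 - $pooled) * (1 / $nA + 1 / $nB));
    return ($pB - $pA) / $se;
}

// Each row of coin flips above has 5 heads in 10 flips.
[$low, $high] = conversionInterval(5, 10);
```

With only 10 flips the interval runs from roughly 0.19 to 0.81, which is why two identical coins can look quite different in a small sample. The normal approximation is itself rough at sample sizes this small, so treat it as a first check rather than a final answer.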




Results



Dashboards




A few test design tips



What's the question?


                             What metrics?

                            How much better?




Who?

                         • Different roles
                         • Old vs new
                          •   Novelty
                          •   Habit
                          •   Expectation




When?

                         • User types vary
                         • Activity patterns vary
                         • Site content might vary
                         • Performance might vary
                         • Full weeks are often a good starting point



Summary



Better living through experimentation


                         • More risk taking => better product
                         • MTTR
                         • Lower stress




You can too.



 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Building an experimentation framework

  • 1. Building an experimentation framework for web apps Zhi-Da Zhong zz@etsy.com Tuesday, July 26, 2011
  • 2. About the talk • Why • What • Framework • Break / hack • Tech Details • Test design • Analysis
  • 4. Questions: “What will happen if I do X?” “Is X better than Y?”
  • 5. The future & alternate universes (We’re bad at those.)
  • 8. Experiments Try it out.
  • 9. Experiments Try it out. Data beats speculation.
  • 10. Experiments Try different alternatives on different people.
  • 11. Experiments Try different alternatives on different people.
  • 12. Which is better? vs.
  • 13. Not a great experiment
  • 15. Front end experiments • Layout, colors, images, copy, ... • No functional changes • Impact can be surprisingly high
  • 16. A little more complex... • Multipage flows • Functionality changes
  • 17. Backend experiments • Why not? • Algorithms, architectures, batch processes, ...
  • 18. The Etsy search backend • New algorithm • New RPC protocol • New result data structure • New Solr trunk snapshot [Diagram labels: Web app, search(), searchA() / searchB(), Search cluster A / Search cluster B]
  • 19. DB re-architecture • Postgres => Sharded MySQL • Multiple experiments
  • 20. Whole new features New pages + New DB tables + New batch jobs + ...
  • 21. Not just 2 variants • A/B/C... tests • Multi-variate tests
  • 22. Caveats • Content not under your control • Price tests? • Hard-to-measure/quantify things • Long term impact?
  • 23. Other tests • Internal user testing • Whitelisted user testing
  • 25. Complementary techniques • Observed/recorded testing - show different people the same thing • Side-by-side testing - show each person 2 alternatives
  • 26. Side by side testing
  • 28. A common approach • JS-based • Non-techie UI • “No IT!” • “Designed For Marketers, By Marketers”
  • 29. Our approach • The developer is the user • Code as configuration • An integral part of the dev process
  • 30. Developer as the user • The builder of the feature writes the test • Not just a marketing tool
  • 31. Code as config • Simplicity • Expressivity • Quality • Version => complete system state • Revision history
  • 32. Part of the dev process Every change is an experiment!
  • 33. What does it look like?
  • 35. Default => Experiment => (new) Default
  • 36. To add a new feature...
      + $config['new_search'] = array(
      +     'enabled' => 'off'
      + );
        function search() {
      +     if ($cfg->isEnabled('new_search')) {
      +         return do_new_search();
      +     }
            // existing stuff
        }
  • 38. Now we go crazy...
        function do_new_search() {
            // exciting new stuff
            // that might or might not work
            // but we can deploy it anyway
            // since it’s flagged off
        }
  • 39. Internal user testing
        $config['new_search'] = array(
      +     'enabled' => 'rampup',
      +     'rampup' => array(
      +         'admin' => true
            )
        );
  • 40. Whitelists
        $config['new_search'] = array(
            'enabled' => 'rampup',
            'rampup' => array(
      +         'whitelist' => array('zhida'),
                'admin' => true
            )
        );
  • 41. Opt-in experiments
        $config['new_search'] = array(
            'enabled' => 'rampup',
            'rampup' => array(
      +         'group' => 12345,
                'admin' => true
            )
        );
  • 42. A/B
        $config['new_search'] = array(
            'enabled' => 'rampup',
            'rampup' => array(
      +         'percent' => 1.5,
                'admin' => true
            )
        );
  • 43. If it works...
        $config['new_search'] = array(
      +     'enabled' => 'on'
        );
  • 44. Order matters: Whitelist / Blacklist > Internal > Opt-in > Random
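The slides call `$cfg->isEnabled()` but do not show how it resolves a flag. The following is a minimal sketch of one way the stated precedence could be applied, not the talk's actual implementation. The function name `is_enabled`, the shape of the `$user` array (`id`, `name`, `is_admin`, `groups`), and the MD5-based percent check are all assumptions; only the config keys come from the slides. Blacklists are left out because the slides show no config key for them.

```php
<?php
// Hypothetical sketch: only the config keys ('enabled', 'rampup',
// 'whitelist', 'admin', 'group', 'percent') appear on the slides.
// The function name and the $user fields are illustrative assumptions.

function is_enabled(array $config, string $feature, array $user): bool
{
    $cfg = $config[$feature] ?? array('enabled' => 'off');
    if ($cfg['enabled'] === 'on') {
        return true;
    }
    if ($cfg['enabled'] !== 'rampup') {
        return false; // 'off' or anything unrecognized
    }
    $ramp = $cfg['rampup'] ?? array();

    // 1. Whitelist is checked first.
    if (in_array($user['name'], $ramp['whitelist'] ?? array(), true)) {
        return true;
    }
    // 2. Internal (admin) users.
    if (!empty($ramp['admin']) && !empty($user['is_admin'])) {
        return true;
    }
    // 3. Opt-in: the user has joined the configured group.
    if (isset($ramp['group']) && in_array($ramp['group'], $user['groups'] ?? array(), true)) {
        return true;
    }
    // 4. Random: a stable hash of (feature, user id) mapped to [0, 100).
    //    Assumes 64-bit PHP so that hexdec() returns an integer.
    if (isset($ramp['percent'])) {
        $h = hexdec(substr(md5($feature . ':' . $user['id']), 0, 8)) / 4294967296 * 100;
        return $h < $ramp['percent'];
    }
    return false;
}
```

Because each step returns as soon as it matches, a whitelisted user sees the feature regardless of what the percent check would have said.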
  • 46. As easy as...
  • 47. As easy as... 1. Pick a variant
  • 48. As easy as... 1. Pick a variant 2. Do what it says
  • 49. As easy as... 1. Pick a variant 2. Do what it says 3. Log the event
  • 50. What's in a test?
  • 51. Variants • Key-value pairs (interpreted by the app) • Name (mostly for logging)
  • 52. SubjectIdProvider: function getID() • Why? Hashing and other selectors; logging • Types of subjects: users...but not always; different groups of users (sellers vs buyers, etc.); different ways to identify them (signed in vs signed out)
  • 53. Selectors: function select($subjectID) => Variant Name
  • 54. Combining multiple selectors • OR: breaks blacklists • AND: breaks whitelists • Sequence: works!
  • 55. Selector sequence • Defines an ordering • Returns A/B/C/... or <don't care>
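A selector sequence can be sketched as a list of selectors tried in order, where the first one that returns a variant wins and `null` stands for "don't care". This is an illustration only: the `Selector` interface and the `ListSelector`, `DefaultSelector`, and `SelectorSequence` classes are hypothetical names, not the API of the etsy/ab library.

```php
<?php
// Hypothetical sketch of a selector sequence. All names are assumptions.

interface Selector
{
    /** Returns a variant name, or null for "don't care". */
    public function select(string $subjectId): ?string;
}

/** Assigns a fixed variant to the listed subjects; serves as whitelist or blacklist. */
class ListSelector implements Selector
{
    private $ids;
    private $variant;

    public function __construct(array $ids, string $variant)
    {
        $this->ids = $ids;
        $this->variant = $variant;
    }

    public function select(string $subjectId): ?string
    {
        return in_array($subjectId, $this->ids, true) ? $this->variant : null;
    }
}

/** Always answers; useful as the last element of a sequence. */
class DefaultSelector implements Selector
{
    private $variant;

    public function __construct(string $variant)
    {
        $this->variant = $variant;
    }

    public function select(string $subjectId): ?string
    {
        return $this->variant;
    }
}

/** Tries each selector in order and returns the first non-null answer. */
class SelectorSequence implements Selector
{
    private $selectors;

    public function __construct(array $selectors)
    {
        $this->selectors = $selectors;
    }

    public function select(string $subjectId): ?string
    {
        foreach ($this->selectors as $selector) {
            $variant = $selector->select($subjectId);
            if ($variant !== null) {
                return $variant;
            }
        }
        return null;
    }
}

// Blacklist first, then whitelist, then everyone else.
$sequence = new SelectorSequence(array(
    new ListSelector(array('mallory'), 'control'),
    new ListSelector(array('zhida', 'mallory'), 'new_search'),
    new DefaultSelector('control'),
));
```

Ordering is what lets a blacklist override a whitelist here: `mallory` appears on both lists but is resolved by the earlier selector.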
  • 56. Loggers: function log($testKey, $variantKey, $subjectKey)
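Put together, the three steps (pick a variant, do what it says, log the event) might look like the sketch below. `run_test` and its callable parameters are hypothetical; only the `log($testKey, $variantKey, $subjectKey)` argument order is taken from the slide.

```php
<?php
// Hypothetical sketch of the pick / act / log flow. run_test() is an
// illustrative name, not part of any real library.

function run_test(string $testKey, string $subjectKey, callable $select, array $variants, callable $log)
{
    $variantKey = $select($subjectKey);          // 1. Pick a variant
    $result = $variants[$variantKey]();          // 2. Do what it says
    $log($testKey, $variantKey, $subjectKey);    // 3. Log the event
    return $result;
}

// Example wiring with stand-in closures.
$events = array();
$result = run_test(
    'search_test',
    'user42',
    function ($subjectKey) { return 'B'; },
    array(
        'A' => function () { return 'old results'; },
        'B' => function () { return 'new results'; },
    ),
    function ($testKey, $variantKey, $subjectKey) use (&$events) {
        $events[] = "$testKey.$variantKey $subjectKey";
    }
);
```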
  • 57. More => better • More data • More ways to track: access logs, 3P analytics, custom
  • 58. Access log augmentation • Apache note • Lots of log analysis tools • grep • $$
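One way to use an Apache note from PHP is `apache_note()`, which is available only when PHP runs inside Apache. The sketch below is an assumption about how the variant could be recorded: the note name `ab` and the `test.variant` value format are illustrative, not something the slides specify.

```php
<?php
// Sketch only: the note name 'ab' and the value format are assumptions.

function note_variant(string $testKey, string $variantKey): string
{
    $value = $testKey . '.' . $variantKey;
    // apache_note() exists only under the Apache SAPI, so guard the call
    // to keep the function usable from the command line.
    if (function_exists('apache_note')) {
        apache_note('ab', $value);
    }
    return $value;
}
```

On the Apache side, mod_log_config can print a note with the `%{ab}n` format string in a `LogFormat` directive. A single note holds one value, so a page that runs several tests would need to join them into one string or use one note per test.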
  • 59. 3P Analytics • Quick to start • May be cheap • Volume? • Lag time? • Flexibility / customization?
  • 60. 3P Analytics - how • Custom variables (take note of number & size limits) • Custom segments • Canned metrics
  • 61. 3P Analytics - example
        <script type="text/javascript">
          var pageTracker = _gat._getTracker("UA-1234567-8");
          pageTracker._initData();
          pageTracker._setCustomVar(2, "AB", "search_test.variantC", 3);
          pageTracker._trackPageview();
        </script>
  • 62. Our own event tracking • HTML beacons • Hadoop • Cloud [Diagram labels: Web app; HTML, JS; event beacon; Event log; Hadoop; Results]
  • 63. Break / hack https://github.com/etsy/ab
  • 64. Building on top of the core API
  • 65. Test builders • Capture common patterns: feature ramp ups, opt-in experiments • Help with test design: weight equalization, multivariate testing
  • 66. Automatic Dispatchers • Separate dispatching and work • Work with components that have well-defined invocation APIs • Define a particular level of granularity • Feel like magic
  • 67. Dispatcher example - MVC • View dispatch • Controller dispatch • Spring framework, etc.
  • 68. Selector Registry • Reuse • Clarity • Documentation
        $selectorReg = array(
            'staff' => 'InternalUserSelector',
            'whitelist' => 'WhitelistSelector',
            'percent' => 'WeightedSelector'
        );
  • 70. What does it mean?
  • 71. What does it mean? • Independent of subject attributes
  • 72. What does it mean? • Independent of subject attributes • Independent of other tests
  • 73. What does it mean? • Independent of subject attributes • Independent of other tests • Independent of (coarse-grained) time
  • 75. Persistence • Better experience
  • 76. Persistence • Better experience • Better data
  • 77. Persistence • Better experience • Better data • Multi-part tests
  • 78. Persistence • Better experience • Better data • Multi-part tests • ...but not forever
  • 79. Ramping up/down • Vary group sizes • Reduce risk • Distribute load
  • 80. Persistence + Ramping • Minimize inconsistency • Ramping up: should just add people to the treatment group • Ramping down: should just remove part of the treatment group
  • 81. rand() • Explicit persistence (cookie, DB) • Scaling • Maintenance
  • 82. Hashing • variant = H(id)
  • 83. Hashing • variant = H(id) • Persistence
  • 84. Hashing • variant = H(id) • Persistence
  • 85. Hashing • variant = H(id) • Persistence • Attribute independence
  • 86. Hashing • variant = H(id) • Persistence • Attribute independence
  • 87. Hashing • variant = H(id) • Persistence • Attribute independence • Test independence?
  • 88. Hashing • variant = H(test id, id) • Persistence • Attribute independence • Test independence
  • 89. Hashing • variant = H(test id, id) • Persistence • Attribute independence • Test independence
  • 90. Hashing • variant = H(test id, id) • Persistence • Attribute independence • Test independence • What else?
  • 91. Hashing • variant = H(test id, id) • Persistence • Attribute independence • Test independence • Weights!
  • 92. Hashing • h = H(test id, id) • Persistence • Attribute independence • Test independence
  • 93. Hashing • h = H(test id, id) • variant = P(h, weights) • Persistence • Attribute independence • Test independence
  • 94. Partitioning [Diagram: hash range from 0 to 1]
  • 95. Partitioning [Diagram: hash range from 0 to 1, partition at .5]
  • 96. Partitioning [Diagram: hash range from 0 to 1, partition at .5 separating A and B]
  • 97. Ramping up [Diagram: hash range from 0 to 1, partition moved to .7, separating A and B]
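The two formulas, h = H(test id, id) and variant = P(h, weights), can be sketched as follows. This is an illustration under stated assumptions: the function names `unit_hash` and `partition`, the choice of MD5, and the use of its first 32 bits are mine, not the talk's.

```php
<?php
// Hypothetical sketch of hash-then-partition. Names and the MD5 choice
// are assumptions. Assumes 64-bit PHP so hexdec() returns an integer.

/** h = H(test id, id): maps the pair to a float in [0, 1). */
function unit_hash(string $testId, string $subjectId): float
{
    // Take the first 32 bits of the MD5 digest and scale by 2^32.
    return hexdec(substr(md5($testId . ':' . $subjectId), 0, 8)) / 4294967296;
}

/** variant = P(h, weights): walks the cumulative weights, which should sum to 1. */
function partition(float $h, array $weights): string
{
    $upper = 0.0;
    $last = '';
    foreach ($weights as $variant => $weight) {
        $last = (string) $variant;
        $upper += $weight;
        if ($h < $upper) {
            return $last;
        }
    }
    // Reached only if rounding leaves the cumulative sum slightly below h.
    return $last;
}

// A subject's variant in a 50/50 split, then after ramping A up to 70%.
$h = unit_hash('new_search', 'user42');
$before = partition($h, array('A' => 0.5, 'B' => 0.5));
$after = partition($h, array('A' => 0.7, 'B' => 0.3));
```

Nothing is stored per subject, so persistence comes for free, and including the test id in the hash input keeps assignments in one test from lining up with another. Because A's interval starts at 0, moving the partition from .5 to .7 only adds subjects to A; nobody already in A is moved out.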
  • 98. Which hash function? • MD5/SHA-256/... • Test it! • But be careful...
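One way to act on "Test it!" is to hash a batch of ids into equal-width buckets and compare the counts with a uniform expectation. The sketch below uses a Pearson chi-square statistic; the function names, the sequential ids, and the bucket count are assumptions chosen for illustration, and this is one possible check, not the one used in the talk.

```php
<?php
// Hypothetical sketch of a uniformity check for a candidate hash function.
// Assumes 64-bit PHP so hexdec() returns an integer.

/** Hashes ids 0..n-1 for one test and counts how many land in each bucket. */
function bucket_counts(string $algo, string $testId, int $n, int $buckets): array
{
    $counts = array_fill(0, $buckets, 0);
    for ($id = 0; $id < $n; $id++) {
        $h = hexdec(substr(hash($algo, $testId . ':' . $id), 0, 8)) / 4294967296;
        $counts[(int) floor($h * $buckets)]++;
    }
    return $counts;
}

/** Pearson chi-square statistic against equal expected counts. */
function chi_square(array $counts): float
{
    $expected = array_sum($counts) / count($counts);
    $sum = 0.0;
    foreach ($counts as $count) {
        $sum += ($count - $expected) ** 2 / $expected;
    }
    return $sum;
}

$counts = bucket_counts('md5', 'search_test', 10000, 10);
$statistic = chi_square($counts);
```

With 10 buckets there are 9 degrees of freedom, for which the 5% critical value is about 16.92. A statistic well above that on realistic ids is a reason to look closer. Real subject ids are rarely as tidy as a sequential range, so the check is more meaningful when run on a sample of production ids.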
  • 99. A/B + opt-in • Need to separate the groups for analysis • Solution: use more than 2 variants! • Act according to variant properties • Track by variant name
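A sketch of the "more than 2 variants" idea: opted-in and randomly assigned subjects get the same behavior through identical variant properties, but different variant names so they can be separated in analysis. The variant names, the `use_new_search` property, and `pick_variant` are hypothetical.

```php
<?php
// Hypothetical sketch. Variant names and properties are illustrative.

$variants = array(
    'control'   => array('use_new_search' => false),
    'treatment' => array('use_new_search' => true),  // randomly assigned
    'opt_in'    => array('use_new_search' => true),  // self-selected
);

function pick_variant(string $subjectId, array $optedIn): string
{
    if (in_array($subjectId, $optedIn, true)) {
        return 'opt_in';
    }
    // Stable 50/50 split for everyone else (assumes 64-bit PHP).
    $h = hexdec(substr(md5('new_search:' . $subjectId), 0, 8)) / 4294967296;
    return $h < 0.5 ? 'treatment' : 'control';
}

// The app acts on the properties; the logger records the name.
$name = pick_variant('zhida', array('zhida'));
$useNewSearch = $variants[$name]['use_new_search'];
```

Self-selected users are not a random sample, so keeping `opt_in` out of the `treatment` numbers avoids biasing the A/B comparison.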
  • 101. ... Confidence interval ... something something ... Binomial ... blah blah ...
  • 102. Confidence Intervals • How sure are we? • What if it were random?
  • 104. Binomial experiments HT HTTT HT H H
  • 105. Binomial experiments HT HTTT HT H H T HT HTT H HT H
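The slides do not give a formula, so the following is the standard normal-approximation (Wald) interval for a binomial proportion, p ± z·sqrt(p(1−p)/n), offered as a sketch rather than as the analysis method used in the talk. The function names are assumptions, and z = 1.96 corresponds to a 95% interval.

```php
<?php
// Sketch of a normal-approximation confidence interval for a conversion
// rate. The approximation is poor for small samples or rates near 0 or 1.

/** Returns array(lower, upper) for successes out of trials. */
function proportion_ci(int $successes, int $trials, float $z = 1.96): array
{
    $p = $successes / $trials;
    $margin = $z * sqrt($p * (1 - $p) / $trials);
    return array(max(0.0, $p - $margin), min(1.0, $p + $margin));
}

/** True when two array(lower, upper) intervals share any point. */
function intervals_overlap(array $a, array $b): bool
{
    return $a[0] <= $b[1] && $b[0] <= $a[1];
}

// Compare two variants' conversion rates (made-up counts).
$control = proportion_ci(120, 1000);
$treatment = proportion_ci(150, 1000);
$inconclusive = intervals_overlap($control, $treatment);
```

Non-overlapping intervals are a conservative signal that the variants differ. Overlapping intervals do not prove the variants are equal; they mean the data so far cannot tell them apart at this confidence level.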
  • 108. A few test design tips
  • 110. What’s the question? What metrics?
  • 111. What’s the question? What metrics? How much better?
  • 112. Who? • Different roles • Old vs new • Novelty • Habit • Expectation
  • 113. When? • User types vary • Activity patterns vary • Site content might vary • Performance might vary • Full weeks are often a good starting point
  • 115. Better living through experimentation • More risk taking => better product • MTTR • Lower stress
  • 116. You can too.