Deploying Data Science for Distribution of The New York Times - Anne Bauer

How many newspapers should be distributed to each store for sale every day? The data science group at The New York Times addresses this optimization problem using custom time series modeling and analytical solutions, while also incorporating qualitative business concerns. I'll describe our modeling and data engineering approaches, written in Python and hosted on Google Cloud Platform.


  1. Deploying Data Science for Distribution of The New York Times
     Anne Bauer, anne.bauer@nytimes.com, Lead Data Scientist, NYTimes. PyData, 2018-10-17
  2. Single copy newspaper distribution (agenda)
     1. people still buy physical newspapers?
     2. algorithms
     3. experiments to test the algorithms
     4. ...we need to modify the algorithms
     5. app architecture
  4. YES! At ~47,000 stores! (And you should too.)
  5. "Single copy" optimization
     How many papers should we deliver to each store each day? Too many or too few: a waste of $$, or missed sales!
  6. Single copy: the process
     A weekly process:
     • Stores report sales from 1-2 weeks ago (depending on the distributor)
     • We pick up the data via FTP and ingest them into our systems
     • Our models are retrained and predictions are run
     • Predictions are handed off via FTP to the circulation department
     Turnaround time: ~a few hours
  7. Single copy: the existing algorithm
     Heuristics with many if/then statements:
     • Highest sale over recent weeks × A + B
     • A, B are extensively hand-tuned by store type, location, ...
     • Interspersed amid 4,600 lines of COBOL
     • Difficult to modify to include, e.g., print-site cost differences
     • A quintessential time series modeling problem. Perfect for data science!
  10. Agenda (repeated as a section divider): 2. algorithms
  11. Algorithm components
     The problem is separable into two parts:
     • Prediction: given previous sales, how many papers will sell next Thursday?
     • Policy: we think N papers will sell, with a known uncertainty distribution. How many should we send (the "draw")?
  12. First pass: AR(1)
     X_t = c + φ·X_{t−1} + ε_t
     (Daeil Kim)
  13. AR(1)
     Prediction:
     • X_t = c + φ·X_{t−1} + ε_t
     • Today's sale is a linear function of the previous week's
     • One model per store per day of week
     • Use the past year's data to fit c and φ
     • AR(1) vs. AR(N) and the training window chosen via cross-validation
     Policy:
     • Draw = ceil(demand)
     • Bump: if there have been recent sell-outs, send an extra paper
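The talk implemented this in Python with statsmodels (see slide 14). As a rough illustration of the prediction-plus-policy split described on the slide above, here is a minimal sketch using the current statsmodels AutoReg API; the function names and the `sold_out_recently` flag are illustrative, not from the NYT codebase:

```python
# Minimal sketch of the AR(1) prediction + draw policy, assuming `sales`
# holds one store's sales for one weekday over the past year
# (oldest first). Illustrative only.
import math
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def predict_demand(sales: np.ndarray) -> float:
    """Fit X_t = c + phi * X_{t-1} + eps and forecast one step ahead."""
    fitted = AutoReg(sales, lags=1).fit()
    return float(fitted.predict(start=len(sales), end=len(sales))[0])

def choose_draw(sales: np.ndarray, sold_out_recently: bool) -> int:
    """Policy: round the forecast up, plus a 'bump' after recent sell-outs."""
    d = math.ceil(predict_demand(sales))
    return d + 1 if sold_out_recently else d
```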
  14. AR(1) implementation
     • Python 2, with the statsmodels AR model. A single script.
     • Plots (matplotlib PNGs) hosted via Flask to monitor draws & sales
     • Run by cron on a local server
     • No separate dev/prod environments; code "deployed" via scp
  15. Second pass: Poisson regression
     (Dorian Goldman)
  16. Poisson regression
     Prediction:
     • Today's sale is a linear function of the previous week(s) and the previous year
     • One model per store per day of week
     • Use the past year's data to fit the model parameters
     • Feature time scales chosen via cross-validation
     • Assume sales are drawn from a Poisson distribution rather than a Gaussian
     • Sell-outs are accounted for in the likelihood function
  17. Poisson regression
     Notation:
     • b: # papers bought
     • d: # papers delivered (the draw)
     • z: demand (a Poisson-distributed latent variable)
     • λ: Poisson parameter of the demand distribution
     Each store has a different λ each day; z for that store and day is drawn from a Poisson distribution with that λ. Parameterize λ as a log-linear function of the features X, i.e. λ = exp(θ·X), where θ are the parameters fitted via maximum likelihood.
  18. Poisson regression
     The probability of the number bought given the demand depends on whether the demand exceeded the number of papers delivered (i.e., whether there was a sell-out):
     • No sell-out (b < d): demand is fully observed, P(b) = Poisson(b; λ)
     • Sell-out (b = d): we only know z ≥ d, so P(b) = P(z ≥ d)
     Use this probability in a maximum likelihood estimation of the parameters θ that describe λ.
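A minimal sketch of such a sell-out-aware (censored) likelihood, with illustrative names; the slide gives the idea but not this exact code:

```python
# Censored Poisson negative log-likelihood: demand is observed exactly
# unless the store sold out, in which case only P(z >= d) is known.
# Illustrative sketch; variable names are not from the talk's code.
import numpy as np
from scipy.stats import poisson

def neg_log_likelihood(theta, X, bought, delivered):
    lam = np.exp(X @ theta)              # log-linear demand rate per row
    sold_out = bought >= delivered
    ll = np.where(
        sold_out,
        np.log(poisson.sf(delivered - 1, lam)),  # tail mass P(z >= d)
        poisson.logpmf(bought, lam),             # fully observed demand
    )
    return -ll.sum()

# theta could then be fit with, e.g., scipy.optimize.minimize.
```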
  19. Poisson regression
     Policy: the newsvendor algorithm (z = demand, d = draw = # delivered)
     • Profit = price × min(d, z) − cost × d
     • Taking the derivative of the expected profit and setting it to zero implies: P(z ≤ d) = (price − cost) / price
     • Optimal draw: the smallest integer d such that P(z ≤ d) ≥ (price − cost) / price
     • The probability is given by the CDF of the Poisson demand distribution; brute-force search over d finds the best draw
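In code, the brute-force search over d amounts to inverting the Poisson CDF at the critical ratio, which scipy's `ppf` already does; a sketch:

```python
# Newsvendor policy: smallest d with P(z <= d) >= (price - cost) / price.
# poisson.ppf returns exactly that smallest integer, replacing the
# brute-force loop. Assumes price > cost.
from scipy.stats import poisson

def newsvendor_draw(lam: float, price: float, cost: float) -> int:
    critical_ratio = (price - cost) / price
    return int(poisson.ppf(critical_ratio, lam))
```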
  20. Poisson regression
     Implementation: refactored code!
     • Models abstracted into sklearn-like classes, to allow easy future expansion with plug & play model integration: __init__(), query(), transform(), fit(), predict(), policy()
     • A common library of functions to get data from the DB, calculate costs, check data quality, ...
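The slide lists the interface but not its bodies; an illustrative skeleton of what such an sklearn-like base class could look like (only the method names come from the talk):

```python
# Hypothetical skeleton of the sklearn-like model interface named on
# the slide above.
class SingleCopyModel:
    def __init__(self, config):
        self.config = config

    def query(self):
        """Pull this model's training data from the database."""
        raise NotImplementedError

    def transform(self, raw):
        """Turn raw sales records into model features."""
        raise NotImplementedError

    def fit(self, X, y):
        """Train the demand model."""
        raise NotImplementedError

    def predict(self, X):
        """Forecast demand for the next cycle."""
        raise NotImplementedError

    def policy(self, demand):
        """Turn a demand forecast into a draw (papers to deliver)."""
        raise NotImplementedError
```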
  21. Agenda (section divider): 3. experiments to test the algorithms
  22. Treatment & control groups: match on sales
     Simple approach:
     • Take a random sample that approximates the total sales distribution
     • For each member of this "treatment" sample, find the closest match in mean sales
     Trial & error checks:
     • Exclude cases with any large differences in sales during the training period
     • Only consider matches with the same production costs (~print site)
     • Make sure treatment & control sell the paper on the same weekdays
     • Better no match than a distant match
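A toy version of this greedy nearest-mean matching, with a "better no match than a distant match" cutoff, might look like the following (all names are illustrative, not from the talk):

```python
# Greedy matching of treatment stores to control candidates on mean
# sales, skipping matches that are too distant. Illustrative sketch.
def match_controls(treat_means, pool_means, max_gap):
    """Both arguments map store_id -> mean sales over the training period."""
    matches, used = {}, set()
    for t_id, t_mean in treat_means.items():
        candidates = [(abs(m - t_mean), s)
                      for s, m in pool_means.items() if s not in used]
        if not candidates:
            continue
        gap, best = min(candidates, key=lambda c: c[0])
        if gap <= max_gap:          # better no match than a distant match
            matches[t_id] = best
            used.add(best)
    return matches
```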
  23. Reporting: D3 dashboard
     Optimize for profit: ✔ Make stakeholders happy: ✗
     Our profit came at the expense of sales! Sales matter beyond the sales profit: circulation numbers matter, and that value is hard to quantify.
  24. Goal: optimize for profit, but don't decrease sales "too much". ∴ Constrained optimization
  25. Agenda (section divider): 4. ...we need to modify the algorithms
  26. Constrained newsvendor algorithm
     Policy: newsvendor algorithm (z = demand, d = draw = # delivered)
     • Profit = price × min(d, z) − cost × d
     • Maximize profit − λ × sales (a negative λ boosts sales)
     • This effectively modifies the sales price of the paper: (price − λ) × min(d, z) − cost × d
     • Optimal draw: the smallest integer d such that P(z ≤ d) ≥ (price − λ − cost) / (price − λ)
     • Negative λ → increased effective sales price → worth sending extra papers
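Relative to the earlier newsvendor sketch, the change is one line: the penalty shifts the effective price. Here `penalty` stands in for the slide's λ, renamed to avoid clashing with the Poisson rate; a sketch under the same assumptions as before:

```python
# Constrained newsvendor: a negative penalty raises the effective price,
# which raises the critical ratio and sends extra papers.
from scipy.stats import poisson

def constrained_draw(lam: float, price: float, cost: float,
                     penalty: float) -> int:
    eff_price = price - penalty     # the slide's (price - lambda)
    critical_ratio = (eff_price - cost) / eff_price
    return int(poisson.ppf(critical_ratio, lam))
```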
  27. The stakeholders choose λ
     To our surprise, they chose a λ such that the sales loss was ~0 and profit was suboptimal. But still much better than the original algorithm! This tuneable knob is very handy: we run experiments with different λs, and the stakeholders make the final decision on which results are best.
  28. Reporting: model comparison
     Look at both the profit and sales differences between treatment & control. Leave trade-off decisions to the stakeholders: better for everyone.
  29. Agenda (section divider): 5. app architecture
  30. Current architecture: Google Cloud
     • App Engine: web front end
     • App Engine Flex: back ends for reporting and predictions
     • BigQuery, Cloud Storage, Cloud SQL: hosting data and configuration
     • Deployed via Drone (github.com/NYTimes/drone-gae):
       GitHub → Docker → GCR → AE Flex
       GitHub → AE Standard
  31. Architecture: process
     Data transfer:
     • A weekly cron job per distributor, on an AE instance
     • Task queue task: copy data from FTP to BQ, using config info in GCS
     • The task fails if the data are not there
     • The task queue retries every N minutes until the data show up
     Logging:
     • Logs sent to Stackdriver; emails sent upon errors
     • Quality checks and progress messages sent to Slack
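The fail-and-retry idea is simple enough to show generically; a hypothetical sketch with the FTP and BigQuery specifics stubbed out as callables (nothing below is from the talk's code):

```python
# Fail-until-present pattern: raising marks the task as failed, and the
# task queue's retry policy re-runs it every N minutes until the data
# show up. FTP/BigQuery details are stubbed out.
from typing import Callable

def ingest_task(data_ready: Callable[[], bool],
                copy_ftp_to_bq: Callable[[], None]) -> None:
    if not data_ready():
        raise RuntimeError("data not on FTP yet; the task queue will retry")
    copy_ftp_to_bq()
```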
  32. Architecture: process
     Reporting:
     • Reads data from BQ
     • Calculates aggregations & statistics about the algorithm experiments, using config info from Cloud SQL (BQ & pandas)
     • Saves the aggregated data back to BQ
     • Runs statistical tests on data quality (e.g., last week's total sales within 3σ of the previous mean); aborts on failure
     • Syncs the aggregated BQ tables with Cloud SQL, for use in filtering the front-end UI
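The 3σ check on the slide is a one-liner in spirit; a hypothetical version (names are illustrative):

```python
# Abort-on-anomaly check: is last week's total within 3 sigma of the
# history of weekly totals? Illustrative sketch.
import numpy as np

def check_total_sales(history: np.ndarray, latest: float,
                      n_sigma: float = 3.0) -> None:
    mu, sigma = history.mean(), history.std(ddof=1)
    if abs(latest - mu) > n_sigma * sigma:
        raise RuntimeError(
            f"total sales {latest:.0f} is outside {n_sigma} sigma of mean {mu:.0f}"
        )
```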
  33. Architecture: process
     Predictions:
     • Reads data from BQ
     • Retrains, then predicts next week's sales and how many papers to deliver to each store each day (sklearn, scipy), using config info from Cloud SQL
     • Saves the results to GCS
     • Runs tests for unexpected changes in predictions; aborts on failure
     Upload:
     • The front end copies the results from GCS back to the FTP site
  34. A well-distributed project
     • Experiments: A/B testing algorithms with $ directly as a KPI
     • Communication: fold qualitative business concerns into the math
     • Engineering: Google Cloud Platform improves our process
     • Algorithms: sell-outs and costs directly incorporated
