3. Our Challenge
• We have tens of millions of players and dozens of
games across multiple platforms
• Our games have diverse event taxonomies
• We want to build accurate models for personalizing
our gameplay experiences
4. “One of the holy grails of machine learning is
to automate more and more of the feature
engineering process.”
Pedro Domingos
CACM 2012
5. Our Approach
• Leverage ML libraries to automate feature engineering
• Develop Portfolio-Scale data products
• Empower our game studios with ML models
7. Applications
Propensity Models: What actions are players likely to perform?
Segmentation: Who are our players?
Anomaly Detection: Which players are bad actors?
Recommendation: What actions should they take?
8. Feature Encoding
Input Dataset
• Thousands of events per player
Feature Generation
• Aggregation with FeatureTools
Output Dataset
• A single row per player
Diagram: Raw Event Data → Player Summaries
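The encoding step above can be sketched in plain Python. This is a minimal illustration, not the FeatureTools pipeline itself: the event tuples, the `summarize` helper, and the feature names are all hypothetical, and only two aggregations (count and sum) are shown.

```python
from collections import defaultdict

# Hypothetical raw tracking events: (player_id, event_name, value).
events = [
    ("p1", "level_complete", 1.0),
    ("p1", "purchase", 4.99),
    ("p1", "purchase", 0.99),
    ("p2", "level_complete", 1.0),
    ("p2", "level_complete", 1.0),
]

def summarize(events):
    """Collapse an event log into one feature row per player."""
    rows = defaultdict(lambda: defaultdict(float))
    for player_id, event_name, value in events:
        row = rows[player_id]
        row[f"COUNT({event_name})"] += 1
        row[f"SUM({event_name}.value)"] += value
    # Use a shared column set so every player row has the same width.
    columns = sorted({c for row in rows.values() for c in row})
    return {p: {c: row.get(c, 0.0) for c in columns} for p, row in rows.items()}

player_summaries = summarize(events)
```

Each player ends up with the same set of columns, with zeros for events they never triggered, which is the shape most modeling libraries expect.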
9. Propensity Models
• We use classification models to predict which users are likely to perform an action
• Game studios use propensity scores to define experiment groups
• Feature generation reduces the need for manual feature engineering
Pipeline: Data Extract → Feature Engineering → Feature Application → Model Training → Model Publish
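As a rough sketch of the propensity idea, the snippet below fits a small logistic regression with batch gradient descent and uses the resulting scores to split players into experiment groups. The slides name MLlib for modeling; this stand-in uses only the standard library, and the features, labels, and 0.5 cutoff are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights (bias stored last) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def propensity(w, x):
    """Predicted probability that a player performs the target action."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

# Hypothetical player summaries: [scaled session count, purchased before (0/1)].
X = [[0.1, 0.0], [0.2, 0.0], [0.8, 1.0], [0.9, 1.0], [0.3, 0.0], [0.7, 1.0]]
y = [0, 0, 1, 1, 0, 1]  # 1 = player performed the target action
w = train_logistic(X, y)

# Split new players into experiment groups by propensity score.
new_players = {"p7": [0.85, 1.0], "p8": [0.15, 0.0]}
groups = {"high": [], "low": []}
for pid, x in new_players.items():
    groups["high" if propensity(w, x) >= 0.5 else "low"].append(pid)
```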
10. Segmentation
• Generated features are used as input to k-means clustering
• Archetype labels are assigned based on qualitative analysis
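A minimal k-means sketch (Lloyd's algorithm) illustrates the clustering step. The two-feature player vectors, the choice of seeding from the first k points, and the archetype names are assumptions made for illustration; the slides do not specify which implementation or settings are used.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical generated features: [sessions per week, spend].
points = [[1.0, 0.0], [9.0, 8.0], [1.5, 0.5], [8.5, 9.0], [0.5, 0.2], [9.5, 8.5]]
labels, centroids = kmeans(points, k=2)

# Archetype names are assigned by hand after inspecting each cluster.
archetypes = {0: "casual", 1: "engaged spender"}
named = [archetypes[label] for label in labels]
```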
11. Anomaly Detection
• Players are represented as 1D images
• We train an autoencoder to reduce dimensionality
• Players whose output vectors differ substantially from their input vectors are flagged as suspect
Diagram: Input Vectors (Players × Features) → Autoencoder (InputLayer → Latent Space → OutputLayer) → Output Vectors (Players × Features)
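The flagging step can be sketched as follows. The autoencoder itself is not implemented here: the `outputs` dictionary stands in for its reconstructions, and the vectors and the threshold of 1.0 are hypothetical values chosen for illustration.

```python
import math

def reconstruction_errors(inputs, outputs):
    """Euclidean distance between each player's input and output vector."""
    return {pid: math.dist(inputs[pid], outputs[pid]) for pid in inputs}

def flag_suspects(errors, threshold):
    """Return players whose reconstruction error exceeds the threshold."""
    return sorted(pid for pid, err in errors.items() if err > threshold)

# Hypothetical feature vectors fed to the autoencoder.
inputs = {
    "p1": [0.2, 0.4, 0.1],
    "p2": [0.3, 0.5, 0.2],
    "p3": [0.9, 0.0, 5.0],
}
# Hypothetical reconstructions; a real system would get these from the model.
outputs = {
    "p1": [0.25, 0.38, 0.12],
    "p2": [0.28, 0.52, 0.18],
    "p3": [0.4, 0.3, 1.1],
}

errors = reconstruction_errors(inputs, outputs)
suspects = flag_suspects(errors, threshold=1.0)
```

Because the autoencoder is trained mostly on typical players, unusual behavior tends to reconstruct poorly, which is what makes the distance a usable anomaly score.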
12. Recommendation Systems
• Feature engineering is used for item & guild recommendations
• Cosine similarity is applied to normalized generated features
Item Recommendations
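\[ \mathrm{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} \]
\[ \mathrm{weight}_i = \sum_{w} \mathrm{sim}(u, w) \cdot \mathrm{rating}(w, i) \]
where w ranges over the user neighborhood of u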
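The two formulas can be sketched directly in Python. The feature vectors, ratings, and neighborhood size `k` are hypothetical, and taking the neighborhood to be the top-k most similar users is an assumption; the slides do not say how it is chosen.

```python
import math

def cosine_sim(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def item_weights(user, features, ratings, k=2):
    """weight_i = sum over neighbors w of sim(user, w) * rating(w, i)."""
    sims = {w: cosine_sim(features[user], f) for w, f in features.items() if w != user}
    # Assumption: the neighborhood is the k most similar users.
    neighborhood = sorted(sims, key=sims.get, reverse=True)[:k]
    weights = {}
    for w in neighborhood:
        for item, rating in ratings.get(w, {}).items():
            weights[item] = weights.get(item, 0.0) + sims[w] * rating
    return weights

# Hypothetical generated features and item ratings.
features = {"u": [1, 0], "a": [1, 0], "b": [0, 1], "c": [1, 1]}
ratings = {"a": {"sword": 5}, "b": {"shield": 4}, "c": {"sword": 2, "potion": 3}}

weights = item_weights("u", features, ratings)
best_item = max(weights, key=weights.get)
```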
14. FeatureTools
• A Python library for deep feature synthesis
• Represents data as entity sets
• Identifies feature descriptors for transforming your
data into a shallow and wide format
• Open-source version maintained by FeatureLabs
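The idea behind deep feature synthesis, stacking aggregation primitives across related tables, can be illustrated without the library. The sketch below is not the FeatureTools API: the `aggregate` helper, the tables, and the relationship map are hypothetical, and only one depth-2 feature is built.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child table of events: (session_id, amount).
events = [("s1", 2.0), ("s1", 3.0), ("s2", 10.0), ("s3", 1.0)]
# Hypothetical relationship: session_id -> player_id.
session_player = {"s1": "p1", "s2": "p1", "s3": "p2"}

def aggregate(rows, parent_of, primitive):
    """Group (child_key, value) rows by parent and apply one primitive."""
    groups = defaultdict(list)
    for child_key, value in rows:
        groups[parent_of(child_key)].append(value)
    return {parent: primitive(values) for parent, values in groups.items()}

# Depth 1: SUM(events.amount) for each session.
session_totals = aggregate(events, lambda session_id: session_id, sum)

# Depth 2: MEAN(sessions.SUM(events.amount)) for each player.
player_features = aggregate(session_totals.items(), session_player.get, mean)
```

Stacking primitives this way is what produces the shallow and wide output: each combination of primitives and relationships becomes one more column on the player row.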
20. Applying FeatureTools
• We translate our raw tracking events into player summaries
• Supports dozens of games with diverse taxonomies
• Minimizes manual steps in our data science workflows
• Scales to millions of players and billions of records
22. Tech Stack
• Databricks for PySpark
• FeatureTools for generation
• Pandas UDFs for distribution
• MLlib for predictive modeling
23. Pandas UDFs
• Introduced in Spark 2.3
• Provide Scalar and Grouped map operations
• Partitioned using a groupby clause
• Enable distributing code that uses Pandas
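The grouped map pattern is split, apply, combine: rows are partitioned by a key, a function runs on each group independently, and the results are concatenated. The sketch below imitates that flow in a single process with lists of dictionaries. It is a stand-in rather than Spark code; in Spark the function would receive a Pandas DataFrame per group, and `grouped_map` and `summarize_player` are hypothetical names.

```python
from itertools import groupby
from operator import itemgetter

def grouped_map(rows, key, func):
    """Split rows by key, apply func to each group, and concatenate the results."""
    ordered = sorted(rows, key=itemgetter(key))
    out = []
    for _, group in groupby(ordered, key=itemgetter(key)):
        out.extend(func(list(group)))
    return out

def summarize_player(group):
    """Receives every row for one player and returns that player's summary row."""
    return [{
        "player_id": group[0]["player_id"],
        "events": len(group),
        "spend": sum(r["spend"] for r in group),
    }]

# Hypothetical event rows.
rows = [
    {"player_id": "p2", "spend": 1.0},
    {"player_id": "p1", "spend": 2.0},
    {"player_id": "p1", "spend": 3.0},
]

summaries = grouped_map(rows, "player_id", summarize_player)
```

Because each group is processed independently, the per-group function can be shipped to different workers, which is what lets Pandas-based code such as feature generation run across a cluster.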
26. AutoModel System
• Generates hundreds of propensity models
• Powers features in our games & live services
Pipeline: Data Extract → Feature Engineering → Feature Application → Model Training → Model Publish
28. Machine Learning at Zynga
Old Approach
• Custom data science and
engineering work per model
• Months-long development cycles
• Ad-hoc process for deploying
models to production
New Approach
• Minimal effort spent on the
feature engineering stage
• No custom work for new games
• Model outputs are published to
application databases
29. Takeaways
• Zynga is leveraging automated feature engineering to build
Portfolio-Scale data products
• We are using PySpark to scale to tens of millions of players
• Feature generation has unlocked novel data products