This deck covers Riot Games' use of Apache Spark and machine learning to combat abusive language in League of Legends chat. Riot trained Word2Vec and TF-IDF models on months of chat logs to identify toxic language such as "noob" and "rekt". It then used Apache Spark ML to scale out both model complexity and training data size, moving to larger datasets and to neural networks trained with GPUs and TensorFlow. The result helps Riot better shield players from toxicity in games.
3. Choose
From over 130 champions, each having a unique backstory and abilities.
Compete
With your team to complete objectives and battle the enemy team.
Win
Take down defenses and destroy the enemy nexus.
5. In-Game Toxicity
● 1% of all players are consistently unsportsmanlike
● 2% of all games infected by serious toxicity
● 95% of all serious toxicity comes from players who are otherwise sportsmanlike
8. Exploration
Word2Vec
[Figure: Word2Vec diagram. Example chat line: "i was out of mana". Labels: context words w1, w2, ..., wN; target (wt); Σ g(embeddings); Predict nearby words (wt).]
● 256 dimension embeddings
● Month of chat logs
● Each line of chat is a document
● Split on spaces and lower case
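The preprocessing on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Riot's code: the function names `tokenize` and `skipgram_pairs` are hypothetical, the window size is arbitrary, and the pairing assumes a skip-gram-style setup in which each target word is paired with its nearby words.

```python
from typing import Iterator, List, Tuple


def tokenize(line: str) -> List[str]:
    """Lower-case one chat line and split it on spaces (one line = one document)."""
    return [tok for tok in line.lower().split(" ") if tok]


def skipgram_pairs(tokens: List[str], window: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield (target, nearby word) pairs for every word within `window` of the target.

    The window size is an assumption; the slide does not state one.
    """
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, tokens[j]


# Example chat line taken from the slide.
tokens = tokenize("I was out of mana")
pairs = list(skipgram_pairs(tokens, window=1))

print(tokens)
for target, nearby in pairs:
    print(f"{target} -> {nearby}")
```

A Word2Vec trainer consumes pairs like these to learn one embedding per word (256 dimensions in the slide's setup), so that words used in similar contexts end up close together.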
14. Enhancements
● AWS Clusters
● Apache Spark ML
Pros
● Scale out model complexity
● Scale out training data size
Spark Machine Learning Library
Extractors: Word2Vec, TF-IDF, CountVectorizer
Transformers: n-grams, Tokenizer, Standard Scaler
Algorithms: Logistic Regression, Random Forests, Gradient Boosted Trees
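To make the feature stages listed here concrete, the following plain-Python sketch shows what a tokenizer, an n-gram transformer, a count vectorizer, and TF-IDF weighting compute when chained together. It deliberately avoids the Spark API so it runs anywhere; the helper names and the sample chat lines are hypothetical, and the IDF uses the smoothed form log((m + 1) / (df + 1)) documented for Spark's `IDF`. In Spark ML the same steps would be expressed as pipeline stages feeding a classifier such as logistic regression.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(line: str) -> List[str]:
    """Tokenizer stage: lower-case and split on spaces."""
    return [tok for tok in line.lower().split(" ") if tok]


def ngrams(tokens: List[str], n: int = 2) -> List[str]:
    """n-gram stage: join each run of n consecutive tokens into one term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_terms(lines: List[str]) -> List[Counter]:
    """Count-vectorizer stage: term frequencies over unigrams and bigrams."""
    counts = []
    for line in lines:
        tokens = tokenize(line)
        counts.append(Counter(tokens + ngrams(tokens, 2)))
    return counts


def idf_weights(counts: List[Counter]) -> Dict[str, float]:
    """IDF stage: log((m + 1) / (df + 1)), m = number of documents."""
    m = len(counts)
    df: Counter = Counter()
    for c in counts:
        df.update(c.keys())  # each document contributes at most 1 per term
    return {term: math.log((m + 1) / (d + 1)) for term, d in df.items()}


def tfidf(counts: List[Counter], idf: Dict[str, float]) -> List[Dict[str, float]]:
    """Multiply each term frequency by its inverse document frequency."""
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


# Hypothetical chat lines, invented for illustration only.
chat_lines = ["gg wp", "gg noob team", "noob team report"]
counts = count_terms(chat_lines)
idf = idf_weights(counts)
weights = tfidf(counts, idf)

for line, w in zip(chat_lines, weights):
    top = sorted(w.items(), key=lambda kv: -kv[1])[:3]
    print(line, "->", top)
```

Terms that appear in many lines get low weights and rarer terms get high ones, which is what lets a downstream classifier focus on distinctive vocabulary.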
20. Conclusions
helps us to …
● Shield our players from extreme toxicity in games!
● Rapidly explore the space of solutions
● Scale to far larger datasets than we could process before
● Scale hyperparameter searches across neural network architectures