EDHREC @ Data Science MD

A talk on EDHREC, a service for Magic: The Gathering deck recommendations. I discuss the algorithms used, my infrastructure, and some lessons learned about building data science applications.

  1. EDHREC, Magic: TG Recommendation Engine (and data science on games)
       Donald Miner, @donaldpminer, dminer@minerkasch.com
       September 21st, 2015 - Data Science MD Meetup, Games & Stuff in Glen Burnie, MD
  2. About Don
  3. About Don, Planeswalker
  4. Talk agenda
       • Background
       • EDHREC Overview
       • EDHREC Data Analysis
       • EDHREC Architecture
       • Data Science Application UX Lessons Learned
       • Related Work in Magic and Other Domains
       • Virtues of Data Science on Games
  5. Magic: The Gathering
       • Trading card game
       • First published in 1993
       • 20 million players in 2015 (World of Warcraft has 7.1 million subscribers)
       • Organized tournaments
       • Secondary market (slide image: 1993, $27,000)
  6. Elder Dragon Highlander / Commander
       • One of the Magic “formats”
       • Started independently of WotC (Wizards of the Coast) in the late 2000s
       • Officially supported starting in 2011
       • Typically multiplayer
       • 100-card singleton deck (instead of a 60-card deck with up to 4 copies of each card)
       • Each deck has a single “commander” (unique to this format)
  7. Data Science
       • Term popularized around 2008
       • Represents a shift in how industry does data analysis
       • A mix of computer science, machine learning, statistics, programming, visualization, and domain knowledge
  8. EDHREC Overview
  9. EDHREC Deck Recommendations
  10. EDHREC Commander Stats
  11. EDHREC Card Stats
  12. EDHREC Recommendation Engine
  13. EDHREC Algorithm 1.0: User-based Collaborative Filtering
       • Analogy: Deck -> User, Card -> Item
       • Pros:
           • Better at picking up bigger themes in decks
           • Easy to implement
       • Cons:
           • Had issues discovering subtle deck themes
           • Had issues pointing out combos
       (Image from http://blog.comsysto.com/2013/04/03/background-of-collaborative-filtering-with-mahout/)
  14. Recommendation Engine 2.0 Algorithm, Step 1: Card Affinity Matrix
       • 31,000 decks
       • Jaccard / Tanimoto similarity between each pair of cards, e.g.:
           (decks that contain Sanguine Bond AND Exquisite Blood) ÷ (decks that contain Sanguine Bond OR Exquisite Blood)
       • Repeat for every card combination (15,000 cards)
       • This is the basis of the Card Analysis page
       • This matrix is built offline in batch
       (Image from http://blog.comsysto.com/2013/04/03/background-of-collaborative-filtering-with-mahout/)
  15. Recommendation Engine 2.0 Algorithm, Step 2: Calculate Scores
       • 31,000 decks
       • For a deck D:
           1. Select each row of the Tanimoto matrix corresponding to the cards in deck D
           2. Sum the columns
           3. Sort by score and display the results
       • This gives you a sum of the Tanimoto coefficients
       • I really have no idea what this algorithm is called… I’m not sure if it’s novel or not
       • This is performed in real time
  16. Lessons learned: Taking out the garbage
       • A lot of garbage gets submitted to EDHREC:
           • Decks with <20 cards
           • Decks with invalid commanders
           • Decks with illegal cards
       • The algorithms handle this well, and problem cards rarely show up
       • However, pruning “worthless” decks significantly improves performance because of all the O(N^2) algorithms involved
       General advice: Think about which pieces of data in your data set are worthless
  17. Lessons learned: Partitioning (too much or too little)
       • Partitioning the user/deck space into subgroups is a great way to speed up recommendation engines
       • The 31,000 EDHREC decks are split into 27 partitions (one per possible color combination)
       • Algorithms typically run on a single partition (e.g., Red/Blue deck recommendations only come from other Red/Blue decks)
       • However, themes that span color combinations get worse recommendations
       • Partitioning too deeply also causes problems:
           • I tried partitioning by commander, and that was awful: new commanders and themes that span commanders suffer
       General advice: There is no good way to figure out a partition scheme; just try it out
  18. EDHREC Architecture
  19. EDHREC Architecture [diagram: Batch Processes (cron), Reddit Bot (praw)]
  20. Redis
       • In-memory key/value data store
       • Stores website state
       • Used as a cache
       • Stores all of the decks
       • Stores all of the pre-computed stats
       • Stores all metadata about Magic cards
       • EDHREC serializes most things to common internal JSON data formats
       • Very fast
       • Very easy to use
       • Good support in Python
       • Getting harder to do “analysis” with it
       • Going to move to a Redshift SQL database for analytical work
  21. CherryPy
       • “A Minimalist Python Web Framework”
       • Runs the website
       • Pulls data from Redis and then renders the results as HTML
       • Most of the data from Redis is cached in in-memory objects (IPC to Redis is too slow)
       • EDHREC runs 6 instances in parallel behind an NGINX round-robin proxy
       • Very easy to use, doesn’t get in your way
       • Very easy to expose Python data science
       • Running into maintainability problems due to my own sloppiness
  22. Python
       • Programming language
       • Plenty of good libraries for data analysis (NumPy and pandas in this case)
       • Can handle the “full stack” well (from data analysis to web front end)
       • PRAW is a great framework for building Reddit bots
       • Most things run every few hours
  23. Amazon Web Services
       • Infrastructure as a Service
       • Easily spin up new servers with a pre-built operating system
       • EDHREC runs on one m4.2xlarge: 8 CPUs, 32 GB RAM, better networking, 10 cents per hour ($72/month)
       • Great for recovering from failures
       • Easy to upgrade the machine
       • Very good uptime so far
       • Easy to back up to S3
  24. Some observations about User Experience and AI applications
  25. LOL! Look at the dumb bot! (just a couple of examples)
       Lesson learned: Humans LOVE pointing out when an AI does something strange or wrong, even if it gets things right 90% of the time. Therefore, I am very conservative about what I end up publishing, since I’ve been burned a few times. That can be a shame sometimes.
  26. The apocalypse is near
       • “EDHREC is ruining EDH/Commander”
       • “EDHREC is taking the fun out of deck construction”
       • “EDHREC kills conversation”
       • MapQuest takes the fun out of planning trips!
       • I mostly take these as compliments
       • AI is going to meet resistance from people who liked the manual labor
       • I don’t think the commentary is entirely off base… but...
  27. Sometimes too much is too much
       • Over-engineering and doing too much is an easy trap
       • You want to make it better and provide more “intelligence”
       • Give users the ability to discover and find things:
           • Increases user engagement
           • Better results
       • Philosophy: EDHREC is a tool, not a solution
       • I’m starting to see my other data science projects this way
       Lesson learned: Spend more time on interactive “discovery tools” than on intelligent do-everything algorithms
  28. Interesting related things to look at
  29. RoboRosewater
       • Rosewater is the name of Magic’s lead designer (Mark Rosewater)
       • RoboRosewater is a “backwards” neural network, trained on Magic cards
  30. MTG Finance
       • Lots of analysis around Magic finance!
       • mtgstocks.com
  31. Diablo 3 build clustering
  32. Virtues of this whole thing
       • Community
           • Most hobbies are defined by communities
           • Technology can bring communities together
       • Self-Development
           • Data has value, and getting valuable data is hard
           • Hobby-based data is relatively easy to acquire (compared to, say, data used by health care companies)
           • A great way to do real data science on real data (as opposed to working with synthetic data in place of a more valuable data set)
       • Profit!
           • Hobbyists are passionate about their hobby and willing to spend money on it
           • They will pay for and support services they like
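Slide 13 describes EDHREC's first approach, user-based collaborative filtering with decks as users and cards as items. The slides do not show the implementation, so this is only a minimal sketch under stated assumptions: decks are plain sets of card names, deck-to-deck similarity is Jaccard overlap, and the helper names `jaccard` and `user_based_recommend` and the parameters `k` and `top_n` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of cards two decks have in common: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def user_based_recommend(deck: Set[str], corpus: List[Set[str]],
                         k: int = 2, top_n: int = 5) -> List[Tuple[str, float]]:
    """Recommend cards from the k decks ("users") most similar to `deck` (hypothetical helper)."""
    # 1. Rank the other decks by similarity to the target deck; keep the top k.
    neighbors = sorted(corpus, key=lambda other: jaccard(deck, other), reverse=True)[:k]
    # 2. Each neighbor votes for its cards, weighted by how similar it is.
    scores: Dict[str, float] = defaultdict(float)
    for other in neighbors:
        weight = jaccard(deck, other)
        if weight <= 0:
            continue  # an unrelated deck contributes nothing
        for card in other - deck:  # only cards the target deck lacks
            scores[card] += weight
    # 3. Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Because whole decks are compared, shared broad themes dominate the neighbor ranking, which fits the slide's observation that this approach picks up bigger themes but misses subtle ones.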
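Step 1 on slide 14 (the card affinity matrix) can be sketched with NumPy, which slide 22 lists among EDHREC's libraries. This is an illustrative version, not EDHREC's code: it assumes decks are sets of card names and builds a dense deck-by-card 0/1 matrix, so `X.T @ X` counts the decks containing both cards of each pair, and inclusion-exclusion gives the decks containing either. The function name `card_affinity_matrix` is hypothetical.

```python
import numpy as np
from typing import List, Set

def card_affinity_matrix(decks: List[Set[str]], cards: List[str]) -> np.ndarray:
    """Pairwise Jaccard/Tanimoto similarity between cards over a deck corpus (sketch)."""
    index = {card: i for i, card in enumerate(cards)}
    # Binary deck-by-card matrix: X[d, c] = 1 if deck d contains card c.
    X = np.zeros((len(decks), len(cards)), dtype=np.int64)
    for d, deck in enumerate(decks):
        for card in deck:
            if card in index:
                X[d, index[card]] = 1
    both = X.T @ X                                     # decks containing card i AND card j
    counts = np.diag(both)                             # decks containing card i
    either = counts[:, None] + counts[None, :] - both  # decks containing card i OR card j
    # Cards that never appear have either == 0; define their similarity as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(either > 0, both / either, 0.0)
```

At roughly 15,000 cards, a dense float64 matrix like this needs about 1.8 GB (15,000² × 8 bytes). The diagonal is 1 for every card that appears in at least one deck.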
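Step 2 on slide 15 (select the deck's rows, sum the columns, sort) translates almost directly into array operations. Summing item-to-item similarities over the items a user already has is the usual scoring rule in item-based collaborative filtering, so that is a plausible name for the technique, although the slide leaves the question open. In this sketch, `score_deck` is a hypothetical name, `sim` is any square similarity matrix whose rows and columns follow the order of `cards`, and excluding cards already in the deck is an assumption the slide does not state.

```python
import numpy as np
from typing import Iterable, List, Tuple

def score_deck(sim: np.ndarray, cards: List[str], deck: Iterable[str],
               top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank candidate cards by summed similarity to every card in the deck (sketch)."""
    index = {card: i for i, card in enumerate(cards)}
    rows = np.asarray(sorted({index[c] for c in deck if c in index}), dtype=np.intp)
    # Steps 1 and 2: take the deck's rows and sum down each column.
    scores = sim[rows, :].sum(axis=0).astype(float)
    # Assumption: never recommend a card the deck already contains.
    scores[rows] = -np.inf
    # Step 3: sort by score, highest first (stable sort keeps card order on ties).
    order = np.argsort(-scores, kind="stable")
    return [(cards[i], float(scores[i])) for i in order[:top_n] if np.isfinite(scores[i])]
```

Only one row per deck card is read at request time, which is consistent with the slide's split between the offline matrix build and real-time scoring.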
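Slide 16's garbage filter can be a simple predicate applied before any of the O(N^2) work. The sketch below is hypothetical throughout: a deck is modeled as a `(commander, cards)` tuple, `legal_commanders` and `banned_cards` stand in for whatever legality data EDHREC actually uses, and "illegal cards" is simplified to membership in a banned set (real Commander legality also depends on the commander's color identity).

```python
from typing import List, Set, Tuple

Deck = Tuple[str, Set[str]]  # hypothetical (commander, cards) representation
MIN_CARDS = 20               # from the slide: decks with <20 cards are garbage

def is_worthwhile(deck: Deck, legal_commanders: Set[str], banned_cards: Set[str]) -> bool:
    commander, cards = deck
    if len(cards) < MIN_CARDS:
        return False  # too small to say anything about a theme
    if commander not in legal_commanders:
        return False  # invalid commander
    if cards & banned_cards:
        return False  # contains an illegal card (simplified here to a banned list)
    return True

def prune(decks: List[Deck], legal_commanders: Set[str], banned_cards: Set[str]) -> List[Deck]:
    """Drop worthless decks before running the pairwise (O(N^2)) algorithms."""
    return [d for d in decks if is_worthwhile(d, legal_commanders, banned_cards)]
```

Because pairwise work grows with the square of the number of decks, dropping even 10% of the corpus removes about 19% of the pairs (1 - 0.9^2 = 0.19).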
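Slide 17's color partitioning amounts to grouping decks by a canonical color-identity key so that the recommendation algorithms only ever see one group. A minimal sketch, assuming each deck is a dict with a hypothetical `colors` field holding single-letter color codes (W, U, B, R, G); the function names are also hypothetical, and the key uses the conventional WUBRG ordering.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

WUBRG = "WUBRG"  # conventional Magic color order: white, blue, black, red, green

def color_key(colors: Iterable[str]) -> str:
    """Canonical partition key, e.g. {'R', 'U'} -> 'UR'; colorless decks get ''."""
    present = set(colors)
    return "".join(c for c in WUBRG if c in present)

def partition_by_colors(decks: List[dict]) -> Dict[str, List[dict]]:
    """Group decks into one partition per color combination."""
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for deck in decks:
        partitions[color_key(deck["colors"])].append(deck)
    return dict(partitions)

def candidate_decks(deck: dict, partitions: Dict[str, List[dict]]) -> List[dict]:
    """Decks allowed to feed recommendations for `deck`: only its own color partition."""
    return [d for d in partitions.get(color_key(deck["colors"]), []) if d is not deck]
```

Replacing `color_key` with a commander-name key would reproduce the deeper partitioning the slide warns against: every new commander starts with an empty partition.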
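Slide 20 says EDHREC keeps decks in Redis, serialized to internal JSON formats. The sketch below shows that pattern without depending on a live server. The `save_deck`/`load_deck` helpers, the `deck:<id>` key scheme, the JSON field names, and the `DictStore` stand-in are all hypothetical; `store` is any object with redis-py-style `set(name, value)` and `get(name)` methods.

```python
import json
from typing import Dict, List, Optional, Protocol, Union

class KeyValueStore(Protocol):
    """The small subset of a redis-py client this sketch relies on."""
    def set(self, name: str, value: str) -> object: ...
    def get(self, name: str) -> Optional[Union[str, bytes]]: ...

class DictStore:
    """In-memory stand-in with the same set/get shape, for trying this without Redis."""
    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
    def set(self, name: str, value: str) -> bool:
        self.data[name] = value.encode("utf-8")  # mimic redis-py's default bytes replies
        return True
    def get(self, name: str) -> Optional[bytes]:
        return self.data.get(name)

def deck_key(deck_id: str) -> str:
    return f"deck:{deck_id}"  # hypothetical key naming scheme

def save_deck(store: KeyValueStore, deck_id: str, commander: str, cards: List[str]) -> None:
    # One shared JSON format, so the bot, batch jobs, and web processes all read the same shape.
    payload = json.dumps({"commander": commander, "cards": sorted(cards)})
    store.set(deck_key(deck_id), payload)

def load_deck(store: KeyValueStore, deck_id: str) -> Optional[dict]:
    raw = store.get(deck_key(deck_id))
    if raw is None:
        return None  # unknown deck id
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")  # redis-py returns bytes unless decode_responses=True
    return json.loads(raw)
```

With redis-py installed and a local server running, `redis.Redis(decode_responses=True)` would satisfy the same interface; without `decode_responses`, `get` returns bytes, which `load_deck` also accepts.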
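Slide 21 notes that most Redis data is cached in in-memory objects because IPC to Redis was too slow. One common way to do that is a small time-to-live cache in front of the Redis lookup. The `TTLCache` class, its `loader` callback, and the 300-second default are hypothetical choices, not details from the talk.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

class TTLCache:
    """In-process cache that re-fetches a key once its entry is older than ttl_seconds."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader  # e.g. a function wrapping a Redis GET (assumption)
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]    # fresh hit: no round trip to the backing store
        value = self._loader(key)  # miss or stale entry: fetch and remember
        self._entries[key] = (now, value)
        return value
```

Since EDHREC runs 6 CherryPy processes behind NGINX, each process would hold its own copy of a cache like this, so memory use multiplies by the process count and processes can briefly serve different versions of the same data.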

Notes

  • Building a Magic: The Gathering card game recommendation engine and using data science on data about hobbies
    In this talk, Don will give an overview of edhrec.com, a service that provides recommendations for a specific style of play in the Magic: The Gathering trading card game called Commander. The service takes user-created "decks", saves them in a database, and then recommends other cards that the user should consider adding to their deck. The website has been around for about a year and, as of September 2015, is visited by over 50,000 players a month. The talk, however, is geared towards people who don't know anything about Magic or Commander, and most of the time will be spent discussing:
      • the methods and approaches used, specifically recommendation engines and the common problems of using them in practice
      • lessons learned about the human factor of running a data-driven service for a passionate hobbyist population that doesn't know much about data science or even computer science
      • the virtues of spending time analyzing data from seemingly "toy" domains