Proof of concept - xavierfeltin/mtg_data_mining GitHub Wiki
Proof of concept (POC)
This POC aims to show the value of this research on Latent Semantic Analysis, Collaborative Filtering and TopN recommendations applied to the card game Magic: The Gathering.
The POC is available here: https://xavierfeltin.github.io/mtg_data_mining/
Structure of the interface
The interface is divided into three parts:
- Left side: user deck with its configuration and the cards that the user has selected
- Right side: the main part displaying the available cards, the recommendations, ...
- Top: the navigation between the card catalog, the deck visualizer and the form to create a new deck
Main functionalities
Create and manage a deck
The user can select the colors and the game mode for the deck. A new deck is empty when created, and creating one replaces the existing deck.
Get recommendations based on a card's information
When the user views a card's details, the application displays recommendations based on how similar their content (color, cost, description, ...) is to that of the consulted card. (Latent Semantic Analysis)
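As a rough illustration of the ranking step, the sketch below assumes that each card has already been projected to a small latent vector by an offline LSA step. The vectors, the card names and the `most_similar` helper are made up for the example and are not taken from the project's code.

```python
import math

# Hypothetical latent vectors (one per card), as an offline LSA step might produce.
CARD_VECTORS = {
    "card_a": [0.9, 0.1, 0.0],
    "card_b": [0.8, 0.2, 0.1],
    "card_c": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(card, vectors, top_n=5):
    """Rank the other cards by cosine similarity to `card`, best first."""
    scores = [(other, cosine(vectors[card], vec))
              for other, vec in vectors.items() if other != card]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

print(most_similar("card_a", CARD_VECTORS, top_n=2))
```

Cosine similarity is a common choice for comparing LSA vectors because it ignores vector length and only compares direction; the POC may use a different measure.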
Get recommendations based on a card use in previous games
When the user views a card's details, the application displays recommendations based on how often they have been played with the consulted card in previous games. (Collaborative Filtering)
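The idea of "played together" can be sketched with an item-to-item similarity computed from deck lists. The example below assumes a cosine similarity over co-occurrence counts, which is one common formulation; the exact formula used by the project is not stated on this page, and the decks and function names are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations

# Toy decks, each a set of card names (illustrative data only).
DECKS = [
    {"card_a", "card_b", "card_c"},
    {"card_a", "card_b"},
    {"card_a", "card_d"},
    {"card_c", "card_d"},
]

def item_similarities(decks):
    """Cosine similarity between cards, based on the decks they share."""
    support = Counter(card for deck in decks for card in deck)
    together = Counter()
    for deck in decks:
        for a, b in combinations(sorted(deck), 2):
            together[(a, b)] += 1
    sims = {}
    for (a, b), count in together.items():
        score = count / math.sqrt(support[a] * support[b])
        sims.setdefault(a, {})[b] = score
        sims.setdefault(b, {})[a] = score
    return sims

def played_with(card, sims, top_n=5):
    """Cards most often played alongside `card`, best first (ties by name)."""
    ranked = sorted(sims.get(card, {}).items(), key=lambda p: (-p[1], p[0]))
    return ranked[:top_n]

SIMS = item_similarities(DECKS)
print(played_with("card_a", SIMS))
```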
Get recommendations based on the cards selected in the deck
When the user browses the cards available for the colors and mode of the deck, the application displays recommendations based on how often they have been played in previous games with the cards selected in the deck. (TopN recommendation using Bayesian Personalized Ranking)
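How the POC turns the trained coefficients into deck-level scores is not described on this page. As a minimal sketch, assuming the model exposes one coefficient per pair of cards, a candidate's score for a deck can be taken as the sum of its coefficients with the cards already selected. The JSON layout, the values and the `recommend_for_deck` helper are hypothetical.

```python
import json

# Hypothetical item-to-item coefficients, in the spirit of a JSON model file.
MODEL_JSON = """
{
  "card_a": {"card_b": 0.8, "card_c": 0.1, "card_d": 0.4},
  "card_b": {"card_a": 0.8, "card_c": 0.3, "card_d": 0.2},
  "card_c": {"card_a": 0.1, "card_b": 0.3, "card_d": 0.9},
  "card_d": {"card_a": 0.4, "card_b": 0.2, "card_c": 0.9}
}
"""

def recommend_for_deck(deck, coefficients, top_n=5):
    """Score each card outside the deck by summing its coefficients with the deck's cards."""
    scores = {}
    for candidate, row in coefficients.items():
        if candidate in deck:
            continue
        scores[candidate] = sum(row.get(card, 0.0) for card in deck)
    return sorted(scores.items(), key=lambda p: (-p[1], p[0]))[:top_n]

coefficients = json.loads(MODEL_JSON)
print(recommend_for_deck({"card_a", "card_b"}, coefficients))
```

Because the score only needs a lookup and a sum per candidate, this kind of computation is cheap enough to run in the browser each time the deck changes.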
Visualize decks used in modeling
The user can view the cards of all the decks used to generate the TopN model.
All the recommendations are calculated in real time using model coefficients loaded from JSON files. The models were computed beforehand with the Python modules developed in this project.
Although Association Rules have been studied, they are not used in this POC at the moment.
Data
Which data?
The data used to compute the recommendations comes from:
- MTGJson: JSON containing the description and other data (mana cost, multiverseid, ...) for all the cards in Magic.
- MTGDeck: in partnership with the webmaster, who shared 5000 decks of each mode (commander, legacy, standard, pauper and vintage) for this project.
The card images used in the web interface come from "gatherer.wizards.com". After asking Wizards and reading the fan content policy, it seems acceptable to use them in this context (free, non-commercial, fan content, ...).
Deck distribution
The decks are distributed as follows:
Colors | Commander | Legacy | Pauper | Standard | Vintage |
---|---|---|---|---|---|
Black | 117 | 99 | 99 | 58 | 1 |
Blue | 494 | 9 | 127 | 14 | 3 |
Green | 407 | 0 | 35 | 47 | 0 |
Red | 269 | 42 | 237 | 388 | 2 |
White | 47 | 41 | 4 | 41 | 0 |
No Color | 0 | 11 | 0 | 0 | 0 |
Black Blue | 22 | 60 | 521 | 282 | 22 |
Black Green | 103 | 32 | 43 | 160 | 0 |
Black Red | 17 | 193 | 22 | 337 | 7 |
Black White | 53 | 160 | 89 | 172 | 0 |
Blue Green | 7 | 0 | 103 | 80 | 0 |
Blue Red | 104 | 18 | 629 | 80 | 36 |
Blue White | 145 | 32 | 129 | 393 | 135 |
Green Red | 13 | 7 | 188 | 144 | 0 |
Green White | 49 | 21 | 57 | 129 | 3 |
Red White | 23 | 19 | 104 | 52 | 5 |
Black Blue Green | 301 | 350 | 29 | 215 | 115 |
Black Blue Red | 338 | 332 | 162 | 417 | 96 |
Black Blue White | 16 | 184 | 22 | 185 | 24 |
Black Green Red | 82 | 110 | 378 | 52 | 1 |
Black Green White | 58 | 315 | 19 | 37 | 11 |
Black Red White | 340 | 160 | 75 | 188 | 4 |
Blue Green Red | 168 | 13 | 264 | 21 | 47 |
Blue Green White | 37 | 16 | 26 | 85 | 47 |
Blue Red White | 147 | 45 | 182 | 20 | 162 |
Green Red White | 53 | 5 | 182 | 43 | 4 |
Black Blue Green Red | 66 | 1326 | 162 | 35 | 130 |
Black Blue Green White | 202 | 377 | 4 | 23 | 35 |
Black Green Red White | 129 | 266 | 22 | 6 | 9 |
Blue Green Red White | 57 | 4 | 669 | 23 | 49 |
Black Blue Green Red White | 31 | 339 | 410 | 33 | 2816 |
As said before, the strategies and choices made by players depend heavily on the mode and the colors played. The table clearly shows different color distributions depending on the mode. Some card combinations may be strong in one mode and weak in others.
Note: since this project works with 5000 decks per mode, some color categories contain very few decks. The lower limit for reliable results from the deck samples is set to 30 decks. Categories with fewer than 30 decks are displayed in bold in the table above. The item scores computed by the collaborative filtering for these categories are not really representative, so be careful when consulting the results.
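The 30-deck limit can be checked mechanically. The sketch below uses a few Commander counts copied from the table above; the `unreliable_categories` helper is a hypothetical name, not part of the project.

```python
MIN_DECKS = 30  # lower limit for reliable results, as stated above

# A few Commander deck counts copied from the table above.
COMMANDER_DECKS = {
    "Black": 117,
    "White": 47,
    "Black Blue": 22,
    "Black Red": 17,
    "Blue Green": 7,
}

def unreliable_categories(counts, minimum=MIN_DECKS):
    """Return the categories whose deck count is below the minimum, sorted by name."""
    return sorted(name for name, n in counts.items() if n < minimum)

print(unreliable_categories(COMMANDER_DECKS))
```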
Card distribution
The 5505 cards are distributed across the decks from every mode (around 25 000 decks) as follows:
Most of the cards are present in less than 10% of the decks. Thus, 85-90% of the cards are played rarely relative to the number of decks. The remaining 10-15% are common cards used to structure a deck.
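The figures above are statements about card support, meaning the fraction of decks that contain a given card. A minimal way to compute it is sketched below on toy data; the function names and the decks are assumptions for illustration, and the 10% threshold is the one quoted above.

```python
from collections import Counter

def card_support(decks):
    """Fraction of decks in which each card appears at least once."""
    counts = Counter(card for deck in decks for card in set(deck))
    return {card: n / len(decks) for card, n in counts.items()}

def share_below(support, threshold=0.10):
    """Share of cards whose support is strictly below the threshold."""
    rare = sum(1 for value in support.values() if value < threshold)
    return rare / len(support)

# Toy data: 20 decks, one staple present in every deck and one unique card per deck.
DECKS = [["staple", f"niche_{i}"] for i in range(20)]
support = card_support(DECKS)
print(support["staple"], share_below(support))
```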
With this distribution, the collaborative filtering approach is meaningful since most of the cards have almost the same support across the database (see the collaborative filtering page for more information).
Technical considerations
All the data processing for this project was done on my laptop, without external resources. The models (more than 100) were computed in 30 minutes using 6 cores in parallel.
The web application is deployed on GitHub Pages. As such, it does not have a back-end server. It relies only on JSON files for external data and on the Magic: The Gathering web service for card images.
A database would help to develop more efficient sorting and filtering functionalities. For example, the website would then not suggest text recommendations for cards that do not match the deck colors chosen by the player.
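The color filter mentioned above could look like the following sketch. It assumes that a card matches a deck when all of its colors are among the deck's colors, and that each recommendation carries a `colors` list; both the matching rule and the record layout are assumptions, not the project's actual data model.

```python
def matches_deck_colors(card_colors, deck_colors):
    """True if every color of the card is in the deck (colorless cards always match)."""
    return set(card_colors) <= set(deck_colors)

def filter_recommendations(recommendations, deck_colors):
    """Keep only the recommended cards that match the deck's colors, preserving order."""
    return [card for card in recommendations
            if matches_deck_colors(card["colors"], deck_colors)]

# Hypothetical recommendation records; the real JSON layout may differ.
RECOMMENDATIONS = [
    {"name": "card_a", "colors": ["Red"]},
    {"name": "card_b", "colors": ["Blue", "Green"]},
    {"name": "card_c", "colors": []},
    {"name": "card_d", "colors": ["Red", "White"]},
]

kept = filter_recommendations(RECOMMENDATIONS, deck_colors=["Red", "White"])
print([card["name"] for card in kept])
```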
Conclusion
This POC shows the possibilities of data science approaches applied to a complex game such as Magic.
Existing websites with hundreds of thousands of decks at their disposal would be able to generate more meaningful recommendations for their players.
However, they need to remain careful. Magic: The Gathering evolves regularly with the addition of new cards, and old play styles may be abandoned by players. Including decks that are too old in the recommendation process may lead to disappointing results for the players.