Proof of concept - xavierfeltin/mtg_data_mining GitHub Wiki

Proof of concept (POC)

This POC aims to demonstrate the value of this research on Latent Semantic Analysis, Collaborative Filtering and TopN recommendations applied to the card game Magic: The Gathering.

The POC is available here: https://xavierfeltin.github.io/mtg_data_mining/

Structure of the interface

The interface is divided into three parts:

  • Left side: user deck with its configuration and the cards that the user has selected
  • Right side: the main part displaying the available cards, the recommendations, ...
  • Top: the navigation between the card catalog, the deck visualizer and the form to create a new deck

Main functionalities

Create and manage a deck

The user can select the colors and the game mode of the deck. A newly created deck is empty, and creating a new deck replaces the existing one.

Get recommendations based on a card's information

When the user views a card's details, the application displays recommendations based on how similar the content of other cards (color, cost, description, ...) is to that of the consulted card (Latent Semantic Analysis).
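As a rough illustration of the content-based lookup described above, here is a minimal sketch. It assumes each card has already been projected into a latent space by the Latent Semantic Analysis step; the card identifiers, the vectors and the function names are hypothetical, and cosine similarity is only one plausible way to compare cards.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two latent vectors (0.0 if one is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_cards(card_id, latent_vectors, top_n=5):
    """Return the top_n cards whose latent vectors are closest to card_id."""
    target = latent_vectors[card_id]
    scores = [
        (other, cosine_similarity(target, vector))
        for other, vector in latent_vectors.items()
        if other != card_id  # do not recommend the consulted card itself
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical latent vectors (assumption: 3 latent dimensions, invented values).
latent_vectors = {
    "card_a": [0.9, 0.1, 0.0],
    "card_b": [0.8, 0.2, 0.1],
    "card_c": [0.0, 0.1, 0.9],
}

closest = most_similar_cards("card_a", latent_vectors, top_n=1)
```

With these toy vectors, `card_b` is ranked ahead of `card_c` for `card_a` because it points in almost the same direction in the latent space.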

Get recommendations based on a card's use in previous games

When the user views a card's details, the application displays recommendations based on how often other cards have been played with the consulted card in previous games (Collaborative Filtering).
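The idea of "played together in previous games" can be sketched with raw co-occurrence counts over decks. This is only an illustration under stated assumptions: the decks below are invented toy data, the function names are hypothetical, and the project's actual item scores may be normalized differently (see the collaborative filtering page).

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(decks):
    """Count, for each card, how many decks it shares with every other card."""
    counts = {}
    for deck in decks:
        # Use a set so that several copies of a card in one deck count once.
        for a, b in combinations(sorted(set(deck)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def played_with(card, cooccurrence, top_n=5):
    """Return the top_n cards most often found in the same decks as card."""
    return cooccurrence.get(card, Counter()).most_common(top_n)


# Toy decks (assumption: invented for the example, not taken from the dataset).
decks = [
    ["Forest", "Llanowar Elves", "Giant Growth"],
    ["Forest", "Llanowar Elves"],
    ["Island", "Counterspell"],
]

cooccurrence = build_cooccurrence(decks)
neighbors = played_with("Forest", cooccurrence)
```

Here "Llanowar Elves" comes first for "Forest" because the two cards share two decks, against one deck for "Giant Growth".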

Get recommendations based on the cards selected in the deck

When the user browses the cards available for the colors and mode of the deck, the application displays recommendations based on how often they have been played in previous games with the cards already selected in the deck (TopN recommendation using Bayesian Personalized Ranking).
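The wiki does not detail how the deck is scored against the TopN model, so the following is only a sketch of one possible scheme: it assumes the trained model can be reduced to item-to-item coefficients and ranks each candidate card by the sum of its coefficients with the cards already in the deck. The coefficient values, card identifiers and function name are hypothetical, and the training of the coefficients with Bayesian Personalized Ranking is not shown.

```python
def recommend_for_deck(deck, coefficients, candidates, top_n=3):
    """Rank candidate cards by their summed affinity with the cards in the deck."""
    in_deck = set(deck)
    scores = {}
    for candidate in candidates:
        if candidate in in_deck:
            continue  # never recommend a card that is already selected
        row = coefficients.get(candidate, {})
        scores[candidate] = sum(row.get(card, 0.0) for card in in_deck)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical item-to-item coefficients (assumption: invented values).
coefficients = {
    "card_c": {"card_a": 0.5, "card_b": 0.25},
    "card_d": {"card_a": 0.125},
    "card_e": {},
}

deck = ["card_a", "card_b"]
ranking = recommend_for_deck(
    deck, coefficients, ["card_a", "card_c", "card_d", "card_e"]
)
```

In this toy case `card_c` is ranked first because it has a positive coefficient with both cards of the deck, while `card_a` is skipped since it is already selected.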

Visualize decks used for modeling

The user can visualize the cards of all the decks used during the generation of the TopN model.

All the recommendations are computed in real time from the model coefficients loaded from JSON files. The models were computed beforehand with the Python modules developed in this project.

Although Association Rules have also been studied, they are not used in this POC at the moment.
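To make the "precomputed coefficients, real-time lookup" split concrete, here is a minimal sketch. The JSON layout (one object per card mapping neighbor cards to weights) is an assumption made for the example, not the project's actual file format, and the function names are hypothetical.

```python
import json

# Hypothetical JSON payload (assumption: the real files may be laid out differently).
raw = '{"card_a": {"card_b": 0.8, "card_c": 0.1}, "card_b": {"card_a": 0.8}}'


def load_coefficients(text):
    """Parse the JSON text into a nested dict: card -> {other card -> weight}."""
    data = json.loads(text)
    return {
        card: {other: float(weight) for other, weight in row.items()}
        for card, row in data.items()
    }


def top_neighbors(card, coefficients, top_n=5):
    """Look up the precomputed weights of a card and return the best ones."""
    row = coefficients.get(card, {})
    return sorted(row.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


coefficients = load_coefficients(raw)
best = top_neighbors("card_a", coefficients, top_n=1)
```

The expensive part (building the coefficients) happens offline; at display time only a dictionary lookup and a sort are needed, which is what makes a server-less deployment workable.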

Data

Which data?

The data used to compute the recommendations come from:

  • MTGJson: JSON data containing the description and other attributes (mana cost, multiverseid, ...) of all the cards in Magic.
  • MTGDeck: through a partnership with the webmaster, who shared 5000 decks of each mode (commander, legacy, standard, pauper and vintage) for this project.

The card images used in the web interface come from "gatherer.wizards.com". After asking Wizards and reading the fan content policy, it seems acceptable to use them in this context (free, non-commercial, fan content, ...).

Deck distribution

The decks are divided as follows:

Colors                      Commander  Legacy  Pauper  Standard  Vintage
Black                             117      99      99        58        1
Blue                              494       9     127        14        3
Green                             407       0      35        47        0
Red                               269      42     237       388        2
White                              47      41       4        41        0
No Color                            0      11       0         0        0
Black Blue                         22      60     521       282       22
Black Green                       103      32      43       160        0
Black Red                          17     193      22       337        7
Black White                        53     160      89       172        0
Blue Green                          7       0     103        80        0
Blue Red                          104      18     629        80       36
Blue White                        145      32     129       393      135
Green Red                          13       7     188       144        0
Green White                        49      21      57       129        3
Red White                          23      19     104        52        5
Black Blue Green                  301     350      29       215      115
Black Blue Red                    338     332     162       417       96
Black Blue White                   16     184      22       185       24
Black Green Red                    82     110     378        52        1
Black Green White                  58     315      19        37       11
Black Red White                   340     160      75       188        4
Blue Green Red                    168      13     264        21       47
Blue Green White                   37      16      26        85       47
Blue Red White                    147      45     182        20      162
Green Red White                    53       5     182        43        4
Black Blue Green Red               66    1326     162        35      130
Black Blue Green White            202     377       4        23       35
Black Green Red White             129     266      22         6        9
Blue Green Red White               57       4     669        23       49
Black Blue Green Red White         31     339     410        33     2816

As said before, the strategies and choices made by the players depend heavily on the mode and the colors played. The table clearly shows different color distributions depending on the mode. Some card combinations may be strong in one mode and weak in others.

Note: since this project works with 5000 decks per mode, some categories have very few decks. The lower limit for reliable results from the deck samples is set to 30 decks. Categories with fewer than 30 decks are displayed in bold in the table above. The item scores computed by collaborative filtering for these categories are not really representative, so be careful when consulting the results.
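The 30-deck rule above can be expressed as a small check. The sketch below uses four counts copied from the table; the function name and the dictionary layout keyed by (colors, mode) are assumptions made for the example.

```python
MIN_DECKS = 30  # lower limit for reliable results, as stated in the note above


def unreliable_categories(deck_counts, min_decks=MIN_DECKS):
    """Return the (colors, mode) categories with fewer than min_decks decks."""
    return sorted(
        category for category, count in deck_counts.items() if count < min_decks
    )


# A few counts taken from the table above, keyed by (colors, mode).
deck_counts = {
    ("White", "Commander"): 47,
    ("White", "Pauper"): 4,
    ("Blue", "Legacy"): 9,
    ("Red", "Standard"): 388,
}

flagged = unreliable_categories(deck_counts)
```

For this subset, the mono-blue Legacy and mono-white Pauper categories fall under the limit, so their collaborative filtering scores should be read with caution.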

Card distribution

Here is the distribution of the 5505 cards across the decks from every mode (around 25,000 decks): [Figure: Cards repartition]

Most of the cards are present in less than 10% of the decks: 85-90% of the cards are rarely played relative to the number of decks. The remaining 10-15% are common cards used to structure a deck.
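The notion of support used here (the share of decks containing a given card) can be computed as in the sketch below. The decks are synthetic toy data built for the example and the function names are hypothetical; the percentages quoted above come from the real dataset, not from this code.

```python
from collections import Counter


def card_support(decks):
    """Return, for each card, the fraction of decks that contain it."""
    total = len(decks)
    # A set per deck ensures several copies of a card count only once.
    counts = Counter(card for deck in decks for card in set(deck))
    return {card: count / total for card, count in counts.items()}


def share_below(support, threshold=0.10):
    """Return the fraction of cards whose support is below the threshold."""
    below = sum(1 for value in support.values() if value < threshold)
    return below / len(support)


# Synthetic decks (assumption): one card shared by all decks, one unique card each.
decks = [["staple", f"rare_{i}"] for i in range(20)]

support = card_support(decks)
rarely_played = share_below(support)
```

In this toy set, the shared card has a support of 1.0 while every other card appears in a single deck out of 20, so 20 cards out of 21 fall under the 10% threshold.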

With this distribution, the collaborative filtering approach is meaningful since most of the cards have almost the same support across the database (see the collaborative filtering page for more information).

Technical considerations

All the data processing for this project was done on my laptop, without external resources. The models (more than 100) were computed in 30 minutes using 6 cores in parallel.

The web application is deployed on GitHub Pages. As such, it does not have a back-end server. It relies only on JSON files for external data and on the Magic: The Gathering web service for card images.

Using a database would help develop more efficient sorting and filtering functionalities. For example, the website could then avoid suggesting text recommendations for cards that do not match the deck colors chosen by the player.
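Independently of where the data is stored, the color filter mentioned above could look like the following sketch. It assumes a card fits a deck when all of its colors belong to the deck colors, with colorless cards always accepted; this rule, the card identifiers and the function names are hypothetical.

```python
def fits_deck_colors(card_colors, deck_colors):
    """Assumed rule: every color of the card must be one of the deck colors."""
    return set(card_colors) <= set(deck_colors)


def filter_recommendations(recommendations, card_colors, deck_colors):
    """Keep only the recommended cards that fit the deck colors, in order."""
    return [
        card
        for card in recommendations
        if fits_deck_colors(card_colors.get(card, []), deck_colors)
    ]


# Hypothetical card colors (assumption: invented for the example).
card_colors = {
    "card_a": ["Red"],
    "card_b": ["Blue", "Red"],
    "card_c": ["Green"],
    "card_d": [],  # colorless card
}

kept = filter_recommendations(
    ["card_a", "card_b", "card_c", "card_d"], card_colors, ["Red", "White"]
)
```

For a Red White deck, `card_b` is dropped because of its Blue component and `card_c` because it is Green, while the colorless `card_d` is kept.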

Conclusion

This POC shows the possibilities of data science approaches applied to a complex game such as Magic.

Existing websites with hundreds of thousands of decks at their disposal would be able to generate more meaningful recommendations for their players.

However, they need to remain careful. Magic: The Gathering evolves regularly with the release of new cards, and old play styles may be abandoned by the players. Including decks that are too old in the recommendation process may lead to disappointing results for the players.