Proof of concept - xavierfeltin/mtg_data_mining GitHub Wiki

Proof of concept (POC)

This POC aims to demonstrate the value of this research on Latent Semantic Analysis, Collaborative Filtering and TopN recommendations applied to the card game Magic: The Gathering.

The POC is available here: https://xavierfeltin.github.io/mtg_data_mining/

Structure of the interface

The interface is divided into three parts:

Left side: user deck with its configuration and the cards that the user has selected
Right side: the main part displaying the available cards, the recommendations, ...
Top: the navigation around the cards catalog, the decks visualizer and the form to create a new deck

Main functionalities

Create and manage a deck

The user can select the colors and the game mode for their deck. A new deck is empty when created, and creating a new deck replaces the existing one.

Get recommendations based on a card's information

When the user views a card's details, the application displays recommended cards based on how similar their content (color, cost, description, ...) is to that of the viewed card. (Latent Semantic Analysis)
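A minimal sketch of how such content-based recommendations could be ranked, assuming each card has already been projected into a latent space by the LSA step. The vectors, the card names and the `most_similar` helper are hypothetical and are not taken from the project's modules:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

def most_similar(card, vectors, top_n=3):
    """Rank the other cards by cosine similarity to `card` in the latent space."""
    target = vectors[card]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != card
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Hypothetical 3-dimensional latent vectors (real models use more dimensions)
latent = {
    "card_a": [0.9, 0.1, 0.0],
    "card_b": [0.8, 0.2, 0.1],
    "card_c": [0.0, 0.1, 0.9],
}
ranked = most_similar("card_a", latent, top_n=2)
```

Here `ranked` lists `card_b` before `card_c`, because its latent vector points in almost the same direction as the one of `card_a`.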

Get recommendations based on a card's use in previous games

When the user views a card's details, the application displays recommended cards based on how often they have been played alongside the viewed card in previous games. (Collaborative Filtering)
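One common way to turn "played together" counts into item-to-item scores is a cosine-style normalization of the co-occurrence counts. The sketch below assumes that formula and uses hypothetical deck lists; the project's own scoring may differ:

```python
import math
from collections import Counter
from itertools import combinations

def item_similarities(decks):
    """Item-to-item similarity from deck co-occurrence.

    sim(i, j) = |decks with i and j| / sqrt(|decks with i| * |decks with j|)
    """
    support = Counter()   # number of decks containing each card
    together = Counter()  # number of decks containing each pair of cards
    for deck in decks:
        cards = sorted(set(deck))  # ignore duplicate copies inside a deck
        support.update(cards)
        together.update(combinations(cards, 2))
    sims = {}
    for (a, b), count in together.items():
        score = count / math.sqrt(support[a] * support[b])
        sims.setdefault(a, {})[b] = score
        sims.setdefault(b, {})[a] = score
    return sims

# Hypothetical decks, reduced to a few cards each
decks = [
    ["card_a", "card_b", "card_c"],
    ["card_a", "card_b"],
    ["card_a", "card_d"],
]
sims = item_similarities(decks)
best_for_a = max(sims["card_a"], key=sims["card_a"].get)
```

With these decks, `best_for_a` is `card_b`, since it shares two of its two decks with `card_a`.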

Get recommendations based on the cards selected in the deck

When the user browses the cards available for the colors and mode of their deck, the application displays recommended cards based on how often they have been played in previous games alongside the cards selected in the deck. (TopN recommendation using Bayesian Personalized Ranking)
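The idea behind Bayesian Personalized Ranking is to learn scores such that a card played in a deck ranks above a card that is not. The sketch below shows one stochastic gradient step of the matrix-factorization variant, under the assumption that each deck and each card has a small factor vector. The function name, the hyperparameters and the random vectors are hypothetical; this is not the project's training code:

```python
import math
import random

def bpr_step(deck_vec, pos_vec, neg_vec, lr=0.05, reg=0.01):
    """One BPR gradient step for a (deck, played card, unplayed card) triple.

    Updates the three factor vectors in place so that the played card tends
    to score higher than the unplayed one. Returns the score margin measured
    before the update.
    """
    margin = sum(d * (p - n) for d, p, n in zip(deck_vec, pos_vec, neg_vec))
    # Derivative of ln(sigmoid(margin)) with respect to margin: sigmoid(-margin)
    weight = 1.0 / (1.0 + math.exp(margin))
    for k in range(len(deck_vec)):
        d, p, n = deck_vec[k], pos_vec[k], neg_vec[k]  # values before the update
        deck_vec[k] += lr * (weight * (p - n) - reg * d)
        pos_vec[k] += lr * (weight * d - reg * p)
        neg_vec[k] += lr * (-weight * d - reg * n)
    return margin

# Hypothetical 4-dimensional factors with small random initial values
random.seed(0)
deck = [random.uniform(-0.1, 0.1) for _ in range(4)]
played = [random.uniform(-0.1, 0.1) for _ in range(4)]
unplayed = [random.uniform(-0.1, 0.1) for _ in range(4)]
margins = [bpr_step(deck, played, unplayed) for _ in range(200)]
```

Repeating the step on the same triple increases the margin between the played and the unplayed card. A real training loop would sample many triples across all decks.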

Visualize the decks used for modeling

The user can visualize the cards of all the decks used during the generation of the TopN model.

All the recommendations are computed in real time from the model coefficients loaded from JSON files. The models were built beforehand with the Python modules developed in this project.

Although Association Rules have also been studied, they are not used in this POC at the moment.
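As an illustration of scoring from precomputed coefficients, the sketch below assumes a JSON layout that maps each card to the coefficients of related cards. The layout, the card identifiers and the `recommend` helper are hypothetical. The actual web interface runs in the browser, so this Python version only mirrors the logic:

```python
import json

# Hypothetical JSON layout: {card_id: {other_card_id: coefficient}}
raw = """
{
  "card_a": {"card_b": 0.8, "card_c": 0.1},
  "card_d": {"card_b": 0.3, "card_c": 0.6}
}
"""
coefficients = json.loads(raw)

def recommend(deck_cards, coefficients, top_n=2):
    """Sum the coefficients of every deck card and return the best candidates."""
    in_deck = set(deck_cards)
    scores = {}
    for card in deck_cards:
        for other, coef in coefficients.get(card, {}).items():
            if other in in_deck:
                continue  # never recommend a card already in the deck
            scores[other] = scores.get(other, 0.0) + coef
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

top_for_deck = recommend(["card_a", "card_d"], coefficients)
top_for_card = recommend(["card_d"], coefficients)
```

Because the coefficients are precomputed, the only work left at display time is a few dictionary lookups and a sort, which is cheap enough to run without a back-end server.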

Data

Which data?

The data used to compute the recommendations comes from:

MTGJson: JSON containing the description and other data (mana cost, multiverseid, ...) for all the cards in Magic.
MTGDeck: in partnership with the webmaster, who shared 5000 decks of each mode (commander, legacy, standard, pauper and vintage) for this project.

The card images used in the web interface come from "gatherer.wizards.com". After asking Wizards and reading the fan content policy, it seems acceptable to use them in this context (free, non-commercial, fan content, ...).

Deck distribution

The decks are distributed as follows:

Colors	Commander	Legacy	Pauper	Standard	Vintage
Black	117	99	99	58	1
Blue	494	9	127	14	3
Green	407	0	35	47	0
Red	269	42	237	388	2
White	47	41	4	41	0
No Color	0	11	0	0	0
Black Blue	22	60	521	282	22
Black Green	103	32	43	160	0
Black Red	17	193	22	337	7
Black White	53	160	89	172	0
Blue Green	7	0	103	80	0
Blue Red	104	18	629	80	36
Blue White	145	32	129	393	135
Green Red	13	7	188	144	0
Green White	49	21	57	129	3
Red White	23	19	104	52	5
Black Blue Green	301	350	29	215	115
Black Blue Red	338	332	162	417	96
Black Blue White	16	184	22	185	24
Black Green Red	82	110	378	52	1
Black Green White	58	315	19	37	11
Black Red White	340	160	75	188	4
Blue Green Red	168	13	264	21	47
Blue Green White	37	16	26	85	47
Blue Red White	147	45	182	20	162
Green Red White	53	5	182	43	4
Black Blue Green Red	66	1326	162	35	130
Black Blue Green White	202	377	4	23	35
Black Green Red White	129	266	22	6	9
Blue Green Red White	57	4	669	23	49
Black Blue Green Red White	31	339	410	33	2816

As mentioned before, the strategies and choices made by players depend heavily on the mode and the colors played. The table clearly shows different color distributions depending on the mode. Some card combinations may be strong in one mode and very weak in others.

Note: since this project works with 5000 decks per mode, some categories contain very few decks. The lower limit for reliable results from the deck samples is set to 30 decks. Categories with fewer than 30 decks are displayed in bold in the table above. The item scores computed by collaborative filtering for these categories are not really representative, so be careful when consulting those results.
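The 30-deck limit can be checked mechanically. The sketch below copies three rows from the table and lists the categories that fall below the limit; the `unreliable_categories` helper and the dictionary layout are hypothetical:

```python
MIN_DECKS = 30  # lower limit for reliable results, as stated in the note

# Three rows copied from the table: {colors: {mode: number of decks}}
deck_counts = {
    "Black": {"Commander": 117, "Legacy": 99, "Pauper": 99, "Standard": 58, "Vintage": 1},
    "White": {"Commander": 47, "Legacy": 41, "Pauper": 4, "Standard": 41, "Vintage": 0},
    "Blue Red": {"Commander": 104, "Legacy": 18, "Pauper": 629, "Standard": 80, "Vintage": 36},
}

def unreliable_categories(counts, minimum=MIN_DECKS):
    """Return the (colors, mode) pairs whose deck sample is below the minimum."""
    return [
        (colors, mode)
        for colors, by_mode in counts.items()
        for mode, n in by_mode.items()
        if n < minimum
    ]

flagged = unreliable_categories(deck_counts)
```

For these rows, the flagged categories are Black in Vintage, White in Pauper and Vintage, and Blue Red in Legacy.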

Card distribution

Here is the distribution of the 5505 cards across the decks from every mode (around 25,000 decks):

[Figure: Cards repartition]

Most of the cards appear in less than 10% of the decks. Thus, 85-90% of the cards are played rarely relative to the number of decks. The remaining 10-15% are common cards used to structure a deck.
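The support of a card, meaning the fraction of decks in which it appears, can be computed as in the sketch below. The toy decks and both helper names are hypothetical and only illustrate the calculation, not the project's actual figures:

```python
from collections import Counter

def card_support(decks):
    """Fraction of decks in which each card appears at least once."""
    counts = Counter(card for deck in decks for card in set(deck))
    return {card: n / len(decks) for card, n in counts.items()}

def share_below(support, threshold=0.10):
    """Share of cards whose support is below the threshold."""
    rare = sum(1 for value in support.values() if value < threshold)
    return rare / len(support)

# Toy data: 20 decks that all contain one common card plus one unique card
decks = [["common_card", f"card_{i}"] for i in range(20)]
support = card_support(decks)
rare_share = share_below(support)
```

In this toy example the common card has a support of 1.0, every other card has a support of 0.05, and 20 of the 21 cards fall below the 10% threshold.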

With this distribution, the collaborative filtering approach is meaningful since most of the cards have roughly the same support across the database (see the collaborative filtering page for more information).

Technical considerations

All the data processing for this project was done on my laptop, without external resources. The models (more than 100) were computed in 30 minutes using 6 cores in parallel.

The web application is deployed on GitHub Pages. As such, it does not have a back-end server. It relies only on JSON files for external data and on the Magic: The Gathering web service for card images.

A database would help to develop more efficient sorting and filtering features. For example, the website could then avoid suggesting text-based recommendations for cards that do not match the deck colors chosen by the player.
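The kind of color filter described here could look like the sketch below. It assumes that a card fits a deck when all of its colors belong to the deck colors, and that colorless cards always fit. This rule, the helper names and the sample cards are assumptions made for illustration:

```python
def matches_deck_colors(card_colors, deck_colors):
    """A card fits when all of its colors belong to the deck (colorless always fits)."""
    return set(card_colors) <= set(deck_colors)

def filter_recommendations(recommended, colors_by_card, deck_colors):
    """Keep the recommended cards whose colors match the deck, preserving order."""
    return [
        card
        for card in recommended
        if matches_deck_colors(colors_by_card.get(card, []), deck_colors)
    ]

# Hypothetical cards and their colors
colors_by_card = {
    "card_a": ["Black"],
    "card_b": ["Blue", "Red"],
    "card_c": [],  # colorless
}
kept = filter_recommendations(
    ["card_a", "card_b", "card_c"], colors_by_card, deck_colors=["Black", "Blue"]
)
```

Here `card_b` is dropped because it requires Red, which the deck does not play.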

Conclusion

This POC shows the possibilities of data science approaches applied to a complex game such as Magic.

Existing websites with hundreds of thousands of decks at their disposal would be able to generate more meaningful recommendations for their players.

However, they need to remain careful. Magic: The Gathering evolves regularly with the addition of new cards, and old play styles may be abandoned by players. Including decks that are too old in the recommendation process may lead to disappointing results for the players.