ItemKNN
ItemKNN directly takes advantage of the work already done on the collaborative filtering approaches in the first part of this project.
First, ItemKNN builds a matrix with the ratings given by the users to the different items of the catalog.
| | ItemA | ItemB | ItemC |
|---|---|---|---|
| User1 | 0 | 4 | 3 |
| User2 | 1 | 5 | 3 |
| User3 | 2 | 4 | 4 |
From there, it deduces a similarity matrix. It is the same matrix described in the collaborative filtering page: for each pair of items bought together, a similarity score is computed.
Then the k nearest neighbors of each item are kept (the k highest scores for each item). Lastly, the model is applied to the current shopping cart to obtain the top N recommendations for the user.
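For illustration, here is a minimal sketch of this pipeline with NumPy (not the project's actual implementation; the function and variable names are illustrative):

```python
import numpy as np

def build_knn_model(ratings, k):
    """Build an item-item cosine similarity matrix and keep only the
    k most similar neighbors of each item."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0                      # avoid division by zero
    sim = (ratings.T @ ratings) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)                   # an item is not its own neighbor
    for i in range(sim.shape[0]):                # keep only the k highest scores per item
        weakest = np.argsort(sim[i])[:-k]
        sim[i, weakest] = 0.0
    return sim

def recommend(sim, cart, n):
    """Score every item against the current shopping cart and return
    the indices of the top N items not already in the cart."""
    scores = sim[cart].sum(axis=0)
    scores[cart] = -np.inf                       # never recommend items already selected
    return np.argsort(scores)[::-1][:n]

ratings = np.array([[0, 4, 3],                   # the users x items matrix above
                    [1, 5, 3],
                    [2, 4, 4]], dtype=float)
model = build_knn_model(ratings, k=2)
print(recommend(model, cart=[1], n=2))           # recommendations for a cart holding ItemB
```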
The publication presents different models, with variations in how the similarity between two items is computed (only the cosine distance was used in the first part of this project).
The cosine model is based on the direction of the vectors representing the two items (i and j). A vector contains 0 if the item has not been rated by a user, or otherwise the rating score given by the user to this item.
The conditional probability model is based on the support of the two individual items (i and j) and the support of the two items together (ij). The support of the second item is raised to the power of a coefficient between 0 and 1 that parameterizes the model. In the publication, this coefficient is set to 0.5.
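For reference, a minimal sketch of these two similarity measures, assuming the items are represented by the columns of the rating matrix above (the function names are illustrative, and the conditional probability formula follows the description above with its 0.5 coefficient):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between the rating vectors of two items."""
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def conditional_probability_similarity(u, v, alpha=0.5):
    """Support of the pair (i, j) divided by the support of i times the
    support of j raised to the power alpha (0 < alpha < 1)."""
    support_i = np.count_nonzero(u)
    support_j = np.count_nonzero(v)
    support_ij = np.count_nonzero(u * v)         # users having rated both items
    return support_ij / (support_i * support_j ** alpha)

item_b = np.array([4, 5, 4], dtype=float)        # column ItemB of the matrix above
item_c = np.array([3, 3, 4], dtype=float)        # column ItemC
print(cosine_similarity(item_b, item_c))
print(conditional_probability_similarity(item_b, item_c))
```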
With these two models, two normalizations of the data are presented: row normalization and similarity normalization.
In order to reduce the contribution of users with many items compared to users with fewer items, the rating scores are normalized for each user. This treatment is performed before computing the similarity scores with the cosine or the conditional probability model.
Thus, the previous matrix becomes (row normalization):
| | ItemA | ItemB | ItemC |
|---|---|---|---|
| User1 | 0 | 4/7 | 3/7 |
| User2 | 1/9 | 5/9 | 3/9 |
| User3 | 2/10 | 4/10 | 4/10 |
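A minimal NumPy sketch reproducing this row normalization on the example matrix (illustrative only):

```python
import numpy as np

ratings = np.array([[0, 4, 3],
                    [1, 5, 3],
                    [2, 4, 4]], dtype=float)

# Each user's ratings are divided by the sum of his ratings
row_normalized = ratings / ratings.sum(axis=1, keepdims=True)
print(row_normalized)    # [[0, 4/7, 3/7], [1/9, 5/9, 3/9], [2/10, 4/10, 4/10]]
```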
The similarity normalization is applied because the similarity scores between an item and its k most similar items can be scaled differently from one item to another. This scaling issue comes from the fact that the neighborhood of each item is more or less densely populated.
Here is an example of a similarity score matrix computed with the cosine model:
| | ItemA | ItemB | ItemC |
|---|---|---|---|
| ItemA | 1.0 | 0.6 | 0.5 |
| ItemB | 0.6 | 1.0 | 0.7 |
| ItemC | 0.5 | 0.7 | 1.0 |
Thus, the similarity score matrix becomes (normalization by column):
| | ItemA | ItemB | ItemC |
|---|---|---|---|
| ItemA | 1.0/2.1 | 0.6/2.3 | 0.5/2.2 |
| ItemB | 0.6/2.1 | 1.0/2.3 | 0.7/2.2 |
| ItemC | 0.5/2.1 | 0.7/2.3 | 1.0/2.2 |
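The corresponding sketch for the similarity normalization (each column of the similarity matrix is divided by its sum; illustrative only):

```python
import numpy as np

similarities = np.array([[1.0, 0.6, 0.5],
                         [0.6, 1.0, 0.7],
                         [0.5, 0.7, 1.0]])

# Each column is divided by its sum so that every item's neighborhood
# contributes on the same scale
column_normalized = similarities / similarities.sum(axis=0, keepdims=True)
print(column_normalized)  # columns divided by 2.1, 2.3 and 2.2 respectively
```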
This approach is transposed to Magic: The Gathering by considering decks and cards instead of users and items. We consider here only whether a card has been selected or not by a player in a deck.
Consequently, each deck is considered as an independent user. The rating score is 1 if the deck contains the card and 0 otherwise.
The users x items matrix with the rating scores can then be transposed as follows:
| | CardA | CardB | CardC |
|---|---|---|---|
| Deck1 | 0 | 1 | 1 |
| Deck2 | 1 | 0 | 0 |
| Deck3 | 1 | 1 | 1 |
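For illustration, a sketch of how such a binary decks x cards matrix could be built from deck lists (the deck contents and names here are just the toy example above, not the project's data):

```python
import numpy as np

decks = [
    {"CardB", "CardC"},             # Deck1
    {"CardA"},                      # Deck2
    {"CardA", "CardB", "CardC"},    # Deck3
]
catalog = sorted(set().union(*decks))

# 1 if the deck contains the card, 0 otherwise
ratings = np.array([[1 if card in deck else 0 for card in catalog]
                    for deck in decks])
print(catalog)   # ['CardA', 'CardB', 'CardC']
print(ratings)
```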
The approach is then applied as such: computation of the similarity scores between items, selection of the k nearest neighbors of each item, and recommendation of the top N cards based on the cards already selected by the player to build a new deck.
From the publication, the following similarity models have been implemented:
- Cosine similarity:
  - Vanilla
  - With the normalization of:
    - similarity scores
    - decks and similarity scores
- Conditional Probability-Based Similarity:
  - Vanilla
  - With the normalization of:
    - decks and similarity scores
An experimental model has been added by applying to the cosine model a multiplier coefficient equal to the textual similarity of the two cards, computed with LSA. The objective is to take another piece of information about the cards into account, to see if there is a correlation and thus improve the model.
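A hedged sketch of this experimental variant, assuming a precomputed cards x cards matrix `lsa_similarity` holding the LSA textual similarity of each pair of cards (the names are illustrative):

```python
import numpy as np

def cosine_lsa_similarity(ratings, lsa_similarity):
    """Cosine similarity between the cards, multiplied element-wise by the
    LSA textual similarity of each pair of cards."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0
    cosine = (ratings.T @ ratings) / np.outer(norms, norms)
    return cosine * lsa_similarity
```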
The decks are grouped by mode and colors as described in the proof of concept page.
To validate and measure the quality of a recommendation model, each deck is split in two:
- a card is randomly taken out of the deck and put in a testing set
- the rest of the deck is used as the training set
The publication defines two indicators to measure the quality of a model.
The first one is the Hit Rate (HR). It measures how many times the model has been able to recommend the missing card of the testing set for each deck. A score of 1.0 means that for all decks the missing card was recommended among the N recommendations.
The second one is the Average Reciprocal Hit Rank (ARHR). It measures at which position, among the N recommendations, the missing card of the testing set was recommended for each deck. A score of 1.0 means all the missing cards were recommended in the first position.
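A minimal sketch of this evaluation loop (leave one card out per deck, then accumulate HR and ARHR); `recommend` stands for any of the models above and the names are illustrative:

```python
import random

def evaluate(decks, recommend, n=10):
    """Leave one random card out of each deck, recommend the top N cards
    from the rest, and accumulate the Hit Rate and the ARHR."""
    hits, reciprocal_ranks = 0, 0.0
    for deck in decks:
        cards = list(deck)
        missing = random.choice(cards)                 # testing set: one card
        training = [c for c in cards if c != missing]  # training set: the rest
        top_n = recommend(training, n)                 # list of N recommended cards
        if missing in top_n:
            hits += 1
            reciprocal_ranks += 1.0 / (top_n.index(missing) + 1)
    return hits / len(decks), reciprocal_ranks / len(decks)   # HR, ARHR
```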
The HR and ARHR values used to compare the different models are the mean over 20 runs. At each run, different random training and testing sets are created.
To measure whether the difference of performance between two models is significant at 95%, a paired t-test with a Bonferroni correction has been applied.
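For reference, a sketch of this significance test with SciPy, assuming the per-run scores of two models (e.g. their HR over the 20 runs) and the number of model pairs compared are given (names are illustrative):

```python
from scipy import stats

def significantly_different(scores_a, scores_b, n_comparisons, alpha=0.05):
    """Paired t-test on the per-run scores of two models,
    with a Bonferroni correction for the number of comparisons."""
    _, p_value = stats.ttest_rel(scores_a, scores_b)
    return p_value < alpha / n_comparisons
```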
Legend:
- Deck groups: the groups of decks (by mode and colors) described in the table below.
- Models: cos = cosine similarity, cos+lsa = cosine similarity with the LSA coefficient, proba = conditional probability-based similarity. Normalizations: "-" = none, S = similarity scores, D,S = decks and similarity scores.
Structure of the studied data sets:
Deck groups | Nb of decks | Nb of cards | Nb of cards per deck | Density |
---|---|---|---|---|
1 | 332 | 373 | 29 | 7.67% |
2 | 329 | 623 | 28 | 4.56% |
3 | 377 | 518 | 33 | 6.33% |
4 | 237 | 116 | 18 | 15.31% |
5 | 410 | 355 | 28 | 9.29% |
6 | 269 | 661 | 69 | 10.49% |
7 | 202 | 635 | 98 | 15.43% |
8 | 338 | 668 | 95 | 14.15% |
9 | 1326 | 485 | 31 | 6.45% |
10 | 2816 | 776 | 39 | 5.01% |
HR of each model against all studied data sets, at 95% confidence (in bold, the best scores at 95% confidence):
Deck group | cos | cos (S) | cos (D,S) | cos+lsa | cos+lsa (S) | cos+lsa (D,S) | proba | proba (D,S)
---|---|---|---|---|---|---|---|---
1 | 0.621 | 0.632 | 0.633 | 0.619 | 0.634 | 0.64 | 0.311 | 0.357
2 | 0.676 | 0.698 | 0.694 | 0.681 | 0.699 | 0.698 | 0.49 | 0.518
3 | 0.585 | 0.61 | 0.605 | 0.586 | 0.604 | 0.604 | 0.223 | 0.26
4 | 0.847 | 0.851 | 0.856 | 0.849 | 0.856 | 0.857 | 0.8 | 0.804
5 | 0.578 | 0.623 | 0.648 | 0.579 | 0.626 | 0.649 | 0.452 | 0.492
6 | 0.336 | 0.349 | 0.341 | 0.339 | 0.352 | 0.347 | 0.143 | 0.187
7 | 0.414 | 0.442 | 0.443 | 0.429 | 0.46 | 0.465 | 0.187 | 0.249
8 | 0.364 | 0.392 | 0.395 | 0.364 | 0.397 | 0.403 | 0.189 | 0.186
9 | 0.74 | 0.758 | 0.752 | 0.744 | 0.759 | 0.75 | 0.499 | 0.566
10 | 0.663 | 0.707 | 0.673 | 0.662 | 0.708 | 0.669 | 0.186 | 0.21
ARHR of each model against all studied data sets, at 95% confidence (in bold, the best scores at 95% confidence):
Deck group | cos | cos (S) | cos (D,S) | cos+lsa | cos+lsa (S) | cos+lsa (D,S) | proba | proba (D,S)
---|---|---|---|---|---|---|---|---
1 | 0.451 | 0.46 | 0.463 | 0.443 | 0.457 | 0.464 | 0.21 | 0.235
2 | 0.533 | 0.549 | 0.551 | 0.531 | 0.547 | 0.548 | 0.318 | 0.341
3 | 0.438 | 0.453 | 0.454 | 0.441 | 0.455 | 0.453 | 0.13 | 0.155
4 | 0.584 | 0.613 | 0.626 | 0.594 | 0.624 | 0.635 | 0.535 | 0.529
5 | 0.47 | 0.469 | 0.487 | 0.487 | 0.469 | 0.487 | 0.247 | 0.249
6 | 0.214 | 0.22 | 0.212 | 0.222 | 0.229 | 0.226 | 0.077 | 0.101
7 | 0.25 | 0.262 | 0.261 | 0.258 | 0.275 | 0.275 | 0.121 | 0.149
8 | 0.214 | 0.225 | 0.226 | 0.211 | 0.228 | 0.23 | 0.111 | 0.113
9 | 0.518 | 0.525 | 0.507 | 0.518 | 0.524 | 0.51 | 0.33 | 0.349
10 | 0.478 | 0.516 | 0.495 | 0.479 | 0.514 | 0.49 | 0.11 | 0.132
The cosine similarity approach models the studied data sets better. The probability-based approach underperforms compared to the results obtained in the publication. This may be due to the fact that card decks have a standardized structure.
Contrary to the similarity score normalization, the row normalization does not add a significant gain. This is because card decks follow particular game rules depending on the mode. Two of these rules are the number of cards in a deck (100 in Commander, usually 60 cards for the other modes) and how many copies of the same card can be used in a deck (usually a maximum of 4).
Since the density of the neighborhood of two cards can be very different, the normalization of the similarity scores has, as expected, a bigger impact on the indicators.
The experimental coefficient based on the cards' textual information adds a small gain on the HR indicator. The ARHR indicator is slightly below the best score, but the ranking of the recommendations remains consistent. The model takes a bit longer to build due to the LSA computation, but this does not impact the processing of the top N recommendations afterwards.
Here is an example of recommendations obtained with the cos+lsa model with decks and similarity scores normalization.
For this example, the deck comes from the deckstat.net website (a different source from the MTGDeck decks). It is a Legacy Blue Black Red Green deck. The recommendations were produced with the model trained on the equivalent deck database from MTGDeck.
Two important facts:
- The cards for this deck are present in the deck database
- The deck is not present as such in the training database
(Screenshot in the original page: in yellow, the cards already selected by the player to build his deck.)
We expect some of the other cards of the deck to appear among the top 10 recommendations made by the model.
The top 10 recommendations made by the model are as follows (in bold in the original page, the recommended cards matching the original deck):
Ranking | Multiverseid | Card's name |
---|---|---|
1 | 228255 | Flusterstorm |
2 | 386627 | Polluted Delta |
3 | 190393 | Scalding Tarn |
4 | 886 | Underground Sea |
5 | 376559 | Toxic Deluge |
6 | 878 | Badlands |
7 | 887 | Volcanic Island |
8 | 884 | Tropical Island |
9 | 3915 | Hydroblast |
10 | 879 | Bayou |
(Card images in the original page: in green, the recommendations matching the original deck, followed by the other recommended cards.)
The input is a full deck without the "Land" cards (used to provide the Mana needed to cast spells and other effects). At this point of deck building, a Magic player starts searching for which "Land" cards should go with his deck.
The suggested cards are interesting since most of them are "Lands". This shows that the model takes into account the cards given as input, and their number, when making its recommendations.
The model's recommendations match most of the choices made in the original deck. This can be interpreted as the player who built the example deck having made choices consistent with the Magic meta-game used to train the model.
This model for top N recommendations based on collaborative filtering provides results that take into account the cards given as input and their number.
It is necessary to be careful with the data used to train the model, since they need to reflect the current meta-game to provide relevant recommendations. The recommendations are directly based on the other players' decks used to train the model.
In a production environment, the recommendation algorithm would need to be improved (lighter data structures, Cython, ...) for models with more cards (above 1500/2000 cards), since processing can take a few seconds.
The surprise came from the performance of the cosine similarity model compared to the probability-based model. This can be partially explained by the binary format of the data (selected / not selected) and by the deck-building standards (number of cards per deck).
Other models exist; it will be interesting to see what data they are able to exploit and how they perform.