Collaborative Filtering - xavierfeltin/mtg_data_mining GitHub Wiki

Many platforms (e-commerce, video on demands, ...) ask their users to rate their last purchase, the last film they saw ... These ratings are used after to search new items with similar ratings to suggest them to the customer.

To do these recommendations, collaborative filtering algorithms are used.

For an user, the algorithm aggregates all the movies that the user has seen. Movies have a positive or negative score depending of the recommendation left in the past by the user.

Then the algorithm search other users sharing similar recommendation score for movies in common. Based on this similarity, the algorithm will sugget the movies that the other users liked as well.

Approach

The selected approach is the Item to Item collaborative filtering published by Amazon.

The idea is to process similarity score only between items having common customers. It avoids a lot of unecessary processing since many items do not share common buyers.

The similarity score is based on the list of buyers for each item. A buyer can be rated positively or negatively based on his recommendation.

For example: Let us consdider a drill and a drill's battery on a e-commerce website. Here the following purchased information on these two products:

  • Customer A bought the drill and the drill's battery. He is satisfied of these two items.
  • Customer B bought only the drill and he is satisfied.
  • Customer C bought the two products. He is only satisfied of the drill.
  • Customer D only bought the battery. He is satisfied of his purchased.

A customer E is consulting the drill on the e-commerce website. The recommendation algorithm will consider to compare the drill with the drill's battery since other customers have bought them together.

The algorithm builds the list of buyers for these two item (1: customer happy, -1: customer unhappy, 0: customer did not buy the product)

  • Drill: [1,1,1,0]
  • Battery: [1,0,-1,1]

With these two lists, the algorithm processes the similarity score (cosine angle, ...). It obtains 0.8/1.0. Since the score is higher than other items in the catalog, the drill's battery appears in the recommendation list of customer E.

In Magic ?

A lot of platforms let players recommend Magic cards usually with a score from 1 to 5 stars. With this, it is possible to recommend cards the same way as Amazon recommends products.

However, this will not suggest cards that are played together in a deck. Indeed recommendation here are not linked to the notion of cards deck. It is the information that a card is present or not in a deck that is more adapted.

From this, the algorithm from Amazon is adapted as following:

for each card1 in catalog
  for each deck containing card1
    for each card2 in deck
      register card2 is played with card1
  
  for each card2
    Compute similarity between card1 and card2

The similarity is based on the list of decks in which the card is used. Two cards have a high similarity score if they are used in almost the same decks (a score of 1 means the two cards are used exactly in the same decks).

Limitations and complexity

As indicated in the publication, the processing time and memory usage can become cricital on big database. This problem can be reduced with different approaches:

  • Clustering the data to reduce the amount of data to process for each class
  • Multi-threading to process several items at the same time
  • Taking advantage of the particular data format for Magic

Clustering

Clustering implies usually to lose interactions between some data since they are seperated in different classes. In results, the recommendations will lose in accuracy.

On the contrary, in Magic, clustering the data will improve the accuracy of the suggestions. This is due to the fact Magic decks are split by the rules in different categories:

  • Game mode (Legacy, Vintage, Commander, ...)
  • Mana colors (Blue, White, Blue/White, ...)

When building a deck, a player is choosing a mode and a color. He wishes to see suggestions respecting these constraints. If similarity scores between cards were computed without taking into account these categories, the recommendations given to the player will be global and not adapted.

Multi-threading

Using the clustering "imposed" by Magic's rules make easy exploiting a multi-threading approach. A basic approach is to process the data of each class in a separated thread. The results of each class are consolidated together once all the threads are over.

Data format

The data used in this approach is finally a boolean value. True if the card is the deck, False otherwise. Moreover the list of decks in which a card is present is really sparse since a card is most of the time present in few decks.

Using this particularity, it is possible to encode the list of decks by keeping only the information in which deck the card is present. The decks were the card is absent can be easily deduced if needed.

If the similarity score is based on the cosine angle ( cos(vA,vB) = vA.vB / ||vA||*||vB||) the computation can be simplified as such:

  • The norm of each vector is simply the square number of decks where the card is present
  • The scalar product of the two vectors vA and vB is the number of decks in common between cardA and cardB

Results

Principles

The item-to-item approach generates score of similarity between a pair of cards. The score depends of how much these two cards share the same decks.

Here is a summary:

  • The two cards are sharing most of their decks
Card 2
Little played Played a lot
Card 1 Little Played Very high score Low score if ||card2|| >> ||card1||
Avg/High if ||card2|| > ||card1||
Played a lot Low score if ||card2|| >> ||card1||
Avg/High if ||card2|| > ||card1||
High to very high score
  • The two cards are sharing few or none of their decks
Card 2
Little played Played a lot
Card 1 Little Played Low to very low score Very low score
Played a lot Very low score Low to very low score

Example

For example, let us consider the card called "Pia Nalaar". It is a red mana colored card often played.

Here is two set of three recommendations for this card for Commander mode. The decks used for each set of recommendations are different. The first set is based only on full red colored decks, the second is based on red-black colored decks.

Recommendation for full red mana colored deck: Example 1

Recommendation for red-black mana colored deck (bi-color): Example 2

Let us consider that "Pia Nalaar" is a card played a lot. The item scores are average to very high. This means, the cards recommended with "Pia Nalaar" are played often as well and they are usually played with this card.

Even if the decks are different, some cards are common to both sets ("Abrade" and "Earthshaker Khenra"). What is interesting it is the presence of a black-red card in the second set. It illustrates the fact that cards have different synergies depending of the color and cards authorized.

If decks of different colors were mixed together, the suggestions will not have correctly taken into account these synergies.

However, since the sample of decks is small (500 deck), it is difficult to know if they are cards really linked to "Pia Nalaar" or if they are common red cards usually played in Commander for red and red-black colored decks and the decks are representing too much the "Pia Nalaar" card.

For the readers that consulted before the page about Association Rules, the "Aethersphere Harvester" has been suggested for a white/black/red colored decks (once again, the deck database used here is small).

Conclusion

Despite being a bit complex at first sight, Item-to_item approach is easy to exploit. It computes similarity scores based on the usage of cards in decks. With a score value, it is easy and fast to make suggestions to players.

Yet, the method requires a lot of processing for numerous data. So multi-threading and computing optimizations are important for production databases.

The approach described here favors cards with similar support across the decks. As said in the association rules page, at least 80% of the cards are present in less than 10% of the decks. The remaining cards are structural cards such as lands with less impact on the game. This feature of Magic the gathering that make the application of this solution relevant.

Despite that, synergies between a new card and an older card will not be very promoted by this approach at the beginning. This new combination will appear a bit later when more decks will become available, in other words when difference of support between the two cards will become smaller.

Webography

⚠️ **GitHub.com Fallback** ⚠️