Bayesian Personalized Ranking (BPR)
BPR is a generic optimization criterion for personalized ranking.
It is generic in the sense that it does not depend on a particular model (such as k-nearest neighbors, for example); it provides a ranking criterion to associate with a model.
The criterion is derived from a Bayesian analysis of the personalized ranking problem.
Approach
Bayesian statistics
A bit about Bayesian statistics: it is a tool to update a belief upon new evidence.
Let's consider a coin tossed several times and we count the number of heads and tails.
Frequentist probabilities usually answer this kind of question: "What is the probability of 4 heads out of 9 tosses (D) given the fairness of the coin (θ)?"
Bayesian statistics prefers to answer this kind of question: "Given an outcome (D), what is the probability of the coin being fair (θ = 0.5)?"
To answer this question, Bayesians use the following information:
- The prior (current belief): how strongly do we believe the coin is fair, given what we already know about it?
- The new evidence: the observed data (for example, the series of 4 heads out of 9 tosses) weighted by how likely it is under our current belief
From this information, they obtain:
- The posterior (updated belief): how strongly do we believe the coin is fair after seeing the new evidence, taking into account our previous belief about the coin?
This can be summed up with the following formula (Bayes' theorem): P(θ|D) = (P(D|θ) × P(θ)) / P(D), with:
- θ: the supposed degree of fairness of the coin (parameter)
- D: the number of heads obtained (outcome)
- P(θ): the prior
- P(D): the probability of obtaining D (evidence)
- P(D|θ): how likely it is to see D taking into account our current belief about θ (likelihood function)
- P(θ|D): our new belief about the fairness of the coin (the posterior)
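As an illustration (not part of the project's code), the coin example can be computed numerically on a discrete grid of candidate fairness values:

```python
# Illustrative sketch of Bayes' theorem on the coin example (not project code).
from math import comb

thetas = [i / 100 for i in range(1, 100)]        # candidate fairness values for θ
prior = [1.0 / len(thetas)] * len(thetas)        # P(θ): uniform prior, no initial opinion

heads, tosses = 4, 9                             # the new evidence D
# P(D|θ): binomial likelihood of 4 heads out of 9 tosses for each candidate θ
likelihood = [comb(tosses, heads) * t**heads * (1 - t)**(tosses - heads) for t in thetas]

evidence = sum(l * p for l, p in zip(likelihood, prior))            # P(D)
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]   # P(θ|D)

# Updated belief that the coin is fair (θ = 0.5) after seeing D
print(posterior[thetas.index(0.5)])
```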
BPR Approach
For a recommendation model, the BPR criterion seeks to maximize the belief that this model correctly ranks two items i and j, taking into account a succession of new evidence.
The BPR criterion is agnostic of the model used to determine the preference between two items.
The BPR approach addresses this problem with Bayesian statistics: P(θ|D) = (P(D|θ) × P(θ)) / P(D), with:
- θ: the model parameters
- D: a new piece of evidence about the user's item preferences
- P(θ): the current belief in the ranking capabilities of the model (prior)
- P(D): the probability of meeting the new evidence D
- P(D|θ): the likelihood function of i being preferred over j given our current belief about the recommendation model
- P(θ|D): the new belief in the model (posterior)
As said before, the BPR criterion seeks to maximize the belief in the ranking ability of the recommendation model, that is to say the posterior P(θ|D). To do so, the BPR approach optimizes the model's parameters θ.
For this, the implicit feedback of a user (for example purchase history or navigation history) is used as new evidence.
An item i is said to be preferred over an item j if the user has already consulted i but not j. There is no particular preference between two items already seen, or between two items not seen by the user.
To maximize the criterion, the publication suggests using stochastic gradient descent (SGD).
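As a hedged sketch (not the project's exact code), a single SGD step on this criterion for one piece of evidence "item i is preferred over item j" can look like this, where `x_uij` is the model's estimated preference of i over j and `grad_x` its derivative with respect to a model parameter `theta`:

```python
# Hedged sketch of one stochastic gradient ascent step on the BPR criterion
# for a single evidence "item i is preferred over item j".
import math

def bpr_sgd_step(theta, x_uij, grad_x, learning_rate=0.1, reg=0.01):
    """One ascent step on ln(sigmoid(x_uij)) - (reg / 2) * theta**2."""
    sigmoid_neg = 1.0 / (1.0 + math.exp(x_uij))   # derivative of ln(sigmoid(x)) is sigmoid(-x)
    return theta + learning_rate * (sigmoid_neg * grad_x - reg * theta)
```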
BPR used for k-nearest-neighbors (KNN)
As said earlier, the BPR criterion is model agnostic. In this study, we use the KNN model with BPR.
KNN is based on a square matrix C whose size is the number of items in the catalog. The matrix values are usually computed as cosine similarities between two items.
In BPR, this matrix C is the parameter θ described earlier. During the training process to maximize the BPR posterior, at each iteration and for N random navigation sessions, the matrix C learns the preference between:
- A particular (randomly selected) item i already seen by the user
- A particular (randomly selected) item j not seen by the user
- Given all the items seen during this navigation session
At the end of the training process, the matrix C will contain the coefficients that will be used to make the top N recommendations.
Instead of being computed with cosine similarities, the coefficients are learned during training while optimizing the BPR posterior.
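A hedged sketch of what one such training update of C can look like (a numpy matrix is assumed; the learning rate and the two regularization constants reuse the configuration listed further below, but the project's exact code may differ):

```python
# Hedged sketch of one BPR-KNN update of the similarity matrix C for one sample
# (session, seen item i, unseen item j).
import numpy as np

def bpr_knn_step(C, seen, i, j, lr=0.1, lambda_pos=0.01, lambda_neg=0.005):
    seen = list(seen)
    others = [l for l in seen if l != i]          # items of the session, i itself excluded
    # Estimated preference of the seen item i over the unseen item j
    x_uij = C[i, others].sum() - C[j, seen].sum()
    sigmoid_neg = 1.0 / (1.0 + np.exp(x_uij))
    # Pull the similarities between i and the session items up...
    C[i, others] += lr * (sigmoid_neg - lambda_pos * C[i, others])
    # ...and push the similarities between j and the session items down
    C[j, seen] += lr * (-sigmoid_neg - lambda_neg * C[j, seen])
    return C
```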
Learning Improvements
To improve the convergence speed and the quality of the results, two improvements to the stochastic gradient descent are implemented (both are sketched below):
- Decay of the learning rate: it reduces the gradient step when the results tend to stabilize, to avoid overshooting the optimum. The implemented approach multiplies the learning rate by a decay factor when the results have been stagnating for N epochs.
- Momentum: it helps the gradient vector to keep moving in the right direction, speeding up convergence. The update remembers its previous step; when it reaches a local minimum or a plateau (gradient of 0) and no longer knows which direction to follow, this previous step helps it to keep moving forward.
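A minimal sketch of an SGD training loop combining these two mechanisms (the momentum factor is an illustrative value; the other defaults reuse the hyperparameters listed in the comparison section; `gradient_fn` and `evaluate_fn` are placeholders for the model's gradient and for the HR/ARHR evaluation):

```python
# Illustrative sketch of learning-rate decay and momentum around a gradient ascent step.
def train(gradient_fn, evaluate_fn, theta, lr=0.1, min_lr=0.025, decay=0.5,
          momentum=0.9, max_epochs=200, stagnation_window=20):
    velocity = 0.0
    best_score, stagnant_epochs = float("-inf"), 0
    for _ in range(max_epochs):
        # Momentum: keep a fraction of the previous step to push through plateaus
        velocity = momentum * velocity + lr * gradient_fn(theta)
        theta = theta + velocity
        score = evaluate_fn(theta)                  # e.g. HR / ARHR on a validation set
        if score > best_score:
            best_score, stagnant_epochs = score, 0
        else:
            stagnant_epochs += 1
        # Decay: reduce the step when results stagnate for `stagnation_window` epochs
        if stagnant_epochs >= stagnation_window:
            lr = max(min_lr, lr * decay)
            stagnant_epochs = 0
    return theta
```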
Application to Magic: The Gathering
In the context of this project, for a given mode and color combination there is a list of existing decks. These decks play the role of the user's implicit feedback described in the publication.
Thus, during the training process to maximize the BPR posterior, at each iteration and for N random decks, the matrix C learns the preference between:
- A particular (randomly selected) card i selected in the deck
- A particular (randomly selected) card j not selected in the deck
- Given all the cards in the considered deck
The BPR criterion is maximized over the decks of a particular mode and color. One card of each deck has been removed to form the testing set; the remaining cards of the decks compose the training set.
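For reference, the sampling of these training triplets can be sketched as follows (the deck and catalog structures are simplified assumptions); each sampled triplet then feeds one update of the matrix C as in the sketch above:

```python
# Hedged sketch of the triplet sampling described above: pick a random deck,
# a card i inside it and a card j outside of it, for each training sample.
import random

def sample_triplets(decks, catalog, n_samples):
    """decks: list of sets of card ids; catalog: list of all card ids."""
    for _ in range(n_samples):
        deck = random.choice(decks)
        i = random.choice(list(deck))                              # card already in the deck
        j = random.choice([c for c in catalog if c not in deck])   # card not in the deck
        yield deck, i, j
```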
To be consistent with the KNN model, the indicators used to measure the performance of this BPR-KNN model are the Hit Rate (HR) and the Average Reciprocal Hit Rank (ARHR).
The training is stopped when these two indicators have converged or when the maximum number of iterations has been reached.
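As a reminder, these two indicators can be computed as in the following sketch, assuming one left-out card per deck and one top-N recommendation list per deck:

```python
# Hedged sketch of Hit Rate (HR) and Average Reciprocal Hit Rank (ARHR) on the test set.
def hit_rate_and_arhr(recommendations, left_out_cards):
    """recommendations: list of top-N card lists; left_out_cards: the card removed from each deck."""
    hits, reciprocal_ranks = 0, 0.0
    for top_n, target in zip(recommendations, left_out_cards):
        if target in top_n:
            hits += 1
            reciprocal_ranks += 1.0 / (top_n.index(target) + 1)   # rank is 1-based
    n = len(left_out_cards)
    return hits / n, reciprocal_ranks / n
```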
Comparison of models
BPR-KNN is compared with the previously developed ItemKNN approach. ItemKNN is based on cosine similarity and uses row and coefficient normalization.
The BPR-KNN parameter values were obtained experimentally by trying to maximize HR and ARHR on Commander and Legacy decks. The BPR-KNN model is configured as follows:
- BPR-KNN parameters:
  - lambda for positive items: 0.01
  - lambda for negative items: 0.005
- Hyperparameters for training:
  - batch size: 100
  - initial learning rate: 0.1
  - minimal learning rate: 0.025
  - learning rate decay: 0.5
  - maximum number of epochs: 200
  - number of epochs to detect stagnation: 20
The HR and ARHR values used to compare the models are means over 20 runs. For each run, different random training and testing sets are created.
To measure the significance at the 95% level of the performance difference between two models, a paired t-test has been applied with a Bonferroni correction.
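A possible implementation of this test with scipy (a sketch; the number of comparisons used for the Bonferroni correction is left as a parameter):

```python
# Hedged sketch of the paired t-test with Bonferroni correction used to compare models.
from scipy import stats

def significantly_different(scores_bpr_knn, scores_item_knn, n_comparisons, alpha=0.05):
    """scores_*: the 20 per-run HR (or ARHR) values of each model on the same splits."""
    _, p_value = stats.ttest_rel(scores_bpr_knn, scores_item_knn)
    return p_value < alpha / n_comparisons        # Bonferroni-corrected threshold
```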
The information about the decks used for this study is available on the ItemKNN page.
HR of each model on all studied data sets (in bold, the best scores at the 95% confidence level):
Mode | Color | BPR-KNN | ItemKNN | Mean difference |
---|---|---|---|---|
Legacy | Blue Black Red | 0.723 | 0.641 | 0.083 |
Legacy | White Blue Black Green | 0.683 | 0.602 | 0.081 |
Legacy | Blue Black Red Green | 0.774 | 0.749 | 0.025 |
Legacy | White Blue Black Red Green | 0.724 | 0.684 | 0.04 |
Pauper | Red | 0.875 | 0.864 | 0.011 |
Pauper | White Blue Black Red Green | 0.723 | 0.647 | 0.076 |
Commander | Red | 0.455 | 0.334 | 0.121 |
Commander | Blue Black Red | 0.607 | 0.396 | 0.211 |
Commander | White Blue Black Green | 0.668 | 0.444 | 0.223 |
Vintage | White Blue Black Red Green | 0.679 | 0.674 | 0.005 |
ARHR of each model on all studied data sets (in bold, the best scores at the 95% confidence level):
Mode | Color | BPR-KNN | ItemKNN | Mean difference |
---|---|---|---|---|
Legacy | Blue Black Red | 0.528 | 0.475 | 0.053 |
Legacy | White Blue Black Green | 0.487 | 0.448 | 0.04 |
Legacy | Blue Black Red Green | 0.602 | 0.506 | 0.096 |
Legacy | White Blue Black Red Green | 0.553 | 0.548 | 0.006 |
Pauper | Red | 0.732 | 0.623 | 0.109 |
Pauper | White Blue Black Red Green | 0.551 | 0.49 | 0.061 |
Commander | Red | 0.286 | 0.211 | 0.074 |
Commander | Blue Black Red | 0.445 | 0.228 | 0.218 |
Commander | White Blue Black Green | 0.515 | 0.267 | 0.247 |
Vintage | White Blue Black Red Green | 0.523 | 0.494 | 0.029 |
BPR-KNN models globally show a clear improvement over the ItemKNN model. The improvement appears both in the number of correctly predicted items and in the ranking of these recommendations.
The cosine similarity is a direct calculation without any optimization step. It is faster and already gives relevant trends.
Using an optimization criterion dedicated to ranking is a step forward in modeling how items should be ranked relative to each other depending on the user's preferences. Except in one specific configuration (Legacy, 5 colors), the indicators of the remaining configurations increase from 2% to 24%.
Recommendation Results
Let's take again the example of the deck from the deckstat.net website.
The recommendations are this time made by the BPR-KNN model.
In yellow, the cards already selected by the player to build his deck.
We expect to find some of the other cards in the top 10 recommendations made by the model.
The top 10 recommendations made by the model are as follows (in bold, the recommended cards matching the original deck):
Ranking | Multiverseid | Card's name |
---|---|---|
1 | 886 | Underground Sea |
2 | 887 | Volcanic Island |
3 | 884 | Tropical Island |
4 | 190393 | Scalding Tarn |
5 | 228255 | Flusterstorm |
6 | 386627 | Polluted Delta |
7 | 233041 | Surgical Extraction |
8 | 376559 | Toxic Deluge |
9 | 190413 | Misty Rainforest |
10 | 814 | Red Elemental Blast |
In green, the recommendations matching the original deck:
Compared to the ItemKNN results, the matching cards are recommended sooner (1st to 6th rank against 2nd to 8th rank).
The other recommended cards are slightly different. Flusterstorm (from 1st to 5th rank) and Toxic Deluge (from 5th to 8th rank) are still recommended.
The cards recommended instead of "Hydroblast" (blue counterspell) and "Bayou" (dual land) are of the same types (counterspell and land). Surgical Extraction is a more aggressive spell but consistent with the spell orientation of this deck.
Conclusion
The BPR-KNN approach provides better results than ItemKNN based on the HR and ARHR indicators. The ranking criterion based on Bayesian statistics performs well for card selection.
Since there is an optimization step, the required processing time is larger than for ItemKNN. Care must be taken with the hyperparameters used to tune the model: the values used for this project, with a small number of decks, may not be adequate for bigger data sets.
The architecture to use this approach in a real website remains the same as for ItemKNN. You may choose to predict on the server or the client side.
If the predictions take place on the client side, it may be difficult to send all the coefficients as JSON (for 600 cards, there is a 600x600 matrix of floats to send). With some math and the original publication, you may factorize the coefficient matrix to send fewer coefficients, at the cost of an additional step to rebuild the matrix on the client.
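One possible way to do this, sketched with a truncated SVD (the rank k is an illustrative assumption; other factorizations are possible):

```python
# Hedged sketch: approximate the 600x600 matrix C with two thin factors to send
# fewer coefficients, then rebuild the approximation on the client side.
import numpy as np

def factorize(C, k=30):
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    W = U[:, :k] * s[:k]          # 600 x k
    H = Vt[:k, :]                 # k x 600
    return W, H                   # 2 * 600 * k floats instead of 600 * 600

def rebuild(W, H):
    return W @ H                  # client-side reconstruction of an approximation of C
```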
An interesting add-on to ItemKNN and BPR-KNN would be, for a set of selected cards, to identify which ones contribute to each of the N recommendations given by the model. This would allow adding information to the recommendations such as: "Misty Rainforest has been recommended because you selected the cards XXXXX and XXXXX in your deck".
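With the learned matrix C, such an explanation could be approximated by looking at which selected cards carry the largest coefficients towards a recommended card (a hedged sketch reusing the scoring of the training step above):

```python
# Hedged sketch: for a recommended card, find the selected cards that contribute
# the most to its score, i.e. the largest coefficients C[recommended, selected].
def top_contributors(C, recommended_card, selected_cards, n=2):
    contributions = [(C[recommended_card, card], card) for card in selected_cards]
    contributions.sort(reverse=True)
    return [card for _, card in contributions[:n]]
```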
Webography
Publication
Bayesian statistics
Optimization
- Video on learning recommenders
- Difference between Gradient and Stochastic Gradient
- Stochastic Gradient with momentum
- Learning Rate Schedules and Adaptive Learning Rate Methods for Deep Learning