Bayesian Personalized Ranking (BPR)
BPR is a generic optimization criterion for personalized ranking.
It is generic in the sense that it does not depend on a particular model (such as k-nearest neighbors, for example); it provides a ranking criterion to associate with a model.
The criterion is derived from a Bayesian analysis of the personalized ranking problem.
Approach
Bayesian statistics
A bit about Bayesian statistics: it is a tool to update a belief upon new evidence.
Let's consider a coin tossed several times and we count the number of heads and tails.
Frequentist probabilities usually answer this kind of question: "What is the probability of 4 heads out of 9 tosses (D) given the fairness of the coin (θ)?"
Bayesian statistics prefers to answer this kind of question: "Given an outcome (D), what is the probability of the coin being fair (θ = 0.5)?"
To answer this question, Bayesians use the following information:
- The prior (current belief): how strongly do we believe the coin is fair, given what we already know about it?
- The new evidence: the observed data (for example, the series of 4 heads out of 9 tosses) weighted by how likely it is under our current belief
From this information, they obtain:
- The posterior (updated belief): how strongly do we believe the coin is fair after seeing the new evidence, taking into account our previous belief about the coin?
This can be summed up with the following formula (Bayes' theorem): P(θ|D) = (P(D|θ) × P(θ)) / P(D), with:
- θ: the supposed degree of fairness of the coin (parameter)
- D: the number of heads obtained (outcome)
- P(θ): the prior
- P(D): the probability of obtaining D (evidence)
- P(D|θ): how likely it is to see D taking into account our current belief about θ (likelihood function)
- P(θ|D): our new belief about the fairness of the coin (the posterior)
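As an illustration (not part of the project's code), the coin example can be computed numerically on a discrete grid of candidate fairness values:

```python
# Illustrative sketch of Bayes' theorem on the coin example (not project code).
from math import comb

thetas = [i / 100 for i in range(1, 100)]        # candidate fairness values for θ
prior = [1.0 / len(thetas)] * len(thetas)        # P(θ): uniform prior, no initial opinion

heads, tosses = 4, 9                             # the new evidence D
# P(D|θ): binomial likelihood of 4 heads out of 9 tosses for each candidate θ
likelihood = [comb(tosses, heads) * t**heads * (1 - t)**(tosses - heads) for t in thetas]

evidence = sum(l * p for l, p in zip(likelihood, prior))            # P(D)
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]   # P(θ|D)

# Updated belief that the coin is fair (θ = 0.5) after seeing D
print(posterior[thetas.index(0.5)])
```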
BPR Approach
For a recommendation model, the BPR criterion seeks to maximize the belief that this model correctly ranks two items i and j, taking into account a succession of new evidence.
The BPR criterion is agnostic of the model used to determine the preference between two items.
The BPR approach addresses this problem with Bayesian statistics: P(θ|D) = (P(D|θ) × P(θ)) / P(D), with:
- θ: the model parameters
- D: a new piece of evidence about the user's item preferences
- P(θ): the current belief in the ranking capabilities of the model (prior)
- P(D): the probability of meeting the new evidence D
- P(D|θ): the likelihood function of i being preferred over j given our current belief about the recommendation model
- P(θ|D): the new belief in the model (posterior)
As said before, the BPR criterion seeks to maximize the belief in the ranking ability of the recommendation model, that is to say the posterior P(θ|D). To do so, the BPR approach optimizes the model's parameters θ.
For this, the implicit feedback of a user (for example purchase history or navigation history) is used as new evidence.
An item i is said to be preferred over an item j if the user has already consulted i but not j. There is no particular preference between two items already seen, or between two items not seen by the user.
To maximize the criterion, the publication suggests using stochastic gradient descent (SGD).
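As a hedged sketch (not the project's exact code), a single SGD step on this criterion for one piece of evidence "item i is preferred over item j" can look like this, where `x_uij` is the model's estimated preference of i over j and `grad_x` its derivative with respect to a model parameter `theta`:

```python
# Hedged sketch of one stochastic gradient ascent step on the BPR criterion
# for a single evidence "item i is preferred over item j".
import math

def bpr_sgd_step(theta, x_uij, grad_x, learning_rate=0.1, reg=0.01):
    """One ascent step on ln(sigmoid(x_uij)) - (reg / 2) * theta**2."""
    sigmoid_neg = 1.0 / (1.0 + math.exp(x_uij))   # derivative of ln(sigmoid(x)) is sigmoid(-x)
    return theta + learning_rate * (sigmoid_neg * grad_x - reg * theta)
```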
BPR used for k-nearest-neighbors (KNN)
As said earlier, the BPR criterion is model agnostic. In this study, we use the KNN model with BPR.
KNN is based on a square matrix C whose size is the number of items in the catalog. The matrix values are usually computed as cosine similarities between two items.
In BPR, this matrix C is the parameter θ described earlier. During the training process to maximize the BPR posterior, at each iteration and for N random navigation sessions, the matrix C learns the preference between:
- A particular (randomly selected) item i already seen by the user
- A particular (randomly selected) item j not seen by the user
- Given all the items seen during this navigation session
At the end of the training process, the matrix C will contain the coefficients that will be used to make the top N recommendations.
Instead of being computed with cosine similarities, the coefficients are learned during training while optimizing the BPR posterior.
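A hedged sketch of what one such training update of C can look like (a numpy matrix is assumed; the learning rate and the two regularization constants reuse the configuration listed further below, but the project's exact code may differ):

```python
# Hedged sketch of one BPR-KNN update of the similarity matrix C for one sample
# (session, seen item i, unseen item j).
import numpy as np

def bpr_knn_step(C, seen, i, j, lr=0.1, lambda_pos=0.01, lambda_neg=0.005):
    seen = list(seen)
    others = [l for l in seen if l != i]          # items of the session, i itself excluded
    # Estimated preference of the seen item i over the unseen item j
    x_uij = C[i, others].sum() - C[j, seen].sum()
    sigmoid_neg = 1.0 / (1.0 + np.exp(x_uij))
    # Pull the similarities between i and the session items up...
    C[i, others] += lr * (sigmoid_neg - lambda_pos * C[i, others])
    # ...and push the similarities between j and the session items down
    C[j, seen] += lr * (-sigmoid_neg - lambda_neg * C[j, seen])
    return C
```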
Learning Improvements
To improve the convergence speed and the quality of the results, two improvements to the stochastic gradient descent are implemented (both are sketched below):
- Decay of the learning rate: it reduces the gradient step when the results tend to stabilize, to avoid overshooting the optimum. The implemented approach multiplies the learning rate by a decay factor when the results have been stagnating for N epochs.
- Momentum: it helps the gradient vector to keep moving in the right direction, speeding up convergence. The update remembers its previous step; when it reaches a local minimum or a plateau (gradient of 0) and no longer knows which direction to follow, this previous step helps it to keep moving forward.
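A minimal sketch of an SGD training loop combining these two mechanisms (the momentum factor is an illustrative value; the other defaults reuse the hyperparameters listed in the comparison section; `gradient_fn` and `evaluate_fn` are placeholders for the model's gradient and for the HR/ARHR evaluation):

```python
# Illustrative sketch of learning-rate decay and momentum around a gradient ascent step.
def train(gradient_fn, evaluate_fn, theta, lr=0.1, min_lr=0.025, decay=0.5,
          momentum=0.9, max_epochs=200, stagnation_window=20):
    velocity = 0.0
    best_score, stagnant_epochs = float("-inf"), 0
    for _ in range(max_epochs):
        # Momentum: keep a fraction of the previous step to push through plateaus
        velocity = momentum * velocity + lr * gradient_fn(theta)
        theta = theta + velocity
        score = evaluate_fn(theta)                  # e.g. HR / ARHR on a validation set
        if score > best_score:
            best_score, stagnant_epochs = score, 0
        else:
            stagnant_epochs += 1
        # Decay: reduce the step when results stagnate for `stagnation_window` epochs
        if stagnant_epochs >= stagnation_window:
            lr = max(min_lr, lr * decay)
            stagnant_epochs = 0
    return theta
```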
Application to Magic: The Gathering
In the context of this project, for a given mode and color combination there is a list of existing decks. These decks play the role of the user's implicit feedback described in the publication.
Thus, during the training process to maximize the BPR posterior, at each iteration and for N random decks, the matrix C learns the preference between:
- A particular (randomly selected) card i selected in the deck
- A particular (randomly selected) card j not selected in the deck
- Given all the cards in the considered deck
The BPR criterion is maximized over the decks of a particular mode and color. One card of each deck has been removed to form the testing set; the remaining cards of the decks compose the training set.
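For reference, the sampling of these training triplets can be sketched as follows (the deck and catalog structures are simplified assumptions); each sampled triplet then feeds one update of the matrix C as in the sketch above:

```python
# Hedged sketch of the triplet sampling described above: pick a random deck,
# a card i inside it and a card j outside of it, for each training sample.
import random

def sample_triplets(decks, catalog, n_samples):
    """decks: list of sets of card ids; catalog: list of all card ids."""
    for _ in range(n_samples):
        deck = random.choice(decks)
        i = random.choice(list(deck))                              # card already in the deck
        j = random.choice([c for c in catalog if c not in deck])   # card not in the deck
        yield deck, i, j
```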
To be consistent with the KNN model, the indicators used to measure the performance of this BPR-KNN model are the Hit Rate (HR) and the Average Reciprocal Hit Rank (ARHR).
The training is stopped when these two indicators have converged or when the maximum number of iterations has been reached.
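As a reminder, these two indicators can be computed as in the following sketch, assuming one left-out card per deck and one top-N recommendation list per deck:

```python
# Hedged sketch of Hit Rate (HR) and Average Reciprocal Hit Rank (ARHR) on the test set.
def hit_rate_and_arhr(recommendations, left_out_cards):
    """recommendations: list of top-N card lists; left_out_cards: the card removed from each deck."""
    hits, reciprocal_ranks = 0, 0.0
    for top_n, target in zip(recommendations, left_out_cards):
        if target in top_n:
            hits += 1
            reciprocal_ranks += 1.0 / (top_n.index(target) + 1)   # rank is 1-based
    n = len(left_out_cards)
    return hits / n, reciprocal_ranks / n
```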
Comparison of models
BPR-KNN is compared with the previously developed ItemKNN approach. ItemKNN is based on cosine similarity and uses row and coefficient normalization.
The BPR-KNN parameter values were obtained experimentally by trying to maximize HR and ARHR on Commander and Legacy decks. The BPR-KNN model is configured as follows:
- BPR-KNN parameters:
  - lambda for positive items: 0.01
  - lambda for negative items: 0.005
- Hyperparameters for training:
  - batch size: 100
  - initial learning rate: 0.1
  - minimal learning rate: 0.025
  - learning rate decay: 0.5
  - maximum number of epochs: 200
  - number of epochs to detect stagnation: 20
The HR and ARHR values used to compare the models are means over 20 runs. For each run, different random training and testing sets are created.
To measure the significance at the 95% level of the performance difference between two models, a paired t-test has been applied with a Bonferroni correction.
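A possible implementation of this test with scipy (a sketch; the number of comparisons used for the Bonferroni correction is left as a parameter):

```python
# Hedged sketch of the paired t-test with Bonferroni correction used to compare models.
from scipy import stats

def significantly_different(scores_bpr_knn, scores_item_knn, n_comparisons, alpha=0.05):
    """scores_*: the 20 per-run HR (or ARHR) values of each model on the same splits."""
    _, p_value = stats.ttest_rel(scores_bpr_knn, scores_item_knn)
    return p_value < alpha / n_comparisons        # Bonferroni-corrected threshold
```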
The information about the decks used for this study is available on the ItemKNN page.
HR of each model on all studied data sets (in bold, the best scores at the 95% confidence level):
Mode | Color | BPR-KNN | ItemKNN | Mean difference |
---|---|---|---|---|
Legacy | Blue Black Red | 0.723 | 0.641 | 0.083 |
Legacy | White Blue Black Green | 0.683 | 0.602 | 0.081 |
Legacy | Blue Black Red Green | 0.774 | 0.749 | 0.025 |
Legacy | White Blue Black Red Green | 0.724 | 0.684 | 0.04 |
Pauper | Red | 0.875 | 0.864 | 0.011 |
Pauper | White Blue Black Red Green | 0.723 | 0.647 | 0.076 |
Commander | Red | 0.455 | 0.334 | 0.121 |
Commander | Blue Black Red | 0.607 | 0.396 | 0.211 |
Commander | White Blue Black Green | 0.668 | 0.444 | 0.223 |
Vintage | White Blue Black Red Green | 0.679 | 0.674 | 0.005 |
ARHR of each model on all studied data sets (in bold, the best scores at the 95% confidence level):
Mode | Color | BPR-KNN | ItemKNN | Mean difference |
---|---|---|---|---|
Legacy | Blue Black Red | 0.528 | 0.475 | 0.053 |
Legacy | White Blue Black Green | 0.487 | 0.448 | 0.04 |
Legacy | Blue Black Red Green | 0.602 | 0.506 | 0.096 |
Legacy | White Blue Black Red Green | 0.553 | 0.548 | 0.006 |
Pauper | Red | 0.732 | 0.623 | 0.109 |
Pauper | White Blue Black Red Green | 0.551 | 0.49 | 0.061 |
Commander | Red | 0.286 | 0.211 | 0.074 |
Commander | Blue Black Red | 0.445 | 0.228 | 0.218 |
Commander | White Blue Black Green | 0.515 | 0.267 | 0.247 |
Vintage | White Blue Black Red Green | 0.523 | 0.494 | 0.029 |
BPR-KNN models globally show a clear improvement over the ItemKNN model. The improvement appears both in the number of correctly predicted items and in the ranking of these recommendations.
The cosine similarity is a direct calculation without any optimization step. It is faster and already gives relevant trends.
Using an optimization criterion dedicated to ranking is a step forward in modeling how items should be ranked relative to each other depending on the user's preferences. Except in one specific configuration (Legacy, 5 colors), the indicators of the remaining configurations increase from 2% to 24%.
Recommendation Results
Let's take again the example of the deck from the deckstat.net website.
The recommendations are this time made by the BPR-KNN model.
In yellow, the cards already selected by the player to build his deck.
We expect to find some of the other cards in the top 10 recommendations made by the model.
The top 10 recommendations made by the model are as follows (in bold, the recommended cards matching the original deck):
Ranking | Multiverseid | Card's name |
---|---|---|
1 | 886 | Underground Sea |
2 | 887 | Volcanic Island |
3 | 884 | Tropical Island |
4 | 190393 | Scalding Tarn |
5 | 228255 | Flusterstorm |
6 | 386627 | Polluted Delta |
7 | 233041 | Surgical Extraction |
8 | 376559 | Toxic Deluge |
9 | 190413 | Misty Rainforest |
10 | 814 | Red Elemental Blast |
In green, the recommendations matching the original deck:
Compared to the ItemKNN results, the matching cards are recommended sooner (1st to 6th rank against 2nd to 8th rank).
The other recommended cards are slightly different. Flusterstorm (from 1st to 5th rank) and Toxic Deluge (from 5th to 8th rank) are still recommended.
The cards recommended instead of "Hydroblast" (blue counterspell) and "Bayou" (dual land) are of the same types (counterspell and land). Surgical Extraction is a more aggressive spell but consistent with the spell orientation of this deck.
Conclusion
The BPR-KNN approach provides better results than ItemKNN based on the HR and ARHR indicators. The ranking criterion based on Bayesian statistics performs well for card selection.
Since there is an optimization step, the required processing time is larger than for ItemKNN. Care must be taken with the hyperparameters used to tune the model: the values used for this project, with a small number of decks, may not be adequate for bigger data sets.
The architecture to use this approach in a real website remains the same as for ItemKNN. You may choose to predict on the server or the client side.
If the predictions take place on the client side, it may be difficult to send all the coefficients as JSON (for 600 cards, there is a 600x600 matrix of floats to send). With some math and the original publication, you may factorize the coefficient matrix to send fewer coefficients, at the cost of an additional step to rebuild the matrix on the client.
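One possible way to do this, sketched with a truncated SVD (the rank k is an illustrative assumption; other factorizations are possible):

```python
# Hedged sketch: approximate the 600x600 matrix C with two thin factors to send
# fewer coefficients, then rebuild the approximation on the client side.
import numpy as np

def factorize(C, k=30):
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    W = U[:, :k] * s[:k]          # 600 x k
    H = Vt[:k, :]                 # k x 600
    return W, H                   # 2 * 600 * k floats instead of 600 * 600

def rebuild(W, H):
    return W @ H                  # client-side reconstruction of an approximation of C
```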
An interesting add-on to ItemKNN and BPR-KNN would be, for a set of selected cards, to identify which ones contribute to each of the N recommendations given by the model. This would allow adding information to the recommendations such as: "Misty Rainforest has been recommended because you selected the cards XXXXX and XXXXX in your deck".
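With the learned matrix C, such an explanation could be approximated by looking at which selected cards carry the largest coefficients towards a recommended card (a hedged sketch reusing the scoring of the training step above):

```python
# Hedged sketch: for a recommended card, find the selected cards that contribute
# the most to its score, i.e. the largest coefficients C[recommended, selected].
def top_contributors(C, recommended_card, selected_cards, n=2):
    contributions = [(C[recommended_card, card], card) for card in selected_cards]
    contributions.sort(reverse=True)
    return [card for _, card in contributions[:n]]
```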
Webography
Publication
Bayesian statistics
Optimization
- Video on learning recommenders
- Difference between Gradient and Stochastic Gradient
- Stochastic Gradient with momentum
- Learning Rate Schedules and Adaptive Learning Rate Methods for Deep Learning