Training Hyper Params - DSE-capstone-sharknado/main GitHub Wiki

Fitting the model parameters. For this step, our method adopts the sampling scheme of BPR-MF implemented in MyMediaLite [9], i.e., during each iteration we sample |P| training tuples to update the model parameters Θ, which we repeat for 100 iterations.
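
As a rough illustration of this sampling scheme, the sketch below samples |P| triples (u, i, j) per iteration and applies the standard BPR stochastic gradient update to plain matrix-factorization parameters. All sizes, the learning rate, the regularizer, and the data are hypothetical placeholders, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only
n_users, n_items, K = 50, 100, 10
lam, lr = 0.01, 0.05   # regularizer and learning rate (placeholder values)

gamma_u = rng.normal(scale=0.1, size=(n_users, K))  # user latent factors
gamma_i = rng.normal(scale=0.1, size=(n_items, K))  # item latent factors

# P: observed (user, positive item) pairs; random placeholders here
P = [(int(rng.integers(n_users)), int(rng.integers(n_items)))
     for _ in range(500)]

def bpr_step(u, i, j):
    """One SGD step on triple (u, i, j): ascend ln sigma(x_ui - x_uj) - reg."""
    gu, gi, gj = gamma_u[u].copy(), gamma_i[i].copy(), gamma_i[j].copy()
    x_uij = gu @ (gi - gj)
    sig = 1.0 / (1.0 + np.exp(x_uij))      # = sigma(-x_uij)
    gamma_u[u] += lr * (sig * (gi - gj) - lam * gu)
    gamma_i[i] += lr * (sig * gu - lam * gi)
    gamma_i[j] += lr * (-sig * gu - lam * gj)

for _ in range(100):                       # 100 iterations, as in the text
    for _ in range(len(P)):                # |P| sampled triples per iteration
        u, i = P[rng.integers(len(P))]
        j = int(rng.integers(n_items))     # uniformly sampled "negative" item
        bpr_step(u, i, j)
```

The visually-aware variants extend `bpr_step` with updates to their visual parameters, but the per-iteration sampling loop is the same.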

In all cases, regularization hyperparameters are tuned to perform the best on the validation set V. The best regularization hyperparameter was λ_Θ = 100 for WR-MF, and λ_Θ = 1 for the other MF-based methods. For visually-aware methods, the embedding matrix E and visual bias vector β′ are not regularized, as they introduce only a constant (and small) number of parameters to the model. In TVBPR and TVBPR+, ΔE(t), w(t) and b(t) are regularized with regularization parameter 0.0001.
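
The per-group treatment above can be summarized in a small sketch. The group names and the helper are illustrative, not taken from the paper's code; the weights mirror the tuned values reported above for the MF case (λ_Θ = 1; WR-MF would use 100 instead):

```python
import numpy as np

# Hypothetical mapping from parameter groups to L2 weights
reg_weights = {
    "latent_factors": 1.0,    # lambda_Theta (100 for WR-MF instead)
    "embedding_E":    0.0,    # not regularized
    "visual_bias":    0.0,    # not regularized
    "delta_E_t":      1e-4,   # temporal visual parameters in TVBPR(+)
    "w_t":            1e-4,
    "b_t":            1e-4,
}

def l2_penalty(params, weights):
    """Weighted sum of squared L2 norms over named parameter groups."""
    return sum(weights.get(name, 0.0) * float((p ** 2).sum())
               for name, p in params.items())
```

For example, `l2_penalty({"latent_factors": np.array([3.0, 4.0])}, {"latent_factors": 0.5})` gives 0.5 × (9 + 16) = 12.5, while any group with weight 0.0 contributes nothing.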

For each training triple (u, i, j), BPR-MF requires O(K) time to update the parameters, while VBPR and TVBPR+ need to update the visual parameters as well. VBPR takes O(K + K′) in total to finish updating the parameters for each sampled training triple. Compared to VBPR, although there are more visual parameters to describe multiple fashion epochs, TVBPR+ only needs to update the parameters associated with the epoch the timestamp tui falls into. This means that TVBPR+ exhibits the same time complexity as VBPR. Additionally, visual feature vectors (fi) from deep CNNs turn out to be very sparse, which can significantly reduce the above worst-case running time.
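
To see why sparsity helps, note that the dominant cost of the visual update involves products with the CNN feature vector fi; when most entries of fi are zero, only the corresponding columns of the embedding matrix participate, and the result is identical. A minimal sketch (dimensions and sparsity level are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
F, K_prime = 4096, 20                        # CNN feature dim, visual dim K'
E = rng.normal(scale=0.01, size=(K_prime, F))

# Deep CNN features (e.g. post-ReLU) are mostly zero; fake a sparse f_i
f_i = np.zeros(F)
nz = rng.choice(F, size=200, replace=False)  # 200 nonzeros out of 4096
f_i[nz] = rng.random(200)

# Dense product costs O(K' * F) multiply-adds; restricting to the nonzero
# columns costs only O(K' * nnz(f_i)) and produces the same vector
dense = E @ f_i
sparse = E[:, nz] @ f_i[nz]
```

Here the restricted product touches 200/4096 ≈ 5% of the columns, which is the source of the worst-case speedup claimed above.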