Agenda 19th of May - SelinaBlijleven/WebshopRecommendations GitHub Wiki

Subject: Data analysis with pandas.

  • Low frequencies in some attributes (should these be filtered or corrected?): Outlier datapoints will be filtered out. Colours will be grouped together and translated to Dutch where needed. Other attributes may be grouped in the same way where useful.
    Update: Colours have been grouped using the colour webs provided in the data. Some colour webs still have low frequencies.
  • Similarity measure between products: All non-numeric attributes will be converted into numbers or word vectors, and normalization will be applied before calculating the similarity between products.
  • Layout problems in figures: Crowded figures can be fixed by enlarging them and calling tight_layout() in matplotlib.
  • Update Justin on progress: Justin will be updated on progress later today (19-05-2016).
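For the low-frequency item, a minimal pandas sketch of the two steps (grouping rare colour values and dropping outliers) might look like the following. The colour-web mapping `COLOUR_WEB`, the column names `colour` and `price`, the Dutch fallback label `overig`, and the interquartile-range (IQR) rule for outliers are all hypothetical choices for illustration, not the project's actual data or filtering rule.

```python
import pandas as pd

# Hypothetical mapping from raw colour names to broader colour groups
# ("colour webs"); the real mapping would come from the webshop data.
COLOUR_WEB = {
    "navy": "blauw",
    "light blue": "blauw",
    "crimson": "rood",
    "red": "rood",
}


def group_rare(series: pd.Series, min_count: int, other: str = "overig") -> pd.Series:
    """Replace categories that occur fewer than min_count times with `other`."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; replace the rest with the fallback label.
    return series.where(~series.isin(rare), other)


def drop_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `column` lies within [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the webshop.
    products = pd.DataFrame({
        "colour": ["navy", "light blue", "red", "crimson", "red", "beige"],
        "price": [20.0, 22.0, 19.0, 21.0, 23.0, 500.0],
    })
    # Map raw colours to their group; unmapped colours keep their original name.
    products["colour_group"] = products["colour"].map(COLOUR_WEB).fillna(products["colour"])
    products["colour_group"] = group_rare(products["colour_group"], min_count=2)
    products = drop_outliers_iqr(products, "price")
    print(products)
```

Grouping rare values into a single fallback category keeps them in the data instead of discarding them, which may matter when some colour webs remain small even after grouping.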
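For the similarity item, one possible sketch converts categorical attributes to numbers, normalises every column, and then computes cosine similarity between products. As a simplification it uses one-hot encoding (`pd.get_dummies`) in place of word vectors, min-max normalisation, and cosine similarity; all three are assumptions about how the measure could be built, and the function name `product_similarity` is hypothetical.

```python
import numpy as np
import pandas as pd


def product_similarity(products: pd.DataFrame) -> pd.DataFrame:
    """Cosine similarity between all pairs of products (rows)."""
    # Turn categorical columns into 0/1 indicator columns (one-hot encoding);
    # numeric columns are passed through unchanged.
    encoded = pd.get_dummies(products, dtype=float)
    # Min-max normalise every column to [0, 1] so no attribute dominates.
    # Constant columns get a span of 1 to avoid division by zero.
    span = (encoded.max() - encoded.min()).replace(0, 1)
    normalised = (encoded - encoded.min()) / span
    # Cosine similarity: dot products divided by the product of row norms.
    values = normalised.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows end up with similarity 0 to everything
    unit = values / norms
    return pd.DataFrame(unit @ unit.T, index=products.index, columns=products.index)


if __name__ == "__main__":
    # Hypothetical toy products.
    products = pd.DataFrame(
        {"colour": ["rood", "rood", "blauw"], "price": [10.0, 12.0, 50.0]},
        index=["p1", "p2", "p3"],
    )
    print(product_similarity(products).round(3))
```

Normalising before computing the similarity matters because an unscaled attribute such as price would otherwise outweigh the 0/1 indicator columns.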
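For the figure-layout item, enlarging a figure and calling `tight_layout()` can be sketched with a bar chart of attribute frequencies. The helper name `plot_frequencies`, the figure size, and the label rotation are illustrative assumptions; the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_frequencies(counts: pd.Series) -> plt.Figure:
    """Bar chart of category frequencies with room for long labels."""
    # A larger figure gives crowded category labels more space.
    fig, ax = plt.subplots(figsize=(12, 6))
    counts.plot.bar(ax=ax)
    ax.set_xlabel(counts.index.name or "category")
    ax.set_ylabel("frequency")
    ax.tick_params(axis="x", labelrotation=45)
    # tight_layout() adjusts the padding so labels are not clipped or overlapping.
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    # Hypothetical colour counts.
    colours = pd.Series([5, 3, 1], index=["rood", "blauw", "groen"])
    fig = plot_frequencies(colours)
    fig.savefig("colour_frequencies.png")
    plt.close(fig)
```

Returning the figure instead of showing it directly lets the caller decide whether to save, display, or further adjust it.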