Agenda 19th of May - SelinaBlijleven/WebshopRecommendations GitHub Wiki
Subject: Data analysis with pandas.
- Low frequencies detected in some attributes: Should these be filtered or corrected? Outlier data points will be filtered out. Colours will be grouped together and translated to Dutch where needed. Other attributes may be grouped together where useful.
Update: Colours have been grouped together using the colour webs provided in the data. Some webs still have a low frequency.
- Similarity measure between products: All non-numeric attributes will be converted to numbers or word vectors, and normalization will be applied before calculating the similarity between products.
- Layout problems in figures: Crowded figures can be fixed by enlarging them and applying tight_layout() with matplotlib.
- Update Justin on progress: Justin will be updated on progress later today (19-05-2016).
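
For the low-frequency item, one possible way to group rare values in the other attributes is a simple count threshold. This is only a sketch: the colours themselves were grouped with the colour webs from the data, not with this method, and the helper `group_rare`, the `min_count` threshold, the `"other"` label and the sample values are all hypothetical.

```python
import pandas as pd


def group_rare(series: pd.Series, min_count: int, other_label: str = "other") -> pd.Series:
    """Replace values that occur fewer than min_count times with other_label."""
    counts = series.value_counts()
    rare_values = counts[counts < min_count].index
    # Keep frequent values as they are; overwrite rare ones with the shared label.
    return series.where(~series.isin(rare_values), other_label)


# Hypothetical attribute column, not taken from the webshop data.
colours = pd.Series(["zwart", "zwart", "wit", "wit", "wit", "turquoise", "mauve"])
grouped = group_rare(colours, min_count=2)
print(grouped.value_counts())
```

Missing values are left untouched, because `value_counts()` ignores them and `isin()` does not match them.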
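
For the similarity item, the steps (convert to numbers, normalize, compare) could look roughly like the sketch below. It assumes one-hot encoding for the non-numeric attributes, min-max normalization and cosine similarity; none of these choices were fixed in the meeting, word vectors are not covered, and the `products` table with its column names is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical product table; the real data has different attributes.
products = pd.DataFrame(
    {
        "price": [10.0, 12.0, 200.0],
        "colour": ["zwart", "zwart", "wit"],
    },
    index=["A", "B", "C"],
)

# Step 1: turn non-numeric attributes into numbers (one-hot encoding, an assumption).
encoded = pd.get_dummies(products, columns=["colour"], dtype=float)

# Step 2: min-max normalize every column to [0, 1]; constant columns keep a span of 1
# to avoid division by zero.
span = (encoded.max() - encoded.min()).replace(0, 1)
normalised = (encoded - encoded.min()) / span

# Step 3: cosine similarity between every pair of products.
matrix = normalised.to_numpy()
norms = np.linalg.norm(matrix, axis=1, keepdims=True)
norms[norms == 0] = 1.0  # all-zero rows stay all-zero instead of producing NaN
unit = matrix / norms
similarity = pd.DataFrame(unit @ unit.T, index=products.index, columns=products.index)
print(similarity.round(3))
```

Without the normalization step the price column would dominate the result, which is why it comes before the similarity calculation.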
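
For the figure layout item, a minimal sketch of enlarging a figure and applying `tight_layout()` is shown below. The 2x2 grid, the `figsize` of 12 by 8 inches and the plotted values are placeholders, not the actual figures from the analysis.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt

# Enlarge the figure through figsize (width, height in inches).
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

for i, ax in enumerate(axes.flat):
    # Placeholder bars standing in for attribute frequencies.
    ax.bar(["a", "b", "c"], [i + 1, i + 2, i + 3])
    ax.set_title(f"Attribute {i}")
    ax.set_xlabel("value")
    ax.set_ylabel("frequency")
    ax.tick_params(axis="x", labelrotation=45)

# Recompute the spacing so titles, axis labels and tick labels do not overlap.
fig.tight_layout()
```

`tight_layout()` should be called after all titles and labels have been set, since it only accounts for the elements present at the time of the call.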