Dataset World Food Facts - Rostlab/DM_CS_WS_2016-17 GitHub Wiki

Dataset World Food Facts

  • Proposer: Vivek Sethia - @vivek-sethia

  • Team Members:

    1. Vivek Sethia @vivek-sethia
    2. Muhammad Triwindu Prasetya @winduprasetya
    3. Andreas Kammerloher @Odin94

  • Votes:

    1. 🙋 _ @winduprasetya
    2. 🙋 _ @Odin94
    3. 🙋 _ @ishaanraj

Summary

Open Food Facts is a database of food products that anyone can use and contribute to. It contains information about the origin of food items, their allergens, ingredients, nutritional content, etc. It is developed by a non-profit association of volunteers. This database can help us plan our diets based on the requirements of our bodies.

Prediction Goals

  • Classifying foods based on vitamins or other minerals/nutrients.
  • Classifying countries based on nutritional intake/value.
  • Grouping foods based on common ingredients.
  • Depicting the carbon footprint of foods, which might help us understand global warming.
  • And so on and so forth.

Weekly Progress

Week 01 (W46 Nov16) World Food Facts -- Summary:

  • Most food entries are from France.
  • Outliers are present in several attributes, e.g. carbohydrates per 100 g greater than 100.
  • Duplicate or redundant values are present for attributes like countries.
  • Some attributes are filled with the value "to be completed", which needs to be investigated.
  • Other datasets are required for verification and for filling in missing values.
  • Out of 16 million data cells, more than 10 million are missing.
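The missing-cell count and the "more than 100 per 100 g" check above can be sketched as follows. This is only a minimal illustration, not the team's actual code: the sample rows are made up, the helper names `missing_cell_stats` and `implausible_per_100g` are hypothetical, and empty strings are assumed to mark missing cells.

```python
import csv
import io

# Tiny made-up sample standing in for the real export (assumption); the
# column names follow the *_100g pattern used on this page.
SAMPLE = """product_name,countries,sugars_100g,carbohydrates_100g
Biscuit,France,22.0,65.0
Mystery bar,,,140.0
,en:FR,5.0,
"""

def missing_cell_stats(rows, columns):
    """Return (missing, total) cell counts, treating empty strings as missing."""
    total = 0
    missing = 0
    for row in rows:
        for col in columns:
            total += 1
            value = row.get(col)
            if value is None or value.strip() == "":
                missing += 1
    return missing, total

def implausible_per_100g(rows, column):
    """Return rows whose per-100 g value lies outside the range 0..100."""
    flagged = []
    for row in rows:
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # missing values are counted separately
        try:
            value = float(raw)
        except ValueError:
            continue  # non-numeric text is ignored in this sketch
        if not 0.0 <= value <= 100.0:
            flagged.append(row)
    return flagged

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
missing, total = missing_cell_stats(rows, reader.fieldnames)
print(f"{missing} of {total} cells missing ({missing / total:.0%})")

bad = implausible_per_100g(rows, "carbohydrates_100g")
print([r["product_name"] for r in bad])
```

The range check only makes sense for mass-based columns; energy_100g is not bounded by 100 and would need a different rule.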

Week 02 (W47 Nov23) World Food Facts -- Summary:

  • Cleaning the dataset
  • Relationships between attributes such as additives & allergens, sugars_100g & carbohydrates_100g, and fat_100g & energy_100g
  • Classification of attributes as important or unimportant
  • Finding outliers using box plots
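A box plot marks as outliers the points that fall beyond its whiskers, conventionally 1.5 times the interquartile range (IQR) from the quartiles. The sketch below applies that rule numerically. It is an illustration under assumptions: the sugar values are made up, `iqr_outliers` is a hypothetical helper, and the quartile method (`"inclusive"`) may differ slightly from what a given plotting library uses.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (box-plot whisker rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up sugars_100g values with one suspicious entry.
sugars = [1.0, 2.5, 3.0, 4.0, 5.5, 6.0, 7.0, 8.5, 95.0]
print(iqr_outliers(sugars))
```

Note that a statistical outlier is not necessarily an error: 95 g of sugar per 100 g is plausible for some sweets, so flagged values still need manual inspection.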

Week 03 (W48 Nov30) World Food Facts -- Summary:

  • Gathering and inspecting food classified as sweets
  • Word cloud of the packaging attribute to identify the most common packaging used
  • Sugar distribution across different foods
  • High-sugar foods across developing and developed countries
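A word cloud is driven by term frequencies, so the packaging analysis boils down to counting terms. The following sketch shows that counting step only. It assumes packaging values are comma-separated free text, and both the sample values and the helper name `packaging_frequencies` are hypothetical.

```python
from collections import Counter

def packaging_frequencies(values):
    """Count packaging terms across products; terms are assumed comma-separated."""
    counts = Counter()
    for value in values:
        for term in value.split(","):
            term = term.strip().lower()  # normalize case and whitespace
            if term:
                counts[term] += 1
    return counts

# Made-up packaging entries, including an empty one.
packaging = ["Plastic, Bottle", "carton,plastic", "", "Glass , bottle"]
freq = packaging_frequencies(packaging)
print(freq.most_common(2))
```

The resulting counts could then be handed to a rendering library such as the `wordcloud` package to draw the cloud itself.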

Week 04 05 (W49 W50 Dec7 Dec14) World Food Facts -- Summary:

  • Looked into ratios between column entries to find values characteristic of candy, in order to find more candies
  • Looked into specific types of candy to make them easier to find
  • Found the distribution of sugar and energy in beverages and categorized the beverages
  • Identified the beverages with the highest sugar content by visualizing the data

Week 06 (W51 Dec21) World Food Facts -- Summary:

  • Investigated and categorized beverages
  • Built a random forest to predict chocolates
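This page does not say which library or features were used for the random forest, so the following is only a sketch under assumptions: it uses scikit-learn's `RandomForestClassifier`, three nutrient columns chosen for illustration, and a handful of made-up rows labelled chocolate (1) or not (0).

```python
from sklearn.ensemble import RandomForestClassifier

# Made-up rows: [sugars_100g, fat_100g, energy_100g]; label 1 = chocolate.
X = [
    [50.0, 30.0, 2200.0],
    [48.0, 33.0, 2250.0],
    [55.0, 28.0, 2150.0],
    [10.0, 0.0, 180.0],
    [4.0, 1.0, 150.0],
    [12.0, 0.5, 200.0],
]
y = [1, 1, 1, 0, 0, 0]

# A fixed random_state keeps the run reproducible.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Two unseen products: one chocolate-like, one beverage-like.
pred = model.predict([[52.0, 31.0, 2230.0], [8.0, 0.2, 170.0]])
print(pred)
```

On the real dataset the rows with missing nutrient values would have to be dropped or imputed first, since this classifier expects complete numeric input.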

Week 08 (W Jan11) World Food Facts -- Summary:

  • Used precision, recall and f1-score to evaluate predictions
  • Visualized Chocolate attributes to find clusters for prediction quality
  • Created random forest for dark chocolate to specialize (instead of trying to predict all chocolate types at once)
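For reference, the three evaluation metrics can be computed directly from true and predicted labels. The sketch below is a plain-Python illustration with made-up labels; `precision_recall_f1` is a hypothetical helper, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated on this page.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up labels: 1 = dark chocolate, 0 = anything else.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Because chocolates are a small fraction of all products, these metrics are more informative than plain accuracy, which a classifier could score highly on by predicting "not chocolate" for everything.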

Week 09 (W Jan18) World Food Facts -- Summary:

  • Visualized attributes/clusters of dark/white/milk chocolate and all foods in a more concise manner (all attributes on one graph)
  • Built random forests for white chocolate and milk chocolate

Week 10 (W Jan25) World Food Facts -- Summary:

  • Improved white-chocolate prediction quality
  • Built a simple GUI

Final Presentation: http://slides.com/viveksethia/world-food-facts

Long Description

The dataset contains 105,060 products, so there are many areas in which data mining can be applied.

1 - Dataset Description:

  • Size: approx. 250 MB
  • Attributes: > 150

2 - Attributes:

  • product_name (text)
  • generic_name (text)
  • quantity (text)
  • cities (text)
  • countries (text)
  • categories (text)
  • origins (text)
  • allergens (text)
  • additives (text)
  • And so on.

A great many food products exist in the world. It would be interesting to explore the composition of these foods and their similarities to one another, and also to look at them from a health point of view.

Links / Data / Other