3.2.1. Lab: From Understanding to Preparation

From Understanding to Preparation

Estimated time needed: 20 minutes

Objectives

After completing this lab you will be able to:

  • Understand Data
  • Prepare Data for analysis and inference

Introduction

In this lab, we continue learning about the data science methodology, focusing on the Data Understanding and Data Preparation stages.

Table of Contents

  1. Recap
  2. Data Understanding
  3. Data Preparation

Recap

In the lab From Requirements to Collection, we learned that the data needed to answer the question developed in the business understanding stage, namely "Can we automate the process of determining the cuisine of a given recipe?", is readily available. A researcher named Yong-Yeol Ahn scraped tens of thousands of food recipes (cuisines and ingredients) from three different websites:

https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0103EN-SkillsNetwork/labs/Module%202/images/lab2_fig3_allrecipes.png

www.allrecipes.com

https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0103EN-SkillsNetwork/labs/Module%202/images/lab2_fig4_epicurious.png

www.epicurious.com

https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0103EN-SkillsNetwork/labs/Module%202/images/lab2_fig5_menupan.png

www.menupan.com

For more information on Yong-Yeol Ahn and his research, you can read his paper, "Flavor Network and the Principles of Food Pairing".

We also collected the data and placed it on an IBM server for your convenience.


Data Understanding

https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0103EN-SkillsNetwork/labs/Module%203/images/flowchart_data_understanding.png

Important note: You are not expected to know how to program in Python. The following code is meant to illustrate the stage of data collection, so it is totally fine if you do not understand the individual lines of code. If you decide to complete this certificate, a full course in it, Python for Data Science, will teach you how to program in Python.

DS0103EN-Exercise-From-Understanding-to-Preparation.ipynb
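
To give a rough idea of what the Data Understanding stage can look like in code, here is a minimal sketch. It does not use the real recipe data: the three records, the field names `cuisine` and `ingredients`, and the helper functions `cuisine_counts` and `has_ingredient` are all invented for illustration. The actual notebook may approach this differently.

```python
from collections import Counter

# Toy records, made up purely for illustration (NOT the real recipe data).
recipes = [
    {"cuisine": "Japanese", "ingredients": ["rice", "soy_sauce", "wasabi"]},
    {"cuisine": "japanese", "ingredients": ["rice", "seaweed"]},
    {"cuisine": "Italian", "ingredients": ["tomato", "basil", "olive_oil"]},
]


def cuisine_counts(rows):
    """Count recipes per cuisine label, exactly as the label is written."""
    return Counter(row["cuisine"] for row in rows)


def has_ingredient(rows, name):
    """Return True if at least one recipe lists the given ingredient."""
    return any(name in row["ingredients"] for row in rows)


counts = cuisine_counts(recipes)
print(counts)                              # recipes per cuisine label
print(has_ingredient(recipes, "wasabi"))   # is this ingredient in the data at all?
```

In this toy data, "Japanese" and "japanese" are counted as two separate cuisines. Spotting inconsistencies of that kind is one purpose of Data Understanding, and fixing them belongs to Data Preparation.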