2. Tidying data for analysis - upalr/Python-camp GitHub Wiki

1 Tidy data

1.1 Tidy data

1 tidy-data

1.2 Motivation for tidy data

2 motivation-for-tidy-data

1.3 Principles of tidy data


As a data scientist, you'll encounter data that is represented in a variety of different ways, so it is important to be able to recognize tidy (or untidy) data when you see it.

3 principals-of-tidy-data

Analysis 5: In our treatment data, the columns do not represent different variables. They represent different values, a and b, of the variable treatment.

1.4 Converting to tidy data

4 converting-to-tidy-data

1.4.1 Recognizing tidy data:

Link

1.5 Converting to tidy data: problem and solution

5 converting-to-tidy-data-2

Info: Here you will practice melting a DataFrame using pd.melt(). There are two parameters you should be aware of: id_vars and value_vars. The id_vars are the columns you do not want to melt (i.e., they keep their current shape), while the value_vars are the columns you do want to melt into rows. By default, if no value_vars are provided, all columns not listed in id_vars are melted, which can save some typing when many columns need to be melted.
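As a minimal sketch of the two parameters described above, the example below melts a small treatment-style table. The data, the column names, and the var_name and value_name arguments (which only rename the output columns) are assumptions made for illustration and are not taken from the course material.

```python
import pandas as pd

# Hypothetical untidy data: the columns 'a' and 'b' are values of a
# 'treatment' variable, not variables in their own right.
df = pd.DataFrame({
    "name": ["Daniel", "John", "Jane"],
    "a": [10, 12, 24],
    "b": [42, 31, 27],
})

# Keep 'name' as an identifier column and melt 'a' and 'b' into rows.
tidy = pd.melt(df, id_vars="name", value_vars=["a", "b"],
               var_name="treatment", value_name="result")

# Omitting value_vars melts every column not listed in id_vars.
tidy_default = pd.melt(df, id_vars="name",
                       var_name="treatment", value_name="result")

print(tidy)
```

Both calls should give the same result here, because 'a' and 'b' are the only columns not listed in id_vars.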

1.6 Melting

Melting data is the process of turning columns of your data into rows of data.

6 melting

7 melting-2

More Pivoting and melting

2 Pivoting data

2.1 Pivot: un-melting data

While melting takes a set of columns and turns it into a single column, pivoting will create a new column for each unique value in a specified column.

  1. One reason to pivot is to reshape the data from an analysis-friendly shape into a report-friendly shape,

or

  2. the data set violates a tidy data principle (rows do not contain observations).
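To illustrate pivoting as the inverse of melting, here is a short sketch using DataFrame.pivot(). The weather-style data and its column names are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, element) measurement.
long_df = pd.DataFrame({
    "date": ["2010-01-30", "2010-01-30", "2010-02-02", "2010-02-02"],
    "element": ["tmax", "tmin", "tmax", "tmin"],
    "value": [27.8, 14.5, 27.3, 14.4],
})

# Each unique value in 'element' becomes its own column,
# so every row now holds one complete observation per date.
wide_df = long_df.pivot(index="date", columns="element", values="value")

print(wide_df)
```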

8 pivot

2.1 Pivot: un-melting data (example)

9 pivot

2.2 Pivot

10 pivot

Info: .pivot_table() has an index parameter that specifies the columns you do not want pivoted; it is similar to the id_vars parameter of pd.melt(). Two other parameters you need to specify are columns (the name of the column you want to pivot) and values (the values to be used when the column is pivoted).
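The three parameters can be sketched as follows. The data and column names are made up for illustration, and the final reset_index() call is an optional extra step, not something the note above requires.

```python
import pandas as pd

# Hypothetical long-format data with two identifier columns.
long_df = pd.DataFrame({
    "Month": [5, 5, 5, 5],
    "Day": [1, 1, 2, 2],
    "measurement": ["Ozone", "Wind", "Ozone", "Wind"],
    "reading": [41.0, 7.4, 36.0, 8.0],
})

# index: columns to leave unpivoted; columns: the column to pivot;
# values: the column that fills the new cells.
wide_df = long_df.pivot_table(index=["Month", "Day"],
                              columns="measurement",
                              values="reading")

# Optionally turn the hierarchical index back into ordinary columns.
flat_df = wide_df.reset_index()

print(flat_df)
```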


2.3 Pivot (does not always work)

You may find that the pivot method will not always work.

11 pivot-not-work

2.3.1 Using pivot when you have duplicate entries

12 pivot-not-work

The solution is to use the .pivot_table() method, which aggregates duplicate entries (by default, by taking their mean).
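A minimal sketch of the duplicate-entry problem and its workaround is shown below. The data is hypothetical, and aggfunc="mean" is written out only to make the default aggregation explicit.

```python
import pandas as pd

# Hypothetical data with a duplicate entry: two 'tmin' rows for one date.
dup_df = pd.DataFrame({
    "date": ["2010-01-30", "2010-01-30", "2010-01-30"],
    "element": ["tmax", "tmin", "tmin"],
    "value": [27.8, 14.5, 16.5],
})

# .pivot() cannot decide which duplicate to keep, so it raises ValueError.
try:
    dup_df.pivot(index="date", columns="element", values="value")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# .pivot_table() resolves duplicates by aggregating them.
table = dup_df.pivot_table(index="date", columns="element",
                           values="value", aggfunc="mean")

print(table)
```

Other aggregations, such as "sum" or "max", can be passed to aggfunc when the mean is not the right way to combine duplicates.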

2.4 Pivot table

13 pivot-table

14 pivot-table-2

3 Beyond melt and pivot

3.1 Beyond melt and pivot

15 beyond-melt-and-pivot

16 beyond-melt-and-pivot-2

3.2 Melting and parsing

17 melting-and-parsing

18 melting-and-parsing-2

More

More Pivoting and melting