2. Tidying data for analysis - upalr/Python-camp GitHub Wiki
1 Tidy data
1.1 Tidy data
1.2 Motivation for tidy data
1.3 Principles of tidy data
As a data scientist, you'll encounter data that is represented in a variety of different ways, so it is important to be able to recognize tidy (or untidy) data when you see it.
Analysis 5: In our treatment data, the columns do not represent different variables. They represent different values, a and b, of the variable treatment.
1.4 Converting to tidy data
1.4.1 Recognizing tidy data:
1.5 Converting to tidy data: problem and solution
Info: You will practice melting a DataFrame using pd.melt(). There are two parameters you should be aware of: id_vars and value_vars. The id_vars are the columns you do not want to melt (that is, the columns to keep in their current shape), while the value_vars are the columns you do want to melt into rows. By default, if no value_vars are provided, all columns not listed in id_vars are melted, which can save some typing when many columns need to be melted.
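A minimal sketch of pd.melt() with id_vars and value_vars. The treatment DataFrame, its column names, and its values are made up for illustration and are not the data set used in these notes.

```python
import pandas as pd

# Hypothetical untidy data: one column per treatment value (a and b).
treatment = pd.DataFrame({
    "name": ["Daniel", "John", "Jane"],
    "treatment a": [None, 12, 24],
    "treatment b": [42, 31, 27],
})

# Keep 'name' as it is and melt the two treatment columns into rows.
tidy = pd.melt(
    treatment,
    id_vars=["name"],
    value_vars=["treatment a", "treatment b"],
)

# Omitting value_vars melts every column that is not listed in id_vars.
tidy_default = pd.melt(treatment, id_vars=["name"])

print(tidy)
```

The melted frame has one row per (name, treatment) pair, with the former column names in a column called variable and the measurements in a column called value.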
1.6 Melting
Melting data is the process of turning columns of your data into rows of data.
2 Pivoting data
2.1 Pivot: un-melting data
While melting takes a set of columns and turns it into a single column, pivoting will create a new column for each unique value in a specified column.
- One reason to pivot is to reshape data from an analysis-friendly shape into a report-friendly shape
or
- The data set violates a tidy data principle (rows do not contain observations)
2.1 Pivot: un-melting data (example)
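As a rough illustration of un-melting, the sketch below pivots a small long-format table so that each unique value in one column becomes its own column. The weather DataFrame and its column names (date, element, value) are assumptions made for this example.

```python
import pandas as pd

# Hypothetical long-format data: each row holds one measurement.
weather = pd.DataFrame({
    "date": ["2010-01-30", "2010-01-30", "2010-02-02", "2010-02-02"],
    "element": ["tmax", "tmin", "tmax", "tmin"],
    "value": [27.8, 14.5, 27.3, 14.4],
})

# Each unique value in 'element' becomes a new column,
# with one row per date.
weather_wide = weather.pivot(index="date", columns="element", values="value")

print(weather_wide)
```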
2.2 Pivot
Info: .pivot_table() has an index parameter, which you can use to specify the columns that you don't want pivoted. It is similar to the id_vars parameter of pd.melt(). Two other parameters that you will typically specify are columns (the name of the column you want to pivot) and values (the values to be used when the column is pivoted).
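The three parameters can be sketched as follows. The airquality_melt DataFrame, its column names (Month, Day, measurement, reading), and its values are hypothetical and chosen only to show how index, columns, and values fit together.

```python
import pandas as pd

# Hypothetical melted data: one row per (Month, Day, measurement).
airquality_melt = pd.DataFrame({
    "Month": [5, 5, 5, 5],
    "Day": [1, 1, 2, 2],
    "measurement": ["Ozone", "Temp", "Ozone", "Temp"],
    "reading": [41.0, 67.0, 36.0, 72.0],
})

airquality_pivot = airquality_melt.pivot_table(
    index=["Month", "Day"],   # columns to keep as they are
    columns="measurement",    # column whose unique values become new columns
    values="reading",         # values that fill the new columns
)

print(airquality_pivot)
```

The result is indexed by (Month, Day); calling .reset_index() on it turns those index levels back into ordinary columns.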
2.3 Pivot (when it does not work)
You may find that the pivot method does not always work.
2.3.1 Using pivot when you have duplicate entries
The solution is to use the .pivot_table() method, which aggregates duplicate entries (using the mean by default).
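A small sketch of the duplicate-entry problem, using made-up weather data in which the same date and element appear twice. With such data .pivot() raises a ValueError, while .pivot_table() combines the duplicates with an aggregation function (the mean is requested explicitly here, which is also the default).

```python
import pandas as pd

# Hypothetical data with a duplicate entry: two 'tmax' rows for the same date.
weather = pd.DataFrame({
    "date": ["2010-01-30", "2010-01-30", "2010-01-30"],
    "element": ["tmax", "tmax", "tmin"],
    "value": [27.0, 29.0, 14.0],
})

# .pivot() cannot decide which of the duplicate values to keep.
try:
    weather.pivot(index="date", columns="element", values="value")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# .pivot_table() aggregates the duplicates instead of failing.
weather_wide = weather.pivot_table(
    index="date",
    columns="element",
    values="value",
    aggfunc="mean",
)

print(weather_wide)
```

Other aggregations, such as "max" or "sum", can be passed through aggfunc when the mean is not the right way to combine duplicates.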