Convert Pandas dataframe from wide to long format - lvphj/epydemiology GitHub Wiki
Converts a 'wide' dataframe to 'long' format.
This function meets a specific data situation where an individual
subject is described in a series of categorical variables and the
additional numerical values are given in a series of variables
which are entered as separate columns. Clearly, this situation is
exactly what Pandas' wide_to_long() function is designed to do. This
wrapper, however, automatically adds an appropriate prefix to the
column headings to act as stubname and renames the column to more
intuitive values.
This function also provides the opportunity for duplicate columns and rows (as
defined by a series of named columns) to be automatically aggregated before
the wide_to_long() function is applied.
The function retains all the named descriptive columns but then
converts the series of numeric columns to just two columns, one
called 'var' (phjPrefixStr) and the other called 'value' (phjValueStr).
As a visual example, the following dataframe (df) consists of two
categorical variables defining each row, namely cat1 and cat2. The
remaining columns (stuff, things and value) represent variables that
need to be used to convert to long function.
cat1 cat2 stuff things value
0 1 a 2230 489 143
1 2 b 1134 267 87
2 3 c 1165 256 25
3 4 d 1176 232 47
4 5 e 2279 478 86
5 6 f 1165 276 67
6 7 g 1176 287 87
7 8 h 3456 850 207
8 9 i 1156 245 34
9 10 j 1176 223 62
longdf = epy.phjWide2Long(phjDF = df,
phjReqVarList = ['cat1','cat2'],
phjReqVarAllPresent = True,
phjAutoAgg = True,
phjColAgg = 'sum', # Currently the only option
phjRowAgg = 'sum', # Currently the only option
phjNewCatColNameStr = 'category',
phjNewAggColNameStr = 'count',
phjDropZeros = False,
phjPrintResults = False)
print(longdf)
cat1 cat2 category count
0 1 a stuff 2230
1 1 a things 489
2 1 a value 143
3 2 b stuff 1134
4 2 b things 267
5 2 b value 87
6 3 c stuff 1165
7 3 c things 256
8 3 c value 25
9 4 d stuff 1176
10 4 d things 232
11 4 d value 47
12 5 e stuff 2279
13 5 e things 478
14 5 e value 86
15 6 f stuff 1165
16 6 f things 276
17 6 f value 67
18 7 g stuff 1176
19 7 g things 287
20 7 g value 87
21 8 h stuff 3456
22 8 h things 850
23 8 h value 207
24 9 i stuff 1156
25 9 i things 245
26 9 i value 34
27 10 j stuff 1176
28 10 j things 223
29 10 j value 62