Convert long dataframe to wide format containing binary variables - lvphj/epydemiology GitHub Wiki
phjLongToWideBinary()
import numpy as np
import pandas as pd
import collections
import epydemiology as epy
myDF = epy.phjLongToWideBinary(phjDF,
                               phjGroupbyVarName,
                               phjVariablesVarName,
                               phjValuesDict = {0:0,1:1},
                               phjPrintResults = False)
Description
This function takes a long dataframe containing a grouping variable and a variable listing the factors recorded for each group, and converts it to a wide dataframe with one row per group and one binary variable per factor, indicating whether or not that factor is present in the group. For example, it converts:
| | X | Y |
|---|---|---|
| 0 | 1 | a |
| 1 | 1 | b |
| 2 | 1 | d |
| 3 | 2 | b |
| 4 | 2 | c |
| 5 | 3 | d |
| 6 | 3 | e |
| 7 | 3 | a |
| 8 | 3 | f |
| 9 | 4 | b |
to:
| | X | a | b | d | c | e | f |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| 1 | 2 | 0 | 1 | 0 | 1 | 0 | 0 |
| 2 | 3 | 1 | 0 | 1 | 0 | 1 | 1 |
| 3 | 4 | 0 | 1 | 0 | 0 | 0 | 0 |
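The same reshaping can be approximated with plain pandas, which may help clarify what the function is expected to return. This is a minimal sketch, not the library's implementation: the names `long_df`, `counts` and `wide_df` are hypothetical, and `pd.crosstab` is a swapped-in technique chosen for illustration.

```python
import pandas as pd

# Long-format input matching the first example table.
long_df = pd.DataFrame({'X': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
                        'Y': ['a', 'b', 'd', 'b', 'c', 'd', 'e', 'a', 'f', 'b']})

# Count occurrences of each factor within each group, then cap the counts
# at 1 so that repeated (group, factor) pairs still give a binary indicator.
counts = pd.crosstab(long_df['X'], long_df['Y'])
wide_df = counts.clip(upper=1)

# crosstab sorts the columns alphabetically; restore order of first appearance.
wide_df = wide_df[list(long_df['Y'].unique())]

# Drop the columns label and turn the group index back into an ordinary column.
wide_df.columns.name = None
wide_df = wide_df.reset_index()

print(wide_df)
```

Under these assumptions, the printed dataframe has one row per value of `X` and the columns `X, a, b, d, c, e, f`, as in the second table.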
Function parameters
- phjDF
  Dataframe containing a grouping variable and a variable containing categories.
- phjGroupbyVarName
  Name of grouping variable.
- phjVariablesVarName
  Name of variable containing category levels.
- phjValuesDict (default = {0:0,1:1})
  Dictionary defining the values used to represent '0' (factor absent) and '1' (factor present) in the binary variables.
- phjPrintResults (default = False)
  Print intermediate values. Currently has no effect in this function.
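As a rough illustration of what a non-default phjValuesDict is meant to achieve, the sketch below recodes a 0/1 wide dataframe using plain pandas. It does not call epydemiology: the names `wide_df`, `values_dict` and `recoded`, and the 'no'/'yes' labels, are hypothetical, and the library may apply the dictionary differently.

```python
import pandas as pd

# Small wide dataframe of 0/1 indicators (hypothetical stand-in data).
wide_df = pd.DataFrame({'X': [1, 2],
                        'a': [1, 0],
                        'b': [1, 1]})

# Assumed meaning of the dictionary: key 0 gives the value used for 'absent',
# key 1 gives the value used for 'present'.
values_dict = {0: 'no', 1: 'yes'}

# Recode every indicator column, leaving the grouping column untouched.
recoded = wide_df.copy()
for col in recoded.columns:
    if col != 'X':
        recoded[col] = recoded[col].map(values_dict)

print(recoded)
```

With the default `{0:0,1:1}`, a mapping of this kind leaves the indicators unchanged.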
Exceptions raised
- AssertionError
  Raised if the parameters passed to the function are incorrect.
Other notes
None
Example
df = pd.DataFrame({'X':[1,1,1,2,2,3,3,3,3,4],
                   'Y':['a','b','d','b','c','d','e','a','f','b']})
newDF = epy.phjLongToWideBinary(phjDF = df,
                                phjGroupbyVarName = 'X',
                                phjVariablesVarName = 'Y',
                                phjValuesDict = {0:0,1:1},
                                phjPrintResults = False)
print('Original dataframe\n')
print(df)
print('\n')
print('New wide dataframe\n')
print(newDF)
This produces the following output:
Original dataframe
   X  Y
0  1  a
1  1  b
2  1  d
3  2  b
4  2  c
5  3  d
6  3  e
7  3  a
8  3  f
9  4  b
New wide dataframe
   X  a  b  d  c  e  f
0  1  1  1  1  0  0  0
1  2  0  1  0  1  0  0
2  3  1  0  1  0  1  1
3  4  0  1  0  0  0  0
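One way to sanity-check a result like this is to confirm that the number of indicators set to 1 in each row equals the number of distinct factors recorded for that group in the long data. The sketch below does this with plain pandas. The wide dataframe is typed in by hand from the output above rather than produced by epydemiology, and `expected`, `observed` and `check_passed` are hypothetical names.

```python
import pandas as pd

# Long dataframe from the example.
df = pd.DataFrame({'X': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
                   'Y': ['a', 'b', 'd', 'b', 'c', 'd', 'e', 'a', 'f', 'b']})

# Wide dataframe copied by hand from the printed output (stand-in for newDF).
newDF = pd.DataFrame({'X': [1, 2, 3, 4],
                      'a': [1, 0, 1, 0],
                      'b': [1, 1, 0, 1],
                      'd': [1, 0, 1, 0],
                      'c': [0, 1, 0, 0],
                      'e': [0, 0, 1, 0],
                      'f': [0, 0, 1, 0]})

# Number of distinct factors recorded for each group in the long data.
expected = df.groupby('X')['Y'].nunique()

# Number of indicators set to 1 in each row of the wide data.
observed = newDF.set_index('X').sum(axis=1)

# Both series are ordered by group, so the lists can be compared directly.
check_passed = observed.tolist() == expected.tolist()
print(check_passed)
```

This check only applies when the indicators are coded as 0 and 1, as with the default phjValuesDict.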