Update LUT to latest values - lvphj/epydemiology GitHub Wiki
def epy.phjUpdateLUTToLatestValues(phjDF,
phjIDVarName,
phjGroupbyVarName,
phjAddCountCol = True,
phjPrintResults = False)
Description
It is assumed that values in the phjIDVarName
indicate a relative order in which values have been added (i.e. rows with higher ID values will have been added later). The ``phjIDVarName``` variable is used to sort the dataframe rows within groupby groups.
Example
# Define 2 dataframes. The values in df2 will be used to update the values in
# the df1, either by changing values that already exist or by adding new values.
df1 = pd.DataFrame({'id':[1,2,3,4,5,6,7,8],
'name':['a','b','c','d','e','f','g','h'],
'value':[999,22,33,44,55,66,999,88]})
df2 = pd.DataFrame({'id':[9,10,11,12],
'name':['a','i','d','g'],
'value':[11,99,None,77]})
# Append df2 to the end of df1
df = df1.append(df2).sort_values(by = ['name','id'])
print('First dataframe')
print('---------------')
print(df1)
print('\n')
print('Second dataframe')
print('----------------')
print(df2)
print('\n')
print('Joined dataframes')
print('-----------------')
print(df)
The initial dataframes look like:
First dataframe
---------------
id name value
0 1 a 999
1 2 b 22
2 3 c 33
3 4 d 44
4 5 e 55
5 6 f 66
6 7 g 999
7 8 h 88
Second dataframe
----------------
id name value
0 9 a 11.0
1 10 i 99.0
2 11 d NaN
3 12 g 77.0
Joined dataframes
-----------------
id name value
0 1 a 999.0
0 9 a 11.0
1 2 b 22.0
2 3 c 33.0
3 4 d 44.0
2 11 d NaN
4 5 e 55.0
5 6 f 66.0
6 7 g 999.0
3 12 g 77.0
7 8 h 88.0
1 10 i 99.0
The first dataframe is updated as follows:
myDF = epy.phjUpdateLUTToLatestValues(phjDF = df,
phjIDVarName = 'id',
phjGroupbyVarName = 'name',
phjAddCountCol = True,
phjPrintResults = True)
This produces the following output:
Original sorted dataframe with count variable
---------------------------------------------
id name value n
0 1 a 999.0 2
0 9 a 11.0 2
1 2 b 22.0 1
2 3 c 33.0 1
3 4 d 44.0 2
2 11 d NaN 2
4 5 e 55.0 1
5 6 f 66.0 1
6 7 g 999.0 2
3 12 g 77.0 2
7 8 h 88.0 1
1 10 i 99.0 1
Updated dataframe
-----------------
id name value n
0 1 a 11.0 2
1 2 b 22.0 1
2 3 c 33.0 1
3 4 d NaN 2
4 5 e 55.0 1
5 6 f 66.0 1
6 7 g 77.0 2
7 8 h 88.0 1
8 10 i 99.0 1