Calculate relative risks for cross-sectional or longitudinal studies - lvphj/epydemiology GitHub Wiki
Python function to calculate relative risks for cross-sectional or longitudinal studies for data stored in a Pandas dataframe.
phjRelativeRisk()
df = epy.phjRelativeRisk(phjDF,
                         phjCaseVarName,
                         phjCaseValue,
                         phjRiskFactorVarName,
                         phjRiskFactorBaseValue,
                         phjMissingValue = np.nan,
                         phjAlpha = 0.05,
                         phjPrintResults = False)
Description
This function calculates simple, univariable relative risks (risk ratios) and 95% confidence intervals for cross-sectional and longitudinal (cohort) studies. It is passed a Pandas dataframe containing the data, together with the name of the 'case' variable and the name of the potential risk factor variable. It returns a Pandas dataframe based on a 2 x 2 or n x 2 contingency table, with additional columns containing the risk, relative risk and 95% confidence interval for each level of the risk factor. The layout of the output table was designed to match teaching materials used as part of a veterinary epidemiology undergraduate course. Rows that contain a missing value in either the case variable or the risk factor variable are removed before calculations are made.
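The steps described above (dropping rows with missing values, cross-tabulating, then computing risk and relative risk per level) can be sketched in plain pandas. This is an illustration only and not the library's implementation; the helper name `sketch_risk_table`, its argument names and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd

def sketch_risk_table(df, case_var, case_value, risk_var, base_value):
    """Illustrative sketch (hypothetical helper, not part of epydemiology)."""
    # Remove rows with a missing value in either variable
    clean = df.dropna(subset=[case_var, risk_var])
    # Flag cases, then count cases and non-cases within each risk factor level
    is_case = clean[case_var] == case_value
    table = pd.DataFrame({
        'cases': is_case.groupby(clean[risk_var]).sum(),
        'non_cases': (~is_case).groupby(clean[risk_var]).sum(),
    })
    # Risk = cases / total in that level; relative risk is relative to the base level
    table['risk'] = table['cases'] / (table['cases'] + table['non_cases'])
    table['rr'] = table['risk'] / table.loc[base_value, 'risk']
    return table

# Toy data (made up for this sketch); two rows contain a missing value
toy = pd.DataFrame({
    'disease':  ['y', 'y', 'n', 'n', 'y', 'n', np.nan, 'y'],
    'exposure': ['a', 'b', 'a', 'a', 'b', 'b', 'a', np.nan],
})
result = sketch_risk_table(toy, 'disease', 'y', 'exposure', 'a')
print(result)
```

By construction, the base level always has a relative risk of 1; every other level is compared against it.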
Function parameters
The function takes the following parameters:
- phjDF
A Pandas dataframe that contains the data to be analysed. One of the columns should be a variable that indicates whether the subject in each row is diseased or healthy.
- phjCaseVarName
Name of the variable that indicates whether the subject in each row is diseased or healthy.
- phjCaseValue
The value used in the phjCaseVarName variable to indicate disease (e.g. True, yes, 1, etc.).
- phjRiskFactorVarName
The name of the potential risk factor to be analysed. This needs to be a categorical variable.
- phjRiskFactorBaseValue
The level or stratum of the potential risk factor that will be used as the base (reference) level in the calculation of relative risks.
- phjMissingValue
- phjAlpha
- phjPrintResults
Exceptions raised
None
Returns
Pandas dataframe containing a cross-tabulation of the disease status and the risk factor variable. In addition, the risk, relative risk and 95% confidence interval of the relative risk are presented for each level of the risk factor.
Other notes
None
Example
An example of the function in use is given below:
import pandas as pd
import epydemiology as epy
# Pretend this came from a cross-sectional study (even though it's the same
# example data as used for the case-control study example).
tempDF = pd.DataFrame({'caseN':[1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0],
                       'caseA':['y','y','y','y','y','y','y','y','n','n','n','n','n','n','n','n','n','n','n','n'],
                       'catN':[1,2,3,2,3,4,3,2,3,4,3,2,1,2,1,2,3,2,3,4],
                       'catA':['a','a','b','b','c','d','a','c','c','d','a','b','c','a','d','a','b','c','a','d'],
                       'floatN':[1.2,4.3,2.3,4.3,5.3,4.3,2.4,6.5,4.5,7.6,5.6,5.6,4.8,5.2,7.4,5.4,6.5,5.7,6.8,4.5]})

# Calculate relative risks for each level of catA, using 'a' as the base level
phjRRTable = epy.phjRelativeRisk(phjDF = tempDF,
                                 phjCaseVarName = 'caseA',
                                 phjCaseValue = 'y',
                                 phjRiskFactorVarName = 'catA',
                                 phjRiskFactorBaseValue = 'a')
pd.options.display.float_format = '{:,.3f}'.format
print(phjRRTable)
Output
caseA  y  n   risk     rr            95pcCI
catA
a      3  4  0.429  1.000               ---
b      2  2  0.500  1.167  [0.3177, 4.2844]
c      2  3  0.400  0.933  [0.2365, 3.6828]
d      1  3  0.250  0.583  [0.0872, 3.9031]
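As a cross-check, the figures in row b can be reproduced by hand. The sketch below assumes the standard log-transformed (Katz) confidence interval for a risk ratio. This page does not state which method the function uses, although the values in the table are consistent with it. The helper name `sketch_rr_ci` and its arguments are hypothetical.

```python
import math

def sketch_rr_ci(cases_exp, non_exp, cases_base, non_base, z=1.959964):
    """Hypothetical helper: risk ratio with a log-method (Katz) interval.

    z defaults to the standard normal quantile for a 95% interval
    (assumed here to correspond to phjAlpha = 0.05).
    """
    n_exp = cases_exp + non_exp
    n_base = cases_base + non_base
    # Relative risk = risk in the level of interest / risk in the base level
    rr = (cases_exp / n_exp) / (cases_base / n_base)
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_base - 1 / n_base)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Row b (2 cases, 2 non-cases) against base level a (3 cases, 4 non-cases)
rr, lower, upper = sketch_rr_ci(2, 2, 3, 4)
print(round(rr, 3), round(lower, 3), round(upper, 3))
```

The result should agree with row b of the table to about three decimal places; the other rows can be checked the same way by substituting their counts.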