Calculate odds and odds ratios for case control studies - lvphj/epydemiology GitHub Wiki
Python function to calculate odds ratios for case-control studies using data stored in a Pandas dataframe.
phjOddsRatio()
df = epy.phjOddsRatio(phjDF,
                      phjCaseVarName,
                      phjCaseValue,
                      phjRiskFactorVarName,
                      phjRiskFactorBaseValue,
                      phjMissingValue = np.nan,
                      phjAlpha = 0.05,
                      phjPrintResults = False)
Description
This function calculates simple, univariable odds ratios and 95% confidence intervals for case-control studies. It is passed a Pandas dataframe containing the data, together with the name of the 'case' variable and the name of the potential risk factor variable. It returns a Pandas dataframe based on a 2 x 2 or n x 2 contingency table, with additional columns containing the odds, odds ratios and 95% confidence intervals (Woolf's method). The layout of the output table was designed to match teaching materials used in an undergraduate veterinary epidemiology course. Rows with a missing value in either the case variable or the risk factor variable are removed before any calculations are made.
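For readers unfamiliar with Woolf's method, the following is a minimal sketch of the standard textbook calculation for a single 2 x 2 table. It is not taken from the epydemiology source code; the function name woolf_odds_ratio and its argument names are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def woolf_odds_ratio(exposed_cases, exposed_controls,
                     base_cases, base_controls, alpha=0.05):
    """Odds ratio and Woolf (logit) confidence interval for a 2 x 2 table.

    Hypothetical helper for illustration; not part of epydemiology.
    All four counts must be greater than zero.
    """
    # Odds of being a case in the exposed level and in the base level
    odds_exposed = exposed_cases / exposed_controls
    odds_base = base_cases / base_controls
    odds_ratio = odds_exposed / odds_base

    # Standard error of ln(OR) is the square root of the summed reciprocals
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                          + 1 / base_cases + 1 / base_controls)

    # Two-sided critical value from the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)

    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Level 'b' (2 cases, 2 controls) against base level 'a' (3 cases, 4 controls)
or_b, lower_b, upper_b = woolf_odds_ratio(2, 2, 3, 4)
print(round(or_b, 3), round(lower_b, 3), round(upper_b, 2))
```

With the counts used in the example on this page, the result should agree closely with row b of the output table; small differences in the last decimal places are possible depending on how the critical value is rounded.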
Function parameters
The function takes the following parameters:
-
phjDF
This is a Pandas dataframe that contains the data to be analysed. One of the columns should be a variable that indicates whether the row is a case or a control.
-
phjCaseVarName
Name of the variable that indicates whether the row is a case or a control.
-
phjCaseValue
The value used in the phjCaseVarName variable to indicate a case (e.g. True, 'yes', 1).
-
phjRiskFactorVarName
The name of the potential risk factor to be analysed. This needs to be a categorical variable.
-
phjRiskFactorBaseValue
The level or stratum of the potential risk factor that will be used as the base level in the calculation of odds ratios.
-
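phjMissingValue
The value used to indicate missing data (default: np.nan). Rows with this value in either the case variable or the risk factor variable are removed before calculations are made.
-
phjAlpha
The significance level used for the confidence intervals (default: 0.05, which gives 95% confidence intervals).
-
phjPrintResults
Whether the results are printed as well as returned (default: False).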
Exceptions raised
None.
Returns
Pandas dataframe containing a cross-tabulation of the case and risk factor variables. In addition, the odds, odds ratios and 95% confidence intervals (Woolf) of the odds ratios are presented.
Other notes
None.
Example
An example of the function in use is given below:
import pandas as pd
import epydemiology as epy
tempDF = pd.DataFrame({'caseN':[1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0],
                       'caseA':['y','y','y','y','y','y','y','y','n','n','n','n','n','n','n','n','n','n','n','n'],
                       'catN':[1,2,3,2,3,4,3,2,3,4,3,2,1,2,1,2,3,2,3,4],
                       'catA':['a','a','b','b','c','d','a','c','c','d','a','b','c','a','d','a','b','c','a','d'],
                       'floatN':[1.2,4.3,2.3,4.3,5.3,4.3,2.4,6.5,4.5,7.6,5.6,5.6,4.8,5.2,7.4,5.4,6.5,5.7,6.8,4.5]})
phjORTable = epy.phjOddsRatio(phjDF = tempDF,
                              phjCaseVarName = 'caseA',
                              phjCaseValue = 'y',
                              phjRiskFactorVarName = 'catA',
                              phjRiskFactorBaseValue = 'a')
pd.options.display.float_format = '{:,.3f}'.format
print(phjORTable)
Output
caseA  y  n   odds     or        95pcCI_Woolf
catA
a      3  4  0.750  1.000                 ---
b      2  2  1.000  1.333   [0.1132, 15.7047]
c      2  3  0.667  0.889    [0.0862, 9.1622]
d      1  3  0.333  0.444    [0.0295, 6.7031]
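The counts, odds and odds ratios in this table can be cross-checked using Pandas alone. The sketch below is an independent check, not the function's own implementation; it assumes only pd.crosstab and the same caseA and catA columns as the example, and the variable names checkDF and checkTable are hypothetical.

```python
import pandas as pd

# Same case and risk factor columns as the example dataframe
checkDF = pd.DataFrame({
    'caseA': ['y','y','y','y','y','y','y','y','n','n',
              'n','n','n','n','n','n','n','n','n','n'],
    'catA':  ['a','a','b','b','c','d','a','c','c','d',
              'a','b','c','a','d','a','b','c','a','d']})

# Cross-tabulate risk factor level against case status, cases ('y') first
checkTable = pd.crosstab(checkDF['catA'], checkDF['caseA'])[['y', 'n']]

# Odds of being a case within each risk factor level
checkTable['odds'] = checkTable['y'] / checkTable['n']

# Odds ratio of each level relative to the base level 'a'
checkTable['or'] = checkTable['odds'] / checkTable.loc['a', 'odds']

print(checkTable.round(3))
```

This reproduces the y, n, odds and or columns only; the confidence interval column requires the additional standard error calculation of Woolf's method.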