Calculate odds and odds ratios for case control studies - lvphj/epydemiology GitHub Wiki

Python function to calculate odds ratios for case-control studies using data stored in a Pandas dataframe.

phjOddsRatio()

df = epy.phjOddsRatio(phjDF,
                      phjCaseVarName,
                      phjCaseValue,
                      phjRiskFactorVarName,
                      phjRiskFactorBaseValue,
                      phjMissingValue = np.nan,
                      phjAlpha = 0.05,
                      phjPrintResults = False)

Description

This function calculates simple, univariable odds ratios and their 95% confidence intervals for case-control studies. It is passed a Pandas dataframe containing the data, together with the name of the 'case' variable and the name of the potential risk factor variable. It returns a Pandas dataframe based on a 2 x 2 or n x 2 contingency table, with additional columns containing the odds, the odds ratios and the 95% confidence intervals of the odds ratios, calculated using Woolf's method. The layout of the output table was designed to match teaching materials used in a veterinary epidemiology undergraduate course. Rows that contain a missing value in either the case variable or the risk factor variable are removed before calculations are made.
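Woolf's interval is calculated on the log scale. The standard error of ln(OR) is the square root of the sum of the reciprocals of the four cell counts, and the limits are exp(ln(OR) ± z × SE), where z is the two-sided normal critical value. The sketch below applies this to a single 2 x 2 comparison using only the Python standard library. It is an illustration of the method, not the epydemiology source, and the function name `woolf_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def woolf_ci(case_exp, ctrl_exp, case_base, ctrl_base, alpha=0.05):
    """Hypothetical helper: odds ratio and Woolf confidence interval for one 2 x 2 table.

    case_exp, ctrl_exp   -- cases and controls at the level being compared
    case_base, ctrl_base -- cases and controls at the base level
    All four counts must be non-zero for the log-based interval to exist.
    """
    odds_ratio = (case_exp / ctrl_exp) / (case_base / ctrl_base)
    # Woolf: SE of ln(OR) is the square root of the summed reciprocal cell counts
    se_log_or = math.sqrt(1 / case_exp + 1 / ctrl_exp + 1 / case_base + 1 / ctrl_base)
    # Two-sided critical value from the standard normal distribution (about 1.96 for alpha = 0.05)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Level 'b' (2 cases, 2 controls) against base level 'a' (3 cases, 4 controls)
print(woolf_ci(2, 2, 3, 4))
```

With the unrounded critical value (1.95996...), the upper limit for level b comes out at about 15.7040. The example output on this page (15.7047) is reproduced when z is rounded to 1.96.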

Function parameters

The function takes the following parameters:

phjDF

This is a Pandas dataframe that contains the data to be analysed. One of the columns should be a variable that indicates whether the row is a case or a control.
phjCaseVarName

Name of the variable that indicates whether the row is a case or a control.
phjCaseValue

The value used in the phjCaseVarName variable to indicate a case (e.g. True, 'yes', 1).
phjRiskFactorVarName

The name of the potential risk factor to be analysed. This needs to be a categorical variable.
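A continuous variable, such as the floatN column in the example at the end of this page, would need to be grouped into categories before it could be used as a risk factor. A minimal sketch using `pandas.cut`; the bin edges, the labels and the new column name `floatCat` are arbitrary choices for illustration only.

```python
import pandas as pd

# floatN values from the example at the end of this page
tempDF = pd.DataFrame({'floatN': [1.2, 4.3, 2.3, 4.3, 5.3, 4.3, 2.4, 6.5, 4.5, 7.6,
                                  5.6, 5.6, 4.8, 5.2, 7.4, 5.4, 6.5, 5.7, 6.8, 4.5]})

# Hypothetical bins (0, 4], (4, 6] and (6, 8]; labels stored as plain strings
tempDF['floatCat'] = pd.cut(tempDF['floatN'],
                            bins=[0, 4, 6, 8],
                            labels=['0-4', '4-6', '6-8']).astype(str)

print(tempDF['floatCat'].value_counts())
```

The resulting column name could then be passed as phjRiskFactorVarName, with one of the labels (for example '0-4') as phjRiskFactorBaseValue.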
phjRiskFactorBaseValue

The level or stratum of the potential risk factor that will be used as the base level in the calculation of odds ratios.
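For each level i of the risk factor, with a_i cases and b_i controls, and with the base level written with subscript 0, the odds and the odds ratio relative to the base level are:

```latex
\mathrm{odds}_i = \frac{a_i}{b_i}, \qquad
\mathrm{OR}_i = \frac{\mathrm{odds}_i}{\mathrm{odds}_0} = \frac{a_i \, b_0}{a_0 \, b_i}, \qquad
\mathrm{OR}_0 = 1

\mathrm{OR}_b = \frac{2/2}{3/4} = \frac{2 \times 4}{3 \times 2} = \frac{4}{3} \approx 1.333
```

The second line uses level b against base level a from the example at the end of this page, and matches the 1.333 shown in its output.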
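phjMissingValue

The value used to indicate missing data (default: np.nan). Rows in which either the case variable or the risk factor variable contains a missing value are removed before calculations are made.
phjAlpha

The significance level used to calculate the confidence intervals (default: 0.05, which gives 95% confidence intervals).
phjPrintResults

Whether to print the results (default: False).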
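Rows with a missing value in either the case variable or the risk factor variable are dropped before any counts are made. The sketch below shows one way to do that filtering when missing data are coded either as NaN or as a sentinel value. It assumes this is roughly what phjMissingValue controls; it is not the epydemiology implementation, and `drop_missing_rows` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def drop_missing_rows(df, case_col, risk_factor_col, missing_value=np.nan):
    """Hypothetical helper: keep only rows where neither column holds a missing value.

    missing_value may be np.nan or a sentinel such as 'missing' or -99.
    """
    subset = df[[case_col, risk_factor_col]]
    # isna() always catches NaN/None; a non-NaN sentinel is also matched explicitly
    is_missing = subset.isna()
    if not pd.isna(missing_value):
        is_missing = is_missing | (subset == missing_value)
    return df.loc[~is_missing.any(axis=1)]


demo = pd.DataFrame({'case': ['y', 'n', 'y', np.nan, 'n'],
                     'exposure': ['a', 'missing', 'b', 'a', 'b']})
print(drop_missing_rows(demo, 'case', 'exposure', missing_value='missing'))
```

Treating NaN as missing even when a sentinel is supplied is a deliberate choice in this sketch, since NaN can never be a meaningful case or risk factor level.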

Exceptions raised

None.

Returns

A Pandas dataframe containing a cross-tabulation of the case and risk factor variables, together with the odds, the odds ratios and the 95% confidence intervals (Woolf) of the odds ratios.

Other notes

None.

Example

An example of the function in use is given below:

import pandas as pd
import epydemiology as epy

tempDF = pd.DataFrame({'caseN':[1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0],
                       'caseA':['y','y','y','y','y','y','y','y','n','n','n','n','n','n','n','n','n','n','n','n'],
                       'catN':[1,2,3,2,3,4,3,2,3,4,3,2,1,2,1,2,3,2,3,4],
                       'catA':['a','a','b','b','c','d','a','c','c','d','a','b','c','a','d','a','b','c','a','d'],
                       'floatN':[1.2,4.3,2.3,4.3,5.3,4.3,2.4,6.5,4.5,7.6,5.6,5.6,4.8,5.2,7.4,5.4,6.5,5.7,6.8,4.5]})

phjORTable = epy.phjOddsRatio(phjDF=tempDF,
                              phjCaseVarName='caseA',
                              phjCaseValue='y',
                              phjRiskFactorVarName='catA',
                              phjRiskFactorBaseValue='a')

pd.options.display.float_format = '{:,.3f}'.format

print(phjORTable)

Output

caseA  y  n  odds    or       95pcCI_Woolf
catA                                      
a      3  4 0.750 1.000                ---
b      2  2 1.000 1.333  [0.1132, 15.7047]
c      2  3 0.667 0.889   [0.0862, 9.1622]
d      1  3 0.333 0.444   [0.0295, 6.7031]
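To check the arithmetic, the table above can be reproduced with plain pandas and NumPy. This is a hypothetical re-implementation for illustration, not the epydemiology source. It assumes every value of the case variable other than the case value denotes a control, uses its own column names (`cases`, `controls`, `ci_lower`, `ci_upper`), and rounds the normal critical value to two decimal places (1.96), which reproduces the published confidence limits to four decimal places.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd


def odds_ratio_table(df, case_col, case_value, rf_col, base_value, alpha=0.05):
    """Hypothetical re-implementation: odds, odds ratios and Woolf CIs per level."""
    # Remove rows with a missing value in either variable
    df = df.dropna(subset=[case_col, rf_col])
    # Assumption: any value other than case_value marks a control
    is_case = df[case_col] == case_value
    table = pd.DataFrame({
        'cases': is_case.groupby(df[rf_col]).sum(),
        'controls': (~is_case).groupby(df[rf_col]).sum(),
    })
    table['odds'] = table['cases'] / table['controls']
    table['or'] = table['odds'] / table.loc[base_value, 'odds']
    # Woolf standard error of ln(OR), each level compared with the base level
    base = table.loc[base_value]
    se = np.sqrt(1 / table['cases'] + 1 / table['controls']
                 + 1 / base['cases'] + 1 / base['controls'])
    # Critical value rounded to 2 d.p. (1.96 for alpha = 0.05)
    z = round(NormalDist().inv_cdf(1 - alpha / 2), 2)
    table['ci_lower'] = np.exp(np.log(table['or']) - z * se)
    table['ci_upper'] = np.exp(np.log(table['or']) + z * se)
    # The base level is the reference category, so it has no interval
    table.loc[base_value, ['ci_lower', 'ci_upper']] = np.nan
    return table


# The two columns from the example above that are used in the analysis
tempDF = pd.DataFrame({
    'caseA': ['y'] * 8 + ['n'] * 12,
    'catA': ['a', 'a', 'b', 'b', 'c', 'd', 'a', 'c', 'c', 'd',
             'a', 'b', 'c', 'a', 'd', 'a', 'b', 'c', 'a', 'd'],
})

print(odds_ratio_table(tempDF, 'caseA', 'y', 'catA', 'a').round(4))
```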