Calculate relative risks for cross sectional or longitudinal studies - lvphj/epydemiology GitHub Wiki

Python function to calculate relative risks for cross-sectional or longitudinal studies for data stored in a Pandas dataframe.

phjRelativeRisk()

df = epy.phjRelativeRisk(phjDF,
                         phjCaseVarName,
                         phjCaseValue,
                         phjRiskFactorVarName,
                         phjRiskFactorBaseValue,
                         phjMissingValue = np.nan,
                         phjAlpha = 0.05,
                         phjPrintResults = False)

Description

This function calculates simple, univariable relative risks (risk ratios) and 95% confidence intervals for cross-sectional and longitudinal (cohort) studies. It is passed a Pandas dataframe containing the data, together with the name of the 'case' variable and the name of the potential risk factor variable. It returns a Pandas dataframe based on a 2 x 2 or n x 2 contingency table, with additional columns containing the risks, risk ratios and 95% confidence intervals. The layout of the output table was designed to match teaching materials used as part of a veterinary epidemiology undergraduate course. Rows that contain a missing value in either the case variable or the risk factor variable are removed before calculations are made.
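The steps described above (dropping incomplete rows, cross-tabulating, then dividing each risk by the base-level risk) can be sketched with plain pandas. This is a minimal illustration and not the function's actual implementation; the dataframe `toy` and its column names `case` and `exposure` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; 'case' and 'exposure' are illustrative names only.
toy = pd.DataFrame({'case':     ['y', 'n', 'y', 'y', 'n', np.nan, 'y', 'n'],
                    'exposure': ['a', 'a', 'b', 'b', 'a', 'b', np.nan, 'b']})

# Remove rows with a missing value in either variable.
complete = toy.dropna(subset=['case', 'exposure'])

# n x 2 contingency table: one row per risk factor level, cases first.
table = pd.crosstab(complete['exposure'], complete['case'])[['y', 'n']]

# Risk in each level = cases / (cases + non-cases).
table['risk'] = table['y'] / (table['y'] + table['n'])

# Relative risk = risk in each level divided by risk in the base level ('a').
table['rr'] = table['risk'] / table.loc['a', 'risk']

print(table)
```

The base level always has a relative risk of 1 because its risk is divided by itself.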

Function parameters

The function takes the following parameters:

phjDF

A Pandas dataframe that contains the data to be analysed. One of the columns should be a variable that indicates whether the individual represented by the row is diseased or healthy.

phjCaseVarName

Name of the variable that indicates whether the row represents a diseased or a healthy individual.

phjCaseValue

The value used in the phjCaseVarName variable to indicate disease (e.g. True, yes, 1, etc.).

phjRiskFactorVarName

The name of the potential risk factor to be analysed. This needs to be a categorical variable.

phjRiskFactorBaseValue

The level or stratum of the potential risk factor that will be used as the base level in the calculation of relative risks.
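
phjMissingValue

The value used to indicate missing data (default: np.nan). Rows with a missing value in either the case variable or the risk factor variable are removed before calculations are made.

phjAlpha

The significance level used for the confidence intervals (default: 0.05, corresponding to 95% confidence intervals).

phjPrintResults

Whether the results should be printed as well as returned (default: False).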

Exceptions raised

None

Returns

Pandas dataframe containing a cross-tabulation of the disease status and risk factor variable. In addition, the risk, relative risk and 95% confidence interval of the relative risk are presented for each level of the risk factor.

Other notes

None

Example

An example of the function in use is given below:

import pandas as pd
import epydemiology as epy

# Pretend this came from a cross-sectional study (even though it's the same example data as used in the case-control study example).
tempDF = pd.DataFrame({'caseN':[1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0],
                       'caseA':['y','y','y','y','y','y','y','y','n','n','n','n','n','n','n','n','n','n','n','n'],
                       'catN':[1,2,3,2,3,4,3,2,3,4,3,2,1,2,1,2,3,2,3,4],
                       'catA':['a','a','b','b','c','d','a','c','c','d','a','b','c','a','d','a','b','c','a','d'],
                       'floatN':[1.2,4.3,2.3,4.3,5.3,4.3,2.4,6.5,4.5,7.6,5.6,5.6,4.8,5.2,7.4,5.4,6.5,5.7,6.8,4.5]})

phjRRTable = epy.phjRelativeRisk( phjDF = tempDF,
                                  phjCaseVarName = 'caseA',
                                  phjCaseValue = 'y',
                                  phjRiskFactorVarName = 'catA',
                                  phjRiskFactorBaseValue = 'a')

pd.options.display.float_format = '{:,.3f}'.format

print(phjRRTable)

Output

caseA  y  n  risk    rr            95pcCI
catA                                     
a      3  4 0.429 1.000               ---
b      2  2 0.500 1.167  [0.3177, 4.2844]
c      2  3 0.400 0.933  [0.2365, 3.6828]
d      1  3 0.250 0.583  [0.0872, 3.9031]
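
As a cross-check, the figures in the table can be recomputed by hand from the counts. The sketch below assumes the confidence intervals come from the standard log-based (Katz) method with z = 1.96. The source does not state which method the function uses, but the printed limits are consistent with this assumption. The helper `rr_with_ci` is hypothetical and is not part of epydemiology.

```python
import math

def rr_with_ci(cases, noncases, base_cases, base_noncases, z=1.96):
    """Risk ratio versus a base level, with a log-method (Katz) interval.

    z = 1.96 is an assumption chosen to match the table above.
    """
    n1 = cases + noncases            # total in the level of interest
    n0 = base_cases + base_noncases  # total in the base level
    rr = (cases / n1) / (base_cases / n0)
    # Standard error of ln(RR) for the log method.
    se_log_rr = math.sqrt(1 / cases - 1 / n1 + 1 / base_cases - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# (cases, non-cases) for each non-base level, taken from the output table.
counts = {'b': (2, 2), 'c': (2, 3), 'd': (1, 3)}

# Base level 'a' has 3 cases and 4 non-cases.
results = {level: rr_with_ci(y, n, 3, 4) for level, (y, n) in counts.items()}

for level, (rr, lower, upper) in results.items():
    print(f"{level}: rr={rr:.3f}  CI=[{lower:.4f}, {upper:.4f}]")
```

The intervals are wide and all include 1 because each level contains only a handful of observations.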