Plot annual disease trend - lvphj/epydemiology GitHub Wiki
A function to plot the annual trend of disease diagnoses.
phjAnnualDiseaseTrend()
def phjAnnualDiseaseTrend(phjDF,
                          phjYearVarName,
                          phjPositivesVarName = None,
                          phjNegativesVarName = None,
                          phjTotalVarName = None,
                          phjConfIntMethod = 'normal',
                          phjAlpha = 0.05,
                          phjPlotProportions = True,
                          phjPlotPrediction = True,
                          phjGraphTitleStr = None,
                          phjPrintResults = False)
Description
Annual disease trends are often expressed as the proportion of positive test results per year. However, it is often difficult to judge from these proportions alone whether there is an increasing (or decreasing) trend over time. The purpose of this function is to calculate a linear trend (of log odds of disease) over successive years (together with 95% confidence intervals). The method fits a logistic regression model to binary outcome data using year as the independent (explanatory) variable. The model coefficients are used to calculate the proportion (or probability) of disease for each year. The trend is linear on the log odds scale, but converting log odds to probability produces a curved line. The confidence intervals for the line are calculated using the delta method, similar to the method used by Stata to calculate marginal probabilities.
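The calculation described above can be sketched in plain Python. This is an illustration only and not the code used by phjAnnualDiseaseTrend(): the helper name fit_logistic_trend, the Newton-Raphson fitting loop and the centring of the year variable are all assumptions made for this example. It fits logit(p) = b0 + b1 * (year - mean year) to yearly counts, then applies the delta method, SE(p) = p(1 - p) * SE(log odds), to obtain confidence limits on the proportion scale.

```python
import math
from statistics import NormalDist


def fit_logistic_trend(years, positives, totals, alpha=0.05, max_iter=50, tol=1e-10):
    """Hypothetical helper (not part of epydemiology).

    Fits logit(p) = b0 + b1 * (year - mean year) to yearly counts and returns
    (b0, b1, rows), where each row is (year, fitted proportion, lower, upper).
    """
    centre = sum(years) / len(years)
    x = [year - centre for year in years]  # centring keeps the iteration stable

    def score_and_information(b0, b1):
        # Score vector (g0, g1) and information matrix [[i00, i01], [i01, i11]].
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, pos, n in zip(x, positives, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = n * p * (1.0 - p)  # binomial variance weight
            g0 += pos - n * p
            g1 += (pos - n * p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        return g0, g1, i00, i01, i11

    # Start from the overall log odds with no trend.
    overall = sum(positives) / sum(totals)
    b0, b1 = math.log(overall / (1.0 - overall)), 0.0
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = score_and_information(b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton-Raphson step: solve (information matrix) * step = score.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break

    # The covariance of (b0, b1) is the inverse of the information matrix.
    _, _, i00, i01, i11 = score_and_information(b0, b1)
    det = i00 * i11 - i01 * i01
    v00, v01, v11 = i11 / det, -i01 / det, i00 / det

    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rows = []
    for year, xi in zip(years, x):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        var_log_odds = v00 + 2.0 * xi * v01 + xi * xi * v11
        se_p = p * (1.0 - p) * math.sqrt(var_log_odds)  # delta method
        rows.append((year, p, p - z * se_p, p + z * se_p))
    return b0, b1, rows
```

Because the year is centred in this sketch, b0 is the log odds at the mean year rather than at year zero; b1 is the change in log odds per year. The limits are symmetric around the fitted proportion, which is a property of the delta method on the probability scale.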
Function parameters
-
phjDF
Pandas dataframe containing disease outcome data.
-
phjYearVarName
Name of the variable containing year data (as an integer).
-
phjPositivesVarName (default = None)
Name of the variable containing the number of positive cases in each given year. (N.B. the sum of the positive cases and negative controls is assumed to equal the total sample size. As a result, only two of the three variables need to be passed to the function. If all three variables are passed, the function will check that the sum of positives and negatives equals the total; if not, an error will be generated.)
-
phjNegativesVarName (default = None)
Name of the variable containing the number of negative controls in each given year. (N.B. the sum of the positive cases and negative controls is assumed to equal the total sample size. As a result, only two of the three variables need to be passed to the function. If all three variables are passed, the function will check that the sum of positives and negatives equals the total; if not, an error will be generated.)
-
phjTotalVarName (default = None)
Name of the variable containing the total sample size for each given year. (N.B. the sum of the positive cases and negative controls is assumed to equal the total sample size. As a result, only two of the three variables need to be passed to the function. If all three variables are passed, the function will check that the sum of positives and negatives equals the total; if not, an error will be generated.)
-
phjConfIntMethod (default = 'normal')
The method used to calculate the confidence intervals for proportions. The default is 'normal'. Other options are available.
-
phjAlpha (default = 0.05)
Desired type I error rate (alpha). The default of 0.05 corresponds to 95% confidence intervals.
-
phjPlotProportions (default = True)
Boolean indicating whether to plot a bar chart of the proportion of cases in each year.
-
phjPlotPrediction (default = True)
Boolean indicating whether to plot a trend line for proportions (based on log odds) together with 95% confidence intervals.
-
phjGraphTitleStr (default = None)
String variable containing the title for the graph.
-
phjPrintResults (default = False)
Indicates whether intermediate results (including the returned dataframe) should be printed to screen as the function progresses.
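The handling of the three count variables and the 'normal' confidence interval option can be illustrated with a short sketch. The helper names complete_counts and wald_interval are hypothetical and are not part of epydemiology. The sketch also assumes that 'normal' refers to the usual normal approximation for a binomial proportion, p ± z * sqrt(p(1 - p) / n); clipping the limits to the range 0 to 1 is a choice made for this example.

```python
from statistics import NormalDist


def complete_counts(positives=None, negatives=None, totals=None):
    """Hypothetical helper: derive whichever count list is missing.

    Any two of the three lists are enough. If all three are supplied,
    check that positives + negatives equals the total in every year.
    """
    supplied = sum(v is not None for v in (positives, negatives, totals))
    if supplied < 2:
        raise ValueError("At least two of positives, negatives and totals are required.")
    if positives is None:
        positives = [t - n for t, n in zip(totals, negatives)]
    elif negatives is None:
        negatives = [t - p for t, p in zip(totals, positives)]
    elif totals is None:
        totals = [p + n for p, n in zip(positives, negatives)]
    elif any(p + n != t for p, n, t in zip(positives, negatives, totals)):
        raise ValueError("Positives plus negatives do not equal the total in every year.")
    return positives, negatives, totals


def wald_interval(positive, total, alpha=0.05):
    """Hypothetical helper: normal-approximation interval for one proportion.

    Returns (proportion, lower limit, upper limit).
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p = positive / total
    half_width = z * (p * (1.0 - p) / total) ** 0.5
    # Clip to [0, 1] so the limits remain valid proportions.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

For example, complete_counts(positives=[18, 34], negatives=[1695, 1733]) fills in the totals, and each year's positive count and total can then be passed to wald_interval() with the chosen alpha.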
Exceptions raised
None
Returns
Pandas dataframe containing a tabulation of linear trend of log odds expressed as a proportion together with 95% confidence intervals. The function also plots the graph if requested.
Other notes
None.
Example
An example of the function in use is given below:
import pandas as pd
import epydemiology as epy

# Yearly counts of positive cases and negative controls
phjDiseaseDF = pd.DataFrame({'year':[2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018],
                             'positive':[18,34,24,26,30,27,36,17,18,15,4],
                             'negative':[1695,1733,1929,1517,1449,1329,1130,928,753,496,325]})

# Fit the trend using only the years before 2018
phjDiseaseDF = epy.phjAnnualDiseaseTrend(phjDF = phjDiseaseDF.loc[phjDiseaseDF['year'] < 2018,:],
                                         phjYearVarName = 'year',
                                         phjPositivesVarName = 'positive',
                                         phjNegativesVarName = 'negative',
                                         phjTotalVarName = None,
                                         phjConfIntMethod = 'normal',
                                         phjAlpha = 0.05,
                                         phjPlotProportions = True,
                                         phjPlotPrediction = True,
                                         phjGraphTitleStr = None,
                                         phjPrintResults = False)
This produces the following output:
Optimization terminated successfully.
Current function value: 0.091853
Iterations 22