Home - lvphj/epydemiology GitHub Wiki
Welcome to the ePydemiology wiki!
The ePydemiology package is a library of Python functions. It was originally written to simplify some commonly encountered data wrangling and analysis requirements of SAVSNET (the Small Animal Veterinary Surveillance Network) or to accompany the teaching of basic epidemiology for veterinary undergraduate research projects, ensuring that the output of the functions matched the teaching materials. The functions make extensive use of Pandas dataframes for handling data. If the functions are useful, please feel free to use them.
The following functions are available:
Loading and retrieving data
1. Load data from a named cell range in an Excel workbook
myDF = epy.phjReadDataFromExcelNamedCellRange()
2. Connect to a database
myConn = epy.phjConnectToDatabase()
3. Load data from a MySQL or SQL SERVER database
myDF = epy.phjGetDataFromDatabase()
Miscellaneous functions
4. Load text from a text file or argument into a Python string variable
myString = epy.phjGetStrFromArgOrFile()
5. Load text from a text file (e.g. a SQL query or regular expression) into a Python string variable
myString = epy.phjReadTextFromFile()
6. Create a named group regex from individual regexes
myRegexStr = epy.phjCreateNamedGroupRegex()
or, if phjRegexPreCompile is set to True:
myCompiledRegexObj = epy.phjCreateNamedGroupRegex()
7. Find regular expression named-group matches in a dataframe column
myDF = epy.phjFindRegexNamedGroups()
8. Identify the maximum level of taxonomic detail in a classification
myDF = epy.phjMaxLevelOfTaxonomicDetail()
9. Reverse map a categorical variable based on dictionary values
myDF = epy.phjReverseMap()
10. Retrieve unique values from dataframes
myDF = epy.phjRetrieveUniqueFromMultiDataFrames()
11. Update dataframe with new values
myDF = epy.phjUpdateLUT()
12. Update LUT to latest values
myDF = epy.phjUpdateLUTToLatestValues()
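As a rough illustration of what "a named group regex from individual regexes" (item 6) can mean, the sketch below joins several patterns into one alternation of named groups using only the standard `re` module. The helper `build_named_group_regex` and the example patterns are hypothetical, and joining with `|` is an assumption; `phjCreateNamedGroupRegex()` may combine its inputs differently and takes its own arguments.

```python
import re

def build_named_group_regex(patterns):
    """Join {name: regex} pairs into a single alternation of named groups.

    Hypothetical helper for illustration only; not part of epydemiology.
    """
    return "|".join(f"(?P<{name}>{regex})" for name, regex in patterns.items())

# Example patterns (assumed for this sketch).
patterns = {"year": r"\d{4}", "word": r"[A-Za-z]+"}
combined = build_named_group_regex(patterns)

# The name of the group that matched tells us which pattern applied.
match = re.search(combined, "2019")
matched_group = match.lastgroup
```

Here `matched_group` identifies which of the individual regexes matched, which is the kind of information a named-group search over a dataframe column would report per row.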
Matrix functions
13. Convert columns of binary data to a square matrix containing co-occurrences
myArr = epy.phjBinaryVarsToSquareMatrix()
14. Convert a long dataframe to wide format containing binary variables
myDF = epy.phjLongToWideBinary()
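To make the idea of a co-occurrence matrix (item 13) concrete, here is a minimal plain-Python sketch. It assumes each row is a record and each column a binary (0/1) variable, and it counts how often each pair of variables is 1 in the same row. The function name `cooccurrence_matrix` and the data are invented for illustration; how `phjBinaryVarsToSquareMatrix()` treats the diagonal or orders its output is not shown here.

```python
def cooccurrence_matrix(rows):
    """Return a square matrix where entry [i][j] counts rows in which
    binary variables i and j are both 1 (hypothetical helper)."""
    n_vars = len(rows[0])
    return [
        [sum(row[i] * row[j] for row in rows) for j in range(n_vars)]
        for i in range(n_vars)
    ]

# Three records, three binary variables (made-up data).
rows = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]
matrix = cooccurrence_matrix(rows)
# In this sketch the diagonal holds the number of rows where each variable is 1.
```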
Plotting proportions
15. Calculate and plot a series of binomial proportions
myDF = epy.phjCalculateBinomialProportions()
16. Calculate and plot multinomial proportions
myDF = epy.phjCalculateMultinomialProportions()
17. Calculate binomial confidence intervals in summary table
myDF = epy.phjCalculateBinomialConfInts()
18. Convert a disease summary table to a dataframe of binary outcomes
myDF = epy.phjSummaryTableToBinaryOutcomes()
19. Calculate an annual disease trend
myDF = epy.phjAnnualDiseaseTrend()
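For readers unfamiliar with binomial confidence intervals (item 17), the following sketch computes a Wilson score interval for a single proportion using only the standard library. The choice of the Wilson method is an assumption made for this example; the method or methods offered by `phjCalculateBinomialConfInts()` are not stated on this page, and `wilson_interval` is a hypothetical name.

```python
import math

def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    z defaults to the normal quantile for a 95% interval.
    Hypothetical helper; not the epydemiology implementation.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of trials")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width

# 5 positive animals out of 10 tested (made-up numbers).
lower, upper = wilson_interval(5, 10)
```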
Postcode-related functions
20. Clean a UK postcode variable
myDF = epy.phjCleanUKPostcodeVariable()
21. Add a postcode variable formatted to 7 characters
myDF = epy.phjPostcodeFormat7()
22. Convert Ordnance Survey grid references to latitude and longitude
myDF = epy.phjConvertOSGridRefToLatLong()
23. Convert placename conjunctions to lowercase
myDF = epy.phjSetPlacenameConjunctionsToLower()
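The 7-character postcode format (item 21) is sketched below on the assumption that it means the common fixed-width layout: the outward code left-aligned in four characters, followed directly by the three-character inward code. The helper `postcode_to_7_chars` is hypothetical and does no validation beyond length; `phjPostcodeFormat7()` works on a dataframe and may handle edge cases differently.

```python
def postcode_to_7_chars(postcode):
    """Format a UK postcode as exactly 7 characters (assumed layout:
    outward code padded to 4 characters, then the 3-character inward code)."""
    compact = "".join(postcode.split()).upper()
    if not 5 <= len(compact) <= 7:
        raise ValueError(f"unexpected postcode length: {postcode!r}")
    outward, inward = compact[:-3], compact[-3:]
    return outward.ljust(4) + inward

examples = [postcode_to_7_chars(p) for p in ("m1 1aa", "L69 7ZX", "CH64 7TE")]
# examples == ["M1  1AA", "L69 7ZX", "CH647TE"]
```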
Select data
24. Generate a matched or unmatched case-control dataset
myDF = epy.phjGenerateCaseControlDataset()
25. Select matched or unmatched case-control data
myDF = epy.phjSelectCaseControlDataset()
26. Collapse a dataframe based on patient ID variable
myDF = epy.phjCollapseOnPatientID()
Clean data
27. Parse a date variable
myDF = epy.phjParseDateVar()
28. Convert a UK (day first) date string to consistent format
myDF = epy.phjUKDateStrToDatetime()
29. Strip white space from strings in object columns of Pandas dataframe
myDF = epy.phjStripWhiteSpc()
30. Extract the minimum repeating string from a string variable
myDF = epy.phjAddColumnOfMinRepeatingString()
31. Aggregate duplicate columns and rows in Pandas dataframe
myDF = epy.phjAggDupColsAndRows()
32. Convert Pandas dataframe from wide to long format
myDF = epy.phjWide2Long()
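Day-first dates (item 28) are a frequent source of silent errors, because a string such as 03/04/2020 parses as 4 March under month-first rules. The sketch below shows one way to parse UK-style strings explicitly with the standard library. The list of accepted formats and the name `parse_uk_date` are assumptions for this example and do not describe how `phjUKDateStrToDatetime()` is implemented.

```python
from datetime import datetime

# Day-first formats tried in order (an assumed, non-exhaustive list).
UK_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d/%m/%y")

def parse_uk_date(text):
    """Return a datetime for a day-first date string, or None if no format fits."""
    cleaned = text.strip()
    for fmt in UK_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None

parsed = parse_uk_date(" 03/04/2020 ")  # 3 April 2020, not 4 March
```

Returning `None` for unparseable strings keeps bad values visible, so they can be inspected rather than silently coerced.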
Explore data
33. View a plot of log odds against mid-points of categories of a continuous variable
myOddsRatioTable = epy.phjViewLogOdds()
34. Categorise a continuous variable using predefined breaks, quantiles or optimised break positions
myDF = epy.phjCategoriseContinuousVariable()
or, if phjReturnBreaks is set to True:
myDF,myBreaks = epy.phjCategoriseContinuousVariable()
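Categorising a continuous variable with predefined breaks (item 34) can be sketched in plain Python as follows. The example assumes intervals that include their lower bound and exclude their upper bound, and it returns `None` for values outside the outermost breaks; these conventions, the label format and the name `categorise` are all assumptions and may differ from what `phjCategoriseContinuousVariable()` does.

```python
from bisect import bisect_right

def categorise(value, breaks):
    """Return a label such as '[1, 5)' for the interval containing value,
    or None if value lies outside the sorted breaks (hypothetical helper)."""
    index = bisect_right(breaks, value) - 1
    if index < 0 or index >= len(breaks) - 1:
        return None
    return f"[{breaks[index]}, {breaks[index + 1]})"

ages = [0.5, 3, 7, 12, 20]   # made-up ages in years
breaks = [0, 1, 5, 10, 20]   # predefined break positions
categories = [categorise(age, breaks) for age in ages]
# The final age equals the top break, so it falls outside under this convention.
```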
Epidemiology-related functions
35. Calculate odds and odds ratios for case-control studies
myDF = epy.phjOddsRatio()
36. Calculate relative risks for cross-sectional or longitudinal studies
myDF = epy.phjRelativeRisk()
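As a reminder of the quantities these two functions report, the sketch below computes an odds ratio and a relative risk from a single 2 x 2 table using the textbook formulas. The cell labels `a` to `d`, the function names and the counts are invented for illustration; `phjOddsRatio()` and `phjRelativeRisk()` operate on dataframes and take their own arguments.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table.

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    """
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed (same cell labels)."""
    return (a / (a + b)) / (c / (c + d))

# Made-up counts: 20 of 100 exposed and 10 of 100 unexposed animals affected.
example_or = odds_ratio(20, 80, 10, 90)     # 2.25
example_rr = relative_risk(20, 80, 10, 90)  # 2.0
```

The two measures diverge as the outcome becomes more common, which is why the odds ratio is used for case-control studies and the relative risk for cross-sectional or longitudinal studies.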