View a plot of log odds against mid-points of categories of a continuous variable - lvphj/epydemiology GitHub Wiki

Python function to view a plot of log odds against the mid-points of categories generated from a continuous variable in a Pandas dataframe.

phjViewLogOdds()


df = epy.phjViewLogOdds(phjDF,
                        phjBinaryDepVarName = None,
                        phjCaseValue = 1,
                        phjContIndepVarName = None,
                        phjMissingValue = 'missing',
                        phjNumberOfCategoriesInt = 5,
                        phjNewCategoryVarName = None,
                        phjCategorisationMethod = 'jenks',
                        phjNewCategoryNamesList = None,
                        phjGroupNameVar = None,
                        phjAlpha = 0.05,
                        phjPrintResults = False)

Description

Function parameters

Exceptions raised

None

Returns

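Pandas dataframe tabulating, for each category of the categorised variable: the numbers of cases and non-cases, the odds, the odds ratio with its Woolf confidence interval, the log odds with its standard error and confidence limits, and the mid-point of the category (see the example output below).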
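The sketch below shows how the numeric columns can be derived from per-category counts. The formulas are inferred from the example output on this page, and they reproduce its values: log odds = ln(cases / non-cases), SE = sqrt(1/a + 1/b), confidence limits = log odds ± z × SE, and a Woolf interval for the odds ratio. This is not the library's own code. The helper `log_odds_table()` and its column names (`CI_llimit`, `or_llimit`, etc.) are hypothetical. The page does not say how the reference category is chosen (it is category 1 in the example output), so the sketch takes it as an argument.

```python
import numpy as np
import pandas as pd
from statistics import NormalDist


def log_odds_table(counts: pd.DataFrame, case_col: str, noncase_col: str,
                   ref_category, alpha: float = 0.05) -> pd.DataFrame:
    """Hypothetical helper (not part of epydemiology): rebuild log-odds and
    Woolf odds-ratio columns from per-category counts of cases and non-cases."""
    a = counts[case_col].astype(float)       # cases in each category
    b = counts[noncase_col].astype(float)    # non-cases in each category
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05

    out = counts.copy()
    out["odds"] = a / b
    out["logodds"] = np.log(out["odds"])
    # Standard error of a log odds: sqrt(1/a + 1/b)
    out["se"] = np.sqrt(1 / a + 1 / b)
    out["CI_llimit"] = out["logodds"] - z * out["se"]
    out["CI_ulimit"] = out["logodds"] + z * out["se"]

    # Odds ratio relative to the reference category, with Woolf's interval:
    # exp(ln(OR) +/- z * sqrt(1/a + 1/b + 1/c + 1/d))
    c = a.loc[ref_category]
    d = b.loc[ref_category]
    out["or"] = out["odds"] / out.loc[ref_category, "odds"]
    woolf_se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    out["or_llimit"] = np.exp(np.log(out["or"]) - z * woolf_se)
    out["or_ulimit"] = np.exp(np.log(out["or"]) + z * woolf_se)
    # The reference category's own OR is 1 by definition, so leave its CI empty
    out.loc[ref_category, ["or_llimit", "or_ulimit"]] = np.nan
    return out


# Counts for categories 0 and 1 taken from the example output on this page
counts = pd.DataFrame({"yes": [6371, 6184], "no": [6385, 6311]}, index=[0, 1])
res = log_odds_table(counts, "yes", "no", ref_category=1)
print(res.round(6))
```

For category 0 this gives log odds ≈ -0.002195, SE ≈ 0.017708 and OR ≈ 1.0183 with a Woolf interval of about [0.9693, 1.0698], matching the first row of the example output.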

Other notes

See the comments relating to the phjCategoriseContinuousVariable() function.
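Categorisation itself is handled as described for phjCategoriseContinuousVariable(). As a rough illustration of the 'quantile' method used in the example below, the sketch splits a continuous variable with pandas' `pd.qcut()` and takes each category's mid-point as the average of its lower and upper bin edges. That definition of the mid-point is an assumption, consistent with the example output (about 0.0625, 0.1875, ... for uniform data in 8 categories). The helper name is hypothetical, and the default 'jenks' (natural breaks) method is not covered, since pandas does not provide it.

```python
import numpy as np
import pandas as pd


def quantile_categories_with_midpoints(values: pd.Series, n_categories: int):
    """Hypothetical helper: split a continuous variable into quantile-based
    categories and return the integer category codes and each bin's mid-point."""
    # labels=False gives integer codes 0..n-1; retbins=True also returns the edges
    codes, edges = pd.qcut(values, q=n_categories, labels=False, retbins=True)
    # Mid-point of each category = average of its lower and upper edge
    midpoints = (edges[:-1] + edges[1:]) / 2
    return codes, pd.Series(midpoints, name="catMidpoints")


rng = np.random.default_rng(0)
x = pd.Series(rng.uniform(0, 1, 100_000))
codes, mids = quantile_categories_with_midpoints(x, 8)
# For uniform data the mid-points should be close to 0.0625, 0.1875, ..., 0.9375
print(mids.round(3).tolist())
```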

Example

An example of the function in use is given below:

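import numpy as np
import pandas as pd
import epydemiology as epy

# Define example dataset: 50,000 cases ('yes') and 50,000 non-cases ('no'),
# with a continuous risk factor drawn independently from a uniform distribution.
# No association is therefore expected, and values will differ between runs.
phjTempDF = pd.DataFrame({'binDepVar':['yes']*50000 + ['no']*50000,
                          'riskFactorCont':np.random.uniform(0,1,100000)})

with pd.option_context('display.max_rows', 10, 'display.max_columns', 5):
    print(phjTempDF)

# View log odds: categorise riskFactorCont into 8 quantile-based categories
# and calculate the log odds of 'yes' in each category
phjLogOddsDF = epy.phjViewLogOdds(phjDF = phjTempDF,
                                  phjBinaryDepVarName = 'binDepVar',
                                  phjContIndepVarName = 'riskFactorCont',
                                  phjCaseValue = 'yes',
                                  phjMissingValue = 'missing',
                                  phjNumberOfCategoriesInt = 8,
                                  phjNewCategoryVarName = 'categoricalVar',
                                  phjCategorisationMethod = 'quantile',
                                  phjGroupNameVar = None,
                                  phjPrintResults = False)

with pd.option_context('display.max_rows', 10, 'display.max_columns', 10):
    print(phjLogOddsDF)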

Output

                 yes    no      odds        or      95pcCI_Woolf   logodds  \
categoricalVar                                                               
0               6371  6385  0.997807  1.018299  [0.9693, 1.0698] -0.002195   
1               6184  6311  0.979876  1.000000               --- -0.020329   
2               6334  6313  1.003326  1.023932  [0.9745, 1.0758]  0.003321   
3               6239  6299  0.990475  1.010816  [0.9619, 1.0622] -0.009571   
4               6254  6123  1.021395  1.042371  [0.9918, 1.0955]  0.021169   
5               6155  6276  0.980720  1.000861  [0.9524, 1.0518] -0.019468   
6               6190  6133  1.009294  1.030022  [0.9800, 1.0826]  0.009251   
7               6273  6160  1.018344  1.039258  [0.9889, 1.0922]  0.018178   

                      se  95CI_llimit  95CI_ulimit  catMidpoints  
categoricalVar                                                    
0               0.017708    -0.036902     0.032512      0.062007  
1               0.017893    -0.055399     0.014741      0.187506  
2               0.017784    -0.031536     0.038178      0.312504  
3               0.017862    -0.044579     0.025437      0.437502  
4               0.017978    -0.014068     0.056406      0.562501  
5               0.017939    -0.054628     0.015692      0.687499  
6               0.018017    -0.026061     0.044563      0.812497  
7               0.017937    -0.016979     0.053335      0.937496
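The returned dataframe contains everything needed to redraw the plot of log odds against category mid-points. The sketch below is one way to do that with matplotlib's `Axes.errorbar()`, using the column names shown in the output above (`catMidpoints`, `logodds`, `95CI_llimit`, `95CI_ulimit`). It is not necessarily how phjViewLogOdds() draws its own figure; the helper `plot_log_odds()` is hypothetical, and the data are the first three rows of the example output.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_log_odds(df: pd.DataFrame, ax=None):
    """Hypothetical helper: plot log odds with 95% CI error bars against
    category mid-points, using the column names from the example output."""
    if ax is None:
        _, ax = plt.subplots()
    # errorbar() expects distances from the point, not absolute limits
    lower_err = df["logodds"] - df["95CI_llimit"]
    upper_err = df["95CI_ulimit"] - df["logodds"]
    ax.errorbar(df["catMidpoints"], df["logodds"],
                yerr=[lower_err, upper_err], fmt="o-", capsize=3)
    ax.set_xlabel("Category mid-point")
    ax.set_ylabel("Log odds")
    return ax


# First three rows of the example output above
subset = pd.DataFrame({
    "logodds": [-0.002195, -0.020329, 0.003321],
    "95CI_llimit": [-0.036902, -0.055399, -0.031536],
    "95CI_ulimit": [0.032512, 0.014741, 0.038178],
    "catMidpoints": [0.062007, 0.187506, 0.312504],
})
ax = plot_log_odds(subset)
# e.g. ax.figure.savefig("logodds.png") to write the plot to a file
```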