Cost Prediction Model Based on HCUP NIS Data - sarigopiram/Data-Mining GitHub Wiki

INTRODUCTION

Financial impact: what gets measured gets attention! The objective of the analysis is to predict the total cost of care for patients from their demographics, the diagnoses and procedures they underwent, their comorbid conditions, and the characteristics of the hospital. The model can be useful to insurance firms, which would be better informed about the cost of transition care given the demographics, hospital type and profile of the patient. It can also help hospitals predict the percentage of cost that spills over the health insurance coverage and take appropriate measures.

DATA PREPARATION

Unlike the data preparation for the Bayes algorithm, all continuous and categorical variables were retained as they are in the master datasets. To handle missing values, we followed simple imputation strategies: continuous variables were imputed with their mean and factor variables with their most frequent level. The dependent variable (Totchg, in dollars) was right-skewed, so it was log-transformed to obtain a better-behaved distribution and make assessment easier.
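The imputation and log-transform steps described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy data frame and the column names AGE and FEMALE are hypothetical stand-ins for the master dataset, and the project itself may have relied on helpers from Hmisc or VIM instead.

```r
# Hypothetical toy data standing in for the NIS master dataset.
patients <- data.frame(
  AGE    = c(34, NA, 71, 58, 45),
  FEMALE = factor(c("1", "0", NA, "1", "1")),
  TOTCHG = c(12000, 8500, 41000, 150000, 23000)
)

# Replace missing numeric values with the column mean.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Replace missing factor values with the most frequent level.
impute_mode <- function(x) {
  counts <- table(x)  # table() ignores NA by default
  x[is.na(x)] <- names(counts)[which.max(counts)]
  x
}

# Apply the matching strategy to every column by type.
for (col in names(patients)) {
  if (is.numeric(patients[[col]])) {
    patients[[col]] <- impute_mean(patients[[col]])
  } else if (is.factor(patients[[col]])) {
    patients[[col]] <- impute_mode(patients[[col]])
  }
}

# Log-transform the right-skewed charge variable.
patients$LOG_TOTCHG <- log(patients$TOTCHG)
```

Mean and mode imputation are simple but shrink the variance of the imputed columns, which is worth keeping in mind when reading the regression results.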

Feature Selection: Information-gain-based feature selection was used for the linear regression, with a cut-off of 0.04.
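To make the cut-off concrete, here is a minimal base R sketch of information gain computed by hand and thresholded at 0.04. It is not the FSelector call used in the project. The toy features OR_PROC and WEEKDAY, the binned target COST_BAND, the use of natural-log entropy, and the reading that features scoring above the cut-off are the ones kept are all assumptions made for illustration.

```r
# Shannon entropy (in nats) of a discrete vector.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Information gain of a discrete feature about a discrete target:
# H(target) - H(target | feature).
info_gain <- function(feature, target) {
  groups <- split(target, feature, drop = TRUE)
  conditional <- sum(sapply(groups, function(g) {
    length(g) / length(target) * entropy(g)
  }))
  entropy(target) - conditional
}

set.seed(1)
n <- 500
toy <- data.frame(
  OR_PROC = factor(sample(c("yes", "no"), n, replace = TRUE)),
  WEEKDAY = factor(sample(c("Mon", "Tue", "Wed"), n, replace = TRUE))
)
# Hypothetical binned cost: depends on OR_PROC only, WEEKDAY is pure noise.
toy$COST_BAND <- factor(
  ifelse(toy$OR_PROC == "yes" & runif(n) < 0.9, "high", "low")
)

# Score every candidate feature and keep those above the cut-off.
features <- c("OR_PROC", "WEEKDAY")
scores <- sapply(features, function(f) info_gain(toy[[f]], toy$COST_BAND))
cutoff <- 0.04
selected <- names(scores)[scores > cutoff]
print(round(scores, 4))
print(selected)
```

Because information gain is defined on discrete variables, a continuous target such as log cost has to be binned first; how the bins are chosen will affect the scores.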

Splitting the dataset: The dataset was split into training, validation and test partitions. Training and validation together made up 80% of the dataset, and the test partition made up the remaining 20%. Within the training-and-validation portion, 90% went to the training partition and 10% to the validation set (72% and 8% of the full dataset, respectively).
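A minimal sketch of this two-stage split in base R is shown below. The data frame `records`, its size, and the seed are hypothetical; the project may have used caret's partitioning helpers rather than `sample()`.

```r
set.seed(42)
n <- 1000
records <- data.frame(id = seq_len(n))  # hypothetical stand-in dataset

# First cut: 80% for training + validation, 20% held out for test.
dev_idx  <- sample(n, size = round(0.8 * n))
test_set <- records[-dev_idx, , drop = FALSE]

# Second cut inside the 80%: 90% training, 10% validation.
train_idx <- sample(dev_idx, size = round(0.9 * length(dev_idx)))
valid_idx <- setdiff(dev_idx, train_idx)

train_set <- records[train_idx, , drop = FALSE]
valid_set <- records[valid_idx, , drop = FALSE]

# Partition sizes: 720 training, 80 validation, 200 test rows.
print(c(train = nrow(train_set), valid = nrow(valid_set), test = nrow(test_set)))
```

Fixing the seed keeps the partitions reproducible across runs, so model comparisons are made on the same held-out rows.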

IMPLEMENTATION OVERVIEW

Choice of programming language: R
External libraries used: VIM, Hmisc, FSelector, e1071, caret
Code:

CORRELATION ANALYSIS

Correlation Analysis can be summarized as below

REGRESSION RESULTS

  • About 62% of the variation in the log-transformed cost is explained by the cost prediction model (R-squared of 0.6197). The errors are normally distributed and the residuals are randomly scattered.
  • Factors increasing cost: number of procedures, number of diagnoses, presence of chronic conditions, and involvement of an operating room procedure.
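The following sketch shows how figures of this kind can be obtained from a linear model on log cost in R. The data are simulated, the predictor names N_PROC, N_DIAG and OR_PROC are hypothetical, and the coefficients are invented, so the numbers it prints are not the project's results.

```r
set.seed(7)
n <- 400
# Simulated predictors (hypothetical names, not the NIS variables).
toy <- data.frame(
  N_PROC  = rpois(n, 2),        # number of procedures
  N_DIAG  = rpois(n, 5),        # number of diagnoses
  OR_PROC = rbinom(n, 1, 0.3)   # 1 if an operating room procedure occurred
)
# Simulated log cost with made-up effects plus normal noise.
toy$LOG_TOTCHG <- 8 + 0.15 * toy$N_PROC + 0.05 * toy$N_DIAG +
  0.60 * toy$OR_PROC + rnorm(n, sd = 0.4)

fit <- lm(LOG_TOTCHG ~ N_PROC + N_DIAG + OR_PROC, data = toy)

# Share of variance in log cost explained by the model.
r_squared <- summary(fit)$r.squared

# With a log-transformed response, exp(beta) - 1 is the approximate
# percentage change in cost per one-unit increase in a predictor.
pct_effect <- (exp(coef(fit)[-1]) - 1) * 100

# Residual diagnostics: Shapiro-Wilk normality test on the residuals.
res <- residuals(fit)
shapiro_p <- shapiro.test(res)$p.value

print(round(r_squared, 4))
print(round(pct_effect, 1))
```

A positive entry in `pct_effect` corresponds to a factor that increases cost, which is how the second bullet above can be read off a fitted model. Plotting `res` against `fitted(fit)` is a common visual check that the residuals are randomly scattered.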

INTERPRETATION OF RESULTS

The analysis pointed to treatments for organ failures, myeloproliferative diseases and poorly differentiated neoplasms, alcohol/drug-induced mental illness, fluid and electrolyte disorders, deficiency anemia and diabetes, as well as open heart procedures, coronary angioplasty, knee arthroplasty and spinal fusion, as notable factors in the cost of care.