Exploratory Analysis of HCUP NIS Data: Clustering to Identify Patient Groupings

INTRODUCTION

The objective of the analysis is to identify natural groupings of patients based on initial diagnosis, cost, and demographic features.

CLUSTER FACTORS

The cluster factors chosen were:

  • Age of the patient
  • Neonatal age, if applicable
  • Whether the patient is female
  • Whether the patient came in on an elective admission
  • Length of stay
  • Number of chronic conditions diagnosed
  • Number of procedures undergone
  • Whether a major operating room procedure was involved
  • Number of days from admission to principal procedure
  • Total charges

IMPLEMENTATION OVERVIEW

  • Clustering algorithm: K-means
  • Number of clusters chosen: 5
  • Choice of k: based on the elbow method applied to the SSE (sum of squared errors) plot
  • Implementation language: R
  • Source: https://github.com/sarigopiram/Data-Mining/blob/master/hcup_nis_2012/Cluster.r
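The elbow approach can be illustrated with a minimal sketch in base R. This is not the code from Cluster.r: the data are synthetic, the column names are hypothetical stand-ins for the cluster factors, and standardizing the features with scale() is an assumption (the page does not say whether the original script scales its inputs).

```r
# Synthetic stand-in for the NIS extract; all column names are hypothetical.
set.seed(42)
n <- 500
patients <- data.frame(
  age            = sample(0:90, n, replace = TRUE),
  female         = rbinom(n, 1, 0.55),
  elective       = rbinom(n, 1, 0.25),
  length_of_stay = rpois(n, 4),
  n_chronic      = rpois(n, 3),
  n_procedures   = rpois(n, 2),
  total_charges  = round(rlnorm(n, meanlog = 10, sdlog = 1))
)

# Assumption: standardize each column so that large-valued features
# such as total charges do not dominate the Euclidean distance.
features <- scale(patients)

# Total within-cluster sum of squares (SSE) for k = 1..10.
ks  <- 1:10
sse <- sapply(ks, function(k) {
  kmeans(features, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})

# Elbow plot: look for the k after which the SSE curve flattens.
if (interactive()) {
  plot(ks, sse, type = "b",
       xlab = "Number of clusters k", ylab = "Within-cluster SSE")
}

# Fit the final model with k = 5, the value reported on this page.
fit <- kmeans(features, centers = 5, nstart = 10, iter.max = 50)
print(table(fit$cluster))  # cluster sizes
```

Using nstart = 10 reruns K-means from several random starts and keeps the best fit, which makes the SSE curve less sensitive to a poor initialization.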

CLUSTER RESULTS

INTERPRETATION OF CLUSTER RESULTS

Although 85% of the patients belong to cluster 1, the average cost in cluster 5 exceeds $2M. This could be due to a combination of factors: a long length of stay, a high average number of chronic conditions, a high share of major operating room procedures, and a high average number of days from admission to principal procedure.

Patients aged 35-45 are prone to a greater length of stay and hence a greater associated cost. Clusters 3 and 5 are therefore highly likely to fall into the "other transfers" disposition type.

It is interesting to note that 99% of the associated cost comes from 18% of the patient population. This suggests that length of stay, a major operating room procedure, and a non-elective admission can be considered risk factors that separate the routine and other-transfers disposition types.
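The comparison between patient share and cost share per cluster can be sketched in base R as follows. The data frame is a small hypothetical example, not NIS data, and the column names are assumptions; with real results, the cluster column would hold the labels returned by kmeans().

```r
# Hypothetical toy data: one row per patient with an assigned cluster label.
patients <- data.frame(
  cluster        = c(1, 1, 1, 1, 2, 2, 3, 5),
  length_of_stay = c(2, 3, 1, 2, 6, 8, 15, 40),
  total_charges  = c(10000, 12000, 8000, 10000, 50000, 70000, 240000, 600000)
)

# Per-cluster profile: mean length of stay and mean charges.
profile <- aggregate(cbind(length_of_stay, total_charges) ~ cluster,
                     data = patients, FUN = mean)

# Share of patients and share of total charges held by each cluster.
share <- data.frame(
  cluster       = profile$cluster,
  patient_share = as.vector(table(patients$cluster)) / nrow(patients),
  cost_share    = as.vector(tapply(patients$total_charges,
                                   patients$cluster, sum)) /
                  sum(patients$total_charges)
)

print(profile)
print(share)
```

A cluster whose cost_share is much larger than its patient_share marks a small group of patients that accounts for a disproportionate part of the charges.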