How to deal with ICD codes - arborzhang/ukb-tracking GitHub Wiki

One important part of my project is extracting diseases based on their ICD diagnosis codes. There have been several rounds of testing. Here is the workflow I have used so far.

1. Load the dataset (name = data).
2. Keep only the ICD variables: id and p41270, the variable recording all the ICD codes, separated by |. This variable p41270 has three components: ICD syndromes, codes, and dates. I wrote a function (subset_icd) that splits this variable into the individual ICD syndromes, codes, and dates corresponding to each id. The dataset is then transformed to long format, so that within each id the codes and ICD syndromes are ordered by date of ICD diagnosis. (The dataset name is icd_data.)
3. Use the function icd_search to screen diagnoses against the 80 LTC list and get the cleaned dataset df_long. This step removes ICD codes/syndromes not included in the 80 LTC list, such as fractures, and then removes duplicated ICD codes.
4. Transform icd_data to wide format, so that each id contains all the complications, both before and after entering the cohort (df_wide_comp). In this dataset, each id is recorded with complication names and dates.
5. Create a Lexis object for the original dataset, named L. Then merge this dataset with the complication dataset (df_wide_comp) to get the merged dataset rcompl, which contains one record per person with 50 relevant complications and dates.
6. Make a table of the number of persons who have each of the complications before, at, and after baseline.
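
To make the shape of this workflow concrete, here is a minimal Python sketch of the split, screen, de-duplicate, reshape, and tabulate steps. It is not the actual subset_icd or icd_search code. Everything in it is an assumption made for illustration: the toy records, the layout of pipe-separated codes with a parallel pipe-separated date string, the three-entry stand-in for the LTC list, and all function names. The Lexis step is replaced by a plain lookup of each person's baseline date.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy records. The real p41270 layout may differ; here the codes
# and their dates are assumed to be two parallel pipe-separated strings.
RAW = [
    {"id": 1, "codes": "E11|S72|I10|E11",
     "dates": "2012-03-01|2009-06-15|2010-01-20|2014-08-09",
     "baseline": "2010-01-20"},
    {"id": 2, "codes": "I10|J45",
     "dates": "2015-05-05|2008-02-02",
     "baseline": "2011-11-11"},
]

# Hypothetical stand-in for the LTC list: ICD-10 code -> condition name.
LTC = {"E11": "type2_diabetes", "I10": "hypertension", "J45": "asthma"}


def to_long(raw):
    """Split the pipe-separated fields into one row per (id, code, date),
    ordered by id and then by diagnosis date."""
    rows = []
    for rec in raw:
        codes = rec["codes"].split("|")
        dates = rec["dates"].split("|")
        for code, d in zip(codes, dates):
            rows.append({"id": rec["id"], "code": code,
                         "date": date.fromisoformat(d)})
    rows.sort(key=lambda r: (r["id"], r["date"]))
    return rows


def screen(rows, ltc):
    """Keep only codes on the list and drop repeated codes within an id.
    Rows arrive date-ordered, so the earliest diagnosis is the one kept."""
    seen = set()
    kept = []
    for r in rows:
        key = (r["id"], r["code"])
        if r["code"] in ltc and key not in seen:
            seen.add(key)
            kept.append({**r, "condition": ltc[r["code"]]})
    return kept


def to_wide(rows):
    """One dict per id, mapping condition name -> first diagnosis date."""
    wide = defaultdict(dict)
    for r in rows:
        # setdefault keeps the earliest date if several codes share a condition
        wide[r["id"]].setdefault(r["condition"], r["date"])
    return dict(wide)


def timing_table(wide, baseline):
    """Count persons per condition diagnosed before, at, or after baseline."""
    table = defaultdict(lambda: {"before": 0, "at": 0, "after": 0})
    for pid, conds in wide.items():
        for name, d in conds.items():
            if d < baseline[pid]:
                when = "before"
            elif d == baseline[pid]:
                when = "at"
            else:
                when = "after"
            table[name][when] += 1
    return dict(table)


long_rows = to_long(RAW)
clean = screen(long_rows, LTC)
wide = to_wide(clean)
baseline = {rec["id"]: date.fromisoformat(rec["baseline"]) for rec in RAW}
table = timing_table(wide, baseline)

for name, counts in sorted(table.items()):
    print(name, counts)
```

In this toy data the fracture code (S72) is dropped by the screening step and the repeated E11 entry collapses to its earliest date, which mirrors what step 3 is meant to do. Keeping the earliest date per condition is a design choice of this sketch, not something the workflow above specifies.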