Step by step process - neurohub/neurohub_documentation GitHub Wiki
The ICD and IDP (imaging-derived phenotypes) information is available in the current.csv
file, a large CSV table (separate from the bulk data).
It is located at /lustre03/project/6008063/neurohub/ukbb/new/Tabular/current.csv
and holds the most up-to-date UK Biobank data set available on Beluga.
The following command shows the headers of the CSV file, that is, the complete list of data fields with their column numbers:
xsv headers current.csv
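Where xsv is not installed, a similar numbered header listing can be approximated with standard tools. The following is only a sketch: the file `demo` and its values are made up for illustration, and it assumes that no header contains a quoted comma.

```shell
# Hypothetical miniature stand-in for current.csv (made-up values).
demo=$(mktemp)
cat > "$demo" <<'EOF'
eid,31-0.0,41202-0.0,41202-0.1
1000001,1,U071,I10
EOF

# Read only the header row, put one header per line, and prefix each
# line with its column number. Assumes no header contains a quoted comma.
headers=$(head -n 1 "$demo" | tr ',' '\n' | awk '{ print NR, $0 }')
echo "$headers"

rm -f "$demo"
```

Because only the first line is read, this stays fast even on a very large table.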
Currently 21,407 data fields are associated with about 0.5 million UK Biobank subjects.
3. Diagnoses - main ICD10 (Data field 41202)
The first question is whether the data field is available in our data set.
You can search for the data field using the grep command:
xsv headers current.csv | grep 41202
Files that contain data on an individual participant are named in this fashion:
<EID>_<FIELD-ID>_<INSTANCE-ID>_<ARRAY-ID>
The output lists the columns of data field 41202, one column per array index, here up to array index 79. (The first column of the output is the column number in current.csv.)
18504 41202-0.1
18505 41202-0.2
18506 41202-0.3
18507 41202-0.4
.......
18579 41202-0.76
18580 41202-0.77
18581 41202-0.78
18582 41202-0.79
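Instead of reading the first and last column numbers off the listing by hand, the range for a field can be computed. This is a minimal sketch using made-up data: `demo`, `field`, and `range` are illustrative names, and the header is assumed to contain no quoted commas. It matches headers that start with the field ID followed by a hyphen, so a field whose ID merely contains the same digits is not picked up.

```shell
# Hypothetical miniature header row (made-up column layout).
demo=$(mktemp)
cat > "$demo" <<'EOF'
eid,31-0.0,41202-0.0,41202-0.1,41202-0.2,41280-0.0
EOF

field=41202

# Print "first-last": the column numbers of the first and last headers
# that begin with "<field>-".
range=$(head -n 1 "$demo" | tr ',' '\n' |
  awk -v f="$field" '
    index($0, f "-") == 1 { if (!first) first = NR; last = NR }
    END { if (first) print first "-" last }')
echo "$range"

rm -f "$demo"
```

The resulting string has the same first-last form as the column range passed to xsv select.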
You can use xsv to select the data from the range of columns associated with the field, here columns 18503 to 18582:
xsv select 18503-18582 current.csv
To also get the subject number (EID), add column 1 to the selection, as the EID is the first column of the file:
xsv select 1,18503-18582 current.csv
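To keep only the rows that contain a given ICD10 code, pipe the output through grep, here for U071 (U07.1):
xsv select 1,18503-18582 current.csv | grep U071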
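grep matches the pattern anywhere in a row, including as part of a longer string. Where an exact match on the diagnosis columns is wanted, something like the following awk sketch could be used. The data is made up, `lo` and `hi` are hypothetical stand-ins for the real column range, and the file is assumed to have no quoted fields with embedded commas.

```shell
# Hypothetical miniature table (made-up subjects and codes).
demo=$(mktemp)
cat > "$demo" <<'EOF'
eid,31-0.0,41202-0.0,41202-0.1
1000001,1,U071,I10
1000002,0,I10,
1000003,1,J189,U071
EOF

# Skip the header row, then print the EID (column 1) of every row in which
# any column from lo to hi is exactly equal to the code.
eids=$(awk -F',' -v code="U071" -v lo=3 -v hi=4 '
  NR > 1 {
    for (i = lo; i <= hi; i++)
      if ($i == code) { print $1; break }
  }' "$demo")
echo "$eids"

rm -f "$demo"
```

The `break` ensures each subject is printed once even if the code appears in several array columns.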
6. Extract subjects correlating with the second-scan date (Data field 41280)
xsv headers current.csv | grep 41280
You can then grep for patients diagnosed in 2021, for example:
xsv select 1,19888-20146 current.csv | grep 2021
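The two filters can be combined, for example to keep rows that contain both U071 and a date in June 2021:
xsv select 1,18503-18582,1,19888-20146 current.csv | grep U071 | grep 2021-06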
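To get a count rather than the matching rows, the last grep can be given the -c option. The sketch below runs the same chain of filters on made-up data; the file `demo` and all values in it are illustrative. Note that each grep matches anywhere in the row, so a row is counted when the code and the month both appear somewhere in it, not necessarily in paired columns.

```shell
# Hypothetical miniature table (made-up subjects, codes, and dates).
demo=$(mktemp)
cat > "$demo" <<'EOF'
eid,41202-0.0,41202-0.1,41280-0.0,41280-0.1
1000001,U071,I10,2021-06-14,2019-03-02
1000002,I10,,2021-06-20,
1000003,J189,U071,2020-11-05,2021-01-09
1000004,U071,,2021-06-01,
EOF

# Skip the header, keep rows containing U071, then count those that
# also contain a date in June 2021.
count=$(tail -n +2 "$demo" | grep 'U071' | grep -c '2021-06')
echo "$count"

rm -f "$demo"
```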
5061 participants have an ICD (International Classification of Diseases) code of U07.1 (COVID-19, tested positive).