Step by step process - neurohub/neurohub

The ICD and IDP(imaging-derived phenotypes) information are available in current.csv file, a big CSV table (separate from the bulk data).

1. `current.csv`

Available on /lustre03/project/6008063/neurohub/ukbb/new/Tabular/current.csv, you will find the most updated UK Biobank data set available on Beluga.

2. `xsv` with the option `headers`

The command will show the headers of the CSV file, the complete list of data fields with their column number.

xsv headers current.csv

Currently 21,407 data fields are associated with about 0.5 million UK Biobank subjects.

3. Diagnoses - main ICD10 (Data field 41202)

The first question would be to know if the data field is available in our data set.

You can search for the data field using the grep command:

xsv headers current.csv | grep 41202

Files that contain data on an individual participant are named in this fashion:

<EID>''<FIELD-ID>''<INSTANCE-ID>''<ARRAY-ID>

The outcome will be the 41202 data field associated with the array, here a total of 79 arrays. (The first column is the data field column number.)

   18504  41202-0.1
   18505  41202-0.2
   18506  41202-0.3
   18507  41202-0.4
.......
   18579  41202-0.76
   18580  41202-0.77
   18581  41202-0.78
   18582  41202-0.79

4. Extract the 41202 data

You can use xsv and select the data from the range of column associated, here column 18503 to 18582.

xsv select 18503-18582 current.csv

To get the subject number (EID) you can select 1, (as the patient number is the first column of the file)

xsv select 1,18503-18582 current.csv

5. Subjects with code U071

xsv select 1,18503-18582 current.csv | grep U071

6. Extract subjects correlating with the second-scan date ( Data field 41280)

xsv headers current.csv | grep 41280

grep patients diagnosed in 2021 for example:

xsv select 1,19888-20146 current.csv | grep 2021

7. Combine both results by selecting subjects coded U071 after 2021-06 (for example)

xsv select 1,18503-18582,1,19888-20146 current.csv | grep U071 | grep 2021-06

8. What we have in Beluga

5061 participants with an ICD (International Classification Disease) code of U07.1 (covid tested positive)

Step by step process - neurohub/neurohub_documentation GitHub Wiki

1. `current.csv`

2. `xsv` with the option `headers`

3. Diagnoses - main ICD10 (Data field 41202)

4. Extract the 41202 data

5. Subjects with code U071

6. Extract subjects correlating with the second-scan date ( Data field 41280)

7. Combine both results by selecting subjects coded U071 after 2021-06 (for example)

8. What we have in Beluga

Questions or Issues?

⚠️ GitHub.com Fallback ⚠️

Step by step process - neurohub/neurohub_documentation GitHub Wiki

1. current.csv

2. xsv with the option headers

3. Diagnoses - main ICD10 (Data field 41202)

4. Extract the 41202 data

5. Subjects with code U071

6. Extract subjects correlating with the second-scan date ( Data field 41280)

7. Combine both results by selecting subjects coded U071 after 2021-06 (for example)

8. What we have in Beluga

Questions or Issues?

⚠️ **GitHub.com Fallback** ⚠️

1. `current.csv`

2. `xsv` with the option `headers`

⚠️ GitHub.com Fallback ⚠️