SQL: Confirm Counts - SeanBeagle/DataScienceJournal GitHub Wiki
QUESTION:
When I’m doing unique counts on CF ASIDs in the isolates from metadata, there were 188 unique patient ASIDs for M. avium, M. chimaera, M. intracellulare or M.chimaera-intracellulare. Can you double check those numbers?
- We are looking for the
COUNT
of unique orDISTINCT
patients_id's
- The species to include must be translated into a list of their
taxon
identifiers - 'CF' in the question indicates that the samples were
submitted_by
'CFRDP'
SELECT COUNT(DISTINCT(patient_id))
FROM isolate
WHERE
taxon IN ('MINT', 'MAV', 'MCHIM', 'MCHIM-MINT')
AND submitted_by='CFRDP';
188
ANSWER:
188
Samples submitted by the CFRDP that belong to the Mycobacterium Avium Complex (MAC) were identified.
FOLLOW UP:
I think I know the issue. The unique counts of patients across the mac species the actual unique patient asids were 188 but if we just summed the unique patients in each species then that number does equal 198. just some patients are double or even triple counted because they have two or three of the MAC species.
To check this hypothesis we will use the same query as before, however, we will look at unique patients within taxon groups using GROUP BY
.
SELECT taxon, COUNT(DISTINCT(patient_id)) count
FROM isolate
WHERE
taxon IN ('MINT', 'MAV', 'MCHIM', 'MINT-MCHIM')
AND submitted_by='CFRDP'
GROUP BY taxon;
taxon | count
-----------|------
MAV | 94
MCHIM | 35
MINT | 68
MINT-MCHIM | 2
ANSWER:
199 (94+35+68+2)
samples have unique patients when looking within taxon groups.