SQL: Confirm Counts - SeanBeagle/DataScienceJournal GitHub Wiki

QUESTION:

When I’m doing unique counts on CF ASIDs in the isolates from metadata, there were 188 unique patient ASIDs for M. avium, M. chimaera, M. intracellulare or M.chimaera-intracellulare. Can you double check those numbers?

  • We are looking for the COUNT of unique or DISTINCT patients_id's
  • The species to include must be translated into a list of their taxon identifiers
  • 'CF' in the question indicates that the samples were submitted_by 'CFRDP'
SELECT COUNT(DISTINCT(patient_id)) 
FROM isolate 
WHERE 
  taxon IN ('MINT', 'MAV', 'MCHIM', 'MCHIM-MINT') 
  AND submitted_by='CFRDP';

188

ANSWER:

188 Samples submitted by the CFRDP that belong to the Mycobacterium Avium Complex (MAC) were identified.

FOLLOW UP:

I think I know the issue. The unique counts of patients across the mac species the actual unique patient asids were 188 but if we just summed the unique patients in each species then that number does equal 198. just some patients are double or even triple counted because they have two or three of the MAC species.

To check this hypothesis we will use the same query as before, however, we will look at unique patients within taxon groups using GROUP BY .

SELECT taxon, COUNT(DISTINCT(patient_id)) count 
FROM isolate 
WHERE 
  taxon IN ('MINT', 'MAV', 'MCHIM', 'MINT-MCHIM') 
  AND submitted_by='CFRDP' 
GROUP BY taxon;

taxon      | count
-----------|------
MAV        | 94
MCHIM      | 35
MINT       | 68
MINT-MCHIM | 2

ANSWER:

199 (94+35+68+2) samples have unique patients when looking within taxon groups.