Python: Reformat Filenames - SeanBeagle/DataScienceJournal GitHub Wiki

RENAME FILES TO MATCH EXPECTED CONVENTION

We are expecting this filename convention:

<sampleName>_<sampleNumber>_<readNumber>_001.fastq.gz

18-260-SW-C-1-4-30_S75_R2_001.fastq.gz

The Problem

If the sampleName contains underscores, it breaks the pipeline, which splits each filename on underscores and assigns the first item to sample_name and the second to sample_number. All of our hyphens were converted to underscores, so the files now look like this:

18_260_SW_C_1_4_30_S75_R2_001.fastq.gz
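
To make the failure concrete, here is a minimal sketch of how a split-on-underscore parser might behave. The function naive_parse() is hypothetical and only stands in for the pipeline logic described above; it is not the pipeline's actual code.

```python
def naive_parse(filename):
    # Assumed pipeline behaviour: split on '_' and take the first two fields
    # as (sample_name, sample_number).
    fields = filename.split('_')
    return fields[0], fields[1]


# Correctly named file: the hyphenated sample name survives as one field.
print(naive_parse('18-260-SW-C-1-4-30_S75_R2_001.fastq.gz'))  # ('18-260-SW-C-1-4-30', 'S75')

# Hyphens converted to underscores: the sample name is cut off at its first piece.
print(naive_parse('18_260_SW_C_1_4_30_S75_R2_001.fastq.gz'))  # ('18', '260')
```

With the broken filename, both the sample name and the sample number come out wrong, which is why the extra underscores have to be turned back into hyphens.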

HINT: Although the files have extra underscores, they all end with _<sampleNumber>_<readNumber>_001.fastq.gz

TEST YOUR FUNCTION

Write a function rename() that takes a filename and returns the corrected filename:

def rename(filename):
    # Your code goes here
    return filename


test_cases = [
    ('18_260_SW_C_1_4_30_S75_R2_001.fastq.gz', '18-260-SW-C-1-4-30_S75_R2_001.fastq.gz'),
    ('19_M_JT_2_B_1_37_S122_R2_001.fastq.gz', '19-M-JT-2-B-1-37_S122_R2_001.fastq.gz'),
    ('KPH_56_S258_R2_001.fastq.gz', 'KPH-56_S258_R2_001.fastq.gz'),
    ('18_261_SW_A_1_1_30_S76_R1_001.fastq.gz', '18-261-SW-A-1-1-30_S76_R1_001.fastq.gz'), 
    ('19_M_UHMC_13_A_1_30_S181_R1_001.fastq.gz', '19-M-UHMC-13-A-1-30_S181_R1_001.fastq.gz'),
    ('18_KAR_2_2_30_S82_R1_001.fastq.gz', '18-KAR-2-2-30_S82_R1_001.fastq.gz')
]

try:
    for given, expected in test_cases:
        # Call rename() once per case and reuse the result in the error message
        result = rename(given)
        assert result == expected, f'ERROR: {result} != {expected}'
except AssertionError as e:
    print(e)
else:
    print('PASS: You did it!')
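
One possible approach, following the hint, is to split from the right so that the last three fields stay intact and only the sample name is rewritten. The sketch below is one way to do it, not the only one; the name rename_sketch() is chosen here so that it does not overwrite your own rename(). It assumes every filename has at least three underscores and will raise a ValueError otherwise.

```python
def rename_sketch(filename):
    # Peel the last three fields off the right-hand side:
    # sample number, read number, and the fixed '001.fastq.gz' tail.
    sample_name, sample_number, read_number, tail = filename.rsplit('_', 3)
    # Whatever is left is the sample name; restore its hyphens.
    sample_name = sample_name.replace('_', '-')
    return '_'.join([sample_name, sample_number, read_number, tail])


print(rename_sketch('KPH_56_S258_R2_001.fastq.gz'))  # KPH-56_S258_R2_001.fastq.gz
```

Splitting from the right with a limit of 3 works because the number of underscores inside the sample name varies, while the number of fields after it is always the same.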