Python: Reformat Filenames - SeanBeagle/DataScienceJournal GitHub Wiki
<sampleName>_<sampleNumber>_<readNumber>_001.fastq.gz
If underscores are used in the sampleName, they throw off the pipeline, which expects to split on underscores and assign sample_name as the first item and sample_number as the second. All of our hyphens were converted to underscores!
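To make the failure concrete, here is a minimal sketch of the kind of split described above. The function name `parse_naive` is hypothetical; the actual pipeline code is not shown on this page, so treat this as an illustration under that assumption.

```python
def parse_naive(filename):
    """Hypothetical stand-in for the pipeline's parsing step:
    split on underscore, first item is sample_name, second is sample_number."""
    parts = filename.split('_')
    return parts[0], parts[1]


# With hyphens in the sample name, the fields land where expected.
print(parse_naive('KPH-56_S258_R2_001.fastq.gz'))  # ('KPH-56', 'S258')

# With underscores in the sample name, part of the name is read as the sample number.
print(parse_naive('KPH_56_S258_R2_001.fastq.gz'))  # ('KPH', '56')
```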
HINT: Although the files have extra underscores, they all end with _<sampleNumber>_<readNumber>_001.fastq.gz
def rename(filename):
    # Your code goes here
    return filename

test_cases = [
    ('18_260_SW_C_1_4_30_S75_R2_001.fastq.gz', '18-260-SW-C-1-4-30_S75_R2_001.fastq.gz'),
    ('19_M_JT_2_B_1_37_S122_R2_001.fastq.gz', '19-M-JT-2-B-1-37_S122_R2_001.fastq.gz'),
    ('KPH_56_S258_R2_001.fastq.gz', 'KPH-56_S258_R2_001.fastq.gz'),
    ('18_261_SW_A_1_1_30_S76_R1_001.fastq.gz', '18-261-SW-A-1-1-30_S76_R1_001.fastq.gz'),
    ('19_M_UHMC_13_A_1_30_S181_R1_001.fastq.gz', '19-M-UHMC-13-A-1-30_S181_R1_001.fastq.gz'),
    ('18_KAR_2_2_30_S82_R1_001.fastq.gz', '18-KAR-2-2-30_S82_R1_001.fastq.gz')
]

try:
    for test_case in test_cases:
        assert rename(test_case[0]) == test_case[1], f'ERROR: {rename(test_case[0])} != {test_case[1]}'
except AssertionError as e:
    print(e)
else:
    print('PASS: You did it!')
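One possible solution, following the hint, is sketched below. It is not necessarily the intended answer. It assumes every filename ends with the three fixed fields (sample number, read number, and `001.fastq.gz`), so the last three underscores are the only real separators. It relies on `str.rsplit` with a `maxsplit` of 3 to peel those fields off from the right.

```python
def rename(filename):
    """Replace underscores inside the sample name with hyphens.

    Assumes the name ends with _<sampleNumber>_<readNumber>_001.fastq.gz.
    """
    # Split from the right at most three times; whatever remains on the
    # left is the sample name, underscores included.
    sample_name, sample_number, read_number, suffix = filename.rsplit('_', 3)
    # Only the sample name is rewritten; the trailing fields keep their underscores.
    return '_'.join([sample_name.replace('_', '-'), sample_number, read_number, suffix])
```

Splitting from the right avoids having to know how many underscores the sample name contains. A filename with fewer than three underscores would raise a ValueError at the unpacking step, which may be preferable to silently returning a malformed name.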