Home - SeanBeagle/DataScienceJournal GitHub Wiki
Data Science Journal
This wiki contains notes and tutorials on how Python and SQL can be used to clean, validate, and analyze data.
Validate Sample Sheet
pandas
regex
assert
This example validates the data for an Illumina sequencing sample sheet. I was asked to ensure that all sample names were unique and conformed to Illumina's formatting guidelines. I also verified that each sample name started with its corresponding raw sample name.
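The three checks described above could be sketched roughly as follows. This is a minimal illustration, not the code from the tutorial itself: the column names `sample_name` and `raw_sample_name` are hypothetical, and the allowed-character rule (letters, digits, hyphens, and underscores) is an assumption about the formatting guidelines that should be confirmed against Illumina's documentation for the instrument in use.

```python
import re

import pandas as pd

# Assumed formatting rule: letters, digits, hyphens, and underscores only.
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")


def validate_sample_sheet(df: pd.DataFrame) -> None:
    """Raise AssertionError if any sample name fails a check.

    Expects the hypothetical columns 'sample_name' and 'raw_sample_name'.
    """
    names = df["sample_name"]

    # 1. Every sample name must be unique.
    duplicates = names[names.duplicated(keep=False)].unique().tolist()
    assert not duplicates, f"Duplicate sample names: {duplicates}"

    # 2. Every sample name must match the assumed character rule.
    bad_format = [n for n in names if not VALID_NAME.fullmatch(n)]
    assert not bad_format, f"Invalid characters in: {bad_format}"

    # 3. Every sample name must start with its raw sample name.
    bad_prefix = [
        n for n, raw in zip(names, df["raw_sample_name"]) if not n.startswith(raw)
    ]
    assert not bad_prefix, f"Names not starting with raw name: {bad_prefix}"


# Illustrative data only.
samples = pd.DataFrame(
    {
        "sample_name": ["A1_rep1", "A1_rep2", "B7-lib1"],
        "raw_sample_name": ["A1", "A1", "B7"],
    }
)
validate_sample_sheet(samples)  # passes silently when all checks hold
```

Using `assert` keeps each rule to one line and stops at the first failed check, which suits a quick pre-submission review. For a report listing every problem at once, the three lists could be collected and returned instead.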
Confirm Counts in DB
SQL
This is a simple example of responding to a request for data analysis and verification using SQL. I was asked to cross-validate the results that another analyst had found in a shared dataset.
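As a rough sketch of this kind of cross-validation, the snippet below recomputes grouped counts with SQL and compares them to the figures another analyst reported. Everything here is hypothetical: the `visits` table, its columns, the data, and the reported counts are invented for illustration, and an in-memory SQLite database (via Python's standard `sqlite3` module) stands in for the shared dataset.

```python
import sqlite3

# In-memory database standing in for the shared dataset (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE visits (patient_id INTEGER, clinic TEXT);
    INSERT INTO visits VALUES
        (1, 'North'), (2, 'North'), (2, 'North'), (3, 'South');
    """
)

# Counts the other analyst reported (illustrative values).
reported = {"North": 2, "South": 1}

# Recompute independently: distinct patients per clinic.
rows = conn.execute(
    """
    SELECT clinic, COUNT(DISTINCT patient_id)
    FROM visits
    GROUP BY clinic
    ORDER BY clinic
    """
).fetchall()
recomputed = dict(rows)

# Collect any clinic where the two sets of counts disagree.
mismatches = {
    clinic: (reported.get(clinic), count)
    for clinic, count in recomputed.items()
    if reported.get(clinic) != count
}
print(mismatches or "All counts confirmed")
conn.close()
```

Counting `DISTINCT patient_id` rather than rows is one common source of disagreement between analysts, so it helps to state which definition of "count" each query uses when reporting back.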