Home - SeanBeagle/DataScienceJournal GitHub Wiki

Data Science Journal

This wiki contains notes and tutorials on how Python and SQL can be used to clean, validate, and analyze data.

Validate Sample Sheet

pandas regex assert

This is an example of validating the data for an Illumina sequencing sample sheet. I was asked to ensure that all sample names were unique and conformed to Illumina's formatting guidelines. I also verified that each sample name started with its corresponding raw sample name.
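The three checks described above can be sketched with pandas, a regular expression, and `assert`. This is only an illustration, not the code from the tutorial: the column names `sample_name` and `raw_sample_name`, the function name `validate_sample_sheet`, and the allowed-character pattern (letters, digits, hyphens, and underscores) are assumptions, so check the pattern against Illumina's current guidelines before relying on it.

```python
import pandas as pd

# Assumed naming rule: letters, digits, hyphens, and underscores only.
VALID_NAME_PATTERN = r"[A-Za-z0-9_-]+"


def validate_sample_sheet(df: pd.DataFrame) -> None:
    """Raise AssertionError if any of the three checks fails.

    Expects the hypothetical columns 'sample_name' and 'raw_sample_name'.
    """
    names = df["sample_name"]

    # 1. Every sample name must be unique.
    duplicates = sorted(names[names.duplicated()].unique())
    assert not duplicates, f"duplicate sample names: {duplicates}"

    # 2. Every sample name must match the assumed pattern in full.
    bad_format = names[~names.str.fullmatch(VALID_NAME_PATTERN)].tolist()
    assert not bad_format, f"badly formatted sample names: {bad_format}"

    # 3. Every sample name must start with its raw sample name.
    bad_prefix = [
        name
        for name, raw in zip(df["sample_name"], df["raw_sample_name"])
        if not name.startswith(raw)
    ]
    assert not bad_prefix, f"names not starting with raw name: {bad_prefix}"


# Made-up rows for illustration only.
sheet = pd.DataFrame(
    {
        "raw_sample_name": ["A01", "A02"],
        "sample_name": ["A01_rep1", "A02_rep1"],
    }
)
validate_sample_sheet(sheet)  # passes silently when all checks hold
```

Collecting the offending names before asserting means the error message lists every failing row for a given check, not just the first one.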

Confirm Counts in DB

SQL

This is a simple example of responding to a request for data analysis and verification using SQL. I was asked to cross-validate the results that another analyst had found in a shared dataset.
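One way to approach this kind of cross-check is to recompute the counts independently with a `GROUP BY` query and compare them with the reported figures. The sketch below uses Python's built-in `sqlite3` module with an in-memory database so that it runs on its own. The table `samples`, the column `status`, and all of the numbers are hypothetical and are not taken from the original request.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO samples (status) VALUES (?)",
    [("pass",), ("pass",), ("fail",)],
)

# Hypothetical counts reported by the other analyst.
reported = {"pass": 2, "fail": 1}

# Recompute the counts independently with SQL.
rows = conn.execute(
    "SELECT status, COUNT(*) FROM samples GROUP BY status"
).fetchall()
observed = dict(rows)

# Collect every category where the two sets of counts disagree,
# including categories that appear on only one side.
mismatches = {
    key: (reported.get(key), observed.get(key))
    for key in reported.keys() | observed.keys()
    if reported.get(key) != observed.get(key)
}

assert not mismatches, f"counts disagree (reported, observed): {mismatches}"
conn.close()
```

Comparing over the union of keys catches a category that one analyst counted and the other missed entirely, which a comparison over only the reported keys would overlook.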