PPS (yeast) preliminary analysis - RyogaLi/PPS GitHub Wiki

Flowchart

fasta

  1. Using all the sequences (including the backbone) as the reference file: most reads align to the backbone, which results in a high mapping rate. Sequence count: 21981
  2. Using only the targeted sequences, with duplicated genes removed: low alignment rate, since reads can only align to ORFs. Sequence count: 7797
  3. Separating the original fasta sequences into HIP and SUP (PROTOGEN, SGD): the sequences in PROTOGEN are small ORFs, which have a poor alignment rate. Sequence count: hip - other -
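The duplicate removal and sequence counting in option 2 can be sketched roughly as follows. This is a minimal illustration, not the script used for this analysis: the function names `read_fasta` and `drop_duplicates` are hypothetical, and it assumes that "duplicated" means an identical sequence (ignoring case). If duplicates are defined by gene name instead, the key would need to change.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_duplicates(records):
    """Keep the first record for each distinct sequence.

    Assumption: two entries are duplicates when their sequences are
    identical after upper-casing.
    """
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique
```

Calling `len(drop_duplicates(list(read_fasta(open(path)))))` on a reference file would then give a sequence count comparable to the ones listed above, where `path` is whichever fasta file is being checked.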

fastq

  1. Read 1 and Read 2 can be combined into one file, since we are not dealing with barcodes.
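Combining the two read files amounts to concatenating them. A minimal sketch, assuming uncompressed fastq files that each end with a newline; the function name `combine_fastq` and the file names in the usage note are hypothetical:

```python
import shutil


def combine_fastq(input_paths, output_path):
    """Concatenate several fastq files into one, in the order given.

    Files are copied byte for byte, so every input should end with a
    newline; otherwise the last record of one file would run into the
    first record of the next.
    """
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

For example, `combine_fastq(["sample_R1.fastq", "sample_R2.fastq"], "sample_combined.fastq")`. After this step the reads are treated as single-end, so pairing information is no longer used.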

Comparing different types of reference files

  1. Alignment rate comparison (BLUE: with backbone; GREEN: without backbone)
  2. Percent of genes found at different alignment rate cutoffs
Next: plot the percent recovered using the reference file
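One way to compute the percent of genes found at each cutoff is sketched below. The per-gene metric is an assumption: the sketch supposes that every gene has a single alignment rate between 0 and 1, and that a gene counts as found when its rate is at or above the cutoff. The function name `percent_recovered` and the example values are hypothetical.

```python
def percent_recovered(rates, cutoffs):
    """Return {cutoff: percent of genes whose rate is >= cutoff}.

    rates   -- dict mapping gene name to alignment rate (assumed 0..1)
    cutoffs -- iterable of cutoff values to evaluate
    """
    total = len(rates)
    if total == 0:
        return {cutoff: 0.0 for cutoff in cutoffs}
    return {
        cutoff: 100.0 * sum(rate >= cutoff for rate in rates.values()) / total
        for cutoff in cutoffs
    }
```

The resulting dictionary can be passed to any plotting tool, with the cutoff on the x axis and the percent recovered on the y axis, once per reference file.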

Gene counts in each well after variant calling

  1. With old reference file
  2. With new reference file

Variant calling

  1. SNP and INDEL counts
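SNP and INDEL counts can be tallied directly from the variant calls if they are in VCF format (an assumption, since the caller is not named here). The sketch below classifies each alternate allele by comparing its length with the reference allele; the function name `count_variants` is hypothetical, and the counts are per ALT allele rather than per record.

```python
from collections import Counter


def count_variants(vcf_lines):
    """Count SNPs and INDELs in an iterable of VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        # Skip meta-information, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]              # REF is the 4th VCF column
        alts = fields[4].split(",")  # ALT is the 5th, comma-separated
        for alt in alts:
            # Ignore missing, spanning-deletion, and symbolic alleles.
            if alt in (".", "*") or alt.startswith("<"):
                continue
            if len(ref) == 1 and len(alt) == 1:
                counts["SNP"] += 1
            elif len(ref) != len(alt):
                counts["INDEL"] += 1
            else:
                # Same length but longer than one base (e.g. an MNP).
                counts["OTHER"] += 1
    return counts
```

Running this once per well, on the calls made with the old and the new reference file, would give counts that can be compared side by side.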