PPS (yeast) preliminary analysis - RyogaLi/PPS GitHub Wiki
Flowchart
fasta
- Using all the sequences (including backbone) as the reference file: most reads align to the backbone, which results in a high mapping rate.
sequences count: 21981
- Using only the targeted sequences, with duplicated genes removed: low alignment rate, since reads can only align to ORFs.
sequences count: 7797
- Separating the original fasta sequences into HIP and SUP (PROTOGEN, SGD): sequences in PROTOGEN are small ORFs, which have a poor alignment rate.
sequences count: hip - other -
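As a rough illustration of the "remove duplicated genes" step and the sequence counts above, here is a minimal sketch in Python. It assumes duplicates are identified by the record name (the first token of the FASTA header), which may not match the criterion actually used for this reference file. The function names and the example record names are hypothetical.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the first whitespace-delimited token is the gene name.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_duplicates(records):
    """Keep only the first record seen for each name, preserving order."""
    seen = set()
    unique = []
    for name, seq in records:
        if name in seen:
            continue
        seen.add(name)
        unique.append((name, seq))
    return unique


# Illustrative input only; these records are not taken from the real reference.
example = """>geneA some description
ATGC
GGTA
>geneB
ATGA
>geneA duplicate entry
ATGC
""".splitlines()

unique = drop_duplicates(parse_fasta(example))
print("sequences count:", len(unique))
```

Running the same two functions over the full and the filtered reference files would give counts comparable to the "sequences count" lines above, provided the duplicate criterion is the same.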
fastq
- Read 1 and Read 2 can be combined into one file since we are not dealing with barcodes
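Combining the two read files can be as simple as concatenating them. The sketch below assumes uncompressed FASTQ files that each end with a newline, and that pairing information is not needed downstream. The function names are hypothetical.

```python
import shutil


def combine_fastq(input_paths, output_path):
    """Concatenate FASTQ files byte-for-byte into a single output file."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(path):
    """Count reads, assuming the standard four lines per FASTQ record."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle) // 4
```

A quick sanity check after combining is that `count_reads` on the output equals the sum of `count_reads` on the Read 1 and Read 2 files.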
Comparing different types of reference files
- Alignment rate comparison (**BLUE**: with backbone; **GREEN**: without backbone):
- Percent of genes found at different alignment rate cutoffs
next: plot percent recovered using reference file
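The percent of genes found at each cutoff could be computed along the following lines. This is a sketch only: it assumes each gene has a single alignment-rate value between 0 and 1 and that a gene counts as "found" when its value is at or above the cutoff. The notes do not spell out either definition, and the gene names and numbers below are made up for illustration.

```python
def percent_recovered(scores, cutoffs):
    """Return {cutoff: percent of genes whose score is >= cutoff}."""
    total = len(scores)
    result = {}
    for cutoff in cutoffs:
        found = sum(1 for value in scores.values() if value >= cutoff)
        result[cutoff] = 100.0 * found / total if total else 0.0
    return result


# Hypothetical per-gene alignment rates, not real data.
example_scores = {"geneA": 0.9, "geneB": 0.5, "geneC": 0.2, "geneD": 0.75}
recovered = percent_recovered(example_scores, [0.25, 0.5, 0.8])
for cutoff, percent in recovered.items():
    print(f"cutoff {cutoff}: {percent:.1f}% of genes found")
```

Calling it once per reference file with the same cutoffs gives curves that can be plotted side by side.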
Gene counts in each well after variant calling
- With old reference file
- With new reference file
Variant calling
- SNP and INDEL counts
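One way to tally SNPs and INDELs is to classify each alternate allele in a VCF file by comparing the lengths of the REF and ALT fields. The sketch below assumes the variant caller writes standard tab-separated VCF, which the notes do not state. The function name and the example records are hypothetical.

```python
from collections import Counter


def count_variants(vcf_lines):
    """Count SNP, INDEL and other alleles from an iterable of VCF lines."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:
            if alt.startswith("<") or alt in (".", "*"):
                counts["other"] += 1  # symbolic, missing, or spanning allele
            elif len(ref) == 1 and len(alt) == 1:
                counts["SNP"] += 1
            elif len(ref) != len(alt):
                counts["INDEL"] += 1
            else:
                counts["other"] += 1  # e.g. multi-nucleotide substitution
    return counts


# Illustrative records only, not output from the actual pipeline.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "orf1\t10\t.\tA\tG\t50\tPASS\t.",
    "orf1\t25\t.\tAT\tA\t50\tPASS\t.",
    "orf2\t7\t.\tC\tT,CAA\t50\tPASS\t.",
]
variant_counts = count_variants(example_vcf)
print(dict(variant_counts))
```

Multi-allelic sites are counted once per alternate allele here. Counting once per site instead would be an equally reasonable choice.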