Experimental References - ababaian/serratus GitHub Wiki

Experimental and in-development sequence references

Reference sequences used in experiments and their associated files are stored in the working directory: s3://serratus-public/seq/

CoV Pan-Genome Series

  • ~/seq/cov0 : All CoV sequences from NCBI

    • NCBI search: `(Coronaviridae) AND "viruses"[porgn:txid10239]`
    • Date Accessed: 2020/03/30
    • Results: 33296
  • ~/seq/cov1r : Initial pan-coronavirus genome

    • Based on cov0, with non-CoV accessions removed and polyA[10+] tracts masked
    • See: ~/serratus/notebook/200408_cov1_pangenome.ipynb for make commands
    • Date: 2020/04/08
    • cov01r contains reverse non-complement control sequences
  • ~/seq/cov2r : Refined pan-coronavirus genome

    • Based on cov0
    • Removed poly-nt tracts of 10+
    • Blacklisted 6 non-CoV accessions
    • Pruned
    • See: ~/serratus/notebook/200420_cov2_pangenome.ipynb notebook for commands
    • Date: 2020/04/20
    • cov2.fa : Masked pan-genome
    • cov2r.fa : Masked pan-genome with reverse non-complement controls
  • ~/seq/cov2m : Pan-coronavirus genome + mega0 (see flom1)

    • Based on cov0
      • Removed poly-nt tracts of 10+
    • Blacklisted 6 non-CoV accessions + additional false-positive regions
    • See: ~/serratus/notebook/200509_cov2m_CoVpan_and_mega_genome.ipynb notebook for generation script
    • Date: 2020/05/09
    • cov2m.fa : Hard masked pan-genome
    • cov2m.unmasked.fa : Unmasked pan-genome
  • ~/seq/cov3 : Pan-Coronavirus genome

    • Major revision from cov2
      • CovRef2 : RefSeq representative genomes for CoV
      • cov0 :
      • Removed poly-nt tracts of 10+
    • Blacklisted 6 non-CoV accessions + additional false-positive regions
    • See: `` notebook for generation script
    • Date: 2020/05/09
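The masking and control steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in the linked notebooks: it assumes a "poly-nt tract of 10+" is a run of ten or more identical bases, that hard masking replaces each such run with `N`, and that a "reverse non-complement" control is the masked sequence reversed without complementing. The `_rev` header suffix and the `blacklist` argument are hypothetical stand-ins for however the notebooks name controls and drop non-CoV accessions.

```python
import re
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (FASTA header, sequence)


def hard_mask_poly_nt(seq: str, min_len: int = 10) -> str:
    """Uppercase seq and replace each run of >= min_len identical bases with N."""
    # ([ACGT]) captures one base; \1{k,} requires at least k more copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq.upper())


def reverse_non_complement(seq: str) -> str:
    """Reverse the sequence without complementing it (decoy control)."""
    return seq[::-1]


def build_reference(records: Iterable[Record],
                    blacklist: frozenset = frozenset()) -> Iterator[Record]:
    """Yield each masked record followed by its reversed control.

    Records whose header is in the (hypothetical) blacklist are skipped.
    """
    for name, seq in records:
        if name in blacklist:
            continue
        masked = hard_mask_poly_nt(seq)
        yield name, masked
        yield name + "_rev", reverse_non_complement(masked)  # hypothetical suffix


def to_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Format records as FASTA with sequence lines wrapped at `width`."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [("seqA", "ACGT" + "A" * 12 + "GGCC"), ("nonCoV", "ACGT")]
    print(to_fasta(build_reference(demo, blacklist=frozenset({"nonCoV"}))), end="")
```

One reason to reverse without complementing is that each control keeps the length and base composition of its source sequence while no longer matching real viral sequence, so alignments to the control entries can give a rough background rate for spurious hits.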

FLOM Series

See: [Full Length Only Mega-Reference](https://github.com/ababaian/serratus/wiki/FLOM-reference)

Other