Biological problems - skurvits/NN-for-virus-prediction GitHub Wiki

Idea 1: Baltimore classification

Most viruses in the ViralMiner dataset are DNA viruses. At the broadest level, viruses can be classified with the Baltimore classification, which separates all viruses into seven groups based on their genome structure as follows:

  • I: dsDNA viruses (e.g. Adenoviruses, Herpesviruses, Poxviruses)
  • II: ssDNA viruses, (+) strand or "sense" DNA (e.g. Parvoviruses)
  • III: dsRNA viruses (e.g. Reoviruses)
  • IV: (+)ssRNA viruses, (+) strand or sense RNA (e.g. Coronaviruses, Picornaviruses, Togaviruses)
  • V: (−)ssRNA viruses, (−) strand or antisense RNA (e.g. Orthomyxoviruses, Rhabdoviruses)
  • VI: ssRNA-RT viruses, (+) strand or sense RNA with a DNA intermediate in the life cycle (e.g. Retroviruses)
  • VII: dsDNA-RT viruses, DNA with an RNA intermediate in the life cycle (e.g. Hepadnaviruses)

Our dataset had 9 dsDNA virus families, 6 ssDNA virus families and 1 ssRNA virus family. Would it be possible to classify viruses further by Baltimore group?
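One way to explore this question would be to derive a Baltimore group label for each sequence from its family label and check how the groups are distributed before training anything. The sketch below is only an illustration: the lookup table contains just the example families named in the list above, the helper names `baltimore_label` and `group_counts` are hypothetical, and it assumes that each sequence in the dataset already carries a family name spelled the same way as in the table.

```python
from collections import Counter

# Baltimore group -> genome type, as listed above.
BALTIMORE_GROUPS = {
    "I": "dsDNA",
    "II": "ssDNA",
    "III": "dsRNA",
    "IV": "(+)ssRNA",
    "V": "(-)ssRNA",
    "VI": "ssRNA-RT",
    "VII": "dsDNA-RT",
}

# Example families from the list above -> Baltimore group.
# Assumption: the real dataset would need its own, complete table.
FAMILY_TO_GROUP = {
    "Adenoviruses": "I",
    "Herpesviruses": "I",
    "Poxviruses": "I",
    "Parvoviruses": "II",
    "Reoviruses": "III",
    "Coronaviruses": "IV",
    "Picornaviruses": "IV",
    "Togaviruses": "IV",
    "Orthomyxoviruses": "V",
    "Rhabdoviruses": "V",
    "Retroviruses": "VI",
    "Hepadnaviruses": "VII",
}


def baltimore_label(family):
    """Return the Baltimore group for a family name, or None if it is unmapped."""
    return FAMILY_TO_GROUP.get(family)


def group_counts(families):
    """Count how many entries fall into each Baltimore group (None = unmapped)."""
    return Counter(baltimore_label(family) for family in families)


if __name__ == "__main__":
    # Hypothetical family labels, one per sequence.
    labels = ["Adenoviruses", "Herpesviruses", "Parvoviruses", "Retroviruses"]
    for group, count in group_counts(labels).items():
        print(group, BALTIMORE_GROUPS.get(group, "unmapped"), count)
```

With only one ssRNA family in the dataset, a count like this would show at a glance that the RNA groups are heavily under-represented, which matters when deciding whether a Baltimore-level classifier is feasible.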

Idea 2: Transposon detection

Idea 3: Data generation classes