Test - xniu7/jhuclass.bigdata.project GitHub Wiki

Test notice:

Our server runs on an EC2 micro instance with 700 MB of memory and 8 GB of disk (free).
Because the micro instance has low performance, we recommend that users download and test small datasets, such as Clade: Other, Genome: Sea hare, or the insect and worm genomes. Large datasets can still be downloaded and run on bigger instances.
If users download several large datasets, such as human (~3 GB), the disk may fill up. In that case, log in to the instance and delete the datasets from ~/sequence.
If the website crashes, you can restart it with: cd ~/bigdata/bigdata_project/ && nohup python manage.py runserver 0.0.0.0:8080 &
Before running any FASTA format test, make sure the selected FASTA dataset has been downloaded first.
All tested FASTQ files are already downloaded in ~/fastq.

Test explanation:

Fasta Format:
ATCG_count returns 4 values: the number of A, T, C, and G bases across all sequences in the FASTA file.
Segments Length returns one value per segment: the length of each segment in the FASTA file.
GC Content returns one value per segment: the GC percentage of each segment in the FASTA file.
Protein Unit returns 5 values: the number of possible proteins in each of the top 5 segments.
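As an illustration of what ATCG_count, Segments Length, and GC Content report, here is a minimal Python sketch. It is not the project's actual code: the function names `parse_fasta` and `fasta_stats` are hypothetical, and it assumes that each segment is one `>` record and that the GC percentage is taken over the full segment length.

```python
import io


def parse_fasta(handle):
    """Yield (header, sequence) for each segment (record) in a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous segment, if there was one
            if header is not None:
                yield header, "".join(chunks).upper()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def fasta_stats(handle):
    """Return (base totals, segment lengths, GC percentage per segment)."""
    totals = {"A": 0, "T": 0, "C": 0, "G": 0}
    lengths, gc_percent = [], []
    for _, seq in parse_fasta(handle):
        for base in totals:
            totals[base] += seq.count(base)
        lengths.append(len(seq))
        gc = seq.count("G") + seq.count("C")
        # Assumption: the denominator is the full segment length (including N)
        gc_percent.append(100.0 * gc / len(seq) if seq else 0.0)
    return totals, lengths, gc_percent


# Tiny in-memory example with two segments
sample = io.StringIO(">seg1\nATCG\nGGCC\n>seg2\nAATT\n")
totals, lengths, gc_percent = fasta_stats(sample)
print(totals)       # {'A': 3, 'T': 3, 'C': 3, 'G': 3}
print(lengths)      # [8, 4]
print(gc_percent)   # [75.0, 0.0]
```

The parser streams the file line by line and keeps only one segment in memory at a time, which matters on a low-memory instance.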
Fastq Format:
Length = 1000 means the top 1000 reads. We use this limit to avoid running out of memory, although more than 10,000,000 reads can be processed on large instances.
Assembly returns a fragment assembled from the reads. The number of overlaps in the assembled fragment is shown in different colors.
Ham Distance returns the Hamming distance between each pair of reads.
Edit Distance returns the edit distance between each pair of reads. The edit distance is usually much smaller than the Hamming distance because it also allows insertions and deletions, which makes it more suitable for measuring the distance between reads during genome assembly.
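To make the difference between Ham Distance (Hamming distance) and Edit Distance concrete, here is a minimal Python sketch. It is not the project's actual code: the helper names `read_fastq`, `hamming`, and `edit_distance` are hypothetical, and it assumes 4-line FASTQ records and the standard Levenshtein definition of edit distance.

```python
import io
from itertools import islice


def read_fastq(handle, limit=1000):
    """Return the sequences of the first `limit` reads (4 lines per record)."""
    reads = []
    # Only the first 4 * limit lines are consumed, so a large file is
    # never loaded into memory in full
    for i, line in enumerate(islice(handle, 4 * limit)):
        if i % 4 == 1:  # second line of each record holds the sequence
            reads.append(line.strip())
    return reads


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs reads of equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance, keeping a single row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


# Three reads in memory; only the top 2 are used
sample = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nCGTACGTA\n+\nIIIIIIII\n"
    "@r3\nACGTACGA\n+\nIIIIIIII\n"
)
reads = read_fastq(sample, limit=2)
ham = hamming(reads[0], reads[1])
edit = edit_distance(reads[0], reads[1])
print(reads)      # ['ACGTACGT', 'CGTACGTA']
print(ham, edit)  # 8 2
```

The second read is the first one shifted by a single base, so every position mismatches (Hamming distance 8), while one deletion plus one insertion aligns them (edit distance 2). Overlapping reads in assembly are shifted in the same way.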