Tutorial - nselem/orthocore GitHub Wiki

The Blast file

This BLAST file is required by orthocores. Please follow these instructions.

0. Create a folder with a copy of all your RAST files (<RAST_Id>.faa)

mkdir RAST
cp *faa RAST/.
cd RAST

1. Rename the sequences by appending the Job id (the file name without .faa) to each fasta header

ls *.faa | while read line ; do perl -p -i -e "s/(.*)/\$1\|${line%.faa}/ if />/" $line; done

2. Create a concatenated DB

A fasta file with all sequences
cat *.faa > Concatenados.faa

Creating blastdb
makeblastdb -in Concatenados.faa -dbtype prot -out Concatenados.db

blastp all vs all (Concatenados vs Concatenados)
blastp -db Concatenados.db -query Concatenados.faa -outfmt 6 -evalue .001 -num_threads 4 -out Concatenados
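As a quick sanity check on the result, hits can be tallied per pair of genomes. This is only a sketch: the function name count_genome_pairs is hypothetical, and it assumes the default -outfmt 6 column order (query id in column 1, subject id in column 2) and that every id ends in |<JobId> as produced by the renaming step.

```shell
# Tally hits per (query genome, subject genome) pair from a tabular BLAST file.
# The Job id is taken as the last "|"-separated field of each sequence id.
count_genome_pairs() {
    awk -F'\t' '
        {
            nq = split($1, q, "|")
            ns = split($2, s, "|")
            pairs[q[nq] "\t" s[ns]]++
        }
        END { for (k in pairs) print k "\t" pairs[k] }
    ' "$1" | sort
}

# Usage, once the blastp run above has finished:
#   count_genome_pairs Concatenados
```

Each output line holds the query Job id, the subject Job id and the number of hits between them. A genome pair that is missing from the output has no hits at the chosen e-value.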

If you can, parallelize this step to save time: split the query into several files and run each one against Concatenados.db.
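One way to do this is sketched below. The helper split_fasta and the chunk_ file names are assumptions made for illustration and are not part of orthocores. The blastp options are the ones used above, without -num_threads because each chunk runs as its own process.

```shell
# Split a FASTA file into N chunks, keeping each record (header + sequence) whole.
# Records are dealt out round-robin to <prefix>_0.faa ... <prefix>_<N-1>.faa.
# Assumes the input starts with a header line.
split_fasta() {
    awk -v n="$2" -v p="$3" '
        /^>/ { out = sprintf("%s_%d.faa", p, count % n); count++ }
        { print > out }
    ' "$1"
}

# Run one blastp job per chunk in the background, then merge the tables.
# Skipped when blastp or the input file is not available.
if command -v blastp >/dev/null 2>&1 && [ -f Concatenados.faa ]; then
    split_fasta Concatenados.faa 4 chunk
    for q in chunk_*.faa; do
        blastp -db Concatenados.db -query "$q" -outfmt 6 -evalue .001 \
               -out "${q%.faa}.blast" &
    done
    wait
    cat chunk_*.blast > Concatenados
fi
```

Because every chunk is searched against the same full database, the merged table should contain the same hits as a single run, though not necessarily in the same order.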

Run orthocores

Run the orthocores image
docker run -i -t -v /mypath/mydir:/usr/src/CORE nselem/orthocores /bin/bash

Main scripts

If there is no core, this script scans for the point where the core stops existing. There are two algorithms:
growing Grows the genome set until there is no core (the result depends on the order in which you list the genomes).
removing Removes genomes one by one, looking for a non-empty core.
scan.pl -rast_ids Cmm_Ago.csv -my_blast LoreAgosto.BLAST -set_name scaneando -mode g

Or try to find the core of all genomes:
CoreCluster.pl -rast_ids Cmm_Jul2016RAST.Ids -v -set_name todos -my_blast Coretodos.blast

Testing scripts:

Without a blast file:
2_OrthoGroups.pl -e_core 0.001 -list 373161,373159 -num 38 -rast_ids Cmm_Jul2016RAST.Ids -outname todos -name todos
By default it runs in steps of two, but this can be modified.

With a blastfile:
2_OrthoGroups.pl -e_core 0.001 -list 373161,373159 -num 38 -rast_ids Cmm_Jul2016RAST.Ids -outname todos -name todos -blast Lore.blast

my_blast All-vs-all BLAST file (see the Blast file section above)
rast_ids CSV file with Job Id, Genome Id and Organism Name from RAST
set_name output folder name