Tutorial - nselem/corason GitHub Wiki

CORASON Manual

CORe Analysis of Syntenic Orthologs Natural Product BGC


$ corason.pl -rast_ids rastIds -q query -s special_org [-hv] [-e_value query_evalue] [-c number_of_genes_on_cluster] [-b bit_score] [-e_cluster cluster_e_value] [-e_core core_e_value] [-l genome_selected] [-rescale number]

[Figure: DesD]

[Figure: DesD reduced]

| Argument | Default | Description |
|---|---|---|
| --rast_ids | Required (no default) | RAST ids tab-separated table. Columns: Job id\tGenome id\tOrganism name |
| --queryfile, -q | Required (no default) | Amino acid sequence in a FASTA file |
| --special_org, -s | Required (no default) | Job Id (from RAST) of the genome that contains the cluster your query belongs to |
| --e_value | 1E-15 | (float) E-value threshold for a gene to be considered a hit |
| --bitscore, -b | 0 | (positive integer) Bit-score cutoff. After a first run, inspect the .BLAST.pre file to choose a more restrictive value |
| --cluster_radio, -c | 10 | (positive integer) Number of genes in the neighborhood to analyze |
| --e_cluster | 1E-3 | (float) E-value for sequences from the reference cluster; values above it will be colored |
| --e_core | 1E-3 | (float) E-value for Best Bidirectional Hits used to construct the genomic core from clusters |
| --list | `ls GENOME/*.faa` | (string) Genomes separated by "," or ":". Example: 1,2,4:6 searches genomes 1, 2, 4, 5 and 6. Leaving this option empty searches all genomes in the GENOME directory |
| --rescale, -r | 85000 | (integer) Increasing this number shows a bigger cluster region with smaller genes |
| --verbose, -v | 0 | Print more output from the scripts; mostly useful for debugging |
| --help, -h | 0 | Short help about the arguments |
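A minimal Python sketch of reading the --rast_ids table, assuming one genome per line with the three tab-separated columns listed above. `RastEntry`, `read_rast_ids` and `check_special_org` are hypothetical helpers, not part of CORASON; the last one only illustrates that the -s value should be one of the Job ids in the table. The sample row reuses the Streptomyces coelicolor entry from the grep example at the end of this page, assuming tabs between its columns.

```python
from dataclasses import dataclass


@dataclass
class RastEntry:
    job_id: str     # RAST Job id (the value passed to -s / --special_org)
    genome_id: str  # RAST Genome id
    organism: str   # Organism name


def read_rast_ids(text: str) -> list[RastEntry]:
    """Parse a tab-separated table: Job id, Genome id, Organism name."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected 3 tab-separated columns: {line!r}")
        job_id, genome_id, organism = (f.strip() for f in fields[:3])
        entries.append(RastEntry(job_id, genome_id, organism))
    return entries


def check_special_org(entries: list[RastEntry], special_org: str) -> RastEntry:
    """Return the entry whose Job id matches special_org, or raise."""
    for entry in entries:
        if entry.job_id == special_org:
            return entry
    raise ValueError(f"Job id {special_org} not found in the RAST ids table")


table = "242137\t6666666.112876\tStreptomyces coelicolor A3(2)\n"
print(check_special_org(read_rast_ids(table), "242137").organism)
# → Streptomyces coelicolor A3(2)
```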

Remark: write float values (such as e_value and e_core) with a leading zero: 0.001 works, but .001 does not.
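The remark can be enforced before launching a run. This is a hypothetical check, not CORASON code: the accepted pattern (digits before any decimal point, optional E notation as in the 1E-15 default) is an assumption inferred only from the examples given here.

```python
import re

# Assumed format: a leading digit is required (0.001 works, .001 does not);
# E notation such as 1E-15 or 1e-3 is allowed, as in the documented defaults.
FLOAT_ARG = re.compile(r"\d+(\.\d+)?([eE][-+]?\d+)?")


def check_float_arg(value: str) -> str:
    """Return value unchanged if it matches the assumed format, else raise."""
    if FLOAT_ARG.fullmatch(value) is None:
        raise ValueError(f"{value!r}: write floats with a leading digit, e.g. 0.001")
    return value


for v in ["0.001", "1E-15", ".001"]:
    try:
        print(v, "accepted")
        check_float_arg(v)
    except ValueError as err:
        print(err)
```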

TUTORIAL

This is a detailed CORASON tutorial to find syntenic clusters and sort them phylogenetically.

When we run the commands:
$ docker run -i -t -v $(pwd):/home/output nselem/evodivmet /bin/bash
$ corason.pl -q DesC.query -rast_ids RAST_CORASON -s 242137

The first command opens a Docker container that has CORASON installed (the current directory is mounted at /home/output); the second runs CORASON with these arguments:

  • Query file (-q): DesC.query
  • Rast Ids file (-rast_ids): RAST_CORASON
  • Special organism (-s): 242137

Here the special organism number is the RAST JobId, in the example CORASON genome database, of the genome that contains the query's cluster. The database used in this tutorial has 14 Actinobacteria genomes annotated by RAST.
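A small Python sketch that assembles the same corason.pl command as an argument list, for example to launch several runs from a script inside the container. `build_corason_cmd` is a hypothetical helper; the flag spellings are copied from the usage line at the top of this page, and optional flags are passed through by name without validation.

```python
import shlex


def build_corason_cmd(query: str, rast_ids: str, special_org: str,
                      **optional: object) -> list[str]:
    """Return the corason.pl argument list.

    Optional arguments use the flag names of the usage line, e.g.
    e_value="1E-15", c=10 become -e_value 1E-15 -c 10.
    """
    cmd = ["corason.pl", "-q", query, "-rast_ids", rast_ids, "-s", special_org]
    for flag, value in optional.items():
        cmd += [f"-{flag}", str(value)]
    return cmd


cmd = build_corason_cmd("DesC.query", "RAST_CORASON", "242137")
print(shlex.join(cmd))
# → corason.pl -q DesC.query -rast_ids RAST_CORASON -s 242137
```

Inside the container the list could be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell-quoting problems with paths.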

  1. Input Files
  2. Required arguments
  3. Optional Arguments


Genome Database

List
Save time by manually choosing just a few organisms from the genome database: set $LIST="num1,num2,...,numn".
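The --list option describes the selection syntax by example: 1,2,4:6 selects genomes 1, 2, 4, 5 and 6. A minimal sketch of that expansion, assuming "," separates items and "a:b" is an inclusive range (our reading of the example, not CORASON's own parser); `expand_genome_list` is a hypothetical name.

```python
def expand_genome_list(spec: str) -> list[int]:
    """Expand a selection such as "1,2,4:6" into [1, 2, 4, 5, 6]."""
    genomes: list[int] = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        if ":" in item:
            # "a:b" is treated as the inclusive range a..b (assumption)
            start, end = (int(x) for x in item.split(":", 1))
            if start > end:
                raise ValueError(f"descending range {item!r}")
            genomes.extend(range(start, end + 1))
        else:
            genomes.append(int(item))
    return genomes


print(expand_genome_list("1,2,4:6"))
# → [1, 2, 4, 5, 6]
```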

Installing a new genome database

Dependencies: MyRast (tested on Ubuntu)

Database creation. Input: RAST.IDs, RAST user and password.

The user and password of the RAST account from which the genomes will be downloaded must be set in globals, in the variables PASS and USER. Downloaded files are numbered according to their order in the RAST.IDs file; e.g., the annotation files for JobId 288178 are downloaded as 1.faa and 1.txt, and the files for 288231 as 3.faa and 3.txt.
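The numbering rule can be sketched as follows: each genome's files are named after its 1-based line number in RAST.IDs. This assumes tab-separated lines with the JobId in the first column. `numbered_genome_files` is hypothetical, and in the sample table only JobIds 288178 and 288231 come from the text; every other value is a made-up placeholder.

```python
def numbered_genome_files(rast_ids_text: str) -> dict[str, tuple[str, str]]:
    """Map each JobId to the (.faa, .txt) file names given by its line number."""
    files = {}
    for line_no, line in enumerate(rast_ids_text.splitlines(), start=1):
        job_id = line.split("\t")[0].strip()
        if job_id:  # a blank line still uses up a line number
            files[job_id] = (f"{line_no}.faa", f"{line_no}.txt")
    return files


# Placeholder table: line 2 and all genome ids / names are hypothetical.
sample = (
    "288178\tgenome_a\torganism A\n"
    "000000\tgenome_b\torganism B\n"
    "288231\tgenome_c\torganism C\n"
)
print(numbered_genome_files(sample)["288231"])
# → ('3.faa', '3.txt')
```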

To add genomes to the database: add their description to the RAST.IDs file, then save the RAST faa and txt files in the GENOMAS folder, named with the genome's line number in the RAST.IDs file.
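The same two steps, sketched in Python under the assumption that RAST.IDs is the tab-separated table described earlier and GENOMAS is the genome folder. `add_genome` is a hypothetical helper: it appends the new entry, then copies the downloaded .faa and .txt files under the entry's new line number.

```python
import shutil
from pathlib import Path


def add_genome(rast_ids: Path, genomes_dir: Path, entry: str,
               faa: Path, txt: Path) -> int:
    """Append a tab-separated entry to RAST.IDs and copy its RAST files into
    genomes_dir as <line>.faa and <line>.txt. Returns the new line number."""
    text = rast_ids.read_text() if rast_ids.exists() else ""
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the new entry starts on its own line
    text += entry.rstrip("\n") + "\n"
    rast_ids.write_text(text)
    line_no = len(text.splitlines())  # same numbering as `grep -n`
    genomes_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(faa, genomes_dir / f"{line_no}.faa")
    shutil.copyfile(txt, genomes_dir / f"{line_no}.txt")
    return line_no
```

Usage (hypothetical paths): `add_genome(Path("RAST.IDs"), Path("GENOMAS"), "JobId\tGenomeId\tOrganism name", Path("download.faa"), Path("download.txt"))`.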

CORASON required input files

To use CORASON you have to edit a file named globals.pm. It is a plain-text file, so you can modify it with your preferred text editor, or with nano.

file.query

A single protein FASTA file that contains your query protein. The filename extension .query is mandatory.

  3. Execute ClusterTools

Script: perl CoreCluster.pl. Once you have written your preferences in the globals.pm file, run in a terminal:
$ perl CoreCluster.pl
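Before running, the query-file requirements (a single protein sequence in FASTA format, extension .query) can be checked with a short script. `check_query_file` is a hypothetical pre-flight helper, not part of CORASON; it only checks the extension and the number of FASTA records, not whether the sequence is a valid protein.

```python
from pathlib import Path


def check_query_file(path: Path) -> str:
    """Check the .query extension and that the file holds exactly one FASTA
    record with a sequence. Returns the header without the leading '>'."""
    if path.suffix != ".query":
        raise ValueError(f"{path.name}: the .query extension is mandatory")
    lines = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    headers = [line for line in lines if line.startswith(">")]
    if not lines or len(headers) != 1 or not lines[0].startswith(">"):
        raise ValueError(f"{path.name}: expected exactly one FASTA record")
    if len(lines) < 2:
        raise ValueError(f"{path.name}: the FASTA record has no sequence")
    return headers[0][1:]
```

Usage: `check_query_file(Path("DesC.query"))` returns the query header, or raises `ValueError` describing the problem.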

  1. Output files: RightNames.txt, Concatenados.svg, NAME/FUNCTION

RightNames.txt: the concatenated FASTA file of the core cluster.

Concatenados.svg: the browsable graphic of your related clusters.

NAME/FUNCTION

  1. Installation

Dependencies: the Perl module SVG. To install it on Mac OS X 10.6.8:
$ sudo perl -MCPAN -e shell
and then, at the CPAN prompt:
install SVG

  2. CORASON architecture

GENOMES
RAST.IDs

CoreCluster.pl
1_Context_text.pl
Concatenador.pl
ReadingInputs.pl
header.pl
1_MakeBlast.pl
Rename_Ids_Star_Tree.pl
header2.pl
2.Batch_RetrieveFiles.pl
EliminadorLineas.pl
SearchAminoacidsFromCore.pl
multiAlign_gb.pl
2_OrthoGroups.pl
allvsall.pl
ChangeName.pl
ReadReaccion.pl
changeNamesWC.pl
RenamePrincipalHits.pl
readTree.pl
converter.pl
3_Draw.pl

Get the line number in RAST.IDs of the organism in the query file:
$ grep -n 'org' RAST.IDs

Example:
$ grep -n 'coelicolor' RAST.IDs
515:242137 6666666.112876 Streptomyces coelicolor A3(2) NC_003888.3 515
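The same lookup in Python, for use inside a script. `find_organism_lines` is a hypothetical helper; unlike grep it does a plain, case-sensitive substring match instead of a regular-expression search. In the sample table, line 1 is a made-up placeholder and line 2 reuses the coelicolor entry shown above, assuming tab-separated columns.

```python
def find_organism_lines(pattern: str, rast_ids_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs containing pattern, numbered from 1
    like `grep -n`."""
    return [
        (line_no, line)
        for line_no, line in enumerate(rast_ids_text.splitlines(), start=1)
        if pattern in line
    ]


table = (
    "288178\tgenome_a\torganism A\n"  # placeholder line
    "242137\t6666666.112876\tStreptomyces coelicolor A3(2)\n"
)
for line_no, line in find_organism_lines("coelicolor", table):
    print(f"{line_no}:{line}")
```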
