BiG SCAPE for visualizing Biosynthetic Gene Clusters - meyermicrobiolab/Meyer_Lab_Resources GitHub Wiki

BiG-SCAPE is a tool that examines Biosynthetic Gene Clusters (BGCs) from (meta)genomes and groups them into Gene Cluster Families (GCFs). Run your (meta)genomes through antiSMASH first, then use the antiSMASH files as input for BiG-SCAPE. The output of BiG-SCAPE includes an interactive HTML interface.

There are 5 main steps for using BiG-SCAPE.

  1. Install Curl
  2. Install Docker
  3. Install BiG-SCAPE
  4. Running BiG-SCAPE
  5. Corason

BiG-SCAPE is already installed on the newer iMac in the lab. If you are using that machine, skip down to Step 4 to run the program.

1. Install Curl

Curl is a terminal command for transferring data to and from URLs. You can check whether you already have curl by running

  curl -V

If you don't already have it you can download it here
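The check above can also be wrapped in a small script that reports the result either way. This is a minimal sketch; the variable name curl_status is only illustrative.

```shell
# Report whether curl is available on this machine.
if command -v curl >/dev/null 2>&1; then
  curl_status="found"
  curl -V | head -n 1   # the first line shows the installed version
else
  curl_status="missing"
  echo "curl not found; install it before continuing" >&2
fi
echo "curl: $curl_status"
```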

2. Installing Docker

The makers of BiG-SCAPE recommend using virtualization software called Docker to run the program, most likely because of the amount of setup the program would otherwise require. Linked is the Docker installation page. For more information check out their tutorial.
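Once Docker is installed, a quick way to confirm that it is both present and running is sketched below. It assumes the standard docker command-line client; docker info fails when the Docker daemon cannot be reached, which is what the check relies on. The variable name docker_status is illustrative.

```shell
# Check that the docker command exists and that the Docker daemon answers.
if ! command -v docker >/dev/null 2>&1; then
  docker_status="not installed"
elif docker info >/dev/null 2>&1; then
  docker_status="running"
else
  docker_status="installed but not running"
fi
echo "Docker: $docker_status"
```

If the result is "installed but not running", start the Docker application and run the check again.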

3. Installing BiG-SCAPE

Open your terminal (command line) and create a directory for your BiG-SCAPE executable. Note that the ~ denotes your HOME directory, but you can place the file anywhere for organizational purposes.

mkdir -p ~/BiGSCAPE/bin # not required if the directory already exists
curl -q https://git.wageningenur.nl/medema-group/BiG-SCAPE/raw/master/run_bigscape > ~/BiGSCAPE/bin/run_bigscape # download the launcher script
chmod a+x ~/BiGSCAPE/bin/run_bigscape # make the script executable
~/BiGSCAPE/bin/run_bigscape # if you are inside your BiGSCAPE/bin folder you can simply use ./run_bigscape to run the executable
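As an optional convenience, the bin folder can be added to your PATH so the script can be called by name from any folder. This is a sketch that assumes the install location used above (~/BiGSCAPE/bin); the change only lasts for the current terminal session unless you add the line to your shell's startup file.

```shell
# Add the BiG-SCAPE bin folder to PATH for the current terminal session.
# Assumes the install location used above (~/BiGSCAPE/bin).
export PATH="$HOME/BiGSCAPE/bin:$PATH"

# run_bigscape can now be called by name from any folder, for example:
# run_bigscape gbks outputFile
```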

4. Running BiG-SCAPE

First, check to make sure that Docker is running. Then collect your GenBank files into an input folder. In the example this folder is called gbks, but you can name it anything. While you can run BiG-SCAPE on whole genomes, it is preferable to use the GenBank files of clusters produced by antiSMASH. You can also name your output folder anything you like; in the example it is called outputFile.

~/BiGSCAPE/bin/run_bigscape gbks outputFile 
#if you are inside of your BiGSCAPE/bin folder you can simply use ./run_bigscape to run the executable
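The gbks input folder used above can be filled by hand or with a small helper like the sketch below. It assumes that the antiSMASH cluster files are named like Halo_scaffold_1.region001.gbk (the naming seen later on this page); older antiSMASH versions may name cluster files differently, so adjust the pattern if needed. The function name collect_region_gbks and the folder name antismash_results are hypothetical.

```shell
# collect_region_gbks SRC DEST
# Copy every antiSMASH cluster file (*.region*.gbk) found under SRC into DEST.
# Whole-genome .gbk files are skipped because they do not match the pattern.
collect_region_gbks() {
  src="$1"
  dest="$2"
  mkdir -p "$dest"
  find "$src" -type f -name '*.region*.gbk' -exec cp {} "$dest"/ \;
}
```

For example, collect_region_gbks antismash_results gbks would copy the cluster files from a folder of antiSMASH results into gbks. Files that share a name across genomes would overwrite one another, so check for duplicates first.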

Results

BiG-SCAPE produces a wealth of files. Open the index.html found within your outputFile folder to view your results in an offline interactive webpage. Because this page is offline, if you add any files to the gbks folder after you've run BiG-SCAPE, you will need to rerun the program to see the changes.
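The results page can also be opened from the terminal. The sketch below first confirms that index.html exists, assuming it sits directly inside the output folder; the function name results_index is hypothetical.

```shell
# results_index OUTPUT_DIR
# Print the path to index.html if it exists in the given output folder.
results_index() {
  index_path="$1/index.html"
  if [ -f "$index_path" ]; then
    printf '%s\n' "$index_path"
  else
    echo "No index.html in $1; has BiG-SCAPE finished?" >&2
    return 1
  fi
}
```

On macOS, open "$(results_index outputFile)" opens the page in the default browser; on Linux, xdg-open plays the same role.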

5. Corason

Corason is the component of BiG-SCAPE that produces its phylogenetic trees, but it does not perform the more complex cluster identification that BiG-SCAPE does. Use Corason if you want to reproduce BiG-SCAPE figures for download: together with the files created by BiG-SCAPE, it can produce the phylogenetic trees in SVG format for use in reports. If you've already installed Curl and Docker in the steps above, there is no need to repeat them. If you have not, see the installation instructions above.

Installing Corason

Installation is very similar to installing BiG-SCAPE.

 mkdir -p ~/Corason/bin # not required if the directory already exists
 curl -q https://raw.githubusercontent.com/nselem/corason/master/run_corason > ~/Corason/bin/run_corason # download the launcher script into the bin folder
 chmod a+x ~/Corason/bin/run_corason # make the script executable
 ~/Corason/bin/run_corason

Running Corason

Corason takes slightly different inputs than BiG-SCAPE. The main difference is that it requires a .fasta file of the particular enzyme that you are trying to identify. First, in the BiG-SCAPE results, look at the cluster that you want to produce the figure for (you can click on the family link to automatically grab all the regions in the cluster).

Then go to the figure to choose your enzyme.

We can see now that we are looking at the PF00501 AMP-binding enzyme (underlined in red) and that it is found in the Ga0138902_1054, Ga0138904_1043, Ga138900_1001, Ga138901_1022, and Halo_scaffold_1 regions (circled in red). Next, go to the gbks folder, pull out the regions that contain the enzyme (the names circled in red), and put them into a new folder, let's say corasonGbks. The .fasta file is in the output/cache/domains folder of the output produced by BiG-SCAPE. Now that we have identified our files, we can run Corason by running

 ~/Corason/bin/run_corason PF00501.27.fasta corasonGbks corasonGbks/Halo_scaffold_1.region001.gbk -g

Note that we have given Corason the enzyme's fasta file, the folder of gbk files for the specific clusters, and one specific region from that same corasonGbks folder. The last file can be chosen at random; Corason would mainly use it for comparison had we not already identified the cluster with BiG-SCAPE.
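Gathering the region files into corasonGbks, as described above, can be scripted. The sketch below assumes that each region's file name starts with the region name followed by a dot, as in Halo_scaffold_1.region001.gbk; the function name collect_regions is hypothetical.

```shell
# collect_regions SRC DEST NAME...
# Copy the .gbk files whose names start with each NAME from SRC into DEST.
collect_regions() {
  src="$1"
  dest="$2"
  shift 2
  mkdir -p "$dest"
  for name in "$@"; do
    # The dot after the name keeps Halo_scaffold_1 from matching Halo_scaffold_10.
    for f in "$src/$name".*.gbk; do
      # Skip names with no matching file (the pattern stays unexpanded).
      if [ -e "$f" ]; then
        cp "$f" "$dest"/
      fi
    done
  done
}
```

For the example on this page, the call would be collect_regions gbks corasonGbks Ga0138902_1054 Ga0138904_1043 Ga138900_1001 Ga138901_1022 Halo_scaffold_1. Names with no matching file are skipped silently, so check that corasonGbks contains every region you expect.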

Results

The output will be stored in a folder named enzymename.fasta-output. The SVG file that we are looking for is titled Joined.svg and is in this folder.

Further Reading/Resources
