Canu - weaponsforge/fastractor GitHub Wiki

canu notes

Notes on how to install and use canu (https://canu.readthedocs.io/en/latest/quick-start.html) on CentOS 8.

Requirements

The following requirements and dependencies were used for this project. Other system and software configurations are open for testing.

  1. Virtual Box 6.14 (for Windows OS)
  2. Windows 10 Pro (host OS)
    • Version 1909 (OS Build 18363.1082)
    • Processor: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz 2.60 GHz
    • GPU: NVIDIA GeForce GTX 1060, 6 GB Dedicated GPU Memory
    • Memory: 16 GB
    • System type: 64-bit OS, x64-based processor
  3. CentOS Linux release 8.1.1911 (Core) - VM (guest OS) running on VMWare Player
    • Memory: 8 GB
    • Processors: 4
    • Hard Disk: 40 GB
    • kernel 4.18.0-147.5.1.el8_2.x86_64
  4. canu v.2.1 (binary release) dependencies
    • Gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4)
    • Perl v5.26.3
    • Java jdk 8
    • Gnuplot 5.2
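
Before going further, it can help to confirm which of the listed tools are already present on the guest. The following is a minimal sketch and not part of the original setup: `missing_tools` is a hypothetical helper name, and it only checks that each command is found on the PATH, not that its version matches the list above.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print each named command that
# cannot be found on the current PATH, one name per line.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example usage: report which of canu's dependencies are still missing.
# missing_tools gcc perl java gnuplot
```

An empty result means every named command was found; any names printed still need to be installed.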

Install the canu Dependencies

System Update

(Optional) Update the system to get the latest security fixes and packages if it is not already up to date. The CentOS guest used for this demo was already up to date, so these steps were skipped.

  1. Check if there are available updates. This only lists pending updates and does not install them.
    sudo yum check-update
  2. Update the OS kernel package.
    sudo yum update -y kernel
  3. Update all packages.
    sudo yum update
  4. Reboot.
    sudo shutdown -r now
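
Whether the update steps above are needed can also be decided from a script. This is a minimal sketch, assuming yum's `check-update` subcommand, which exits with status 0 when nothing needs updating, 100 when updates are available, and 1 on error. `describe_check_update` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# describe_check_update (hypothetical helper): translate the exit status
# of `yum check-update` into a short message.
describe_check_update() {
    case "$1" in
        0)   echo "system is up to date" ;;
        100) echo "updates are available" ;;
        *)   echo "yum reported an error (exit status $1)" ;;
    esac
}

# Example usage on the CentOS guest:
# sudo yum check-update >/dev/null; describe_check_update "$?"
```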

canu Dependencies Installation

The following dependencies must first be installed and configured before proceeding to use canu.

  1. Install gcc and perl.
    • No installation was needed here: the CentOS 8 guest used for this demo already had gcc 8.3.1 and perl 5.26.3 installed.
  2. Install Java JDK 8
    • Install the openjdk version

      sudo yum install -y java-1.8.0-openjdk
      sudo yum install -y java-1.8.0-openjdk-devel
      java -version
      
    • Set the java environment variables

      cat <<EOF | sudo tee /etc/profile.d/java8.sh
      export JAVA_HOME=/usr/lib/jvm/jre-openjdk
      export PATH=\$PATH:\$JAVA_HOME/bin
      export CLASSPATH=.:\$JAVA_HOME/jre/lib:\$JAVA_HOME/lib:\$JAVA_HOME/lib/tools.jar
      EOF
      
    • Activate the Java environment.
      source /etc/profile.d/java8.sh

    • Verify the installed Java version: java -version

      // The command above should print output similar to:
      openjdk version "1.8.0_265"
      OpenJDK Runtime Environment (build 1.8.0_265-b01)
      OpenJDK 64-bit Server VM (build 25.265-b01, mixed mode)
      
  3. Install gnuplot. Version 5.2 patch 4 was used for this demo.
    sudo yum install gnuplot
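
The Java check above can be automated so that a wrong default Java is caught early. The sketch below is an illustration under stated assumptions: `java_version_of` and `is_java8` are hypothetical helper names, and the parsing assumes the first line of the `java -version` output has the form shown in the sample output above.

```shell
#!/bin/sh
# java_version_of (hypothetical helper): read `java -version` output on
# standard input and print the quoted version string from its first line.
java_version_of() {
    sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# is_java8 (hypothetical helper): succeed if the version string starts
# with 1.8, the numbering scheme used by Java 8.
is_java8() {
    case "$1" in
        1.8.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example usage (java -version writes to standard error, hence 2>&1):
# is_java8 "$(java -version 2>&1 | java_version_of)" && echo "Java 8 is active"
```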

Install canu

  1. Download the canu binary release for Linux. canu 2.1 was used for this demo.
    wget https://github.com/marbl/canu/releases/download/v2.1/canu-2.1.Linux-amd64.tar.xz
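  2. Install from the binary distribution by extracting the archive. When run from /home/adminuser, the following command unpacks canu into /home/adminuser/canu-2.1, with the executables in /home/adminuser/canu-2.1/bin: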
    tar -xJf canu-2.1.Linux-amd64.tar.xz
  3. Verify that the canu executable canu-2.1/bin/canu is present. If the canu-2.1/bin/ directory or the canu-2.1/bin/canu file is missing, recheck the previous installation steps for errors before proceeding to step 4.
  4. Permanently add canu's bin directory to the PATH environment variable to make canu available globally from the command line.
    • Create a canu.sh file.
      sudo nano /etc/profile.d/canu.sh
    • Enter your canu installation path and save.

      INFO: Use the full path of your own canu installation directory in the PATH variable. The sample line below assumes canu is installed in /home/adminuser/canu-2.1/bin.
      export PATH=$PATH:/home/adminuser/canu-2.1/bin

    • Source the canu.sh file to apply the change to the current shell.
      source /etc/profile.d/canu.sh
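
The checks in steps 3 and 4 can be sketched as a small script. This is an illustration rather than part of the original procedure: `path_contains` is a hypothetical helper name, and `CANU_BIN` assumes the sample installation directory used above.

```shell
#!/bin/sh
# path_contains (hypothetical helper): succeed if directory $2 is an
# entry of the colon-separated list $1.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumed install location from the steps above; adjust to your own path.
CANU_BIN=/home/adminuser/canu-2.1/bin

# Example usage:
# [ -x "$CANU_BIN/canu" ] || echo "canu executable not found in $CANU_BIN"
# path_contains "$PATH" "$CANU_BIN" || export PATH="$PATH:$CANU_BIN"
```

Checking the PATH first keeps the same directory from being appended again each time the file is sourced.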

Download a Dataset

  1. Navigate back to /home/adminuser.
    cd ~/
  2. Download a sample dataset (233 MB) to test with canu.
    curl -L -o pacbio.fastq http://gembox.cbcb.umd.edu/mhap/raw/ecoli_p6_25x.filtered.fastq
  3. Verify that pacbio.fastq was downloaded to /home/adminuser/pacbio.fastq.
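
Beyond checking that the file exists, a quick look at its contents can catch a truncated download. The sketch below is a rough sanity check, not a full validator: `fastq_stats` is a hypothetical helper name, and it assumes an uncompressed FASTQ file with exactly four lines per record.

```shell
#!/bin/sh
# fastq_stats (hypothetical helper): print "<records> <bases>" for an
# uncompressed FASTQ file with four lines per record. The sequence is
# the second line of each record.
fastq_stats() {
    awk 'NR % 4 == 2 { bases += length($0) }
         END { printf "%d %d\n", NR / 4, bases }' "$1"
}

# Example usage:
# fastq_stats pacbio.fastq
```

Dividing the reported base count by the expected genome size gives a rough estimate of read coverage.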

canu Sample Usage

  1. Process the dataset. If canu is called from a directory other than the one where pacbio.fastq was downloaded, specify the full path to pacbio.fastq.

    canu \
     -p ecoli -d ecoli-pacbio \
     genomeSize=4.8m \
     -pacbio-raw pacbio.fastq
    

    INFO: If canu is not yet available globally, use its full installation path instead:

    /home/adminuser/canu-2.1/bin/canu \
     -p ecoli -d ecoli-pacbio \
     genomeSize=4.8m \
     -pacbio-raw pacbio.fastq
    
  2. Wait for the process to finish.
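
Once the run finishes, the result can be checked from the command line. This sketch assumes canu's usual output naming, where the assembled contigs are written to <prefix>.contigs.fasta inside the directory given with -d. `contigs_path` and `count_fasta_records` are hypothetical helper names.

```shell
#!/bin/sh
# contigs_path (hypothetical helper): print the expected contigs FASTA
# path for a canu prefix ($1, the -p value) and output directory ($2,
# the -d value).
contigs_path() {
    printf '%s/%s.contigs.fasta\n' "$2" "$1"
}

# count_fasta_records (hypothetical helper): count sequences in a FASTA
# file by counting its header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example usage after the run above finishes:
# out=$(contigs_path ecoli ecoli-pacbio)
# [ -s "$out" ] && echo "$(count_fasta_records "$out") contig(s) in $out"
```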

References

1(https://github.com/marbl/canu/releases) - canu binary releases
2(https://canu.readthedocs.io/en/latest/) - canu documentation

@weaponsforge
20201015