install juicer - BiocottonHub/BioSoftware GitHub Wiki

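Introduction to juicer

juicer is a very practical Hi-C analysis package: with a few simple parameters it can process very large Hi-C datasets. It covers the following functions:

  • Processing raw sequencing data directly into Hi-C contact data at a specified resolution
  • Identifying TADs, chromatin loops, and similar features with juicer-tools and related tools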

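1. Installation

1.1 Dependencies

  • GNU core utilities (cat and so on); a standard CentOS system already provides them
  • BWA, for read alignment
  • Java 1.7 or Java 1.8
  • juicer Tools jar; the downstream TAD and chromatin-loop calling requires GPU computation
  • CUDA, for parallel computation on the GPU
  • The package ships with the compiled CUDA 7 libraries; JCuda can also be downloaded separately
  • For the best performance, a high-performance GPU cluster is recommended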
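
Before going further, it can help to confirm that the command-line dependencies are actually on the PATH. The following is a minimal sketch; the function name check_deps is hypothetical and not part of juicer, and the tool names passed to it are simply the ones from the list above.

```bash
#!/usr/bin/env bash
# check_deps is a hypothetical helper, not part of juicer.
# It prints every command that cannot be found on the PATH and
# returns non-zero if at least one is missing.
check_deps() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools named in the dependency list; add nvcc here if CUDA is expected.
check_deps bwa java || echo "install the missing tools before running juicer"
```

On a cluster that uses environment modules, the tools may only appear on the PATH after the corresponding module load, so a "missing" report there is not necessarily an installation problem.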

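1.2 Supported clusters

juicer currently supports the following cluster schedulers; use the matching scripts directory from the juicer package when running the analysis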

  • OpenLava
  • LSF
  • SLURM
  • GridEngine (Univa, etc. any flavor)

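1.3 Directory structure

  • scripts/ holds the juicer scripts and the juicer-Tools jar
  • references/ holds the reference genome and the BWA index files
  • restriction_sites/ holds the restriction-site files; if you have none, run with -s none
  • sample/fastq/ holds the sequencing data (FASTQ files)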
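
The layout can be checked with a few lines of shell before any job is submitted. This is only a sketch: check_layout is a hypothetical helper, and the directory names are the ones the -D option expects (scripts/, references/, restriction_sites/). When running with -s none, a missing restriction_sites/ is presumably harmless.

```bash
#!/usr/bin/env bash
# check_layout is a hypothetical helper, not part of juicer.
# It reports which of the expected sub-directories are absent from a
# juicer working directory. Symlinked directories count as present.
check_layout() {
    local top=$1 status=0 sub
    for sub in scripts references restriction_sites; do
        if [ ! -d "$top/$sub" ]; then
            echo "missing directory: $top/$sub" >&2
            status=1
        fi
    done
    return "$status"
}

# Example: the working directory used in section 2.1.
check_layout "$HOME/HiCSoftware/juicer" || echo "layout incomplete"
```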

2. Testing

# Clone the repository (the symlink step in 2.1 expects it under ~/github)
git clone git@github.com:aidenlab/juicer.git --depth=1

2.1 Create the working directory and build the directory structure

From now on, juicer is run from the ~/HiCSoftware/juicer directory

cd ~
mkdir -p HiCSoftware/juicer
cd HiCSoftware/juicer
## Build the directory structure and download the data
mkdir references; cd references
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.amb
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.ann
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.bwt
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.pac
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.sa
## Download the restriction-site data
mkdir ../restriction_sites; cd ../restriction_sites
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/restriction_sites/hg19_MboI.txt

## Symlink the scripts for the matching cluster version
cd ../
ln -s ~/github/juicer/LSF/scripts/ scripts
cd scripts
wget https://hicfiles.tc4ga.com/public/juicer/juicer_tools.1.9.9_jcuda.0.8.jar
## Link the downloaded jar by its absolute path
ln -s "$PWD/juicer_tools.1.9.9_jcuda.0.8.jar" juicer_tools.jar
cd ..

## Create the sample directory and the sequencing-data directory
mkdir HIC003; cd HIC003
mkdir fastq; cd fastq
wget http://juicerawsmirror.s3.amazonaws.com/opt/juicer/work/HIC003/fastq/HIC003_S2_L001_R1_001.fastq.gz
wget http://juicerawsmirror.s3.amazonaws.com/opt/juicer/work/HIC003/fastq/HIC003_S2_L001_R2_001.fastq.gz
cd .. ## now in the sample directory
## Run the test data; be sure to use absolute paths
~/HiCSoftware/juicer/scripts/juicer.sh -D ~/HiCSoftware/juicer
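
Before launching the run above, it may be worth confirming that all five BWA index files were downloaded completely, since an interrupted wget leaves a missing or empty file behind. The sketch below assumes the file names used in this section; check_bwa_index is a hypothetical helper, not a juicer or BWA command.

```bash
#!/usr/bin/env bash
# check_bwa_index is a hypothetical helper, not part of juicer or BWA.
# Given a FASTA path, it verifies that the five BWA index files
# (.amb .ann .bwt .pac .sa) sit next to it and are non-empty.
check_bwa_index() {
    local fasta=$1 status=0 ext
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$fasta.$ext" ]; then
            echo "missing or empty index file: $fasta.$ext" >&2
            status=1
        fi
    done
    return "$status"
}

check_bwa_index "$HOME/HiCSoftware/juicer/references/Homo_sapiens_assembly19.fasta" \
    || echo "index incomplete: re-download the files or rebuild with 'bwa index'"
```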

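3. Cluster version parameters

  • -p chromosome sizes file, absolute path
  • -z genome FASTA file, absolute path; the BWA index must be in the same folder as the FASTA file
  • -s restriction enzyme: "HindIII", "MboI" (the default), or "none"
  • -d sample directory; the fastq folder must be inside it, and an aligned folder is created there at the end
  • -t number of threads
  • -C chunk size used when splitting the sequencing data for parallel processing; default 90000000, must be a multiple of 4
  • -D top-level working directory; it must contain the scripts/, references/ and restriction_sites/ folders
  • -q queue for the alignment jobs, which occupy the queue for a relatively short time
  • -L queue for the long-running jobs that process the hic file
  • -S run only a particular stage: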
    • "merge"
    • "dedup"
    • "final"
    • "postproc"
    • "early"
./scripts/juicer.sh -d /public/home/zpliu/HiCSoftware/juicer/test  -z /public/home/zpliu/HiCSoftware/juicer/references/hg19.fa -p /public/home/zpliu/HiCSoftware/juicer/chromsome.bed  -s none   -D /public/home/zpliu/HiCSoftware/juicer/ -q  q2680v2  -L q2680v2
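
The multiple-of-4 constraint on -C is easy to get wrong when adjusting the chunk size by hand (presumably it exists because each FASTQ record spans four lines). A minimal sketch of a check follows; valid_chunk_size is a hypothetical helper and not a juicer option or function.

```bash
#!/usr/bin/env bash
# valid_chunk_size is a hypothetical helper, not part of juicer.
# It succeeds only for a positive integer that is a multiple of 4,
# which is what the -C option requires.
valid_chunk_size() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    # 10# forces base 10 so that a leading zero is not read as octal
    [ "$1" -gt 0 ] && [ $(( 10#$1 % 4 )) -eq 0 ]
}

for size in 90000000 45000000 1000001; do
    if valid_chunk_size "$size"; then
        echo "$size: ok"
    else
        echo "$size: not a multiple of 4"
    fi
done
```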

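4. Errors

ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'seq/bwa/0.7.8'

To fix this, edit the script so that the module name matches the BWA version installed on the cluster

## Edit line 74 of the script: change
load_bwa="module load seq/bwa/0.7.8"
## to
load_bwa="module load BWA/0.7.17"

Check the other module load lines in the script in the same way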

load_java="module load dev/java/jdk1.7"
load_cuda="module load dev/cuda/7.0.28"
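
To find every such line in one pass rather than hunting for them by hand, the module names can be extracted from the script. The sketch below assumes the lines keep the load_xxx="module load NAME" form shown above; list_module_loads is a hypothetical helper.

```bash
#!/usr/bin/env bash
# list_module_loads is a hypothetical helper, not part of juicer.
# It prints the module names that a script loads through lines of the
# form  load_xxx="module load NAME"  (leading whitespace is allowed).
list_module_loads() {
    sed -n 's/^[[:space:]]*load_[A-Za-z0-9_]*="module load \([^"]*\)".*/\1/p' "$1"
}

# Example (path assumes the symlinked scripts directory from section 2.1):
#   list_module_loads ~/HiCSoftware/juicer/scripts/juicer.sh
# Each printed name can then be compared with the output of `module avail`.
```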

5. CPU version

Suitable for small datasets

Symlink the whole CPU directory as scripts, in the same way as for the cluster version

./scripts/juicer.sh -d /public/home/zpliu/HiCSoftware/juicer/test  -z /public/home/zpliu/HiCSoftware/juicer/references/hg19.fa -p /public/home/zpliu/HiCSoftware/juicer/chromsome.bed  -s none   -D /public/home/zpliu/HiCSoftware/juicer/ -t 5

6. Output

The .hic files are generated in the sample/aligned directory

Intermediate files can be deleted with the cleanup.sh script

.
    ├── abnormal.sam
    ├── collisions_dups.txt
    ├── collisions_nodups.txt
    ├── collisions.txt
    ├── dups.txt
    ├── header
    ├── inter_30_contact_domains
    ├── inter_30.hic
    ├── inter_30_hists.m
    ├── inter_30.txt
    ├── inter.hic
    ├── inter_hists.m
    ├── inter.txt
    ├── merged_nodups.txt
    ├── merged_sort.txt
    ├── opt_dups.txt
    └── unmapped.sam
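
A quick way to confirm that a run finished is to look for the .hic files in the aligned directory and check that they are not empty. The following is a sketch only; list_hic is a hypothetical helper, and the sample path is the HIC003 test sample from section 2.1.

```bash
#!/usr/bin/env bash
# list_hic is a hypothetical helper, not part of juicer.
# It lists the .hic files under a sample's aligned/ directory, one per
# line, as "name<TAB>size in bytes". It prints nothing if none exist.
list_hic() {
    local aligned=$1/aligned f
    for f in "$aligned"/*.hic; do
        [ -e "$f" ] || continue          # the glob matched nothing
        printf '%s\t%s\n' "$(basename "$f")" "$(wc -c < "$f" | tr -d ' ')"
    done
}

list_hic "$HOME/HiCSoftware/juicer/HIC003"
```

A size of 0 bytes, or no output at all, suggests that the final stage did not complete.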