install juicer - BiocottonHub/BioSoftware GitHub Wiki
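Juicer is a very practical Hi-C analysis package: with a few simple parameters it can process very large Hi-C datasets. It covers the following functions:
- Processing raw reads directly into Hi-C contact data at a chosen resolution
- Calling TADs, chromatin loops and similar features with juicer-tools and related tools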
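- GNU core utilities (`cat` and the like); essentially any CentOS system will do
- BWA, for sequence alignment
- java 1.7 or java 1.8
- Juicer Tools jar; downstream TAD calling and chromatin loop detection need GPU computation
- CUDA, for parallel computation on the GPU
- The package ships with libraries compiled for CUDA 7; JCuda can also be downloaded separately
- For the best performance, running on a high-performance GPU cluster is recommended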
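Before installing, it may help to confirm that the tools in the list above are on the `PATH`. The following is a minimal sketch, not part of Juicer: the function name `check_tool` is made up for this page, and `nvcc` is used only as an assumed indicator that a CUDA toolkit is installed.

```shell
# Hypothetical helper (not part of Juicer): report whether a command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools named in the requirements list; nvcc is an assumed stand-in for CUDA.
for tool in cat bwa java nvcc; do
    check_tool "$tool" || true
done
```

A "missing" line for `nvcc` only matters if you plan to run the GPU-based downstream steps.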
Juicer currently supports the following cluster systems; when running an analysis, use the matching scripts directory from the Juicer package.
- OpenLava
- LSF
- SLURM
- GridEngine (Univa, etc. any flavor)
- scripts/ holds Juicer Tools
- references/ holds the reference genome and the BWA index files
- restriction_sites/ holds the restriction site files; if you have none, run with `-s none`
- sample/fastq/ holds the sequencing reads
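The layout above can be sketched as a small shell function. `make_juicer_layout` is a hypothetical helper, not something shipped with Juicer; `scripts/` is deliberately left out because it is set up as a symlink in a later step.

```shell
# Hypothetical helper: create the Juicer working-directory skeleton.
# $1 = working directory (the one later passed to juicer.sh -D)
# $2 = sample name
make_juicer_layout() {
    local root="$1" sample="$2"
    mkdir -p "$root/references" \
             "$root/restriction_sites" \
             "$root/$sample/fastq"
}
```

For the test data used on this page, the call would be `make_juicer_layout "$HOME/HiCSoftware/juicer" HIC003`.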
# Clone the repository
git clone git@github.com:aidenlab/juicer.git --depth=1
From here on, Juicer is run from the `~/HiCSoftware/juicer` directory.
cd ~
mkdir -p HiCSoftware/juicer
cd HiCSoftware/juicer
## Build the directory layout and download the data
mkdir references; cd references
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.amb
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.ann
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.bwt
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.pac
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/references/Homo_sapiens_assembly19.fasta.sa
## Download the restriction site file
mkdir ../restriction_sites; cd ../restriction_sites
wget https://s3.amazonaws.com/juicerawsmirror/opt/juicer/restriction_sites/hg19_MboI.txt
## Symlink the scripts for the matching cluster type
cd ../
ln -s ~/github/juicer/LSF/scripts/ scripts
cd scripts
wget https://hicfiles.tc4ga.com/public/juicer/juicer_tools.1.9.9_jcuda.0.8.jar
ln -s /absolute/path/to/juicer_tools.1.9.9_jcuda.0.8.jar juicer_tools.jar
cd ..
## Create the sample directory and the directory for the sequencing data
mkdir HIC003; cd HIC003
mkdir fastq; cd fastq
wget http://juicerawsmirror.s3.amazonaws.com/opt/juicer/work/HIC003/fastq/HIC003_S2_L001_R1_001.fastq.gz
wget http://juicerawsmirror.s3.amazonaws.com/opt/juicer/work/HIC003/fastq/HIC003_S2_L001_R2_001.fastq.gz
cd .. ## now in the sample directory
## Run the test data; be sure to use absolute paths
~/HiCSoftware/juicer/scripts/juicer.sh -D ~/HiCSoftware/juicer
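A missing BWA index file only shows up once alignment starts, so a quick pre-flight check can save a failed run. The sketch below assumes the five index extensions downloaded earlier (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`); `check_bwa_index` is a hypothetical name, not a Juicer command.

```shell
# Hypothetical pre-flight check: confirm the BWA index files sit next to the FASTA.
# $1 = path to the genome FASTA
check_bwa_index() {
    local fasta="$1" ext status=0
    if [ ! -f "$fasta" ]; then
        echo "missing FASTA: $fasta"
        return 1
    fi
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$fasta.$ext" ]; then
            echo "missing index file: $fasta.$ext"
            status=1
        fi
    done
    return "$status"
}
```

For the test data this would be `check_bwa_index ~/HiCSoftware/juicer/references/Homo_sapiens_assembly19.fasta`. If a file is reported missing, `bwa index` on the FASTA rebuilds the whole set.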
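- `-p` chromosome sizes file (absolute path)
- `-z` genome FASTA file (absolute path); the BWA index must be in the same folder as the FASTA file
- `-s` restriction enzyme: "HindIII", "MboI" (the default), or "none"
- `-d` sample directory; it must contain the `fastq` folder, and the `aligned` output is written here
- `-t` number of threads
- `-C` chunk size used when splitting the reads for parallel processing; default 90000000, must be a multiple of 4
- `-D` working directory; it must contain the `scripts/`, `references/` and `restriction_sites/` folders
- `-q` queue for the alignment jobs, which run for a relatively short time
- `-L` queue for the long-running jobs that process the hic file
- `-S` run in stages; one of:
  - "merge"
  - "dedup"
  - "final"
  - "postproc"
  - "early"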
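If no chromosome sizes file is at hand for `-p`, one can be derived from the genome FASTA. The sketch below does this with `awk`; `fasta_to_chrom_sizes` is a hypothetical helper, and the tab-separated name/length layout is an assumption about what `-p` expects, so check it against your Juicer version.

```shell
# Hypothetical helper: print "<sequence name><TAB><length>" for each FASTA record.
# The name is the first word after ">"; the length is the sum of the sequence lines.
fasta_to_chrom_sizes() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($1, 2)
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

Redirect the output to a file and pass that file's absolute path to `-p`, for example `fasta_to_chrom_sizes genome.fa > genome.chrom.sizes` (both file names are placeholders).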
./scripts/juicer.sh -d /public/home/zpliu/HiCSoftware/juicer/test -z /public/home/zpliu/HiCSoftware/juicer/references/hg19.fa -p /public/home/zpliu/HiCSoftware/juicer/chromsome.bed -s none -D /public/home/zpliu/HiCSoftware/juicer/ -q q2680v2 -L q2680v2
ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'seq/bwa/0.7.8'
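Edit the script so that the module name matches the BWA version installed on your cluster.
## Edit line 74 of the script: replace
load_bwa="module load seq/bwa/0.7.8"
## with
load_bwa="module load BWA/0.7.17"
Check the other `module load` lines in the script in the same way: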
load_java="module load dev/java/jdk1.7"
load_cuda="module load dev/cuda/7.0.28"
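Rather than editing by hand, the `module load` lines can be patched with `sed`. This is a sketch under assumptions: `set_module_line` is a hypothetical helper, it rewrites every assignment of the given variable in the file (not only the one on line 74), and the module names on your cluster will differ (`module avail` lists them). Keep a backup copy of the script before running it.

```shell
# Hypothetical helper: rewrite every `<var>="module load ..."` assignment in a script.
# $1 = script path, $2 = variable name (e.g. load_bwa), $3 = module name on your cluster
set_module_line() {
    local script="$1" var="$2" module="$3" tmp
    tmp="$(mktemp)" || return 1
    # Keep the original indentation (\1) and replace the rest of the line.
    sed "s|^\([[:space:]]*\)${var}=.*|\1${var}=\"module load ${module}\"|" "$script" > "$tmp" \
        && cat "$tmp" > "$script"
    rm -f "$tmp"
}
```

For the error above, the call would be `set_module_line scripts/juicer.sh load_bwa BWA/0.7.17`.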
The CPU version is suitable for small datasets.
Symlink the whole CPU directory as `scripts`, in the same way as for the cluster version.
./scripts/juicer.sh -d /public/home/zpliu/HiCSoftware/juicer/test -z /public/home/zpliu/HiCSoftware/juicer/references/hg19.fa -p /public/home/zpliu/HiCSoftware/juicer/chromsome.bed -s none -D /public/home/zpliu/HiCSoftware/juicer/ -t 5
The `.hic` files are generated in the `sample/aligned` directory.
Intermediate files can be deleted with the `cleanup.sh` script.
.
├── abnormal.sam
├── collisions_dups.txt
├── collisions_nodups.txt
├── collisions.txt
├── dups.txt
├── header
├── inter_30_contact_domains
├── inter_30.hic
├── inter_30_hists.m
├── inter_30.txt
├── inter.hic
├── inter_hists.m
├── inter.txt
├── merged_nodups.txt
├── merged_sort.txt
├── opt_dups.txt
└── unmapped.sam
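A quick way to confirm that a run finished is to look for the key files from the listing above. This sketch checks three of them; which files count as essential is an assumption made here (`merged_nodups.txt` plus the two `.hic` files), and `check_juicer_outputs` is a hypothetical name.

```shell
# Hypothetical post-run check: confirm the main result files exist and are non-empty.
# $1 = path to the sample's aligned/ directory
check_juicer_outputs() {
    local dir="$1" f status=0
    for f in merged_nodups.txt inter.hic inter_30.hic; do
        if [ -s "$dir/$f" ]; then
            echo "ok:      $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}
```

The function returns a non-zero status if any of the files is absent or empty, so it can gate a later cleanup step.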