ex101 - nibb-gitc/gitc2021mar-rnaseq GitHub Wiki
First, move to ~/gitc/data/HN.
1. Treat the two adapter-trimmed read files, etec_1.cut.fq and etec_2.cut.fq, as single-end read data, map them, and compare the result with the one obtained by mapping them as paired-end reads.
- Treating etec_1.cut.fq and etec_2.cut.fq as single-end read data, map them with bowtie2 using etec as the reference, and write the result to the file etec_bowtie2_single.sam. Make use of the fact that multiple read files can be specified as a comma-separated list.
- Compare the number of lines in the output file with that of etec_bowtie2.sam.
- Print the first 20 lines of each file with head, compare them, and discuss the differences with respect to the following points.
a) Header lines
b) Order of the output reads
c) Flag values
d) Positions where the reads were mapped
e) Positions where the mate reads were mapped
2. Map etec_1.cut.fq and etec_2.cut.fq to etec as paired-end reads again, this time specifying the options -I 100 -X 200.
- What do these options mean?
- Run the command with etec_bowtie2_X200.sam as the output file.
- Did the number of lines in the output file change compared with etec_bowtie2.sam?
- Compare the contents of the files with the command below and examine what changed. Here diff compares two files line by line and prints the lines that differ: lines beginning with '<' appear only in the first file, and lines beginning with '>' appear only in the second file. The -S option of less displays long lines without wrapping them.
$ diff etec_bowtie2.sam etec_bowtie2_X200.sam | less -S
3. Using samtools view, extract the reads mapped to the following genes from etec_bowtie2_sorted.bam and count them (use the wc command to count the extracted lines). Also, does the count change if you require a mapping quality of 20 or higher?
| | Chromosome | Start - End | Gene |
| 1) | ETEC_chr | 336 - 2798 | thrA |
| 2) | ETEC_chr | 4518271 - 4522299 | rpoB |
4. From etec_bowtie2_sorted.bam, use samtools view -f to extract the reads that are paired and for which neither the read nor its mate is mapped, and count them (work out which FLAG value to pass from the table on page 99 of the preparatory text).
reference ATGA-TGGTGTCGA
read ATGGGTGGAG--GA
6. For the GTF file sox6.gtf in the same directory, answer the following questions using the UNIX commands grep, wc, sort, and awk.
- Extract only the lines for the transcript NM_001145811.1 and save them as sox6_tr1.gtf. Use this file for the next three items (2 to 4).
- How many exons does this transcript contain?
- Sort the lines for this transcript by start position (the leftmost position on the reference sequence) so that they appear in the direction of transcription.
- Compute the total length of the CDS of this transcript.
- How many different transcripts does sox6.gtf contain? (Hint: extract the transcript_id column and count the unique lines. Unique lines can be extracted with sort -u.)
7. Practice transferring results obtained on a remote machine to your local machine and analyzing them locally. Carry out the following steps in order.
- The samtools stats command extracts a variety of statistics from a BAM file. Run it on etec_bowtie2_sorted.bam and look at the result with less. The output contains many kinds of information, and the string at the start of each line identifies the kind so that each can be extracted separately. Here, extract the lines beginning with COV, which give the distribution of coverage (how many reads are mapped over each position of the reference sequence). How to extract them with Unix commands is described in the output itself. Save the result to a file named etec_bowtie2_cov.txt.
- Next, transfer this file to your local machine with scp. Open a new terminal and, if necessary, create a directory and move into it. With the scp command, transfer the file on bias5 (specified by its path from your home directory) to the current directory (.).
- Start R, read the transferred file, and plot column 2 (coverage) against column 3 (number of positions). Use type line for the plot and a logarithmic Y axis.
8. Use the qsub command to run a job on the cluster machines of the BIAS system. This is how the system is intended to be used. Carry out the following steps in order.
- Using an editor, create the following script (exec_bowtie2.sh).
#!/bin/sh
#PBS -l ncpus=4
cd ${PBS_O_WORKDIR}
bowtie2 -p ${NCPUS} -x etec -1 etec_1.cut.fq -2 etec_2.cut.fq -S etec_bowtie2_2.sam
- Run it with qsub.
$ qsub exec_bowtie2.sh
- Run qstat -u USERNAME (USERNAME is your own user name). If nothing is printed, the job has finished.
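The qstat check can be repeated automatically. The following is a minimal sketch, assuming that an empty listing from qstat -u means the job has finished, as stated above. The function name wait_for_jobs and the 10-second default interval are illustrative choices, not part of the BIAS system.

```shell
#!/bin/sh
# Poll `qstat -u USER` until it prints nothing.
# Assumption: an empty listing means no job is queued or running.
# Caveat: if qstat itself fails and prints nothing on stdout,
# this sketch also reports that no jobs are listed.
wait_for_jobs() {
    user=$1
    interval=${2:-10}          # seconds between checks (arbitrary default)
    while [ -n "$(qstat -u "$user")" ]; do
        sleep "$interval"
    done
    echo "no jobs listed for $user"
}

# Example (hypothetical user name):
#   wait_for_jobs USERNAME 30
```

Polling at a modest interval keeps the load on the scheduler low; the interval can be lengthened for long jobs.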
- Command to run bowtie2:
$ bowtie2 -x etec -U etec_1.cut.fq,etec_2.cut.fq -S etec_bowtie2_single.sam
- Counting lines:
$ wc etec_bowtie2.sam etec_bowtie2_single.sam
Both files have 100009 lines. Of these, 9 are header lines and the remaining 100000 are mapping results; every read always has exactly one line of mapping output.
- Displaying the first 20 lines of each file:
$ head -20 etec_bowtie2.sam etec_bowtie2_single.sam
From the first 20 lines shown by head, the following differences are observed between etec_bowtie2.sam (hereafter paired) and etec_bowtie2_single.sam (hereafter single).
a) The header lines differ only in CL (command line) on the @PG line; the rest is identical.
b) In column 1 (read name), paired outputs two consecutive lines for each read name, whereas single outputs only one line at a time. In paired, the two files are read simultaneously and corresponding reads are treated as a pair, whereas in single each file is processed in turn as independent data.
c) The flag in column 2 takes the values 0, 4, and 16 in single, whereas in paired it takes values such as 89, 73, and 133. In the single-end case the only bits that can be set are 4 (the segment was not mapped) or 16 (mapped to the reverse strand), whereas in the paired-end case more information is stored, so the values differ.
d) Column 4 (mapped position) differs in some cases and agrees in others. Note that whenever column 5 (mapping quality, MAPQ) is 42, column 4 is also the same. A high MAPQ means the read mapped uniquely, so the position is always the same. A low MAPQ means the read actually maps to multiple locations and one of them was simply chosen at random, so matching against the mate is thought to have placed it at a different position. Note also that when a read maps to the complementary strand, column 10 shows the sequence of the complementary strand and column 11 shows the quality values in reverse order.
e) The information in columns 7-9 about the mate fragment is not output in single (all are * 0 0).
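The flag values mentioned in c) are easier to compare once they are split into individual bits. The following is a small sketch of such a decoder. The bit values are those of the SAM format; the function name decode_flag is an illustrative choice, and the bit names follow the short names that samtools uses.

```shell
#!/bin/sh
# Decode a SAM FLAG value into the names of the bits that are set.
decode_flag() {
    flag=$1
    out=""
    for pair in 1:PAIRED 2:PROPER_PAIR 4:UNMAP 8:MUNMAP 16:REVERSE \
                32:MREVERSE 64:READ1 128:READ2 256:SECONDARY \
                512:QCFAIL 1024:DUP 2048:SUPPLEMENTARY; do
        bit=${pair%%:*}        # number before the colon
        name=${pair#*:}        # name after the colon
        if [ $(( flag & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    printf '%s\n' "$out"
}

# Flags seen in the single and paired outputs
for f in 0 4 16 73 89 133; do
    printf '%s\t%s\n' "$f" "$(decode_flag "$f")"
done
```

For example, 89 = 64 + 16 + 8 + 1 decodes to PAIRED,MUNMAP,REVERSE,READ1, whereas the single-end value 16 decodes to REVERSE alone.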
- The options -I 100 -X 200 specify that the fragment length of a mapped read pair must be between 100 and 200 (the defaults are 0 and 500).
- Command to run bowtie2:
$ bowtie2 -x etec -1 etec_1.cut.fq -2 etec_2.cut.fq -S etec_bowtie2_X200.sam -I 100 -X 200
$ wc etec_bowtie2.sam etec_bowtie2_X200.sam
Both files have 100009 lines. That is, changing the conditions on read pairs does not change the number of lines, because by default all lines are output. Whether a pair satisfies the conditions is expressed by the flag value.
- Displaying the differences between the two SAM files:
$ diff etec_bowtie2.sam etec_bowtie2_X200.sam | less -S
For the lines that diff reports as differing between etec_bowtie2.sam (hereafter default) and etec_bowtie2_X200.sam (hereafter X200), the following features are generally observed (with a few exceptions).
- On the differing lines, the value in column 2 (flag) is smaller by 2 in X200 than in default. Whether a read pair is mapped with the correct spacing and orientation is indicated by the second bit of the flag (binary 10, that is, decimal 2). Because X200 imposes a stricter condition on the spacing, some pairs judged correctly mapped in default are judged incorrect in X200. In that case the second bit of the flag changes from 1 to 0, so the value decreases by 2.
- On the differing lines, the absolute value of column 9 (fragment length) is greater than 200 or less than 100. Such pairs no longer satisfy the -I 100 -X 200 condition, so the flag value changes.
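The relationship between column 9 and the second flag bit can be checked directly on a SAM file. The following is a rough sketch, assuming tab-separated SAM text on standard input and treating the absolute value of column 9 as the fragment length. The function name outside_range and the sample records are made up for illustration.

```shell
#!/bin/sh
# List paired reads whose |TLEN| (column 9) lies outside [min, max],
# and report whether the proper-pair bit (value 2) is set in the flag.
outside_range() {
    awk -F '\t' -v min="$1" -v max="$2" '
        /^@/ { next }                          # skip header lines
        {
            flag   = $2
            tlen   = ($9 < 0) ? -$9 : $9       # absolute fragment length
            paired = flag % 2                  # bit 1
            proper = int(flag / 2) % 2         # bit 2, without gawk-only and()
            if (paired && tlen != 0 && (tlen < min || tlen > max))
                print $1, flag, tlen, (proper ? "proper" : "not_proper")
        }'
}

# Two made-up records: r1 has a 200 bp fragment, r2 a 350 bp fragment.
printf 'r1\t99\tETEC_chr\t100\t42\t50M\t=\t250\t200\t*\t*\nr2\t97\tETEC_chr\t100\t42\t50M\t=\t400\t350\t*\t*\n' |
    outside_range 100 200
```

On real data this would be run as outside_range 100 200 < etec_bowtie2_X200.sam; reads listed as not_proper are the ones whose flag is expected to have dropped by 2.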
- $ samtools view etec_bowtie2_sorted.bam ETEC_chr:336-2798 | wc
- $ samtools view etec_bowtie2_sorted.bam ETEC_chr:4518271-4522299 | wc
The counts are 6 and 195, respectively. To require a mapping quality of 20 or higher, add -q 20 to the command. In this case the results do not change.
$ samtools view -f 13 etec_bowtie2_sorted.bam | wc
The read is paired (1), the read itself is unmapped (4), and its mate is unmapped (8), so the flag bits sum to 13. The command searches for reads whose flag value has all of these bits set. The count is 380. Many flag values satisfy this condition, but the only consistent flags that actually appear in the output are the following two.
01001101 = 77
PAIRED,UNMAP,MUNMAP,READ1
10001101 = 141
PAIRED,UNMAP,MUNMAP,READ2
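The test that -f applies is that every bit of the given value is set in the flag. A minimal sketch of that test in shell arithmetic follows; has_all_bits is an illustrative name, not a samtools feature.

```shell
#!/bin/sh
# Succeed when every bit of the mask ($2) is set in the flag ($1).
has_all_bits() {
    [ $(( $1 & $2 )) -eq "$2" ]
}

for flag in 77 141 73 133; do
    if has_all_bits "$flag" 13; then
        echo "$flag kept"
    else
        echo "$flag dropped"
    fi
done
```

Here 77 and 141 are kept, while 73 (mate unmapped only) and 133 (read unmapped only) are dropped because they lack the 4 bit and the 8 bit, respectively.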
4M1I5M2D2M
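This CIGAR string can be derived mechanically from the reference/read alignment shown in the exercises. The following is a small sketch, assuming the two rows have equal length and use "-" for gaps. A column with a gap in the reference counts as I, a column with a gap in the read counts as D, and every other column counts as M whether or not the bases agree. The function name alignment_to_cigar is illustrative.

```shell
#!/bin/sh
# Build a CIGAR string from a gapped pairwise alignment.
# $1: reference row, $2: read row (same length, gaps written as "-").
alignment_to_cigar() {
    awk -v ref="$1" -v read="$2" 'BEGIN {
        n = length(ref)
        if (n != length(read)) { print "rows differ in length"; exit 1 }
        cigar = ""; prev = ""; run = 0
        for (i = 1; i <= n; i++) {
            r = substr(ref, i, 1); q = substr(read, i, 1)
            if (r == "-")      op = "I"    # base present only in the read
            else if (q == "-") op = "D"    # base present only in the reference
            else               op = "M"    # aligned column (match or mismatch)
            if (op == prev) run++
            else { if (run > 0) cigar = cigar run prev; prev = op; run = 1 }
        }
        if (run > 0) cigar = cigar run prev
        print cigar
    }'
}

alignment_to_cigar ATGA-TGGTGTCGA ATGGGTGGAG--GA    # prints 4M1I5M2D2M
```

The mismatch in the alignment (T in the reference against A in the read) stays inside the 5M run, because M does not distinguish matches from mismatches.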
- $ grep 'NM_001145811\.1' sox6.gtf > sox6_tr1.gtf
In a regular expression "." matches any single character, so a \ (backslash) is placed before the "." to cancel this and match only a literal ".". In this case, however, the result is the same without it.
- 15 exons
$ grep exon sox6_tr1.gtf | wc
- $ sort -k 4,4nr sox6_tr1.gtf
- 2,403 bp
$ awk '$3=="CDS"{sum+=($5-$4+1)} END{print sum}' sox6_tr1.gtf
- 4 transcripts
$ awk '{print $16}' sox6.gtf | sort -u | wc
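The commands above rely on "exon" appearing only in the feature column and on transcript_id always being the 16th whitespace-separated field, which holds for this file. As a sketch of a variant that does not depend on those details, the function below tests the feature type in column 3 and finds transcript_id by name inside column 9. The name gtf_summary and the sample records are made up for illustration.

```shell
#!/bin/sh
# Summarise a GTF read from stdin: number of exon lines, total CDS length,
# and number of distinct transcript_id values.
gtf_summary() {
    awk -F '\t' '
        /^#/ { next }                              # skip comment lines
        $3 == "exon" { exons++ }
        $3 == "CDS"  { cds += $5 - $4 + 1 }        # coordinates are 1-based, inclusive
        match($9, /transcript_id "[^"]+"/) {
            # drop the 15-character prefix and the closing quote
            id = substr($9, RSTART + 15, RLENGTH - 16)
            if (!(id in seen)) { seen[id] = 1; transcripts++ }
        }
        END { printf "exons=%d cds_bp=%d transcripts=%d\n", exons, cds, transcripts }'
}

# Three made-up records for two hypothetical transcripts T1 and T2
printf 'chr7\tsrc\texon\t100\t199\t.\t-\t.\tgene_id "g"; transcript_id "T1";\nchr7\tsrc\tCDS\t150\t199\t.\t-\t0\tgene_id "g"; transcript_id "T1";\nchr7\tsrc\texon\t300\t349\t.\t-\t.\tgene_id "g"; transcript_id "T2";\n' |
    gtf_summary
```

On the course data this would be run as gtf_summary < sox6_tr1.gtf for the single transcript, or gtf_summary < sox6.gtf for the whole file.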
- Run on bias5:
[bias5]$ samtools stats etec_bowtie2_sorted.bam | grep ^COV | cut -f2- > etec_bowtie2_cov.txt
- Run on the local machine:
[local]$ scp [email protected]:data/IU/etec_bowtie2_cov.txt . (replace USERNAME with your own user name)
- Start R on the local machine and run the following.
> setwd("DIRECTORY")  (replace DIRECTORY with the name of the directory to which the file was transferred)
> cov <- read.table("etec_bowtie2_cov.txt", sep="\t", header=FALSE)
> plot(cov[,2], cov[,3], type="l", log="y", xlab="coverage", ylab="number of positions")
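Before plotting, the transferred file can be checked quickly from the shell. The following is a sketch, assuming the three-column layout produced by the cut -f2- command above (bin label, coverage, number of positions). It reports the mean coverage over the positions listed in the histogram only, and any open-ended top bin is counted at its listed coverage value. The function name cov_summary and the sample rows are illustrative.

```shell
#!/bin/sh
# Summarise a coverage histogram read from stdin.
# Column 2: coverage, column 3: number of positions with that coverage.
cov_summary() {
    awk -F '\t' '
        { positions += $3; bases += $2 * $3 }
        END {
            if (positions > 0)
                printf "positions=%d mean_coverage=%.2f\n", positions, bases / positions
        }'
}

# Two made-up histogram rows: 10 positions at 1x and 30 positions at 2x
printf '[1-1]\t1\t10\n[2-2]\t2\t30\n' | cov_summary
```

On the real file this would be run as cov_summary < etec_bowtie2_cov.txt.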