pl1 RNA seq basic - shujishigenobu/omics-collab-cm-nibb GitHub Wiki
In this exercise we learn an RNA-seq analysis pipeline using Arabidopsis as the material. Arabidopsis thaliana is a plant of the family Brassicaceae that is widely studied as a model plant. In this experiment, RNA-seq was performed to compare gene expression between plants grown under two different light conditions: the 2D sample (2-day dark; yellow, etiolated seedlings grown in the dark for 2 days) and the 2D2L sample (2-day dark + 2-day light; greened seedlings grown in the dark for 2 days and then in the light for 2 days). For simplicity, the 2D2L sample is called the "light" condition and the 2D sample the "dark" condition below. The experiment was repeated three times for each condition (biological replicates), giving 2 conditions x 3 replicates = 6 samples. RNA was extracted from each sample by standard methods, and RNA-seq libraries for the Illumina short-read next-generation sequencer were prepared. The finished libraries were sequenced on an Illumina MiSeq in paired-end mode (both ends of each insert are read), 76 bp per read. The sequence data provided for this exercise have already been preprocessed: cutadapt was used to remove leftover adapter sequences and low-quality regions from the raw reads.
The goal of this exercise is to identify the genes whose expression differs between the light and dark conditions, that is, the differentially expressed genes (DEGs), by working through the HISAT2 -> StringTie -> edgeR pipeline.
This pipeline is based on exercise case2 of the Genome Informatics Training Course (GITC) at the National Institute for Basic Biology.
(References)
- What is Arabidopsis thaliana? @ National Institute for Basic Biology
- The world of model organisms: Arabidopsis (video, National Institute for Basic Biology)
The steps below assume that the work in Set up AWS has already been completed. If you have not prepared the execution environment yet, build it by following Set up AWS.
If you built the environment in advance and stopped the instance, start the instance.
Connect to the virtual machine remotely with the SSH command, following SSH Remote Connection.
After connecting over SSH, use the following command to switch to the virtual environment in which the software needed for this exercise is installed.
conda activate tutorial-rnaseq
- Map the reads to the genome. The software used is hisat2.
- Count the number of reads per gene from the mapping result. The software used is stringtie.
- Based on the count data from the previous step, run a statistical two-group comparison between the light and dark conditions. The software used is edgeR.
Download the data file from the link below.
If you cannot connect to the URL above, get the file from Google Drive.
- pl1-data.tar.gz (261M)
Extracting the archive creates a data folder, which contains the data files used in this hands-on exercise. Move the whole data folder to a convenient location. The explanation below assumes that the data folder has been moved directly under the home directory (~/).
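The unpacking step can be sketched as a small shell function. This is only an illustration: the function name `unpack_data` and its default arguments are assumptions, and it presumes the archive sits in the current directory and that the destination does not already contain a data folder.

```shell
# Unpack the archive and move the resulting data/ folder to a target directory.
# The function name and both default arguments are illustrative, not part of the tutorial.
unpack_data() {
    archive=${1:-pl1-data.tar.gz}
    dest=${2:-$HOME}
    tar -xzf "$archive" || return 1   # creates ./data in the current directory
    mv data "$dest"/                  # the folder ends up as $dest/data
}
```

Calling `unpack_data` with no arguments would leave the files under `~/data`, which is the layout assumed in the rest of this page.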
Input reads
The files are in ~/data/reads. Because sequencing was paired-end (each fragment is read from both ends), there are two files for each of the six samples (called Read1 and Read2, or forward and reverse).
- 2D_rep1: condition = dark, replicate #1: 2D_rep1_R1.fastq.gz, 2D_rep1_R2.fastq.gz [R1: Read1 (Fwd); R2: Read2 (Rev); same below]
- 2D_rep2: condition = dark, replicate #2: 2D_rep2_R1.fastq.gz, 2D_rep2_R2.fastq.gz
- 2D_rep3: condition = dark, replicate #3: 2D_rep3_R1.fastq.gz, 2D_rep3_R2.fastq.gz
- 2D2L_rep1: condition = light, replicate #1: 2D2L_rep1_R1.fastq.gz, 2D2L_rep1_R2.fastq.gz
- 2D2L_rep2: condition = light, replicate #2: 2D2L_rep2_R1.fastq.gz, 2D2L_rep2_R2.fastq.gz
- 2D2L_rep3: condition = light, replicate #3: 2D2L_rep3_R1.fastq.gz, 2D2L_rep3_R2.fastq.gz
To be precise about the conditions: 2D = 2 days dark; 2D2L = 2 days dark followed by 2 days light.
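Before starting, it can help to confirm that both read files are present for every sample. The sketch below assumes the `.fastq.gz` extension used by the commands later on this page; the `SAMPLES` variable and the `check_pairs` function are hypothetical helper names.

```shell
# The six sample names used in this tutorial (2 conditions x 3 replicates).
SAMPLES="2D_rep1 2D_rep2 2D_rep3 2D2L_rep1 2D2L_rep2 2D2L_rep3"

# Report every R1/R2 file that is missing from the given directory.
# Returns non-zero if at least one file is missing.
check_pairs() {
    dir=$1
    missing=0
    for s in $SAMPLES; do
        for r in R1 R2; do
            f="$dir/${s}_${r}.fastq.gz"
            if [ ! -e "$f" ]; then
                echo "missing: $f"
                missing=1
            fi
        done
    done
    return $missing
}
```

For example, `check_pairs ~/data/reads` prints nothing and returns 0 when all twelve files are in place.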
Reference
- genome sequence: genome.fa -- Arabidopsis genome sequence, FASTA format.
- gene annotation: genes.gtf -- Arabidopsis gene annotation, GTF format.
Software
- hisat2
- stringtie
- samtools
- edgeR
Already installed in the environment setup step above.
Create a working directory and do the rest of the analysis inside it.
$ mkdir project-1
$ cd project-1
Create symbolic links so that the files are easy to access.
ln -s ../data/genome.fa
ln -s ../data/genes.gtf
ln -s ../data/reads
Activate the installed conda environment (if you have not done so yet).
conda activate tutorial-rnaseq
Get basic information about the sequence reads to be analyzed (number of reads, read length, and so on). The stats subcommand of seqkit is handy for this.
seqkit stats reads/*.fastq.gz
Example output
file format type num_seqs sum_len min_len avg_len max_len
data/reads/2D2L_rep1_R1.fastq.gz FASTQ DNA 539,633 40,955,449 50 75.9 76
data/reads/2D2L_rep1_R2.fastq.gz FASTQ DNA 539,633 40,933,587 50 75.9 76
data/reads/2D2L_rep2_R1.fastq.gz FASTQ DNA 479,469 36,390,170 50 75.9 76
data/reads/2D2L_rep2_R2.fastq.gz FASTQ DNA 479,469 36,371,085 50 75.9 76
data/reads/2D2L_rep3_R1.fastq.gz FASTQ DNA 403,488 30,623,819 50 75.9 76
data/reads/2D2L_rep3_R2.fastq.gz FASTQ DNA 403,488 30,610,016 50 75.9 76
data/reads/2D_rep1_R1.fastq.gz FASTQ DNA 377,791 28,676,854 50 75.9 76
data/reads/2D_rep1_R2.fastq.gz FASTQ DNA 377,791 28,657,121 50 75.9 76
data/reads/2D_rep2_R1.fastq.gz FASTQ DNA 328,491 24,922,717 50 75.9 76
data/reads/2D_rep2_R2.fastq.gz FASTQ DNA 328,491 24,919,185 50 75.9 76
data/reads/2D_rep3_R1.fastq.gz FASTQ DNA 430,418 32,661,169 50 75.9 76
data/reads/2D_rep3_R2.fastq.gz FASTQ DNA 430,418 32,648,214 50 75.9 76
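The table shows that R1 and R2 of each sample contain the same number of reads, as expected for paired-end data. The same check can be sketched without seqkit, assuming ordinary four-line FASTQ records (no wrapped sequence lines); `count_reads` and `check_pair_counts` are hypothetical helper names.

```shell
# Count reads in a gzip-compressed FASTQ file: one record is four lines.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Compare R1 and R2 of one sample; paired-end files must hold the same number of reads.
check_pair_counts() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" = "$n2" ]; then
        echo "OK $n1"
    else
        echo "MISMATCH $n1 $n2"
        return 1
    fi
}
```

Usage would look like `check_pair_counts reads/2D_rep1_R1.fastq.gz reads/2D_rep1_R2.fastq.gz`; the reported count should agree with the num_seqs column above.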
To search with hisat2, the reference genome sequence must first be indexed.
hisat2-build genome.fa genome
Eight files are generated, including genome.1.ht2.
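hisat2-build writes a small index as numbered files, genome.1.ht2 through genome.8.ht2 for the prefix used here. A quick completeness check could look like the following sketch; the function name `check_hisat2_index` is an assumption made for illustration.

```shell
# Check that all eight HISAT2 index files exist and are non-empty
# for a given index prefix (for example "genome").
check_hisat2_index() {
    prefix=$1
    for i in 1 2 3 4 5 6 7 8; do
        if [ ! -s "$prefix.$i.ht2" ]; then
            echo "missing or empty: $prefix.$i.ht2"
            return 1
        fi
    done
    echo "index complete: $prefix"
}
```

Running `check_hisat2_index genome` in the working directory before mapping catches an interrupted or misdirected index build early.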
Map the reads to the genome with hisat2, using the index built above. First, check how the hisat2 command is used.
hisat2 --help
Usage:
hisat2 [options]* -x <ht2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]
<ht2-idx> Index filename prefix (minus trailing .X.ht2).
<m1> Files with #1 mates, paired with files in <m2>.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<m2> Files with #2 mates, paired with files in <m1>.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<r> Files with unpaired reads.
Could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
<sam> File for SAM output (default: stdout)
<m1>, <m2>, <r> can be comma-separated lists (no whitespace) and can be
specified many times. E.g. '-U file1.fq,file2.fq -U file3.fq'.
(rest omitted)
Give the name of the index built earlier after -x. Because this is paired-end sequencing, specify the Read1 and Read2 files with the -1 and -2 options, respectively. The output should be in SAM format, so give the output file name after -S.
(Example)
hisat2 -p 4 --dta -x genome -1 reads/2D_rep1_R1.fastq.gz -2 reads/2D_rep1_R2.fastq.gz -S 2D_rep1.sam
Other options
- -p: number of alignment threads. Set it to suit your machine. The EC2 instance used here has 4 vCPUs, so specify 4 or less.
- --dta: report alignments tailored for transcript assemblers including StringTie. StringTie is used in the next step, so add this option.
Do the same for the other five samples.
hisat2 -p 4 --dta -x genome -1 reads/2D_rep2_R1.fastq.gz -2 reads/2D_rep2_R2.fastq.gz -S 2D_rep2.sam
hisat2 -p 4 --dta -x genome -1 reads/2D_rep3_R1.fastq.gz -2 reads/2D_rep3_R2.fastq.gz -S 2D_rep3.sam
hisat2 -p 4 --dta -x genome -1 reads/2D2L_rep1_R1.fastq.gz -2 reads/2D2L_rep1_R2.fastq.gz -S 2D2L_rep1.sam
hisat2 -p 4 --dta -x genome -1 reads/2D2L_rep2_R1.fastq.gz -2 reads/2D2L_rep2_R2.fastq.gz -S 2D2L_rep2.sam
hisat2 -p 4 --dta -x genome -1 reads/2D2L_rep3_R1.fastq.gz -2 reads/2D2L_rep3_R2.fastq.gz -S 2D2L_rep3.sam
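Because the six commands differ only in the sample name, they can be generated from a list. The sketch below is a dry run that prints the command lines instead of executing them; `SAMPLES` and `hisat2_cmd` are hypothetical names, and the options are copied from the example above.

```shell
SAMPLES="2D_rep1 2D_rep2 2D_rep3 2D2L_rep1 2D2L_rep2 2D2L_rep3"

# Print the hisat2 command line for one sample (same options as the example above).
hisat2_cmd() {
    s=$1
    echo "hisat2 -p 4 --dta -x genome -1 reads/${s}_R1.fastq.gz -2 reads/${s}_R2.fastq.gz -S ${s}.sam"
}

# Dry run: print all six commands without executing them.
for s in $SAMPLES; do
    hisat2_cmd "$s"
done
```

After checking the printed lines, the commands can be executed by piping the loop into a shell (`done | sh`).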
hisat2 reports mapping statistics when it runs. These numbers, such as the mapping rate, let you judge whether the experiment worked, so it is worth recording them. Alternatively, adding the --new-summary and --summary-file options as shown below writes the mapping statistics to the specified file.
$ hisat2 -p 4 --dta --new-summary --summary-file 2D_rep1.summary -x genome -1 reads/2D_rep1_R1.fastq.gz -2 reads/2D_rep1_R2.fastq.gz -S 2D_rep1.sam
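Once a summary file exists for each sample, the mapping rates can be gathered into one small table. This sketch assumes that each summary file contains a line mentioning "overall alignment rate" followed by a percentage; the function name `alignment_rates` is hypothetical, and `grep -o` is a widely available extension rather than strict POSIX.

```shell
# Print "<sample><TAB><rate>" for each HISAT2 summary file given as an argument.
# Assumes the file has a line containing "overall alignment rate" and a percentage;
# prints NA when no such line is found.
alignment_rates() {
    for f in "$@"; do
        rate=$(grep -i 'overall alignment rate' "$f" | grep -o '[0-9.]*%' | head -n 1)
        printf '%s\t%s\n' "$(basename "$f" .summary)" "${rate:-NA}"
    done
}
```

With all six files written, `alignment_rates *.summary` gives a one-line-per-sample overview that is easy to paste into a lab notebook.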
Convert the hisat2 result files from SAM format to sorted BAM, and then build an index for each. samtools is used for both steps.
samtools sort -@ 3 -o 2D_rep1.sorted.bam 2D_rep1.sam
samtools index 2D_rep1.sorted.bam
Do the same for the other five samples. Incidentally, to process all six automatically with a shell script, do the following.
for f in *sam
do
samtools sort -@ 3 -o `basename $f .sam`.sorted.bam $f
done
for f in *bam
do
samtools index $f
done
(Editor's note: this section is under revision. Installing IGV on the local machine has not been explained above. Downloading data from AWS incurs a cost. The BAM files total about 300 MB and the transfer takes under 5 minutes, so this is probably acceptable. SS)
Visualize the mapping results in the genome browser IGV.
- Transfer all sorted.bam and sorted.bam.bai files to your local machine with the scp command.
  - Example command:
    scp -i ~/.ssh/handson.pem ubuntu@[instance-public-dns-name]:/home/ubuntu/project-1/*.bam* .
- Start IGV.
- From the menu, choose Genomes > Load Genome From File... and select genome.fa. Then choose File > Load from File ... and select genes.gtf.
- Choose File > Load from File ... and load the .sorted.bam files you created.
- Zoom in as appropriate.

stringtie -p 4 -e -G genes.gtf -o stringtie_count/2D_rep1/2D_rep1.gtf 2D_rep1.sorted.bam
stringtie -p 4 -e -G genes.gtf -o stringtie_count/2D_rep2/2D_rep2.gtf 2D_rep2.sorted.bam
stringtie -p 4 -e -G genes.gtf -o stringtie_count/2D_rep3/2D_rep3.gtf 2D_rep3.sorted.bam
stringtie -p 4 -e -G genes.gtf -o stringtie_count/2D2L_rep1/2D2L_rep1.gtf 2D2L_rep1.sorted.bam
stringtie -p 4 -e -G genes.gtf -o stringtie_count/2D2L_rep2/2D2L_rep2.gtf 2D2L_rep2.sorted.bam
stringtie -p 4 -e -G genes.gtf -o stringtie_count/2D2L_rep3/2D2L_rep3.gtf 2D2L_rep3.sorted.bam
With the -e option, only the gene models listed in the GTF file given with -G are analyzed (without -e, StringTie also searches for novel genes). These commands write a new GTF file, containing the count information used in the downstream analysis, to the file specified with -o.
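As with the mapping step, the six stringtie commands follow one pattern: each sample gets its own subdirectory under stringtie_count. A dry-run sketch that prints the commands is shown below; `SAMPLES` and `stringtie_cmd` are hypothetical names, and the options are copied from the commands above.

```shell
SAMPLES="2D_rep1 2D_rep2 2D_rep3 2D2L_rep1 2D2L_rep2 2D2L_rep3"

# Print the stringtie command line for one sample.
# Output goes to stringtie_count/<sample>/<sample>.gtf, one subdirectory per sample.
stringtie_cmd() {
    s=$1
    echo "stringtie -p 4 -e -G genes.gtf -o stringtie_count/${s}/${s}.gtf ${s}.sorted.bam"
}

# Dry run: print all six commands without executing them.
for s in $SAMPLES; do
    stringtie_cmd "$s"
done
```

The one-subdirectory-per-sample layout matters because the next step reads the whole stringtie_count folder at once.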
From the GTF output of stringtie above, produce count data in table form giving the number of reads for each gene. That is, build a table (matrix) of
rows: number of genes x columns: number of samples
Many tools for gene expression analysis, statistical analysis of expression changes, and network analysis require a table in this format as input.
To convert the stringtie GTF output into count data, use the prepDE.py script provided by the stringtie developers. Here it was installed during the bioconda setup and is already on the PATH.
prepDE.py -i stringtie_count
The following two result files are generated.
- gene_count_matrix.csv
- transcript_count_matrix.csv
The former holds count data per gene; the latter holds count data per transcript (mRNA; splicing variants, if any, are counted separately). Most analyses are done at the gene level, so the former file is the one mainly used.
Check the contents with the less and wc commands.
(Example)
gene_id,2D2L_rep1,2D2L_rep2,2D2L_rep3,2D_rep1,2D_rep2,2D_rep3
AT1G01020|ARV1,8,8,4,5,0,19
AT1G01060|LHY,3,3,0,0,0,4
AT1G01070|AT1G01070,11,9,4,0,0,0
AT1G01040|DCL1,36,18,22,24,42,18
AT1G01046|MIR838A,0,0,1,0,0,0
AT1G01050|AtPPa1,67,70,54,45,33,25
AT1G01080|AT1G01080,76,58,58,68,38,55
...
wc gene_count_matrix.csv
33603 33680 1140286 gene_count_matrix.csv
This shows that the file is in comma-separated CSV format and that it is a 33602 x 6 matrix (33603 lines minus one header line, and six sample columns).
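The matrix dimensions can also be read off directly with awk. This is a minimal sketch assuming one header row, gene IDs in the first column, and no commas inside gene IDs; `matrix_dim` is a hypothetical helper name.

```shell
# Print "<genes> <samples>" for a count-matrix CSV with a header row
# and gene IDs in the first column.
matrix_dim() {
    awk -F, 'NR == 1 { cols = NF - 1 } END { print NR - 1, cols }' "$1"
}
```

For the file above, `matrix_dim gene_count_matrix.csv` should agree with the wc result, that is, 33602 genes and 6 samples.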
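Differential expression analysis with edgeR -- identify genes whose expression differs with statistical significance between the 2D2L and 2D conditions
Based on the gene expression count data obtained in the previous steps, find the genes whose expression level differs with statistical significance between the two conditions, 2D2L and 2D. Here we run a two-group comparison pipeline in the R environment using edgeR, an R library.
Start R from the command line.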
$ R
From here on, in the R environment
> library(edgeR) # load the edgeR library
# Read the per-gene read count data (CSV format) from the previous steps and store it in a data frame.
> dat <- read.csv("gene_count_matrix.csv",row.names=1)
> head(dat) # check that the data were read in correctly
X2D2L_rep1 X2D2L_rep2 X2D2L_rep3 X2D_rep1 X2D_rep2 X2D_rep3
MSTRG.1|ARV1 8 8 4 5 0 19
AT1G01060|LHY 3 3 0 0 0 4
AT1G01070|AT1G01070 11 9 4 0 0 0
MSTRG.6|DCL1 36 18 22 24 42 18
MSTRG.6|MIR838A 0 0 1 0 0 0
MSTRG.7|AtPPa1 67 70 54 45 33 25
# Now the analysis with edgeR
> grp <- c(rep("2D2L", 3), rep("2D", 3)) # define which of the two conditions each of the six samples belongs to
> grp
[1] "2D2L" "2D2L" "2D2L" "2D" "2D" "2D"
> D <- DGEList(dat, group=grp) # convert the count data frame read in above into an object that edgeR can handle
> D <- calcNormFactors(D, method="TMM") # normalization
> D <- estimateCommonDisp(D) # estimate the parameters of the probability distribution (negative binomial) from the data, step 1
> D <- estimateTagwiseDisp(D) # estimate the parameters of the probability distribution (negative binomial) from the data, step 2
# Preparation is now complete. Check the parameters estimated in the steps above.
> D$samples # check the effect of normalization
group lib.size norm.factors
X2D2L_rep1 2D2L 987874 1.0267383
X2D2L_rep2 2D2L 824228 0.9305706
X2D2L_rep3 2D2L 689697 0.9888950
X2D_rep1 2D 597188 1.0250815
X2D_rep2 2D 516354 1.1027216
X2D_rep3 2D 655218 0.9363031
> D$common.dispersion # common dispersion of the negative binomial distribution
[1] 0.243539
> summary(D$tagwise.dispersion) # tagwise (here, per-gene) dispersion of the negative binomial distribution
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.003805 0.218418 0.318444 1.599216 2.219825 9.752049
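Two of the numbers above are easier to interpret after a little arithmetic. In edgeR the effective library size is lib.size multiplied by norm.factors, and the biological coefficient of variation (BCV) is the square root of the dispersion. The sketch below does this arithmetic in awk on values copied from the output above; `effective_lib_size` and `bcv` are hypothetical helper names.

```shell
# Effective library size = lib.size * norm.factors (values taken from D$samples).
effective_lib_size() {
    awk -v lib="$1" -v nf="$2" 'BEGIN { printf "%.0f\n", lib * nf }'
}

# Biological coefficient of variation (BCV) = sqrt(dispersion).
bcv() {
    awk -v d="$1" 'BEGIN { printf "%.3f\n", sqrt(d) }'
}

effective_lib_size 987874 1.0267383   # sample X2D2L_rep1
bcv 0.243539                          # common dispersion reported above
```

A common dispersion of about 0.24 corresponds to a BCV of roughly 0.49, that is, expression varies by about 49% between biological replicates.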
Now we identify the differentially expressed genes (DEGs) with the exactTest function of edgeR. Based on the probability distribution estimated from the data as above, it tests whether the expression level differs with statistical significance between the conditions.
> de.2D.vs.2D2L <- exactTest(D, pair=c("2D", "2D2L"))
# look at the genes with the smallest p-values
> topTags(de.2D.vs.2D2L)
Comparison of groups: 2D2L-2D
logFC logCPM PValue FDR
AT2G34430|LHB1B1 10.394710 10.665673 1.995311e-138 6.704643e-134
AT3G01500|CA1 4.743305 11.015099 5.666909e-135 9.520974e-131
AT5G54770|THI1 6.910262 10.734379 2.778929e-131 3.112586e-127
AT2G38530|LTP2 -5.467124 11.523626 1.033709e-127 8.683676e-124
AT4G14130|XTR7 -6.262324 9.393107 5.761200e-126 3.871757e-122
AT5G54190|PORA -12.891697 9.390253 1.750147e-123 9.801404e-120
AT3G21720|ICL -6.505456 11.583401 8.206907e-123 3.939550e-119
AT3G54890|LHCA1 5.230979 11.440747 2.381888e-121 1.000453e-117
AT3G47470|LHCA4 3.859215 11.471514 2.015963e-102 7.526709e-99
AT2G05070|LHCB2 5.400649 9.067715 1.141347e-99 3.835153e-96
# write the results to a tab-separated text file
> tmp <- topTags(de.2D.vs.2D2L, n=nrow(de.2D.vs.2D2L))
> write.table(tmp$table, "de.2D.vs.2D2L.txt", sep="\t", quote=F)
Quit R and check that the file de.2D.vs.2D2L.txt has been created. Inspect its contents with less or a similar tool.
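The output table can be filtered on the command line to list candidate DEGs. The sketch below is illustrative only: the cutoffs (FDR < 0.05 and |logFC| >= 1) are assumed defaults, not values prescribed by this tutorial, and `filter_deg` is a hypothetical helper name. Because write.table() writes row names without a header entry, data rows have five fields while the header has four.

```shell
# Print gene, logFC and FDR for rows passing an FDR cutoff and a minimum |logFC|.
# Usage: filter_deg FILE [FDR_CUTOFF] [MIN_ABS_LOGFC]   (defaults 0.05 and 1 are assumptions)
# Data rows: gene, logFC, logCPM, PValue, FDR (tab-separated); line 1 is the header.
filter_deg() {
    awk -F'\t' -v fdr="${2:-0.05}" -v lfc="${3:-1}" '
        NR > 1 {
            a = ($2 < 0) ? -$2 : $2          # absolute log2 fold change
            if ($5 + 0 < fdr + 0 && a + 0 >= lfc + 0) print $1 "\t" $2 "\t" $5
        }' "$1"
}
```

For example, `filter_deg de.2D.vs.2D2L.txt | wc -l` counts the genes passing both cutoffs, and a positive logFC means higher expression in 2D2L.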
Verifying the DEGs. As an example, look at the gene "AT2G34430|LHB1B1", which ranks first in topTags. Its read counts are as follows.
(in the R environment)
> dat["AT2G34430|LHB1B1",]
X2D2L_rep1 X2D2L_rep2 X2D2L_rep3 X2D_rep1 X2D_rep2 X2D_rep3
AT2G34430|LHB1B1 2743 2875 2227 2 2 0
(on the command line)
$ grep "^AT2G34430|LHB1B1" gene_count_matrix.csv
AT2G34430|LHB1B1,2743,2875,2227,2,2,0
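To make the contrast concrete, the mean count per condition can be computed from the same CSV row. This is a rough sketch on raw counts, not normalized for library size; it assumes the column order shown in the header (three 2D2L replicates, then three 2D replicates), and `mean_counts` is a hypothetical helper name.

```shell
# Print the mean raw count of one gene in each condition.
# Usage: mean_counts FILE GENE_ID
# Columns 2-4 are the 2D2L replicates and columns 5-7 the 2D replicates.
mean_counts() {
    awk -F, -v id="$2" '$1 == id {
        printf "2D2L mean: %.1f\n2D mean: %.1f\n", ($2 + $3 + $4) / 3, ($5 + $6 + $7) / 3
    }' "$1"
}
```

Calling `mean_counts gene_count_matrix.csv 'AT2G34430|LHB1B1'` matches the ID exactly, which avoids treating the `|` character as a pattern.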
That is, the expression level is high in the 2D2L (light) condition, and the gene is hardly expressed at all in the 2D (dark) condition. What kind of function does this gene have? Arabidopsis is a model plant with a wealth of accumulated gene information, so let us check the NCBI database.
- https://www.ncbi.nlm.nih.gov/gene/ -- NCBI Gene database
Enter the ID "AT2G34430" in the search box of the database above.
It is annotated as "light-harvesting chlorophyll-protein complex II subunit B1". The description shows that the gene encodes a chlorophyll-associated protein inside the chloroplast that is involved in photosynthesis. High expression only in the light condition is therefore a sensible result.
hisat2
https://ccb.jhu.edu/software/hisat2/index.shtml
stringtie
http://www.ccb.jhu.edu/software/stringtie/
edgeR
http://bioconductor.org/packages/release/bioc/html/edgeR.html
Reference paper
Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown https://www.nature.com/articles/nprot.2016.095