Entry 6: Rmd & Finding a Dataset (Week 3, 4 lecture notes) - bcb420-2025/Izumi_Ando GitHub Wiki

R Markdown Good Practices

Rmd Cheat Sheet by Posit

1 - Table of Contents

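If you set toc to true in your YAML header, R Markdown generates the table of contents for you from the # headers. toc_depth sets how many header levels it includes (toc_depth: 2 picks up # and ## headers).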

# example yaml header from lecture3_Rmarkdown_tips.pdf
---
title: "Something Fun"
output:
  html_document:
    toc: true
    toc_depth: 2
bibliography: my_bibliography.bib
csl: biomed-central.csl
---

2 - Code Chunk Specs

Make sure you control your code output. Some messages should not be printed for the reader because they are just for you (e.g., debugging print statements). Chunk options such as echo, include, message, and warning control what shows up in the knitted document.
You can see all chunk options and their defaults by running str(knitr::opts_chunk$get()) (taken from the cheat sheet linked above). The cheat sheet also has a table of them.
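A minimal sketch of how chunk options control output in an .Rmd file. The chunk labels and the use of R's built-in mtcars dataset are illustrative choices, not from the lecture; echo, include, message, and warning are standard knitr chunk options.

````markdown
```{r setup, include=FALSE}
# Global defaults for every later chunk: don't print messages or warnings
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```

```{r debug-check, include=FALSE}
# Runs when knitting, but neither the code nor its output appears in the report
print(dim(mtcars))
```

```{r mpg-plot, echo=FALSE}
# Only the output (the plot) is shown; the code itself is hidden
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```
````

Options written in a chunk header override the global defaults for that chunk only.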

3 - BibTeX

You don't have to use BibTeX for citations, but it is a useful tool.

How to use BibTeX

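Step 1: Create a .bib file.
Step 2: Add the .bib file name and a citation style (a .csl file; many predefined styles are available) to the YAML header, as in the bibliography: and csl: fields of the example above. The files are referred to by path, resolved relative to the .Rmd file, so keeping them in the same folder as the .Rmd is simplest.
Step 3: Add the BibTeX entry for each publication to your .bib file.
Step 4: Cite a publication by writing [@tag_for_publication] in your text, where tag_for_publication is the entry's key. When you knit, the in-text citation and the reference list (at the end of the document) are generated.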
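For R packages you used, steps 1 and 3 can be handled by knitr::write_bib(), which writes BibTeX entries for installed packages. A minimal sketch, assuming knitr and rmarkdown are installed; the file name packages.bib is an illustrative choice, kept separate from a hand-edited my_bibliography.bib because write_bib() overwrites its output file.

```r
# Sketch: generate BibTeX entries for R packages so they can be cited.
# Assumes knitr (and rmarkdown) are installed; "packages.bib" is an
# illustrative file name, kept separate because write_bib() overwrites it.
library(knitr)

# Write one entry per package; keys get an "R-" prefix by default (e.g. R-knitr)
write_bib(c("knitr", "rmarkdown"), file = "packages.bib")

# Print the generated entries to check the citation keys
cat(readLines("packages.bib"), sep = "\n")
```

Both files can then be listed in the YAML header, e.g. bibliography: [my_bibliography.bib, packages.bib], and knitr cited in the text with [@R-knitr].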

Types of Expression Data

Main types: microarray, bulk RNA-seq, and single-cell RNA-seq.
Microarrays use chips with oligonucleotide probes.
Single-cell RNA-seq is a specialized version of bulk RNA-seq.

The main focus of this course is bulk RNA-seq.

Bulk RNA-seq types: short-read, long-read, and direct read, but the majority is short-read Illumina sequencing.
Considerations: number of samples, sample prep method, read depth, single- or paired-end reads.
Processing: alignment & assembly > quantification > normalization & filtering.
If you are curious, the tools that can be used for each step are listed in the slides.
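To make the normalization & filtering step concrete, here is a minimal sketch of one simple approach: counts per million (CPM) followed by a low-expression filter. This is not necessarily the method from the lecture. The toy count matrix and the threshold (CPM > 1 in at least 2 samples) are made-up illustrative values; for real data, packages such as edgeR provide cpm() and more robust normalization (e.g. TMM).

```r
# Toy raw count matrix: genes as rows, samples as columns (made-up values)
counts <- matrix(
  c(10, 0, 500, 20,    # sample1
    12, 1, 450, 25,    # sample2
     8, 0, 700, 30),   # sample3
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Library size: total counts per sample (reflects read depth)
lib_size <- colSums(counts)

# CPM: divide each column by its library size, then scale to one million
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Filter: keep genes with CPM > 1 in at least 2 samples (illustrative threshold)
keep <- rowSums(cpm > 1) >= 2
filtered <- counts[keep, , drop = FALSE]

print(round(cpm, 1))
print(rownames(filtered))
```

Dividing by library size is what makes samples sequenced to different read depths comparable; here gene2 is dropped because it is detected in only one sample.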

Screenshot: slide from lecture 3 (lecture3_types_expression_data.pdf)