Entry 6: Rmd & Finding a Dataset (Week 3, 4 lecture notes) - bcb420-2025/Izumi_Ando GitHub Wiki

R Markdown Good Practices

Rmd Cheat Sheet by Posit

1 - Table of Contents

  • if you set toc: true in your yaml header, the table of contents is generated for you from the # headers in the document; toc_depth sets how many header levels it includes
# example yaml header from lecture3_Rmarkdown_tips.pdf
---
title: "Something Fun"
output:
  html_document:
    toc: true
    toc_depth: 2
bibliography: my_bibliography.bib
csl: biomed-central.csl
---

2 - Code Chunk Specs

  • make sure you control your code output. the viewer should not see messages that are only meant for you (ex: debugging print statements)
  • you can see all chunk options and their defaults by running str(knitr::opts_chunk$get()) (taken from the cheat sheet linked above); the cheat sheet also has a table of them
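
A minimal sketch of controlling output, assuming a setup chunk near the top of the Rmd. The option names (echo, message, warning, include) are standard knitr chunk options; which ones to switch off is a choice made here for illustration, not something taken from the lecture.

```r
# Set document-wide defaults once; individual chunks can still override them.
knitr::opts_chunk$set(
  echo    = TRUE,   # keep the code visible in the output
  message = FALSE,  # hide messages (ex: package startup text)
  warning = FALSE   # hide warnings from the rendered document
)

# Confirm what is now in effect for the options changed above.
str(knitr::opts_chunk$get(c("echo", "message", "warning")))
```

For a single chunk that is only for you (ex: debugging prints), putting include=FALSE in that chunk's header runs the code but keeps both the code and its output out of the knitted document.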

3 - BibTeX

  • you don't have to use BibTeX to manage references, but it is a useful tool

How to use BibTeX

  • Step 1: create a bib file.
  • Step 2: add the bib file name and citation style (predefined styles here) to the yaml header. not sure yet whether the file has to be added to the workspace or can just be referred to; will figure out.
  • Step 3: add the BibTeX entry for each publication to your bib file.
  • Step 4: cite by adding [@tag_for_publication] to your text, where the tag is the entry's key in the bib file; the in-text citation and the references list will be generated.
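
The steps above can be tried end to end with a sketch like this one. It assumes the bib file lives in the same folder as the Rmd and is named my_bibliography.bib as in the example yaml header. It uses knitr::write_bib() to generate BibTeX entries for an R package, so the citation key R-knitr comes from that function's default "R-" prefix rather than from the lecture.

```r
# Steps 1 and 3: create the bib file and fill it with a BibTeX entry.
# write_bib() builds entries from a package's citation information.
knitr::write_bib("knitr", file = "my_bibliography.bib")

# Look at the generated entry keys to know which tags can be cited.
bib_lines <- readLines("my_bibliography.bib")
print(grep("^@", bib_lines, value = TRUE))
```

With bibliography: my_bibliography.bib in the yaml header (Step 2), writing [@R-knitr] in the text (Step 4) should produce the in-text citation and the reference list entry. Entries for papers are added to the same file by hand or exported from a reference manager.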

Types of Expression Data

  • main ones: microarray, bulk RNAseq, single cell RNAseq
  • microarray uses chips with oligonucleotide probes
  • single cell is a specialized version of bulk RNAseq

Main focus of this course is bulk RNA seq

  • bulk RNAseq types: short read, long read, and direct read; the majority is short-read Illumina
  • considerations: # of samples, sample prep method, read depth, single-end or paired-end reads
  • processing: alignment & assembly > quantification > normalization & filtering
  • if you are curious, the tools that can be used for each step are listed in the slides
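
To make the last processing step concrete, here is a minimal base R sketch of normalization and filtering on a made-up count matrix. It assumes counts per million (CPM) as the normalization and a keep rule of CPM above 1 in at least 2 samples; both the numbers and the thresholds are illustrative assumptions, not the course's prescribed settings.

```r
# Toy raw counts after quantification: rows are genes, columns are samples.
# All values are invented for illustration.
counts <- matrix(
  c(500000, 1000000, 250000,   # geneA
    300000,  600000, 150000,   # geneB
         0,       2,      0,   # geneC (barely detected)
    200000,  399998, 100000),  # geneD
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("s1", "s2", "s3"))
)

# Normalization: scale each sample by its library size (total counts),
# so samples sequenced to different depths become comparable.
lib_sizes <- colSums(counts)
cpm <- t(t(counts) / lib_sizes) * 1e6

# Filtering: keep genes with CPM above the threshold in enough samples.
min_cpm     <- 1  # hypothetical threshold
min_samples <- 2  # hypothetical minimum number of samples
keep <- rowSums(cpm > min_cpm) >= min_samples

filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

geneC is dropped because it never exceeds the threshold in two samples. Bioconductor packages such as edgeR provide their own CPM and filtering functions; the base R version here only shows the arithmetic.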

Figure: slide from lecture 3 (screenshot from lecture3_types_expression_data.pdf)