Entry 4: R Basics Notes - bcb420-2025/Izumi_Ando GitHub Wiki

Chapter 2 : Installing R and RStudio

⏰ estimated time for completion : actual time taken - 30 mins : 50 mins

  • skipped the first few tasks because R, RStudio, and Docker are all installed on my machine

Task 4 & 5

Workflow Code

docker run -e PASSWORD=izumita -v "$(pwd)":/home/rstudio/projects -p 8787:8787 risserlin/bcb420-base-image:winter2025-arm64
# opened RStudio on http://localhost:8787/ (acc: rstudio, psw: izumita)

Workflow Notes

  • Docker documentation on accessing local files from containers : from what I understand, -v /HOST/PATH:/CONTAINER/PATH mounts /HOST/PATH into the container at /CONTAINER/PATH. With "$(pwd)" the host path is whatever directory I ran docker from, so in my case the container can see my home directory and everything under it (not the rest of the machine)
  • created an R notebook through the RStudio window opened through the docker and was able to access it locally as well!
  • R Notebook on GitHub

Chapter 3 : Set up R

⏰ estimated time for completion : actual time taken - 30 mins : 25 mins

  • ls("package:pkgname") lists all the objects (functions, datasets) in an attached package, ex: ls("package:stats"); list.files() lists all the files in the working directory like the ls command in unix
  • however, ls() lists everything in the current R environment! (variables, dataframes etc)
  • rm(item) will remove item from the R environment (not a file)
  • it is generally good practice not to save workspaces because corruptions are hard to debug
  • instead, write up scripts that make it easy to recreate environments so you can easily execute and get back to where you were
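A quick sketch of the difference between ls(), rm() and list.files(). The environment demo_env and the object names in it are made up for illustration; I use a separate environment so nothing in my real workspace gets removed.

```r
# hypothetical scratch environment so the global workspace is untouched
demo_env <- new.env()
assign("counts", c(1, 2, 3), envir = demo_env)
assign("label", "sample", envir = demo_env)

objs_before <- ls(demo_env)            # names of the objects in the environment
rm(list = "label", envir = demo_env)   # removes the object, not a file
objs_after <- ls(demo_env)

# ls() on an attached package lists the objects in that package
base_objs <- ls("package:base")
has_sum <- "sum" %in% base_objs

# list.files() looks at the file system (working directory by default)
files_here <- list.files()

print(objs_before)
print(objs_after)
```

Calling ls() with no arguments does the same thing for the current environment.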

Task 6 - Git

  • set up a new version control project R_Exercise-BasicSetup & walked through it
  • learned how to export Rhistory from the upper right panel! you can also run previous lines of commands through the history tab
  • opening a file and a project are two different things!

Task 7 - Working Directory

Ran the following in the console

> getwd()
[1] "/Users/izumiando/Documents/BCB420/R_Exercise-BasicSetup"
> setwd("~")
> getwd()
[1] "/Users/izumiando"
> setwd("Documents/BCB420")

Task 8 - .Rprofile

  • looked up Startup in help documentation, explains how projects are set up when they are started up
  • viewed the .Rprofile file in the R_Exercise-BasicSetup R project, contents pasted below
# .Rprofile
# This script is executed during startup

# define init function
init <- function() {
    # when executed, run the .init.R script
    source(".init.R")
}

# welcome user and prompt to execute init() function
cat("\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n")
cat("    =================")
cat("\n\n")
cat("        WELCOME !\n")
cat("\n")
cat("  Type  'init()'  to begin\n\n")
cat("\n")
cat("    =================")
cat("\n\n")

Chapter 4 : Console, Scripts, and Notebooks

⏰ estimated time for completion : actual time taken - 15 mins : 3 mins

  • overview of console & notebooks, advantages of using these. nothing new.

Chapter 5 : Getting Help for R

⏰ estimated time for completion : actual time taken - 15 mins : 14 mins

  • there are many resources within R and also on the internet to take advantage of

Task 9 - R help

  • help(item) = ?item : they pull up help documentation for a function or operation
  • ? also works on operators, ex: ?"%in%"
  • ??item : shorthand for help.search("item"), searches the help documentation for the term, useful when ?item does not yield results
  • apropos("word") : returns the names of all objects on the search path that contain the string "word" (case-insensitive by default). it also accepts regex: "^word" returns names that start with "word", "word$" returns names that end with "word"
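A small sketch of the apropos() patterns from the notes, using "mean" as an arbitrary search word. The exact results depend on which packages are attached, so I only look at a few names I expect in a default session.

```r
# apropos() takes a regular expression and returns matching object names
contains_mean <- apropos("mean")    # names containing "mean" anywhere
starts_mean   <- apropos("^mean")   # names starting with "mean"
ends_mean     <- apropos("mean$")   # names ending with "mean"

print(head(contains_mean))
print(starts_mean)
print(ends_mean)

# ?mean and help("mean") open the same help page in an interactive session
```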

Task 10 - reproducible example

Notes from resources : How to write a reproducible example and Stack Overflow thread on minimal reproducible examples (MREs)

  • use dput(data) : this prints R code that recreates data, which you can paste into what you are sharing
  • use minimal datasets, runnable code, and necessary settings for environment (such as sessionInfo() or seed)
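A minimal sketch of what sharing data with dput() could look like. The data frame small_df is invented for illustration; the round trip through parse() and eval() is just my way of checking that the printed code really rebuilds the object.

```r
# hypothetical minimal dataset
small_df <- data.frame(gene = c("A", "B"), count = c(10L, 25L))

# prints R code that recreates small_df; this is what goes in the question
dput(small_df)

# round trip: capture the printed code and evaluate it again
code_text <- paste(capture.output(dput(small_df)), collapse = "\n")
rebuilt <- eval(parse(text = code_text))
print(all.equal(small_df, rebuilt))

# environment details worth including in an MRE
set.seed(42)     # fixes random numbers so others get the same values
sessionInfo()    # R version, platform and loaded packages
```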

Chapter 6 : Basics of R syntax

⏰ estimated time for completion : actual time taken - 10 mins : 15 mins

Task 11, 12 - test the operators and variables

  • %/% : integer division
  • %% : modulo, returns remainder
    skipping other operators that I am familiar with
  • tail(var, num) : returns the last num items in var
  • be careful of reserved variable names, see ?reserved
  • try not to use variables that are the same as parameter names
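A short check of the two operators and tail(), with numbers picked arbitrarily.

```r
quotient  <- 17 %/% 5   # integer division
remainder <- 17 %% 5    # modulo
print(quotient)         # 3
print(remainder)        # 2

# the two always fit together like this
print(quotient * 5 + remainder == 17)

x <- 1:10
last_three <- tail(x, 3)   # last 3 items of x
print(last_three)          # 8 9 10
```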

Chapter 7 : R scalars and vectors

⏰ estimated time for completion : actual time taken - 20 mins : 28 mins

  • a given object can be examined by mode(), typeof(), and class()
  • == comparisons between an integer and a double of the same number (ex: 1L == 1) return TRUE, but identical(1L, 1) returns FALSE because the types differ
  • (x <- 17) : adding parentheses around an assignment will print out the value
  • elements in vectors will be coerced into the same type
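To convince myself of the points above, a sketch with made-up variable names:

```r
int_val <- 1L   # integer
dbl_val <- 1    # double

same_value  <- int_val == dbl_val            # compares values
same_object <- identical(int_val, dbl_val)   # also compares types
print(same_value)    # TRUE
print(same_object)   # FALSE

# three ways to look at the same object
print(c(mode(int_val), typeof(int_val), class(int_val)))
print(c(mode(dbl_val), typeof(dbl_val), class(dbl_val)))

# mixing types in one vector coerces everything to the most general type
mixed <- c(1, "a", TRUE)
print(typeof(mixed))   # "character"

# parentheses around an assignment print the assigned value
(y <- 17)
```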

Indexing & Subsetting

  • starts at 1 not 0
  • head(), tail() could be useful
  • you can also index by range myVec[1:4], myVec[4:1] (backwards) or using seq(from=x, to=y, by=z)
  • negative indexing will omit the element at that index
  • you can subset by boolean vectors myVec[myVec > 4] only returns elements that are larger than 4
  • you can also subset by names (if available) like summary(myVec)["Median"]. In this case, summary(myVec) is a vector with a "Median" element
  • "[" is an operator!
  • you can also add new items at indexes beyond the length of your current vector (ex, if length(x) is 3, you can run x[10] <- 10). empty slots will be filled with NA
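The indexing rules above, tried on a small named vector. myVec and its values are made up for this sketch.

```r
myVec <- c(a = 2, b = 4, c = 6, d = 8, e = 10)

first_four     <- myVec[1:4]                          # range
reversed       <- myVec[4:1]                          # range, backwards
every_other    <- myVec[seq(from = 1, to = 5, by = 2)]
without_second <- myVec[-2]                           # negative index omits
above_four     <- myVec[myVec > 4]                    # boolean subsetting
by_name        <- myVec["c"]                          # subsetting by name
med            <- summary(myVec)["Median"]

print(above_four)
print(med)

# assigning past the end pads the gap with NA
x <- c(1, 2, 3)
x[6] <- 10
print(x)   # 1 2 3 NA NA 10
```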

Task 13 - scalars

ran on RStudio

  • use the objectInfo() in the R_Exercise-BasicSetup R project to examine different objects
  • code for objectInfo() provided in the resource. this is good to have.
## function (x) 
## {
##     cat("object contents:")
##     print(x, digits = 22)
##     cat("\nstructure of object:\n")
##     str(x)
##     if (!is.list(x)) {
##         cat("\nmode:   ", mode(x), "\n")
##         cat("typeof: ", typeof(x), "\n")
##         cat("class:  ", class(x), "\n")
##     }
##     if (!is.null(attributes(x))) {
##         cat("\nattributes:\n")
##         attributes(x)
##     }
## }

Task 14 - vectors

  • you can run mathematical operations on vectors : they will be applied to each element
  • given a vector a, dim(a) <- c(2,4) creates a matrix with the contents of a with the given dimensions
  • dim() can also return the dimensions of the input

Task 15 - matrices

  • matrix indexing / subsetting can be done by m[row, column] either with scalars or vectors/ranges
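Matrix indexing on a made-up 3 x 4 matrix, as a sketch:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)   # filled column by column

single  <- m[2, 3]           # one element (row 2, column 3)
row_two <- m[2, ]            # whole row
col_four <- m[, 4]           # whole column
block   <- m[1:2, c(1, 4)]   # rows 1-2, columns 1 and 4

print(single)    # 8
print(row_two)   # 2 5 8 11
print(block)
```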

Chapter 8 : Data Frames

⏰ estimated time for completion : actual time taken - 20 mins : 12 mins

Task 16 - Basic operations

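  • make it a habit to set stringsAsFactors = FALSE when creating or reading data frames to avoid strings being silently converted to factors (this is the default since R 4.0.0, but older versions default to TRUE).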
  • basic data frame operations below
# for a data frame called df
rownames(df) # either to return row names or assign (you need <- assignment for this)
nrow(df)
ncol(df)
x <- df[2,] # assigning the second row to x
# assume the name of the second row is "two"
df["two", ] # returns the second row
df <- df[-2, ] # drops the second row (the result must be assigned back to df)
df <- rbind(df, x) # adds the removed row back in at the bottom

Task 17 - modify data frame

  • renamed a single row name in the df by extracting the row names, modifying the element that needs to be changed, and reassigning the modified vector as the row names of the data frame
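The renaming steps could look roughly like this. The data frame and the row names ("one", "two", "three", "second") are made up, not the ones from the exercise.

```r
# hypothetical data frame with named rows
df <- data.frame(size = c(10, 20, 30),
                 row.names = c("one", "two", "three"),
                 stringsAsFactors = FALSE)

rn <- rownames(df)            # 1) extract the row names
rn[rn == "two"] <- "second"   # 2) modify the one that needs to change
rownames(df) <- rn            # 3) assign the modified vector back

print(df)
```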

Chapter 9 : Lists

⏰ estimated time for completion : actual time taken - 20 mins : 15 mins

Task 18

  • you can index lists by [[x]] where x is a numerical index (or a name), or by $ like a data frame column
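A sketch of the list indexing options on a made-up plasmid entry. I also added single brackets for comparison, since they return a list rather than the element itself.

```r
plasmid <- list(size = 2686, marker = "ampicillin", ori = "ColE1")

by_position <- plasmid[[1]]       # the element itself
by_name     <- plasmid[["ori"]]   # same, by name
by_dollar   <- plasmid$marker     # like a data frame column
sub_list    <- plasmid[1]         # single brackets: still a list

print(by_position)
print(by_name)
print(by_dollar)
str(sub_list)
```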

Task 19

pACYC184 <- list(size=4361, marker=c("Amp", "Tet"), ori="ColE1")
pUC19 <- list(size=2686, marker="ampicillin", ori="ColE1", accession="L01397", BanI=c(235, 408, 550, 1647) )
plasmidDB <- list()
plasmidDB[["pUC19"]] <- pUC19
plasmidDB[["pACYC184"]] <- pACYC184
plasmidDB[["pBR322"]] <- list(size=4361, marker=c("Amp", "Tet"), ori="ColE1")
(sizes <- unlist(lapply(plasmidDB, function(x) x$size)))
min(sizes)

Chapter 10 : Subsetting and filtering R objects

⏰ estimated time for completion : actual time taken - 15 mins : 20 mins

Task 20

  • you can subset columns or rows using vectors to take out multiple sections
  • you can also subset using booleans
# returns all names in rows that are not "ColE1" in the "Ori" column
plasmidData$Name[plasmidData$Ori != "ColE1"]
# returns TRUE / FALSE for each row
plasmidData$Ori != "ColE1"
## [1] FALSE FALSE  TRUE
  • order(df$size) will return the row indices of df arranged so that the size column is in ascending order (a permutation of the indices, not ranks; rank() gives ranks)
  • df[order(df$size), ] will order the entire df based on the values in the size column
  • grep("monkey", df$animal) will return the INDICES of the rows which match "monkey" in the "animal" column
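A sketch of order() and grep() on a made-up data frame. I included rank() as well because I kept mixing up the two: order() gives the positions to pick rows from, rank() gives each row's rank.

```r
# hypothetical data, values chosen so order() and rank() differ
df <- data.frame(animal = c("monkey", "cat", "spider monkey", "dog"),
                 size = c(30, 10, 20, 40),
                 stringsAsFactors = FALSE)

idx   <- order(df$size)   # positions that sort size ascending
ranks <- rank(df$size)    # rank of each row's size
print(idx)     # 2 3 1 4
print(ranks)   # 3 1 2 4

sorted_df <- df[order(df$size), ]   # whole data frame sorted by size
print(sorted_df)

hits <- grep("monkey", df$animal)   # indices of matching rows
print(hits)              # 1 3
print(df[hits, ])        # the matching rows themselves
```

Note that grep() matches the pattern anywhere in the string, so "spider monkey" is also a hit.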

Task 21

Done locally.

Chapter 11 : Control Structures of R

⏰ estimated time for completion : actual time taken - 25 mins : 10 mins so far

  • many built in conditional functions

start at task 22