Entry 4: R Basics Notes - bcb420-2025/Izumi_Ando GitHub Wiki
Chapter 2 : Installing R and RStudio
⏰ estimated time for completion : actual time taken - 30 mins : 50 mins
- skipped the first few tasks because R, RStudio, and Docker are all installed on my machine
Task 4 & 5
Workflow Code
docker run -e PASSWORD=izumita -v "$(pwd)":/home/rstudio/projects -p 8787:8787 risserlin/bcb420-base-image:winter2025-arm64
# opened RStudio on http://localhost:8787/ (acc: rstudio, psw: izumita)
Workflow Notes
- Docker documentation on accessing local files from containers : from what I understand, when you use `-v /HOST/PATH:/CONTAINER/PATH` the container will have access to `/HOST/PATH`, which in my case is my home directory and everything under it, since I ran the docker command from there (`"$(pwd)"`)
- created an R notebook through the RStudio window opened through the docker and was able to access it locally as well!
- R Notebook on GitHub
Chapter 3 : Set up R
⏰ estimated time for completion : actual time taken - 30 mins : 25 mins
- function `ls("package:<name>")` (ex: `ls("package:stats")`) lists all the objects exported by an attached package, `list.files()` lists all the files in the working directory like the `ls` command in unix
- however, `ls()` lists everything in the current R environment! (variables, dataframes etc)
- `rm(item)` will remove item from the R environment (not a file)
- it is generally good practice not to save workspaces because corruptions are hard to debug
- instead, write up scripts that make it easy to recreate environments so you can easily execute and get back to where you were
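A small sketch of how these calls differ, assuming the `stats` package is attached (it is in a default R session). The variable name `tmp_value` is made up for the demo.

```r
# objects exported by an attached package (note the "package:" prefix)
stats_objects <- ls("package:stats")
head(stats_objects)

# files in the current working directory, like `ls` in a unix shell
list.files()

# objects in the current R environment
tmp_value <- 42          # hypothetical variable for the demo
"tmp_value" %in% ls()    # TRUE

# rm() removes the object from the environment, not a file on disk
rm(tmp_value)
"tmp_value" %in% ls()    # FALSE
```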
Task 6 - Git
- set up a new version control project `R_Exercise-BasicSetup` & walked through it
- learned how to export Rhistory from the upper right panel! you can also run previous lines of commands through the history tab
- opening a file and a project are two different things!
Task 7 - Working Directory
Ran the following in the console
> getwd()
[1] "/Users/izumiando/Documents/BCB420/R_Exercise-BasicSetup"
> setwd("~")
> getwd()
[1] "/Users/izumiando"
> setwd("Documents/BCB420")
Task 8 - .Rprofile
- looked up `Startup` in help documentation, explains how R is set up when a session starts (which profile files get run)
- viewed the `.Rprofile` file in the `R_Exercise-BasicSetup` R project, contents pasted below
# .Rprofile
# This script is executed during startup
# define init function
init <- function() {
  # when executed, run the .init.R script
  source(".init.R")
}
# welcome user and prompt to execute init() function
cat("\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n")
cat(" =================")
cat("\n\n")
cat(" WELCOME !\n")
cat("\n")
cat(" Type 'init()' to begin\n\n")
cat("\n")
cat(" =================")
cat("\n\n")
Chapter 4 : Console, Scripts, and Notebooks
⏰ estimated time for completion : actual time taken - 15 mins : 3 mins
- overview of console & notebooks, advantages of using these. nothing new.
Chapter 5 : Getting Help for R
⏰ estimated time for completion : actual time taken - 15 mins : 14 mins
- there are many resources within R and also on the internet to take advantage of
Task 9 - R help
- `help(item)` = `?item` : they pull up help documentation for a function or operator
- `?` also works on operators, ex: `?"%in%"`
- `??item` : shorthand for `help.search("item")`, helps you find things that are close to it, for example when `?item` does not yield results
- `apropos("word")` : returns the names of all objects (functions, variables) that contain the string "word". you can also use regex: "^word" returns names that start with "word", "word$" returns names that end with "word"
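A quick sketch of the `apropos()` patterns above, assuming a default session where the `utils` package is attached. The search strings are just examples.

```r
# all visible objects whose names contain "mean"
apropos("mean")

# regex anchors: names starting with "read" / ending with "table"
starts_with_read <- apropos("^read")
ends_with_table  <- apropos("table$")

"read.csv" %in% starts_with_read     # TRUE: read.csv lives in utils
"read.table" %in% ends_with_table    # TRUE
```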
Task 10 - reproducible example
Notes from resources : How to write a reproducible example and Stack Overflow thread on minimal reproducible examples (MREs)
- use `dput(data)`: this converts `data` into R code that recreates the object, which you can include in what you are sharing
- use minimal datasets, runnable code, and necessary settings for environment (such as `sessionInfo()` output or a seed set with `set.seed()`)
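A minimal sketch of what this looks like in practice. The `plasmids` data frame is a made-up toy dataset, not the data from the exercise.

```r
# hypothetical toy data frame standing in for the real data
plasmids <- data.frame(name = c("pUC19", "pBR322"),
                       size = c(2686, 4361),
                       stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; paste this into a question
dput(plasmids)

# round trip: capture that code as text, then evaluate it again
recreated <- eval(parse(text = capture.output(dput(plasmids))))
all.equal(plasmids, recreated)

# environment details worth sharing along with the example
set.seed(1)      # fixes the random seed so random draws are repeatable
sessionInfo()    # R version, platform, attached packages
```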
Chapter 6 : Basics of R syntax
⏰ estimated time for completion : actual time taken - 10 mins : 15 mins
Task 11, 12 - test the operators and variables
- `%/%` : integer division
- `%%` : modulo, returns remainder
- skipping other operators that I am familiar with
- `tail(var, num)` : returns the last num items in var
- be careful of reserved variable names, see `?reserved`
- try not to use variables that are the same as parameter names
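A short sketch of the two operators and `tail()`. The numbers and the `vals` vector are arbitrary examples.

```r
quotient  <- 17 %/% 5    # 3 (integer division)
remainder <- 17 %% 5     # 2 (modulo)

# the two pieces reassemble the original number
quotient * 5 + remainder == 17    # TRUE

vals <- c(10, 20, 30, 40, 50)
tail(vals, 2)            # 40 50
```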
Chapter 7 : R scalars and vectors
⏰ estimated time for completion : actual time taken - 20 mins : 28 mins
- a given object can be examined by `mode()`, `typeof()`, and `class()`
- `==` comparisons between an int and float of the same number will return TRUE but NOT in function `identical()`
- `(x <- 17)` : adding parentheses around assignments will print out the value
- elements in vectors will be coerced into the same type
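A sketch of these points, with made-up variable names (`int_val`, `dbl_val`, `mixed`).

```r
int_val <- 17L    # integer
dbl_val <- 17     # double ("float")

int_val == dbl_val             # TRUE: compares values
identical(int_val, dbl_val)    # FALSE: types differ
typeof(int_val)                # "integer"
typeof(dbl_val)                # "double"

(x <- 17)                      # assigns and prints the value

# mixing types in one vector coerces everything to a common type
mixed <- c(1, "a", TRUE)
typeof(mixed)                  # "character"
```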
Indexing & Subsetting
- starts at 1 not 0
- `head()`, `tail()` could be useful
- you can also index by range `myVec[1:4]`, `myVec[4:1]` (backwards) or using `seq(from=x, to=y, by=z)`
- negative indexing will omit the element at that index
- you can subset by boolean vectors: `myVec[myVec > 4]` only returns elements that are larger than 4
- you can also subset by names (if available) like `summary(myVec)["Median"]`. In this case, `summary(myVec)` is a vector with a "Median" element
- `[` is an operator!
- you can also add new items at indexes beyond the length of your current vector (ex, if `length(x)` is 3, you can run `x[10] <- 10`). empty slots will be filled with `NA`
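The indexing rules above, sketched on a made-up named vector (the values and names in `myVec` are arbitrary).

```r
myVec <- c(a = 2, b = 4, c = 6, d = 8, e = 10)

myVec[1]                               # first element (indexing starts at 1)
myVec[1:4]                             # range
myVec[4:1]                             # same range, backwards
myVec[seq(from = 1, to = 5, by = 2)]   # elements 1, 3, 5
myVec[-2]                              # everything except element 2
myVec[myVec > 4]                       # boolean subsetting: 6 8 10
myVec["c"]                             # by name
summary(myVec)["Median"]               # by name on the summary vector

# assigning past the end pads the gap with NA
x <- c(1, 2, 3)
x[10] <- 10
x            # 1 2 3 NA NA NA NA NA NA 10
length(x)    # 10
```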
Task 13 - scalars
ran on RStudio
- use the `objectInfo()` in the `R_Exercise-BasicSetup` R project to examine different objects
- code for `objectInfo()` provided in the resource. this is good to have.
## function (x)
## {
## cat("object contents:")
## print(x, digits = 22)
## cat("\nstructure of object:\n")
## str(x)
## if (!is.list(x)) {
## cat("\nmode: ", mode(x), "\n")
## cat("typeof: ", typeof(x), "\n")
## cat("class: ", class(x), "\n")
## }
## if (!is.null(attributes(x))) {
## cat("\nattributes:\n")
## attributes(x)
## }
## }
Task 14 - vectors
- you can run mathematical operations on vectors : they will be applied to each element
- given a vector `a`, `dim(a) <- c(2,4)` creates a matrix with the contents of `a` with the given dimensions
- `dim()` can also return the dimensions of the input
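A small sketch of both points, using an arbitrary vector of eight integers.

```r
a <- 1:8

# arithmetic is applied element by element
doubled <- a * 2     # 2 4 6 8 10 12 14 16

dim(a)               # NULL: a plain vector has no dimensions

# assigning dimensions reshapes the vector into a 2 x 4 matrix (filled by column)
dim(a) <- c(2, 4)
a
dim(a)               # 2 4
a[2, 3]              # 6
```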
Task 15 - matrices
- matrix indexing / subsetting can be done by `m[row, column]` either with scalars or vectors/ranges
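For example, on a made-up 3 x 4 matrix `m`:

```r
m <- matrix(1:12, nrow = 3, ncol = 4)   # filled column by column

m[2, 3]            # single element: 8
m[2, ]             # whole second row: 2 5 8 11
m[, 4]             # whole fourth column: 10 11 12
m[1:2, c(1, 4)]    # rows 1-2 of columns 1 and 4 (a 2 x 2 matrix)
```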
Chapter 8 : Data Frames
⏰ estimated time for completion : actual time taken - 20 mins : 12 mins
Task 16 - Basic operations
- make it a habit to set `stringsAsFactors = FALSE` when creating or reading data frames to avoid issues (this is the default since R 4.0.0, but not in older versions).
- basic data frame operations below
# for a data frame called df
rownames(df) # either to return row names or assign (you need <- assignment for this)
nrow(df)
ncol(df)
x <- df[2, ] # assigning the second row to x
# assume the name of the second row is "two"
df["two", ] # returns the second row
df <- df[-2, ] # drops the second row (reassign to df to keep the change)
df <- rbind(df, x) # adds the removed row back in at the bottom
Task 17 - modify data frame
- renamed a single row name in the df by 1) extracting the row names, 2) modifying the element that needs to be changed, 3) reassigning the modified vector as the row names of the data frame
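The three steps can be sketched like this. The data frame and its row names are made up for illustration, not the ones from the exercise.

```r
# hypothetical data frame with named rows
df <- data.frame(size = c(10, 20, 30),
                 row.names = c("one", "two", "three"),
                 stringsAsFactors = FALSE)

rn <- rownames(df)             # 1) extract the row names
rn[rn == "two"] <- "second"    # 2) modify the one element
rownames(df) <- rn             # 3) reassign the modified vector

rownames(df)                   # "one" "second" "three"
```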
Chapter 9 : Lists
⏰ estimated time for completion : actual time taken - 20 mins : 15 mins
Task 18
- you can index lists by `[[x]]` where x is a numerical index or by `$` like a data frame column
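A short sketch of the list access forms, using a list shaped like the plasmid entries in Task 19. The contrast with single brackets is an extra note, not from the exercise.

```r
plasmid <- list(size = 2686, marker = "ampicillin", ori = "ColE1")

plasmid[[1]]           # by position: 2686
plasmid[["marker"]]    # by name: "ampicillin"
plasmid$ori            # by $: "ColE1"

# single brackets keep the list wrapper, double brackets extract the element
class(plasmid[1])      # "list"
class(plasmid[[1]])    # "numeric"
```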
Task 19
pACYC184 <- list(size=4361, marker=c("Amp", "Tet"), ori="ColE1")
pUC19 <- list(size=2686, marker="ampicillin", ori="ColE1", accession="L01397", BanI=c(235, 408, 550, 1647) )
plasmidDB <- list()
plasmidDB[["pUC19"]] <- pUC19
plasmidDB[["pACYC184"]] <- pACYC184
plasmidDB[["pBR322"]] <- list(size=4361, marker=c("Amp", "Tet"), ori="ColE1")
(sizes <- unlist(lapply(plasmidDB, function(x) x$size)))
min(sizes)
Chapter 10 : Subsetting and filtering R objects
⏰ estimated time for completion : actual time taken - 15 mins : 20 mins
Task 20
- you can subset columns or rows using vectors to take out multiple sections
- you can also subset using booleans
# returns all names in rows that are not "ColE1" in the "Ori" column
plasmidData$Name[plasmidData$Ori != "ColE1"]
# returns TRUE / FALSE for each row
plasmidData$Ori != "ColE1"
## [1] FALSE FALSE TRUE
- `order(df$size)` will return the row indices of `df` arranged so that the `size` column comes out sorted (a permutation of the indices, not the ranks; `rank()` gives those)
- `df[order(df$size), ]` will order the entire `df` based on the values in the `size` column
- `grep("monkey", df$animal)` will return the INDICES of the rows which match "monkey" in the "animal" column
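A sketch of `order()` and `grep()` on a made-up data frame (the animals and sizes are invented for illustration).

```r
# hypothetical data frame for the demo
df <- data.frame(animal = c("monkey", "cat", "spider monkey", "dog"),
                 size = c(30, 10, 20, 40),
                 stringsAsFactors = FALSE)

order(df$size)        # 2 3 1 4: row indices, smallest size first
rank(df$size)         # 3 1 2 4: rank of each row's size (not the same thing)

sorted_df <- df[order(df$size), ]    # whole data frame sorted by size
sorted_df

monkey_rows <- grep("monkey", df$animal)   # 1 3: indices of matching rows
df[monkey_rows, ]                          # the matching rows themselves
```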
Task 21
Done locally.
Chapter 11 : Control Structures of R
⏰ estimated time for completion : actual time taken - 25 mins : 10 mins so far
- many built in conditional functions
start at task 22