Course: VT25 R programming (SC00035)

Getting Started

Open RStudio (or RGui).
Type a simple mathematical expression (e.g. 1+2) in the console and press Enter. The code is run immediately.

You can see the code you ran in the console, but it cannot be edited there. You can, however, scroll through your most recent commands by pressing the “up” or “down” keys on your keyboard, edit them, and run them again.

Working with scripts

Open a new R script (File->New file->R script) and save it in a suitable folder. It is a good idea to save it with the file extension .R, so you know that the file contains R code. Spaces in file names may also cause you trouble later on, so try to use “_” instead.

All code you want to save should be written into scripts like this or, as we will see later, into other types of R files. Then you can go back later to see what you did. In a script you can use # to write comments in your code. Commenting your code makes it easier to read and helps you remember what the code does. Anything written on a line after # is ignored when the code is run.
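As a small illustration of the comment rule above, here is a sketch with one full-line comment and two trailing comments. The variable names radius and area are made up for this example.

```r
# A full-line comment: R skips everything after the # sign.
radius <- 2            # a trailing comment on the same line as code
area <- pi * radius^2  # pi is a built-in constant in R
print(area)
```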
Type the same simple expression as before (e.g. 1+2) into your new file. You can run a line by placing the cursor on the line and then:
1. either click the Run button in the top left corner of the editor,
2. or press Ctrl+Enter (in RStudio)
3. or press Ctrl+R (in RGui).
Assign the expression to a variable called my_expr and run the code again.
Solution
```
my_expr <- 1+2
```
On a new line use the print() function with the variable my_expr as argument to print the value of your new variable in the console.

You need the print() function when you run your entire script from a different location or from a different script. We will see more of that later.
Solution
```
print(my_expr)
```
What happens if you put quotation marks around the name of the variable in the print() function?
Solution
```
print("my_expr")
```
Save the script and then run the entire script. You can do this by:
1. selecting all code (using the pointer or Ctrl+A) and pressing Ctrl+Enter, Ctrl+R or the Run button,
2. pressing the “Source” button in the top right of the editor, which runs the entire script,
3. using the source() function in a different script, with the path to the script you want to run as argument, e.g. source("C:/My_R_Scripts/My_script.R"). This is the case where the print() function is needed (without it, nothing is printed in the console).
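The difference between a bare variable name and an explicit print() call under source() can be tried out with the sketch below. It assumes the default settings of source() and uses a throwaway script written to a temporary file, so the name script_path is only for illustration.

```r
# Write a three-line script to a temporary file (hypothetical example script).
script_path <- tempfile(fileext = ".R")
writeLines(c("my_expr <- 1 + 2",
             "my_expr",          # bare name: not shown when sourced
             "print(my_expr)"),  # explicit print(): shown
           script_path)

# capture.output() collects whatever source() writes to the console.
shown <- capture.output(source(script_path))
shown
```

Only one line of output is collected, and it comes from the print() call.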

Vectors

Create a vector with the numbers 1, 2, … , 10 and assign it to the variable x.
Solution
```
x <- 1:10
```
Select the first five elements in x and assign them to the variable x1.
Solution
```
x1 <- x[1:5]
```
Select the last five elements in x and assign them to the variable x2.
Solution
```
x2 <- x[6:10]
```
Calculate the sum of x1 and x2. What did you get?
Solution
```
x1 + x2
```
Now repeat the three steps above, but take the first three elements and the last seven elements. Calculate the sum of these two vectors. What did you get now?

Here is a warning! R lets you add vectors of different lengths and still returns a result: the shorter vector is recycled until it reaches the end of the longer one. If you are lucky you get a warning (R warns only when the longer length is not a multiple of the shorter one), but always check that the result is what you expected.
Solution
```
x1<-x[1:3]
x2<-x[4:10]
x1+x2
```
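To see recycling with and without the warning, here is a minimal sketch using made-up vectors (short, long6 and long7 are names chosen for this example only).

```r
short <- c(1, 2, 3)
long6 <- c(10, 20, 30, 40, 50, 60)  # length 6 is a multiple of 3
short + long6                       # 11 22 33 41 52 63, and no warning

long7 <- 1:7                        # length 7 is not a multiple of 3
length(long7) %% length(short)      # remainder 1, so expect a warning
recycled <- short + long7           # R still returns a result, plus a warning
recycled                            # 2 4 6 5 7 9 8
```

The first sum is the risky case: the result is silently recycled, so nothing tells you that the lengths differed.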
Now set the variable n to equal 10 and create the following vector:
```
y1 <- 1:n-1
```
What did you get and why?
Solution
```
n <- 10
y1 <- 1:n-1
```
Create a second vector:
```
y2 <- 1:(n-1)
```
What is the difference?

What we see here is an example of operator precedence. The : operator has higher precedence than -, and operators with higher precedence are evaluated first, so 1:n-1 is read as (1:n)-1. This is good to be aware of. Try out another example:
```
1+2*3 
```
What do you think the result is? Which operation has the highest priority? How can you write this so that the result equals 9?
Solution
```
(1+2)*3
```
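A few more precedence cases can be checked in the console with the sketch below. The variable k is a made-up name, used so that your own n is left untouched. The help page ?Syntax lists the full order.

```r
k <- 3
1:k - 1    # evaluated as (1:k) - 1, giving 0 1 2
1:(k - 1)  # the parentheses force k - 1 first, giving 1 2
2 * 1:3    # : also binds tighter than *, giving 2 4 6
-2^2       # ^ binds tighter than unary minus, giving -4
1:3^2      # ^ binds tighter than :, so this is 1:9
```

When in doubt, add parentheses. They cost nothing and make the intended order explicit.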

Factors

Insert a new R code chunk where you create two vectors, measure and type, of length 10 using the following code:
```
measure <- sample(1:3, 10, replace = TRUE)
type <- sample(c("a", "b", "c"), 10, replace = TRUE)
```
Translate type into a factor using the function as.factor().
Solution
```
type <- as.factor(type)
```
Print out type to see how it looks.
Solution
```
print(type)
```
Now print the expression c(type,"d").
Solution
```
print(c(type,"d"))
```
You will notice that a, b and c in type suddenly changed to numbers. This is because a factor stores its values as integer codes and keeps the labels separately as levels. Factors are troublesome in many ways and you need to be careful when working with them, but they are really useful as categories in statistical analysis. We will work more with that in the statistics part of the course, but here is a short example. Copy and run the code:
```
tapply(measure, type, mean)
```
This is a very quick and easy way to calculate the mean of measure within each factor in type. To place it in a real context, you can imagine that type was some sample type and measure was some measure performed on the samples.
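Because sample() gives different values on every run, the sketch below uses small hand-written vectors instead (type_demo and measure_demo are hypothetical names). It shows the labels and the integer codes of a factor, and a tapply() result that can be checked by hand.

```r
# Hypothetical data, fixed by hand so the result is the same on every run.
type_demo <- factor(c("a", "b", "a", "c", "b"))
measure_demo <- c(1, 3, 2, 3, 1)

levels(type_demo)        # the labels: "a" "b" "c"
as.integer(type_demo)    # the integer codes stored underneath: 1 2 1 3 2
as.character(type_demo)  # the labels as an ordinary character vector

# Mean of measure_demo within each level of type_demo.
group_means <- tapply(measure_demo, type_demo, mean)
group_means              # a = 1.5, b = 2, c = 3
```

Converting with as.character() before combining a factor with new values is one way to keep the labels instead of the codes.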

Data frames

We will continue with the vectors type and measure from the previous task. Create a data frame called mydata with these two vectors as columns. You can use the function data.frame().
Solution
```
mydata <- data.frame(type, measure)
```
Use the functions
1. head(),
2. tail(),
3. summary(),
4. and dim()
to look at the data frame.
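As a rough guide to what these four functions return, here is a sketch on a made-up data frame (demo_df and its columns are hypothetical and not part of the exercise).

```r
# Hypothetical data frame with 12 rows and 2 columns.
demo_df <- data.frame(id = 1:12, value = seq(0.5, 6, by = 0.5))

dim(demo_df)          # rows, then columns: 12 2
head(demo_df, n = 3)  # first 3 rows (6 by default)
tail(demo_df, n = 3)  # last 3 rows
summary(demo_df)      # min, quartiles, mean and max for each numeric column
```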
Now, let’s say that we are only interested in sample type a. Look at the column called type using the dollar sign $.
Solution
```
mydata$type
```
Use the == operator to find out which elements in this column have the value a. Type
```
mydata$type == "a"
```
What you got is a vector of logical values.
Use the which() function with the expression above as argument. This will give you a vector with the positions in the type vector that equal a. Assign the vector of positions to a variable that you call pos.
Solution
```
pos <- which(mydata$type=="a")
```
Use square brackets to extract a subset of the data frame that contains type a (i.e. the row numbers in pos). Inside the square brackets you first state which rows, then which columns you want to see. Remember from the lecture that leaving one of them empty means that all rows (or all columns) are used.
Solution
```
mydata[pos,]
```
Assign the subset to a variable called mydata_out.
Solution
```
mydata_out <- mydata[pos,]
```
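The logical vector from == can also be used directly inside the square brackets, without going through which(). The sketch below compares the two on a small made-up data frame (demo_types, is_a, by_position and by_logical are names invented for this example).

```r
demo_types <- data.frame(type = c("a", "b", "a", "c"),
                         measure = c(1, 3, 2, 3))

is_a <- demo_types$type == "a"  # TRUE FALSE TRUE FALSE
which(is_a)                     # positions of the TRUE values: 1 3

# Both forms select the same rows here.
by_position <- demo_types[which(is_a), ]
by_logical  <- demo_types[is_a, ]
all.equal(by_position, by_logical)
```

The two forms differ when the column contains missing values: which() drops positions where the comparison is NA, while a logical index returns an NA row for each of them.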

Reading and writing text files

We will now write mydata_out to a tab-separated text file using the write.table() function. Play around with the arguments
1. col.names
2. row.names
3. quote
4. sep
in write.table() and see what happens. You can open the file in a simple text editor to see how the output changes when you change the arguments.
Solution
```
write.table(mydata_out,
      "labtest20240422.txt",
      col.names = TRUE, row.names = FALSE, quote = FALSE, sep = "\t")
```

Now use the read.table() function to read the same data into R again, and save it as mydata_in.

Solution
```
mydata_in <- read.table("labtest20240422.txt",
      header = TRUE, sep = "\t", as.is = TRUE)
```

Are mydata_in and mydata_out identical? Use the summary() function on each of them. Is there any difference?
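One way to compare the two data frames more directly is to look at the column types. The sketch below does a full round trip on made-up data (out_df, in_df and path are hypothetical names) and writes to a temporary file, so it does not touch your working folder.

```r
# Hypothetical data; tempfile() keeps the example out of your folder.
out_df <- data.frame(type = factor(c("a", "a")), measure = c(2L, 3L))
path <- tempfile(fileext = ".txt")

write.table(out_df, path,
            col.names = TRUE, row.names = FALSE, quote = FALSE, sep = "\t")
in_df <- read.table(path, header = TRUE, sep = "\t", as.is = TRUE)

class(out_df$type)  # "factor"
class(in_df$type)   # "character", because as.is = TRUE keeps text as text
str(in_df)          # compact overview of the column types
```

A text file stores only the labels, so the information that a column was a factor is lost on the way out. Convert the column again after reading if you need a factor.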

Home: R programming

Developed by Maria Nethander, 2017, Modified by Fanny Berglund, 2024

R I: Introduction to R - BDC-training/VT25 GitHub Wiki
