R I: Introduction to R - BDC-training/VT25 GitHub Wiki
Course: VT25 R programming (SC00035)
-
Open RStudio (or RGui).
-
Type a simple mathematical expression (e.g. 1+2) in the console and press enter. The code is run immediately.
You can see the code you ran in the console, but it cannot be edited there. You can, however, scroll through the latest commands by pressing the “up” or “down” keys on your keyboard, edit them, and run them again.
-
Open a new R script (File->New file->R script) and save it in a suitable folder. It is a good idea to save it with a .R ending, so you know that the file contains R code. Also, spaces in file names may cause trouble later on, so try to use “_” instead.
All code you want to save should be written into scripts like this or, as we will see later, into other types of R files. Then you can go back later to see what you did. In a script you can use # to write comments in your code. Commenting your code makes it easier to read and to remember what the code does. Anything written on a line after # is disregarded when the code is run.
-
Type the same simple expression as before (e.g. 1+2) into your new file. You can run a line by placing the cursor on the line and then:
- either click the Run button in the top left corner of the editor,
- or press Ctrl+Enter (in RStudio),
- or press Ctrl+R (in RGui).
-
Assign the expression to a variable called my_expr and run the code again.
Solution
my_expr <- 1+2
-
On a new line, use the print() function with the variable my_expr as argument to print the value of your new variable in the console.
The print() function is needed if you run your entire script from a different location or from a different script. We will see more of that later.
Solution
print(my_expr)
-
What happens if you put quotation marks around the name of the variable in the print() function?
Solution
print("my_expr")
-
Save the script and then run the entire script. You can do this by:
- selecting all code (using the pointer or Ctrl+A) and pressing Ctrl+Enter, Ctrl+R or the Run button,
- pressing the “Source” button in the top right of the editor, which runs the entire script,
- using the source() function in a different script, with the script you want to run as argument (including the path to where it is stored), e.g. source("C:/My_R_Scripts/My_script.R"). This is the case where the print() function is needed (without it nothing is printed in the console).
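The difference between a bare expression and print() when a script is sourced can be tried out with a small sketch like the one below. The script content and the temporary file are made up for illustration, and capture.output() is only used here to collect what would have appeared in the console.

```r
# Hypothetical script written to a temporary file, for illustration only
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "my_expr <- 1 + 2",
  "my_expr          # a bare expression: not shown when the script is sourced",
  "print(my_expr)   # an explicit print(): shown when the script is sourced"
), script_path)

# capture.output() collects what source() would have printed in the console
printed <- capture.output(source(script_path))
printed   # "[1] 3", which comes from the print() call only
```

If you remove the print() line from the hypothetical script, sourcing it should print nothing at all.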
-
Create a vector with the numbers 1, 2, … , 10 and assign it to the variable x.
Solution
x <- 1:10
-
Select the first five elements in x and assign them to the variable x1.
Solution
x1 <- x[1:5]
-
Select the last five elements in x and assign them to the variable x2.
Solution
x2 <- x[6:10]
-
Calculate the sum of x1 and x2. What did you get?
Solution
x1 + x2
-
Now repeat the three steps above but take the first three elements and the last seven elements. Calculate the sum of these two vectors. What did you get now?
Here is a warning! You can add vectors of different lengths and still get a result. The shorter vector is recycled until the end of the longer vector. If you are lucky you get a warning, but always check that the result is what you expected.
Solution
x1 <- x[1:3]
x2 <- x[4:10]
x1 + x2
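To see what the recycling does, it can help to spell it out by hand. The sketch below uses rep() to build the recycled version of x1 explicitly and compares it with the direct sum. The names x1_recycled, manual_sum, direct_sum and silent_sum are made up for this illustration.

```r
x <- 1:10
x1 <- x[1:3]
x2 <- x[4:10]

# Spell out the recycling: x1 is repeated until it is as long as x2
x1_recycled <- rep(x1, length.out = length(x2))   # 1 2 3 1 2 3 1
manual_sum <- x1_recycled + x2

# The direct sum gives the same values, plus a warning (suppressed here)
# because 7 is not a multiple of 3
direct_sum <- suppressWarnings(x1 + x2)
identical(manual_sum, direct_sum)

# No warning at all here: 10 is a multiple of 2, so the recycling is silent
silent_sum <- 1:2 + 1:10
silent_sum
```

The last case is the risky one, since nothing tells you that the two vectors had different lengths.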
-
Now set the variable n to 10 and create the following vector:
y1 <- 1:n-1
What did you get and why?
Solution
n <- 10
y1 <- 1:n-1
y2 <- 1:(n-1)
What is the difference?
What we see here is an example of how operations are prioritized. The : operator has higher priority than -, and operations with higher priority are executed first. This is good to be aware of. Try out another example:
1+2*3
What do you think the result is? Which operation has the highest priority? How can you write this so that the result equals 9?
Solution
(1+2)*3
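The two cases can be put side by side in a short sketch. The comments describe how R groups each expression; adding explicit parentheses is a safe habit whenever you are unsure of the priority.

```r
n <- 10
y1 <- 1:n-1        # evaluated as (1:n) - 1
y2 <- 1:(n-1)      # the parentheses force the subtraction to happen first

length(y1)         # 10 elements, from 0 to 9
length(y2)         # 9 elements, from 1 to 9

1+2*3              # multiplication before addition: 7
(1+2)*3            # addition first because of the parentheses: 9
```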
-
Insert a new R code chunk where you create two vectors, measure and type, of length 10 using the following code:
measure <- sample(1:3, 10, replace=TRUE)
type <- sample(c("a", "b", "c"), 10, replace=TRUE)
-
Convert type into a factor using the function as.factor().
Solution
type <- as.factor(type)
-
Print out type to see what it looks like.
Solution
print(type)
-
Now print the expression c(type,"d").
Solution
print(c(type,"d"))
You will notice that a, b and c in type suddenly changed to numbers. Factors are troublesome in many ways and you need to be careful when you work with them, but they are really useful as categories in statistical analysis. We will work more with that in the statistics part of the course, but here is a short example. Copy and run the code:
tapply(measure, type, mean)
This is a very quick and easy way to calculate the mean of measure within each level of type. To place it in a real context, you can imagine that type was some sample type and measure was some measurement performed on the samples.
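The numbers come from the way a factor is stored: as integer codes with a set of labels (levels) attached. The sketch below uses small fixed vectors instead of sample(), so the values are made up but the result is the same every time you run it. Converting with as.character() before combining is one way to keep the labels.

```r
# Fixed, hypothetical values instead of sample(), so the result is reproducible
measure <- c(1, 3, 2, 2, 1, 3)
type <- as.factor(c("a", "b", "a", "c", "b", "c"))

levels(type)        # the categories: "a" "b" "c"
as.integer(type)    # the integer codes stored behind the labels

# Convert to character first to keep the labels when combining with "d"
combined <- c(as.character(type), "d")
combined

# Mean of measure within each level of type
group_means <- tapply(measure, type, mean)
group_means
```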
-
We will continue with the vectors type and measure from the previous task. Create a data frame called mydata with these two vectors as columns. You can use the function data.frame().
Solution
mydata <- data.frame(type, measure)
-
Use the functions head(), tail(), summary() and dim() to look at the data frame.
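As a rough guide to what each function shows, here is a sketch on a small data frame with made-up values. The name n_rows_cols is only used for this illustration.

```r
# Hypothetical example data, fixed so the output is the same on every run
mydata <- data.frame(
  type = as.factor(c("a", "b", "a", "c", "b", "a")),
  measure = c(1, 3, 2, 2, 1, 3)
)

head(mydata, 3)    # the first three rows (the default is six)
tail(mydata, 3)    # the last three rows
summary(mydata)    # counts per level for the factor, quartiles and mean for the numbers
n_rows_cols <- dim(mydata)   # number of rows, then number of columns
n_rows_cols
```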
-
Now, let’s say that we are only interested in sample type a. Look at the column called type using the dollar sign $.
Solution
mydata$type
-
Use the == operator to find out which of the elements in this column have the value a. Type
mydata$type == "a"
What you get is a vector of logical values.
-
Use the which() function with the expression above as argument. This will give you a vector with the positions in the type vector that equal a. Assign the vector of positions to a variable called pos.
Solution
pos <- which(mydata$type=="a")
-
Use square brackets to extract a subset of the data frame that contains type a (i.e. the row numbers in pos). Inside the square brackets you first specify which rows, then which columns you want to see. Remember from the lecture that leaving a position empty means that all rows or columns are used.
Solution
mydata[pos,]
-
Assign the subset to a variable called mydata_out.
Solution
mydata_out <- mydata[pos,]
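The steps above can be followed on a small data frame with made-up values, where it is easy to check each intermediate result by eye. The names is_a, by_position and by_logical are only used for this sketch. It also shows that the logical vector can be used directly inside the square brackets, as an alternative to going through which().

```r
# Hypothetical example data, fixed so the positions are known in advance
mydata <- data.frame(
  type = as.factor(c("a", "b", "a", "c", "b", "a")),
  measure = c(1, 3, 2, 2, 1, 3)
)

is_a <- mydata$type == "a"   # logical vector: TRUE where type is "a"
pos <- which(is_a)           # positions of the TRUE values: 1 3 6

by_position <- mydata[pos, ]    # rows selected by position, all columns
by_logical <- mydata[is_a, ]    # rows selected by the logical vector, all columns

# Both ways should select the same rows
all.equal(by_position, by_logical)
```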
-
We will now write mydata_out to a tab-separated text file using the write.table() function. Play around with the arguments col.names, row.names, quote and sep in write.table() and see what happens. You can open the file in a simple text editor to see how the output changes when you change the arguments.
Solution
write.table(mydata_out, "labtest20240422.txt", col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t")
-
Now use the read.table() function to read the same data into R again, and save it as mydata_in.
Solution
mydata_in <- read.table("labtest20240422.txt", header=TRUE, sep="\t", as.is=TRUE)
-
Are mydata_in and mydata_out identical? Use the summary() function on each of them. Is there any difference?
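One way to investigate the question is to compare the column classes before and after the round trip. The sketch below uses a small made-up data frame and a temporary file instead of the file name from the exercise. It assumes the same arguments as in the solutions above.

```r
# Hypothetical subset with made-up values, standing in for mydata_out
mydata_out <- data.frame(
  type = as.factor(c("a", "a", "a")),
  measure = c(1, 2, 3)
)

# A temporary file is used here so the sketch does not leave files behind
out_file <- tempfile(fileext = ".txt")
write.table(mydata_out, out_file, col.names = TRUE, row.names = FALSE,
            quote = FALSE, sep = "\t")
mydata_in <- read.table(out_file, header = TRUE, sep = "\t", as.is = TRUE)

class(mydata_out$type)   # a factor before writing
class(mydata_in$type)    # read back as character because of as.is = TRUE

# The values themselves survive the round trip
identical(as.character(mydata_out$type), mydata_in$type)
```

A text file only stores the values, not the R class of each column, so read.table() has to guess the classes again when the file is read.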
Developed by Maria Nethander, 2017, Modified by Fanny Berglund, 2024