R I: Introduction to R - BDC-training/VT25 GitHub Wiki

Course: VT25 R programming (SC00035)


Getting Started

  1. Open RStudio (or RGui)

  2. Type a simple mathematical expression (e.g. 1+2) in the console and press Enter. The code runs immediately.

    You can see the code you ran in the console, but you cannot edit it there. You can, however, scroll through your most recent commands with the Up and Down arrow keys on your keyboard, edit them, and run them again.

Working with scripts

  1. Open a new R script (File > New File > R Script) and save it in a suitable folder. It is a good idea to give the file the .R extension, so you know that it contains R code. Also, spaces in file names may cause you trouble later on, so use “_” instead.

    All code you want to keep should be written in scripts like this or, as we will see later, in other types of R files. That way you can go back later on and see what you did. In a script you can use # to add comments to your code. Comments make your code easier for others to read and easier for you to remember what it does. Anything written on a line after # is ignored when the code is run.

  2. Type the same simple expression as before (e.g. 1+2) into your new file. You can run a line by placing the cursor on the line and then:

    1. either click the Run button in the toolbar at the top of the editor,
    2. or press Ctrl+Enter (in RStudio),
    3. or press Ctrl+R (in RGui).
  3. Assign the expression to a variable called my_expr and run the code again.

    Solution
    my_expr <- 1+2
  4. On a new line use the print() function with the variable my_expr as argument to print the value of your new variable in the console.

    The print() function is needed when you run your entire script from somewhere else, for example from a different script. We will see more of that later.

    Solution
    print(my_expr)
  5. What happens if you put quotation marks around the name of the variable in the print() function?

    Solution
    print("my_expr")
  6. Save the script and then run the entire script. You can do this by:

    1. selecting all code (with the mouse or Ctrl+A) and pressing Ctrl+Enter, Ctrl+R or the Run button,
    2. pressing the “Source” button in the top right of the editor, which runs the entire script,
    3. using the source() function in a different script, with the path to the script you want to run as argument, e.g. source("C:/My_R_Scripts/My_script.R"). This is the case where the print() function is needed: without it, values are not printed in the console.
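The difference between running code line by line and sourcing a whole script can be tried out without creating files by hand. The sketch below writes a small example script to a temporary file with tempfile() and writeLines() (the file and its contents are made up for illustration) and then runs it with source(). Only the values passed to print() appear in the console; the bare my_expr line is evaluated but not printed.

```r
# Write a small example script to a temporary file (for illustration only)
script_file <- tempfile(fileext = ".R")
writeLines(c(
  "# Everything after # is a comment and is ignored when the code runs",
  "my_expr <- 1 + 2   # assign the result to my_expr",
  "my_expr            # evaluated, but not printed when the script is sourced",
  "print(my_expr)     # printed, also when the script is sourced",
  "print(\"my_expr\")   # quotation marks make this a text string, not the variable"
), script_file)

# Run the whole script, as the Source button or source() would
source(script_file)
# [1] 3
# [1] "my_expr"

# To also show each line of the script as it runs, use echo = TRUE
source(script_file, echo = TRUE)
```

Since source() evaluates the script in the global environment by default, my_expr is available in the console afterwards.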

Vectors

  1. Create a vector with the numbers 1, 2, … , 10 and assign it to the variable x.

    Solution
    x <- 1:10
  2. Select the first five elements in x and assign them to the variable x1.

    Solution
    x1 <- x[1:5]
  3. Select the last five elements in x and assign them to the variable x2.

    Solution
    x2 <- x[6:10]
  4. Calculate the sum of x1 and x2. What did you get?

    Solution
    x1 + x2
  5. Now repeat the three steps above, but take the first three elements and the last seven elements. Calculate the sum of these two vectors. What do you get now?

    A word of warning: you can add vectors of different lengths and still get a result, because the shorter vector is recycled (reused from the start) until it matches the length of the longer vector. R only warns you when the longer length is not a multiple of the shorter one, so always check that the result is what you expected.

    Solution
    x1 <- x[1:3]
    x2 <- x[4:10]
    x1 + x2
  6. Now set the variable n to 10 and create the following vector:

    y1 <- 1:n-1

    What did you get and why?

    Solution
    n <- 10
    y1 <- 1:n-1
    Create a second vector
    y2 <- 1:(n-1)

    What is the difference?

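    This is an example of operator precedence: the sequence operator : has higher precedence than the subtraction operator -, and operators with higher precedence are evaluated first. So 1:n-1 is read as (1:n)-1, which gives 0, 1, …, 9, whereas 1:(n-1) gives 1, 2, …, 9. This is good to be aware of. Try another example: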

    1+2*3 

    What do you think the result is? Which operation has the highest priority? How can you write this so that the result equals 9?

    Solution
    (1+2)*3
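To see when R does and does not warn about recycling, and to check the precedence rules, you can run the sketch below. safe_add() is a hypothetical helper written only for this example (it is not a base R function); it refuses to add vectors of different lengths instead of recycling.

```r
# Lengths 6 and 2: 6 is a multiple of 2, so 1:2 is reused silently, with no warning
silent <- 1:6 + 1:2
print(silent)          # [1] 2 4 4 6 6 8

# Lengths 3 and 7: 1:3 is recycled as 1 2 3 1 2 3 1, and R warns:
# "longer object length is not a multiple of shorter object length"
noisy <- 1:3 + 4:10
print(noisy)           # [1]  5  7  9  8 10 12 11

# Hypothetical helper: stop with an error instead of recycling
safe_add <- function(a, b) {
  if (length(a) != length(b)) {
    stop("vectors have different lengths: ", length(a), " and ", length(b))
  }
  a + b
}
safe_add(1:5, 6:10)    # [1]  7  9 11 13 15
# safe_add(1:3, 4:10) would stop with an error

# Operator precedence: ':' is evaluated before '-'
n <- 10
1:n - 1                # same as (1:n) - 1, i.e. 0 1 ... 9
1:(n - 1)              # 1 2 ... 9
1 + 2 * 3              # 7, multiplication before addition
(1 + 2) * 3            # 9, parentheses first
```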

Factors

  1. Add new code to your script where you create two vectors, measure and type, of length 10 using the following code:

    measure <- sample(1:3, 10, replace=TRUE)
    type <- sample(c("a", "b", "c"), 10, replace=TRUE)
  2. Convert type into a factor using the function as.factor().

    Solution
    type <- as.factor(type)
  3. Print out type to see how it looks.

    Solution
    print(type)
  4. Now print the expression c(type,"d").

    Solution
    print(c(type,"d"))

    You will notice that a, b and c in type suddenly changed to numbers. This is because a factor is stored as integer codes together with a set of labels (its levels), and c() here combines the codes rather than the labels. Factors are troublesome in many ways and you need to be careful when you work with them, but they are really useful as categories in statistical analysis. We will work more with that in the statistics part of the course, but here is a short example. Copy and run the code:

    tapply(measure, type, mean)

    This is a very quick and easy way to calculate the mean of measure within each level of type. To place it in a real context, imagine that type is a sample type and measure is a measurement performed on each sample.
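The integer codes behind a factor, and one way to add a new value without losing the labels, can be seen in the sketch below. It uses small fixed vectors (made-up values) instead of sample(), so the output is the same every time. Converting to character with as.character() before combining is one common workaround, not the only one.

```r
# Fixed example data (made up) instead of sample(), so results are reproducible
type <- as.factor(c("a", "b", "c", "a", "c"))
measure <- c(1, 2, 3, 2, 1)

levels(type)        # [1] "a" "b" "c"   the labels
as.integer(type)    # [1] 1 2 3 1 3     the codes stored underneath
c(type, "d")        # the labels are replaced by their codes

# Converting to character first keeps the labels when a new value is added
type_d <- factor(c(as.character(type), "d"))
levels(type_d)      # [1] "a" "b" "c" "d"

# Mean of measure within each level of type
tapply(measure, type, mean)
#   a   b   c
# 1.5 2.0 2.0
```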

Data frames

  1. We will continue with the vectors type and measure from the previous task. Create a data frame called mydata with these two vectors as columns. You can use the function data.frame().

    Solution
    mydata <- data.frame(type, measure)
  2. Use the functions

    1. head(),
    2. tail(),
    3. summary(),
    4. and dim()

    to look at the data frame.

  3. Now, let’s say that we are only interested in sample type a. Look at the column called type using the dollar sign $.

    Solution
    mydata$type
  4. Use the == operator to find out which elements in this column have the value a. Type:

    mydata$type == "a"

    What you get is a vector of logical values: TRUE where type equals a and FALSE otherwise.

  5. Use the which() function with the expression above as argument. This gives you a vector with the positions in type where the value equals a. Assign this vector of positions to a variable called pos.

    Solution
    pos <- which(mydata$type=="a")
  6. Use square brackets to extract the subset of the data frame that contains type a (i.e. the rows in pos). Inside the square brackets you first specify which rows and then, after a comma, which columns you want to see. Remember from the lecture that leaving a position empty means that all rows (or columns) are used.

    Solution
    mydata[pos,]
  7. Assign the subset to a variable called mydata_out.

    Solution
    mydata_out <- mydata[pos,]
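The which() step is optional: a logical vector can also be used directly inside the square brackets, and base R's subset() does the same filtering using the column name. The sketch below compares the three ways of selecting the rows of type a on a small made-up data frame. One difference worth knowing: if the column contained NA, logical indexing would return a row of NAs for it, while which() skips NA values.

```r
# Small made-up data frame for illustration
mydata <- data.frame(
  type = factor(c("a", "b", "a", "c", "b")),
  measure = c(2, 3, 1, 3, 2)
)
dim(mydata)                                   # [1] 5 2  (rows, columns)

pos <- which(mydata$type == "a")              # positions 1 and 3
via_which   <- mydata[pos, ]                  # row positions, all columns
via_logical <- mydata[mydata$type == "a", ]   # logical vector, no which() needed
via_subset  <- subset(mydata, type == "a")    # column name used directly

nrow(via_which)                               # [1] 2
mydata[pos, "measure"]                        # [1] 2 1  rows in pos, one column
```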

Reading and writing text files

  1. We will now write mydata_out to a tab-separated text file using the write.table() function. Play around with the arguments

    1. col.names
    2. row.names
    3. quote
    4. sep

    in write.table() and see what happens. You can open the file in a simple text editor to see how the output changes when you change the arguments.

    Solution
    write.table(mydata_out,
          "labtest20240422.txt",
          col.names=TRUE, row.names=FALSE, quote=FALSE, sep="\t")
  2. Now use the read.table() function to read the same data into R again, and save it as mydata_in.

    Solution
    mydata_in <- read.table("labtest20240422.txt",
          header = TRUE, sep="\t", as.is = TRUE)
  3. Are mydata_in and mydata_out identical? Use the summary() function on each of them. Is there any difference?
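A round trip through a file shows why the two data frames are not identical. The sketch below uses a small made-up data frame and writes to a temporary file from tempfile() instead of labtest20240422.txt. With as.is = TRUE, read.table() keeps text as character, so type comes back as character rather than factor, and the row names are renumbered from 1 because row.names = FALSE left them out of the file.

```r
# Made-up example data; mydata_out is the subset of type "a"
mydata <- data.frame(type = factor(c("a", "b", "a")), measure = c(2L, 3L, 1L))
mydata_out <- mydata[mydata$type == "a", ]

# Write to a temporary file instead of labtest20240422.txt
out_file <- tempfile(fileext = ".txt")
write.table(mydata_out, out_file,
            col.names = TRUE, row.names = FALSE, quote = FALSE, sep = "\t")
readLines(out_file)       # a header line and two data lines, separated by tabs

mydata_in <- read.table(out_file, header = TRUE, sep = "\t", as.is = TRUE)

class(mydata_out$type)    # [1] "factor"
class(mydata_in$type)     # [1] "character"
rownames(mydata_out)      # [1] "1" "3"
rownames(mydata_in)       # [1] "1" "2"
identical(mydata_out, mydata_in)   # [1] FALSE
```

Note also that mydata_out$type still carries the unused level b from the full data, which summary() reports with a count of 0; droplevels() removes unused levels.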



Developed by Maria Nethander, 2017, Modified by Fanny Berglund, 2024
