Microbiome Helper 2 Introduction to R and R notebooks - LangilleLab/microbiome_helper GitHub Wiki
Authors: Robyn Wright
Modifications by: NA
Based on initial versions by: NA
Introduction to R
There are hundreds of different coding languages, and they tend to have different intended use cases. In bioinformatics, there are three that I believe are the most widely used today: bash, R, and Python. Lots of the initial analyses that we run on a server will use bash, although they may have components written in other programming languages, and most downstream and statistical analyses will then use R or Python. Most bioinformaticians will have a preference among these. My personal favourite is Python, because I think it is the easiest to customise plots/figures with, but I think that R is typically easier to learn and has packages with built-in functions for more analyses, so that is what we'll be using for most of the tutorials/workflows in this repository.
You will typically find that as you start to understand the basics of one programming language, if you read code in another language you might understand it, even if you wouldn't know exactly how to write it - a bit like speaking two relatively similar languages like Spanish and Portuguese, where if you learn one, you'll likely understand lots of the other even if you can't speak it. For each coding language, you'll find that there is the base language, which is a bit like the grammar in a spoken language and determines how you need to format your commands, and then there are packages (like plugins) that you will install on top of this where you'll be able to "call" different "functions", for example, to import a table from a file or to make a barplot of the data in that table.
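To make the idea of "calling functions" a little more concrete, here is a minimal sketch in R. The table of read counts is made up purely for illustration, and it is typed directly into the code so that the example runs without needing a file; in practice you would normally give read.csv() the path to a file instead.

```r
# Hypothetical example data: read counts for three made-up samples.
# The text = argument lets us import a table without needing a file on disk.
counts <- read.csv(text = "sample,reads
A,120
B,340
C,95")

# read.csv() and barplot() come from packages that ship with R
# (utils and graphics), so nothing extra needs to be installed here.
print(counts)

# Make a barplot of the reads column, labelled with the sample names
barplot(counts$reads,
        names.arg = as.character(counts$sample),
        ylab = "Number of reads")
```

Packages that don't ship with R would first need to be installed (for example with install.packages()) and then loaded with library() before their functions can be called.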
R is a free coding language that was designed for statistical computing and data visualisation. Because there are so many packages available for common things that we like to do with our microbiome data, it's what has tended to be used the most and what we will use here.
As with any program, there are lots of different versions of both R/Python and the packages that you will install for them. Slightly frustratingly, you'll find that some packages have different dependencies, meaning that they rely on a particular version of R, and the required version can differ between the packages that you use. As a general rule, one of the latest versions of R should have the most available support and be compatible with most packages. I'd therefore recommend downloading/installing the latest version before you start.
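If you're not sure which versions you currently have, you can check from within R. This is just a small sketch: the "stats" package is used because it ships with R, and the "4.0.0" minimum version is a made-up requirement for illustration rather than something any particular package needs.

```r
# Print the version of R that is currently running
print(R.version.string)

# Print the version of an installed package. "stats" ships with R, so this
# should always work; swap in the name of any package you have installed.
print(packageVersion("stats"))

# Check whether the running R is at least a given version.
# "4.0.0" is a hypothetical requirement used only as an example.
meets_requirement <- getRversion() >= "4.0.0"
print(meets_requirement)
```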
Introduction to RStudio
I've covered coding languages above, but we will often use an Integrated Development Environment (IDE) for working with the scripts and commands that we run. An IDE essentially provides a wrapper for the coding language itself where you can develop and test the code that you write, and it typically provides some other nice features, like panels where you can see your files or get help for the packages/functions that you are using. IDEs are typically nicer to look at and use than the software itself. For R, the most widely used IDE by far is RStudio. Others exist that are focused on Python, like Jupyter or Spyder, and there is nothing wrong with using a different one; we've just chosen to focus on RStudio as it's what we personally use.
You can download RStudio Desktop for your personal computer, but it's also available as a server version, and some servers, like our lab one or the AWS servers that we use, will already have it installed. It is useful for writing code in the same way that something like Word is for writing a paragraph: it will suggest spots where you may have made a mistake, and sometimes display additional information as you type. You can use it for writing and running individual scripts, in R as well as several other programming languages, and for R notebooks, which we'll discuss below.
When you open up RStudio, it's likely to show the following layout. On the left side there is a Console (as well as the other tabs Terminal and Jobs), and on the right there is an Environment panel (which also has other tabs) at the top and a Files panel (which again has other tabs) at the bottom. The Files panel shows the files that are available to you (in my case, all of the different files that I have on the server), and the other tabs let you e.g. view the plots that you make, or get help about the packages or functions that you're using. The Environment panel has information about the "objects" that you currently have imported into R. I won't go through all of this now, but you should become relatively familiar with it as you work through some of the tutorials/workflows.
Introduction to R markdown notebooks
While R notebooks allow you to work with R code, they also work really well for just keeping track of the code that you run in R or any other language, as well as allowing you to share objects between different code chunks that are written in different coding languages without exporting to a regular file and re-importing in the other language.
There are quite a few examples on the RStudio R markdown pages, so it is worth browsing through these to see what they are capable of first, and then coming back to how we are using them here.
To create an R notebook, simply go to File > New File > R Notebook in RStudio. You can read through the brief instructions and try running it! After you've familiarised yourself a little, you can choose a new title at the top, and then delete the rest. I would then make a new "Chunk" for each of the code chunks that I have below (click on the green C and choose "R"). You can make notes outside of these chunks if you like.
If you use the R notebook and make chunks, you'll also need to run the chunks! You can do that by clicking the green play button in the top right-hand corner of each chunk, or by pressing Command+Shift+Enter (Ctrl+Shift+Enter on Windows/Linux). You can run a single line by pressing Command+Enter (Ctrl+Enter on Windows/Linux).
Here you can see a new R notebook that includes three empty chunks:
I've also set up the document with the code_folding option set to hide, so that when we make the HTML document the code won't show by default, but you can still click to show it.
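As a rough sketch of what that setting can look like, the header at the very top of the notebook file might contain something like the following. This assumes the html_notebook output format that RStudio sets up for a new R Notebook, and the title is just a placeholder; the same code_folding option is also accepted by the html_document output format.

```yaml
---
title: "My analysis notebook"   # placeholder title, choose your own
output:
  html_notebook:
    code_folding: hide          # code is hidden by default but can be shown with a click
---
```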
Hopefully this is enough to get you started with R notebooks here, but we'll aim to have more explanations and some R notebook templates that you can use for some of the analysis workflows that we have here.