Methodology - BGIGPD/BestPractices4Pathogenomics GitHub Wiki

Introduction

Tutorial in Chinese

How does this course work?

  1. A wiki guide that includes:

    • The content and schedule of this course
    • A rapid introduction to the core concepts
    • A step-by-step guide to practicing the analysis
  2. A git repository that:

    • Holds the necessary scripts, which we prepare in advance for you
    • Acts as a demo project for practicing bioinformatics analysis:
      • Install or load environments with the necessary packages
      • Organize and store datasets
      • Edit and execute scripts
      • Commit changes to code
      • Push and pull changes to keep work updated
    • Lets you sync and share your work with your team

Preparation

1. Getting familiar with Linux shells

Shells are command-line interpreters that provide a way for users to interact with the operating system. They are used to execute commands, run programs, and manipulate files and directories.
Here are the most popular shells:

  • Bash (Bourne Again SHell): The most commonly used shell in Linux and Unix-like operating systems.
  • Zsh (Z Shell): A more advanced shell with features like syntax highlighting, command completion, and themes.
  • Fish (Friendly Interactive Shell): A user-friendly shell with features like syntax highlighting, command completion, and auto-suggestions.

To standardize teaching and to keep code execution and exercises consistent, we recommend that everyone use the Bash shell on our bastion host. Here is how to log in:

From local terminal:

ssh uomc-worker01.genomics.cn

Using Xshell:
[Screenshot: Xshell login]

For more details, see our guide on How to use bastion host.
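Once logged in, it can help to confirm which shell you are actually using. A minimal sketch, assuming a standard Linux environment where bash is on the PATH (the variable name BASH_INFO is only for illustration):

```shell
# Login shell recorded for your account (may differ from the shell running now)
echo "$SHELL"

# First line of the version banner of the bash found on your PATH
BASH_INFO=$(bash --version | head -n 1)
echo "$BASH_INFO"
```

If the second line does not mention GNU bash, you are probably not on the bastion host or bash is not installed.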

The following is some extra guidance for working without our bastion server.

For Linux and macOS users, Bash is already installed on your system; simply open Terminal and you are ready to go.

For Windows users, getting Bash is a little harder, but you still have some options:

  1. Install a Linux distribution, such as Ubuntu or Fedora, on your Windows machine.
  2. Use a terminal emulator that supports Bash, such as Git Bash or Cygwin.
  3. [Recommended] Windows Subsystem for Linux (WSL): a feature of Windows 10 and later that allows you to run a Linux distribution directly on your Windows machine.

More details to learn shell:

2. Using a virtual environment manager like conda

Conda is a popular package and environment management system used primarily for installing software packages, managing dependencies, and creating isolated environments.
If you have struggled with installing packages manually, conda is worth trying.

The easy way:

bash /home/fangchao/Miniconda.sh

The hard way:

  1. Search for conda
  2. Find the installation guide
  3. Download and install
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh 

If the installation succeeds, your terminal should display a message like the one below:

[fangchao@localhost ~]$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
PREFIX=/home/fangchao/miniconda3
Unpacking payload ...

Installing base environment...

Preparing transaction: ...working... done
Executing transaction: ...working... done
installation finished.
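The batch install above (-b) does not modify your shell start-up files, so conda may not be on your PATH yet. A minimal sketch of making it available, assuming the ~/miniconda3 prefix used above; MINICONDA_DIR is an illustrative variable name:

```shell
# Assumption: Miniconda was installed to ~/miniconda3 as in the commands above
MINICONDA_DIR="$HOME/miniconda3"

if [ -x "$MINICONDA_DIR/bin/conda" ]; then
    # Make conda available in the current shell session
    . "$MINICONDA_DIR/bin/activate"
    # Register conda in ~/.bashrc so that new Bash sessions find it
    conda init bash
    # Confirm that the command now resolves
    conda --version
else
    echo "No conda found under $MINICONDA_DIR; re-check the install step" >&2
fi
```

If you installed to a different prefix, adjust MINICONDA_DIR accordingly.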

3. Using git for version control

Git is a widely used distributed version control system for tracking changes in source code during software development. It is designed to handle everything from small to very large projects with speed and efficiency.

After installing conda and activating the base environment, git should already be installed.

If this is your first time using git, however, you may need to configure your author information so that we can distinguish your work from that of others.

git config --global user.name "YourName" # Replace `YourName` with your own name
git config --global user.email "YourEmail" # Replace `YourEmail` with your own email

3.1 Clone this course repository

Go to the folder where you want to store the course repository. We assume the default home folder /home/<yourname> (also known as $HOME or ~), which is where you land once you log in to the bastion server. Then run the following command:

git clone https://github.com/BGIGPD/BestPractices4Pathogenomics.git

You should see the following output if nothing goes wrong:

Cloning into 'BestPractices4Pathogenomics'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (6/6), done.

After cloning the repository, you should see a new folder named BestPractices4Pathogenomics in your current directory.
Go into it:

cd BestPractices4Pathogenomics

3.2 Create your own working branch

Create a new branch for your own work.

IMPORTANT: Replace YourName with your own name or any other name you want to use.

git checkout -b YourName

You should see the following output if nothing goes wrong:

Switched to a new branch 'YourName'
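To double-check which branch you are on, you can ask git directly. The sketch below runs in a throwaway repository created with mktemp so that it works anywhere; inside the course repository only `git branch --show-current` is needed. It assumes git 2.22 or newer, where that option is available, and SCRATCH and CURRENT_BRANCH are illustrative variable names.

```shell
# Throwaway repository; nothing here touches the course repository
SCRATCH=$(mktemp -d)
git init --quiet "$SCRATCH"
cd "$SCRATCH"

# Same command as above: create a new branch and switch to it
git checkout -b YourName

# Print the name of the branch you are currently on
CURRENT_BRANCH=$(git branch --show-current)
echo "$CURRENT_BRANCH"
```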

3.3 Initiate our work

Create a new folder and a README file in it.

mkdir YourName

Then enter this folder:

cd YourName

Create and edit the README file with vi (on many systems the vi command launches vim; learn more basic usage here):

vi README.md

In the vi editor, press 'i' first to enter --INSERT-- mode.

Add a Markdown level-1 heading, "About this project", write something about your work, then save it. Here is an example:

# About this project

This is my work.

To save your content, press 'ESC' to leave --INSERT-- mode, then type:

:wq

to save and quit the vi editor. Here w means write (save) and q means quit.
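If you prefer not to use an interactive editor, the same file can be written from the shell with a here-document. A minimal sketch; note that it overwrites any README.md already present in the current folder:

```shell
# Write README.md in the current folder without opening an editor.
# The quoted 'EOF' keeps the shell from expanding anything in the text.
cat > README.md <<'EOF'
# About this project

This is my work.
EOF

# Show the result
cat README.md
```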

3.4 Commit your work to the repository

git status
git add README.md
git status
git commit -m 'My work begin' README.md

3.5 Sync with your remote branch

git push origin YourName

Well done! You are now familiar with a git-style version control workflow.

4. Install R and RStudio

4.1 Install R

R (The R Project for Statistical Computing) is a programming language and environment commonly used for statistical computing, data analysis, and visualization. It is widely used by statisticians, data miners, and researchers for a variety of tasks involving data manipulation and analysis.
Since displaying images from a remote server is often complex and inconvenient, we recommend installing it on your local system (laptop).

Find the download page, select a mirror close to your location, and download the proper version of R for your operating system.
[Screenshot: Download and install R]

4.2 Install RStudio

Go to the RStudio website (RStudio) and download the proper version of RStudio for your operating system.
[Screenshot: RStudio installer]

4.3 Start RStudio

rstudio

Extension: How to push/pull to GitHub.com

Method 1: Temporarily use another git code host

git remote add gitea https://gitea.biochao.cc/fangchao/BestPractices4Pathogenomics.git

git push gitea YourName

Replace YourName with the branch name you chose.
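Before pushing, it can be worth checking which remotes a repository knows about. The sketch below uses a throwaway repository so that it runs without network access; inside the course repository only `git remote -v` is needed. DEMO and GITEA_URL are illustrative variable names.

```shell
# Throwaway repository; nothing here touches the course repository
DEMO=$(mktemp -d)
git init --quiet "$DEMO"
cd "$DEMO"

# Register the extra remote under the name "gitea" (same URL as above)
git remote add gitea https://gitea.biochao.cc/fangchao/BestPractices4Pathogenomics.git

# List every configured remote with its fetch and push URLs
git remote -v

# Look up the URL of a single remote
GITEA_URL=$(git remote get-url gitea)
echo "$GITEA_URL"
```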

Method 2: For this demo, I have worked out an easy (and also temporary) way for us to do so.

1. Log in to our bastion host and start sftp

[Screenshot: Bastion Host]

2. Copy the RSA key from the bastion host

Copy the RSA key from the bastion host to your local machine, under ~/.ssh/.
[Screenshot: FileZilla]

3. Set up the ssh config

Edit ~/.ssh/config:

Host github.com
    HostName github.com
    IdentityFile ~/.ssh/bot4demo_id_rsa

IMPORTANT NOTE

Please remove the content above from ~/.ssh/config after this course, or once you register your own GitHub account. Otherwise it will affect your future work.
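ssh refuses private keys whose file permissions are too open, so it can help to tighten them and test the connection before pushing. A minimal sketch, assuming the key was saved as ~/.ssh/bot4demo_id_rsa as in the config above; KEY is an illustrative variable name:

```shell
# Assumption: the key copied from the bastion host lives at this path
KEY="$HOME/.ssh/bot4demo_id_rsa"

if [ -f "$KEY" ]; then
    # Private keys must be readable by the owner only
    chmod 600 "$KEY"
    # Ask GitHub to identify the key; -T disables terminal allocation.
    # GitHub closes the session after greeting you, so a non-zero exit
    # status is expected here even when authentication succeeds.
    ssh -T git@github.com || true
else
    echo "Key not found at $KEY; copy it from the bastion host first" >&2
fi
```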

4. Make sure we are using the ssh URL as the remote:

git remote set-url origin git@github.com:BGIGPD/BestPractices4Pathogenomics.git

5. Try git push

Go back to section 3.5 and try again.

6. Learn more about basic git operations

Read the following sections of the Git Book and try them out!

  1. Getting Started
    1.2 A Short History of Git
    1.3 What is Git?
    1.4 The Command Line
    1.5 Installing Git
    1.6 First-Time Git Setup
    1.7 Getting Help
    1.8 Summary
  2. Git Basics
    2.1 Getting a Git Repository
    2.2 Recording Changes to the Repository
    2.3 Viewing the Commit History
    2.4 Undoing Things
    2.5 Working with Remotes