Home - Karthikeyan-Lab-Caltech/Wiki GitHub Wiki

Karthikeyan Lab HPC Wiki

1. Getting Started

Logging into HPC

Using Terminal (Mac), CMD (Windows), or iTerm2:

  • Login with SSH: ssh [email protected]
  • Then type your Caltech password (nothing will appear on screen as you type)

2. Essential Commands

Essential Commands

3. Folder Structure and Organization

Because HPC groups are migrating from /central to /resnick, the group will need to move from /central/groups/enviromics to /resnick/groups/enviromics.

The move is a good opportunity to reorganize the HPC. The current HPC group space is quite disorganized, and the goals of the reorganization are to prevent data redundancy, streamline tool sharing, and keep the group organized.

The overall organization will be split into 3 shared directories as well as an individual folder for each user.

/resnick/groups/enviromics/
│
├── tools/             # Shared software
├── database/          # Downloaded tool databases
├── data/              # Shared datasets
└── $USER/             # Personal space
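
As a minimal sketch, the layout above could be created with a small helper. The function name make_group_layout is hypothetical, and the base directory is passed as an argument so the layout can be tried in a scratch location before touching the shared space.

```shell
# Hypothetical helper: create the three shared directories plus one
# personal folder under a given base directory.
make_group_layout() {
    base="$1"              # e.g. /resnick/groups/enviromics
    user="${2:-$USER}"     # personal folder name, defaults to the current user

    # -p creates missing parents and does not fail if a directory already exists
    mkdir -p "$base/tools" "$base/database" "$base/data" "$base/$user"
}
```

For example, make_group_layout /resnick/groups/enviromics would create the four directories shown in the tree, assuming you have write access to the group directory.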

Tools Folder

We use many tools as a group. The tools folder is for tools that are used frequently, especially those that are difficult to install. Because changes to a shared tool can affect everyone who uses it, any modifications or additions will be discussed during lab meeting. Install and test a tool in your own personal folder first; if desired, we can then migrate it into the tools folder, located at /central/groups/enviromics/tools.

Naming Conventions - TBD

Update the tools list

Database Folder

Many tools require tool-specific databases. These tend to be quite large and take a long time to download. The purpose of this shared folder is to prevent multiple downloads of the same database and to track version history. Located at /central/groups/enviromics/database.

Naming Conventions - TBD

When any database is added or updated, there are 2 steps to take:

  1. Update the database list
  2. Set Read-Only permission
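
The two steps above could be sketched as follows. This is only an illustration: the function name register_database, the list file, and its tab-separated columns (name, version, date) are assumptions, since the wiki does not yet define the list format.

```shell
# Hypothetical helper: record a database in a list file, then make it read-only.
register_database() {
    db_dir="$1"      # directory holding the downloaded database
    version="$2"     # version label to record
    list_file="$3"   # path to the database list (format is an assumption)

    # Step 1: append name, version, and today's date to the list
    printf '%s\t%s\t%s\n' "$(basename "$db_dir")" "$version" "$(date +%Y-%m-%d)" >> "$list_file"

    # Step 2: remove write permission for everyone, recursively
    chmod -R a-w "$db_dir"
}
```

Removing write permission with chmod -R a-w leaves read and execute bits untouched, so the database stays usable but cannot be modified by accident.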

Data Folder

Having sequencing data available to everyone should promote open analysis and collaboration. The data folder can be used to store one's own sequencing data or data downloaded from outside sources. Once downloaded, use symbolic links when working with the data: ln -s /path/to/database ./ Located at /central/groups/enviromics/data.

Naming Conventions - TBD

When any data is added or updated, there are 2 steps to take:

  1. Update the data list
  2. Set Read-Only permission
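
The symbolic-link recommendation for shared data could look like the sketch below. The function name link_dataset is hypothetical; the only command it relies on is ln -s, as in the text.

```shell
# Hypothetical helper: link a shared dataset into a working directory
# instead of copying it.
link_dataset() {
    src="$1"       # absolute path to the dataset in the shared data folder
    workdir="$2"   # directory where the link should appear

    # -s creates a symbolic link; the trailing slash makes the link
    # take the dataset's own name inside workdir
    ln -s "$src" "$workdir/"
}
```

Using an absolute path for the source keeps the link valid no matter which directory it is created from.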

4. Job Submission

SLURM Commands

5. Modules and Environments

Modules and Environments

6. Text Processing

Text Processing

7. GitHub

GitHub

8. To-Add

To-Add
