Getting Started - k-ngo/CATMD GitHub Wiki

Getting Started with CATMD

Welcome to CATMD! This guide will help you get started using CATMD on Google Colab, on your local machine, or on a remote workstation. Follow the instructions below for your preferred setup.


1. How to Use CATMD on Google Colab?

CATMD can be run easily on Google Colab, using Google's free cloud resources with no installation needed. It only takes a few minutes to get started:

  1. Sign In to Google Colab

    • Open the CATMD notebook.
    • Sign in with your Google account.
    • If you are new to Google Colab, visit this introduction page to learn what it is.
  2. Install Packages and Import Modules

    • Scroll to the section titled "1️⃣ Install Necessary Packages".
    • Hover over the code block, then click the run arrow to its left to execute it.
    • First, you will be connected to a free Google Compute Engine instance with 12.7 GB RAM, 107.7 GB storage, and 2 CPU threads (for multiprocessing).
      • Note: Doesn't work? In the top-right corner of the Colab window, click the downward arrow to the right of the "Connect" button, then click "Connect to a hosted runtime".
      • Upgrade Option: For more memory, disk space, or computing power, consider running locally; the setup only takes a few minutes (see section 2).
    • The script then installs all Python modules required to run CATMD. This takes only a few minutes, and nothing is installed on your computer.
    • Next, go to "2️⃣ Import Modules" and click the run arrow to import them.
  3. Load MD Topology and Coordinates Files

    • Navigate to the "3️⃣ Load MD Trajectory Files" section (see options below).
    • Upload or specify your topology and coordinates files as needed.
  4. That's it! You Can Now Start Using CATMD

    • Click the ⋮☰ button on the left side of the screen to open the Table of Contents and navigate to different sections.
    • Browse the notebook for tools (e.g., analysis scripts, plotting).
    • Select the desired groups of atoms, residues, or segments for analysis (see Selection Language Tutorial).
    • Click the run arrow on any tool’s code block to run it, or right-click anywhere outside of text fields and choose "Run the focused cell" from the menu.
    • If you see tools marked as , start by testing with small atom selections and fewer frames (use a subset of frames or increase step size). Consider running locally for big systems (see section below).
  5. Save Outputs

    • Manually save plotted figures by right-clicking them in your browser and selecting "Save Image As...".
    • Or, use the "Download Output Folder as ZIP" tool at the end to compress outputs into a .zip file and download it.
    • Note: Outputs are not saved automatically to your local machine when running on Google's cloud instance and will be lost when the instance expires or if you remain idle too long.
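The advice in step 4 to start with small selections and fewer frames can be made concrete with a back-of-the-envelope estimate. The sketch below assumes coordinates are held in memory as 32-bit floats (x, y, z for every atom in every frame), as is common in MD analysis libraries; CATMD's actual memory use depends on the tool and may differ, and both function names are hypothetical.

```python
BYTES_PER_COORD = 4   # assumption: coordinates stored as 32-bit floats
COORDS_PER_ATOM = 3   # x, y, z


def frames_selected(n_frames: int, step: int = 1) -> int:
    """Number of frames kept when reading every `step`-th frame, starting at frame 0."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return len(range(0, n_frames, step))


def estimate_coordinate_memory_gb(n_atoms: int, n_frames: int, step: int = 1) -> float:
    """Rough size in GB (1 GB = 1e9 bytes) if all selected frames are held in memory at once."""
    kept = frames_selected(n_frames, step)
    return n_atoms * kept * COORDS_PER_ATOM * BYTES_PER_COORD / 1e9


if __name__ == "__main__":
    full = estimate_coordinate_memory_gb(50_000, 10_000)         # every frame
    strided = estimate_coordinate_memory_gb(50_000, 10_000, 10)  # every 10th frame
    print(f"all frames: {full:.1f} GB, step 10: {strided:.1f} GB")
```

For a 50,000-atom system with 10,000 frames this prints `all frames: 6.0 GB, step 10: 0.6 GB`. Raising the step size shrinks the estimate proportionally, which matters on a 12.7 GB instance where the operating system and the notebook itself also need memory.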

2. How to Use CATMD on Your Local Machine?

a. Using Your Home Computer (Jupyter Notebook)

  1. Setup Conda

    • Install Miniconda from anaconda.com.
    • Create a new Conda environment and install Jupyter using the terminal by executing the following commands sequentially:
      conda create -n catmd python=3.11
      conda activate catmd
      conda install -c conda-forge notebook
      conda install pip
  2. Launch Jupyter Notebook

    • Create a folder to save your results, then navigate to it in the terminal (with the catmd environment still activated).
    • Start Jupyter Notebook by entering this command in the terminal:
      jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0
      
    • Copy the local URL, including its token, from the terminal output (e.g., http://localhost:8888/... or http://127.0.0.1:8888/...).
  3. Connect Colab to Local Runtime

    • In Google Colab, click the dropdown next to "Connect" (top-right).
    • Select "Connect to a local runtime".
    • Paste the URL from Step 2 and click "Connect".
    • Note: Still failing to connect? Use Docker instead (see section 2b).
  4. Install Packages and Import Modules

    • Navigate to "1️⃣ Install Necessary Packages", then click the run arrow to its left to execute it. The script installs all Python modules required to run CATMD into the "catmd" Conda environment. You only need to do this step once.
      • Installation errors? Run conda install -c conda-forge compilers before launching Jupyter Notebook.
    • Next, go to "2️⃣ Import Modules" and click the run arrow to import them.
      • Errors when importing modules? Run conda install -c conda-forge xorg-libxrender before launching Jupyter Notebook.
    • Navigate to the "3️⃣ Load MD Trajectory Files" section (see options below) and click the run arrow. Here, you can upload or specify your topology and coordinates files as needed.
  5. That's it! You Can Now Start Using CATMD

    • Click the ⋮☰ button on the left side of the screen to open the Table of Contents and navigate to different sections.
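If you launch Jupyter from a script, or just want to avoid copying the wrong line, the URL from step 2 can be picked out of the terminal output with a regular expression. This is a minimal sketch: `find_local_jupyter_urls` is a hypothetical helper (not part of CATMD), and the sample text is made up, since the exact banner Jupyter prints varies between versions.

```python
import re

# Matches http://localhost:PORT/... or http://127.0.0.1:PORT/... up to the next whitespace.
LOCAL_URL = re.compile(r"http://(?:localhost|127\.0\.0\.1):\d+/\S*")


def find_local_jupyter_urls(terminal_output: str) -> list[str]:
    """Return every local URL (token included) found in the text, in order of appearance."""
    return LOCAL_URL.findall(terminal_output)


# Made-up sample text; real Jupyter output looks different but contains similar URLs.
sample_output = """\
To access the server, open one of these URLs in a browser:
    http://localhost:8888/tree?token=abc123
    http://127.0.0.1:8888/tree?token=abc123
"""

if __name__ == "__main__":
    for url in find_local_jupyter_urls(sample_output):
        print(url)
```

Either returned URL can be pasted into Colab's "Connect to a local runtime" dialog; both point to the same server.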

b. Using Your Home Computer (Docker)

  1. Install Docker

  2. Pull the Colab Runtime Image (one-time)

    • The CATMD Docker-based setup uses a special Colab Runtime image.
    • Open a terminal and run:
      docker pull us-docker.pkg.dev/colab-images/public/runtime
    • Note: You only need to do this once. After downloading, Docker will cache the image locally and reuse it automatically without re-downloading.
  3. Launch the Colab Runtime

    • Navigate to the folder where you want CATMD to read and write files. This becomes your working directory: only this folder is mounted into the container (at /content), so CATMD will not be able to access files outside it.
    • Launch the container with the following command:
      docker run -p 127.0.0.1:9000:8080 \
        -v "$(pwd)":/content \
        -w /content \
        us-docker.pkg.dev/colab-images/public/runtime
    • In the resulting output, find the line "Or copy and paste one of these URLs:". The two URLs listed right below it each end with "(type=jupyter)".
    • Copy one of them, leaving out the trailing "(type=jupyter)".
  4. Connect Colab to the Local Docker Runtime

    • In Google Colab, click the dropdown next to "Connect" (top-right corner).
    • Select "Connect to a local runtime".
    • In the address field, paste the URL you copied above:
      http://127.0.0.1:9000/?token=...
      
    • Click "Connect".
  5. Run CATMD Normally

    • Once connected, you can follow the normal CATMD workflow.
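Because the docker run command in step 3 mounts only the current folder ($(pwd)) at /content and starts there (-w /content), every file CATMD sees must live under that folder. The sketch below uses a hypothetical helper, `to_container_path`, to show how a host path maps into the container and why a path outside the working directory cannot be reached. It assumes POSIX-style absolute paths (Linux or macOS) and does not resolve symlinks.

```python
import posixpath
from pathlib import PurePosixPath

CONTAINER_ROOT = PurePosixPath("/content")  # mount point used by -v "$(pwd)":/content


def to_container_path(host_workdir: str, host_file: str) -> str:
    """Map an absolute host path to its location inside the container.

    Raises ValueError if the file lies outside the mounted working directory,
    i.e., CATMD running in the container would not be able to see it.
    """
    workdir = PurePosixPath(posixpath.normpath(host_workdir))
    target = PurePosixPath(posixpath.normpath(host_file))
    relative = target.relative_to(workdir)  # raises ValueError if target is not under workdir
    return str(CONTAINER_ROOT / relative)


if __name__ == "__main__":
    print(to_container_path("/home/me/project", "/home/me/project/data/run.dcd"))
    try:
        to_container_path("/home/me/project", "/home/me/other/run.dcd")
    except ValueError:
        print("outside the mounted folder: not visible in the container")
```

The first call prints /content/data/run.dcd. To analyze files stored elsewhere, either copy them into the working directory or start the container from a folder that contains them.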

c. Using a Remote Workstation/Server

If you're running CATMD on a remote server (e.g., an HPC login node or lab workstation), you can still connect it to Colab. This setup lets you work interactively in the Colab interface while the heavy computation runs on your remote machine.

  1. Launch Jupyter Notebook on the Remote Machine

    • SSH into your remote machine.
    • Follow steps 1 and 2 of section 2a (Jupyter Notebook) or steps 1 to 3 of section 2b (Docker) to set up and launch Jupyter Notebook or the Docker runtime on your remote machine.
    • Copy the full URL with token from the terminal (e.g., http://localhost:8888/...).
  2. Tunnel the Jupyter Port Back to Your Local Machine

    • On your local machine, open a new terminal and run one of the following, depending on whether you use Jupyter or Docker. Replace <your_username>@<remote_ip> with your username and your server's address:
      (For Jupyter) ssh -N -L 8888:localhost:8888 <your_username>@<remote_ip>
      (For Docker)  ssh -N -L 9000:localhost:9000 <your_username>@<remote_ip>
    • Keep this terminal open to stay connected. The command prints no output: -N tells ssh to forward the port without running a remote command.
  3. Connect Colab to the Remote Runtime

    • In Google Colab, choose "Connect to a local runtime"
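    • Paste the URL you copied from the remote machine's terminal, e.g., http://localhost:8888/... (Jupyter) or http://127.0.0.1:9000/... (Docker). The tunnel forwards the same port number on your local machine, so the URL works unchanged.
    • Click "Connect". Colab now connects through the tunnel on your local machine, while your code executes on the remote machine using its resources.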
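The tunnel commands in step 2 differ only in the port, so they are easy to script. The sketch below builds the same `ssh -N -L PORT:localhost:PORT user@host` command as an argument list; `tunnel_command` and `TUNNEL_PORTS` are hypothetical names, and the user and host in the example are placeholders.

```python
import shlex

# Port forwarded for each setup, matching the commands in step 2.
TUNNEL_PORTS = {"jupyter": 8888, "docker": 9000}


def tunnel_command(user: str, host: str, setup: str = "jupyter") -> list[str]:
    """Build the ssh port-forwarding command for the given setup as an argument list."""
    port = TUNNEL_PORTS[setup]
    # The same port on both ends keeps the URL copied from the remote terminal valid locally.
    return ["ssh", "-N", "-L", f"{port}:localhost:{port}", f"{user}@{host}"]


if __name__ == "__main__":
    # Placeholder user and host; replace with your own.
    # To open the tunnel from Python (blocks while the tunnel is open):
    #   import subprocess; subprocess.run(tunnel_command(...), check=True)
    print(shlex.join(tunnel_command("alice", "lab-server.example.org", "docker")))
```

This prints `ssh -N -L 9000:localhost:9000 alice@lab-server.example.org`, the Docker variant of the command above with the placeholders filled in.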

3. Options to Load Trajectories

CATMD provides three methods to load MD trajectory files. A full list of accepted file formats for topology and coordinates files can be found here.

Option 1: Upload Files (best for small files: <50,000 atoms, <100 frames)

  • How: In the "Load MD Trajectory Files" section, select Upload Files from the dropdown. Click the run arrow, then use the upload buttons to provide topology and coordinates files.

Option 2: Specify Paths (ideal for large files; fast access via Google Drive or local storage)

  • How:
    • On Colab: Select Specify Paths. For large files, upload to Google Drive first (faster transfer speeds), then:
      • Run the code with loading_mode set to Specify Paths once to mount your Drive.
      • A popup window will appear, asking your permission to let the notebook mount your Drive.
      • After mounting the Drive, set paths to your topology and coordinates files in the corresponding fields, e.g. drive/MyDrive/CATMD/step5_input.psf and drive/MyDrive/CATMD/1000ns.dcd.
      • Run the code with loading_mode set to Specify Paths again.
      • Note: CATMD only reads files from your mounted Drive; it does not modify your existing files or add new files to your directories. To keep the output, save it manually or use the "Download Output Folder as ZIP" tool at the end of the notebook.
    • Locally: Specify paths relative to your working directory (e.g., ./step5_input.psf), which is the directory in which you launched Jupyter Notebook (or the Docker container) from the terminal.
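Most "Specify Paths" problems come down to a mistyped path. On Colab the notebook's working directory is normally /content, and the example paths above (drive/MyDrive/CATMD/...) imply that Drive is mounted at /content/drive, the mount point used in Colab's own `google.colab.drive.mount` examples. A quick check like the hypothetical `missing_inputs` helper below, run in a cell before loading, reports which paths do not resolve; it is a sketch, not part of CATMD.

```python
from pathlib import Path


def missing_inputs(*paths: str, base_dir: str = ".") -> list[str]:
    """Return the paths that do not point to an existing file (relative paths resolve from base_dir)."""
    base = Path(base_dir)
    # Joining an absolute path onto base yields the absolute path itself.
    return [p for p in paths if not (base / p).is_file()]


if __name__ == "__main__":
    # Example paths from the "Specify Paths" instructions; on Colab they resolve from /content.
    missing = missing_inputs(
        "drive/MyDrive/CATMD/step5_input.psf",
        "drive/MyDrive/CATMD/1000ns.dcd",
    )
    print("All input files found." if not missing else f"Not found: {missing}")
```

An empty result means both files exist where the loader will look for them.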

Option 3: Auto-Load from Directory (convenient for reloading pre-uploaded files or loading local runs)

  • How: Select Auto-Load from Directory. CATMD detects the first topology and coordinates files in the current directory.
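Auto-Load can be pictured as a directory scan that picks one topology file and one coordinates file. The sketch below is an assumption about how such a scan might work, not CATMD's actual code: the extension sets are a hypothetical subset of common MD formats (the full list of accepted formats is referenced at the start of this section), and "first" is taken to mean alphabetical order, which may not match CATMD's behavior.

```python
from pathlib import Path
from typing import Optional

# Hypothetical subsets of common MD formats; CATMD's accepted list may differ.
TOPOLOGY_EXTS = {".psf", ".prmtop", ".parm7", ".tpr", ".pdb", ".gro"}
COORDINATE_EXTS = {".dcd", ".xtc", ".trr", ".nc"}


def first_with_extension(directory: str, extensions: set[str]) -> Optional[Path]:
    """Return the alphabetically first file in `directory` whose suffix is in `extensions`."""
    candidates = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    return candidates[0] if candidates else None


def auto_detect(directory: str = ".") -> tuple[Optional[Path], Optional[Path]]:
    """Pick one topology file and one coordinates file from `directory` (None if absent)."""
    return (first_with_extension(directory, TOPOLOGY_EXTS),
            first_with_extension(directory, COORDINATE_EXTS))


if __name__ == "__main__":
    topology, coordinates = auto_detect(".")
    print("topology:", topology, "| coordinates:", coordinates)
```

A rule like this is why Auto-Load works best in a folder holding a single system: with several trajectories present, only one of them is picked.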

Recommendations

  • Large Files: Upload to Google Drive first, then use "Specify Paths" in Colab for faster transfers. Files already on Google Drive are available immediately in later sessions once you mount your Drive. Files uploaded only to the Colab instance are temporary and are deleted when the instance expires.
  • Local Runs: Use "Specify Paths" or "Auto-Load" based on your file organization.
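The recommendations above, together with the size guideline given for Option 1 (<50,000 atoms, <100 frames), amount to a simple decision rule. The hypothetical function below encodes that rule as a sketch; the thresholds are guidelines from this page, not hard limits enforced by CATMD.

```python
def recommend_loading_option(n_atoms: int, n_frames: int, on_colab: bool) -> str:
    """Suggest a loading option using the rough size guideline from this page."""
    small = n_atoms < 50_000 and n_frames < 100
    if on_colab:
        # Small systems upload quickly; larger ones are faster via Google Drive.
        return "Upload Files" if small else "Specify Paths (via Google Drive)"
    # Locally, files are already on disk, so uploading is unnecessary.
    return "Specify Paths or Auto-Load from Directory"


if __name__ == "__main__":
    print(recommend_loading_option(120_000, 5_000, on_colab=True))
```

For a 120,000-atom, 5,000-frame trajectory on Colab this suggests "Specify Paths (via Google Drive)".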
