Features Database - KosinskiLab/AlphaPulldown GitHub Wiki
Instead of generating feature files locally, you can download them from the AlphaPulldown Features Database, which contains precomputed protein features for major model organisms.
You can browse the full list and download individual features at https://alphapulldown.s3.embl.de/index.html or https://s3.embl.de/alphapulldown/index.html.
Installation
> [!NOTE]
> For EMBL cluster users: you can access the directory of precomputed feature files directly at `/g/alphafold/input_features/`.
To access the Features Database, you need to install the MinIO Client (mc).
Steps:
- Download the `mc` binary.
- Make the binary executable.
- Move it to a directory on your `PATH` for system-wide access.
Example for Linux on the AMD64 (x86_64) architecture:
```bash
curl -O https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
sudo mv mc /usr/local/bin/
```
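If you cannot use `sudo` (for example on a shared cluster) or your machine is not AMD64, the sketch below picks the download URL from `uname -m` and installs `mc` into `~/.local/bin` instead. It assumes a Linux host and assumes MinIO publishes an ARM64 build at the analogous `linux-arm64` URL; only the `linux-amd64` URL appears on this page. The function names `mc_download_url` and `install_mc_user` are hypothetical.

```bash
# Sketch: choose the mc build for this CPU and install it without sudo.
# Assumptions: Linux host; the linux-arm64 URL mirrors the documented linux-amd64 one.

mc_download_url() {
  # Map a `uname -m` value (default: this machine) to an mc download URL.
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64)  echo "https://dl.min.io/client/mc/release/linux-amd64/mc" ;;
    aarch64|arm64) echo "https://dl.min.io/client/mc/release/linux-arm64/mc" ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
}

install_mc_user() {
  # Download mc into a user-writable directory (default: ~/.local/bin).
  local dest="${1:-$HOME/.local/bin}" url
  url="$(mc_download_url)" || return 1
  mkdir -p "$dest" || return 1
  curl -fsSL -o "$dest/mc" "$url" || return 1
  chmod +x "$dest/mc"
}
```

Source the snippet and run `install_mc_user`. If `~/.local/bin` is not on your `PATH` yet, add it, for example with `export PATH="$HOME/.local/bin:$PATH"` in your shell profile.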
Verify installation:
To ensure mc is correctly installed, you can run:
```bash
mc --help
```
Configuration
Set up an alias for easy access to the AlphaPulldown Features Database hosted at EMBL:
```bash
mc alias set embl https://s3.embl.de "" "" --api S3v4
```
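The two empty strings are the access key and secret key; leaving them empty gives anonymous access to the public bucket. The alias then lets you address the Features Database through paths that begin with `embl/`, much like a local directory.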
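In batch or job scripts it can help to create the alias only when it is missing, so the same script works both on a fresh machine and on one that is already configured. A minimal sketch, assuming `mc alias list embl` exits with a non-zero status when no alias of that name exists; `ensure_embl_alias` is a hypothetical helper name.

```bash
# Sketch: create the `embl` alias only if it is not configured yet.
# Assumption: `mc alias list <name>` fails for an unknown alias.
ensure_embl_alias() {
  if ! mc alias list embl >/dev/null 2>&1; then
    # Empty access and secret keys: anonymous access to the public bucket.
    mc alias set embl https://s3.embl.de "" "" --api S3v4
  fi
}
```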
Downloading Features
Once `mc` is installed and configured, you can access the Features Database. The `mc` subcommands mimic standard Unix commands such as `ls` and `cp`.
List available organisms:
To view the list of available organisms with precomputed feature files, run:
```bash
mc ls embl/alphapulldown/input_features
```
Each organism directory contains compressed .pkl.xz feature files, named according to their UniProt ID.
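If you know a UniProt ID but not which organism directory it is stored under, `mc find` can search the whole tree by file name. This is a sketch with hypothetical function names; it assumes `mc find` prints one full remote path per match, and because it walks every organism directory it may take a while.

```bash
# Sketch: locate a UniProt ID across all organism directories.
FEATURES_ROOT="embl/alphapulldown/input_features"

find_feature() {
  # Print the full remote path(s) of <UniProt ID>.pkl.xz.
  mc find "$FEATURES_ROOT" --name "$1.pkl.xz"
}

find_feature_organism() {
  # Print only the organism directory name for each match.
  local path
  find_feature "$1" | while IFS= read -r path; do
    basename "$(dirname "$path")"
  done
}
```

For example, `find_feature_organism Q6BF25` should print `Escherichia_coli` if the file is stored where the download example below expects it.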
Download specific protein features:
For example, to download the feature file for the protein with UniProt ID Q6BF25 from Escherichia coli, use:
```bash
mc cp embl/alphapulldown/input_features/Escherichia_coli/Q6BF25.pkl.xz Q6BF25.pkl.xz
```
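To fetch features for a list of proteins from one organism, you can loop over a text file of UniProt IDs and call `mc cp` once per ID. A minimal bash sketch; `ids.txt`, `feature_path`, and `download_features` are hypothetical names, and IDs that cannot be copied (for example because they are not in the database) are reported on stderr rather than stopping the loop.

```bash
# Sketch: download the feature files for a list of UniProt IDs (bash).
FEATURES_ROOT="embl/alphapulldown/input_features"

feature_path() {
  # Remote path for organism $1 and UniProt ID $2.
  printf '%s/%s/%s.pkl.xz\n' "$FEATURES_ROOT" "$1" "$2"
}

download_features() {
  # Usage: download_features <organism> <id_file> [output_dir]
  local organism="$1" id_file="$2" outdir="${3:-.}" id
  mkdir -p "$outdir" || return 1
  while IFS= read -r id || [ -n "$id" ]; do
    id="${id//[[:space:]]/}"          # drop spaces and Windows CR characters
    [ -z "$id" ] && continue          # skip blank lines
    if ! mc cp "$(feature_path "$organism" "$id")" "$outdir/$id.pkl.xz"; then
      echo "not downloaded: $id" >&2
    fi
  done < "$id_file"
}
```

For example, `download_features Escherichia_coli ids.txt features/` copies each listed protein, such as `Q6BF25`, into `features/`.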
Download all features for an organism:
To download all feature files for proteins from a specific organism, such as E. coli, copy the entire directory:
```bash
mc cp --recursive embl/alphapulldown/input_features/Escherichia_coli/ ./Escherichia_coli/
```
Alternatively, you can mirror the organism's directory, which synchronizes the remote files into your local directory:
```bash
mc mirror embl/alphapulldown/input_features/Escherichia_coli/ Escherichia_coli/
```
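`mc mirror` syncs in one direction, from the remote directory to your local copy. Re-running it skips files that are already present and identical locally, which is convenient for resuming an interrupted download.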
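After a large recursive copy or mirror, you may want to confirm that no file was truncated in transit. A minimal sketch using `xz -t`, which test-decompresses a file without writing any output; `verify_features` is a hypothetical helper name, and it checks only the xz container, not the pickled feature data inside it.

```bash
# Sketch: test every .pkl.xz file in a directory with `xz -t`.
verify_features() {
  local dir="${1:-.}" f bad=0
  for f in "$dir"/*.pkl.xz; do
    [ -e "$f" ] || continue            # no matches: the glob stays literal
    if ! xz -t "$f" 2>/dev/null; then
      echo "corrupt: $f" >&2
      bad=$((bad + 1))
    fi
  done
  [ "$bad" -eq 0 ]                      # success only if every file passed
}
```

For example, `verify_features Escherichia_coli` returns a non-zero status and lists any failing files, which you can delete and copy again with `mc cp`. If you need an uncompressed `.pkl`, `xz -dk FILE.pkl.xz` decompresses it while keeping the original.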