AlphaFold#

Currently, we have AlphaFold versions 2.0.0, 2.1.1, 2.3.2, and 3.0.0 installed on Quest (multimer support is available in every version newer than 2.0.0). For more details on these releases, please visit the AlphaFold website.
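
You can list the installed versions from a Quest login node with the module system, for example:

$ module avail alphafold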

Changes for AlphaFold3

AlphaFold3 changed how model parameters are distributed: they must now be requested on a user-by-user basis. Information on requesting the parameters can be found on the AlphaFold3 website.

GPU card considerations when running AlphaFold

Different versions of AlphaFold may require different GPUs. For information on how to request a specific type of GPU, please reference the Quest GPU page.

A reference list is included below:

  • AlphaFold3 - a100 or h100

  • AlphaFold2 - a100
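
For example, to request a single H100 card in a batch script, using the same gres syntax as the example scripts later on this page:

#SBATCH --gres=gpu:h100:1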

AlphaFold 3.0.0#

Running AlphaFold#

Whenever possible, it’s recommended to split the CPU and GPU workloads for AlphaFold into separate jobs. The data pipeline runs entirely on CPUs, so separating the steps avoids holding an idle GPU and helps your workflow complete efficiently.

The AlphaFold3 module provides three shell functions: af3_cpu, af3_gpu, and af3_full.

The examples below can also be found in the AlphaFold3 GitHub example and show how to run an example workflow as separate CPU and GPU steps.

The AlphaFold3 site also provides a fold_input.json as an example of how to format the input for the application:

{
  "name": "2PV7",
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
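
This example folds a homodimer: a single sequence assigned two chain ids. For a heteromeric complex, you would instead add one entry per distinct sequence to the sequences list. A minimal sketch in the same input format, with placeholder names and sequences:

{
  "name": "example_heterodimer",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "FIRST_SEQUENCE_HERE"
      }
    },
    {
      "protein": {
        "id": ["B"],
        "sequence": "SECOND_SEQUENCE_HERE"
      }
    }
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}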

Using this input file, you can run the CPU-only data pipeline with the command below. The data pipeline is the first step of an AlphaFold3 simulation and completes the genetic and template search for the input data.

$ af3_cpu --model_dir=/path/to/model/parameters \
    --output_dir=$(pwd)/output \
    --json_path=$(pwd)/fold_input.json

If the application launches successfully, you’ll see information about the start of the simulation and how long each iteration took to complete in the log file for the batch job.

Once the CPU portion completes, you can update the paths and run the GPU portion of AlphaFold3. This is the inference part of the simulation, and it takes the output of the previous CPU step as its input.

$ af3_gpu --model_dir=/path/to/model/parameters \
    --output_dir=$(pwd)/output/ \
    --json_path=$(pwd)/output/2pv7/2pv7_data.json

After the GPU inference is completed, you should see similar information about the application’s setup and simulation times in the log file. All of the output for the CPU and GPU steps of the simulation can be found in the directory you specified with --output_dir.
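
If you want to run the two steps as chained batch jobs, the same dependency pattern shown for AlphaFold 2 below also works here. The sketch below is illustrative only: the module name alphafold/3.0.0 and the resource requests are assumptions (confirm the module name with module avail alphafold, and point --model_dir at your requested AlphaFold3 parameters).

af3_cpu_part.sh
#!/bin/bash
#SBATCH --account=pXXXX ## YOUR SLURM ACCOUNT pXXXX or bXXXX
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8 ## illustrative; size to your input
#SBATCH --time=04:00:00
#SBATCH --mem=64G ## illustrative; size to your input
#SBATCH --output=AlphaFold3-CPU.log

module purge
module load alphafold/3.0.0 ## assumed module name; check module avail alphafold

af3_cpu --model_dir=/path/to/model/parameters \
    --output_dir=$(pwd)/output \
    --json_path=$(pwd)/fold_input.json

af3_gpu_part.sh
#!/bin/bash
#SBATCH --account=pXXXX ## YOUR SLURM ACCOUNT pXXXX or bXXXX
#SBATCH --partition=gengpu
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:a100:1 ## a100 or h100, per the reference list above
#SBATCH --time=04:00:00
#SBATCH --mem=64G ## illustrative; size to your input
#SBATCH --output=AlphaFold3-GPU.log

module purge
module load alphafold/3.0.0 ## assumed module name; check module avail alphafold

af3_gpu --model_dir=/path/to/model/parameters \
    --output_dir=$(pwd)/output/ \
    --json_path=$(pwd)/output/2pv7/2pv7_data.json

Submit them so that the GPU job only starts after the CPU job finishes with status OK:

cpu_job=($(sbatch af3_cpu_part.sh))
sbatch --dependency=afterok:${cpu_job[-1]} af3_gpu_part.sh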

AlphaFold 2.3.2#

Please find examples of running AlphaFold 2.3.2 monomer or multimer on Quest in the RCDS Example Slurm Jobs Repository on GitHub.

Installation on Quest#

AlphaFold 2.3.2 is installed on Quest inside of a Singularity container following the instructions from the DeepMind team.

The container contains CUDA 11.1.1, Python 3.8, jax 0.3.25, and jaxlib 0.3.25+cuda11.cudnn805. In addition, please note that this install contains a modification to AlphaFold that allows the CPU and GPU parts to be run separately. We added the following flag:

+flags.DEFINE_boolean('only_msas', False, 'Whether to only build MSAs, and not do any prediction.')

Instead of calling singularity directly, we provide a module that wraps the call to singularity run.

$ module load alphafold/2.3.2-with-msas-only-and-config-yaml

This creates two shell functions: one for running AlphaFold multimer (alphafold-multimer), and one for running AlphaFold monomer (alphafold-monomer).

alphafold-monomer --fasta_paths=/projects/intro/alphafold/T1050.fasta \
    --max_template_date=2022-01-01 \
    --model_preset=monomer \
    --db_preset=full_dbs \
    --only_msas=[true|false] \
    --use_gpu_relax=[true|false] \
    --output_dir=$(pwd)/out
  • model_preset

    • monomer: This is the original model used at CASP14 with no ensembling.

    • monomer_casp14: This is the original model used at CASP14 with num_ensemble=8, matching our CASP14 configuration. This is largely provided for reproducibility as it is 8x more computationally expensive for limited accuracy gain (+0.1 average GDT gain on CASP14 domains).

    • monomer_ptm: This is the original CASP14 model fine tuned with the pTM head, providing a pairwise confidence measure. It is slightly less accurate than the normal monomer model.

  • use_gpu_relax

    • Whether to relax on GPU. Relax on GPU can be much faster than CPU, so it is recommended to enable if possible. GPUs must be available if this setting is enabled.

  • use_precomputed_msas

    • Whether to read MSAs that have been written to disk.

  • only_msas

    • Whether to only build MSAs, and not do any prediction.

alphafold-multimer --fasta_paths=/projects/intro/alphafold/6E3K.fasta \
    --max_template_date=2022-01-01 \
    --model_preset=multimer \
    --db_preset=full_dbs \
    --only_msas=[true|false] \
    --use_gpu_relax=[true|false] \
    --output_dir=$(pwd)/out
  • model_preset

    • multimer: This is the AlphaFold-Multimer model. To use this model, provide a multi-sequence FASTA file. In addition, the UniProt database should have been downloaded.

  • use_gpu_relax

    • Whether to relax on GPU. Relax on GPU can be much faster than CPU, so it is recommended to enable if possible. GPUs must be available if this setting is enabled.

  • use_precomputed_msas

    • Whether to read MSAs that have been written to disk.

  • only_msas

    • Whether to only build MSAs, and not do any prediction.

If you would like to see the contents of the shell function alphafold-multimer or alphafold-monomer, you can run type alphafold-monomer or type alphafold-multimer on the command line.
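
For example (output abbreviated; the body is the wrapped singularity command):

$ type alphafold-monomer
alphafold-monomer is a function
alphafold-monomer ()
{
    ...
}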

Running AlphaFold#

Below, we provide an example submission script for running AlphaFold with separate CPU and GPU workloads on Quest. First, you construct the submission script that will use only CPU resources, which we will call example_submit_cpu_part.sh.

example_submit_cpu_part.sh
#!/bin/bash
#SBATCH --account=pXXXX ## YOUR SLURM ACCOUNT pXXXX or bXXXX
#SBATCH --partition=short ### PARTITION (buyin, short, normal, etc)
#SBATCH --nodes=1 ## how many computers do you need
#SBATCH --ntasks-per-node=12 ## how many cpus or processors do you need on each computer
#SBATCH --time=04:00:00 ## how long does this need to run (remember different partitions have restrictions on this param)
#SBATCH --mem=85G ## how much RAM do you need per node (this affects your FairShare score so be careful to not ask for more than you need)
#SBATCH --job-name=run_AlphaFold ## When you run squeue -u NETID this is how you can identify the job
#SBATCH --output=AlphaFold-CPU.log ## standard out and standard error go to this file

#########################################################################
### PLEASE NOTE:                                                      ###
### The above CPU and Memory resources have been selected based       ###
### on the computing resources that alphafold was tested on           ###
### which can be found here:                                          ###
### https://github.com/deepmind/alphafold#running-alphafold           ###
### It is likely that you do not have to change anything above        ###
### besides your Slurm account and email (if you want to be emailed). ###
#########################################################################

module purge
module load alphafold/2.3.2-with-msas-only-and-config-yaml

# To run alphafold more efficiently,
# we split the CPU and GPU parts of the pipeline into two separate submissions.
# Below we provide a way to run the CPU part of alphafold-multimer and alphafold-monomer

# real example monomer (takes about 3 hours and 15 minutes)
alphafold-monomer --fasta_paths=/projects/intro/alphafold/T1050.fasta \
    --max_template_date=2022-01-01 \
    --model_preset=monomer \
    --db_preset=full_dbs \
    --only_msas=true \
    --use_gpu_relax=false \
    --output_dir=$(pwd)/out

# real example multimer (takes about 2 hours and 40 minutes)
alphafold-multimer --fasta_paths=/projects/intro/alphafold/6E3K.fasta \
    --max_template_date=2022-01-01 \
    --model_preset=multimer \
    --db_preset=full_dbs \
    --only_msas=true \
    --use_gpu_relax=false \
    --output_dir=$(pwd)/out

Next, you construct the submission script that will use GPU resources, which we will call example_submit_gpu_part.sh.

example_submit_gpu_part.sh
#!/bin/bash
#SBATCH --account=pXXXX ## YOUR SLURM ACCOUNT pXXXX or bXXXX
#SBATCH --partition=gengpu
#SBATCH --nodes=1 ## how many computers do you need
#SBATCH --ntasks-per-node=1 ## how many cpus or processors do you need on each computer
#SBATCH --gres=gpu:a100:1 ## type of GPU requested, and number of GPU cards to run on
#SBATCH --time=04:00:00 ## how long does this need to run (remember different partitions have restrictions on this param)
#SBATCH --mem=85G ## how much RAM do you need per node (this affects your FairShare score so be careful to not ask for more than you need)
#SBATCH --job-name=run_AlphaFold ## When you run squeue -u NETID this is how you can identify the job
#SBATCH --output=AlphaFold-GPU.log ## standard out and standard error go to this file

#########################################################################
### PLEASE NOTE:                                                      ###
### The above CPU, Memory, and GPU resources have been selected based ###
### on the computing resources that alphafold was tested on           ###
### which can be found here:                                          ###
### https://github.com/deepmind/alphafold#running-alphafold           ###
### It is likely that you do not have to change anything above        ###
### besides your Slurm account and email (if you want to be emailed). ###
#########################################################################

module purge
module load alphafold/2.3.2-with-msas-only-and-config-yaml

# To run alphafold more efficiently,
# we split the CPU and GPU parts of the pipeline into two separate submissions.
# Below we provide a way to run the GPU part of alphafold-multimer and alphafold-monomer
# which will depend on the CPU part finishing before it runs

# real example monomer (takes about 3 hours and 15 minutes)
alphafold-monomer --fasta_paths=/projects/intro/alphafold/T1050.fasta \
    --max_template_date=2022-01-01 \
    --model_preset=monomer \
    --db_preset=full_dbs \
    --use_precomputed_msas=true \
    --use_gpu_relax=false \
    --output_dir=$(pwd)/out

# real example multimer (takes about 2 hours and 40 minutes)
alphafold-multimer --fasta_paths=/projects/intro/alphafold/6E3K.fasta \
    --max_template_date=2022-01-01 \
    --model_preset=multimer \
    --db_preset=full_dbs \
    --use_precomputed_msas=true \
    --use_gpu_relax=false \
    --output_dir=$(pwd)/out

Finally, we use the following bash script which we will call submit_alphafold_workflow.sh to submit the CPU job first, and then submit the GPU job as dependent on the CPU job finishing with status OK.

submit_alphafold_workflow.sh
#!/bin/bash
cpu_job=($(sbatch example_submit_cpu_part.sh))
echo "cpu_job ${cpu_job[-1]}" >> slurm_ids
gpu_job=($(sbatch --dependency=afterok:${cpu_job[-1]} example_submit_gpu_part.sh))
echo "gpu_job ${gpu_job[-1]}" >> slurm_ids

AlphaFold 2.1.1 - Separate CPU and GPU (Preferred)#

Installation on Quest#

AlphaFold 2.1.1 is installed inside of a Singularity container following the instructions from the DeepMind team.

The container contains CUDA 11.1, Python 3.7.11, TensorFlow 2.5.0, jax 0.2.25, and jaxlib 0.1.69+cuda111. In addition, please note that this install contains a modification to AlphaFold that allows the CPU and GPU parts to be run separately. We added the following flag:

+flags.DEFINE_boolean('only_msas', False, 'Whether to only build MSAs, and not do any prediction.')

Instead of calling singularity directly, we provide a module that wraps the call to singularity run.

$ module load alphafold/2.1.1-only-msas-flag-addition

This creates two shell functions: one for running AlphaFold multimer (alphafold-multimer), and one for running AlphaFold monomer (alphafold-monomer).

alphafold-monomer --fasta_paths=/full/path/to/fasta \
  --output_dir=/full/path/to/outdir \
  --max_template_date= \
  --only_msas=[true|false] \
  --use_precomputed_msas=[true|false] \
  --model_preset=[monomer|monomer_casp14|monomer_ptm] \
  --db_preset=full_dbs
  • model_preset

    • monomer: This is the original model used at CASP14 with no ensembling.

    • monomer_casp14: This is the original model used at CASP14 with num_ensemble=8, matching our CASP14 configuration. This is largely provided for reproducibility as it is 8x more computationally expensive for limited accuracy gain (+0.1 average GDT gain on CASP14 domains).

    • monomer_ptm: This is the original CASP14 model fine tuned with the pTM head, providing a pairwise confidence measure. It is slightly less accurate than the normal monomer model.

  • use_precomputed_msas

    • Whether to read MSAs that have been written to disk.

  • only_msas

    • Whether to only build MSAs, and not do any prediction.

alphafold-multimer --fasta_paths=/full/path/to/fasta \
  --output_dir=/full/path/to/outdir \
  --max_template_date= \
  --only_msas=[true|false] \
  --use_precomputed_msas=[true|false] \
  --model_preset=multimer \
  --db_preset=full_dbs \
  --is_prokaryote_list=[true|false] 
  • model_preset

    • multimer: This is the AlphaFold-Multimer model. To use this model, provide a multi-sequence FASTA file. In addition, the UniProt database should have been downloaded.

  • is_prokaryote_list

    • Optionally set the --is_prokaryote_list flag with booleans that determine whether all input sequences in the given fasta file are prokaryotic. If that is not the case or the origin is unknown, set it to false for that fasta.

  • use_precomputed_msas

    • Whether to read MSAs that have been written to disk.

  • only_msas

    • Whether to only build MSAs, and not do any prediction.

If you would like to see the contents of the shell function alphafold-multimer or alphafold-monomer, you can run type alphafold-monomer or type alphafold-multimer on the command line.

Running AlphaFold#

Below, we provide an example submission script for running AlphaFold with separate CPU and GPU workloads on Quest. First, you construct the submission script that will use only CPU resources, which we will call example_submit_cpu_part.sh.

example_submit_cpu_part.sh
#!/bin/bash
#SBATCH --account=pXXXX ## YOUR SLURM ACCOUNT pXXXX or bXXXX
#SBATCH --partition=short ### PARTITION (buyin, short, normal, etc)
#SBATCH --nodes=1 ## how many computers do you need
#SBATCH --ntasks-per-node=12 ## how many cpus or processors do you need on each computer
#SBATCH --time=04:00:00 ## how long does this need to run (remember different partitions have restrictions on this param)
#SBATCH --mem=85G ## how much RAM do you need per node (this affects your FairShare score so be careful to not ask for more than you need)
#SBATCH --job-name=run_AlphaFold ## When you run squeue -u NETID this is how you can identify the job
#SBATCH --output=AlphaFold-CPU.log ## standard out and standard error go to this file

#########################################################################
### PLEASE NOTE:                                                      ###
### The above CPU and Memory resources have been selected based       ###
### on the computing resources that alphafold was tested on           ###
### which can be found here:                                          ###
### https://github.com/deepmind/alphafold#running-alphafold           ###
### It is likely that you do not have to change anything above        ###
### besides your Slurm account and email (if you want to be emailed). ###
#########################################################################

module purge
module load alphafold/2.1.1-only-msas-flag-addition

# To run alphafold more efficiently,
# we split the CPU and GPU parts of the pipeline into two separate submissions.
# Below we provide a way to run the CPU part of alphafold-multimer and alphafold-monomer

# real example monomer (takes about 3 hours and 15 minutes)
alphafold-monomer --fasta_paths=/projects/intro/alphafold/T1050.fasta \
  --max_template_date=2020-05-14 \
  --model_preset=monomer \
  --db_preset=full_dbs \
  --only_msas=true \
  --output_dir=$(pwd)/out

# real example multimer (takes about 2 hours and 40 minutes)
alphafold-multimer --fasta_paths=/projects/intro/alphafold/6E3K.fasta \
  --max_template_date=2020-05-14 \
  --model_preset=multimer \
  --db_preset=full_dbs \
  --only_msas=true \
  --output_dir=$(pwd)/out

Next, you construct the submission script that will use GPU resources, which we will call example_submit_gpu_part.sh.

example_submit_gpu_part.sh
#!/bin/bash
#SBATCH --account=pXXXX ## YOUR SLURM ACCOUNT pXXXX or bXXXX
#SBATCH --partition=gengpu
#SBATCH --nodes=1 ## how many computers do you need
#SBATCH --ntasks-per-node=1 ## how many cpus or processors do you need on each computer
#SBATCH --gres=gpu:a100:1 ## type of GPU requested, and number of GPU cards to run on
#SBATCH --time=04:00:00 ## how long does this need to run (remember different partitions have restrictions on this param)
#SBATCH --mem=85G ## how much RAM do you need per node (this affects your FairShare score so be careful to not ask for more than you need)
#SBATCH --job-name=run_AlphaFold ## When you run squeue -u NETID this is how you can identify the job
#SBATCH --output=AlphaFold-GPU.log ## standard out and standard error go to this file

#########################################################################
### PLEASE NOTE:                                                      ###
### The above CPU, Memory, and GPU resources have been selected based ###
### on the computing resources that alphafold was tested on           ###
### which can be found here:                                          ###
### https://github.com/deepmind/alphafold#running-alphafold           ###
### It is likely that you do not have to change anything above        ###
### besides your Slurm account and email (if you want to be emailed). ###
#########################################################################

module purge
module load alphafold/2.1.1-only-msas-flag-addition

# To run alphafold more efficiently,
# we split the CPU and GPU parts of the pipeline into two separate submissions.
# Below we provide a way to run the GPU part of alphafold-multimer and alphafold-monomer
# which will depend on the CPU part finishing before it runs

# real example monomer (takes about 3 hours and 15 minutes)
alphafold-monomer --fasta_paths=/projects/intro/alphafold/T1050.fasta \
  --max_template_date=2020-05-14 \
  --model_preset=monomer \
  --db_preset=full_dbs \
  --use_precomputed_msas=true \
  --output_dir=$(pwd)/out

# real example multimer (takes about 2 hours and 40 minutes)
alphafold-multimer --fasta_paths=/projects/intro/alphafold/6E3K.fasta \
  --max_template_date=2020-05-14 \
  --model_preset=multimer \
  --db_preset=full_dbs \
  --use_precomputed_msas=true \
  --output_dir=$(pwd)/out

Finally, we use the following bash script which we will call submit_alphafold_workflow.sh to submit the CPU job first, and then submit the GPU job as dependent on the CPU job finishing with status OK.

submit_alphafold_workflow.sh
#!/bin/bash
cpu_job=($(sbatch example_submit_cpu_part.sh))
echo "cpu_job ${cpu_job[-1]}" >> slurm_ids
gpu_job=($(sbatch --dependency=afterok:${cpu_job[-1]} example_submit_gpu_part.sh))
echo "gpu_job ${gpu_job[-1]}" >> slurm_ids

AlphaFold 2.1.1 - Older version#

Installation on Quest#

AlphaFold 2.1.1 is installed inside of a Singularity container following the instructions from the DeepMind team.

The container contains CUDA 11.0, Python 3.7.10, and TensorFlow 2.5.0.

Instead of calling singularity directly, we provide a module that wraps the call to singularity run.

$ module load alphafold/2.1.1

This creates two shell functions: one for running AlphaFold multimer (alphafold-multimer), and one for running AlphaFold monomer (alphafold-monomer).

alphafold-monomer --fasta_paths=/full/path/to/fasta \
  --output_dir=/full/path/to/outdir \
  --max_template_date= \
  --model_preset=[monomer|monomer_casp14|monomer_ptm] \
  --db_preset=full_dbs
  • model_preset

    • monomer: This is the original model used at CASP14 with no ensembling.

    • monomer_casp14: This is the original model used at CASP14 with num_ensemble=8, matching our CASP14 configuration. This is largely provided for reproducibility as it is 8x more computationally expensive for limited accuracy gain (+0.1 average GDT gain on CASP14 domains).

    • monomer_ptm: This is the original CASP14 model fine tuned with the pTM head, providing a pairwise confidence measure. It is slightly less accurate than the normal monomer model.

alphafold-multimer --fasta_paths=/full/path/to/fasta \
  --output_dir=/full/path/to/outdir \
  --max_template_date= \
  --model_preset=multimer \
  --db_preset=full_dbs \
  --is_prokaryote_list=[true|false] 
  • model_preset

    • multimer: This is the AlphaFold-Multimer model. To use this model, provide a multi-sequence FASTA file. In addition, the UniProt database should have been downloaded.

  • is_prokaryote_list

    • Optionally set the --is_prokaryote_list flag with booleans that determine whether all input sequences in the given fasta file are prokaryotic. If that is not the case or the origin is unknown, set it to false for that fasta.

If you would like to see the contents of the shell function alphafold-multimer or alphafold-monomer, you can run type alphafold-monomer or type alphafold-multimer on the command line.

Running AlphaFold#

Below, we provide an example submission script for running AlphaFold on Quest.

submit_alphafold_2.1.1_example.sh
#!/bin/bash
#SBATCH --account=pXXXX  ## YOUR SLURM ACCOUNT pXXXX or bXXXX
#SBATCH --partition=gengpu  ### PARTITION (buyin, short, normal, etc)
#SBATCH --nodes=1 ## how many computers do you need - for AlphaFold this should always be one
#SBATCH --ntasks-per-node=12 ## how many cpus or processors do you need on each computer
#SBATCH --gres=gpu:a100:1  ## type of GPU requested, and number of GPU cards to run on
#SBATCH --time=48:00:00 ## how long does this need to run 
#SBATCH --mem=85G ## how much RAM do you need per node (this affects your FairShare score so be careful to not ask for more than you need)
#SBATCH --job-name=run_AlphaFold  ## When you run squeue -u <NETID> this is how you can identify the job
#SBATCH --output=AlphaFold.log ## standard out and standard error go to this file
#SBATCH --mail-type=ALL ## you can receive e-mail alerts from SLURM when your job begins and when your job finishes (completed, failed, etc)
#SBATCH --mail-user=email@northwestern.edu ## your email, non-Northwestern email addresses may not be supported

#########################################################################
### PLEASE NOTE:                                                      ###
### The above CPU, Memory, and GPU resources have been selected based ###
### on the computing resources that alphafold was tested on           ###
### which can be found here:                                          ###
### https://github.com/deepmind/alphafold#running-alphafold           ###
### It is likely that you do not have to change anything above        ###
### besides your Slurm account and email (if you want to be emailed). ###
#########################################################################

module purge
module load alphafold/2.1.1

# template monomer
# alphafold-monomer --fasta_paths=/full/path/to/fasta \
# --output_dir=/full/path/to/outdir \
# --max_template_date= \
# --model_preset=[monomer|monomer_casp14|monomer_ptm] \
# --db_preset=full_dbs
### 
### monomer: This is the original model used at CASP14 with no ensembling.
### 
### monomer_casp14: This is the original model used at CASP14 with num_ensemble=8, matching our CASP14 configuration. This is largely provided for reproducibility as it is 8x more computationally expensive for limited accuracy gain (+0.1 average GDT gain on CASP14 domains).
### 
### monomer_ptm: This is the original CASP14 model fine tuned with the pTM head, providing a pairwise confidence measure. It is slightly less accurate than the normal monomer model.

# template multimer
# alphafold-multimer --fasta_paths=/full/path/to/fasta \
# --output_dir=/full/path/to/outdir \
# --max_template_date= \
# --model_preset=multimer \
# --is_prokaryote_list=[true|false] \
# --db_preset=full_dbs
### 
### multimer: This is the AlphaFold-Multimer model. To use this model, provide a multi-sequence FASTA file. In addition, the UniProt database should have been downloaded.
###
### optionally set the --is_prokaryote_list flag with booleans that determine whether all input sequences in the given fasta file are prokaryotic. If that is not the case or the origin is unknown, set to false for that fasta.

# real example monomer (takes about 3 hours and 15 minutes)
alphafold-monomer --fasta_paths=/projects/intro/alphafold/T1050.fasta \
    --max_template_date=2020-05-14 \
    --model_preset=monomer \
    --db_preset=full_dbs \
    --output_dir=$(pwd)/out

# real example multimer (takes about 2 hours and 40 minutes)
alphafold-multimer --fasta_paths=/projects/intro/alphafold/6E3K.fasta \
    --max_template_date=2020-05-14 \
    --model_preset=multimer \
    --db_preset=full_dbs \
    --output_dir=$(pwd)/out

AlphaFold 2.0.0#

Installation on Quest#

AlphaFold 2.0.0 is installed inside of a Singularity container following the instructions from the DeepMind team.

The container contains CUDA 11.0, Python 3.7.10, and TensorFlow 2.5.0.

Instead of calling singularity directly, we provide a module that wraps the call to singularity run.

$ module load alphafold/2.0.0

This creates a shell function called alphafold which can be used as follows:

alphafold --fasta_paths=/full/path/to/fasta \
    --output_dir=/full/path/to/outdir \
    --model_names= \
    --preset=[full_dbs|casp14] \
    --max_template_date=

If you would like to see the contents of the shell function alphafold, you can run type alphafold on the command line.

Running AlphaFold#

Below, we provide an example submission script for running AlphaFold 2.0.0 on Quest.

submit_alphafold_2.0.0_example.sh
#!/bin/bash
#SBATCH --account=pXXXX  ## YOUR SLURM ACCOUNT pXXXX or bXXXX
#SBATCH --partition=gengpu  ### PARTITION (buyin, short, normal, etc)
#SBATCH --nodes=1 ## how many computers do you need - for AlphaFold this should always be one
#SBATCH --ntasks-per-node=12 ## how many cpus or processors do you need on each computer
#SBATCH --gres=gpu:a100:1  ## type of GPU requested, and number of GPU cards to run on
#SBATCH --time=48:00:00 ## how long does this need to run 
#SBATCH --mem=85G ## how much RAM do you need per node (this affects your FairShare score so be careful to not ask for more than you need)
#SBATCH --job-name=run_AlphaFold  ## When you run squeue -u <NETID> this is how you can identify the job
#SBATCH --output=AlphaFold.log ## standard out and standard error go to this file
#SBATCH --mail-type=ALL ## you can receive e-mail alerts from SLURM when your job begins and when your job finishes (completed, failed, etc)
#SBATCH --mail-user=email@northwestern.edu ## your email, non-Northwestern email addresses may not be supported

#########################################################################
### PLEASE NOTE:                                                      ###
### The above CPU, Memory, and GPU resources have been selected based ###
### on the computing resources that alphafold was tested on           ###
### which can be found here:                                          ###
### https://github.com/deepmind/alphafold#running-alphafold           ###
### It is likely that you do not have to change anything above        ###
### besides your Slurm account and email (if you want to be emailed). ###
#########################################################################

module purge 
module load alphafold/2.0.0

# template
# alphafold --fasta_paths=/full/path/to/fasta \
#    --output_dir=/full/path/to/outdir \
#    --model_names= \
#    --preset=[full_dbs|casp14] \
#    --max_template_date=

# real example
alphafold --fasta_paths=/projects/intro/alphafold/T1050.fasta \
    --output_dir=$HOME/alphafold \
    --max_template_date=2021-07-28 \
    --model_names=model_1,model_2,model_3,model_4,model_5 \
    --preset=casp14