Data and code for CF-random

General installation and usage guidance for CF-random, which predicts alternative conformations and fold-switching proteins.
To run CF-random in a Colab notebook, please use the following link.

Open CF-random Colab

Installation

Windows and macOS are not currently supported.
ColabFold, its dependencies, and Foldseek are installed with the following commands.

First, create a new conda environment:

conda create --name CF-random python=3.10
conda activate CF-random
pip install "colabfold[alphafold,openmm]" "jax[cuda12]" "openmm[cuda12]"
pip install textalloc tmtools adjustText thefuzz mdtraj biopython seaborn MDAnalysis Colabfold
conda install conda-forge::pymol-open-source
pip3 install -U scikit-learn

Once the dependencies are installed, install Foldseek.

conda install -c conda-forge -c bioconda foldseek
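
After the installation commands above, a quick check can confirm that the environment is usable. The following is a minimal sketch, not part of CF-random: it assumes that ColabFold's `colabfold_batch` entry point and the `foldseek` binary should be on PATH, and it probes a handful of the Python packages by import name. The function name `check_environment` and the package selection are illustrative.

```python
import importlib.util
import shutil

# Executables the install steps are expected to put on PATH (assumption):
# `colabfold_batch` is ColabFold's command-line entry point, `foldseek` the Foldseek binary.
EXECUTABLES = ["colabfold_batch", "foldseek"]

# Import names (not pip names) for some of the Python dependencies listed above.
MODULES = ["colabfold", "jax", "openmm", "mdtraj", "Bio", "sklearn", "MDAnalysis", "tmtools"]


def check_environment(executables=EXECUTABLES, modules=MODULES):
    """Return {name: True/False} for each executable and importable module."""
    report = {}
    for exe in executables:
        report[exe] = shutil.which(exe) is not None
    for mod in modules:
        report[mod] = importlib.util.find_spec(mod) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Run it inside the activated CF-random environment; any line marked MISSING points at an installation step to repeat.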

Usage

  • CF-random has three prediction modes: fold-switching (default), alternative conformation, and blind.
  • All modes of CF-random require a multiple sequence alignment (MSA). To avoid overwriting output files, we recommend keeping each MSA in its own folder.
  • PDB files for both fold1 (dominant conformation) and fold2 (alternative conformation) are required to measure TM-scores against the reference structures. Blind mode does not require PDB files, but the default fold-switching and alternative conformation modes do.
  • All required PDB files and the MSA folder should be in the same directory as the provided Python scripts.

  • Each PDB file must contain a single chain. If a PDB file has multiple chains, CF-random stops.
 --fname ####    |  name of the folder containing the multiple sequence alignment (MSA)
 --pname ####    |  project name (blind mode only)
 --pdb1  ####    |  dominant reference model used to calculate TM-scores against predicted models
 --pdb2  ####    |  alternative reference model used to calculate TM-scores against predicted models
 --nMSA  ####    |  number of additional samples for predicting the structure with MSAs, default = 0
 --type  ####    |  ColabFold model type, e.g., ptm, monomer, or multimer
 --option ###    |  AC: predict alternative conformations of a protein with references; FS: predict a fold-switching protein with references; blind: predict alternative conformations or fold-switching proteins without reference PDB files.
 --seq ###       |  sequence of the fold-switching region, required to compare the TM-score between the reference crystal structure and the predicted structure. Only required for the 'FS' option.
  • In the default modes (fold-switching and alternative conformation), CF-random produces TM-scores (csv and png files), plDDT scores, and information about the selected random MSA. If CF-random predicts both folds, the generated prediction files are deposited under successed_prediction/pdb1_name and additional_sampling/pdb1_name. If not, nothing is deposited there.

  • The --seq option is required for the default fold-switching mode.

  • --nMSA can be used with all options, but --nESN cannot be used in blind mode.

  • In blind mode, predicted files are deposited under blind_prediction/pdb1_name. Blind mode also produces a Foldseek comparison of the predictions.

  • To run Foldseek in blind mode, the Foldseek parameter files and the Python scripts must be in the same directory.

  • Before running CF-random, ensure that the CF-random conda environment is activated:

conda activate CF-random
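
The input requirements listed above (reference PDB files for FS and AC, --seq for FS, one chain per PDB file) can be checked before launching a long run. The sketch below is illustrative and is not CF-random's own validation code: `chain_ids` and `preflight` are hypothetical helper names, and counting chains from `ATOM` records only is an assumption about what "single chain" means here.

```python
from pathlib import Path


def chain_ids(pdb_path):
    """Collect chain identifiers from ATOM records (column 22 of the PDB format)."""
    chains = set()
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith("ATOM") and len(line) > 21:
                chains.add(line[21])
    return chains


def preflight(option, fname, pdb1=None, pdb2=None, seq=None):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    if option not in ("FS", "AC", "blind"):
        return [f"unknown option: {option}"]
    if not Path(fname).is_dir():
        problems.append(f"MSA folder not found: {fname}")
    if option in ("FS", "AC"):
        # Both reference structures are needed for TM-score measurement.
        for label, pdb in (("--pdb1", pdb1), ("--pdb2", pdb2)):
            if pdb is None or not Path(pdb).is_file():
                problems.append(f"{label} is required and must exist for option {option}")
            elif len(chain_ids(pdb)) != 1:
                problems.append(f"{pdb} must contain exactly one chain")
    if option == "FS" and not seq:
        problems.append("--seq is required for option FS")
    return problems
```

For example, `preflight("AC", "5olw_A-search", "5olw_A.pdb", "5olx_A.pdb")` returns an empty list when the folder and both single-chain files are present.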

Examples

We provide examples showing how to run CF-random in its different modes.
The first two modes, fold-switching and alternative conformation, are the default modes of CF-random; the last one is blind mode.

1. CF-random in fold-switching mode.

In this example, RfaH is predicted with two reference structures (i.e., 2oug_C.pdb and 6c6s_D.pdb) and an MSA file.

python main.py --fname 2oug_C-search/ --pdb1 2oug_C.pdb --pdb2 6c6s_D.pdb --option FS

Used input files:

  • PDB1: 2oug_C.pdb
  • PDB2: 6c6s_D.pdb
  • MSA: 2oug_C-search/0.a3m (the MSA file must be inside a folder)
  • '--seq' is required for comparing the fold-switching region between the crystal structure and the predicted structure

This takes less than 30 minutes on an A100 GPU (200 structures generated in total).
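
Because each run is driven by the MSA file inside the folder passed to --fname, it can help to confirm that the file is present and non-trivial before starting. This is a minimal sketch under two assumptions: the file is named `0.a3m` as in the input list above, and each sequence entry in the A3M file starts with a `>` header line. The helper name `count_a3m_sequences` is hypothetical.

```python
from pathlib import Path


def count_a3m_sequences(msa_dir, filename="0.a3m"):
    """Count sequence entries (header lines starting with '>') in <msa_dir>/<filename>."""
    path = Path(msa_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"expected MSA file at {path}")
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For the example above, `count_a3m_sequences("2oug_C-search")` reports how many sequences the deep MSA contains.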

Generated output files:

Predicted files from deep and random MSAs are deposited in the 'predictions_all' directory.
If CF-random fails to select a random MSA, all generated files remain in the 'predictions_all' directory.

  • TM-score plot of whole structure: TMscore_full-MSA_2oug_C.png
  • TM-score plot of fold-switching region: TMscore_fs-region_full-MSA_2oug_C.png
  • TM-score plot of fold-switching region with label of prediction rank: TMscore_fs-region_full-MSA_2oug_C_label.png
  • TM-scores and plDDT scores of predictions with deep MSA: TMs_plDDT_full_all_2oug_C.csv
  • TM-scores and plDDT scores of predictions with random MSAs: TMs_plDDT_rand_all_2oug_C.csv
  • Selection of random MSA: selected_MSA-size_2oug_C.csv (written when CF-random finds the MSA depth)
    • MSA depth information (e.g. # = max-seq:max-seq-extra) (0 = 1:2, 1 = 2:4, 2 = 4:8, 3 = 8:16, 4 = 16:32, 5 = 32:64, 6 = 64:128)
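
The depth table above follows a simple doubling pattern: index i corresponds to max-seq = 2^i and max-seq-extra = 2^(i+1). The short sketch below reproduces that mapping for reading the selected depth index; `msa_depth` is a hypothetical helper, not a CF-random function.

```python
def msa_depth(index):
    """Map a depth index (0-6) to (max-seq, max-seq-extra), i.e. 0 = 1:2 ... 6 = 64:128."""
    if not 0 <= index <= 6:
        raise ValueError("depth index must be between 0 and 6")
    max_seq = 2 ** index
    return max_seq, 2 * max_seq


if __name__ == "__main__":
    for i in range(7):
        print(i, "{}:{}".format(*msa_depth(i)))
```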

2. CF-random in alternative conformation mode.

In this example, the target protein is predicted with two reference structures (i.e., 5olw_A.pdb and 5olx_A.pdb) and an MSA file.

python main.py --fname 5olw_A-search --pdb1 5olw_A.pdb --pdb2 5olx_A.pdb --option AC --nMSA 5

Used input files:

  • PDB1: 5olw_A.pdb
  • PDB2: 5olx_A.pdb
  • MSA: 5olw_A-search/0.a3m (the MSA file must be inside a folder)

This takes less than 70 minutes on an A100 GPU (200 structures generated in total; the protein is large, ~250 residues).

Generated output files:

Predicted files from deep and random MSAs are deposited in the 'predictions_all' directory.
If CF-random fails to select a random MSA, all generated files remain in the 'predictions_all' directory.

  • TM-score plot of whole structure: TMscore_full-MSA_5olw_A.png
  • TM-scores and plDDT scores of predictions with deep MSA: TMs_plDDT_full_all_5olw_A.csv
  • TM-scores and plDDT scores of predictions with random MSAs: TMs_plDDT_rand_all_5olw_A.csv
  • Selection of random MSA: selected_MSA-size_5olw_A.csv (written when CF-random finds the MSA depth)
    • MSA depth information (e.g. # = max-seq:max-seq-extra) (0 = 1:2, 1 = 2:4, 2 = 4:8, 3 = 8:16, 4 = 16:32, 5 = 32:64, 6 = 64:128)

3. CF-random in blind mode, covering both fold-switching and alternative conformation.

python main.py --pname Mad2_test --fname 2vfx_L-search/ --option blind

Before running this command, create a symbolic link to the Foldseek PDB libraries in the directory where you run it.
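
One way to create those links is sketched below. This is only an illustration: the README does not state where the Foldseek libraries live or what they are called, so the source directory and the file-name prefix are placeholders you must supply, and `link_foldseek_db` is a hypothetical helper name.

```python
import os
from pathlib import Path


def link_foldseek_db(db_dir, prefix, work_dir="."):
    """Symlink every file in db_dir whose name starts with prefix into work_dir."""
    work_dir = Path(work_dir)
    linked = []
    for src in sorted(Path(db_dir).resolve().iterdir()):
        if not src.name.startswith(prefix):
            continue
        dst = work_dir / src.name
        # Leave existing files or links untouched.
        if not os.path.lexists(dst):
            os.symlink(src, dst)
        linked.append(dst.name)
    return linked
```

Call it from the directory that holds main.py, passing the directory that contains your Foldseek database files and the prefix those files share.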

Used input files:

  • MSA: 2vfx_L-search/0.a3m (the MSA file must be inside a folder)

Generated output files:

Predicted files from deep and random MSAs are deposited in the 'blind_prediction' directory.
If the '--pname' option is given, output files are named after the value passed to '--pname'.

  • List of prediction files: Mad2-structures_of_interest.csv
  • The best hit list of alternative conformations: Mad2-structures_of_interest.csv
  • Cluster analysis result as an image file: Mad2-cluster.png

This takes less than 70 minutes on an A100 GPU (200 structures generated in total, plus 200 Foldseek files).

How to Cite

Lee, M., Schafer, J.W., Prabakaran, J. et al. Large-scale predictions of alternative protein conformations by AlphaFold2-based sequence association. Nat Commun 16, 5622 (2025). https://doi.org/10.1038/s41467-025-60759-5

License

Please see the LICENSE.md file.
