Data and code for CF-random

General installation and usage guidance for CF-random, which predicts alternative conformations and fold-switching proteins.
To run CF-random in a Colab notebook, please use the following link.

Installation

Windows and macOS are not currently supported.
The following commands install ColabFold, its dependencies, and Foldseek.

First, create a new conda environment and install the dependencies:

conda create --name CF-random python=3.10
conda activate CF-random
pip install colabfold[alphafold,openmm] jax[cuda12] openmm[cuda12]
pip install textalloc tmtools adjustText thefuzz mdtraj biopython seaborn MDAnalysis Colabfold
conda install conda-forge::pymol-open-source
pip3 install -U scikit-learn

Once the dependencies are installed, install Foldseek.

conda install -c conda-forge -c bioconda foldseek
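
After installation, it can help to confirm that the command-line tools are reachable from the active environment. The following is a minimal sketch, not part of CF-random itself. It assumes the executables are named colabfold_batch and foldseek; the helper name find_missing_tools is hypothetical.

```python
import shutil


def find_missing_tools(tools=("colabfold_batch", "foldseek")):
    """Return the names of executables that cannot be found on PATH.

    The default tool names are an assumption about what the installed
    packages provide; adjust them if your setup differs.
    """
    return [name for name in tools if shutil.which(name) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All expected executables were found on PATH.")
```

If a tool is reported as missing, check that the CF-random conda environment is activated before reinstalling.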

Usage

CF-random has several prediction modes: fold-switching and alternative conformation (the default modes), and blind mode.
Every mode requires a multiple sequence alignment (MSA). To avoid overwriting output files, we recommend using a different folder for each MSA.
PDB files for both fold1 (dominant conformation) and fold2 (alternative conformation) are required to measure TM-scores against the reference structures. Blind mode does not require PDB files, but the fold-switching and alternative conformation modes do.
All required PDB files and the MSA file should be in the same directory as the provided Python scripts.
Each PDB file must contain a single chain. If a PDB file has multiple chains, CF-random will stop.
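
Because a multi-chain PDB file stops the run, a quick check before launching can save time. The sketch below is an illustration only and is not how CF-random performs its own check. It reads the chain identifier from column 22 of ATOM records, as defined by the PDB format; the function names are hypothetical.

```python
def chain_ids(pdb_text):
    """Return the distinct chain identifiers found in ATOM records.

    In the fixed-column PDB format the chain identifier is column 22,
    which is index 21 in a zero-based Python string.
    """
    chains = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) > 21:
            chain = line[21]
            if chain not in chains:
                chains.append(chain)
    return chains


def is_single_chain(pdb_text):
    """True when exactly one chain identifier appears in the ATOM records."""
    return len(chain_ids(pdb_text)) == 1
```

To check a file, pass its contents, for example is_single_chain(open("2oug_C.pdb").read()).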

 --fname ####    |  folder name having a multiple sequence alignment (MSA)
 --pname ####    |  project name for running blind mode (only for blind mode)
 --pdb1  ####    |  dominant reference model used to calculate TM-score with predicted models
 --pdb2  ####    |  alternative reference model used to calculate TM-score with predicted models
 --nMSA  ####    |  the number of additional samples for predicting the structure with MSAs, default = 0
 --type  ####    |  ColabFold model type, e.g., ptm, monomer, or multimer
 --options ###   |  AC: predict alternative conformations of a protein with references; FS: predict a fold-switching protein with references; blind: predict alternative conformations or fold-switching proteins without reference PDB files
 --seq ###       |  sequence of the fold-switching region, used to compare TM-scores between the reference crystal structure and the predicted structure; required only for the 'FS' option

In the default modes (fold-switching and alternative conformation), CF-random produces TM-score results (csv and png files), plDDT scores, and information about the selected random MSA. If CF-random predicts both folds, the generated prediction files are deposited under successed_prediction/pdb1_name and additional_sampling/pdb1_name. Otherwise, nothing is generated there.
The fold-switching mode requires the --seq option.
--nMSA can be used with all options, but --nESN cannot be used in blind mode.
In blind mode, predicted files are deposited under blind_prediction/pdb1_name. Blind mode also produces a comparison of the predictions with Foldseek.
To run Foldseek in blind mode, the Foldseek parameter files and the Python scripts should be in the same directory.
Before running CF-random, ensure that the CF-random conda environment is activated:

conda activate CF-random
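
The per-mode input requirements described in this section can be summarized in a small pre-flight check. This is a hedged sketch rather than CF-random's own argument handling: the function missing_inputs is hypothetical, and its parameter names simply mirror the options listed above.

```python
# Inputs each mode needs, as described in this README:
# FS and AC need both reference PDB files, FS additionally needs --seq,
# and every mode needs the MSA folder.
REQUIRED_INPUTS = {
    "FS": ("fname", "pdb1", "pdb2", "seq"),
    "AC": ("fname", "pdb1", "pdb2"),
    "blind": ("fname",),
}


def missing_inputs(option, fname=None, pdb1=None, pdb2=None, seq=None):
    """Return the names of required inputs that were not provided."""
    if option not in REQUIRED_INPUTS:
        raise ValueError(f"unknown option: {option!r}")
    given = {"fname": fname, "pdb1": pdb1, "pdb2": pdb2, "seq": seq}
    return [name for name in REQUIRED_INPUTS[option] if not given[name]]
```

An empty list means the inputs named in this section are all present for the chosen mode.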

Examples

The examples below show how to run CF-random in each mode.
The first two, fold-switching and alternative conformation, are the default modes of CF-random; the last is blind mode.

1. For CF-random with fold-switching mode.

In this example, RfaH is predicted with two reference structures (i.e., 2oug_C.pdb and 6c6s_D.pdb) and an MSA file.

python main.py --fname 2oug_C-search/ --pdb1 2oug_C.pdb --pdb2 6c6s_D.pdb --option FS

Used input files:

PDB1: 2oug_C.pdb
PDB2: 6c6s_D.pdb
MSA: 2oug_C-search/0.a3m (MSA file should be in a folder)
'--seq' is required for comparing the fold-switching region between crystal structure and predicted structure
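
To make the role of the fold-switching region concrete, the sketch below locates a region sequence inside a full sequence and reports its 1-based residue range. It is an illustration under stated assumptions, not CF-random's implementation: the function locate_region is hypothetical, and it assumes the region appears as an exact, contiguous substring of the full sequence.

```python
def locate_region(full_sequence, region_sequence):
    """Return the 1-based (start, end) positions of region_sequence
    within full_sequence, or None if it does not occur.

    Only the first exact match is reported.
    """
    start = full_sequence.find(region_sequence)
    if start == -1:
        return None
    return start + 1, start + len(region_sequence)


# Placeholder sequences for illustration only; they are not RfaH.
print(locate_region("MKTAYIAKQR", "AYIA"))
```

A result of None suggests that the sequence passed to '--seq' does not match the sequence in the reference structure exactly.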

This takes less than 30 minutes on an A100 GPU (200 structures are generated in total).

Generated output files:

Predicted files from deep and random MSAs are deposited in the 'predictions_all' directory.
If CF-random fails to find the selected random MSA, all generated files remain in the 'predictions_all' directory.

TM-score plot of whole structure: TMscore_full-MSA_2oug_C.png
TM-score plot of fold-switching region: TMscore_fs-region_full-MSA_2oug_C.png
TM-score plot of fold-switching region with label of prediction rank: TMscore_fs-region_full-MSA_2oug_C_label.png
TM-scores and plDDT scores of predictions with deep MSA: TMs_plDDT_full_all_2oug_C.csv
TM-scores and plDDT scores of predictions with random MSAs: TMs_plDDT_rand_all_2oug_C.csv
Selection of random MSA: selected_MSA-size_2oug_C.csv (When CF-random finds the MSA depth)
- MSA depth information (e.g. # = max-seq:max-seq-extra) (0 = 1:2, 1 = 2:4, 2 = 4:8, 3 = 8:16, 4 = 16:32, 5 = 32:64, 6 = 64:128)
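
The MSA depth indices listed above follow a simple doubling pattern: index i corresponds to max-seq = 2^i and max-seq-extra = 2^(i+1). The short sketch below reproduces that table; the function name msa_depth is hypothetical and is not taken from the CF-random code.

```python
def msa_depth(index):
    """Map a depth index (0 to 6) to a (max_seq, max_seq_extra) pair."""
    if not 0 <= index <= 6:
        raise ValueError("depth index must be between 0 and 6")
    max_seq = 2 ** index
    return max_seq, 2 * max_seq


# Print the table in the same "index = max-seq:max-seq-extra" form.
for i in range(7):
    max_seq, max_seq_extra = msa_depth(i)
    print(f"{i} = {max_seq}:{max_seq_extra}")
```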

2. For CF-random with alternative conformation mode.

In this example, Lactococcal OppA is predicted with two reference structures (i.e., 3drh.pdb and 3drf.pdb) and an MSA file.

python main.py --fname 5olw_A-search --pdb1 5olw_A.pdb --pdb2 5olx_A.pdb --option AC --nMSA 5

Used input files:

PDB1: 5olw_A.pdb
PDB2: 5olx_A.pdb
MSA: 5olw_A-search/0.a3m (MSA file should be in a folder)

This takes less than 70 minutes on an A100 GPU (200 structures are generated in total; the protein is large: ~250 residues).

Generated output files:

Predicted files from deep and random MSAs are deposited in the 'predictions_all' directory.
If CF-random fails to find the selected random MSA, all generated files remain in the 'predictions_all' directory.

TM-score plot of whole structure: TMscore_full-MSA_5olw_A.png
TM-scores and plDDT scores of predictions with deep MSA: TMs_plDDT_full_all_5olw_A.csv
TM-scores and plDDT scores of predictions with random MSAs: TMs_plDDT_rand_all_5olw_A.csv
Selection of random MSA: selected_MSA-size_5olw_A.csv (When CF-random finds the MSA depth)
- MSA depth information (e.g. # = max-seq:max-seq-extra) (0 = 1:2, 1 = 2:4, 2 = 4:8, 3 = 8:16, 4 = 16:32, 5 = 32:64, 6 = 64:128)

3. For CF-random with blind mode covering both fold-switching and alternative conformation.

python main.py --pname Mad2_test --fname 2vfx_L-search/ --option blind

Before running this command, make a symbolic link to the Foldseek PDB libraries in the directory where you run it.

Used input files:

MSA: 2vfx_L-search/0.a3m (MSA file should be in a folder)
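
Since every mode starts from the MSA file inside the folder given to '--fname', it can be useful to confirm that the file exists and see how many sequences it holds. The sketch below is a minimal illustration, not part of CF-random. It assumes that each sequence in the A3M file has a header line beginning with '>', and the function names are hypothetical.

```python
from pathlib import Path


def count_a3m_sequences(a3m_text):
    """Count sequences in A3M text by counting header lines starting with '>'.

    Comment lines beginning with '#' are not counted.
    """
    return sum(1 for line in a3m_text.splitlines() if line.startswith(">"))


def count_sequences_in_folder(folder, filename="0.a3m"):
    """Read folder/filename and return its sequence count."""
    return count_a3m_sequences((Path(folder) / filename).read_text())
```

For the blind-mode example, count_sequences_in_folder("2vfx_L-search") would report the depth of the input MSA.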

Generated output files:

Predicted files from deep and random MSAs are deposited in the 'blind_prediction' directory.
If the '--pname' option is given, the output files are named using the '--pname' value.

List of prediction files: Mad2-structures_of_interest.csv
The best hit list of alternative conformations: Mad2-structures_of_interest.csv
Cluster analysis result as an image file: Mad2-cluster.png

This takes less than 70 minutes on an A100 GPU (200 structures and 200 Foldseek files are generated in total).

How to Cite

Lee, M., Schafer, J.W., Prabakaran, J. et al. Large-scale predictions of alternative protein conformations by AlphaFold2-based sequence association. Nat Commun 16, 5622 (2025). https://doi.org/10.1038/s41467-025-60759-5

License

Please see the LICENSE.md file.
