General installation and usage guide for CF-random, which predicts alternative conformations and fold-switching proteins.
To run CF-random in a Colab notebook, please use the following link.
We currently do not support Windows or macOS environments.
Install ColabFold, its dependencies, and Foldseek with the following commands.
First, create a new conda environment:
conda create --name CF-random python=3.10
conda activate CF-random
pip install colabfold[alphafold,openmm] jax[cuda12] openmm[cuda12]
pip install textalloc tmtools adjustText thefuzz mdtraj biopython seaborn MDAnalysis Colabfold
conda install conda-forge::pymol-open-source
pip3 install -U scikit-learn
Once the dependencies are installed, install Foldseek.
conda install -c conda-forge -c bioconda foldseek
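To confirm the installation, you can check that the Python packages can be found and that the foldseek binary is on PATH. The sketch below is not part of CF-random, and the import names in REQUIRED_MODULES are assumptions: import names can differ from pip package names (e.g., scikit-learn imports as sklearn and biopython as Bio), so adjust the list if needed.

```python
import importlib.util
import shutil

# Assumed import names for the packages installed above (pip names can differ).
REQUIRED_MODULES = [
    "colabfold", "jax", "openmm", "textalloc", "tmtools", "adjustText",
    "thefuzz", "mdtraj", "Bio", "seaborn", "MDAnalysis", "pymol", "sklearn",
]
# Command-line tools that must be on PATH.
REQUIRED_EXECUTABLES = ["foldseek"]


def missing_dependencies(modules, executables):
    """Return the modules that cannot be found and the executables not on PATH."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_executables = [e for e in executables if shutil.which(e) is None]
    return missing_modules, missing_executables


if __name__ == "__main__":
    mods, exes = missing_dependencies(REQUIRED_MODULES, REQUIRED_EXECUTABLES)
    if not mods and not exes:
        print("All dependencies found.")
    else:
        print("Missing modules:", mods or "none")
        print("Missing executables:", exes or "none")
```

Run it inside the activated CF-random environment so that it inspects the same interpreter CF-random will use.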
- CF-random has three prediction modes: fold-switching and alternative conformation (the two default modes) and blind mode.
- All modes require a multiple sequence alignment (MSA). To avoid overwriting output files, we recommend keeping each MSA in its own folder.
- The default fold-switching and alternative conformation modes require PDB files for both fold1 (dominant conformation) and fold2 (alternative conformation), which serve as references for TM-score calculation. Blind mode does not require PDB files.
- Make sure each PDB file contains a single chain. If a PDB file contains multiple chains, CF-random will stop.
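Because CF-random stops on multi-chain PDB files, it can help to check the reference files beforehand. This minimal sketch is not part of CF-random. It assumes standard fixed-width PDB files (not mmCIF) and collects the chain identifier (column 22) of the ATOM and HETATM records in the first model.

```python
import sys


def chain_ids(pdb_path):
    """Collect the chain identifiers used by ATOM/HETATM records in a PDB file."""
    chains = set()
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM  ", "HETATM")):
                # In the fixed-width PDB format the chain ID is column 22 (index 21).
                chains.add(line[21] if len(line) > 21 else " ")
            elif line.startswith("ENDMDL"):
                break  # only inspect the first model of multi-model files
    return chains


def is_single_chain(pdb_path):
    """True if exactly one chain identifier appears in the coordinate records."""
    return len(chain_ids(pdb_path)) == 1


if __name__ == "__main__":
    for path in sys.argv[1:]:
        ids = sorted(chain_ids(path))
        status = "OK" if len(ids) == 1 else "not a single chain"
        print(f"{path}: chains {ids} -> {status}")
```

Saved as a script (any name works), it can be run on the reference files from the examples, such as 2oug_C.pdb and 6c6s_D.pdb.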
--fname #### | folder containing the multiple sequence alignment (MSA)
--pname #### | project name (blind mode only)
--pdb1 #### | dominant reference structure, used to calculate TM-scores of the predicted models
--pdb2 #### | alternative reference structure, used to calculate TM-scores of the predicted models
--nMSA #### | number of additional prediction samples using MSAs (default = 0)
--type #### | ColabFold model type, e.g., ptm, monomer, or multimer
--options ### | AC: predict alternative conformations with reference structures; FS: predict fold-switching proteins with reference structures; blind: predict alternative conformations or fold-switching proteins without reference PDB files
--seq ### | sequence of the fold-switching region, used to compare TM-scores between the reference crystal structure and the predicted structures (required only for the 'FS' option)
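--pdb1 and --pdb2 are used to compute TM-scores. For reference, the standard TM-score of one fixed superposition sums over aligned residue pairs with distance d_i and normalizes by the reference length L: TM = (1/L) Σ 1/(1 + (d_i/d0)²), where d0 = 1.24·(L − 15)^(1/3) − 1.8. This sketch only evaluates that formula for given distances. Tools such as TM-align also search for the alignment and superposition that maximize the score, and CF-random's own TM-score code is not reproduced here.

```python
def tm_d0(target_length):
    """Distance scale d0 in angstroms, clamped to at least 0.5 (usual TM-score convention)."""
    if target_length <= 15:
        return 0.5
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, target_length):
    """TM-score of one fixed superposition.

    distances: C-alpha distances (angstroms) of the aligned residue pairs.
    target_length: residue count of the reference structure, used for normalization.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # 80 of 100 reference residues aligned, each 1 angstrom away after superposition.
    print(round(tm_score([1.0] * 80, 100), 3))
```

Unaligned residues contribute nothing to the sum but still count in L, which is why a prediction that only partially matches a reference scores lower.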
- In the default modes (fold-switching and alternative conformation), CF-random outputs TM-scores (CSV and PNG files), pLDDT scores, and information about the selected random MSA. If CF-random predicts both folds, the prediction files are written to successed_prediction/pdb1_name and additional_sampling/pdb1_name. Otherwise, nothing is written to these directories.
- The --seq option is required for the default fold-switching mode (FS).
- --nMSA can be used with all options, but --nESN cannot be used in blind mode.
- In blind mode, predicted files are written to blind_prediction/pdb1_name, and CF-random also reports comparison results from Foldseek.
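The requirements above (reference PDB files for FS and AC, --seq for FS, --pname for blind mode only) can be checked before launching a long run. The helper below is hypothetical and not part of CF-random: it only assembles the argument list for `python main.py` from the documented flags, spells the mode flag --option as in the example commands, and leaves out --type and --nESN.

```python
import shlex


def build_cf_random_command(option, fname, pdb1=None, pdb2=None, seq=None,
                            pname=None, n_msa=None):
    """Build an argv list for `python main.py`, enforcing the documented rules."""
    if option not in ("FS", "AC", "blind"):
        raise ValueError(f"unknown option: {option!r}")
    if option in ("FS", "AC") and not (pdb1 and pdb2):
        raise ValueError(f"{option} mode requires both pdb1 and pdb2")
    if option == "FS" and not seq:
        raise ValueError("FS mode requires seq (the fold-switching region sequence)")
    if pname and option != "blind":
        raise ValueError("pname is only used in blind mode")

    cmd = ["python", "main.py", "--fname", fname]
    if pname:
        cmd += ["--pname", pname]
    if pdb1:
        cmd += ["--pdb1", pdb1]
    if pdb2:
        cmd += ["--pdb2", pdb2]
    if seq:
        cmd += ["--seq", seq]
    if n_msa is not None:
        cmd += ["--nMSA", str(n_msa)]
    cmd += ["--option", option]
    return cmd


if __name__ == "__main__":
    cmd = build_cf_random_command("AC", "5olw_A-search", pdb1="5olw_A.pdb",
                                  pdb2="5olx_A.pdb", n_msa=5)
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
    # To launch it: subprocess.run(cmd, check=True)
```

Failing early with a ValueError is cheaper than discovering a missing argument after the GPU job has been queued.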
Before running CF-random, make sure the CF-random conda environment is activated:
conda activate CF-random
The examples below show how to run CF-random in each mode.
The first two, fold-switching and alternative conformation, are the default modes; the last is blind mode.
In this example, RfaH is predicted using two reference structures (2oug_C.pdb and 6c6s_D.pdb) and an MSA file.
python main.py --fname 2oug_C-search/ --pdb1 2oug_C.pdb --pdb2 6c6s_D.pdb --option FS
- PDB1: 2oug_C.pdb
- PDB2: 6c6s_D.pdb
- MSA: 2oug_C-search/0.a3m (the MSA file must be inside a folder)
- --seq: required to compare the fold-switching region between the crystal structure and the predicted structures
This takes under 30 minutes on an A100 GPU (200 structures in total).
Predicted files from the deep and random MSAs are written to the 'predictions_all' directory.
If CF-random fails to find a suitable random MSA, all generated files remain in the 'predictions_all' directory.
- TM-score plot of whole structure: TMscore_full-MSA_2oug_C.png
- TM-score plot of fold-switching region: TMscore_fs-region_full-MSA_2oug_C.png
- TM-score plot of fold-switching region with label of prediction rank: TMscore_fs-region_full-MSA_2oug_C_label.png
- TM-scores and pLDDT scores of predictions with the deep MSA: TMs_plDDT_full_all_2oug_C.csv
- TM-scores and pLDDT scores of predictions with random MSAs: TMs_plDDT_rand_all_2oug_C.csv
- Selected random MSA: selected_MSA-size_2oug_C.csv (generated only when CF-random finds a suitable MSA depth)
- MSA depth index (# = max-seq:max-seq-extra): 0 = 1:2, 1 = 2:4, 2 = 4:8, 3 = 8:16, 4 = 16:32, 5 = 32:64, 6 = 64:128
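The depth table follows a doubling pattern: index i corresponds to max-seq = 2^i and max-seq-extra = 2^(i+1). This small sketch (not part of CF-random) reproduces the table, which can help when reading the depth index reported for the selected random MSA; it makes no assumption about the column layout of selected_MSA-size_*.csv.

```python
def msa_depth(index):
    """Map an MSA depth index (0-6) to (max-seq, max-seq-extra)."""
    if not 0 <= index <= 6:
        raise ValueError("MSA depth index must be between 0 and 6")
    # Each step doubles both values: max-seq = 2**index, max-seq-extra = 2 * max-seq.
    return 2 ** index, 2 ** (index + 1)


if __name__ == "__main__":
    for i in range(7):
        max_seq, max_extra = msa_depth(i)
        print(f"{i} = {max_seq}:{max_extra}")
```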
In this mode, Lactococcal OppA is predicted using two reference structures (5olw_A.pdb and 5olx_A.pdb) and an MSA file.
python main.py --fname 5olw_A-search --pdb1 5olw_A.pdb --pdb2 5olx_A.pdb --option AC --nMSA 5
- PDB1: 5olw_A.pdb
- PDB2: 5olx_A.pdb
- MSA: 5olw_A-search/0.a3m (the MSA file must be inside a folder)
This takes under 70 minutes on an A100 GPU (200 structures in total; the protein is relatively large, ~250 residues).
Predicted files from the deep and random MSAs are written to the 'predictions_all' directory.
If CF-random fails to find a suitable random MSA, all generated files remain in the 'predictions_all' directory.
- TM-score plot of whole structure: TMscore_full-MSA_5olw_A.png
- TM-scores and pLDDT scores of predictions with the deep MSA: TMs_plDDT_full_all_5olw_A.csv
- TM-scores and pLDDT scores of predictions with random MSAs: TMs_plDDT_rand_all_5olw_A.csv
- Selected random MSA: selected_MSA-size_5olw_A.csv (generated only when CF-random finds a suitable MSA depth)
- MSA depth index (# = max-seq:max-seq-extra): 0 = 1:2, 1 = 2:4, 2 = 4:8, 3 = 8:16, 4 = 16:32, 5 = 32:64, 6 = 64:128
python main.py --pname Mad2_test --fname 2vfx_L-search/ --option blind
Before running this command, create a symbolic link to the Foldseek PDB database files in the directory where you run it.
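The linking step can be scripted. This is a hypothetical helper, not part of CF-random: it assumes the Foldseek PDB database files sit together in one directory (for example, one created with `foldseek databases PDB pdb tmp`) and share a name prefix, and it links those files into the working directory. The exact file names CF-random expects are not stated here, so check them against your setup.

```python
import os
from pathlib import Path


def link_foldseek_db(db_dir, workdir=".", prefix=""):
    """Symlink files from db_dir whose names start with prefix into workdir.

    Existing files or links in workdir are left untouched. Returns the new link names.
    """
    created = []
    for src in sorted(Path(db_dir).iterdir(), key=lambda p: p.name):
        if not src.name.startswith(prefix):
            continue
        dest = Path(workdir) / src.name
        if dest.exists() or dest.is_symlink():
            continue  # do not overwrite anything already present
        os.symlink(src.resolve(), dest)
        created.append(src.name)
    return created


if __name__ == "__main__":
    # Hypothetical database location; replace with the directory holding your database.
    print(link_foldseek_db("/path/to/foldseek_db", ".", prefix="pdb"))
```

Because existing names are skipped, running it twice in the same directory is harmless.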
- MSA: 2vfx_L-search/0.a3m (the MSA file must be inside a folder)
Predicted files from the deep and random MSAs are written to the 'blind_prediction' directory.
If --pname is given, the output file names are based on its value.
- List of prediction files: Mad2-structures_of_interest.csv
- The best hit list of alternative conformations: Mad2-structures_of_interest.csv
- Cluster analysis result as an image file: Mad2-cluster.png
This takes under 70 minutes on an A100 GPU (200 structures plus 200 Foldseek output files).
Lee, M., Schafer, J.W., Prabakaran, J. et al. Large-scale predictions of alternative protein conformations by AlphaFold2-based sequence association. Nat Commun 16, 5622 (2025). https://doi.org/10.1038/s41467-025-60759-5
Please see the LICENSE.md file.