EnsembleAnalysis

This module analyzes ensembles in terms of intermediate order. In this module, ensembles have internal variable names (identifiers).

addpdb

Input of template conformers from PDB files.

addpdb file identifier
Arguments
  • file - file name, can contain wildcards

  • identifier - module-internal identifier for the ensemble, e.g. ‘ensemble_1’

Remarks
  • use wildcard ‘*’ for part of the filename to process all conformers from a previous step in the pipeline

  • individual PDB files can contain a single model or several models; this can be freely mixed

archive

Save ensemble as a ZIP file including a file list with weights and individual PDB files of all conformers

archive output ensemble_id
Arguments
  • output - name of the output file, extension is not required

  • ensemble_id - identifier of the ensemble to be saved

Remarks
  • this is the “most interoperable” way of saving a weighted ensemble

  • information generated by previous processing, such as spin labelling or domain partitioning, is lost

  • metadata that is not part of PDB specification, such as AlphaFold predicted aligned error, is lost

  • the file-list-with-weights format of the included ‘.ens’ file is human readable and easy to process by other software

  • we favour this format for deposition on Zenodo

  • the format can be reimported with the get_Zenodo keyword

  • for processing with some other software, saving to a single PDB file with the keyword save may be the better option

asphericity

Computes the ensemble averaged asphericity and plots asphericity versus radius of gyration for all conformers

asphericity input [address]
Arguments
  • input - identifier of the input ensemble

  • address - chain address or chain and residue ranges, e.g. (A)4-270, defaults to (A)

Remarks
  • output is a figure with Rg on the x axis and asphericity on the y axis

  • each conformer corresponds to one point marker, with MarkerSize corresponding to population
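
The two quantities plotted by this keyword can be illustrated with a short Python sketch. It is not the MMMx implementation: it assumes equal atom masses and uses one common normalized definition of asphericity (0 for a sphere, 1 for a rod), which may differ from the definition used by MMMx. The function names shape_descriptors and ensemble_average are hypothetical.

```python
import math


def shape_descriptors(coords):
    """Return (Rg, asphericity) for a list of (x, y, z) coordinates.

    Assumption: all atoms have equal mass.
    """
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    # Gyration tensor: T[a][b] = mean of (r_a - c_a) * (r_b - c_b)
    T = [[sum((c[a] - center[a]) * (c[b] - center[b]) for c in coords) / n
          for b in range(3)] for a in range(3)]
    trace = T[0][0] + T[1][1] + T[2][2]  # equals Rg squared
    trace_sq = sum(T[a][b] * T[b][a] for a in range(3) for b in range(3))
    rg = math.sqrt(trace)
    # Normalized asphericity written with tensor invariants; equivalent to
    # sum over i<j of (l_i - l_j)^2, divided by 2 * (l_1 + l_2 + l_3)^2,
    # where l_i are the eigenvalues of the gyration tensor.
    asphericity = (3.0 * trace_sq - trace ** 2) / (2.0 * trace ** 2)
    return rg, asphericity


def ensemble_average(values, populations):
    """Population-weighted mean of a per-conformer quantity."""
    total = sum(populations)
    return sum(v * p for v, p in zip(values, populations)) / total


# Two toy "conformers": a straight rod and an octahedron (isotropic)
rod = [(float(i), 0.0, 0.0) for i in range(10)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
descriptors = [shape_descriptors(rod), shape_descriptors(octahedron)]
mean_asphericity = ensemble_average([d[1] for d in descriptors], [0.5, 0.5])
```

Each (Rg, asphericity) pair corresponds to one point in the figure, and the population of the conformer would set the marker size.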

cluster

Reduce ensemble size by clustering.

cluster input output size range
Arguments
  • input - identifier of the input ensemble

  • output - identifier of the output (reduced) ensemble

  • size - number of conformers in the reduced ensemble

  • range - chain and residue range, e.g. (A)187-320, or list of residues, e.g. (A)187,231,316

Remarks
  • ensemble Shannon entropy and width before and after size reduction are reported in the log file

  • a similarity measure is reported in the log file
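
The source does not define how the ensemble Shannon entropy is computed. A minimal Python sketch, assuming it is the Shannon entropy of the conformer populations, is given below; the function name shannon_entropy is hypothetical and the log-file value reported by MMMx may be defined differently.

```python
import math


def shannon_entropy(populations):
    """Shannon entropy (natural log) of conformer populations.

    Populations are normalized first; zero populations contribute nothing.
    """
    total = sum(populations)
    probs = [p / total for p in populations if p > 0]
    return -sum(p * math.log(p) for p in probs)


uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])
skewed = shannon_entropy([0.7, 0.1, 0.1, 0.1])
```

Under this assumption, the exponential of the entropy can be read as an effective number of conformers, which is largest for uniform populations.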

compare

Comparison of two ensembles

compare ensemble_1 ensemble_2 [range [mode]]
Arguments
  • ensemble_1 - identifier of the first ensemble

  • ensemble_2 - identifier of the second ensemble

  • range - optional MMMx address that specifies only a range of a conformer for comparison, e.g. (A)187-320

  • mode - optional string mode can be ‘resolved’ to request residue-wise comparison

Remarks
  • this is a legacy keyword; prefer match for comparing two ensembles

  • the algorithm works well only if both ensembles are dense (spatially overlapping conformers)

  • the algorithm computes overlap of pseudo-electron densities between ensembles

  • the range argument ‘(*)’ selects the complete structure

  • the two ensembles may have different numbers of conformers

  • residue-wise comparison of large ensembles can take very long

coulomb

Computes and displays the ensemble averaged Coulomb interaction for pairs of charged residues

coulomb filename input [aa1 [aa2 [pH [I [Tmax]]]]]
Arguments
  • filename - name of the output file, comma-separated value file

  • input - identifier of the input ensemble

  • aa1 - amino acid type 1, defaults to Arg, use three-letter code

  • aa2 - amino acid type 2, defaults to Glu, use three-letter code

  • pH - pH value, default is 7

  • I - ionic strength, default is 0.150 M

  • Tmax - temperature corresponding to white on the color scale, defaults to the maximum interaction among all pairs

Remarks
  • output is a ‘.csv’ file with the residue numbers in the first and second columns and the Coulomb interaction in the third column

  • in addition, a figure is output with a hot colormap, where black is no interaction and white the maximum interaction

  • the interaction is scaled by the Boltzmann constant, so that it corresponds to the temperature where it matches thermal energy

  • specify parameter Tmax if you want to compare different residue pairs for the same ensemble

  • a salt bridge at 0.150 M ionic strength is in the range of 350-400 K
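
The scaling of an interaction energy to a temperature can be illustrated with a textbook model. The Python sketch below assumes a Debye-Hückel screened Coulomb potential between two point charges in water at 25 °C; the source does not state which electrostatic model MMMx uses, so the function names, the relative permittivity of 78.4, and the Debye-length approximation are all assumptions.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in C
EPS0 = 8.8541878128e-12     # vacuum permittivity in F/m
K_B = 1.380649e-23          # Boltzmann constant in J/K


def debye_length_nm(ionic_strength_molar):
    """Approximate Debye length of a 1:1 electrolyte in water at 25 °C."""
    return 0.304 / math.sqrt(ionic_strength_molar)


def coulomb_temperature(r_angstrom, q1=1, q2=-1, ionic_strength=0.150, eps_r=78.4):
    """Magnitude of the screened Coulomb energy divided by k_B, in K.

    Assumptions: point charges q1 and q2 (in units of e), uniform relative
    permittivity eps_r, exponential screening with the Debye length.
    """
    r_m = r_angstrom * 1e-10
    energy = q1 * q2 * E_CHARGE ** 2 / (4 * math.pi * EPS0 * eps_r * r_m)
    screening = math.exp(-(r_angstrom / 10.0) / debye_length_nm(ionic_strength))
    return abs(energy) * screening / K_B


t_close = coulomb_temperature(4.0)
t_far = coulomb_temperature(8.0)
t_high_salt = coulomb_temperature(4.0, ionic_strength=0.5)
```

In such a model, the temperature drops with increasing charge separation and with increasing ionic strength, which is why a fixed Tmax is needed to compare color scales between different residue pairs.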

density

Computes a 3D electron density map of an ensemble

density filename input [range [resolution]]
Arguments
  • filename - name of the output file, specify with extension ‘.mat’ for MMMx density files or ‘.mrc’ for MRC files

  • input - identifier of the input ensemble

  • range - optional MMMx address that specifies only a range of a conformer for analysis, e.g. (A)187-320

  • resolution - resolution in Angstroem (optional), defaults to 1

Remarks
  • output as a ‘.mrc’ file can be visualized by most protein graphics programs

  • output as a Matlab (‘.mat’) file can be visualized with MMM or the ‘visualize_isosuface’ function

  • a larger value for resolution leads to faster computation and a smaller file and may still be adequate for strong disorder

figures

Specify figure output format.

figures format
Arguments
  • format - one of the formats in which Matlab can save figures, e.g. ‘pdf’, ‘svg’, ‘png’, ‘jpg’, default is ‘pdf’

Remarks
  • figure saving is on by default in this module, use format ‘off’ to switch it off

flexibility

Computes (local) Ramachandran flexibility profiles of peptide or nucleotide chains

flexibility filename input
Arguments
  • filename - name of the output file

  • input - identifier of the input ensemble

Remarks
  • the algorithm analyzes the variation of the backbone dihedral angles φ and ψ

  • for RNA, pseudo-torsion angles are analyzed

  • local flexibility ranges between 0 (rigid) and 1 (random)
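
The source does not spell out how the variation of the dihedral angles is quantified. One standard measure with the same range, 0 for identical angles and 1 for a uniform spread, is the circular variance. The Python sketch below uses it purely as an illustration; it is not claimed to be the MMMx algorithm, and the function name circular_variance is hypothetical.

```python
import cmath
import math


def circular_variance(angles_deg):
    """Circular variance of a set of angles given in degrees.

    Each angle is mapped to a unit vector in the complex plane; the
    variance is one minus the length of the mean vector.
    """
    n = len(angles_deg)
    resultant = sum(cmath.exp(1j * math.radians(a)) for a in angles_deg) / n
    return 1.0 - abs(resultant)


rigid = circular_variance([-60.0] * 5)            # same angle in all conformers
spread = circular_variance([0.0, 90.0, 180.0, 270.0])  # evenly spread angles
```

Because angles are periodic, a measure of this kind avoids the artefacts that an ordinary standard deviation shows near the ±180° boundary.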

getens

Input of an ensemble from an MMMx ensemble list.

getens file identifier
Arguments
  • file - name of an MMMx ensemble file list (extension ‘.ens’)

  • identifier - module-internal identifier for the ensemble, e.g. ‘ensemble_2’

Remarks
  • best way of analyzing an ensemble generated by the EnsembleFit module

  • all PDB files specified in the ensemble file list must be on the Matlab path

get_MMMx

Import of an ensemble from the internal (Matlab) format of MMMx

get_MMMx filename identifier
Arguments
  • filename - name of a Matlab file generated with put_MMMx, e.g. ‘FUS_dispersed.mat’

  • identifier - module-internal identifier for the ensemble, e.g. ‘FUS_dispersed’

Remarks
  • this is the computationally least costly way of importing an ensemble

  • note that this format is not compatible with any other modelling or visualization software

get_PED

Import of an ensemble from the protein ensemble database (PED)

get_PED PED_ID.ens_nr identifier
Arguments
  • PED_ID.ens_nr - PED identifier, followed by a dot and the ensemble number, e.g. ‘PED00020.e001’

  • identifier - module-internal identifier for the ensemble, e.g. ‘MeV1’

Remarks
  • PED ensembles do not feature conformer weights, uniform weights are assumed

  • requires internet access

get_Zenodo

Import of an ensemble from Zenodo

get_Zenodo Zenodo_ID.filename identifier
Arguments
  • Zenodo_ID.filename - Zenodo identifier, followed by a dot and the file name, e.g. ‘8214049.FUS_condensed.zip’

  • identifier - module-internal identifier for the ensemble, e.g. ‘FUS_condensed’

Remarks
  • preferred format for Zenodo deposition is a ZIP archive (.zip) containing a .ens file and all PDB files listed in the .ens file list

  • if all PDB files are already available locally (on the Matlab path), the file on Zenodo can also be just a .ens file

  • archives containing an .ens file and PDB files can be imported as well from .gz, .tar, and .tar.gz formats

  • use the Zenodo keyword for downloading PDB files of a raw ensemble without importing the raw ensemble itself into MMMx
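
The combined argument Zenodo_ID.filename splits at the first dot, since the file name itself usually contains further dots. The following Python sketch shows that split and a check against the archive formats listed above. The function names are hypothetical, and the sketch assumes that the Zenodo identifier consists of digits only, as in the examples.

```python
# Suffixes of the archive formats named in the remarks
ARCHIVE_SUFFIXES = (".zip", ".tar.gz", ".tar", ".gz")


def split_zenodo_argument(argument):
    """Split 'Zenodo_ID.filename' into (Zenodo_ID, filename).

    Assumption: the identifier is purely numeric.
    """
    record_id, dot, filename = argument.partition(".")
    if not dot or not filename or not record_id.isdigit():
        raise ValueError(f"expected '<Zenodo_ID>.<filename>', got {argument!r}")
    return record_id, filename


def is_archive(filename):
    """True if the file name ends in one of the listed archive suffixes."""
    return filename.lower().endswith(ARCHIVE_SUFFIXES)


record_id, filename = split_zenodo_argument("8214049.FUS_condensed.zip")
```

For the example argument, record_id holds the identifier and filename holds the archive name including its extension.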

inertiaframe

Transform all conformers to their respective inertia frames

inertiaframe output input [range]
Arguments
  • output - name of the output file, extension ‘.pdb’ is appended, if none

  • input - identifier of the input ensemble

  • range - optional MMMx address that specifies only a range of a conformer for analysis, e.g. (A)187-320

Remarks
  • the x axis corresponds to the minimum and the z axis to the maximum moment of inertia

  • the smallest x and z coordinates correspond to the N terminus

  • the origin of the coordinate frame is the center of gravity of the conformer

match

Match conformers in one ensemble by conformers in a second ensemble

match ensemble_1 ensemble_2 [range [range2]]
Arguments
  • ensemble_1 - identifier of the first ensemble

  • ensemble_2 - identifier of the second ensemble

  • range - optional MMMx address that specifies a chain/residue range for matching, e.g. (A)187-320

  • range2 - optional MMMx address that specifies a different range in the second ensemble for comparison, e.g. (B)1-134

Remarks
  • the algorithm finds the closest conformer by distance root mean square in the second ensemble for each conformer in the first ensemble

  • the range argument ‘(*)’ selects the complete structure

  • the range argument can be missing (complete structure is the default)

  • if the first range argument is given and the second one is missing, the same range is applied in the second ensemble

  • the list of matches and the maximum mismatch are reported in the log file
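
The matching step can be sketched in a few lines of Python. The distance root mean square deviation (dRMS) compares the internal distance matrices of two conformers, so no superposition is required. This is a minimal illustration under simplifying assumptions (one coordinate per residue, equal numbers of residues in both ranges, brute-force search); the function names are hypothetical and the MMMx implementation may differ.

```python
import math
from itertools import combinations


def pair_distances(coords):
    """All pairwise distances within one conformer."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def drms(coords_a, coords_b):
    """Distance root mean square deviation between two conformers."""
    da, db = pair_distances(coords_a), pair_distances(coords_b)
    if len(da) != len(db):
        raise ValueError("conformers must have the same number of sites")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def match(ensemble_1, ensemble_2):
    """For each conformer in ensemble_1, find the closest one in ensemble_2.

    Returns a list of (index_1, index_2, dRMS) tuples.
    """
    matches = []
    for i, conformer in enumerate(ensemble_1):
        j, d = min(((j, drms(conformer, other))
                    for j, other in enumerate(ensemble_2)),
                   key=lambda item: item[1])
        matches.append((i, j, d))
    return matches


extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
# Second ensemble: translated copies in reversed order
shifted_bent = [(x + 5, y + 5, z + 5) for x, y, z in bent]
shifted_extended = [(x + 5, y + 5, z + 5) for x, y, z in extended]
result = match([extended, bent], [shifted_bent, shifted_extended])
max_mismatch = max(d for _, _, d in result)
```

The largest dRMS in the list of matches corresponds to the maximum mismatch mentioned above.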

measures

Compute various measures of the ensemble. This is a block key with n lines for n measures.

measures filename input [range]
   subkey
   ...
.measures
Arguments
  • filename - basis name for the output files, abbreviated below as ‘%s’

  • input - identifier for the input ensemble

  • range - optional MMMx address that specifies only a range of a conformer for analysis, e.g. (A)187-320

  • subkey - a subkey that specifies a measure from the following list

Available subkeys
  • matlab - save output data to Matlab files

  • csv - save output data to comma-separated value files

  • oriented - assume that conformers are already oriented, default is false (conformers are superimposed)

  • Rg - radius of gyration, including standard deviation (output to logfile)

  • width - ensemble width and density in Angstroem (output to logfile), also computes pair r.m.s.d. matrix and central conformer

  • correlation - correlation matrix, output as figure and to files ‘residue_pair_correlation_%s’ with extensions ‘.csv’ and ‘.mat’

  • sort - sort for computation of correlation matrix

  • drms - uses distance root mean square deviation for correlation matrix and sorting

  • compactness - compactness matrix

Remarks
  • saving output to both Matlab (‘.mat’) and ‘.csv’ files is allowed

  • if neither the matlab nor the csv subkey is present, output is only to figures or logfile

  • oriented affects only computation of pair r.m.s.d. (correlation matrix)

order

Computes local order profiles of peptide or nucleotide chains

order filename input
Arguments
  • filename - name of the output file

  • input - identifier of the input ensemble

Remarks
  • the algorithm is based on an adaptation of Flory’s characteristic ratio to polymers with secondary structure

  • the local order parameter ranges between 0 (random) and 1 (perfect order)

  • the local order parameter is somewhat longer ranged than the flexibility parameter mentioned above

property

Computes a 3D property map of an ensemble

property filename input [range [resolution [property [pH [I]]]]]
Arguments
  • filename - name of the output file, specify with extension ‘.mat’ for MMMx density files or ‘.mrc’ for MRC files

  • input - identifier of the input ensemble

  • range - optional MMMx address that specifies only a range of a conformer for analysis, e.g. (A)187-320

  • resolution - resolution in Angstroem (optional), defaults to 1

  • property - can be electrostatic (default), cation-pi, or hydrophobic

  • pH - pH value, default is 7

  • I - ionic strength, default is 0.150 M

Remarks
  • output as a ‘.mrc’ file can be visualized by most protein graphics programs

  • output as a Matlab (‘.mat’) file can be visualized with MMM or the ‘visualize_isosuface’ function

  • a larger value for resolution leads to faster computation and a smaller file and may still be adequate for strong disorder

put_MMMx

Save ensemble in internal MMMx (Matlab) format

put_MMMx output ensemble_id
Arguments
  • output - name of the output file, extension ‘.mat’ is appended, if none

  • ensemble_id - identifier of the ensemble to be saved

Remarks
  • this is the fastest way of saving an ensemble

  • any information generated by previous processing, such as spin labelling or domain partitioning, is retained

  • any metadata that is not part of PDB specification, such as AlphaFold predicted aligned error, is retained

  • this format cannot be imported by any other modelling or visualization software (at this time)

  • we strongly recommend also saving in an exchangeable, if lossy, format with the archive or save keywords

  • we discourage deposition of only this format in an open data context, because the format is not interoperable and thus violates FAIR principles

save

Save ensemble to a single PDB file and a tab-separated file with weights

save output ensemble_id
Arguments
  • output - name of the output file, extension ‘.pdb’ is appended, if none

  • ensemble_id - identifier of the ensemble to be saved

Remarks
  • the two output files can be used for submission to the protein ensemble database (PED)

  • weights (populations) are stored in a REMARK 400 field; MMMx can read them on reloading, but other software cannot

  • weights are also stored in a tab-separated (.tsv) file with the same basis name

  • in some contexts, saving to an archive of individual conformer files and a file list with weights is better, use keyword archive for that

sort

Iterative hierarchical clustering and sorting of an ensemble based on distance root mean square deviation.

sort filename input [option]
Arguments
  • filename - name of the output ensemble list, extension should be ‘.ens’

  • input - identifier of the input ensemble

  • option - option ‘oriented’ assumes that the conformers are already in the same frame, otherwise they are optimally superimposed

  • option - option ‘similarity’ starts from the conformer with highest population and builds a list with maximum similarity between neighbours

  • option - option ‘population’ sorts conformers by descending population

Remarks
  • by default (no option specified) similar conformers are grouped to clusters and the clusters are sorted by descending population

  • for cases with multiple discrete states, the default is strongly recommended

subsample

Subsample an ensemble to a smaller ensemble.

subsample ratio input output
Arguments
  • ratio - integer reduction factor for ensemble size

  • input - identifier of the input ensemble

  • output - identifier of the output (reduced) ensemble

Remarks
  • this is particularly useful for molecular dynamics trajectories
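
The source does not state which conformers are kept. For a trajectory, the natural reading of an integer reduction factor is to keep every ratio-th frame, and the Python sketch below assumes exactly that and renormalizes the populations afterwards. The function name subsample mirrors the keyword but is a hypothetical helper, not MMMx code.

```python
def subsample(conformers, populations, ratio):
    """Keep every ratio-th conformer and renormalize the populations.

    Assumption: conformers are in trajectory order and selection
    starts at the first conformer.
    """
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    kept = conformers[::ratio]
    kept_populations = populations[::ratio]
    total = sum(kept_populations)
    return kept, [p / total for p in kept_populations]


frames = [f"frame_{i:03d}.pdb" for i in range(10)]
weights = [0.1] * 10
reduced, reduced_weights = subsample(frames, weights, 3)
```

Regular striding reduces the correlation between successive frames while preserving the order of the trajectory.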

superimpose

Superposition of conformers in an ensemble

superimpose output input [range [template [template_range [mode]]]]
Arguments
  • output - name of the output file, extension ‘.pdb’ is appended, if none

  • input - identifier of the input ensemble

  • range - optional MMMx address that specifies only a range of a conformer for analysis, e.g. (A)187-320

  • template - template ensemble or structure (optional)

  • template_range - optional MMMx address that specifies a template range of a conformer, e.g. (B)187-320

  • mode - optional string mode can be ‘central’ to request superposition onto the central conformer

Remarks
  • by default, superposition is to the first conformer of the input ensemble if no range is provided

  • if a template and central are specified, superposition is to central conformer of a superensemble consisting of input and template

  • the range argument ‘(*)’ selects the complete structure

transition

Visualization of a state transition between two ensembles. This is a block key.

transition initial.(chain) final.(chain) range output
   subkey
   ...
.transition
Arguments
  • initial - identifier for the initial-state ensemble

  • final - identifier for the final-state ensemble

  • (chain) - chain tag, as in SRSF1_free.A, for selecting chain A in ensemble SRSF1_free

  • range - range where conformers are superimposed, as in 121-195 for residues 121-195 of the selected chains, do not include a chain tag here

  • output - basis filename for output

  • subkey - a subkey that specifies a visualization command from the following list

Available subkeys
  • show - MMM show command, is applied per conformer, example show (A)16-87 ribbon

  • color - MMM color command, is applied per conformer, example color (A)16-87 red

  • (cmd) (address) (argument) - any MMM command can be issued, address is a chain/range address and must be applicable per conformer

Remarks
  • the range argument can also be a list of residues, such as 16,107,148

  • conformers of the initial-state ensemble are divided into deselected conformers and conformational selection

  • conformers of the final-state ensemble are divided into conformational selection and induced fit

  • assignments and populations per subset are reported in the logfile

  • a visualization in abstract conformation space is automatically saved

  • PDB files and a .mmm script file are stored for visualization

  • the .mmm script file must be run separately in MMM

  • population (weight) is transparency-encoded if any subkeys are used

  • if the subkey block is empty, snake models (coil with diameter-encoded weight) are displayed in MMM

  • if the subkey block is empty, coloring is by subset (deselected red, conformational selection green, induced fit blue, superimposed range grey)

zenodo

Download and possibly extract a file from Zenodo without importing an ensemble to MMMx

Zenodo Zenodo_ID.filename
Arguments
  • Zenodo_ID.filename - Zenodo identifier, followed by a dot and the file name, e.g. ‘6384003.raw_superensemble_with_jackknife_ensembles.zip’

Remarks
  • any file on Zenodo can be downloaded, including, for instance, ‘.mcx’ or ‘.mat’ files

  • archives in ‘.zip’, ‘.gz’, ‘.tar’, and ‘.tar.gz’ formats are automatically extracted after download

  • use the get_Zenodo keyword for directly importing an ensemble from Zenodo into MMMx