EnsembleAnalysis

This module analyzes ensembles in terms of intermediate order. In this module, ensembles have internal variable names (identifiers).

`addpdb`

Input of template conformers from PDB files.

addpdb file identifier

Arguments

file - file name, can contain wildcards
identifier - module-internal identifier for the ensemble, e.g. ‘ensemble_1’

Remarks

use wildcard ‘*’ for part of the filename to process all conformers from a previous step in the pipeline
individual PDB files can contain a single model or several models; this can be freely mixed

`archive`

Save ensemble as a ZIP file including a file list with weights and individual PDB files of all conformers

archive output ensemble_id

Arguments

output - name of the output file, extension is not required
ensemble_id - identifier of the ensemble to be saved

Remarks

this is the “most interoperable” way of saving a weighted ensemble
information generated by previous processing, such as spin labelling or domain partitioning, is lost
metadata that is not part of PDB specification, such as AlphaFold predicted aligned error, is lost
the file-list-with-weights format of the included ‘.ens’ file is human readable and easy to process by other software
we favour this format for deposition on Zenodo
the format can be reimported with the get_Zenodo keyword
for processing with some other software, saving to a single PDB file with the keyword save may be the better option
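The archive layout described above (one ‘.ens’ file list plus the individual PDB files) can be checked with a few lines of Python before deposition. This is a minimal sketch using only the standard library; the helper names `list_archive` and `build_demo_archive` and the placeholder file contents are assumptions for illustration and are not part of MMMx.

```python
import io
import zipfile


def list_archive(zip_source):
    """Return (ens_files, pdb_files) found in a ZIP archive.

    zip_source may be a path or a file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        names = archive.namelist()
    ens_files = sorted(n for n in names if n.lower().endswith(".ens"))
    pdb_files = sorted(n for n in names if n.lower().endswith(".pdb"))
    return ens_files, pdb_files


def build_demo_archive():
    """Create a small in-memory ZIP with placeholder content.

    The member contents are dummies, not real MMMx output.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("demo.ens", "placeholder file list\n")
        archive.writestr("demo_m1.pdb", "END\n")
        archive.writestr("demo_m2.pdb", "END\n")
    buffer.seek(0)
    return buffer


ens_files, pdb_files = list_archive(build_demo_archive())
print(ens_files, pdb_files)
```

An archive intended for reimport would be expected to contain exactly one ‘.ens’ file and at least one PDB file.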

`asphericity`

Computes the ensemble averaged asphericity and plots asphericity versus radius of gyration for all conformers

asphericity input [address]

Arguments

input - identifier of the input ensemble
address - chain address or chain and residue ranges, e.g. $(A)4-270$ , defaults to $(A)$

Remarks

output is a figure with Rg on the x axis and asphericity on the y axis
each conformer corresponds to one point marker, with MarkerSize corresponding to population
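For orientation, the two quantities in the plot can be computed from atom coordinates as sketched below. The sketch assumes equal weights for all atoms and one common definition of asphericity based on the gyration tensor, which is 0 for a spherically symmetric distribution and 1 for a rod. The exact definition and atom weighting used by MMMx are not stated here, so treat both as assumptions; the function names are illustrative.

```python
import math


def gyration_tensor(coords):
    """Gyration tensor of a list of (x, y, z) tuples, equal weight per atom."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    tensor = [[0.0] * 3 for _ in range(3)]
    for c in coords:
        d = [c[k] - center[k] for k in range(3)]
        for a in range(3):
            for b in range(3):
                tensor[a][b] += d[a] * d[b] / n
    return tensor


def rg_and_asphericity(coords):
    """Return (radius of gyration, asphericity) for one conformer."""
    s = gyration_tensor(coords)
    trace = s[0][0] + s[1][1] + s[2][2]
    # Second invariant: sum of pairwise products of the eigenvalues,
    # obtained from the principal 2x2 minors without diagonalization.
    second = (s[0][0] * s[1][1] - s[0][1] ** 2
              + s[1][1] * s[2][2] - s[1][2] ** 2
              + s[0][0] * s[2][2] - s[0][2] ** 2)
    rg = math.sqrt(trace)
    asphericity = 1.0 - 3.0 * second / trace ** 2
    return rg, asphericity


def ensemble_average(values, populations):
    """Population-weighted mean of a per-conformer quantity."""
    total = sum(populations)
    return sum(v * p for v, p in zip(values, populations)) / total


rod = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
print(rg_and_asphericity(rod), rg_and_asphericity(octahedron))
```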

`cluster`

Reduce ensemble size by clustering.

cluster input output size range

Arguments

input - identifier of the input ensemble
output - identifier of the output (reduced) ensemble
size - number of conformers in the reduced ensemble
range - chain and residue range , e.g. $(A)187-320$ or list of residues, e.g. (A)187,231,316

Remarks

ensemble Shannon entropy and width before and after size reduction are reported in the log file
a similarity measure is reported in the log file
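As a reading aid for the log output, the Shannon entropy of a set of conformer populations can be sketched as follows. The natural logarithm and the normalization of populations to unit sum are assumptions; the exact convention used in the MMMx log file is not specified here.

```python
import math


def shannon_entropy(populations):
    """Shannon entropy -sum(p ln p) of normalized populations.

    Conformers with zero population do not contribute.
    """
    total = sum(populations)
    probabilities = [p / total for p in populations if p > 0]
    return -sum(p * math.log(p) for p in probabilities)


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.97, 0.01, 0.01, 0.01]
print(shannon_entropy(uniform), shannon_entropy(skewed))
```

With this convention, `math.exp(shannon_entropy(p))` can be read as an effective number of conformers: it equals the ensemble size for uniform populations and approaches 1 when a single conformer dominates.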

`compare`

Comparison of two ensembles

compare ensemble_1 ensemble_2 [range [mode]]

Arguments

ensemble_1 - identifier of the first ensemble
ensemble_2 - identifier of the second ensemble
range - optional MMMx address that specifies only a range of a conformer for comparison, e.g. $(A)187-320$
mode - optional string mode can be ‘resolved’ to request residue-wise comparison

Remarks

this is a legacy keyword, better use match for comparing two ensembles
the algorithm works well only if both ensembles are dense (spatially overlapping conformers)
the algorithm computes overlap of pseudo-electron densities between ensembles
the range argument ‘(*)’ selects the complete structure
the two ensembles may have different numbers of conformers
residue-wise comparison of large ensembles can take very long

`coulomb`

Computes and displays the ensemble averaged Coulomb interaction for pairs of charged residues

coulomb filename input [aa1 [aa2 [pH [I [Tmax]]]]]

Arguments

filename - name of the output file, comma-separated value file
input - identifier of the input ensemble
aa1 - amino acid type 1, defaults to $Arg$ , use three-letter code
aa2 - amino acid type 2, defaults to $Glu$ , use three-letter code
pH - pH value, default is 7
I - ionic strength, default is 0.150 M
Tmax - temperature corresponding to white on the color scale, defaults to the maximum interaction among all pairs

Remarks

output is a ‘.csv’ file, with the residue numbers in the first and second column and the Coulomb interaction in the third column
in addition, a figure is output with a $hot$ colormap, where black is no interaction and white the maximum interaction
the interaction is scaled by the Boltzmann constant, so that it corresponds to the temperature where it matches thermal energy
specify parameter $Tmax$ if you want to compare different residue pairs for the same ensemble
a salt bridge at 0.150 M ionic strength is in the range of 350-400 K
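The temperature scaling mentioned above amounts to T = |E| / k_B. The sketch below illustrates it for a single pair of point charges with a screened Coulomb (Debye-Hückel) potential. The screening model, the relative permittivity of 80, and the Debye length approximation for a 1:1 electrolyte in water near 25 °C are assumptions made for this illustration; the interaction model actually used by MMMx is not described in this section. With these assumptions, a pair of opposite unit charges at 3.5 Å and 0.150 M ionic strength lands inside the range quoted for a salt bridge.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in C
EPS0 = 8.8541878128e-12      # vacuum permittivity in F/m
K_B = 1.380649e-23           # Boltzmann constant in J/K


def debye_length_angstrom(ionic_strength_molar):
    """Approximate Debye length (1:1 electrolyte, water, about 25 degrees C)."""
    return 3.04 / math.sqrt(ionic_strength_molar)


def coulomb_temperature(q1, q2, r_angstrom, ionic_strength_molar=0.150, eps_r=80.0):
    """Screened Coulomb interaction of two point charges, expressed in kelvin.

    q1 and q2 are charges in units of the elementary charge.
    """
    r_m = r_angstrom * 1e-10
    energy = q1 * q2 * E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * eps_r * r_m)
    energy *= math.exp(-r_angstrom / debye_length_angstrom(ionic_strength_molar))
    return abs(energy) / K_B


print(coulomb_temperature(+1, -1, 3.5))
```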

`density`

Computes a 3D electron density map of an ensemble

density filename input [range [resolution]]

Arguments

filename - name of the output file, specify with extension ‘.mat’ for MMMx density files or ‘.mrc’ for MRC files
input - identifier of the input ensemble
range - optional MMMx address that specifies only a range of a conformer for analysis, e.g. $(A)187-320$
resolution - resolution in Angstroem (optional), defaults to 1

Remarks

output can be written as a ‘.mrc’ file, which can be visualized by most protein graphics programs
output can also be written as a Matlab file for visualization with MMM or the ‘visualize_isosuface’ function
a larger value for resolution leads to faster computation and a smaller file and may still be adequate for strong disorder
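The effect of the resolution argument on file size and computation time follows from the number of grid points, which scales with the inverse cube of the grid spacing. The sketch below counts grid points for a box of given extent; how MMMx pads the box around the ensemble is not specified here, so the grid layout (one point at each end of every axis) is an assumption.

```python
import math


def grid_shape(extent_angstrom, resolution_angstrom=1.0):
    """Number of grid points along x, y, z for a box of the given extent."""
    return tuple(math.ceil(e / resolution_angstrom) + 1 for e in extent_angstrom)


def voxel_count(extent_angstrom, resolution_angstrom=1.0):
    """Total number of grid points in the density map."""
    nx, ny, nz = grid_shape(extent_angstrom, resolution_angstrom)
    return nx * ny * nz


box = (100.0, 100.0, 100.0)   # hypothetical ensemble extent in Angstroem
print(voxel_count(box, 1.0), voxel_count(box, 2.0))
```

Doubling the resolution value reduces the number of grid points by a factor of almost eight for this box.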

`figures`

Specify figure output format.

figures format

Arguments

format - one of the formats in which Matlab can save figures, e.g. ‘pdf’, ‘svg’, ‘png’, ‘jpg’, default is ‘pdf’

Remarks

figure saving is on by default in this module, use format ‘off’ to switch it off

`flexibility`

Computes (local) Ramachandran flexibility profiles of peptide or nucleotide chains

flexibility filename input

Arguments

filename - name of the output file
input - identifier of the input ensemble

Remarks

the algorithm analyzes variation of backbone dihedrals $\psi$ and $\phi$
for RNA, pseudo-torsion angles are analyzed
local flexibility ranges between 0 (rigid) and 1 (random)

`getens`

Input of an ensemble from an MMMx ensemble list.

getens file identifier

Arguments

file - name of an MMMx ensemble file list (extension ‘.ens’)
identifier - module-internal identifier for the ensemble, e.g. ‘ensemble_2’

Remarks

best way of analyzing an ensemble generated by the EnsembleFit module
all PDB files specified in the ensemble file list must be on the Matlab path

`get_MMMx`

Import of an ensemble from the internal (Matlab) format of MMMx

get_MMMx filename identifier

Arguments

filename - name of a Matlab file generated with put_MMMx, e.g. ‘FUS_dispersed.mat’
identifier - module-internal identifier for the ensemble, e.g. ‘FUS_dispersed’

Remarks

this is the computationally least costly way of importing an ensemble
note that this format is not compatible with any other modelling or visualization software

`get_PED`

Import of an ensemble from the protein ensemble database (PED)

get_PED PED_ID.ens_nr identifier

Arguments

PED_ID.ens_nr - PED identifier, followed by a dot and the ensemble number, e.g. ‘PED00020.e001’
identifier - module-internal identifier for the ensemble, e.g. ‘MeV1’

Remarks

PED ensembles do not feature conformer weights, uniform weights are assumed
requires internet access

`get_Zenodo`

Import of an ensemble from Zenodo

get_Zenodo Zenodo_ID.filename identifier

Arguments

Zenodo_ID.filename - Zenodo identifier, followed by a dot and the file name, e.g. ‘8214049.FUS_condensed.zip’
identifier - module-internal identifier for the ensemble, e.g. ‘FUS_condensed’

Remarks

preferred format for Zenodo deposition is a ZIP archive (.zip) containing a .ens file and all PDB files listed in the .ens file list
if all PDB files are already available locally (on the Matlab path), the file on Zenodo can also be just a .ens file
archives containing an .ens file and PDB files can be imported as well from .gz, .tar, and .tar.gz formats
use the Zenodo keyword for downloading PDB files of a raw ensemble without importing the raw ensemble itself into MMMx
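Because file names themselves contain dots, the combined argument has to be split at the first dot only. The sketch below shows this for the example argument given above and also classifies the file by the extensions listed in the remarks. The helper names and the check that the Zenodo identifier is purely numeric are assumptions for illustration; MMMx performs its own parsing.

```python
ARCHIVE_EXTENSIONS = (".zip", ".tar.gz", ".tar", ".gz")


def split_zenodo_argument(argument):
    """Split 'Zenodo_ID.filename' at the first dot into (id, filename)."""
    zenodo_id, separator, filename = argument.partition(".")
    if not separator or not zenodo_id.isdigit() or not filename:
        raise ValueError("expected '<numeric Zenodo ID>.<file name>'")
    return zenodo_id, filename


def is_archive(filename):
    """True if the file name has one of the archive extensions listed above."""
    return filename.lower().endswith(ARCHIVE_EXTENSIONS)


zenodo_id, filename = split_zenodo_argument("8214049.FUS_condensed.zip")
print(zenodo_id, filename, is_archive(filename))
```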

`inertiaframe`

Transform all conformers to their respective inertia frames

inertiaframe output input [range]

Arguments

output - name of the output file, extension ‘.pdb’ is appended, if none
input - identifier of the input ensemble
range - optional MMMx address that specifies only a range of a conformer for analysis, e.g. $(A)187-320$

Remarks

the x axis corresponds to the minimum and the z axis to the maximum moment of inertia
the smallest x and z coordinates correspond to the N terminus
the origin of the coordinate frame is the center of gravity of the conformer

`match`

Match conformers in one ensemble by conformers in a second ensemble

match ensemble_1 ensemble_2 [range [range2]]

Arguments

ensemble_1 - identifier of the first ensemble
ensemble_2 - identifier of the second ensemble
range - optional MMMx address that specifies a chain/residue range for matching, e.g. $(A)187-320$
range2 - optional MMMx address that specifies a different range in the second ensemble for comparison, e.g. $(B)1-134$

Remarks

the algorithm finds the closest conformer by distance root mean square in the second ensemble for each conformer in the first ensemble
the range argument ‘(*)’ selects the complete structure
the range argument can be missing (complete structure is the default)
if the first range argument is given and the second one is missing, the same range is applied in the second ensemble
the list of matches and the maximum mismatch are reported in the log file
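The matching step can be pictured as follows: for each conformer in the first ensemble, the distance root mean square deviation (DRMS) to every conformer in the second ensemble is computed and the smallest value is kept. The sketch below assumes that DRMS is the root mean square difference of all intramolecular pair distances and that both conformers list the same number of sites in the same order; the sites and normalization used by MMMx are not stated here. Since only internal distances enter, no superposition is needed.

```python
import math
from itertools import combinations


def drms(coords_a, coords_b):
    """Distance root mean square deviation between two conformers."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must have the same number of sites")
    pairs = list(combinations(range(len(coords_a)), 2))
    total = 0.0
    for i, j in pairs:
        diff = math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])
        total += diff * diff
    return math.sqrt(total / len(pairs))


def match_ensembles(ensemble_1, ensemble_2):
    """Return ([(best index in ensemble_2, DRMS), ...], maximum mismatch)."""
    matches = []
    for conformer in ensemble_1:
        scores = [drms(conformer, other) for other in ensemble_2]
        best = min(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    max_mismatch = max(score for _, score in matches)
    return matches, max_mismatch


straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(match_ensembles([straight, bent], [bent, straight]))
```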

`measures`

Compute various measures of the ensemble. This is a block key with $n$ lines for $n$ measures.

measures filename  input [range]
   subkey
   ...
.measures

Arguments

filename - basis name for the output files, abbreviated below as ‘%s’
input - identifier for the input ensemble
range - optional MMMx address that specifies only a range of a conformer for analysis, e.g. $(A)187-320$
subkey - a subkey that specifies a measure from the following list

Available subkeys

matlab - save output data to Matlab files
csv - save output data to comma-separated value files
oriented - assume that conformers are already oriented, default is false (conformers are superimposed)
Rg - radius of gyration, including standard deviation (output to logfile)
width - ensemble width and density in Angstroem (output to logfile), also computes pair r.m.s.d. matrix and central conformer
correlation - correlation matrix, output as figure and to files ‘residue_pair_correlation_%s’ with extensions ‘.csv’ and ‘.mat’
sort - sort for computation of correlation matrix
drms - uses distance root mean square deviation for correlation matrix and sorting
compactness - compactness matrix

Remarks

saving output to both Matlab (‘.mat’) and ‘.csv’ files is allowed
if neither the matlab nor the csv subkey is present, output is only to figures or logfile
oriented affects only computation of pair r.m.s.d. (correlation matrix)

`order`

Computes local order profiles of peptide or nucleotide chains

order filename input

Arguments

filename - name of the output file
input - identifier of the input ensemble

Remarks

the algorithm is based on an adaptation of Flory’s characteristic ratio to polymers with secondary structure
the local order parameter ranges between 0 (random) and 1 (perfect order)
the local order parameter is somewhat longer ranged than the flexibility parameter mentioned above

`property`

Computes a 3D property map of an ensemble

property filename input [range [resolution [property [pH [I]]]]]

Arguments

filename - name of the output file, specify with extension ‘.mat’ for MMMx density files or ‘.mrc’ for MRC files
input - identifier of the input ensemble
range - optional MMMx address that specifies only a range of a conformer for analysis, e.g. $(A)187-320$
resolution - resolution in Angstroem (optional), defaults to 1
property - can be $electrostatic$ (default), $cation-pi$ , or $hydrophobic$
pH - pH value, default is 7
I - ionic strength, default is 0.150 M

Remarks

output can be written as a ‘.mrc’ file, which can be visualized by most protein graphics programs
output can also be written as a Matlab file for visualization with MMM or the ‘visualize_isosuface’ function
a larger value for resolution leads to faster computation and a smaller file and may still be adequate for strong disorder

`put_MMMx`

Save ensemble in internal MMMx (Matlab) format

put_MMMx output ensemble_id

Arguments

output - name of the output file, extension ‘.mat’ is appended, if none
ensemble_id - identifier of the ensemble to be saved

Remarks

this is the fastest way of saving an ensemble
any information generated by previous processing, such as spin labelling or domain partitioning, is retained
any metadata that is not part of PDB specification, such as AlphaFold predicted aligned error, is retained
this format cannot be imported by any other modelling or visualization software (at this time)
we strongly recommend also saving in an exchangeable, if lossy, format with the archive or save keywords
we discourage deposition of only this format in an open data context, because the format is not interoperable and thus violates FAIR principles

`save`

Save ensemble to a single PDB file and a tab-separated file with weights

save output ensemble_id

Arguments

output - name of the output file, extension ‘.pdb’ is appended, if none
ensemble_id - identifier of the ensemble to be saved

Remarks

the two output files can be used for submission to the protein ensemble database (PED)
weights (populations) are stored in a REMARK 400 field, MMMx can read them on reloading, but other software cannot
weights are also stored in a tab-separated (.tsv) file with the same basis name
in some contexts, saving to an archive of individual conformer files and a file list with weights is better, use keyword archive for that

`sort`

Iterative hierarchical clustering and sorting of an ensemble based on distance root mean square deviation.

sort filename input [option]

Arguments

filename - name of the output ensemble list, extension should be ‘.ens’
input - identifier of the input ensemble
option - option ‘oriented’ assumes that the conformers are already in the same frame, otherwise they are optimally superimposed
option - option ‘similarity’ starts from the conformer with highest population and builds a list with maximum similarity between neighbours
option - option ‘population’ sorts conformers by descending population

Remarks

by default (no option specified) similar conformers are grouped to clusters and the clusters are sorted by descending population
for cases with multiple discrete states, the default is strongly recommended

`subsample`

Subsample an ensemble to a smaller ensemble.

subsample ratio input output

Arguments

ratio - integer reduction factor for ensemble size
input - identifier of the input ensemble
output - identifier of the output (reduced) ensemble

Remarks

this is particularly useful for molecular dynamics trajectories
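One simple way to reduce an ensemble by an integer factor is to keep every n-th conformer and renormalize the populations of the retained conformers. The sketch below illustrates that idea; whether MMMx selects conformers in exactly this way and how it treats weights is an assumption here.

```python
def subsample(conformers, populations, ratio):
    """Keep every ratio-th conformer and renormalize populations to unit sum."""
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    kept = conformers[::ratio]
    kept_populations = populations[::ratio]
    total = sum(kept_populations)
    return kept, [p / total for p in kept_populations]


frames = ["frame_%03d" % k for k in range(10)]   # hypothetical trajectory frames
kept, populations = subsample(frames, [0.1] * 10, 5)
print(kept, populations)
```

For a molecular dynamics trajectory, consecutive frames are strongly correlated, so a regular stride of this kind discards little information.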

`superimpose`

Superposition of conformers in an ensemble

superimpose output input [range [template [template_range [mode]]]]

Arguments

output - name of the output file, extension ‘.pdb’ is appended, if none
input - identifier of the input ensemble
range - optional MMMx address that specifies only a range of a conformer for analysis, e.g. $(A)187-320$
template - template ensemble or structure (optional)
template_range - optional MMMx address that specifies a template range of a conformer, e.g. $(B)187-320$
mode - optional string mode can be ‘central’ to request superposition onto the central conformer

Remarks

by default, superposition is to the first conformer of the input ensemble if no range is provided
if a template and central are specified, superposition is to central conformer of a superensemble consisting of input and template
the range argument ‘(*)’ selects the complete structure

`transition`

Visualization of a state transition between two ensembles. This is a block key.

transition initial.(chain) final.(chain) range output
   subkey
   ...
.transition

Arguments

initial - identifier for the initial-state ensemble
final - identifier for the final-state ensemble
(chain) - chain tag, as in SRSF1_free.A, for selecting chain A in ensemble SRSF1_free
range - range where conformers are superimposed, as in 121-195 for residues 121-195 of the selected chains, do not include a chain tag here
output - basis filename for output
subkey - a subkey that specifies a visualization command from the following list

Available subkeys

show - MMM show command, is applied per conformer, example show (A)16-87 ribbon
color - MMM color command, is applied per conformer, example color (A)16-87 red
(cmd) (address) (argument) - any MMM command can be issued, address is a chain/range address and must be applicable per conformer

Remarks

the range argument can also be a list of residues, such as 16,107,148
conformers of the initial-state ensemble are divided into deselected conformers and conformational selection
conformers of the final-state ensemble are divided into conformational selection and induced fit
assignments and populations per subset are reported in the logfile
a visualization in abstract conformation space is automatically saved
PDB files and a .mmm script file are stored for visualization
the .mmm script file must be run separately in MMM
population (weight) is transparency-encoded if any subkeys are used
if the subkey block is empty, snake models (coil with diameter-encoded weight) are displayed in MMM
if the subkey block is empty, coloring is by subset (deselected red, conformational selection green, induced fit blue, superimposed range grey)

`zenodo`

Download and possibly extract a file from Zenodo without importing an ensemble to MMMx

zenodo Zenodo_ID.filename

Arguments

Zenodo_ID.filename - Zenodo identifier, followed by a dot and the file name, e.g. ‘6384003.raw_superensemble_with_jackknife_ensembles.zip’

Remarks

any file on Zenodo can be downloaded, for instance, also ‘.mcx’ or ‘.mat’ files
archives in ‘.zip’, ‘.gz’, ‘.tar’, and ‘.tar.gz’ formats are automatically extracted after download
use the get_Zenodo keyword for directly importing an ensemble from Zenodo into MMMx
