EnsembleAnalysis
This module analyzes ensembles in terms of intermediate order. In this module, ensembles have internal variable names (identifiers).
addpdb
Input of template conformers from PDB files.
addpdb file identifier
- Arguments
file
- file name, can contain wildcards
identifier
- module-internal identifier for the ensemble, e.g. ‘ensemble_1’
- Remarks
use wildcard ‘*’ for part of the filename to process all conformers from a previous step in the pipeline
individual PDB files can contain a single model or several models; this can be freely mixed
archive
Save ensemble as a ZIP file including a file list with weights and individual PDB files of all conformers
archive output ensemble_id
- Arguments
output
- name of the output file, extension is not required
ensemble_id
- identifier of the ensemble to be saved
- Remarks
this is the “most interoperable” way of saving a weighted ensemble
information generated by previous processing, such as spin labelling or domain partitioning, is lost
metadata that is not part of PDB specification, such as AlphaFold predicted aligned error, is lost
the file-list-with-weights format of the included ‘.ens’ file is human readable and easy to process by other software
we favour this format for deposition on Zenodo
the format can be reimported with the get_Zenodo keyword
for processing with some other software, saving to a single PDB file with the keyword save may be the better option
asphericity
Computes the ensemble averaged asphericity and plots asphericity versus radius of gyration for all conformers
asphericity input [address]
- Arguments
input
- identifier of the input ensemble
address
- chain address or chain and residue ranges, e.g. , defaults to
- Remarks
output is a figure with Rg on the x axis and asphericity on the y axis
each conformer corresponds to one point marker, with MarkerSize corresponding to population
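The two quantities in the plot can be illustrated for a single conformer with the following Python sketch. It assumes equal atom masses and the common normalized asphericity (0 for a sphere, 1 for a rod) computed from the gyration tensor; MMMx may use a different definition or mass weighting, and the function name rg_and_asphericity is hypothetical.

```python
import math

def rg_and_asphericity(coords):
    """Return (Rg, asphericity) for a list of (x, y, z) tuples.

    Assumptions (not taken from MMMx): equal masses, asphericity defined as
    1.5 * sum((l_i - l_mean)^2) / (l_1 + l_2 + l_3)^2, where l_i are the
    eigenvalues of the gyration tensor.
    """
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    # gyration tensor S[a][b] = mean of (r_a - c_a) * (r_b - c_b)
    S = [[sum((c[a] - center[a]) * (c[b] - center[b]) for c in coords) / n
          for b in range(3)] for a in range(3)]
    trace = S[0][0] + S[1][1] + S[2][2]          # Rg^2 = l_1 + l_2 + l_3
    trace_sq = sum(S[a][b] ** 2 for a in range(3) for b in range(3))  # sum of l_i^2
    rg = math.sqrt(trace)
    # sum((l_i - l_mean)^2) = sum(l_i^2) - trace^2 / 3, so no eigen-decomposition is needed
    asphericity = 1.5 * (trace_sq - trace ** 2 / 3.0) / trace ** 2
    return rg, asphericity

# ten points on a straight line behave as an ideal rod
rod = [(float(i), 0.0, 0.0) for i in range(10)]
rg_rod, asph_rod = rg_and_asphericity(rod)

# six points on the axes at unit distance are isotropic
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
rg_oct, asph_oct = rg_and_asphericity(octahedron)
```

In an ensemble plot of this kind, each conformer would contribute one (Rg, asphericity) pair, with the marker size taken from its population.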
cluster
Reduce ensemble size by clustering.
cluster input output size range
- Arguments
input
- identifier of the input ensemble
output
- identifier of the output (reduced) ensemble
size
- number of conformers in the reduced ensemble
range
- chain and residue range , e.g. or list of residues, e.g. (A)187,231,316
- Remarks
ensemble Shannon entropy and width before and after size reduction are reported in the log file
a similarity measure is reported in the log file
compare
Comparison of two ensembles
compare ensemble_1 ensemble_2 [range [mode]]
- Arguments
ensemble_1
- identifier of the first ensemble
ensemble_2
- identifier of the second ensemble
range
- optional MMMx address that specifies only a range of a conformer for comparison, e.g.
mode
- optional string mode can be ‘resolved’ to request residue-wise comparison
- Remarks
this is a legacy keyword, better use match for comparing two ensembles
the algorithm works well only if both ensembles are dense (spatially overlapping conformers)
the algorithm computes overlap of pseudo-electron densities between ensembles
the range argument ‘(*)’ selects the complete structure
the two ensembles may have different numbers of conformers
residue-wise comparison of large ensembles can take very long
coulomb
Computes and displays the ensemble averaged Coulomb interaction for pairs of charged residues
coulomb filename input [aa1 [aa2 [pH [I [Tmax]]]]]
- Arguments
filename
- name of the output file, comma-separated value file
input
- identifier of the input ensemble
aa1
- amino acid type 1, defaults to , use three-letter code
aa2
- amino acid type 2, defaults to , use three-letter code
pH
- pH value, default is 7
I
- ionic strength, default is 0.150 M
Tmax
- temperature corresponding to white on the color scale, defaults to the maximum interaction among all pairs
- Remarks
output is a ‘.csv’ file, with the residue numbers in the first and second column and the Coulomb interaction in the third column
in addition, a figure is output with a colormap, where black is no interaction and white the maximum interaction
the interaction is scaled by the Boltzmann constant, so that it corresponds to the temperature where it matches thermal energy
specify parameter if you want to compare different residue pairs for the same ensemble
a salt bridge at 0.150 M ionic strength is in the range of 350-400 K
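The temperature scale can be illustrated by converting a pair interaction energy E into the temperature at which it equals thermal energy, T = |E| / k_B. The sketch below is an assumption for illustration only: it uses a screened (Debye-Hückel) Coulomb energy between two point charges, a relative permittivity of 78.4 and a Debye length for a 1:1 electrolyte at 298.15 K. The source does not state which electrostatic model or charge assignment MMMx uses, and the function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
K_B = 1.380649e-23           # Boltzmann constant, J/K
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m
N_A = 6.02214076e23          # Avogadro constant, 1/mol

def debye_length(ionic_strength, eps_r=78.4, temperature=298.15):
    """Debye length in m for an ionic strength in mol/L (assumed model)."""
    conc = ionic_strength * 1000.0 * N_A          # mol/L -> particles per m^3
    return math.sqrt(eps_r * EPS_0 * K_B * temperature /
                     (2.0 * conc * E_CHARGE ** 2))

def interaction_temperature(q1, q2, distance, ionic_strength=0.150, eps_r=78.4):
    """Temperature in K at which |E| of a screened charge pair equals k_B * T.

    q1, q2 are charges in units of the elementary charge,
    distance is the charge separation in Angstroem.
    """
    r = distance * 1e-10                          # Angstroem -> m
    energy = (q1 * q2 * E_CHARGE ** 2 / (4.0 * math.pi * EPS_0 * eps_r * r)
              * math.exp(-r / debye_length(ionic_strength, eps_r)))
    return abs(energy) / K_B

t_close = interaction_temperature(+1, -1, 3.5)    # opposite unit charges in close contact
t_far = interaction_temperature(+1, -1, 10.0)     # same pair, further apart
```

Under these assumptions, the interaction temperature drops quickly with distance because of both the 1/r dependence and the ionic screening.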
density
Computes a 3D electron density map of an ensemble
density filename input [range [resolution]]
- Arguments
filename
- name of the output file, specify with extension ‘.mat’ for MMMx density files or ‘.mrc’ for MRC files
input
- identifier of the input ensemble
range
- optional MMMx address that specifies only a range of a conformer for analysis, e.g.
resolution
- resolution in Angstroem (optional), defaults to 1
- Remarks
output is a ‘.mrc’ file, which can be visualized by most protein graphics programs
output can also be a Matlab file for visualization with MMM or the ‘visualize_isosuface’ function
a larger value for resolution leads to faster computation and a smaller file and may still be adequate for strong disorder
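The effect of the resolution argument on file size and computation time can be estimated with a small sketch. It assumes a regular cubic grid whose voxel edge equals the resolution; the grid layout that MMMx actually uses is not stated here, and the extent parameter is hypothetical.

```python
import math

def voxel_count(extent, resolution=1.0):
    """Number of voxels in a cubic grid (assumed layout).

    extent: edge length of the cube in Angstroem (hypothetical parameter)
    resolution: voxel edge length in Angstroem
    """
    per_axis = math.ceil(extent / resolution)
    return per_axis ** 3

fine = voxel_count(200.0, 1.0)     # 200 voxels per axis
coarse = voxel_count(200.0, 2.0)   # 100 voxels per axis
```

Under this assumption, doubling the resolution value reduces the number of voxels by a factor of eight.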
figures
Specify figure output format.
figures format
- Arguments
format
- one of the formats in which Matlab can save figures, e.g. ‘pdf’, ‘svg’, ‘png’, ‘jpg’, default is ‘pdf’
- Remarks
figure saving is on by default in this module, use format ‘off’ to switch it off
flexibility
Computes (local) Ramachandran flexibility profiles of peptide or nucleotide chains
flexibility filename input
- Arguments
filename
- name of the output file
input
- identifier of the input ensemble
- Remarks
the algorithm analyzes variation of backbone dihedrals φ and ψ
for RNA, pseudo-torsion angles are analyzed
local flexibility ranges between 0 (rigid) and 1 (random)
getens
Input of an ensemble from an MMMx ensemble list.
getens file identifier
- Arguments
file
- name of an MMMx ensemble file list (extension ‘.ens’)
identifier
- module-internal identifier for the ensemble, e.g. ‘ensemble_2’
- Remarks
best way of analyzing an ensemble generated by the EnsembleFit module
all PDB files specified in the ensemble file list must be on the Matlab path
get_MMMx
Import of an ensemble from the internal (Matlab) format of MMMx
get_MMMx filename identifier
- Arguments
filename
- name of a Matlab file generated with put_MMMx, e.g. ‘FUS_dispersed.mat’
identifier
- module-internal identifier for the ensemble, e.g. ‘FUS_dispersed’
- Remarks
this is the computationally least costly way of importing an ensemble
note that this format is not compatible with any other modelling or visualization software
get_PED
Import of an ensemble from the protein ensemble database (PED)
get_PED PED_ID.ens_nr identifier
- Arguments
PED_ID.ens_nr
- PED identifier, followed by a dot and the ensemble number, e.g. ‘PED00020.e001’
identifier
- module-internal identifier for the ensemble, e.g. ‘MeV1’
- Remarks
PED ensembles do not feature conformer weights, uniform weights are assumed
requires internet access
get_Zenodo
Import of an ensemble from Zenodo
get_Zenodo Zenodo_ID.filename identifier
- Arguments
Zenodo_ID.filename
- Zenodo identifier, followed by a dot and the file name, e.g. ‘8214049.FUS_condensed.zip’
identifier
- module-internal identifier for the ensemble, e.g. ‘FUS_condensed’
- Remarks
preferred format for Zenodo deposition is a ZIP archive (.zip) containing a .ens file and all PDB files listed in the .ens file list
if all PDB files are already available locally (on the Matlab path), the file on Zenodo can also be just a .ens file
archives containing an .ens file and PDB files can be imported as well from .gz, .tar, and .tar.gz formats
use the Zenodo keyword for downloading PDB files of a raw ensemble without importing the raw ensemble itself into MMMx
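Because the file name itself usually contains dots, the combined argument has to be split at the first dot only. The following Python sketch illustrates this; the function name split_zenodo_argument and the check that the identifier is purely numeric are assumptions based on the examples given here, not a description of how MMMx parses the argument.

```python
def split_zenodo_argument(argument):
    """Split 'Zenodo_ID.filename' at the first dot into (Zenodo_ID, filename)."""
    zenodo_id, separator, filename = argument.partition('.')
    # assumption: Zenodo identifiers consist of digits only, as in the examples
    if not separator or not zenodo_id.isdigit() or not filename:
        raise ValueError("expected 'Zenodo_ID.filename', got %r" % argument)
    return zenodo_id, filename

zenodo_id, filename = split_zenodo_argument('8214049.FUS_condensed.zip')
```

For the example argument, the identifier is 8214049 and the file name is FUS_condensed.zip.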
inertiaframe
Transform all conformers to their respective inertia frames
inertiaframe output input [range]
- Arguments
output
- name of the output file, extension ‘.pdb’ is appended, if none
input
- identifier of the input ensemble
range
- optional MMMx address that specifies only a range of a conformer for analysis, e.g.
- Remarks
the x axis corresponds to the minimum and the z axis to the maximum moment of inertia
the smallest x and z coordinates correspond to the N terminus
the origin of the coordinate frame is the center of gravity of the conformer
match
Match conformers in one ensemble by conformers in a second ensemble
match ensemble_1 ensemble_2 [range [range2]]
- Arguments
ensemble_1
- identifier of the first ensemble
ensemble_2
- identifier of the second ensemble
range
- optional MMMx address that specifies a chain/residue range for matching, e.g.
range2
- optional MMMx address that specifies a different range in the second ensemble for comparison, e.g.
- Remarks
the algorithm finds the closest conformer by distance root mean square in the second ensemble for each conformer in the first ensemble
the range argument ‘(*)’ selects the complete structure
the range argument can be missing (complete structure is the default)
if the first range argument is given and the second one is missing, the same range is applied in the second ensemble
the list of matches and the maximum mismatch are reported in the log file
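The matching step described in these remarks can be sketched in Python as follows. Distance root mean square (DRMS) is taken here as the root mean square difference between corresponding intra-conformer distances, which requires no superposition. The sketch assumes that both conformers are given as equally long lists of site coordinates (for instance one site per residue) and that all distance pairs are weighted equally; the function names are hypothetical and this is not the MMMx implementation.

```python
import math
from itertools import combinations

def pair_distances(coords):
    """All pairwise distances within one conformer (list of (x, y, z))."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def drms(coords1, coords2):
    """Distance root mean square deviation between two conformers."""
    d1, d2 = pair_distances(coords1), pair_distances(coords2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d1, d2)) / len(d1))

def match(ensemble_1, ensemble_2):
    """For each conformer in ensemble_1, return (index, drms) of the closest conformer in ensemble_2."""
    matches = []
    for conformer in ensemble_1:
        scores = [drms(conformer, other) for other in ensemble_2]
        best = min(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches

straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
shifted_straight = [(5, 5, 5), (6, 5, 5), (7, 5, 5)]   # same shape, translated
bent_variant = [(0, 0, 0), (1, 0, 0), (1, 1.2, 0)]     # slightly different bend

matches = match([straight, bent], [shifted_straight, bent_variant])
max_mismatch = max(score for _, score in matches)
```

In this toy example the straight conformer is matched to its translated copy with zero DRMS, and the bent conformer is matched to the slightly different bent variant, which determines the maximum mismatch.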
measures
Compute various measures of the ensemble. This is a block key with lines for measures.
measures filename input [range]
subkey
...
.measures
- Arguments
filename
- basis name for the output files, abbreviated below as ‘%s’
input
- identifier for the input ensemble
range
- optional MMMx address that specifies only a range of a conformer for analysis, e.g.
subkey
- a subkey that specifies a measure from the following list
- Available subkeys
matlab
- save output data to Matlab files
csv
- save output data to comma-separated value files
oriented
- assume that conformers are already oriented, default is false (conformers are superimposed)
Rg
- radius of gyration, including standard deviation (output to logfile)
width
- ensemble width and density in Angstroem (output to logfile), also computes pair r.m.s.d. matrix and central conformer
correlation
- correlation matrix, output as figure and to files ‘residue_pair_correlation_%s’ with extensions ‘.csv’ and ‘.mat’
sort
- sort for computation of correlation matrix
drms
- uses distance root mean square deviation for correlation matrix and sorting
compactness
- compactness matrix
- Remarks
saving output to both Matlab (‘.mat’) and ‘.csv’ files is allowed
if neither the matlab nor the csv subkey is present, output is only to figures or logfile
oriented affects only computation of pair r.m.s.d. (correlation matrix)
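For a weighted ensemble, a mean and standard deviation such as those reported by the Rg subkey could be computed as sketched below. The sketch assumes that the mean is the population-weighted average of the per-conformer values and that the standard deviation is the population-weighted spread around it; the exact convention used by MMMx is not stated here, and the function name is hypothetical.

```python
import math

def weighted_mean_std(values, weights):
    """Population-weighted mean and standard deviation of per-conformer values."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)

rg_values = [20.0, 30.0]       # hypothetical radii of gyration in Angstroem
populations = [0.75, 0.25]     # conformer weights
mean_rg, std_rg = weighted_mean_std(rg_values, populations)
```

The weights do not need to be normalized beforehand, because the sketch divides by their sum.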
order
Computes local order profiles of peptide or nucleotide chains
order filename input
- Arguments
filename
- name of the output file
input
- identifier of the input ensemble
- Remarks
the algorithm is based on an adaptation of Flory’s characteristic ratio to polymers with secondary structure
the local order parameter ranges between 0 (random) and 1 (perfect order)
the local order parameter is somewhat longer ranged than the flexibility parameter mentioned above
property
Computes a 3D property map of an ensemble
property filename input [range [resolution [property [pH [I]]]]]
- Arguments
filename
- name of the output file, specify with extension ‘.mat’ for MMMx density files or ‘.mrc’ for MRC files
input
- identifier of the input ensemble
range
- optional MMMx address that specifies only a range of a conformer for analysis, e.g.
resolution
- resolution in Angstroem (optional), defaults to 1
property
- can be (default), , or
pH
- pH value, default is 7
I
- ionic strength, default is 0.150 M
- Remarks
output is a ‘.mrc’ file, which can be visualized by most protein graphics programs
output can also be a Matlab file for visualization with MMM or the ‘visualize_isosuface’ function
a larger value for resolution leads to faster computation and a smaller file and may still be adequate for strong disorder
put_MMMx
Save ensemble in internal MMMx (Matlab) format
put_MMMx output ensemble_id
- Arguments
output
- name of the output file, extension ‘.mat’ is appended, if none
ensemble_id
- identifier of the ensemble to be saved
- Remarks
this is the fastest way of saving an ensemble
any information generated by previous processing, such as spin labelling or domain partitioning, is retained
any metadata that is not part of PDB specification, such as AlphaFold predicted aligned error, is retained
this format cannot be imported by any other modelling or visualization software (at this time)
we strongly recommend also saving in an exchangeable, if lossy, format with the archive or save keywords
we discourage deposition of only this format in an open data context, because the format is not interoperable and thus violates FAIR principles
save
Save ensemble to a single PDB file and a tab-separated file with weights
save output ensemble_id
- Arguments
output
- name of the output file, extension ‘.pdb’ is appended, if none
ensemble_id
- identifier of the ensemble to be saved
- Remarks
the two output files can be used for submission to the protein ensemble database (PED)
weights (populations) are stored in a REMARK 400 field, MMMx can read them on reloading, but other software cannot
weights are also stored in a tab-separated (.tsv) file with the same basis name
in some contexts, saving to an archive of individual conformer files and a file list with weights is better, use keyword archive for that
sort
Iterative hierarchical clustering and sorting of an ensemble based on distance root mean square deviation.
sort filename input [option]
- Arguments
filename
- name of the output ensemble list, extension should be ‘.ens’
input
- identifier of the input ensemble
option
- option ‘oriented’ assumes that the conformers are already in the same frame, otherwise they are optimally superimposed
option
- option ‘similarity’ starts from the conformer with highest population and builds a list with maximum similarity between neighbours
option
- option ‘population’ sorts conformers by descending population
- Remarks
by default (no option specified) similar conformers are grouped to clusters and the clusters are sorted by descending population
for cases with multiple discrete states, the default is strongly recommended
subsample
Subsample an ensemble to a smaller ensemble.
subsample ratio input output
- Arguments
ratio
- integer reduction factor for ensemble size
input
- identifier of the input ensemble
output
- identifier of the output (reduced) ensemble
- Remarks
this is particularly useful for molecular dynamics trajectories
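As an illustration of what an integer reduction factor means for a trajectory, the following Python sketch keeps every ratio-th conformer. Starting with the first conformer and renormalizing the populations of the kept conformers are assumptions of this sketch; the source does not specify which conformers MMMx retains or how it treats the weights.

```python
def subsample(conformers, weights, ratio):
    """Keep every ratio-th conformer and renormalize the populations (assumed behaviour)."""
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    kept = conformers[::ratio]
    kept_weights = weights[::ratio]
    total = sum(kept_weights)
    return kept, [w / total for w in kept_weights]

# ten hypothetical trajectory frames with uniform weights
frames = ['frame_%03d' % i for i in range(10)]
reduced, reduced_weights = subsample(frames, [0.1] * 10, 5)
```

With a ratio of 5, the ten frames are reduced to two frames with equal populations.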
superimpose
Superposition of conformers in an ensemble
superimpose output input [range [template [template_range [mode]]]]
- Arguments
output
- name of the output file, extension ‘.pdb’ is appended, if none
input
- identifier of the input ensemble
range
- optional MMMx address that specifies only a range of a conformer for analysis, e.g.
template
- template ensemble or structure (optional)
template_range
- optional MMMx address that specifies a template range of a conformer, e.g.
mode
- optional string mode can be ‘central’ to request superposition onto the central conformer
- Remarks
by default, superposition is to the first conformer of the input ensemble if no range is provided
if a template and central are specified, superposition is to central conformer of a superensemble consisting of input and template
the range argument ‘(*)’ selects the complete structure
transition
Visualization of a state transition between two ensembles. This is a block key.
transition initial.(chain) final.(chain) range output
subkey
...
.transition
- Arguments
initial
- identifier for the initial-state ensemble
final
- identifier for the final-state ensemble
(chain)
- chain tag, as in SRSF1_free.A, for selecting chain A in ensemble SRSF1_free
range
- range where conformers are superimposed, as in 121-195 for residues 121-195 of the selected chains, do not include a chain tag here
output
- basis filename for output
subkey
- a subkey that specifies a visualization command from the following list
- Available subkeys
show
- MMM show command, is applied per conformer, example: show (A)16-87 ribbon
color
- MMM color command, is applied per conformer, example: color (A)16-87 red
*(cmd) (address) (argument)
- any MMM command can be issued, address is a chain/range address and must be applicable per conformer
- Remarks
the range argument can also be a list of residues, such as 16,107,148
conformers of the initial-state ensemble are divided into deselected conformers and conformational selection
conformers of the final-state ensemble are divided into conformational selection and induced fit
assignments and populations per subset are reported in the logfile
a visualization in abstract conformation space is automatically saved
PDB files and a .mmm script file are stored for visualization
the .mmm script file must be run separately in MMM
population (weight) is transparency-encoded if any subkeys are used
if the subkey block is empty, snake models (coil with diameter-encoded weight) are displayed in MMM
if the subkey block is empty, coloring is by subset (deselected red, conformational selection green, induced fit blue, superimposed range grey)
zenodo
Download and possibly extract a file from Zenodo without importing an ensemble to MMMx
Zenodo Zenodo_ID.filename
- Arguments
Zenodo_ID.filename
- Zenodo identifier, followed by a dot and the file name, e.g. ‘6384003.raw_superensemble_with_jackknife_ensembles.zip’
- Remarks
any file on Zenodo can be downloaded, for instance, also ‘.mcx’ or ‘.mat’ files
archives in ‘.zip’, ‘.gz’, ‘.tar’, and ‘.tar.gz’ formats are automatically extracted after download
use the get_Zenodo keyword for directly importing an ensemble from Zenodo into MMMx