EnsembleFit

This module performs integrative fitting of a raw ensemble to various sets of experimental restraints. The ensemble is contracted by fitting weights (populations) and discarding conformers with zero or very low weight.

`addpdb`

Input of template conformers from PDB files.

addpdb file

Arguments

file - file name, can contain wildcards

Remarks

use wildcard ‘*’ for part of the filename to process all conformers from a previous step in the pipeline
use this command to generate a raw ensemble or add conformers to a raw ensemble generated by getpdb
only one addpdb directive is allowed (the last one overwrites previous ones)

`archive`

Save ensemble as a ZIP file including a file list with weights and individual PDB files of all conformers

archive output ensemble_id

Arguments

output - name of the output file, extension is not required
ensemble_id - identifier of the ensemble to be saved

Remarks

this is the “most interoperable” way of saving a weighted ensemble
information generated by previous processing, such as spin labelling or domain partitioning, is lost
metadata that is not part of PDB specification, such as AlphaFold predicted aligned error, is lost
the file-list-with-weights format of the included ‘.ens’ file is human readable and easy to process by other software
we favour this format for deposition on Zenodo
the format can be reimported with the get_Zenodo keyword
for processing with some other software, saving to a single PDB file with the keyword save may be the better option

`blocksize`

Specifies initial block size for population fitting

blocksize conformers

Arguments

conformers - initial number of conformers per block, defaults to 100

Remarks

block size is adaptive, there should be no reason to depart from the default

`csv`

Save fit results to comma-separated value (CSV) files

csv

Remarks

by default, full fit results are saved only to a Matlab file and CSV saving is off
if csv is on, all information underlying fit plots is saved, even if plot is off
this feature also reports fit quality of individual restraints to the logfile (except PRE)
small-angle scattering data has four columns: scattering vector, intensity, standard deviation, fitted intensity
PRE data has four columns: index, experimental PRE ratio or rate, standard deviation, fitted PRE ratio or rate
distance distribution restraint (ddr) data has a variable column format, the format is specified in the logfile for each individual restraint
specifiers for ddr columns are: r distance axis, d experimental distribution, l lower bound, u upper bound, f fitted distribution, g distribution corresponding to a Gaussian restraint
dipolar evolution (deer) data has three columns, time axis (microseconds), experimental data, and fitted data
if plotgroups were specified for ddr, a format specifier s1 stands for plot group 1, s2 for plot group 2, and so on
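
The four-column small-angle scattering layout listed above is straightforward to post-process outside Matlab. The following is a minimal sketch, assuming a header-less CSV file with exactly these four columns; the function name `mean_chi_square` and the numbers are made up for illustration and are not part of MMMx.

```python
import csv
import io


def mean_chi_square(rows):
    """Mean squared, error-weighted residual of a four-column scattering table."""
    total = 0.0
    count = 0
    for row in rows:
        # Column order as described for small-angle scattering CSV output:
        # scattering vector, intensity, standard deviation, fitted intensity
        _q, intensity, sigma, fitted = (float(value) for value in row)
        total += ((intensity - fitted) / sigma) ** 2
        count += 1
    return total / count


# Made-up example data; a header-less four-column file is an assumption
example = io.StringIO(
    "0.01,100.0,2.0,102.0\n"
    "0.02,80.0,2.0,78.0\n"
    "0.03,60.0,1.0,60.0\n"
)
chi2 = mean_chi_square(csv.reader(example))
```

For a real file, replace the `io.StringIO` object with an opened file handle.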

`ddr`

Definition of distance distribution restraints. This is a block key with $n$ lines for $n$ restraints.

ddr label_1 [label_2]
   'address_1' 'address_2' 'rmean' 'rstd' [@'fname']
   ...
.ddr

Arguments

label_1, label_2 - label types, e.g. $mtsl$, $dota-gd$
address_1, address_2 - addresses of the two labelled sites, e.g. $(A)16$, $107$
rmean - mean distance in Angstroem, e.g. $32.5$
rstd - standard deviation in Angstroem, e.g. $15.5$
fname - optional file name of the distance distribution

Remarks

if both labels are the same, it is sufficient to specify the label type once
use separate ‘ddr’ blocks for each label combination
the file name is optional, but using full distributions is strongly recommended
if a full distribution is provided, rmean and rstd can be skipped
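
When only rmean and rstd are given, the restraint corresponds to a Gaussian distance distribution. The sketch below shows what such a distribution looks like on a discrete distance axis. The axis range, the step, and the normalization to unit sum are assumptions for illustration; `gaussian_restraint` is a hypothetical helper, not an MMMx function.

```python
import math


def gaussian_restraint(r_axis, rmean, rstd):
    """Gaussian distance distribution on r_axis, normalized to unit sum."""
    values = [math.exp(-0.5 * ((r - rmean) / rstd) ** 2) for r in r_axis]
    norm = sum(values)
    return [v / norm for v in values]


# Distance axis from 10 to 80 Angstroem in 0.5 Angstroem steps (arbitrary choice)
r_axis = [10.0 + 0.5 * k for k in range(141)]
# Mean and standard deviation taken from the examples in the argument list
distribution = gaussian_restraint(r_axis, rmean=32.5, rstd=15.5)
```

With a standard deviation this large, the Gaussian is visibly truncated at the short-distance end of the axis, which is one reason why a full experimental distribution is the better restraint.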

`deer`

Definition of primary DEER data as restraints or for backcalculation. This is a block key with $n$ lines for $n$ restraints.

deer label_1 [label_2]
   'address_1' 'address_2' @'fname'
   ...
.deer

Arguments

label_1, label_2 - label types, e.g. $mtsl$, $dota-gd$
address_1, address_2 - addresses of the two labelled sites, e.g. $(A)16$, $107$
fname - file name of the DEER data, must contain a background fit (see Remarks)

Remarks

the data files must contain a time axis as first column, the real part of phase-corrected primary data as second column, and the background fit as fourth column
Comparative Deer Analyzer in DeerAnalysis 2022 and later provides the required format
for backcalculation with the nofit keyword, the background is not used
use separate ‘deer’ blocks for each label combination

`discard`

Defines the weight threshold for discarding conformers as a fraction of the maximum weight.

discard threshold

Arguments

threshold - a number between 0 and 1, default is 0.01
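
The threshold is relative to the largest weight, not an absolute population. A minimal sketch of this contraction step follows. Whether a conformer exactly at the cutoff is kept, and the renormalization of the remaining weights, are assumptions; `discard_low_weights` is a hypothetical name.

```python
def discard_low_weights(weights, threshold=0.01):
    """Keep conformers with weight >= threshold * max weight and renormalize."""
    cutoff = threshold * max(weights)
    kept = [(index, w) for index, w in enumerate(weights) if w >= cutoff]
    total = sum(w for _, w in kept)
    # Return (original conformer index, renormalized weight) pairs
    return [(index, w / total) for index, w in kept]


# Made-up populations for five conformers
weights = [0.50, 0.30, 0.196, 0.004, 0.0]
contracted = discard_low_weights(weights)
```

With the default threshold of 0.01 and a maximum weight of 0.50, the cutoff is 0.005, so the last two conformers are dropped.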

`expand`

Input and expansion of rigid-body arrangements.

expand [fname]

Arguments

fname - optional file name for saving extracted rigid-body arrangements

Remarks

the output of a previous Rigi module in the pipeline is expanded
input file format is the Matlab output format of Rigi
use this command only for direct processing of Rigi results by EnsembleFit
this keyword cannot be combined with initial, addpdb, and getpdb
only one expand directive is allowed (the last one overwrites previous ones)

`figures`

Requests that figures are saved and specifies a graphics format for them.

figures format

Arguments

format - optional, one of the formats in which Matlab can save figures, e.g. ‘pdf’

Remarks

this switches on figure saving, which is off by default
in most contexts, vector graphic output as ‘pdf’ works best, this is the default
plot is switched on if it was not already switched on
file names for small-angle scattering fits are derived from the name of the input data
file names for distance distribution overlap are derived from the two site addresses
file names for PRE fits are derived from the labeling site
each small-angle scattering restraint generates four plots: linear, semi-logarithmic, double logarithmic, and residual

`getpdb`

Input of a raw ensemble by reading a single PDB file.

getpdb file

Arguments

file - file name

Remarks

the PDB file can contain several models (conformers) or a single one
for MMMx ensemble PDB files with population information in REMARK 400, such information is read, otherwise populations are uniform
only one getpdb directive is allowed (the last one overwrites previous ones)

`initial`

Input of an initial ensemble with populations from an MMMx ensemble file

initial file

Arguments

file - file name, must refer to a single ensemble (extension ‘.ens’ or ‘.zip’)

Remarks

use this in combination with nofit to generate plots or save data for an existing ensemble
it is possible to combine initial with addpdb and/or getpdb
only one initial directive is allowed (the last one overwrites previous ones)
the filename can specify a ZIP archive containing individual PDB files for conformers and the corresponding file list with weights

`interactive`

Requests display of fit information during fitting

interactive

Remarks

the key enables display of fit information in a plot during fitting
this option may be useful for tests, but should be skipped for runs on a server

`nnllsq`

Requests non-negative linear least square fitting of all populations.

nnllsq bckg sasbckg

Arguments

bckg - order of the polynomial for additional DEER background correction, a constant offset (order 0) is default
sasbckg - if this argument is present (use, e.g., $on$), a constant small-angle scattering background is fitted, defaults to no fit

Remarks

requires that DEER restraints are defined by primary DEER data including an existing background fit (see keyword deer)
ddr distance distribution restraints are ignored
PRE restraints are always fitted as PRE ratios, if rates are given, these are converted
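
To illustrate what non-negative linear least squares means for populations: each conformer contributes a predicted signal, and the populations are the non-negative coefficients that best reproduce the experimental data. The sketch below uses simple cyclic coordinate descent with clipping at zero. This is a generic textbook approach chosen for brevity, not the solver used by MMMx, and all data are made up.

```python
def nnls_coordinate_descent(columns, target, sweeps=200):
    """Minimize ||sum_j x_j * columns[j] - target||^2 subject to x_j >= 0."""
    x = [0.0] * len(columns)
    residual = list(target)  # target minus prediction; prediction is zero at start
    for _ in range(sweeps):
        for j, column in enumerate(columns):
            norm = sum(c * c for c in column)
            if norm == 0.0:
                continue
            # Unconstrained optimum for x_j with all others fixed, clipped at zero
            step = sum(c * r for c, r in zip(column, residual)) / norm
            new_xj = max(0.0, x[j] + step)
            delta = new_xj - x[j]
            residual = [r - delta * c for r, c in zip(residual, column)]
            x[j] = new_xj
    return x


# Made-up data: each column is the signal predicted for one conformer
columns = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
target = [0.3, 0.7, 1.0]
populations = nnls_coordinate_descent(columns, target)
```

In this toy example the third conformer ends up with zero population, which is the kind of conformer that the discard threshold later removes from the ensemble.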

`nofit`

Requests restraint computation and analysis without fitting of populations

nofit

Remarks

the key requests only restraint computation and analysis for the input ensemble, without fitting of weights (populations)

`pdbsave`

Request that the ensemble is saved into a single PDB file and specifies the file name

pdbsave file

Arguments

file - output file name, extension should be ‘.pdb’

Remarks

if this key is missing, the ensemble is stored only as a list of selected input conformers and their weights

`plot`

Requests generation of Matlab plots showing fit quality

plot

Remarks

the key generates Matlab result plots after fitting, default is not to plot
this can be useful even on a server, if you save the plots as PDF files

`plotgroup`

Assigns conformers to plot groups.

plotgroup svgcolor conformers

Arguments

svgcolor - a scalable vector graphics color name for the distributions of the subensemble
conformers - a conformer number list in MMMx address list format

Remarks

see SVG color table for available colors
conformer numbers are separated by comma and ranges are indicated by hyphen, e.g. ‘2, 4, 7-11, 15’
this makes sense only in a nofit run, after the original ensemble was already analyzed
it can help to see how subensembles contribute to distance distributions
the only effect is for plots of fitted distance distributions
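
The conformer list notation with commas and hyphenated ranges can be expanded as in the following sketch. `parse_conformer_list` is a hypothetical helper for illustration; it covers only the comma and range notation described above, not the full MMMx address syntax.

```python
def parse_conformer_list(text):
    """Expand a list such as '2, 4, 7-11, 15' into sorted conformer numbers."""
    numbers = set()
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            # A range such as '7-11' includes both end points
            first, last = (int(part) for part in item.split("-"))
            numbers.update(range(first, last + 1))
        else:
            numbers.add(int(item))
    return sorted(numbers)


conformers = parse_conformer_list("2, 4, 7-11, 15")
```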

`pre`

Definition of NMR paramagnetic relaxation enhancement (PRE) restraints as intensity ratios. This is a block key with $n$ lines for $n$ restraints.

pre label site Larmor td R2dia [taui [taur [maxrate]]]
   'address_1' 'ratio' ['std']
   ...
.pre

Arguments

label - label type, e.g. $mtsl$
site - spin-labelled site, e.g. $(A)16$
Larmor - proton Larmor frequency in MHz, e.g. 700
td - total INEPT delay in ms, e.g. 10.8
R2dia - relaxation rate for the diamagnetic sample in $s^{-1}$ , e.g. 66
taui - correlation time of internal label motion in ns, e.g. 0.6, default 0.5
taur - rotational correlation time of the protein in ns, e.g. 3.7
maxrate - maximum rate enhancement in $s^{-1}$ , e.g. 150, defaults to 170
address - site address, e.g., $(A)16$
ratio - intensity ratio between paramagnetic and diamagnetic sample, should be between 0 and 1
std - standard deviation of the PRE ratio, optional

Remarks

ratios above 1 are accepted and interpreted as no PRE effect
‘taui’ may be estimated from the CW EPR spectrum of the labelled sample
‘taur’ will be estimated or computed with HYDROPRO if it is not provided, this is usually better
for disordered systems, a general ‘taur’ for all conformers may be a poor approximation
if standard deviation is missing, all PRE restraints in this block have the same weight

`prerates`

Definition of NMR paramagnetic relaxation enhancement (PRE) restraints as relaxation enhancement rates $\Gamma_2$ . This is a block key with $n$ lines for $n$ restraints.

prerates label larmor td R2dia [taui [taur [maxrate]]]
   'address_1' 'rate' ['std']
   ...
.prerates

Arguments

label - label type, e.g. $mtsl$
larmor - proton Larmor frequency in MHz, e.g. 700
td - total INEPT delay in ms, e.g. 10.8; used to convert rate to ratio for NNLLSQ fitting
R2dia - relaxation rate for the diamagnetic sample in $s^{-1}$ , has no effect for rate fitting
taui - correlation time of internal label motion in ns, e.g. 0.6, default 0.5
taur - rotational correlation time of the protein in ns, e.g. 3.7
maxrate - maximum rate enhancement in $s^{-1}$ , e.g. 150, defaults to 170
address - site address, e.g., $(A)16$
rate - rate enhancement in $s^{-1}$ , e.g. 40
std - standard deviation of the rate enhancement, optional

Remarks

ratios above 1 are accepted and interpreted as no PRE effect
‘taui’ may be estimated from the CW EPR spectrum of the labelled sample
‘taur’ will be estimated or computed with HYDROPRO if it is not provided
for disordered systems, a general ‘taur’ for all conformers may be a poor approximation
if standard deviation is missing, all PRE restraints in this block have the same weight
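
The conversion from a rate enhancement to an intensity ratio, which needs both td and R2dia, can be sketched with the relation commonly used for PRE intensity ratios, $I_\mathrm{para}/I_\mathrm{dia} = R_{2,\mathrm{dia}} \exp(-\Gamma_2 t_d) / (R_{2,\mathrm{dia}} + \Gamma_2)$. That MMMx applies exactly this form is an assumption here; `pre_ratio_from_rate` is a hypothetical name.

```python
import math


def pre_ratio_from_rate(gamma2, td_ms, r2dia):
    """Intensity ratio I_para/I_dia for a rate enhancement gamma2 (s^-1)."""
    td = td_ms * 1.0e-3  # total INEPT delay converted from ms to s
    return r2dia * math.exp(-gamma2 * td) / (r2dia + gamma2)


# Example values taken from the argument descriptions above
ratio = pre_ratio_from_rate(gamma2=40.0, td_ms=10.8, r2dia=66.0)
```

A rate enhancement of zero gives a ratio of exactly 1, and the ratio decreases monotonically as the rate grows.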

`preratelinear`

This keyword implements legacy behavior of fitting PRE rates linearly. Use it only for replicating old fits. New default behavior is fitting the logarithm of the rate.

`rmean`

For fitting mean distances instead of distributions. Provided for method development.

rmean

Remarks

the key requests that mean distances instead of distance distribution restraints are fitted
do this only if you have a very good reason

`sans`

Specifies small-angle neutron scattering restraints

sans data [resolution [deuteration]]

Arguments

data - name of the input scattering data file, must be a file acceptable by ‘cryson’ in the ATSAS package
resolution - name of a resolution file, must be a file acceptable by ‘cryson’ in the ATSAS package, optional
deuteration - fraction of buffer deuteration, between 0 and 1, e.g. 0.66, optional

Remarks

SANS fitting works without resolution file, but it is strongly recommended to provide one
if deuteration is not specified, natural proton abundance buffer is assumed
SANS curves are computed by the ATSAS package installed on this computer and present on the Matlab path

`save`

Specifies a file name for saving the fitted ensemble

save file

Arguments

file - output file name, extension should be ‘.ens’

Remarks

if the save key is missing, the ensemble list is saved to ‘ensemble.ens’

`saxs`

Specifies small-angle x-ray scattering restraints

saxs data ['crysol3']

Arguments

data - name of the input scattering data file, must be a file acceptable by ‘crysol’ in the ATSAS package
'crysol3' - if crysol3 is specified, SAXS data are computed with this newer version

Remarks

crysol3 uses a different algorithm for the hydration shell
fitting once with original crysol and once with crysol3 can provide an idea about uncertainty due to hydration shell modelling
SAXS curves are computed by the ATSAS package installed on this computer and present on the Matlab path

`zenodo`

Download and possibly extract a file from Zenodo

zenodo Zenodo_ID.filename

Arguments

Zenodo_ID.filename - Zenodo identifier, followed by a dot and the file name, e.g. ‘6384003.raw_superensemble_with_jackknife_ensembles.zip’

Remarks

any file on Zenodo can be downloaded, for instance, also ‘.ens’ files
archives in ‘.zip’, ‘.gz’, ‘.tar’, and ‘.tar.gz’ formats are automatically extracted after download
this can be used together with the addpdb or initial keywords for working with raw ensembles or initial ensembles stored on Zenodo
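
Because file names usually contain dots themselves, the argument has to be split at the first dot only. A minimal sketch of this split follows; `split_zenodo_argument` is a hypothetical helper, and the check that the identifier is purely numeric is an assumption based on the example above.

```python
def split_zenodo_argument(argument):
    """Split 'Zenodo_ID.filename' at the first dot into (record_id, filename)."""
    record_id, dot, filename = argument.partition(".")
    if not dot or not record_id.isdigit() or not filename:
        raise ValueError(f"expected 'Zenodo_ID.filename', got {argument!r}")
    return record_id, filename


record_id, filename = split_zenodo_argument(
    "6384003.raw_superensemble_with_jackknife_ensembles.zip"
)
```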
