EnsembleFit

This module performs integrative fitting of a raw ensemble to various sets of experimental restraints. The ensemble is contracted by fitting weights (populations) and discarding conformers with zero or very low weight.

addpdb

Input of template conformers from PDB files.

addpdb file
Arguments
  • file - file name, can contain wildcards

Remarks
  • use wildcard ‘*’ for part of the filename to process all conformers from a previous step in the pipeline

  • use this command to generate a raw ensemble or add conformers to a raw ensemble generated by getpdb

  • only one addpdb directive is allowed (the last one overwrites previous ones)
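
As a rough illustration of how a wildcard file name selects conformers, the following Python sketch applies shell-style matching to a list of file names. The file names and the helper `select_conformers` are hypothetical, and MMMx itself may resolve wildcards differently.

```python
from fnmatch import fnmatch


def select_conformers(pattern, filenames):
    """Return the file names matching a shell-style wildcard pattern, sorted."""
    return sorted(name for name in filenames if fnmatch(name, pattern))


# Hypothetical directory listing after a previous pipeline step
available = ["protein_m1.pdb", "protein_m2.pdb", "protein_m10.pdb", "notes.txt"]
selected = select_conformers("protein_m*.pdb", available)
print(selected)
```

Only the three PDB files match the pattern; the unrelated text file is ignored.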

archive

Save ensemble as a ZIP file including a file list with weights and individual PDB files of all conformers

archive output ensemble_id
Arguments
  • output - name of the output file, extension is not required

  • ensemble_id - identifier of the ensemble to be saved

Remarks
  • this is the “most interoperable” way of saving a weighted ensemble

  • information generated by previous processing, such as spin labelling or domain partitioning, is lost

  • metadata that is not part of PDB specification, such as AlphaFold predicted aligned error, is lost

  • the file-list-with-weights format of the included ‘.ens’ file is human readable and easy to process by other software

  • we favour this format for deposition on Zenodo

  • the format can be reimported with the get_Zenodo keyword

  • for processing with some other software, saving to a single PDB file with the keyword pdbsave may be the better option

blocksize

Specifies initial block size for population fitting

blocksize conformers
Arguments
  • conformers - initial number of conformers per block, defaults to 100

Remarks
  • block size is adaptive, so there should be no reason to depart from the default

csv

Save fit results to comma-separated value (CSV) files

csv
Remarks
  • by default, full fit results are saved only to a Matlab file and CSV saving is off

  • if csv is on, all information underlying fit plots is saved, even if plot is off

  • this feature also reports fit quality of individual restraints to the logfile (except PRE)

  • small-angle scattering data has four columns: scattering vector, intensity, standard deviation, fitted intensity

  • PRE data has four columns: index, experimental PRE ratio or rate, standard deviation, fitted PRE ratio or rate

  • distance distribution restraint (ddr) data has a variable column format; the format is specified in the logfile for each individual restraint

  • specifiers for ddr columns are: r distance axis, d experimental distribution, l lower bound, u upper bound, f fitted distribution, g distribution corresponding to a Gaussian restraint

  • dipolar evolution (deer) data has three columns: time axis (microseconds), experimental data, fitted data

  • if plotgroups were specified for ddr, a format specifier s1 stands for plot group 1, s2 for plot group 2, and so on
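
The four-column small-angle scattering format lends itself to simple post-processing. The sketch below computes a chi-square per data point from such a file. It assumes comma-delimited rows without a header line, and the numbers are made up for illustration; the function name `chi_square_per_point` is hypothetical and not part of MMMx.

```python
import csv
import io


def chi_square_per_point(rows):
    """Mean of squared, uncertainty-weighted residuals over all data points."""
    total = 0.0
    count = 0
    for row in rows:
        # Column order as listed above: s, I_exp, sigma, I_fit
        _s, i_exp, sigma, i_fit = (float(value) for value in row[:4])
        total += ((i_exp - i_fit) / sigma) ** 2
        count += 1
    return total / count


# Made-up numbers standing in for a saved file; a real file would be opened with open(...)
example = io.StringIO("0.01,100.0,2.0,98.0\n0.02,80.0,2.0,82.0\n0.03,60.0,1.0,60.0\n")
chi2 = chi_square_per_point(csv.reader(example))
print(f"chi-square per point: {chi2:.3f}")
```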

ddr

Definition of distance distribution restraints. This is a block key with n lines for n restraints.

ddr label_1 [label_2]
   'address_1' 'address_2' 'rmean' 'rstd' [@'fname']
   ...
.ddr
Arguments
  • label_1, label_2 - label types, e.g. mtsl, dota-gd

  • address_1, address_2 - addresses of the two labelled sites, e.g., (A)16, 107

  • rmean - mean distance in Angstroem, e.g. 32.5

  • rstd - standard deviation in Angstroem, e.g. 15.5

  • fname - optional file name of the distance distribution

Remarks
  • if both labels are the same, it is sufficient to specify the label type once

  • use separate ‘ddr’ blocks for each label combination

  • the file name is optional, but using full distributions is strongly recommended

  • if a full distribution is provided, rmean and rstd can be skipped
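
When only rmean and rstd are given, the restraint corresponds to a Gaussian distance distribution. The following sketch shows one way to evaluate such a Gaussian on a distance axis. The axis range, the step size, and the normalization to unit sum are assumptions of this example, not a description of the MMMx implementation.

```python
import math


def gaussian_restraint(r_axis, rmean, rstd):
    """Gaussian distance distribution on r_axis, normalized to unit sum."""
    values = [math.exp(-0.5 * ((r - rmean) / rstd) ** 2) for r in r_axis]
    total = sum(values)
    return [v / total for v in values]


# Distance axis from 0 to 100 Angstroem in steps of 0.5 (an arbitrary choice)
r_axis = [0.5 * k for k in range(201)]
distribution = gaussian_restraint(r_axis, rmean=32.5, rstd=15.5)
peak_position = r_axis[distribution.index(max(distribution))]
print(peak_position)
```

With a standard deviation as large as 15.5 Angstroem, the Gaussian is noticeably truncated at zero distance, which is one reason why a full experimental distribution is the better input.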

deer

Definition of primary DEER data as restraints or for backcalculation. This is a block key with n lines for n restraints.

deer label_1 [label_2]
   'address_1' 'address_2' @'fname'
   ...
.deer
Arguments
  • label_1, label_2 - label types, e.g. mtsl, dota-gd

  • address_1, address_2 - addresses of the two labelled sites, e.g., (A)16, 107

  • fname - file name of the DEER data, must contain a background fit (see Remarks)

Remarks
  • the data files must contain a time axis as first column, the real part of phase-corrected primary data as second column, and the background fit as fourth column

  • Comparative Deer Analyzer in DeerAnalysis 2022 and later provides the required format

  • for backcalculation with the nofit keyword, the background is not used

  • use separate ‘deer’ blocks for each label combination

discard

Defines the weight threshold for discarding conformers as a fraction of the maximum weight.

discard threshold
Arguments
  • threshold - a number between 0 and 1, default is 0.01
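
The effect of the threshold can be sketched as follows. The helper `discard_low_weights` is hypothetical. Whether a conformer exactly at the cutoff is kept and whether the remaining weights are renormalized are assumptions of this example.

```python
def discard_low_weights(weights, threshold=0.01):
    """Keep conformers whose weight is at least threshold times the maximum weight."""
    cutoff = threshold * max(weights)
    kept = {index: w for index, w in enumerate(weights) if w >= cutoff}
    total = sum(kept.values())
    # Renormalizing the surviving weights is an assumption of this sketch
    return {index: w / total for index, w in kept.items()}


weights = [0.50, 0.30, 0.196, 0.004]
populations = discard_low_weights(weights)
print(sorted(populations))
```

Here the cutoff is 0.01 times 0.50, that is 0.005, so the last conformer with weight 0.004 is discarded.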

expand

Input and expansion of rigid-body arrangements.

expand [fname]
Arguments
  • fname - optional file name for saving extracted rigid-body arrangements

Remarks
  • the output of a previous Rigi module in the pipeline is expanded

  • input file format is the Matlab output format of Rigi

  • use this command only for direct processing of Rigi results by EnsembleFit

  • this keyword cannot be combined with initial, addpdb, and getpdb

  • only one expand directive is allowed (the last one overwrites previous ones)

figures

Requests that figures are saved and specifies a graphics format for them.

figures format
Arguments
  • format - optional, one of the formats in which Matlab can save figures, e.g. ‘pdf’

Remarks
  • this switches on figure saving, which is off by default

  • in most contexts, vector graphic output as ‘pdf’ works best; this is the default

  • plot is switched on if it was not already switched on

  • file names for small-angle scattering fits are derived from the name of the input data

  • file names for distance distribution overlap are derived from the two site addresses

  • file names for PRE fits are derived from the labeling site

  • each small-angle scattering restraint generates four plots: linear, semi-logarithmic, double logarithmic, and residual

getpdb

Input of a raw ensemble by reading a single PDB file.

getpdb file
Arguments
  • file - file name

Remarks
  • the PDB file can contain several models (conformers) or a single one

  • for MMMx ensemble PDB files with population information in REMARK 400, such information is read, otherwise populations are uniform

  • only one getpdb directive is allowed (the last one overwrites previous ones)

initial

Input an initial ensemble with populations from an MMMx ensemble file

initial file
Arguments
  • file - file name, must refer to a single ensemble (extension ‘.ens’ or ‘.zip’)

Remarks
  • use this in combination with nofit to generate plots or save data for an existing ensemble

  • it is possible to combine initial with addpdb and/or getpdb

  • only one initial directive is allowed (the last one overwrites previous ones)

  • the filename can specify a ZIP archive containing individual PDB files for conformers and the corresponding file list with weights

interactive

Requests display of fit information during fitting

interactive
Remarks
  • the key enables display of fit information in a plot during fitting

  • this option may be useful for tests, but should be skipped for runs on a server

nnllsq

Requests non-negative linear least square fitting of all populations.

nnllsq bckg sasbckg
Arguments
  • bckg - order of the polynomial for additional DEER background correction, a constant offset (order 0) is default

  • sasbckg - if this argument is present (e.g. on), a constant small-angle scattering background is fitted, defaults to no fit

Remarks
  • requires that DEER restraints are defined by primary DEER data including an existing background fit (see keyword deer)

  • ddr distance distribution restraints are ignored

  • PRE restraints are always fitted as PRE ratios, if rates are given, these are converted
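
The idea behind non-negative linear least-squares fitting of populations can be sketched in a few lines: the experimental data vector is approximated by a linear combination of per-conformer predictions, with all coefficients constrained to be non-negative. This section does not state which solver MMMx uses; the example below substitutes plain projected gradient descent because it needs no external library. All data are made up.

```python
def nnls_projected_gradient(columns, target, iterations=500):
    """Minimize ||A w - b||^2 subject to w >= 0 by projected gradient descent.

    columns: one predicted data vector per conformer (columns of A)
    target:  experimental data vector b
    """
    n_conf = len(columns)
    n_data = len(target)
    # Step 1 / (2 * ||A||_F^2) never exceeds the inverse Lipschitz constant of the gradient
    frobenius_sq = sum(value * value for column in columns for value in column)
    step = 1.0 / (2.0 * frobenius_sq)
    weights = [1.0 / n_conf] * n_conf
    for _ in range(iterations):
        fitted = [sum(weights[j] * columns[j][i] for j in range(n_conf)) for i in range(n_data)]
        residual = [fitted[i] - target[i] for i in range(n_data)]
        gradient = [2.0 * sum(columns[j][i] * residual[i] for i in range(n_data)) for j in range(n_conf)]
        # Gradient step followed by projection onto the non-negative orthant
        weights = [max(0.0, weights[j] - step * gradient[j]) for j in range(n_conf)]
    return weights


# Three conformers with made-up predicted curves; the data are an exact 0.3/0.7 mix of the first two
predicted = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
experimental = [0.3, 0.7, 1.0]
fitted_weights = nnls_projected_gradient(predicted, experimental)
print([round(w, 3) for w in fitted_weights])
```

The third conformer does not contribute to the synthetic data and ends up with a weight of practically zero, which is the kind of conformer that the discard threshold removes.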

nofit

Requests restraint computation and analysis for the input ensemble without fitting

nofit
Remarks
  • the key requests only restraint computation and analysis for the input ensemble, without fitting of weights (populations)

pdbsave

Requests that the ensemble is saved into a single PDB file and specifies the file name

pdbsave file
Arguments
  • file - output file name, extension should be ‘.pdb’

Remarks
  • if this key is missing, the ensemble is stored only as a list of selected input conformers and their weights

plot

Requests generation of Matlab plots showing fit quality

plot
Remarks
  • the key generates Matlab result plots after fitting, default is not to plot

  • this can be useful even on a server, if you save the plots as PDF files

plotgroup

Assigns conformers to plot groups.

plotgroup svgcolor conformers
Arguments
  • svgcolor - a scalable vector graphics color name for the distributions of the subensemble

  • conformers - a conformer number list in MMMx address list format

Remarks
  • see SVG color table for available colors

  • conformer numbers are separated by comma and ranges are indicated by hyphen, e.g. ‘2, 4, 7-11, 15’

  • this makes sense only in a nofit run, after the original ensemble was already analyzed

  • it can help to see how subensembles contribute to distance distributions

  • the only effect is for plots of fitted distance distributions
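
The conformer list format described above can be expanded into individual conformer numbers as in the following sketch. The function `parse_conformer_list` is a hypothetical helper written for illustration; it is not the MMMx parser and handles only plain numbers and hyphenated ranges.

```python
def parse_conformer_list(text):
    """Expand a list such as '2, 4, 7-11, 15' into individual conformer numbers."""
    numbers = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            first, last = (int(part) for part in item.split("-", 1))
            numbers.extend(range(first, last + 1))
        else:
            numbers.append(int(item))
    return numbers


conformers = parse_conformer_list("2, 4, 7-11, 15")
print(conformers)  # [2, 4, 7, 8, 9, 10, 11, 15]
```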

pre

Definition of NMR paramagnetic relaxation enhancement (PRE) restraints as intensity ratios. This is a block key with n lines for n restraints.

pre label site Larmor td R2dia [taui [taur [maxrate]]]
   'address_1' 'ratio' ['std']
   ...
.pre
Arguments
  • label - label type, e.g. mtsl

  • site - spin-labelled site, e.g. (A)16

  • Larmor - proton Larmor frequency in MHz, e.g. 700

  • td - total INEPT delay in ms, e.g. 10.8

  • R2dia - relaxation rate for the diamagnetic sample in s^{-1}, e.g. 66

  • taui - correlation time of internal label motion in ns, e.g. 0.6, default 0.5

  • taur - rotational correlation time of the protein in ns, e.g. 3.7

  • maxrate - maximum rate enhancement in s^{-1}, e.g. 150, defaults to 170

  • address - site address, e.g., (A)16

  • ratio - intensity ratio between paramagnetic and diamagnetic sample, should be between 0 and 1

  • std - standard deviation of the PRE ratio, optional

Remarks
  • ratios above 1 are accepted and interpreted as no PRE effect

  • ‘taui’ may be estimated from the CW EPR spectrum of the labelled sample

  • ‘taur’ will be estimated or computed with HYDROPRO if it is not provided, which is usually the better choice

  • for disordered systems, a general ‘taur’ for all conformers may be a poor approximation

  • if standard deviation is missing, all PRE restraints in this block have the same weight

prerates

Definition of NMR paramagnetic relaxation enhancement (PRE) restraints as relaxation enhancement rates \Gamma_2. This is a block key with n lines for n restraints.

prerates label larmor td R2dia [taui [taur [maxrate]]]
   'address_1' 'rate' ['std']
   ...
.prerates
Arguments
  • label - label type, e.g. mtsl

  • larmor - proton Larmor frequency in MHz, e.g. 700

  • td - total INEPT delay in ms, e.g. 10.8; used to convert rate to ratio for NNLLSQ fitting

  • R2dia - relaxation rate for the diamagnetic sample in s^{-1}, has no effect for rate fitting

  • taui - correlation time of internal label motion in ns, e.g. 0.6, default 0.5

  • taur - rotational correlation time of the protein in ns, e.g. 3.7

  • maxrate - maximum rate enhancement in s^{-1}, e.g. 150, defaults to 170

  • address - site address, e.g., (A)16

  • rate - rate enhancement in s^{-1}, e.g. 40

  • std - standard deviation of the rate enhancement, optional

Remarks
  • ratios above 1 are accepted and interpreted as no PRE effect

  • ‘taui’ may be estimated from the CW EPR spectrum of the labelled sample

  • ‘taur’ will be estimated or computed with HYDROPRO if it is not provided

  • for disordered systems, a general ‘taur’ for all conformers may be a poor approximation

  • if standard deviation is missing, all PRE restraints in this block have the same weight
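
The conversion between a PRE rate and an intensity ratio can be illustrated with a relation that is commonly used for PRE intensity ratios, I_para/I_dia = R2dia exp(-\Gamma_2 td) / (R2dia + \Gamma_2). That MMMx uses exactly this expression is an assumption of this sketch; this section only states that td enters the conversion. The function name is hypothetical.

```python
import math


def pre_ratio_from_rate(rate, td_ms, r2dia):
    """Intensity ratio I_para / I_dia for a transverse PRE rate (assumed relation).

    rate:  PRE rate Gamma_2 in s^-1
    td_ms: total INEPT delay in ms
    r2dia: diamagnetic transverse relaxation rate in s^-1
    """
    td = td_ms * 1.0e-3  # ms to s
    return r2dia * math.exp(-rate * td) / (r2dia + rate)


ratio = pre_ratio_from_rate(rate=40.0, td_ms=10.8, r2dia=66.0)
print(f"{ratio:.3f}")
```

Under this relation, a rate of zero gives a ratio of exactly 1, and the ratio decreases monotonically with increasing rate.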

preratelinear

This keyword implements legacy behavior of fitting PRE rates linearly. Use it only for replicating old fits. New default behavior is fitting the logarithm of the rate.

rmean

For fitting mean distances instead of distributions. Provided for method development.

rmean
Remarks
  • the key requests that mean distances instead of distance distribution restraints are fitted

  • do this only if you have a very good reason

sans

Specifies small-angle neutron scattering restraints

sans data [resolution [deuteration]]
Arguments
  • data - name of the input scattering data file, must be a file acceptable by ‘cryson’ in the ATSAS package

  • resolution - name of a resolution file, must be a file acceptable by ‘cryson’ in the ATSAS package

  • deuteration - fraction of buffer deuteration, between 0 and 1, e.g. 0.66, optional

Remarks
  • SANS fitting works without resolution file, but it is strongly recommended to provide one

  • if deuteration is not specified, natural proton abundance buffer is assumed

  • SANS curves are computed by the ATSAS package installed on this computer and present on the Matlab path

save

Specifies a file name for saving the fitted ensemble

save file
Arguments
  • file - output file name, extension should be ‘.ens’

Remarks
  • if the save key is missing, the ensemble list is saved to ‘ensemble.ens’

saxs

Specifies small-angle X-ray scattering restraints

saxs data ['crysol3']
Arguments
  • data - name of the input scattering data file, must be a file acceptable by ‘crysol’ in the ATSAS package

  • 'crysol3' - if crysol3 is specified, SAXS data are computed with this newer version

Remarks
  • crysol3 uses a different algorithm for the hydration shell

  • fitting once with original crysol and once with crysol3 can provide an idea about uncertainty due to hydration shell modelling

  • SAXS curves are computed by the ATSAS package installed on this computer and present on the Matlab path

zenodo

Download and possibly extract a file from Zenodo

zenodo Zenodo_ID.filename
Arguments
  • Zenodo_ID.filename - Zenodo identifier, followed by a dot and the file name, e.g. ‘6384003.raw_superensemble_with_jackknife_ensembles.zip’

Remarks
  • any file on Zenodo can be downloaded, for instance, also ‘.ens’ files

  • archives in ‘.zip’, ‘.gz’, ‘.tar’, and ‘.tar.gz’ formats are automatically extracted after download

  • this can be used together with the addpdb or initial keywords for working with raw or initial ensembles stored on Zenodo
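
The structure of the argument can be made explicit with a short sketch that separates the Zenodo identifier from the file name at the first dot and checks for the archive extensions listed above. Both helper functions are hypothetical, and the case-insensitive extension check is an assumption. No download is performed.

```python
ARCHIVE_SUFFIXES = (".tar.gz", ".zip", ".gz", ".tar")


def split_zenodo_argument(argument):
    """Split 'Zenodo_ID.filename' at the first dot into identifier and file name."""
    zenodo_id, _, filename = argument.partition(".")
    return zenodo_id, filename


def is_archive(filename):
    """True if the file name ends with one of the archive extensions listed above."""
    return filename.lower().endswith(ARCHIVE_SUFFIXES)


zenodo_id, filename = split_zenodo_argument(
    "6384003.raw_superensemble_with_jackknife_ensembles.zip"
)
print(zenodo_id, filename, is_archive(filename))
```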