EnsembleFit
This module performs integrative fitting of a raw ensemble to various sets of experimental restraints. The ensemble is contracted by fitting weights (populations) and discarding conformers with zero or very low weight.
addpdb
Input of template conformers from PDB files.
addpdb file
- Arguments
file
- file name, can contain wildcards
- Remarks
use wildcard ‘*’ for part of the filename to process all conformers from a previous step in the pipeline
use this command to generate a raw ensemble or to add conformers to a raw ensemble generated by getpdb
only one addpdb directive is allowed (the last one overwrites previous ones)
archive
Saves the ensemble as a ZIP file that includes a file list with weights and the individual PDB files of all conformers.
archive output ensemble_id
- Arguments
output
- name of the output file, extension is not required
ensemble_id
- identifier of the ensemble to be saved
- Remarks
this is the “most interoperable” way of saving a weighted ensemble
information generated by previous processing, such as spin labelling or domain partitioning, is lost
metadata that is not part of the PDB specification, such as AlphaFold predicted aligned error, is lost
the file-list-with-weights format of the included ‘.ens’ file is human readable and easy to process by other software
we favour this format for deposition on Zenodo
the format can be reimported with the get_Zenodo keyword
for processing with some other software, saving to a single PDB file with the keyword pdbsave may be the better option
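Because the weights live in a plain-text file list, such an archive can be written and read back with generic tools. The following Python sketch illustrates the idea with the standard `zipfile` module. It is not MMMx code: the helper names `write_archive` and `read_weights` are hypothetical, and the layout of the ‘.ens’ list assumed here (one file name and weight per line, ‘%’ starting a comment line) may differ from the real format.

```python
import io
import zipfile


def write_archive(conformers, ensemble_id):
    """Pack conformers into an in-memory ZIP: one '.ens' list plus one PDB file each.

    conformers: list of (pdb_file_name, weight, pdb_text) tuples.
    Assumption: the '.ens' list holds one 'file_name weight' pair per line and
    '%' marks comment lines; the exact MMMx layout may differ.
    """
    lines = [f"% ensemble {ensemble_id}"]
    lines += [f"{name} {weight:.6f}" for name, weight, _ in conformers]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{ensemble_id}.ens", "\n".join(lines) + "\n")
        for name, _, pdb_text in conformers:
            archive.writestr(name, pdb_text)
    return buffer.getvalue()


def read_weights(zip_bytes):
    """Recover {file_name: weight} from the '.ens' member of such an archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        ens_name = next(n for n in archive.namelist() if n.endswith(".ens"))
        text = archive.read(ens_name).decode()
    weights = {}
    for line in text.splitlines():
        if line.strip() and not line.startswith("%"):  # skip blank and comment lines
            name, weight = line.split()
            weights[name] = float(weight)
    return weights


data = write_archive([("m1.pdb", 0.6, "END\n"), ("m2.pdb", 0.4, "END\n")], "demo")
print(read_weights(data))  # {'m1.pdb': 0.6, 'm2.pdb': 0.4}
```

Keeping weights in a separate text list, rather than inside the PDB files, is what lets a few lines of generic code recover the populations.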
blocksize
Specifies the initial block size for population fitting
blocksize conformers
- Arguments
conformers
- initial number of conformers per block, defaults to 100
- Remarks
block size is adaptive; there should be no reason to depart from the default
csv
Save fit results to comma-separated value (CSV) files
csv
- Remarks
by default, full fit results are saved only to a Matlab file and CSV saving is off
if csv is on, all information underlying fit plots is saved, even if plot is off
this feature also reports the fit quality of individual restraints to the logfile (except PRE)
small-angle scattering data has four columns: scattering vector, intensity, standard deviation, fitted intensity
PRE data has four columns: index, experimental PRE ratio or rate, standard deviation, fitted PRE ratio or rate
distance distribution restraint (ddr) data has a variable column format, the format is specified in the logfile for each individual restraint
specifiers for ddr columns are:
r: distance axis
d: experimental distribution
l: lower bound
u: upper bound
f: fitted distribution
g: distribution corresponding to a Gaussian restraint
dipolar evolution (deer) data has three columns: time axis (microseconds), experimental data, and fitted data
if plot groups were defined with plotgroup for ddr, the format specifier s1 stands for plot group 1, s2 for plot group 2, and so on
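Since the ddr column layout varies per restraint, a reader has to be driven by the format reported in the logfile. The Python sketch below assumes, hypothetically, that this format is a plain string of the specifiers listed above (for example ‘rdf’, or ‘rdfs1s2’ with plot groups) and that the CSV file has no header row; the function names `split_format` and `read_ddr_csv` are illustrative only.

```python
import csv
import io

# Column specifiers of ddr CSV files as documented; s1, s2, ... denote plot groups.
SPECIFIERS = {
    "r": "distance axis",
    "d": "experimental distribution",
    "l": "lower bound",
    "u": "upper bound",
    "f": "fitted distribution",
    "g": "distribution corresponding to a Gaussian restraint",
}


def split_format(fmt):
    """Split a format string such as 'rdfs1s2' into ['r', 'd', 'f', 's1', 's2']."""
    tokens, i = [], 0
    while i < len(fmt):
        j = i + 1
        if fmt[i] == "s":  # plot-group specifier: 's' followed by its number
            while j < len(fmt) and fmt[j].isdigit():
                j += 1
        token = fmt[i:j]
        if token not in SPECIFIERS and not (token[0] == "s" and token[1:].isdigit()):
            raise ValueError(f"unknown column specifier {token!r}")
        tokens.append(token)
        i = j
    return tokens


def read_ddr_csv(text, fmt):
    """Return {specifier: list of floats} for ddr CSV text with column format fmt."""
    keys = split_format(fmt)
    columns = {key: [] for key in keys}
    for row in csv.reader(io.StringIO(text)):
        if row:  # skip empty lines
            for key, value in zip(keys, row):
                columns[key].append(float(value))
    return columns


columns = read_ddr_csv("20.0,0.10,0.12\n20.5,0.30,0.28\n", "rdf")
print(columns["r"], columns["f"])  # [20.0, 20.5] [0.12, 0.28]
```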
ddr
Definition of distance distribution restraints. This is a block key with lines for restraints.
ddr label_1 [label_2]
'address_1' 'address_2' 'rmean' 'rstd' [@'fname']
...
.ddr
- Arguments
label_1, label_2
- label types
address_1, address_2
- addresses of the two labelled sites
rmean
- mean distance in Angstroem
rstd
- standard deviation in Angstroem
fname
- optional file name of the distance distribution
- Remarks
if both labels are the same, it is sufficient to specify the label type once
use separate ‘ddr’ blocks for each label combination
the file name is optional, but using full distributions is strongly recommended
if a full distribution is provided, rmean and rstd can be skipped
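A full distribution contains rmean and rstd as its first two moments, which is one way to read why they can be skipped. As an illustration only, and not necessarily how MMMx evaluates them, the mean and standard deviation of a distribution sampled on an equally spaced distance axis can be computed as follows; `distribution_moments` is a hypothetical helper name.

```python
import math


def distribution_moments(r, p):
    """Mean and standard deviation of a distance distribution.

    r: equally spaced distance axis (Angstroem); p: non-negative values of the
    distribution on that axis. Normalisation happens here, and on an equally
    spaced axis the grid spacing cancels out of both moments.
    """
    if len(r) != len(p):
        raise ValueError("distance axis and distribution differ in length")
    area = sum(p)
    if area <= 0:
        raise ValueError("distribution must have a positive area")
    mean = sum(ri * pi for ri, pi in zip(r, p)) / area
    variance = sum((ri - mean) ** 2 * pi for ri, pi in zip(r, p)) / area
    return mean, math.sqrt(variance)


rmean, rstd = distribution_moments([20.0, 30.0, 40.0], [1.0, 2.0, 1.0])
print(round(rmean, 3), round(rstd, 3))  # 30.0 7.071
```

Two moments cannot describe a multimodal distribution, which is why providing the full distribution is preferable.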
deer
Definition of primary DEER data as restraints or for backcalculation. This is a block key with lines for restraints.
deer label_1 [label_2]
'address_1' 'address_2' @'fname'
...
.deer
- Arguments
label_1, label_2
- label types
address_1, address_2
- addresses of the two labelled sites
fname
- file name of the DEER data, must contain a background fit (see Remarks)
- Remarks
the data files must contain a time axis as first column, the real part of phase-corrected primary data as second column, and the background fit as fourth column
Comparative Deer Analyzer in DeerAnalysis 2022 and later provides the required format
for backcalculation with the nofit keyword, the background is not used
use separate ‘deer’ blocks for each label combination
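The column convention for DEER files can be checked before a run. This Python sketch extracts the three columns named in the remarks (time axis, real part of the phase-corrected data, background fit) and fails loudly if the background column is missing. It assumes that values are whitespace-separated and that lines starting with ‘%’ or ‘#’ are comments; `read_deer_file` is a hypothetical name.

```python
def read_deer_file(text):
    """Split DEER data text into time axis, real part of the signal, and background fit.

    Column layout as documented: 1 = time, 2 = real part of phase-corrected
    primary data, 4 = background fit; column 3 is not used here.
    Assumption: whitespace-separated numbers, '%' or '#' starting comment lines.
    """
    time, real, background = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "%#":
            continue
        values = [float(v) for v in line.split()]
        if len(values) < 4:
            raise ValueError("expected at least four columns (background fit missing?)")
        time.append(values[0])
        real.append(values[1])
        background.append(values[3])
    return time, real, background


time, real, background = read_deer_file("0.000 1.000 0.0 0.700\n0.008 0.950 0.0 0.698\n")
print(time, background)  # [0.0, 0.008] [0.7, 0.698]
```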
discard
Defines the weight threshold for discarding conformers as a fraction of the maximum weight.
discard threshold
- Arguments
threshold
- a number between 0 and 1, default is 0.01
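The threshold is relative to the largest weight, not absolute. A minimal Python sketch of this rule follows; renormalising the surviving weights to a sum of 1 is an assumption of the sketch, and `discard_conformers` is an illustrative name.

```python
def discard_conformers(weights, threshold=0.01):
    """Drop conformers below threshold * (maximum weight) and renormalise the rest.

    weights: {conformer: weight}; threshold: fraction of the maximum weight,
    0.01 being the documented default. Renormalising the kept weights to a sum
    of 1 is an assumption of this sketch.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    largest = max(weights.values())
    if largest <= 0.0:
        raise ValueError("at least one weight must be positive")
    cutoff = threshold * largest
    kept = {name: w for name, w in weights.items() if w >= cutoff}
    total = sum(kept.values())
    return {name: w / total for name, w in kept.items()}


print(discard_conformers({"m1": 0.5, "m2": 0.25, "m3": 0.25, "m4": 0.004}))
# {'m1': 0.5, 'm2': 0.25, 'm3': 0.25}
```

With the default of 0.01 and a largest weight of 0.5, the cutoff is 0.005, so m4 is removed.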
expand
Input and expansion of rigid-body arrangements.
expand [fname]
- Arguments
fname
- optional file name for saving extracted rigid-body arrangements
- Remarks
the output of a previous Rigi module in the pipeline is expanded
input file format is the Matlab output format of Rigi
use this command only for direct processing of Rigi results by EnsembleFit
this keyword cannot be combined with initial, addpdb, and getpdb
only one expand directive is allowed (the last one overwrites previous ones)
figures
Requests that figures are saved and specifies a graphics format for them.
figures format
- Arguments
format
- optional, one of the formats in which Matlab can save figures, e.g. ‘pdf’
- Remarks
this switches on figure saving, which is off by default
in most contexts, vector graphics output as ‘pdf’ works best; this is the default
plot is switched on if it was not already switched on
file names for small-angle scattering fits are derived from the name of the input data
file names for distance distribution overlap are derived from the two site addresses
file names for PRE fits are derived from the labeling site
each small-angle scattering restraint generates four plots: linear, semi-logarithmic, double logarithmic, and residual
getpdb
Input of a raw ensemble by reading a single PDB file.
getpdb file
- Arguments
file
- file name
- Remarks
the PDB file can contain several models (conformers) or a single one
for MMMx ensemble PDB files with population information in REMARK 400, this information is read; otherwise, populations are uniform
only one getpdb directive is allowed (the last one overwrites previous ones)
initial
Input of an initial ensemble with populations from an MMMx ensemble file
initial file
- Arguments
file
- file name, must refer to a single ensemble (extension ‘.ens’ or ‘.zip’)
- Remarks
use this in combination with nofit to generate plots or save data for an existing ensemble
it is possible to combine initial with addpdb and/or getpdb
only one initial directive is allowed (the last one overwrites previous ones)
the file name can specify a ZIP archive containing individual PDB files for conformers and the corresponding file list with weights
interactive
Requests display of fit information during fitting
interactive
- Remarks
the key enables display of fit information in a plot during fitting
this option may be useful for tests, but should be skipped for runs on a server
nnllsq
Requests non-negative linear least-squares fitting of all populations.
nnllsq bckg sasbckg
- Arguments
bckg
- order of the polynomial for additional DEER background correction; the default is a constant offset (order 0)
sasbckg
- if this argument is present, a constant small-angle scattering background is fitted; by default, no background is fitted
- Remarks
requires that DEER restraints are defined by primary DEER data including a existing background fit (see keyword
deer
)ddr
distance distribution restraints are ignoredPRE restraints are always fitted as PRE ratios, if rates are given, these are converted
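The idea behind population fitting is that each conformer contributes a predicted restraint vector (for example, a back-calculated scattering curve or PRE ratios) and the ensemble prediction is their weighted sum with non-negative weights. The Python sketch below solves that problem with projected gradient descent, chosen only because it fits in a few lines; it is not necessarily the solver MMMx uses, and it ignores restraint weighting, DEER background correction, and the small-angle scattering background. `fit_populations` is a hypothetical name.

```python
def fit_populations(basis, target, iterations=5000):
    """Non-negative least squares: minimise 0.5 * ||sum_k w_k * basis[k] - target||^2, w_k >= 0.

    basis: one predicted restraint vector per conformer (same length as target).
    Projected gradient descent is used for brevity; it is not necessarily the
    solver used by MMMx. The step 1/L uses L = sum of squared basis entries,
    an upper bound on the largest eigenvalue of A^T A.
    """
    n = len(basis)
    lipschitz = sum(x * x for vector in basis for x in vector)
    if lipschitz == 0:
        raise ValueError("basis vectors must not all be zero")
    step = 1.0 / lipschitz
    weights = [1.0 / n] * n  # start from uniform populations
    for _ in range(iterations):
        model = [sum(w * vector[i] for w, vector in zip(weights, basis))
                 for i in range(len(target))]
        residual = [m - t for m, t in zip(model, target)]
        gradient = [sum(v * r for v, r in zip(vector, residual)) for vector in basis]
        # gradient step, then projection onto the non-negative orthant
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    return weights


weights = fit_populations([[1, 0, 1], [0, 1, 1]], [0.3, 0.7, 1.0])
print([round(w, 4) for w in weights])  # [0.3, 0.7]
```

Dividing the fitted weights by their sum gives populations; conformers whose weight is driven to zero are the ones that discarding removes.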
nofit
Requests restraint computation and analysis without fitting of populations
nofit
- Remarks
the key requests only restraint computation and analysis for the input ensemble, without fitting of weights (populations)
pdbsave
Requests that the ensemble is saved into a single PDB file and specifies the file name
pdbsave file
- Arguments
file
- output file name, extension should be ‘.pdb’
- Remarks
if this key is missing, the ensemble is stored only as a list of selected input conformers and their weights
plot
Requests generation of Matlab plots showing fit quality
plot
- Remarks
the key generates Matlab result plots after fitting, default is not to plot
this can be useful even on a server, if you save the plots as PDF files
plotgroup
Assigns conformers to plot groups.
plotgroup svgcolor conformers
- Arguments
svgcolor
- a scalable vector graphics (SVG) color name for the distributions of the subensemble
conformers
- a conformer number list in MMMx address list format
- Remarks
see SVG color table for available colors
conformer numbers are separated by comma and ranges are indicated by hyphen, e.g. ‘2, 4, 7-11, 15’
this makes sense only in a nofit run, after the original ensemble has already been analyzed
it can help to see how subensembles contribute to distance distributions
the only effect is for plots of fitted distance distributions
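The conformer list syntax is easy to expand programmatically, for example to check which conformers a plot group covers. This Python sketch handles only the comma and hyphen syntax described here; the full MMMx address list format may accept more. `parse_conformer_list` is a hypothetical name.

```python
def parse_conformer_list(spec):
    """Expand a conformer list such as '2, 4, 7-11, 15' into sorted conformer numbers."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:  # inclusive range 'first-last'
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range {part!r}")
            numbers.update(range(first, last + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)


print(parse_conformer_list("2, 4, 7-11, 15"))  # [2, 4, 7, 8, 9, 10, 11, 15]
```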
pre
Definition of NMR paramagnetic relaxation enhancement (PRE) restraints as intensity ratios. This is a block key with lines for restraints.
pre label site Larmor td R2dia [taui [taur [maxrate]]]
'address_1' 'ratio' ['std']
...
.pre
- Arguments
label
- label type
site
- spin-labelled site
Larmor
- proton Larmor frequency in MHz, e.g. 700
td
- total INEPT delay in ms, e.g. 10.8
R2dia
- relaxation rate for the diamagnetic sample in s^-1, e.g. 66
taui
- correlation time of internal label motion in ns, e.g. 0.6, default 0.5
taur
- rotational correlation time of the protein in ns, e.g. 3.7
maxrate
- maximum rate enhancement in s^-1, e.g. 150, defaults to 170
address
- site address
ratio
- intensity ratio between paramagnetic and diamagnetic sample, should be between 0 and 1
std
- standard deviation of the PRE ratio, optional
- Remarks
ratios above 1 are accepted and interpreted as no PRE effect
‘taui’ may be estimated from the CW EPR spectrum of the labelled sample
‘taur’ will be estimated or computed with HYDROPRO if it is not provided; this is usually better
for disordered systems, a general ‘taur’ for all conformers may be a poor approximation
if standard deviation is missing, all PRE restraints in this block have the same weight
prerates
Definition of NMR paramagnetic relaxation enhancement (PRE) restraints as relaxation enhancement rates. This is a block key with lines for restraints.
prerates label larmor td R2dia [taui [taur [maxrate]]]
'address_1' 'rate' ['std']
...
.prerates
- Arguments
label
- label type
larmor
- proton Larmor frequency in MHz, e.g. 700
td
- total INEPT delay in ms, e.g. 10.8; used to convert rates to ratios for nnllsq fitting
R2dia
- relaxation rate for the diamagnetic sample in s^-1; has no effect for rate fitting
taui
- correlation time of internal label motion in ns, e.g. 0.6, default 0.5
taur
- rotational correlation time of the protein in ns, e.g. 3.7
maxrate
- maximum rate enhancement in s^-1, e.g. 150, defaults to 170
address
- site address
rate
- rate enhancement in s^-1, e.g. 40
std
- standard deviation of the rate enhancement, optional
- Remarks
ratios above 1 are accepted and interpreted as no PRE effect
‘taui’ may be estimated from the CW EPR spectrum of the labelled sample
‘taur’ will be estimated or computed with HYDROPRO if it is not provided
for disordered systems, a general ‘taur’ for all conformers may be a poor approximation
if standard deviation is missing, all PRE restraints in this block have the same weight
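The td and R2dia parameters matter when rates are converted to ratios. The exact MMMx expression is not given here; a widely used model, assumed in this sketch, is the Battiste and Wagner relation I_para/I_dia = R2dia · exp(-Γ2 · td) / (R2dia + Γ2), with Γ2 the rate enhancement and td converted from ms to s. The Python sketch implements this model and a bisection inverse; taui and taur are not needed because they only enter when rates are computed from structures. The function names are hypothetical, and mapping ratios of 1 or above to a zero rate and capping at maxrate are assumptions modelled on the remarks.

```python
import math


def pre_rate_to_ratio(rate, r2dia, td_ms):
    """Intensity ratio I_para/I_dia for a rate enhancement (s^-1), assumed model:
    ratio = R2dia * exp(-rate * t) / (R2dia + rate), t = td in seconds."""
    t = td_ms * 1e-3
    return r2dia * math.exp(-rate * t) / (r2dia + rate)


def pre_ratio_to_rate(ratio, r2dia, td_ms, maxrate=170.0, tol=1e-9):
    """Invert pre_rate_to_ratio by bisection (the ratio decreases monotonically with rate)."""
    if ratio >= 1.0:
        return 0.0  # no PRE effect
    if ratio <= pre_rate_to_ratio(maxrate, r2dia, td_ms):
        return maxrate  # cap at the maximum rate enhancement
    low, high = 0.0, maxrate
    while high - low > tol:
        mid = 0.5 * (low + high)
        if pre_rate_to_ratio(mid, r2dia, td_ms) > ratio:
            low = mid  # ratio still too high: more enhancement needed
        else:
            high = mid
    return 0.5 * (low + high)


ratio = pre_rate_to_ratio(40.0, r2dia=66.0, td_ms=10.8)
print(ratio, pre_ratio_to_rate(ratio, r2dia=66.0, td_ms=10.8))
```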
preratelinear
This keyword implements the legacy behavior of fitting PRE rates linearly. Use it only for replicating old fits. The new default behavior is to fit the logarithm of the rate.
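Why a logarithmic comparison can matter: with linear deviations, a few large rates dominate the fit, whereas logarithms weigh relative errors equally. The Python sketch contrasts the two criteria on made-up numbers. The exact objective MMMx uses is not specified here, `rate_deviation` is a hypothetical name, and the floor that avoids log(0) is an assumption.

```python
import math


def rate_deviation(fitted, experimental, log_scale=True):
    """Sum of squared deviations between fitted and experimental PRE rates (s^-1).

    log_scale=True compares natural logarithms (relative errors);
    log_scale=False compares the rates themselves (absolute errors).
    The floor of 1e-3 s^-1 that avoids log(0) is an assumption.
    """
    floor = 1e-3
    if log_scale:
        return sum((math.log(max(f, floor)) - math.log(max(e, floor))) ** 2
                   for f, e in zip(fitted, experimental))
    return sum((f - e) ** 2 for f, e in zip(fitted, experimental))


experimental = [2.0, 100.0]
fit_a = [4.0, 100.0]  # small rate off by a factor of two
fit_b = [2.0, 104.0]  # large rate off by 4 %
for log_scale in (False, True):
    a = rate_deviation(fit_a, experimental, log_scale)
    b = rate_deviation(fit_b, experimental, log_scale)
    print("log" if log_scale else "linear", "prefers", "a" if a < b else "b")
# linear prefers a
# log prefers b
```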
rmean
For fitting mean distances instead of distributions. Provided for method development.
rmean
- Remarks
the key requests that mean distances instead of distance distribution restraints are fitted
do this only if you have a very good reason
sans
Specifies small-angle neutron scattering restraints
sans data [resolution [deuteration]]
- Arguments
data
- name of the input scattering data file, must be a file acceptable by ‘cryson’ in the ATSAS package
resolution
- name of a resolution file, must be a file acceptable by ‘cryson’ in the ATSAS package
deuteration
- fraction of buffer deuteration, between 0 and 1, e.g. 0.66, optional
- Remarks
SANS fitting works without resolution file, but it is strongly recommended to provide one
if deuteration is not specified, natural proton abundance buffer is assumed
SANS curves are computed by the ATSAS package installed on this computer and present on the Matlab path
save
Specifies a file name for saving the fitted ensemble
save file
- Arguments
file
- output file name, extension should be ‘.ens’
- Remarks
if the save key is missing, the ensemble list is saved to ‘ensemble.ens’
saxs
Specifies small-angle x-ray scattering restraints
saxs data ['crysol3']
- Arguments
data
- name of the input scattering data file, must be a file acceptable by ‘crysol’ in the ATSAS package
'crysol3'
- if crysol3 is specified, SAXS data are computed with this newer version
- Remarks
crysol3 uses a different algorithm for the hydration shell
fitting once with original crysol and once with crysol3 can provide an idea about uncertainty due to hydration shell modelling
SAXS curves are computed by the ATSAS package installed on this computer and present on the Matlab path
zenodo
Download and possibly extract a file from Zenodo
Zenodo Zenodo_ID.filename
- Arguments
Zenodo_ID.filename
- Zenodo identifier, followed by a dot and the file name, e.g. ‘6384003.raw_superensemble_with_jackknife_ensembles.zip’
- Remarks
any file on Zenodo can be downloaded, for instance, also ‘.ens’ files
archives in ‘.zip’, ‘.gz’, ‘.tar’, and ‘.tar.gz’ formats are automatically extracted after download
this can be used together with the addpdb or initial keywords for working with raw ensembles or initial ensembles stored on Zenodo
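The argument can be split at its first dot, since the Zenodo identifier is numeric while the file name may contain further dots. The following Python sketch uses the hypothetical name `parse_zenodo_argument` and also reports which of the listed archive types applies, checking ‘.tar.gz’ before ‘.gz’. It does not download anything, and treating extensions case-insensitively is an assumption.

```python
ARCHIVE_SUFFIXES = (".tar.gz", ".zip", ".gz", ".tar")  # '.tar.gz' must be tested before '.gz'


def parse_zenodo_argument(argument):
    """Split 'Zenodo_ID.filename' at the first dot and report the archive type, if any."""
    record_id, sep, file_name = argument.partition(".")
    if not sep or not record_id.isdigit() or not file_name:
        raise ValueError(f"expected Zenodo_ID.filename, got {argument!r}")
    archive = next((s for s in ARCHIVE_SUFFIXES if file_name.lower().endswith(s)), None)
    return record_id, file_name, archive


print(parse_zenodo_argument("6384003.raw_superensemble_with_jackknife_ensembles.zip"))
# ('6384003', 'raw_superensemble_with_jackknife_ensembles.zip', '.zip')
```

A file such as an ‘.ens’ list yields no archive type and would be used as downloaded.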