International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by E. Arnold, D. M. Himmel and M. G. Rossmann

International Tables for Crystallography (2012). Vol. F, ch. 15.3, pp. 407–408

Section 15.3.3. Preparation of input data

K. D. Cowtan (a,*), K. Y. J. Zhang (b) and P. Main (c)

(a) Department of Chemistry, University of York, York YO1 5DD, England; (b) Division of Basic Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N., Seattle, WA 98109, USA; and (c) Department of Physics, University of York, York YO1 5DD, England
Correspondence e-mail:

15.3.3. Preparation of input data


Input data are provided by two routes: numerical parameters, such as the solvent content and averaging operators, are given in the command file using the appropriate keywords, whereas reflections and masks are referenced by giving their file names on the command line. In the simplest case, for example a solvent-flattening and histogram-matching calculation, all that is required is an initial reflection file and an estimate of the solvent content.

Use all available data: The reflection file must be in CCP4 MTZ format and must contain at least the structure-factor amplitudes, phase estimates and figures of merit. If the phase estimates are obtained from a homologous structure by molecular replacement, the figures of merit can be generated by the program SIGMAA (Read, 1986). When the phases are estimated from a single isomorphous derivative (SIR), it is recommended that Hendrickson–Lattman coefficients (Hendrickson & Lattman, 1970) be used to represent the phase estimates instead of figures of merit. Hendrickson–Lattman coefficients can represent the bimodal distribution of SIR phases, whereas a figure of merit can only represent a unimodal distribution about the average of the two equally probable phase choices. A reflection file containing every possible reflection should be used. The low-resolution data should be included, since they carry a significant amount of information about the protein–solvent boundary. High-resolution data without phase estimates should also be included, since their phases can be estimated by DM; phase extension usually improves the original phases more than phase refinement alone. Unobserved reflections are marked with a missing-number flag. This is important for preserving the free-R reflections, and it also allows DM to extrapolate missing reflections from the density constraints, which increases the power of the phase improvement.
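The difference between the two phase representations can be checked numerically. The sketch below assumes the usual Hendrickson–Lattman form, P(φ) ∝ exp(A cos φ + B sin φ + C cos 2φ + D sin 2φ), and uses hypothetical coefficients chosen to give two equally probable phase choices. The figure of merit is computed as the length of the probability-weighted mean of exp(iφ), and the centroid phase as its direction; the function names are illustrative only.

```python
import math

def hl_probability(phi, a, b, c, d):
    """Unnormalized Hendrickson-Lattman phase probability P(phi)."""
    return math.exp(a * math.cos(phi) + b * math.sin(phi)
                    + c * math.cos(2 * phi) + d * math.sin(2 * phi))

def centroid_phase_and_fom(a, b, c, d, n=360):
    """Integrate P(phi) on n equally spaced phases.

    Returns the centroid ("best") phase in radians and the figure of merit,
    i.e. the length of the probability-weighted mean of exp(i*phi).
    """
    total = sum_cos = sum_sin = 0.0
    for k in range(n):
        phi = 2 * math.pi * k / n
        p = hl_probability(phi, a, b, c, d)
        total += p
        sum_cos += p * math.cos(phi)
        sum_sin += p * math.sin(phi)
    best_phase = math.atan2(sum_sin, sum_cos)
    fom = math.hypot(sum_cos, sum_sin) / total
    return best_phase, fom

def local_maxima(a, b, c, d, n=360):
    """Grid phases (degrees) where P(phi) is a local maximum (periodic grid)."""
    values = [hl_probability(2 * math.pi * k / n, a, b, c, d) for k in range(n)]
    return [360.0 * k / n for k in range(n)
            if values[k] > values[k - 1] and values[k] > values[(k + 1) % n]]

# Hypothetical SIR-like coefficients: two equally probable phase choices.
A, B, C, D = 1.0, 0.0, -2.0, 0.0
peaks = local_maxima(A, B, C, D)
best, m = centroid_phase_and_fom(A, B, C, D)
print("peaks (deg):", peaks)
print("centroid phase (deg): %.1f, FOM: %.3f" % (math.degrees(best), m))
```

With these coefficients P(φ) peaks near ±83°, while the centroid phase lies at 0°, where P(φ) is low; a single phase with a figure of merit cannot express the two separate choices that the four coefficients retain.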

The estimation of solvent content: The solvent content, $C_{\rm solv}$, can be obtained by various experimental methods, such as the solvent dehydration method and the deuterium exchange method (Matthews, 1974). It can also be estimated from

$$C_{\rm solv} = 1 - \frac{N V_a M L}{V}.$$

Here, $N$ is the total number of atoms, including hydrogen atoms, in one protein molecule; $V_a$ is the average volume occupied by each atom, approximately 10 Å³ (Matthews, 1968); $M$ is the number of molecules per asymmetric unit; $L$ is the number of asymmetric units in the cell; and $V$ is the unit-cell volume. The solvent content should be estimated carefully and entered with the SOLC keyword, since it is used not only to find the solvent–protein boundary but also to scale the input structure-factor amplitudes. If a more conservative solvent mask is desired, to prevent clipping of protein density, especially in flexible loop regions, different solvent and protein fractions should be specified using the SOLMASK keyword.
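The estimate is simple arithmetic, shown here as a minimal Python sketch. The function name and the example numbers (a 2500-atom molecule, one molecule per asymmetric unit, four asymmetric units in a 200 000 Å³ cell) are hypothetical illustrations, not values taken from the text.

```python
def solvent_content(n_atoms, n_mol_per_asu, n_asu_per_cell, cell_volume,
                    atom_volume=10.0):
    """Estimate C_solv = 1 - N * V_a * M * L / V.

    n_atoms:        N, atoms (including hydrogens) in one protein molecule
    n_mol_per_asu:  M, molecules per asymmetric unit
    n_asu_per_cell: L, asymmetric units in the unit cell
    cell_volume:    V, unit-cell volume in cubic angstroms
    atom_volume:    V_a, average volume per atom (about 10 cubic angstroms)
    """
    protein_volume = n_atoms * atom_volume * n_mol_per_asu * n_asu_per_cell
    if protein_volume >= cell_volume:
        raise ValueError("protein volume exceeds cell volume; check N, M, L and V")
    return 1.0 - protein_volume / cell_volume

# Hypothetical example: 2500 * 10 * 1 * 4 = 100 000 cubic angstroms of protein.
print(solvent_content(2500, 1, 4, 200000.0))  # 0.5
```

The guard catches inconsistent inputs (for example a wrong choice of M) before a meaningless negative solvent fraction reaches the SOLC keyword.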

Solvent mask: A solvent mask may be supplied; it may be used for the entire calculation or updated after several cycles. The solvent mask usually divides the cell into protein and solvent regions; however, it is also possible to mark regions of unknown character as excluded. If no solvent mask is supplied, one is calculated by a modified Wang–Leslie procedure (Wang, 1985; Leslie, 1987) and updated as the phase-improvement calculation progresses.
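The idea behind an automatically calculated envelope can be illustrated with a simplified real-space sketch in the spirit of Wang's procedure: the density is smoothed over a sphere and the lowest-valued points are assigned to solvent so that their fraction matches the solvent content. This is not DM's modified Wang–Leslie implementation (Leslie's variant performs the smoothing in reciprocal space); truncating negative density to zero before smoothing, the sphere radius and the function name are assumptions for illustration.

```python
import math

def wang_style_mask(rho, shape, solvent_fraction, radius):
    """Return a flat list of booleans (True = solvent) for a periodic grid.

    rho:              flat list of densities, index = (i * ny + j) * nz + k
    shape:            (nx, ny, nz) grid dimensions
    solvent_fraction: C_solv, fraction of grid points assigned to solvent
    radius:           smoothing-sphere radius in grid units (assumed value)
    """
    nx, ny, nz = shape
    r = int(math.floor(radius))
    # Offsets of all grid points inside the smoothing sphere.
    offsets = [(di, dj, dk)
               for di in range(-r, r + 1)
               for dj in range(-r, r + 1)
               for dk in range(-r, r + 1)
               if di * di + dj * dj + dk * dk <= radius * radius]
    # Assumption: negative density is set to zero before smoothing.
    pos = [max(v, 0.0) for v in rho]
    smooth = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                s = 0.0
                for di, dj, dk in offsets:
                    s += pos[(((i + di) % nx) * ny + (j + dj) % ny) * nz
                             + (k + dk) % nz]
                smooth.append(s / len(offsets))
    # The lowest smoothed values become solvent, in the requested proportion.
    n_solvent = int(round(solvent_fraction * len(smooth)))
    order = sorted(range(len(smooth)), key=lambda idx: smooth[idx])
    mask = [False] * len(smooth)
    for idx in order[:n_solvent]:
        mask[idx] = True
    return mask

# Toy 8 x 8 x 8 grid (hypothetical): "protein" density 1.0 in the slab i < 4,
# slightly negative "solvent" density elsewhere.
nx = ny = nz = 8
rho = [1.0 if i < 4 else -0.2
       for i in range(nx) for j in range(ny) for k in range(nz)]
mask = wang_style_mask(rho, (nx, ny, nz), solvent_fraction=0.5, radius=1.5)
print(sum(mask), "of", len(mask), "grid points flagged as solvent")  # 256 of 512
```

Fixing the number of solvent points from the solvent fraction, rather than choosing a density threshold by hand, is why an accurate SOLC value matters for the envelope.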

Averaging operators: In an averaging calculation, the averaging operators must be supplied; these are typically obtained by rotation and translation searches using a program such as AMoRe (Navaza, 1994) or X-PLOR (Brünger, 1992). If the coordinates of several heavy atoms are known, they can be used to calculate the noncrystallographic symmetry (NCS) operators. If a partial model can be built into the density, structure-superposition programs such as LSQKAB (Kabsch, 1976) can be used to obtain the rotation and translation matrices that relate different molecules in the asymmetric unit. This can also be done in the program O with the 'lsq_explicit' command (Jones et al., 1991). The averaging operators can be further refined in DM by minimizing the residual between NCS-related densities.
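Deriving an NCS operator from matched heavy-atom sites (or matched model atoms) is a least-squares superposition. LSQKAB uses Kabsch's method; to stay dependency-free, this sketch swaps in Horn's unit-quaternion formulation, which yields the same least-squares rotation, together with a small Jacobi eigen-solver. It assumes the site correspondence between molecules is already known and that at least three non-collinear pairs are given; all function names and coordinates are hypothetical.

```python
import math

def jacobi_eigen(a, sweeps=50):
    """Eigen-decompose a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v); column i of v is the eigenvector of eigenvalue i.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def superpose(source, target):
    """Least-squares R, t such that R @ source_i + t approximates target_i."""
    n = len(source)
    pc = [sum(p[a] for p in source) / n for a in range(3)]
    qc = [sum(q[a] for q in target) / n for a in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]  # cov[a][b] = sum p'_a * q'_b
    for p, q in zip(source, target):
        for a in range(3):
            for b in range(3):
                cov[a][b] += (p[a] - pc[a]) * (q[b] - qc[b])
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = cov
    nmat = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
            [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
            [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
            [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    vals, vecs = jacobi_eigen(nmat)
    best = max(range(4), key=lambda i: vals[i])  # largest eigenvalue
    w, x, y, z = (vecs[k][best] for k in range(4))  # optimal unit quaternion
    r = [[w*w + x*x - y*y - z*z, 2*(x*y - w*z), 2*(x*z + w*y)],
         [2*(x*y + w*z), w*w - x*x + y*y - z*z, 2*(y*z - w*x)],
         [2*(x*z - w*y), 2*(y*z + w*x), w*w - x*x - y*y + z*z]]
    t = [qc[a] - sum(r[a][b] * pc[b] for b in range(3)) for a in range(3)]
    return r, t

def apply_op(r, t, p):
    """Apply the operator x' = R x + t to one point."""
    return [sum(r[a][b] * p[b] for b in range(3)) + t[a] for a in range(3)]

# Hypothetical heavy-atom sites in molecule A, and the same sites in molecule B
# generated with a 90-degree rotation about z plus a translation of (10, 0, 5).
sites_a = [(1.0, 2.0, 3.0), (4.0, -1.0, 0.5), (-2.0, 3.0, 1.0), (0.0, 0.0, -2.0)]
sites_b = [(8.0, 1.0, 8.0), (11.0, 4.0, 5.5), (7.0, -2.0, 6.0), (10.0, 0.0, 3.0)]
rot, trans = superpose(sites_a, sites_b)
rmsd = math.sqrt(sum(sum((u - w) ** 2 for u, w in zip(apply_op(rot, trans, p), q))
                     for p, q in zip(sites_a, sites_b)) / len(sites_a))
print("R =", [[round(v, 6) for v in row] for row in rot])
print("t =", [round(v, 6) for v in trans], " rmsd = %.2e" % rmsd)
```

Because a unit quaternion always encodes a proper rotation, this formulation cannot return a reflection, which keeps the result usable as an NCS operator.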

Averaging mask: An averaging mask may be supplied; it is distinct from the solvent mask, which allows parts of the protein to remain unaveraged if required. If no averaging mask is supplied, one is calculated by a local-correlation approach (Cowtan & Main, 1998; Vellieux et al., 1995). If multiple domains are to be averaged with different averaging operators (Schuller, 1996), one mask must be specified for each averaging domain. When averaging molecules related by improper NCS operations, the averaging mask must be consistent with the NCS operators provided: for example, if the supplied NCS matrix maps molecule A onto molecule B, the averaging mask must cover the volume occupied by molecule A, not molecule B.
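Why the mask must sit on molecule A can be seen in a toy model. The sketch below assumes a periodic one-dimensional grid and a hypothetical NCS operator that is a pure integer shift (real averaging applies rotations and interpolates density at non-grid points). Each masked point is averaged with its image under the operator, and the mean is written back to both copies.

```python
def average_two_copies(rho, mask_a, shift):
    """Two-fold NCS averaging on a periodic 1-D grid (toy model).

    rho:    list of density values
    mask_a: grid indices covered by the averaging mask
    shift:  hypothetical NCS operator x -> (x + shift) mod n, mapping A onto B
    """
    n = len(rho)
    out = list(rho)
    for x in mask_a:
        y = (x + shift) % n            # NCS-related point
        mean = 0.5 * (rho[x] + rho[y])
        out[x] = mean                  # averaged density written to the masked copy
        out[y] = mean                  # ... and to its image
    return out

rho = [0.0] * 20
rho[2:6] = [1.0, 2.0, 3.0, 2.0]      # molecule A
rho[10:14] = [1.2, 1.8, 3.2, 2.0]    # molecule B, a noisy copy of A
shift = 8                            # hypothetical operator mapping A onto B

right = average_two_copies(rho, range(2, 6), shift)    # mask covers A
wrong = average_two_copies(rho, range(10, 14), shift)  # mask covers B
print("mask on A:", [round(v, 2) for v in right])
print("mask on B:", [round(v, 2) for v in wrong])
```

With the mask on A, the two copies are averaged with each other; with the mask on B, the operator carries B into the solvent, so protein density is halved and smeared into the solvent region.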

Multi-crystal averaging: In a multi-crystal averaging calculation, one reflection file is provided for each crystal form (although initial phases are not required in every crystal form), and one reflection file containing the improved phases is output for each crystal form. One mask is required per averaging domain rather than per crystal form; thus, in general, only a single mask is required. This mask may be defined in any of the crystal forms or in an arbitrary crystal space of its own. Averaging operators are then provided to map the mask into each of the crystal forms.
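How density in one crystal form is related to another follows from composing these operators. Assuming, for illustration, that the operator supplied for crystal form $k$ is written in Cartesian coordinates as a proper rotation $\mathbf{R}_k$ plus a translation $\mathbf{t}_k$ mapping a point $\mathbf{x}$ of the mask frame into that crystal form (this notation is not taken from the text), eliminating $\mathbf{x}$ between forms $i$ and $j$ gives:

```latex
\mathbf{x}_k = \mathbf{R}_k\,\mathbf{x} + \mathbf{t}_k
\quad\Longrightarrow\quad
\mathbf{x} = \mathbf{R}_i^{-1}\left(\mathbf{x}_i - \mathbf{t}_i\right),
\qquad
\mathbf{x}_j = \mathbf{R}_j\mathbf{R}_i^{-1}\,\mathbf{x}_i
  + \left(\mathbf{t}_j - \mathbf{R}_j\mathbf{R}_i^{-1}\mathbf{t}_i\right)
```

Since each $\mathbf{R}_k$ is a rotation, $\mathbf{R}_i^{-1} = \mathbf{R}_i^{\mathsf T}$; only one operator per crystal form (per domain) is needed, because every pairwise relationship is implied by these compositions.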

Solvent and averaging masks that are calculated within the program may be output for subsequent analysis, as may the refined averaging operators. The input and output data for a simple DM calculation, a DM averaging calculation and a DMMULTI multi-crystal averaging calculation are shown in parts (a), (b) and (c), respectively, of the figure below.


Figure

(a) Input and output data for a DM calculation with no averaging. Light outlines indicate optional information. (b) Input and output data for a DM averaging calculation: for a single averaging domain, the averaging mask may be calculated automatically. For multi-domain averaging, all domain masks must be given. (c) Input and output data for DMMULTI. An averaging mask (or masks, for multiple domains) must be provided.


Brünger, A. T. (1992). X-PLOR. Version 3.1. A system for X-ray crystallography and NMR. Yale University Press, New Haven.
Cowtan, K. D. & Main, P. (1998). Miscellaneous algorithms for density modification. Acta Cryst. D54, 487–493.
Hendrickson, W. A. & Lattman, E. E. (1970). Representation of phase probability distributions for simplified combination of independent phase information. Acta Cryst. B26, 136–143.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst. A47, 110–119.
Kabsch, W. (1976). A solution for the best rotation to relate two sets of vectors. Acta Cryst. A32, 922–923.
Leslie, A. G. W. (1987). A reciprocal-space method for calculating a molecular envelope using the algorithm of B. C. Wang. Acta Cryst. A43, 134–136.
Matthews, B. W. (1968). Solvent content of protein crystals. J. Mol. Biol. 33, 491–497.
Matthews, B. W. (1974). Determination of molecular weight from protein crystals. J. Mol. Biol. 82, 513–526.
Navaza, J. (1994). AMoRe: an automated package for molecular replacement. Acta Cryst. A50, 157–163.
Read, R. J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Cryst. A42, 140–149.
Schuller, D. J. (1996). MAGICSQUASH: more versatile non-crystallographic averaging with multiple constraints. Acta Cryst. D52, 425–434.
Vellieux, F. M. D. A. P., Hunt, J. F., Roy, S. & Read, R. J. (1995). DEMON/ANGEL: a suite of programs to carry out density modification. J. Appl. Cryst. 28, 347–351.
Wang, B. C. (1985). Resolution of phase ambiguity in macromolecular crystallography. Methods Enzymol. 115, 90–112.
