International Tables for Crystallography Volume F. Crystallography of biological macromolecules. Edited by E. Arnold, D. M. Himmel and M. G. Rossmann. © International Union of Crystallography 2012
International Tables for Crystallography (2012). Vol. F, ch. 11.4, pp. 291-292
Section 11.4.10. Merging – assessment of the error model and signal magnitudes in the data
^{a}UT Southwestern Medical Center at Dallas, 5323 Harry Hines Boulevard, Dallas, TX 75390–9038, USA, and ^{b}Department of Molecular Physiology and Biological Physics, University of Virginia, 1300 Jefferson Park Avenue, Charlottesville, VA 22908, USA
Proper error estimation requires the use of Bayesian reasoning and a multi-component error model (Schwarzenbach et al., 1989; Evans, 1993). In principle, the error estimates may be derived solely from a theoretical understanding of the measurement process. However, the complexity of error propagation and correlations between various sources of effects have led crystallographers to rely on hybrid approaches also involving self-consistency analysis of symmetry-equivalent reflections.
The random errors in DENZO are estimated by a heuristic procedure that also accounts for small components of systematic errors (Borek et al., 2003). Initially, DENZO estimated errors of integrated diffraction peaks from X-ray film. After detectors with larger dynamic range were introduced, the procedure was adjusted accordingly.
The initial estimates of errors are obtained by

[equation (11.4.10.1)]

where n_{b} is the number of pixels used in background estimation and e_{d} is the error-density parameter, which is defined for each instrument and can also be overridden by the user (Gewirth, 2003); the other variables are defined in equation (11.4.7.1). The sums are calculated over all the pixels in a reflection profile. The expression within the braces { } describes two components of uncertainty: the left sum accounts for contributions resulting from pixels in the peak area, whereas the right sum adds an adjustment resulting from uncertainty of the background estimate. The denominator in front of the expression in braces is derived from error propagation for the profile-fitted intensity [equation (11.4.7.3)].
Next, the goodness-of-profile-fitting factor g is calculated:

[equation (11.4.10.2)]

where n_{i} is the number of pixels in a reflection profile. For weak reflections the parameter g should be relatively close to 1. If it is systematically off by a large factor, the error-density parameter e_{d} should be adjusted (Borek et al., 2003).

SCALEPACK applies an additional level of adjustment to the estimates produced by DENZO (Borek et al., 2003):

[equation (11.4.10.3)]

which is scaled either by the user or by an automatically adjustable factor E_{S} (called the error scale factor) to make disagreements among symmetry-related measurements consistent:

[equation (11.4.10.4)]

Even this scaled estimate of random error σ_{I} does not account for all types of errors, and additional adjustments for systematic effects are needed.
The multiplicative scale factor has its own uncertainty, independent of random errors, with typical values in the range of a few per cent. However, even such small errors are important in calculations of the phase signal. Errors in the scale factors have a correlated component that affects equally the intensity measurements entering a phasing difference, so it does not affect the differences themselves. The important part is estimating the magnitude of the remaining component of scaling errors, described by σ_{K}. Comparing symmetry-related reflections estimates only this relevant component of multiplicative errors. The total scaling error would have to be estimated differently, but typically it has little relevance to macromolecular crystallography and can be ignored.
The σ_{I} [equation (11.4.10.4)] can be combined with σ_{K} to obtain the final estimated error of the scaled measurement, σ_{E}:

[equation (11.4.10.5)]
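As an illustration only, a random error and a relative scaling error are often combined in quadrature. The Python sketch below assumes that form, σ_{E}^{2} = σ_{I}^{2} + (σ_{K} I)^{2}, with σ_{K} expressed as a fraction of the intensity. Both the functional form and the function name `combined_sigma` are assumptions made for this example and are not taken from the text of this section.

```python
import math


def combined_sigma(intensity: float, sigma_i: float, sigma_k: float) -> float:
    """Combine a random error with a relative scaling error.

    ASSUMPTION: the two error components are independent and add in
    quadrature; this is a sketch, not the SCALEPACK implementation.

    intensity : scaled intensity of the measurement
    sigma_i   : scaled random error of the measurement (same units as intensity)
    sigma_k   : relative scaling error, e.g. 0.04 for 4 per cent
    """
    return math.sqrt(sigma_i ** 2 + (sigma_k * intensity) ** 2)


if __name__ == "__main__":
    # Strong reflection: the multiplicative term contributes substantially.
    print(combined_sigma(1000.0, 30.0, 0.04))
    # Weak reflection: the random error dominates.
    print(combined_sigma(10.0, 5.0, 0.04))
```

Under this assumed form, a scaling error of a few per cent matters mostly for strong reflections, where σ_{K} I can be comparable to or larger than σ_{I}.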
Symmetry-related scaled measurements I(hkl) and their uncertainty estimates σ_{E} are used to obtain merged intensities by a standard weighted averaging formula:

[equation (11.4.10.6)]

This allows for calculations of validation statistics, called goodness-of-fit or normalized χ^{2}, for each unique index:

[equation (11.4.10.7)]

where n represents the number of observations of a given unique index. This χ^{2} statistic is then averaged in resolution shells, over intensity bins or by batch number. If the error model accounts properly for all effects, the χ^{2} statistic should fluctuate around a value of unity. If χ^{2} values depart from this expectation, it may indicate a number of possibilities, e.g. various problems at earlier stages (poorly edited beam-stop shadow, hardware failures, mistakes in processing or other sources of outliers etc.), inadequacy of the error model or variations in the structure factors within the symmetry-related observations. The instrumental problems or mistakes in processing should be corrected. The effects that cannot be corrected may be handled by adjusting the error model. However, if a more detailed analysis eliminates the obvious sources of such problems, then the most likely source of discrepancies between symmetry-related measurements is violation of Friedel symmetry.

SCALEPACK calculates merging statistics both for the Bijvoet pairs merged together and separately. Differences in χ^{2} values between these two merging outputs are very reliable estimates of anomalous signal strength. When a more detailed analysis eliminates the obvious reasons for high χ^{2} values, the most likely remaining source of error is non-isomorphism (Borek et al., 2007, 2010).
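The merging step and the χ^{2} check can be sketched in Python as follows. The sketch assumes inverse-variance weights 1/σ_{E}^{2} for the weighted average and a χ^{2} normalized by n − 1; the exact normalization used by SCALEPACK may differ. The function name `merge_observations` and the example numbers are hypothetical.

```python
from typing import List, Optional, Tuple


def merge_observations(
    intensities: List[float], sigmas: List[float]
) -> Tuple[float, float, Optional[float]]:
    """Merge symmetry-related observations of one unique index.

    ASSUMPTIONS: inverse-variance weighting and an (n - 1) normalization
    of chi-squared; a sketch, not the SCALEPACK implementation.

    Returns (merged intensity, sigma of merged intensity, normalized chi2).
    chi2 is None when fewer than two observations are available.
    """
    if not intensities or len(intensities) != len(sigmas):
        raise ValueError("need equal-length, non-empty lists")
    weights = [1.0 / (s * s) for s in sigmas]
    weight_sum = sum(weights)
    mean = sum(w * i for w, i in zip(weights, intensities)) / weight_sum
    sigma_mean = weight_sum ** -0.5
    n = len(intensities)
    if n < 2:
        return mean, sigma_mean, None
    chi2 = sum(w * (i - mean) ** 2 for w, i in zip(weights, intensities)) / (n - 1)
    return mean, sigma_mean, chi2


if __name__ == "__main__":
    # Made-up Bijvoet mates with a clear anomalous difference.
    i_plus, i_minus = [120.0, 118.0], [100.0, 102.0]
    sig = [2.0, 2.0]
    print("merged together:", merge_observations(i_plus + i_minus, sig + sig))
    print("I+ only:        ", merge_observations(i_plus, sig))
    print("I- only:        ", merge_observations(i_minus, sig))
```

In the made-up example, χ^{2} is far above unity when the Bijvoet mates are merged together and below unity when they are merged separately, which is the kind of contrast described above as an indicator of anomalous signal.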
References
Borek, D., Cymborowski, M., Machius, M., Minor, W. & Otwinowski, Z. (2010). Diffraction data analysis in the presence of radiation damage. Acta Cryst. D66, 426–436.
Borek, D., Ginell, S. L., Cymborowski, M., Minor, W. & Otwinowski, Z. (2007). The many faces of radiation-induced changes. J. Synchrotron Rad. 14, 24–33.
Borek, D., Minor, W. & Otwinowski, Z. (2003). Measurement errors and their consequences in protein crystallography. Acta Cryst. D59, 2031–2038.
Evans, P. R. (1993). Data reduction: data collection and processing. In Proceedings of the CCP4 Study Weekend. Data Collection and Processing, 29–30 January, edited by L. Sawyer, N. Isaac & S. Bailey, pp. 114–123. Warrington: Daresbury Laboratory.
Gewirth, D. (2003). HKL Manual. 6th ed. HKL Research, Charlottesville, USA.
Schwarzenbach, D., Abrahams, S. C., Flack, H. D., Gonschorek, W., Hahn, Th., Huml, K., Marsh, R. E., Prince, E., Robertson, B. E., Rollett, J. S. & Wilson, A. J. C. (1989). Statistical descriptors in crystallography. Report of the IUCr Subcommittee on Statistical Descriptors. Acta Cryst. A45, 63–75.