International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by E. Arnold, D. M. Himmel and M. G. Rossmann

International Tables for Crystallography (2012). Vol. F, ch. 11.4, pp. 291–292

Section 11.4.10. Merging – assessment of the error model and signal magnitudes in the data

Z. Otwinowski (a)*, W. Minor (b), D. Borek (a) and M. Cymborowski (b)

(a) UT Southwestern Medical Center at Dallas, 5323 Harry Hines Boulevard, Dallas, TX 75390–9038, USA, and (b) Department of Molecular Physiology and Biological Physics, University of Virginia, 1300 Jefferson Park Avenue, Charlottesville, VA 22908, USA
Correspondence e-mail:

Proper error estimation requires Bayesian reasoning and a multi-component error model (Schwarzenbach et al., 1989; Evans, 1993). In principle, the error estimates may be derived solely from a theoretical understanding of the measurement process. However, the complexity of error propagation and the correlations between the various effects have led crystallographers to rely on hybrid approaches that also involve self-consistency analysis of symmetry-equivalent reflections.

Estimation of random errors

The random errors in DENZO are estimated by a heuristic procedure that also accounts for small components of systematic errors (Borek et al., 2003). Initially, DENZO estimated the errors of integrated diffraction peaks recorded on X-ray film. After detectors with a larger dynamic range were introduced, the procedure was adjusted accordingly.

The initial estimates of errors are obtained by

\[
\sigma_0 = \frac{1}{\sum_i P_i^2/(B_i+P_iI)}
\left\{ e_d \left[ \sum_i \frac{P_i^2}{B_i+P_iI} + \frac{e_d}{n_b} \sum_i \frac{P_i^2 B_i}{(B_i+P_iI)^2} \right] \right\}^{1/2},
\]

where \(n_b\) is the number of pixels used in background estimation and \(e_d\) is the error-density parameter, which is defined for each instrument and can be overridden by the user (Gewirth, 2003); the other variables are defined in equation ([link]). The sums are calculated over all the pixels in a reflection profile. The expression within the braces describes two components of uncertainty: the left sum accounts for contributions from pixels in the peak area, whereas the right sum adds an adjustment for the uncertainty of the background estimate. The denominator in front of the expression in braces is derived from error propagation for the profile-fitted intensity [equation ([link])].
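
The initial estimate can be sketched in Python as follows. This is only an illustration under stated assumptions, not DENZO's implementation: the function name `sigma0` and the list-based inputs are hypothetical, `P` and `B` are taken to be the per-pixel profile and background values, and the peak term is written as the sum of P_i^2/(B_i + P_i I) so that both terms inside the braces carry the same units.

```python
import math

def sigma0(P, B, I, e_d, n_b):
    """Initial error estimate for one profile-fitted reflection (sketch).

    P, B : per-pixel profile values and background values (assumed inputs)
    I    : profile-fitted intensity
    e_d  : error-density parameter
    n_b  : number of pixels used in background estimation
    """
    if len(P) != len(B):
        raise ValueError("P and B must have the same length")
    # Expected counts per pixel, B_i + P_i * I
    V = [b + p * I for p, b in zip(P, B)]
    # Left sum: contribution of pixels in the peak area
    peak = sum(p * p / v for p, v in zip(P, V))
    # Right sum: adjustment for the uncertainty of the background estimate
    background = sum(p * p * b / (v * v) for p, b, v in zip(P, B, V))
    return math.sqrt(e_d * (peak + (e_d / n_b) * background)) / peak
```

With zero background, a normalized profile and e_d = 1, the expression reduces to the square root of the intensity, i.e. the counting-statistics limit; any background increases the estimate.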

Next, the goodness-of-profile-fitting factor \(g\) is calculated:

\[
g = \left[ \frac{1}{n_i-1} \sum_i \frac{(M_i-B_i-P_iI)^2}{e_d(B_i+P_iI)} \right]^{1/2},
\]

where \(n_i\) is the number of pixels in a reflection profile. For weak reflections the parameter \(g\) should be relatively close to 1. If it is systematically off by a large factor, the error-density parameter \(e_d\) should be adjusted (Borek et al., 2003). SCALEPACK applies an additional level of adjustment to the estimates produced by DENZO (Borek et al., 2003):

\[
\sigma_S = 1.2\,\sigma_0\,g^{1/2},
\]

which is scaled either by the user or by an automatically adjustable factor \(E_S\) (called the error scale factor) so that the error estimates are consistent with the observed disagreements among symmetry-related measurements:

\[
\sigma_I = E_S\sigma_S.
\]

Even this scaled estimate of random error \(\sigma_I\) does not account for all types of errors, and additional adjustments for systematic effects are needed.

Estimation of multiplicative errors
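
Before turning to the multiplicative component, the chain of random-error adjustments just described (the factor g, then the SCALEPACK adjustment, then the error scale factor) can be sketched in Python. The function names and the list-based inputs are hypothetical, and `M` is assumed to hold the measured pixel values; this is an illustration of the formulas, not the programs' actual code.

```python
import math

def goodness_of_profile_fit(M, B, P, I, e_d):
    """Factor g: r.m.s. normalized residual of the profile fit (sketch).

    M, B, P : per-pixel measured values, background and profile (assumed inputs)
    I       : profile-fitted intensity
    e_d     : error-density parameter
    """
    n_i = len(M)
    if n_i < 2 or len(B) != n_i or len(P) != n_i:
        raise ValueError("need at least two pixels and equal-length inputs")
    total = sum((m - b - p * I) ** 2 / (e_d * (b + p * I))
                for m, b, p in zip(M, B, P))
    return math.sqrt(total / (n_i - 1))

def scaled_random_error(sigma_0, g, error_scale=1.0):
    """sigma_I = E_S * sigma_S, with sigma_S = 1.2 * sigma_0 * sqrt(g)."""
    sigma_S = 1.2 * sigma_0 * math.sqrt(g)
    return error_scale * sigma_S
```

For example, with sigma_0 = 10, g = 1 and an error scale factor of 1, the sketch gives 12; a value of g far from 1 for weak reflections would instead suggest revisiting the error-density parameter rather than relying on this adjustment.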

The multiplicative scale factor has its own uncertainty, independent of the random errors, with typical values in the range of a few per cent. However, even such small errors are important in calculations of the phase signal. Errors in the scale factors have a correlated component that affects equally the intensity measurements entering phasing differences, so it does not affect the differences themselves. What matters is estimating the magnitude of the remaining component of the scaling errors, described by \(\sigma_K\). Comparing symmetry-related reflections estimates only this relevant component of the multiplicative errors. The total scaling error would have to be estimated differently, but it typically has little relevance to macromolecular crystallography and can be ignored.
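
The distinction between the correlated and the remaining component can be illustrated with a small first-order calculation. The function name and the numbers used below are invented for illustration and are not taken from the source; the sketch simply propagates a relative scale error into a difference of two intensities.

```python
import math

def difference_error(I_plus, I_minus, rel_scale_error, correlated):
    """First-order error in (I_plus - I_minus) due to a relative scale error.

    correlated=True  : one common scale factor multiplies both measurements,
                       so the difference is merely rescaled.
    correlated=False : each measurement has its own independent scale error,
                       so the two contributions add in quadrature.
    """
    if correlated:
        return abs(I_plus - I_minus) * rel_scale_error
    return rel_scale_error * math.hypot(I_plus, I_minus)
```

With hypothetical intensities of 1000 and 980 and a 3% scale error, the correlated case changes the difference of 20 by only 0.6, whereas independent errors of the same size give an uncertainty of about 42, larger than the difference itself. This is why only the uncorrelated component matters for phasing differences.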

The \(\sigma_I\) [equation ([link])] can be combined with \(\sigma_K\) to obtain the final estimated error of the scaled measurement:

\[
\sigma_E = (1/K)(\sigma_I^2 + I^2\sigma_K^2)^{1/2}.
\]

Merging and signal validation

Symmetry-related scaled measurements \(I(hkl)\) and their uncertainty estimates \(\sigma_E\) are used to obtain merged intensities by a standard weighted averaging formula:

\[
\langle I\rangle = \frac{\sum_j I_j/\sigma^2_{E,j}}{\sum_j 1/\sigma^2_{E,j}}.
\]

This allows the calculation of a validation statistic, called goodness-of-fit or normalized \(\chi^2\), for each unique index:

\[
\chi^2 = \frac{1}{n-1}\sum_j \frac{(I_j-\langle I\rangle)^2}{\sigma^2_{E,j}},
\]

where \(n\) is the number of observations of a given unique index. This \(\chi^2\) statistic is then averaged in resolution shells, over intensity bins or by batch number. If the error model accounts properly for all effects, the \(\chi^2\) statistic should fluctuate around a value of unity. If \(\chi^2\) values depart from this expectation, there are a number of possible causes, e.g. various problems at earlier stages (poorly edited beam-stop shadow, hardware failures, mistakes in processing or other sources of outliers), inadequacy of the error model, or variations in the structure factors within the symmetry-related observations. Instrumental problems and mistakes in processing should be corrected. Effects that cannot be corrected may be handled by adjusting the error model. However, if a more detailed analysis eliminates the obvious sources of such problems, then the most likely source of discrepancies between symmetry-related measurements is violation of Friedel symmetry. SCALEPACK calculates merging statistics for the Bijvoet pairs both merged together and kept separate. Differences in \(\chi^2\) values between these two merging outputs are very reliable estimates of anomalous signal strength. When a more detailed analysis eliminates the obvious reasons for high \(\chi^2\) values, the most likely remaining source of error is non-isomorphism (Borek et al., 2007, 2010).
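
The final error estimate, the weighted merge and the per-index validation statistic can be sketched together in Python. The function names are hypothetical and this is not SCALEPACK's code. Two assumptions are made: \(\sigma_K\) is treated as a relative (fractional) error and K as the scale factor dividing the unscaled intensity, as the form of the expression for \(\sigma_E\) suggests, and the statistic is normalized by 1/(n - 1) so that its expected value is close to unity.

```python
import math

def sigma_E(I, sigma_I, sigma_K, K=1.0):
    """Final error of a scaled measurement: random and multiplicative parts
    combined in quadrature, then divided by the scale factor K (assumed role)."""
    return math.sqrt(sigma_I ** 2 + (I * sigma_K) ** 2) / K

def merge(intensities, sigmas):
    """Weighted mean and normalized chi^2 for one unique index.

    intensities : scaled symmetry-related measurements I_j
    sigmas      : their error estimates sigma_E,j
    Returns (mean, chi2); chi2 is None when there is a single observation.
    """
    n = len(intensities)
    if n == 0 or n != len(sigmas):
        raise ValueError("need matching, non-empty inputs")
    weights = [1.0 / (s * s) for s in sigmas]
    mean = sum(w * i for w, i in zip(weights, intensities)) / sum(weights)
    if n < 2:
        return mean, None
    chi2 = sum(w * (i - mean) ** 2
               for w, i in zip(weights, intensities)) / (n - 1)
    return mean, chi2
```

For two observations of 100 and 110, each with an error of 5, the sketch returns a merged intensity of 105 and a \(\chi^2\) of 2. Running the same merge once with Bijvoet mates treated as equivalent and once with them kept separate, then comparing the averaged \(\chi^2\) values, mirrors the comparison described above.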


Borek, D., Cymborowski, M., Machius, M., Minor, W. & Otwinowski, Z. (2010). Diffraction data analysis in the presence of radiation damage. Acta Cryst. D66, 426–436.
Borek, D., Ginell, S. L., Cymborowski, M., Minor, W. & Otwinowski, Z. (2007). The many faces of radiation-induced changes. J. Synchrotron Rad. 14, 24–33.
Borek, D., Minor, W. & Otwinowski, Z. (2003). Measurement errors and their consequences in protein crystallography. Acta Cryst. D59, 2031–2038.
Evans, P. R. (1993). Data reduction: data collection and processing. In Proceedings of the CCP4 Study Weekend. Data Collection and Processing, 29–30 January, edited by L. Sawyer, N. Isaac & S. Bailey, pp. 114–123. Warrington: Daresbury Laboratory.
Gewirth, D. (2003). HKL Manual. 6th ed. HKL Research, Charlottesville, USA.
Schwarzenbach, D., Abrahams, S. C., Flack, H. D., Gonschorek, W., Hahn, Th., Huml, K., Marsh, R. E., Prince, E., Robertson, B. E., Rollett, J. S. & Wilson, A. J. C. (1989). Statistical descriptors in crystallography. Report of the IUCr Subcommittee on Statistical Descriptors. Acta Cryst. A45, 63–75.
