International Tables for Crystallography Volume B, Reciprocal space. Edited by U. Shmueli. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. B, ch. 2.1, pp. 197-199
Section 2.1.6. Distributions of sums, averages and ratios
^{a}School of Chemistry, Tel Aviv University, Tel Aviv 69 978, Israel, and ^{b}St John's College, Cambridge, England
In Section 2.1.2.1, it was shown that the average intensity of a sufficient number of reflections is $\Sigma$ [equation (2.1.2.4)]. When the number of reflections is not `sufficient', their mean value will show statistical fluctuations about $\Sigma$. These fluctuations are in addition to any systematic variation resulting from non-independence of atomic positions, as discussed in Sections 2.1.2.1–2.1.2.3. We thus need to consider the probability density functions of sums like
$$J_n = I_1 + I_2 + \ldots + I_n \tag{2.1.6.1}$$
and averages like
$$Y = J_n/n, \tag{2.1.6.2}$$
where $I_i$ is the intensity of the $i$th reflection. The probability density distributions are easily obtained from a property of gamma distributions: if $x_1, x_2, \ldots, x_n$ are independent gamma-distributed variables with parameters $p_1, p_2, \ldots, p_n$, their sum is a gamma-distributed variable with parameter $p$ equal to the sum of the parameters. The sum of $n$ intensities drawn from an acentric distribution thus has the distribution
$$p(J_n)\,{\rm d}J_n = \gamma_n(J_n/\Sigma)\,{\rm d}(J_n/\Sigma), \tag{2.1.6.3}$$
where $\gamma_p(x) = x^{p-1}\exp(-x)/\Gamma(p)$. The parameters of the variables added are all equal to unity, so that their sum is $p = n$. Similarly, the sum of $n$ intensities drawn from a centric distribution has the distribution
$$p(J_n)\,{\rm d}J_n = \gamma_{n/2}(J_n/2\Sigma)\,{\rm d}(J_n/2\Sigma); \tag{2.1.6.4}$$
here each parameter has the value of one-half, so that $p = n/2$. The corresponding distributions of the averages of $n$ intensities are then
$$p(Y)\,{\rm d}Y = \gamma_n(nY/\Sigma)\,{\rm d}(nY/\Sigma) \tag{2.1.6.5}$$
for the acentric case, and
$$p(Y)\,{\rm d}Y = \gamma_{n/2}(nY/2\Sigma)\,{\rm d}(nY/2\Sigma) \tag{2.1.6.6}$$
for the centric. In both cases the expected value of $Y$ is $\Sigma$, and the variances are $\Sigma^2/n$ and $2\Sigma^2/n$, respectively, just as would be expected.
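A quick Monte Carlo check of these results may help. The Python sketch below draws $n$ intensities at a time with $\Sigma = 1$, forms their average $Y$, and compares the sample mean and variance with $\Sigma$ and with $\Sigma^2/n$ (acentric) or $2\Sigma^2/n$ (centric). The function names are hypothetical and chosen only for illustration. The sketch assumes that a single acentric intensity is gamma-distributed with shape 1 and scale $\Sigma$, and a centric one with shape 1/2 and scale $2\Sigma$. These are the single-intensity cases ($n = 1$) of the gamma distributions above.

```python
import random
import statistics

def simulate_averages(n, sigma, centric, trials, rng):
    """Return `trials` samples of Y = (I_1 + ... + I_n) / n."""
    # Gamma (shape, scale) of one intensity: acentric (1, Sigma), centric (1/2, 2*Sigma).
    shape, scale = (0.5, 2.0 * sigma) if centric else (1.0, sigma)
    return [sum(rng.gammavariate(shape, scale) for _ in range(n)) / n
            for _ in range(trials)]

def expected_moments(n, sigma, centric):
    """Theoretical mean and variance of Y: Sigma, and Sigma^2/n or 2*Sigma^2/n."""
    return sigma, (2.0 if centric else 1.0) * sigma ** 2 / n

rng = random.Random(2006)
for centric in (False, True):
    ys = simulate_averages(n=5, sigma=1.0, centric=centric, trials=20000, rng=rng)
    print("centric" if centric else "acentric",
          round(statistics.fmean(ys), 3), round(statistics.pvariance(ys), 3),
          expected_moments(5, 1.0, centric))
```

With 20 000 trials the sample moments should lie close to the theoretical pair printed beside them. The exact digits depend on the seed.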
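Ratios like
$$S = J_n/K_m, \tag{2.1.6.7}$$
where $J_n$ is given by equation (2.1.6.1),
$$K_m = I'_1 + I'_2 + \ldots + I'_m, \tag{2.1.6.8}$$
and the $I'$'s are the intensities of a set of reflections (which may or may not overlap with those included in $J_n$), are used in correlating intensities measured under different conditions. They arise in correlating reflections on different layer lines from the same or different specimens, in correlating the same reflections from different crystals, in normalizing intensities to the local average or to $\Sigma$, and in certain systematic trial-and-error methods of structure determination (see Rabinovich & Shakked, 1984, and references therein). There are three main cases:
(i) the $I'$'s are the same reflections as the $I$'s ($m = n$), measured under different conditions or from different specimens;
(ii) the $I$'s and the $I'$'s have no reflections in common, so that $J_n$ and $K_m$ are independent;
(iii) one set of reflections is a subset of the other.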
Aside from the scale factor, in case (i) $J_n$ and $K_m$ will differ chiefly through relatively small statistical fluctuations and uncorrected systematic errors. In case (ii), by contrast, the differences will be relatively large because of the inherent differences in the intensities. Here we are concerned only with cases (ii) and (iii); the practical problems of case (i) are deferred to IT C (2004).
There is little in the crystallographic literature concerning the probability distribution of sums like (2.1.6.1) or ratios like (2.1.6.7); certain results are reviewed by Srinivasan & Parthasarathy (1976, ch. 5), but with a bias toward partially related structures that makes it difficult to apply them to the immediate problem.
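In case (ii) ($J_n$ and $K_m$ independent), acentric distribution, Table 2.1.5.1 gives the distribution of the ratio (2.1.6.7) for two sets on the same scale as
$$p(S)\,{\rm d}S = \beta_2(S; n, m)\,{\rm d}S. \tag{2.1.6.9}$$
The ratio of the two averages therefore has the distribution
$$p(Y/Z)\,{\rm d}(Y/Z) = \beta_2(nY/mZ; n, m)\,{\rm d}(nY/mZ), \tag{2.1.6.10}$$
where $\beta_2(x; p, q) = x^{p-1}/[B(p,q)(1+x)^{p+q}]$ is a beta distribution of the second kind, $Y$ is given by equation (2.1.6.2) and $Z$ by
$$Z = K_m/m. \tag{2.1.6.11}$$
Here $n$ is the number of intensities included in the numerator and $m$ is the number in the denominator. The expected value of $Y/Z$ is then
$$\langle Y/Z\rangle = \frac{m}{m-1}, \tag{2.1.6.12}$$
with variance
$$\sigma^2(Y/Z) = \frac{m^2(n+m-1)}{n(m-1)^2(m-2)}. \tag{2.1.6.13}$$
One sees that $Y/Z$ is a biased estimate of the scaling factor between two sets of intensities. The bias is of the order of $1/m$ and depends only on the number of intensities averaged in the denominator. This may seem odd at first sight. It becomes plausible when one remembers that the sample mean $Y$ is an unbiased estimator of $\Sigma$, whereas the reciprocal of the sample mean $Z$ is not an unbiased estimator of $1/\Sigma$. Since $Y$ and $Z$ are independent, $\langle Y/Z\rangle = \langle Y\rangle\langle 1/Z\rangle$, so the whole bias comes from the denominator. The mean exists only if $m > 1$, and the variance only for $m > 2$.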
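The moments of the beta distribution of the second kind make the bias easy to verify. For $\beta_2(x; p, q)$ the mean is $p/(q-1)$ (for $q > 1$) and the variance is $p(p+q-1)/[(q-1)^2(q-2)]$ (for $q > 2$). With $p = n$, $q = m$ and $Y/Z = (m/n)S$, these reproduce $\langle Y/Z\rangle = m/(m-1)$, as in (2.1.6.12), together with the variance quoted above. The Python sketch below computes these exact values and compares the mean with a simulation of independent acentric sets. The helper names are hypothetical, and $\Sigma = 1$ is assumed for both sets.

```python
import math
import random
import statistics

def beta2_moments(p, q):
    """Mean and variance of beta_2(x; p, q); math.inf where a moment does not exist."""
    mean = p / (q - 1) if q > 1 else math.inf
    var = p * (p + q - 1) / ((q - 1) ** 2 * (q - 2)) if q > 2 else math.inf
    return mean, var

def acentric_ratio_moments(n, m):
    """Exact mean and variance of Y/Z = (m/n) * J_n/K_m for independent acentric sets."""
    mean_s, var_s = beta2_moments(n, m)  # J_n/K_m follows beta_2(n, m)
    c = m / n
    return c * mean_s, c * c * var_s

def simulate_acentric_ratio(n, m, trials, rng):
    """Monte Carlo samples of Y/Z with exponential (acentric) intensities, Sigma = 1."""
    samples = []
    for _ in range(trials):
        y = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        z = statistics.fmean(rng.expovariate(1.0) for _ in range(m))
        samples.append(y / z)
    return samples

n, m = 5, 10
exact_mean, exact_var = acentric_ratio_moments(n, m)
sim = simulate_acentric_ratio(n, m, 20000, random.Random(1984))
print(exact_mean, m / (m - 1), statistics.fmean(sim))
print(exact_var, m ** 2 * (n + m - 1) / (n * (m - 1) ** 2 * (m - 2)))
```

The simulated mean should sit near $10/9$ rather than near 1. This illustrates that the bias does not average away as more trials are taken.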
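In the centric case, the expression for the distribution of the ratio of the two means $Y$ and $Z$ becomes
$$p(Y/Z)\,{\rm d}(Y/Z) = \beta_2(nY/mZ; n/2, m/2)\,{\rm d}(nY/mZ), \tag{2.1.6.14}$$
with the expected value of $Y/Z$ equal to
$$\langle Y/Z\rangle = \frac{m}{m-2} \tag{2.1.6.15}$$
and with its variance equal to
$$\sigma^2(Y/Z) = \frac{2m^2(n+m-2)}{n(m-2)^2(m-4)}. \tag{2.1.6.16}$$
For the same number of reflections, the bias in $Y/Z$ and its variance are considerably larger for the centric distribution than for the acentric. For both distributions the variance of the scaling factor approaches zero when $n$ and $m$ become large. The variances are large for small $m$. They are in fact `infinite' if the number of terms averaged in the denominator is sufficiently small: $m \le 2$ in the acentric case and $m \le 4$ in the centric. These biases are readily removed by multiplying $Y/Z$ by $(m-1)/m$ or $(m-2)/m$, respectively. Many methods of estimating scaling factors, perhaps most, also introduce bias (Wilson, 1975; Lomer & Wilson, 1975; Wilson, 1976, 1978c) that is not so easily removed. Wilson (1986a) has given reasons for supposing that the fractional bias of the ratio (2.1.6.7) approximates to the relative variance of the denominator, $\sigma^2(K_m)/\langle K_m\rangle^2$, whatever the intensity distribution. Equations (2.1.6.12) and (2.1.6.15) are consistent with this. The relative variance of $K_m$ is $1/m$ for acentric and $2/m$ for centric intensities, while the exact biases are $1/(m-1)$ and $2/(m-2)$.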
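The bias correction can be sketched in the same hedged way. The Python below draws independent centric sets with $n = 5$ and $m = 10$, for which (2.1.6.15) gives $\langle Y/Z\rangle = 10/8 = 1.25$. It then applies the factor $(m-2)/m$. The names are illustrative assumptions, as are $\Sigma = 1$ and the model of a centric intensity as a gamma variable with shape 1/2 and scale $2\Sigma$.

```python
import random
import statistics

def bias_factor(m, centric):
    """Exact <Y/Z> for independent sets on the same scale.

    m/(m - 1) for acentric and m/(m - 2) for centric intensities; the mean
    does not exist for m <= 1 (acentric) or m <= 2 (centric).
    """
    k = 2 if centric else 1
    if m <= k:
        raise ValueError("the mean of Y/Z does not exist for this m")
    return m / (m - k)

def unbiased_scale(numerator, denominator, centric):
    """Ratio of sample means multiplied by (m - 1)/m or (m - 2)/m."""
    ratio = statistics.fmean(numerator) / statistics.fmean(denominator)
    return ratio / bias_factor(len(denominator), centric)

def centric_intensity(rng, sigma=1.0):
    """One centric intensity, modelled as gamma with shape 1/2 and scale 2*Sigma."""
    return rng.gammavariate(0.5, 2.0 * sigma)

rng = random.Random(1986)
n, m, trials = 5, 10, 20000
raw, corrected = [], []
for _ in range(trials):
    num = [centric_intensity(rng) for _ in range(n)]
    den = [centric_intensity(rng) for _ in range(m)]
    raw.append(statistics.fmean(num) / statistics.fmean(den))
    corrected.append(unbiased_scale(num, den, centric=True))
print(bias_factor(m, centric=True), statistics.fmean(raw), statistics.fmean(corrected))
```

The raw mean should come out near 1.25 and the corrected one near 1. Because the correction uses only $m$, it costs nothing beyond counting the reflections in the denominator.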
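When the $I$'s are a subset of the $I'$'s, the beta distributions of the second kind are replaced by beta distributions of the first kind, with means and variances readily found from Table 2.1.5.1. The distribution of such a ratio is chiefly of interest when $Y$ relates to a single reflection and $Z$ relates to a group of $m$ intensities including $Y$. This corresponds to normalizing intensities to the local average. In the acentric case its distribution is
$$p(Y/Z)\,{\rm d}(Y/Z) = \beta_1(Y/mZ; 1, m-1)\,{\rm d}(Y/mZ), \tag{2.1.6.17}$$
where $\beta_1(x; p, q) = x^{p-1}(1-x)^{q-1}/B(p,q)$ is a beta distribution of the first kind. The expected value of $Y/Z$ is unity, so there is no bias. This is obvious a priori: the $m$ normalized intensities in the group sum to exactly $m$, and by symmetry each has the same expectation. The variance of $Y/Z$ is
$$\sigma^2(Y/Z) = \frac{m-1}{m+1}. \tag{2.1.6.18}$$
This is less than the variance of the intensities normalized to an `infinite' population (unity) by a fraction $2/(m+1)$, of the order of $2/m$. Unlike the variance of the scaling factor, the variance of the normalized intensity approaches unity as $m$ becomes large. For intensities having a centric distribution, the distribution normalized to the local average is given by
$$p(Y/Z)\,{\rm d}(Y/Z) = \beta_1\left(Y/mZ; \tfrac{1}{2}, \tfrac{m-1}{2}\right){\rm d}(Y/mZ). \tag{2.1.6.19}$$
The expected value of $Y/Z$ is again unity, and the variance is $2(m-1)/(m+2)$. This is less than the variance for an `infinite' population (two) by a fraction $3/(m+2)$, i.e. about $3/m$.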
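For the subset case, the beta distribution of the first kind $\beta_1(x; p, q)$ has mean $p/(p+q)$ and variance $pq/[(p+q)^2(p+q+1)]$. $Y/mZ$ follows $\beta_1$ with $(p, q) = (1, m-1)$ for acentric and $(1/2, (m-1)/2)$ for centric intensities, so the variance of $Y/Z$ is $m^2$ times that variance. The Python sketch below evaluates this and checks it by normalizing one reflection of each simulated group to the group mean. The names are hypothetical, and $\Sigma = 1$ is assumed.

```python
import random
import statistics

def beta1_variance(p, q):
    """Variance of a beta distribution of the first kind, beta_1(x; p, q)."""
    return p * q / ((p + q) ** 2 * (p + q + 1))

def local_average_variance(m, centric):
    """Exact variance of Y/Z when Y is one of the m intensities averaged in Z."""
    p, q = (0.5, (m - 1) / 2) if centric else (1.0, m - 1.0)
    return m * m * beta1_variance(p, q)  # Y/Z = m * (Y/mZ)

def simulate_local_normalization(m, centric, trials, rng):
    """Normalize the first intensity of each simulated group to the group mean (Sigma = 1)."""
    shape, scale = (0.5, 2.0) if centric else (1.0, 1.0)
    values = []
    for _ in range(trials):
        group = [rng.gammavariate(shape, scale) for _ in range(m)]
        values.append(group[0] / statistics.fmean(group))
    return values

rng = random.Random(1976)
for centric in (False, True):
    v = simulate_local_normalization(10, centric, 20000, rng)
    print("centric" if centric else "acentric", local_average_variance(10, centric),
          round(statistics.fmean(v), 3), round(statistics.pvariance(v), 3))
```

For $m = 10$ the exact variances are $9/11$ (acentric) and $1.5$ (centric). These fall short of the `infinite'-population values 1 and 2 by the fractions $2/(m+1)$ and $3/(m+2)$.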
Similar considerations apply to intensities normalized to $\Sigma$ in the usual way, since they are equal to those normalized to the local average $\langle I\rangle$ multiplied by $\langle I\rangle/\Sigma$.
Since $J_n$ and $K_m$ [equations (2.1.6.1) and (2.1.6.8)] are sums of identically distributed variables conforming to the conditions of the central-limit theorem, it is tempting to approximate their distributions by normal distributions with the correct mean and variance. This would be reasonably satisfactory for the distributions of $J_n$ and $K_m$ themselves, even for quite small values of $n$ and $m$. It would be unsatisfactory for the distribution of their ratio for any values of $n$ and $m$, even large. The ratio of two variables with normal distributions is notorious for its rather indeterminate mean and `infinite' variance. These result from the `tail' of the denominator distribution extending through zero to negative values. The leading terms of the ratio distribution are given by Kendall & Stuart (1977, p. 288).
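The point about the normal approximation can be made quantitative. Suppose $K_m$ is replaced by a normal variable with the correct mean $m\Sigma$ and variance $m\Sigma^2$ (acentric) or $2m\Sigma^2$ (centric). The probability that the denominator is zero or negative is then $\Phi(-\sqrt{m/k})$, with $k = 1$ or $2$. This probability is small but never zero. Moreover, because the normal density is positive at zero, the expectation of $1/K_m$ does not exist at all. The short Python function below evaluates that probability using `math.erfc`, which stays accurate in the far tail. It is an illustrative helper, not from the source.

```python
import math

def prob_nonpositive_denominator(m, centric=False):
    """P(K_m <= 0) when K_m is approximated by a normal with its exact mean and variance.

    Mean m*Sigma; variance m*Sigma^2 (acentric) or 2*m*Sigma^2 (centric).
    Sigma cancels, leaving Phi(-sqrt(m/k)) with k = 1 or 2.
    """
    k = 2.0 if centric else 1.0
    z = math.sqrt(m / k)                        # mean divided by standard deviation
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # lower-tail normal probability Phi(-z)

for m in (4, 25, 100):
    print(m, prob_nonpositive_denominator(m), prob_nonpositive_denominator(m, centric=True))
```

The gamma-distributed $K_m$, by contrast, is never negative. Its density vanishes at zero like $K_m^{m-1}$ (acentric) or $K_m^{m/2-1}$ (centric), so the exact moments of the ratio exist once $m$ is large enough.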
References
International Tables for Crystallography (2004). Vol. C. Mathematical, physical and chemical tables, edited by E. Prince. Dordrecht: Kluwer Academic Publishers.
Kendall, M. & Stuart, A. (1977). The advanced theory of statistics, Vol. 1, 4th ed. London: Griffin.
Lomer, T. R. & Wilson, A. J. C. (1975). Scaling of intensities. Acta Cryst. B31, 646–647.
Rabinovich, D. & Shakked, Z. (1984). A new approach to structure determination of large molecules by multi-dimensional search methods. Acta Cryst. A40, 195–200.
Srinivasan, R. & Parthasarathy, S. (1976). Some statistical applications in X-ray crystallography. Oxford: Pergamon Press.
Wilson, A. J. C. (1975). Effect of neglect of dispersion on apparent scale and temperature factors. In Anomalous scattering, edited by S. Ramaseshan & S. C. Abrahams, pp. 325–332. Copenhagen: Munksgaard.
Wilson, A. J. C. (1976). Statistical bias in least-squares refinement. Acta Cryst. A32, 994–996.
Wilson, A. J. C. (1978c). Statistical bias in scaling factors: Erratum. Acta Cryst. B34, 1749.
Wilson, A. J. C. (1986a). Distributions of sums and ratios of sums of intensities. Acta Cryst. A42, 334–339.