International
Tables for
Crystallography
Volume B
Reciprocal space
Edited by U. Shmueli

International Tables for Crystallography (2010). Vol. B, ch. 2.1, pp. 202-203

Section 2.1.6. Distributions of sums, averages and ratios

U. Shmueli^a* and A. J. C. Wilson^b

^a School of Chemistry, Tel Aviv University, Tel Aviv 69 978, Israel, and ^b St John's College, Cambridge, England
Correspondence e-mail: ushmueli@post.tau.ac.il

2.1.6.1. Distributions of sums and averages

In Section 2.1.2.1, it was shown that the average intensity of a sufficient number of reflections is [\Sigma] [equation (2.1.2.4)]. When the number of reflections is not 'sufficient', their mean value will show statistical fluctuations about [\Sigma]; such statistical fluctuations are in addition to any systematic variation resulting from non-independence of atomic positions, as discussed in Sections 2.1.2.1–2.1.2.3. We thus need to consider the probability density functions of sums like [J_{n} = \textstyle\sum\limits_{i = 1}^{n}G_{i}, \eqno(2.1.6.1)] and averages like [Y = J_{n}/n, \eqno(2.1.6.2)] where [G_{i}] is the intensity of the ith reflection. These probability density functions are easily obtained from a property of gamma distributions: if [x_{1}, x_{2}, \ldots, x_{n}] are independent gamma-distributed variables with parameters [p_{1}, p_{2}, \ldots, p_{n}], their sum is a gamma-distributed variable with parameter p equal to the sum of the parameters. The sum of n intensities drawn from an acentric distribution thus has the distribution [p(J_{n})\,{\rm d}J_{n} = \gamma_{n}(J_{n}/\Sigma)\,{\rm d}(J_{n}/\Sigma); \eqno(2.1.6.3)] the parameters of the variables added are all equal to unity, so that their sum is p = n. Similarly, the sum of n intensities drawn from a centric distribution has the distribution [p(J_{n})\,{\rm d}J_{n} = \gamma_{n/2}[J_{n}/(2\Sigma)]\,{\rm d}[J_{n}/(2\Sigma)]; \eqno(2.1.6.4)] each parameter has the value one-half, so that p = n/2. The corresponding distributions of the averages of n intensities are then [p(Y)\,{\rm d}Y = \gamma_{n}(nY/\Sigma)\,{\rm d}(nY/\Sigma) \eqno(2.1.6.5)] for the acentric case, and [p(Y)\,{\rm d}Y = \gamma_{n/2}[nY/(2\Sigma)]\,{\rm d}[nY/(2\Sigma)] \eqno(2.1.6.6)] for the centric. In both cases the expected value of Y is [\Sigma], and the variances are [\Sigma^{2}/n] for the acentric case and [2\Sigma^{2}/n] for the centric case, just as would be expected.
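The mean and variances quoted above can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes that acentric intensities can be sampled as gamma variates of shape 1 and centric intensities as gamma variates of shape 1/2, each scaled to have mean [\Sigma]. The function names and the `sigma` argument (standing for [\Sigma]) are illustrative only.

```python
import random

def simulate_average(n, sigma=1.0, centric=False, trials=20000, seed=0):
    """Monte Carlo estimate of the mean and variance of Y = J_n / n."""
    rng = random.Random(seed)
    # Assumed sampling model: gamma shape 1 (acentric) or 1/2 (centric),
    # with the scale chosen so that each intensity has mean sigma.
    shape, scale = (0.5, 2.0 * sigma) if centric else (1.0, sigma)
    ys = []
    for _ in range(trials):
        j_n = sum(rng.gammavariate(shape, scale) for _ in range(n))
        ys.append(j_n / n)
    mean = sum(ys) / trials
    var = sum((y - mean) ** 2 for y in ys) / trials
    return mean, var

def expected_variance(n, sigma=1.0, centric=False):
    """Variance of Y quoted in the text: Sigma^2/n or 2*Sigma^2/n."""
    return (2.0 if centric else 1.0) * sigma ** 2 / n

if __name__ == "__main__":
    for centric in (False, True):
        mean, var = simulate_average(10, sigma=2.0, centric=centric)
        print(centric, mean, var, expected_variance(10, 2.0, centric))
```

The simulated mean should lie close to [\Sigma] and the simulated variance close to the quoted value, with agreement improving as `trials` grows.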

2.1.6.2. Distribution of ratios

Ratios like [S_{n, \, m} = J_{n}/K_{m}, \eqno(2.1.6.7)] where [J_{n}] is given by equation (2.1.6.1), [K_{m} = \textstyle\sum\limits_{j = 1}^{m}H_{j}, \eqno(2.1.6.8)] and the [H_{j}]'s are the intensities of a set of reflections (which may or may not overlap with those included in [J_{n}]), are used in correlating intensities measured under different conditions. They arise in correlating reflections on different layer lines from the same or different specimens, in correlating the same reflections from different crystals, in normalizing intensities to the local average or to [\Sigma], and in certain systematic trial-and-error methods of structure determination (see Rabinovich & Shakked, 1984, and references therein). There are three main cases:

  • (i) [G_{i}] and [H_{i}] refer to the same reflection; for example, they might be the observed and calculated quantities for the [hkl] reflection measured under different conditions or for different crystals of the same substance; or

  • (ii) [G_{i}] and [H_{i}] are unrelated; for example, the observed and calculated values for the [hkl] reflection for a completely wrong trial structure, or values for entirely different reflections, as in reducing photographic measurements on different layer lines to the same scale; or

  • (iii) the [G_{i}]'s are a subset of the [H_{i}]'s, so that [G_{i} = H_{i}] for [i \,\le\, n] and [m \,\gt\, n].

Aside from the scale factor, in case (i) [G_{i}] and [H_{i}] will differ chiefly through relatively small statistical fluctuations and uncorrected systematic errors, whereas in case (ii) the differences will be relatively large because of the inherent differences in the intensities. Here we are concerned only with cases (ii) and (iii); the practical problems of case (i) are deferred to IT C (2004), Chapter 7.5.

There is little in the crystallographic literature concerning the probability distribution of sums like (2.1.6.1) or ratios like (2.1.6.7); certain results are reviewed by Srinivasan & Parthasarathy (1976, ch. 5), but with a bias toward partially related structures that makes it difficult to apply them to the immediate problem.

In case (ii) ([G_{i}] and [H_{i}] independent) with an acentric distribution, Table 2.1.5.1 gives the distribution of the ratio [u = nY/(mZ) \eqno(2.1.6.9)] as [p(u)\,{\rm d}u = \beta_{2}[nY/(mZ); n, m]\,{\rm d}[nY/(mZ)], \eqno(2.1.6.10)] where [\beta_{2}] is a beta distribution of the second kind, Y is given by equation (2.1.6.2), Z is given by [Z = K_{m}/m, \eqno(2.1.6.11)] n is the number of intensities included in the numerator and m is the number in the denominator. The expected value of [Y/Z] is then [\langle Y/Z \rangle = {{m}\over{m-1}} = 1+{{1}\over{m}}+\ldots \eqno(2.1.6.12)] with variance [\sigma^{2} = {{(n+m-1)m^{2}}\over{(m-1)^{2}(m-2)n}}. \eqno(2.1.6.13)] One sees that [Y/Z] is a biased estimate of the scaling factor between two sets of intensities, and that the bias, of the order of [m^{-1}], depends only on the number of intensities averaged in the denominator. This may seem odd at first sight, but it becomes plausible when one remembers that the mean of a quantity is an unbiased estimator of itself, whereas the reciprocal of a mean is not an unbiased estimator of the mean of a reciprocal. The mean exists only if [m \,\gt\, 1] and the variance only if [m \,\gt\, 2].
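The bias of [Y/Z] in the acentric case can be illustrated by simulation. This is a hedged sketch rather than a method taken from the text: both sets of intensities are assumed to be independent exponential variates with unit mean, so the true scaling factor is 1, and the function name `simulate_ratio_mean` is hypothetical.

```python
import random

def simulate_ratio_mean(n, m, trials=40000, seed=1):
    """Monte Carlo estimate of <Y/Z> for independent acentric intensities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumption: acentric intensities are exponential with unit mean,
        # so both sets share the same scale and the true ratio is 1.
        y = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z = sum(rng.expovariate(1.0) for _ in range(m)) / m
        total += y / z
    return total / trials

if __name__ == "__main__":
    n, m = 8, 8
    estimate = simulate_ratio_mean(n, m)
    print("simulated <Y/Z>:", estimate)
    print("m/(m-1):        ", m / (m - 1))
    print("bias-corrected: ", estimate * (m - 1) / m)
```

Changing `n` while holding `m` fixed should leave the simulated mean essentially unchanged, in line with the statement that the bias depends only on the number of intensities in the denominator.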

In the centric case, the expression for the distribution of the ratio of the two means Y and Z becomes [p(u)\,{\rm d}u = \beta_{2}[nY/(mZ); \ n/2, m/2]\,{\rm d}[nY/(mZ)] \eqno(2.1.6.14)] with the expected value of [Y/Z] equal to [\langle Y/Z \rangle = {{m}\over{m-2}} = 1+{{2}\over{m}}+\ldots \eqno(2.1.6.15)] and with its variance equal to [\sigma^{2} = {{2(n+m-2)m^{2}}\over{(m-2)^{2}(m-4)n}}. \eqno(2.1.6.16)] For the same number of reflections, the bias in [\langle Y/Z \rangle] and the variance for the centric distribution are considerably larger than for the acentric. For both distributions the variance of the scaling factor approaches zero when n and m become large. The variances are large for small m and are in fact 'infinite' if the number of terms averaged in the denominator is sufficiently small ([m \,\le\, 2] in the acentric case, [m \,\le\, 4] in the centric case). The biases are readily removed by multiplying [Y/Z] by [(m-1)/m] in the acentric case or by [(m-2)/m] in the centric case. Many methods of estimating scaling factors, perhaps most, also introduce bias (Wilson, 1975; Lomer & Wilson, 1975; Wilson, 1976, 1978c) that is not so easily removed. Wilson (1986a) has given reasons for supposing that the bias of the ratio (2.1.6.7) approximates to [1+{{\sigma^{2}(I)}\over{m\langle I \rangle^{2}}}, \eqno(2.1.6.17)] whatever the intensity distribution. Equations (2.1.6.12) and (2.1.6.15) are consistent with this.
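For reference, equations (2.1.6.12), (2.1.6.13), (2.1.6.15), (2.1.6.16) and (2.1.6.17) can be collected into a few small functions. The sketch below is illustrative only: the function names are hypothetical, and `rel_var` stands for [\sigma^{2}(I)/\langle I \rangle^{2}], taken here as 1 for the acentric and 2 for the centric distribution.

```python
def ratio_mean(m, centric=False):
    """Expected value of Y/Z, equations (2.1.6.12) and (2.1.6.15)."""
    k = 2 if centric else 1
    if m <= k:
        raise ValueError("the mean of Y/Z does not exist for this m")
    return m / (m - k)

def ratio_variance(n, m, centric=False):
    """Variance of Y/Z, equations (2.1.6.13) and (2.1.6.16)."""
    if centric:
        if m <= 4:
            raise ValueError("the variance does not exist for m <= 4")
        return 2 * (n + m - 2) * m ** 2 / ((m - 2) ** 2 * (m - 4) * n)
    if m <= 2:
        raise ValueError("the variance does not exist for m <= 2")
    return (n + m - 1) * m ** 2 / ((m - 1) ** 2 * (m - 2) * n)

def approximate_bias(m, rel_var):
    """Approximation (2.1.6.17); rel_var = sigma^2(I) / <I>^2."""
    return 1 + rel_var / m

if __name__ == "__main__":
    for m in (5, 10, 50):
        print(m,
              ratio_mean(m), approximate_bias(m, 1.0),
              ratio_mean(m, centric=True), approximate_bias(m, 2.0))
```

For each m the exact factor and the approximation differ only in terms of order [m^{-2}], which is the sense in which the equations are consistent.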

2.1.6.3. Intensities scaled to the local average

When the [G_{i}]'s are a subset of the [H_{i}]'s, the beta distributions of the second kind are replaced by beta distributions of the first kind, with means and variances readily found from Table 2.1.5.1. The distribution of such a ratio is chiefly of interest when Y relates to a single reflection and Z relates to a group of n intensities including Y. This corresponds to normalizing intensities to the local average. Its distribution is [p(I/\langle I \rangle)\,{\rm d}(I/\langle I \rangle) = \beta_{1}(I/n\langle I \rangle;\ 1,n-1)\,{\rm d}(I/n\langle I \rangle) \eqno(2.1.6.18)] in the acentric case, with an expected value of [I/\langle I \rangle] of unity; there is no bias, as is obvious a priori. The variance of [I/\langle I \rangle] is [\sigma^{2} = {{n-1}\over{n+1}}, \eqno(2.1.6.19)] which is less than the variance of the intensities normalized to an 'infinite' population by a fraction of the order of [2/n]. Unlike the variance of the scaling factor, the variance of the normalized intensity approaches unity as n becomes large. For intensities having a centric distribution, the distribution normalized to the local average is given by [{p(I/\langle I \rangle)\,{\rm d}(I/\langle I \rangle) = \beta_{1}[I/n\langle I \rangle;\ 1/2,(n-1)/2]\,{\rm d}(I/n\langle I \rangle),} \eqno(2.1.6.20)] with an expected value of [I/\langle I \rangle] of unity and with variance [\sigma^{2} = {{2(n-1)}\over{n+2}}, \eqno(2.1.6.21)] less than that for an 'infinite' population by a fraction of about [3/n].

Similar considerations apply to intensities normalized to [\Sigma] in the usual way, since they are equal to those normalized to [\langle I \rangle] multiplied by [\langle I \rangle/\Sigma].
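The variances (2.1.6.19) and (2.1.6.21) of an intensity normalized to its local average can be compared with a simulation. The sketch below is an illustration under stated assumptions, not the authors' procedure: intensities are sampled as gamma variates of shape 1 (acentric) or 1/2 (centric) with unit mean, and the function names are hypothetical.

```python
import random

def local_average_variance(n, centric=False):
    """Variance of I/<I>, equations (2.1.6.19) and (2.1.6.21)."""
    return 2.0 * (n - 1) / (n + 2) if centric else (n - 1) / (n + 1)

def simulate_local_normalization(n, centric=False, trials=40000, seed=2):
    """Monte Carlo mean and variance of one intensity divided by the
    average of a group of n intensities that includes it."""
    rng = random.Random(seed)
    # Assumed sampling model: unit-mean gamma variates, shape 1 or 1/2.
    shape, scale = (0.5, 2.0) if centric else (1.0, 1.0)
    ratios = []
    for _ in range(trials):
        group = [rng.gammavariate(shape, scale) for _ in range(n)]
        local_mean = sum(group) / n
        ratios.append(group[0] / local_mean)
    mean = sum(ratios) / trials
    var = sum((r - mean) ** 2 for r in ratios) / trials
    return mean, var

if __name__ == "__main__":
    for centric in (False, True):
        mean, var = simulate_local_normalization(5, centric)
        print(centric, mean, var, local_average_variance(5, centric))
```

As n increases, `local_average_variance` tends to 1 in the acentric case and to 2 in the centric case, the values for an 'infinite' population.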

2.1.6.4. The use of normal approximations

Since [J_{n}] and [K_{m}] [equations (2.1.6.1) and (2.1.6.8)] are sums of identically distributed variables conforming to the conditions of the central-limit theorem, it is tempting to approximate their distributions by normal distributions with the correct mean and variance. This would be reasonably satisfactory for the distributions of [J_{n}] and [K_{m}] themselves, even for quite small values of n and m, but unsatisfactory for the distribution of their ratio for any values of n and m, even large ones. The ratio of two variables with normal distributions is notorious for its rather indeterminate mean and 'infinite' variance, resulting from the 'tail' of the denominator distribution extending through zero to negative values. The leading terms of the ratio distribution are given by Kendall & Stuart (1977, p. 288).
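The role of the denominator 'tail' can be made concrete with a short calculation. As a sketch under the assumption that [K_{m}] is replaced by a normal variable with mean [m\Sigma] and variance [m\Sigma^{2}] (acentric) or [2m\Sigma^{2}] (centric), the probability that the approximated denominator is zero or negative is the standard normal distribution function evaluated at [-\sqrt{m}] or [-\sqrt{m/2}]. The function name is hypothetical.

```python
import math

def prob_nonpositive_denominator(m, centric=False):
    """P(K_m <= 0) when K_m is approximated by a normal distribution
    with the correct mean and variance (an assumption of this sketch)."""
    rel_var = 2.0 if centric else 1.0  # sigma^2(I) / <I>^2
    # Standardized position of zero: (0 - m*Sigma) / sqrt(rel_var*m*Sigma^2).
    z = -math.sqrt(m / rel_var)
    # Standard normal distribution function via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

if __name__ == "__main__":
    for m in (2, 4, 16, 100):
        print(m,
              prob_nonpositive_denominator(m),
              prob_nonpositive_denominator(m, centric=True))
```

The probability falls rapidly with m but never reaches zero, whereas the gamma distributions of Section 2.1.6.1 assign no probability to negative sums; any non-zero density at zero in the denominator is enough to make the variance of the ratio diverge.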

References

International Tables for Crystallography (2004). Vol. C, Mathematical, Physical and Chemical Tables, edited by E. Prince. Dordrecht: Kluwer Academic Publishers.
Kendall, M. & Stuart, A. (1977). The Advanced Theory of Statistics, Vol. 1, 4th ed. London: Griffin.
Lomer, T. R. & Wilson, A. J. C. (1975). Scaling of intensities. Acta Cryst. B31, 646–647.
Rabinovich, D. & Shakked, Z. (1984). A new approach to structure determination of large molecules by multi-dimensional search methods. Acta Cryst. A40, 195–200.
Srinivasan, R. & Parthasarathy, S. (1976). Some Statistical Applications in X-ray Crystallography. Oxford: Pergamon Press.
Wilson, A. J. C. (1975). Effect of neglect of dispersion on apparent scale and temperature factors. In Anomalous Scattering, edited by S. Ramaseshan & S. C. Abrahams, pp. 325–332. Copenhagen: Munksgaard.
Wilson, A. J. C. (1976). Statistical bias in least-squares refinement. Acta Cryst. A32, 994–996.
Wilson, A. J. C. (1978c). Statistical bias in scaling factors: Erratum. Acta Cryst. B34, 1749.
Wilson, A. J. C. (1986a). Distributions of sums and ratios of sums of intensities. Acta Cryst. A42, 334–339.