International Tables for Crystallography, Volume B: Reciprocal space. Edited by U. Shmueli. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. B, ch. 2.1, pp. 190–209
https://doi.org/10.1107/97809553602060000554

Chapter 2.1. Statistical properties of the weighted reciprocal lattice

^{a}School of Chemistry, Tel Aviv University, Tel Aviv 69 978, Israel, and ^{b}St John's College, Cambridge, England

This chapter has two purposes: (i) it gives an introduction to the principles of probability, which plays an important role in most methods of structure determination, and (ii) it describes the application of such techniques to the first phase of structure determination – the resolution of space-group ambiguities. The introduction in Section 2.1.1 briefly justifies the application of statistical methods to the structure-factor function. Sections 2.1.2 and 2.1.3 discuss the average intensity of general reflections and of zones and rows, with particular attention to the relation of these averages to the symmetry elements present in the crystal. Section 2.1.4 then presents mathematical preliminaries on the calculus of probabilities and introduces the central-limit theorem. In Section 2.1.5, probability density functions (p.d.f.'s) are derived which allow one to determine whether or not the crystal is centrosymmetric. These p.d.f.'s are widely applicable to structures with a large number of not too dissimilar atoms in the asymmetric unit. Such functions for structures containing dispersive scatterers and non-crystallographic centres of symmetry are also presented, and distributions of sums, averages and ratios of intensities are discussed in Section 2.1.6. All these p.d.f.'s are based on the central-limit theorem and are termed ideal p.d.f.'s. However, these ideal p.d.f.'s are no longer applicable when an outstandingly heavy atom is present in the asymmetric unit.
Two approaches that may resolve this difficulty are discussed in this chapter: (i) the correction-factor approach (Section 2.1.7), well known in classical probability theory as the Gram–Charlier or Edgeworth p.d.f.'s, and (ii) the Fourier method (Section 2.1.8), only recently introduced to crystallography. The Gram–Charlier p.d.f.'s depend on even moments of the magnitude of the structure factor. General expressions for such moments of all orders are given for space groups of low symmetry, and the first four even moments are tabulated for all 230 space groups. The Fourier p.d.f.'s depend on their characteristic functions, i.e. on their Fourier transforms. Expressions for the atomic contributions to the characteristic function are tabulated for space groups up to and including one of the cubic space groups; the higher cubic space groups can be treated satisfactorily with the correction-factor p.d.f.'s. A comparison of these two non-ideal methods indicates that, in cases where departures from the predictions of the central-limit theorem are large, the Fourier method is definitely superior to the correction-factor method.
The structure factor of the reflection $\mathbf{h} = (hkl)$ is given by

$$F(\mathbf{h}) = \sum_{j=1}^{N} f_j \exp[2\pi i (h x_j + k y_j + l z_j)], \qquad (2.1.1.1)$$

where $f_j$ is the atomic scattering factor [complex if there is appreciable dispersion; see Chapter 1.2 and IT C (2004, Section 4.2.6)], $(x_j, y_j, z_j)$ are the fractional coordinates of the jth atom and N is the number of atoms in the unit cell. The present chapter is concerned with the statistical properties of the structure factor F and the intensity $I = |F|^2$, such as their average values, variances, higher moments and probability density distributions.
Equation (2.1.1.1) expresses F as a function of two conceptually different sets of variables: $(h, k, l)$, taking integral values in reciprocal space, and $(x_j, y_j, z_j)$, in general having non-integral values in direct space, although the special positions tabulated for each space group in IT A (2005) may include the integers 0 and 1. In special positions the non-integers often include rational fractions, but in general positions they are in principle irrational. Although $(h, k, l)$ and $(x_j, y_j, z_j)$ appear as symmetrical variables in (2.1.1.1), these limitations on their values mean that one can consider two different sets of statistical properties. In the first we seek, for example, the average intensity of the reflection $(hkl)$ (indices fixed) as the positional parameters of the N atoms are distributed with equal probability over the continuous range 0–1. In the second we seek, for example, the average intensity of the observable reflections (or of a subgroup of them having about the same value of $\sin\theta/\lambda$) with the values of $(x_j, y_j, z_j)$ held constant at the values they have, or are postulated to have, in a crystal structure. Other examples are obtained by substituting the words `probability density' for `average intensity'. For brevity, we may call the statistics resulting from the first process fixed-index (continuously variable parameters being understood), and those resulting from the second process fixed-parameter (integral indices being understood). Theory based on the first process is (comparatively) easy; theory based on the second hardly exists, although there is a good deal of theory concerning the conditions under which the two processes lead to the same result (Hauptman & Karle, 1953; Giacovazzo, 1977, 1980). Mathematically, of course, the condition is that the phase angle $2\pi(hx_j + ky_j + lz_j)$ should be distributed with uniform probability over the range 0–2π, whichever set of variables is regarded as fixed, but it is not clear when this distribution can be expected in practice for fixed-parameter averaging.
The usual conclusion is that the uniform distribution will be realized if there are enough atoms, if the atomic coordinates do not approximate to rational fractions, if there are enough reflections and if stereochemical effects are negligible (Shmueli et al., 1984).
Obviously, the second process (fixed parameters, varying integral indices) corresponds to the observable reality, and various approximations to it have been attempted, in preference to assuming its equivalence with the first. For example, a third (approximate) method of averaging has been used (Wilson, 1949, 1981): $x_j, y_j, z_j$ are held fixed and $h, k, l$ are treated as continuous variables.
The process may be illustrated by evaluating, or attempting to evaluate, the average intensity of reflection by the three processes. The intensity of reflection is given by multiplying equation (2.1.1.1) by its complex conjugate:

$$I = FF^{*} \qquad (2.1.2.1)$$

$$= \sum_{j=1}^{N}\sum_{k=1}^{N} f_j f_k^{*} \exp\{2\pi i[h(x_j - x_k) + k(y_j - y_k) + l(z_j - z_k)]\} \qquad (2.1.2.2)$$

$$= \Sigma + \sum_{j \neq k} f_j f_k^{*} \exp\{2\pi i[h(x_j - x_k) + k(y_j - y_k) + l(z_j - z_k)]\}, \qquad (2.1.2.3)$$

where $\Sigma = \sum_{j=1}^{N} |f_j|^2$ is the sum of the squares of the moduli of the atomic scattering factors. Wilson (1942) argued, without detailed calculation, that the average value of the exponential term would be zero and hence that

$$\langle I \rangle = \Sigma. \qquad (2.1.2.4)$$

Averaging equation (2.1.2.3) for h, k, l fixed, with $x_j, y_j, z_j$ ranging uniformly over the unit cell – the first process described above – gives this result identically, without complication or approximation. Ordinarily the second process cannot be carried out. We can, however, postulate a special case in which it is possible. We take a homoatomic structure and, before averaging, correct the f's for temperature effects and the fall-off with $\sin\theta/\lambda$, so that f is the same for all the atoms and is independent of h, k, l. If the range of h, k, l over which the expression for I has to be averaged is taken as a parallelepiped in reciprocal space, with h ranging from −H to H, k from −K to K and l from −L to L, equation (2.1.2.2) can be factorized into the product of the sums of three geometrical progressions. Algebraic manipulation then easily leads to

$$\langle I \rangle = \sum_{j=1}^{N}\sum_{k=1}^{N} f_j f_k^{*}\, \frac{\sin[(2H+1)\pi X]}{(2H+1)\sin(\pi X)}\, \frac{\sin[(2K+1)\pi Y]}{(2K+1)\sin(\pi Y)}\, \frac{\sin[(2L+1)\pi Z]}{(2L+1)\sin(\pi Z)}, \qquad (2.1.2.6)$$

where $X = x_j - x_k$, $Y = y_j - y_k$ and $Z = z_j - z_k$, each factor being taken as unity when its argument is an integer. The terms with $j = k$ give Σ, but the remaining terms are not zero. Because of the periodic nature of the trigonometric terms, the effective coordinate differences are never greater than 0.5, and in a structure of any complexity many will be much less than 0.5. For $H = K = L = 0$, in fact, ⟨I⟩ becomes $|\sum_j f_j|^2$, the square of the modulus of the sum of the atomic scattering factors, and not the sum of the squares of their moduli; for larger H, K, L, ⟨I⟩ rapidly decreases to Σ and then oscillates about that value. Wilson (1949, especially Section 2.1.1) suggested that the regions of averaging should be chosen so that at least one index of every reflection is sufficiently large if ⟨I⟩ is to be identified with Σ, and this has proved to be a useful rule of thumb.
The third process of averaging replaces the sum over integral values of the indices by an integration over continuous values, the appropriate limits in this example being $-H - \tfrac{1}{2}$ to $H + \tfrac{1}{2}$ (and similarly for k and l). The effect is to replace the sines in the denominators, but not in the numerators, of equation (2.1.2.6) by their arguments, which is equivalent to the approximation $\sin x \approx x$ in the denominators only. This is a good approximation for atoms close together in the structure, which give the largest terms in the sums in equation (2.1.2.6), and it gives the correct sign and order of magnitude even for x having its maximum value of π/2.
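The factorization behind equation (2.1.2.6) can be checked numerically. The sketch below uses a one-dimensional analogue, with a single index h running from −H to H, real scattering factors and randomly chosen coordinates, all of which are illustrative assumptions. It compares the brute-force average of $|F|^2$ with the closed-form double sum, and also evaluates the third-process factor, in which $\sin(\pi X)$ in the denominator is replaced by $\pi X$.

```python
import cmath
import math
import random


def window_factor(X, H):
    """Second process: (1/(2H+1)) * sum_{h=-H}^{H} exp(2*pi*i*h*X), summed in closed form."""
    s = math.sin(math.pi * X)
    if abs(s) < 1e-12:  # X (numerically) integral: every term of the sum equals 1
        return 1.0
    return math.sin((2 * H + 1) * math.pi * X) / ((2 * H + 1) * s)


def continuous_factor(X, H):
    """Third process: integrate over continuous h from -H-1/2 to H+1/2 (sin x ~ x in the denominator)."""
    X = X - round(X)  # effective coordinate difference, |X| <= 0.5
    if abs(X) < 1e-12:
        return 1.0
    return math.sin((2 * H + 1) * math.pi * X) / ((2 * H + 1) * math.pi * X)


def mean_intensity_direct(f, x, H):
    """Average of |F(h)|^2 over h = -H..H for a one-dimensional structure (fractional x)."""
    total = 0.0
    for h in range(-H, H + 1):
        F = sum(fj * cmath.exp(2j * math.pi * h * xj) for fj, xj in zip(f, x))
        total += abs(F) ** 2
    return total / (2 * H + 1)


def mean_intensity_closed(f, x, H, factor=window_factor):
    """Double sum over atom pairs weighted by a window factor: the 1-D analogue of (2.1.2.6)."""
    return sum(fj * fk * factor(xj - xk, H) for fj, xj in zip(f, x) for fk, xk in zip(f, x))


rng = random.Random(1)
f = [rng.uniform(1.0, 3.0) for _ in range(6)]  # hypothetical real scattering factors
x = [rng.random() for _ in range(6)]           # hypothetical fractional coordinates
sigma = sum(fj * fj for fj in f)
for H in (0, 2, 10, 50):
    print(H, round(mean_intensity_direct(f, x, H), 3),
          round(mean_intensity_closed(f, x, H), 3),
          round(mean_intensity_closed(f, x, H, continuous_factor), 3), round(sigma, 3))
```

For H = 0 both averages reduce to $|\sum_j f_j|^2$. As H grows they settle towards Σ, and the third-process values follow the same trend.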
The preceding section has used mathematical arguments. From a physical point of view, the radiation diffracted by atoms that are resolved will interfere destructively, so that the resulting intensity will be the sum of the intensities diffracted by individual atoms, whereas that from completely unresolved atoms will interfere constructively, so that amplitudes rather than intensities add. In intermediate cases there will be partial constructive interference. Resolution in accordance with the Rayleigh (1879) criterion requires that $\sin\theta/\lambda$ should be greater than half the reciprocal of the minimum interatomic distance in the crystal (Wilson, 1979); full resolution requires a substantial multiple of this. This criterion is essentially equivalent to that proposed from the study of a special case of the second process in the preceding section.
In organic compounds there are very many interatomic distances of about 1.5 or 1.4 Å. Adopting the preceding criterion would mean that the inner portion of the region of reciprocal space accessible with copper radiation lies outside the scope of intensity statistics based on fixed-index (first-process) averaging. No substantial results are available for fixed-parameter (second-process) averaging, and very few for the approximation to it (third process).
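A short calculation shows how large that inner portion is. The sketch assumes a Cu Kα wavelength of 1.5418 Å (an assumption; the text does not fix a value) and uses the fact that the number of reciprocal-lattice points inside a sphere scales as $(\sin\theta/\lambda)^3$. It applies the bare Rayleigh threshold $\sin\theta/\lambda = 1/(2d_{\min})$ only. Because full resolution needs a substantial multiple of this value, the true excluded fraction is larger.

```python
import math

CU_KALPHA = 1.5418  # Å; assumed mean Cu K-alpha wavelength, for illustration


def resolution_threshold(d_min):
    """sin(theta)/lambda (Å^-1) above which atoms d_min apart are resolved (Rayleigh criterion)."""
    return 1.0 / (2.0 * d_min)


def inner_fraction(d_min, wavelength=CU_KALPHA):
    """Fraction of reflections inside the threshold, relative to the limiting sphere
    sin(theta)/lambda <= 1/lambda; point counts scale with volume, i.e. (sin(theta)/lambda)**3."""
    s_max = 1.0 / wavelength
    return min(1.0, resolution_threshold(d_min) / s_max) ** 3


for d in (1.4, 1.5):
    print(f"d = {d} Å: threshold {resolution_threshold(d):.3f} Å^-1, "
          f"inner fraction {inner_fraction(d):.3f}")
```

With these assumptions, about 17% (1.4 Å) or 14% (1.5 Å) of the copper-accessible reflections fall below the bare threshold.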
To the extent that the third process is acceptable, an approximation to the variation of ⟨I⟩ with $\sin\theta/\lambda$ can be obtained. The exponent in equation (2.1.2.2) can be written as $2\pi i s r_{jk}\cos\psi$, where s is the radial distance in reciprocal space, $r_{jk}$ is the distance from the jth to the kth atom and ψ is the angle between the vectors $\mathbf{s}$ and $\mathbf{r}_{jk}$. Averaging over a sphere of radius s, with ψ treated as the colatitude, gives

$$\langle I(s) \rangle = \sum_{j=1}^{N}\sum_{k=1}^{N} f_j f_k^{*}\, \frac{\sin(2\pi s r_{jk})}{2\pi s r_{jk}}.$$

This is the familiar Debye expression. It has the correct limits for s zero and s large ($|\sum_j f_j|^2$ and Σ, respectively), and is in accord with the argument from resolution.
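The Debye expression is easy to evaluate directly. The sketch below takes real, s-independent scattering factors and a hypothetical three-atom fragment (both illustrative assumptions). It also confirms by quadrature that the spherical average of $\exp(2\pi i s r\cos\psi)$ equals $\sin(2\pi s r)/(2\pi s r)$.

```python
import math


def debye_intensity(s, f, positions):
    """Spherically averaged <I(s)> = sum_j sum_k f_j f_k sin(2 pi s r_jk) / (2 pi s r_jk).

    s is the radial distance in reciprocal space (Å^-1) and positions are Cartesian (Å).
    The f_j are taken as real constants, an illustrative simplification.
    """
    total = 0.0
    for fj, rj in zip(f, positions):
        for fk, rk in zip(f, positions):
            arg = 2.0 * math.pi * s * math.dist(rj, rk)
            total += fj * fk * (1.0 if arg == 0.0 else math.sin(arg) / arg)
    return total


def sphere_average(s, r, m=2000):
    """Average of exp(2 pi i s r cos psi) over a sphere; u = cos(psi) is uniform on [-1, 1].

    The imaginary part cancels by symmetry, so only the cosine is averaged (midpoint rule).
    """
    step = 2.0 / m
    return sum(math.cos(2.0 * math.pi * s * r * (-1.0 + (i + 0.5) * step)) for i in range(m)) / m


# Hypothetical three-atom fragment; positions and f values are for illustration only.
f = [6.0, 6.0, 8.0]
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.4, 0.0)]
for s in (0.0, 0.2, 0.5, 2.0):
    print(s, round(debye_intensity(s, f, positions), 2))
```

At s = 0 the result is $(\sum_j f_j)^2 = 400$. For large s it approaches $\sum_j f_j^2 = 136$.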
In the preceding discussion there has been a tacit assumption that the lattice is primitive. A centred crystal can always be referred to a primitive lattice, and if this is done no change is required. If the centred lattice is retained, many reflections are identically zero and the intensity of the non-zero reflections is enhanced by a factor of two (I and C lattices) or four (F lattice), so that the average intensity of all the reflections, zero and non-zero taken together, is unchanged.
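The compensation for a C-centred lattice can be demonstrated by fixed-index averaging over random structures. The sketch uses point atoms with unit scattering factors and 20 independent atoms per half-cell (illustrative assumptions), so that Σ = 40. Reflections with h + k even should average about 2Σ, those with h + k odd vanish identically, and together they average Σ.

```python
import cmath
import math
import random


def structure_factor(hkl, atoms):
    """F(hkl) for point atoms with unit scattering factors at fractional coordinates (x, y, z)."""
    h, k, l = hkl
    return sum(cmath.exp(2j * math.pi * (h * x + k * y + l * z)) for x, y, z in atoms)


def c_centred(atoms):
    """Apply the C-centring translation (1/2, 1/2, 0) to every atom of the list."""
    return atoms + [((x + 0.5) % 1.0, (y + 0.5) % 1.0, z) for x, y, z in atoms]


def mean_intensity(hkl, n_half=20, trials=4000, seed=0):
    """Fixed-index average of |F(hkl)|^2 over random C-centred structures (Sigma = 2 * n_half)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        half = [(rng.random(), rng.random(), rng.random()) for _ in range(n_half)]
        total += abs(structure_factor(hkl, c_centred(half))) ** 2
    return total / trials


sigma = 2 * 20
allowed = mean_intensity((1, 1, 0))  # h + k even: enhanced
absent = mean_intensity((1, 0, 0))   # h + k odd: systematically absent
print(allowed / sigma, absent / sigma, (allowed + absent) / (2 * sigma))
```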
Other symmetry elements affect only zones and rows of reflections, and so do not affect the general average when the total number of reflections is large. Their effect on zones and rows is discussed in Section 2.1.3.
Symmetry elements can be divided into two types: those that cause systematic absences and those that do not. Those producing systematic absences (glide planes and screw axes) at the same time produce groups of reflections (confined to zones and rows in reciprocal space, respectively) whose average intensity is an integral^{1} multiple of the general average. The effects of single symmetry elements of this type are given in Table 2.1.3.1 for the general reflections and, separately, for any zones and rows that are affected. The `average multipliers' are given in the corresponding column of the table; `distribution' and `distribution parameters' are treated in Section 2.1.5. As for centring, the fraction of reflections missing and the integer multiplying the average are related in such a way that the overall average intensity is unchanged. The mechanism of compensation for the reflections with enhanced intensity is obvious.

Certain symmetry elements not producing absences (mirror planes and rotation axes) cause equivalent atoms to coincide in a plane or a line projection and hence produce a zone or row in reciprocal space for which the average intensity is an integral multiple of the general average (Wilson, 1950); the effects of single such symmetry elements are given in Table 2.1.3.2. There is, however, no obvious mechanism for compensation for this enhancement. When reflections are few this may be an important matter in assigning an approximate absolute scale by comparing observed and calculated intensities. Wilson (1964), Nigam (1972) and Nigam & Wilson (1980), noting that in such cases the finite size of atoms results in forbidden ranges of positional parameters, have shown that there is a diminution of the intensity of layers (rows) in the immediate neighbourhood of the enhanced zones (rows), just sufficient to compensate for the enhancement. In forming general averages, therefore, reflections from enhanced zones or rows should be included at their full intensity, not divided by the multiplier; the matter is discussed in more detail by Wilson (1987a). It should be noted, however, that organic structures containing molecules related by rotation axes are rare, and such structures related by mirror planes are even rarer (Wilson, 1993).

Further alterations of the intensities occur if two or more such symmetry elements are present in the space group. The effects were treated in detail by Rogers (1950), who used them to construct a table for the determination of space groups by supplementing the usual knowledge of the Laue group with statistical information. Only two pairs of space groups, the orthorhombic $I222$ and $I2_12_12_1$ and their cubic supergroups $I23$ and $I2_13$, remained unresolved. Examination of this table shows that what statistical information does is to resolve the Laue group into point groups; the further resolution into space groups is equivalent to the use of Table 3.1.4.1 in IT A (2005). The statistical consequences of each point group, as given by Rogers, are reproduced in Table 2.1.3.3.
Note. The pairs of point groups 1 and $\bar{1}$, and 3 and $\bar{3}$, which are not distinguished by average multiples, may be distinguished by their centric and acentric probability density functions.
^{†}The entry for the principal zone for the point group 422 was given incorrectly as 2 in the first edition of this volume.

For the purpose of this chapter, `ideal' probability distributions or probability density functions are the asymptotic forms obtained by the use of the central-limit theorem when the number of atoms in the unit cell, N, is sufficiently large. In order to derive them it is necessary to outline the properties of characteristic functions and to state alternative conditions for the validity of the central-limit theorem; the distributions themselves are derived in Section 2.1.5.
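The average value of $\exp(itx)$ is very important in probability theory; it is called the characteristic function of the distribution and is denoted by $C_x(t)$ or, when no confusion can arise, by $C(t)$. It exists for all legitimate distributions, whether discrete or continuous. In the continuous case it is given by

$$C(t) = \langle \exp(itx) \rangle = \int_{-\infty}^{\infty} \exp(itx)\, p(x)\, dx, \qquad (2.1.4.1)$$

and is thus the Fourier transform of p(x). In many cases it can be obtained from known integrals. For example, for the Cauchy distribution,

$$p(x) = \frac{a}{\pi}\,\frac{1}{a^2 + x^2}, \qquad C(t) = \exp(-a|t|),$$

and for the normal distribution,

$$p(x) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[-\frac{(x-m)^2}{2\sigma^2}\right],$$

$$C(t) = \exp\left(imt - \frac{\sigma^2 t^2}{2}\right). \qquad (2.1.4.5)$$

Since the characteristic function is the Fourier transform of the probability density function, the converse also holds: if the characteristic function is known, the probability density function can be obtained by the Fourier inversion theorem,

$$p(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} C(t)\exp(-itx)\, dt.$$

An alternative approach to the derivation of the distribution from a known characteristic function will be discussed below.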
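The transform pair for the normal distribution can be verified by direct quadrature. The following is a minimal sketch: the midpoint rule on truncated ranges, with an illustrative mean m = 1 and standard deviation σ = 0.5, both assumptions made for the example. It computes C(t) from p(x), and recovers p(x) from C(t) by the inversion theorem.

```python
import cmath
import math


def characteristic_function(pdf, t, lo, hi, m=20000):
    """C(t) = integral of exp(itx) p(x) dx, approximated by the midpoint rule on [lo, hi]."""
    step = (hi - lo) / m
    return step * sum(cmath.exp(1j * t * x) * pdf(x)
                      for x in (lo + (i + 0.5) * step for i in range(m)))


def inverse_transform(cf, x, lo, hi, m=20000):
    """p(x) = (1/2 pi) * integral of C(t) exp(-itx) dt, truncated to [lo, hi]."""
    step = (hi - lo) / m
    total = step * sum(cf(t) * cmath.exp(-1j * t * x)
                       for t in (lo + (i + 0.5) * step for i in range(m)))
    return (total / (2.0 * math.pi)).real


M, SIGMA = 1.0, 0.5  # illustrative mean and standard deviation


def normal_pdf(x):
    return math.exp(-(x - M) ** 2 / (2 * SIGMA ** 2)) / math.sqrt(2 * math.pi * SIGMA ** 2)


def normal_cf(t):
    return cmath.exp(1j * M * t - SIGMA ** 2 * t ** 2 / 2)  # closed form, equation (2.1.4.5)


print(characteristic_function(normal_pdf, 1.3, -4.0, 6.0), normal_cf(1.3))
print(inverse_transform(normal_cf, 0.8, -20.0, 20.0), normal_pdf(0.8))
```

The truncation limits are chosen so that the neglected tails are negligible for this particular σ. The heavy-tailed Cauchy density would need a different treatment.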
The most important property of characteristic functions in crystallography is the following: if x and y are independent random variables with characteristic functions $C_x(t)$ and $C_y(t)$, the characteristic function of their sum is the product

$$C_{x+y}(t) = C_x(t)\, C_y(t). \qquad (2.1.4.8)$$

Obviously this can be extended to any number of independent random variables.
When the moments exist, the characteristic function can be expanded in a power series in which the kth term is $\mu'_k (it)^k/k!$. If the power series for $\exp(itx)$ is substituted in equation (2.1.4.1), one obtains

$$C(t) = \sum_{k=0}^{\infty} \mu'_k\, \frac{(it)^k}{k!}. \qquad (2.1.4.10)$$

The moments are written with primes in order to indicate that equation (2.1.4.10) is valid for moments about an arbitrary origin as well as for moments about the mean. If the random variable is transformed by a change of origin and scale, say $y = (x - m)/\sigma$, the characteristic function for y becomes

$$C_y(t) = \exp(-imt/\sigma)\, C_x(t/\sigma).$$
A function that is often more useful than the characteristic function is its logarithm, the cumulant-generating function:

$$K(t) = \log C(t) = \sum_{r=1}^{\infty} \kappa_r\, \frac{(it)^r}{r!},$$

where the κ's are called the cumulants and may be regarded as being defined by this equation. They can be evaluated in terms of the moments by combining the series (2.1.4.10) for C(t) with the ordinary series for the logarithm and equating the coefficients of $(it)^r$. In most cases the process as described is tedious, but it can be shortened by use of a general method [Stuart & Ord (1994), Section 3.14, pp. 87–88; Exercise 3.19, p. 119]. Obviously, the cumulants exist only if the moments exist. The first few relations are

$$\kappa_1 = \mu'_1, \quad \kappa_2 = \mu_2, \quad \kappa_3 = \mu_3, \quad \kappa_4 = \mu_4 - 3\mu_2^2.$$

Such expressions, and their converses, are given to higher orders by Stuart & Ord (1994, pp. 88–91). Since all the cumulants except $\kappa_1$ can be expressed in terms of the central moments only (i.e., those unprimed), only $\kappa_1$ is changed by a change of origin. Because of this property, the cumulants are sometimes called the semi-invariants (or seminvariants) of the distribution. Addition of independent random variables is equivalent to the multiplication of their characteristic functions [equation (2.1.4.8)], and multiplication of functions is equivalent to the addition of their logarithms. Hence each cumulant of the distribution of the sum of a number of independent random variables is equal to the sum of the corresponding cumulants of the individual variables – hence the name cumulants. Although the cumulants (except $\kappa_1$) are independent of a change of origin, they are not independent of a change of scale. As for the moments, a change of scale simply multiplies them by a power of the scale factor; if $y = ax$,

$$\kappa_r(y) = a^r \kappa_r(x).$$

The cumulants of the normal distribution are particularly simple. From equation (2.1.4.5), the cumulant-generating function of a normal distribution is

$$K(t) = imt - \frac{\sigma^2 t^2}{2},$$

so that $\kappa_1 = m$, $\kappa_2 = \sigma^2$, and all cumulants with $r > 2$ are identically zero.
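The additivity of cumulants can be seen exactly for the kind of term that enters a structure factor. For $\cos\varphi$ with φ uniform on 0–2π (the fixed-index assumption), the moments are $\langle\cos^{2k}\varphi\rangle = \binom{2k}{k}/4^k$. The sketch below works with exact fractions: it derives $\kappa_2 = 1/2$ and $\kappa_4 = -3/8$, then shows that the scale-free ratio $\kappa_4/\kappa_2^2$ of a sum of n such independent terms is $-3/(2n)$, which vanishes as n grows, in line with the normal limit.

```python
from fractions import Fraction
from math import comb


def cos_moment(k):
    """<cos^k(phi)> for phi uniform on [0, 2 pi): C(k, k/2) / 2^k for even k, 0 for odd k."""
    if k % 2:
        return Fraction(0)
    return Fraction(comb(k, k // 2), 2 ** k)


def low_order_cumulants(mu2, mu3, mu4):
    """kappa_2, kappa_3, kappa_4 from the central moments mu_2, mu_3, mu_4."""
    return mu2, mu3, mu4 - 3 * mu2 ** 2


# Single term cos(phi): its mean is zero, so raw and central moments coincide.
k2, k3, k4 = low_order_cumulants(cos_moment(2), cos_moment(3), cos_moment(4))


def standardized_k4_of_sum(n):
    """Cumulants of independent terms add, so a sum of n cosines has kappa_r = n * kappa_r(single);
    dividing by kappa_2**2 removes the dependence on scale (kappa_r scales as a**r)."""
    return (n * k4) / (n * k2) ** 2


print(k2, k3, k4)  # 1/2 0 -3/8
print([str(standardized_k4_of_sum(n)) for n in (1, 10, 100)])
```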
A simple form of this important theorem can be stated as follows:
If $x_1, x_2, \ldots, x_n$ are independent and identically distributed random variables, each having mean m and variance $\sigma^2$, then the sum

$$S_n = \sum_{j=1}^{n} x_j \qquad (2.1.4.19)$$

tends to be normally distributed – independently of the distribution(s) of the individual random variables – with mean $nm$ and variance $n\sigma^2$, provided n is sufficiently large.
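In order to prove this theorem, let us define a standardized random variable corresponding to the sum $S_n$, i.e., one whose mean is zero and whose variance is unity:

$$Z_n = \frac{S_n - nm}{\sigma n^{1/2}} = \frac{1}{n^{1/2}}\sum_{j=1}^{n} y_j, \qquad (2.1.4.20)$$

where $y_j = (x_j - m)/\sigma$ is a standardized single random variable. The characteristic function of $Z_n$ is therefore given by

$$C_{Z_n}(t) = \left\langle \exp\left(\frac{it}{n^{1/2}}\sum_{j=1}^{n} y_j\right)\right\rangle \qquad (2.1.4.21)$$

$$= \prod_{j=1}^{n}\left\langle \exp\left(\frac{it\,y_j}{n^{1/2}}\right)\right\rangle \qquad (2.1.4.22)$$

$$= \left[C_y\left(\frac{t}{n^{1/2}}\right)\right]^n, \qquad (2.1.4.23)$$

where the brackets denote the operation of averaging with respect to the appropriate probability density function (p.d.f.) [cf. equation (2.1.4.1)]. Equation (2.1.4.22) follows from equation (2.1.4.21) by the assumption of independence, while the assumption of identically distributed variables leads to the identity of the characteristic functions of the individual variables – as seen in equation (2.1.4.23).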
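On the assumption that moments of all orders exist – a most plausible assumption in situations usually encountered in structure-factor statistics – we can now expand the characteristic function of a single standardized variable in a power series [cf. equation (2.1.4.10)]:

$$C_y\left(\frac{t}{n^{1/2}}\right) = 1 - \frac{t^2}{2n}(1 + \varepsilon), \qquad (2.1.4.24)$$

since $\langle y \rangle = 0$ and $\langle y^2 \rangle = 1$, and the quantity denoted by ε in (2.1.4.24) is given by

$$\varepsilon = -2\sum_{k=3}^{\infty} \frac{i^k\, t^{k-2}\, \langle y^k \rangle}{k!\; n^{(k-2)/2}}. \qquad (2.1.4.25)$$

The characteristic function of $Z_n$ is therefore

$$C_{Z_n}(t) = \left[1 - \frac{t^2}{2n}(1 + \varepsilon)\right]^n. \qquad (2.1.4.26)$$

Now, as is seen from (2.1.4.25), for every fixed t the quantity ε tends to zero as n tends to infinity. The cumulant-generating function of the standardized sum then becomes

$$\log C_{Z_n}(t) = n \log\left[1 - \frac{t^2}{2n}(1 + \varepsilon)\right], \qquad (2.1.4.27)$$

and the logarithm on the right-hand side of equation (2.1.4.27) has the form $\log(1 + z)$ with $z \to 0$ as $n \to \infty$. We may therefore use the expansion

$$\log(1 + z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \ldots,$$

which is valid for $|z| < 1$. We then obtain

$$\log C_{Z_n}(t) = -\frac{t^2}{2}(1 + \varepsilon) + O\left(\frac{1}{n}\right)$$

and finally, for every fixed t,

$$\lim_{n \to \infty} \log C_{Z_n}(t) = -\frac{t^2}{2}. \qquad (2.1.4.28)$$

Since the exponential function is continuous, it follows directly that

$$\lim_{n \to \infty} C_{Z_n}(t) = \exp\left(-\frac{t^2}{2}\right). \qquad (2.1.4.29)$$

The right-hand side of (2.1.4.29) is just the characteristic function of a standardized normal p.d.f., i.e., a normal p.d.f. with zero mean and unit variance [cf. equation (2.1.4.5)]. The asymptotic expression for the p.d.f. of the standardized sum is therefore

$$p(Z_n) = \frac{1}{(2\pi)^{1/2}}\exp\left(-\frac{Z_n^2}{2}\right), \qquad (2.1.4.30)$$

which proves the above version of the central-limit theorem.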
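The convergence can be watched directly in a Monte Carlo sketch. It takes $x_j = \cos\varphi_j$ with uniformly distributed phases (an illustrative choice, for which m = 0 and σ² = 1/2), forms the standardized sum $Z_n$ of equation (2.1.4.20), and measures the largest gap between its empirical cumulative distribution and that of the standardized normal p.d.f. (2.1.4.30).

```python
import math
import random


def standardized_cosine_sum(n, rng):
    """Z_n = (S_n - n m) / (sigma sqrt(n)) for S_n = sum of cos(phi_j), phi_j uniform on [0, 2 pi).
    For a single cosine m = 0 and sigma^2 = 1/2."""
    s = sum(math.cos(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n))
    return s / math.sqrt(n / 2.0)


def normal_cdf(z):
    """Cumulative distribution of the standardized normal p.d.f."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_cdf_error(n, samples=20000, seed=1):
    """Kolmogorov-Smirnov-type distance between the empirical c.d.f. of Z_n and the normal c.d.f."""
    rng = random.Random(seed)
    z = sorted(standardized_cosine_sum(n, rng) for _ in range(samples))
    return max(max(abs((i + 1) / samples - normal_cdf(v)), abs(i / samples - normal_cdf(v)))
               for i, v in enumerate(z))


for n in (1, 2, 5, 30):
    print(n, round(max_cdf_error(n), 4))
```

The distance is large for n = 1, where the sum is a single arcsine-distributed cosine. By n = 30 it is comparable to the sampling noise of 20 000 draws.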
Surprisingly, this theorem has a very wide applicability and values of n as low as 30 are often large enough for the theorem to be useful. Situations in which the normal p.d.f. must be modified or replaced by an altogether different one are dealt with in Sections 2.1.7 and 2.1.8 of this chapter.
The above outline of a proof of the central-limit theorem depended on the existence of moments of all orders. The components of structure factors always possess finite moments of all orders, but the existence of moments beyond the second is not necessary for the validity of the theorem, which can be proved under much less stringent conditions. In fact, if all the random variables in equation (2.1.4.19) have the same distribution – as in a homoatomic structure – the only requirement is that the second moments of the distributions should exist [the Lindeberg–Lévy theorem (e.g. Cramér, 1951)]. If the distributions are not the same – as in a heteroatomic structure – some further condition is necessary to ensure that no individual random variable dominates the sum. The Liapounoff proof requires the existence of third absolute moments, but this is regarded as aesthetically displeasing; a theorem that ultimately involves only means and variances should require only means and variances in the proof. The Lindeberg–Cramér conditions meet this aesthetic criterion. Roughly, the conditions are that $s_n^2 = \sum_{j=1}^{n}\sigma_j^2$, the variance of the sum, should tend to infinity and that $\sigma_j^2/s_n^2$, where $\sigma_j^2$ is the variance of the jth random variable, should tend to zero for all j as n tends to infinity. The precise formulation is quoted by Kendall & Stuart (1977, p. 207).
The central-limit theorem, under certain conditions, remains valid even when the variables summed in equation (2.1.4.19) are not independent. The conditions have been investigated by Bernstein (1922, 1927); roughly, they amount to requiring that the variables should not be too closely correlated. The theorem applies, in particular, when each $x_j$ is related to a finite number, m, of its neighbours, in which case the x's are said to be m-dependent. The m-dependence seems plausible for crystallographic applications, since the positions of atoms close together in a structure are closely correlated by interatomic forces, whereas those far apart will show little correlation if there is any flexibility in the asymmetric unit when unconstrained. Harker's (1953) idea of `globs' seems equivalent to m-dependence. Long-range stereochemical effects, as in pseudo-graphitic aromatic hydrocarbons, would presumably produce long-range correlations and make m-dependence less plausible. If Bernstein's conditions are satisfied, the central-limit theorem applies, but the actual variance of the sum, $\langle (S_n - \langle S_n \rangle)^2 \rangle$, must be used instead of the sum of the variances of the random variables in (2.1.4.19). Because of the correlations the two values are no longer equal.
French & Wilson (1978) seem to have been the first to appeal explicitly to the central-limit theorem extended to non-independent variables, but many previous workers [for typical references, see Wilson (1981)] tacitly made the replacement $\Sigma \to \langle I \rangle$ – in the X-ray case substituting the local mean intensity for the sum of the squares of the atomic scattering factors.
In applications of the central-limit theorem and its extensions to intensity statistics, the $x_j$'s of equation (2.1.4.19) have the form (atomic scattering factor of the jth atom) × (a trigonometric expression characteristic of the space group and Wyckoff position, also known as the trigonometric structure factor). These trigonometric expressions for all the space groups and general Wyckoff positions are given in Tables A1.4.3.1 through A1.4.3.7, and their first few even moments (fixed-index averaging) are given in Table 2.1.7.1. One cannot, of course, conclude that the magnitudes of the structure factor always have a normal distribution – even if the structure is homoatomic. One must look at each problem and see which components of the structure factor can be put in the form (2.1.4.19), deduce the m and $\sigma^2$ to be used for each, and combine the components to obtain the asymptotic (large N, not large x) expression for the problem in question. Ordinarily the components are the real and imaginary parts of the structure factor; the structure factor is purely real only if the structure is centrosymmetric, the space-group origin is chosen at a crystallographic centre and the atoms are non-dispersive.
The ideal acentric distributions are obtained by applying the central-limit theorem to the real and imaginary parts of the structure factor, as given by equation (2.1.1.1). Consider first a crystal with no rotational symmetry (space group P1). The real part, A, of the structure factor is then given by

$$A = \sum_{j=1}^{N} f_j \cos\theta_j, \qquad \theta_j = 2\pi(hx_j + ky_j + lz_j), \qquad (2.1.5.1)$$

where N is the number of atoms in the unit cell and $\theta_j$ is the phase angle of the jth atom. The central-limit theorem then states that A tends to be normally distributed about its mean value ⟨A⟩, with variance equal to its mean-square deviation from its mean. Under the assumption that the phase angles are uniformly distributed over the range 0–2π, the mean value of each cosine is zero, so that the variance of A is

$$\langle A^2 \rangle = \sum_{j=1}^{N}\sum_{k=1}^{N} f_j f_k \langle \cos\theta_j \cos\theta_k \rangle = \sum_{j=1}^{N} f_j^2 \langle \cos^2\theta_j \rangle. \qquad (2.1.5.2)$$

Under the same assumption, the mean value of each $\cos^2\theta_j$ is one-half, so that the variance becomes

$$\langle A^2 \rangle = \tfrac{1}{2}\sum_{j=1}^{N} f_j^2 = \tfrac{1}{2}\Sigma, \qquad (2.1.5.3)$$

where Σ is the sum of the squares of the atomic scattering factors [cf. equation (2.1.2.4)]. The asymptotic form of the distribution of A is therefore

$$p(A) = (\pi\Sigma)^{-1/2}\exp(-A^2/\Sigma). \qquad (2.1.5.4)$$

A similar calculation, with sines instead of cosines, gives an analogous distribution for the imaginary part B, so that the joint probability density of the real and imaginary parts of F is

$$p(A, B) = (\pi\Sigma)^{-1}\exp[-(A^2 + B^2)/\Sigma]. \qquad (2.1.5.5)$$

Ordinarily, however, we are more interested in the distribution of the magnitude, |F|, of the structure factor than in the distribution of A and B. Using polar coordinates in equation (2.1.5.5) [$A = |F|\cos\varphi$, $B = |F|\sin\varphi$] and integrating over the angle φ gives

$$p(|F|) = \frac{2|F|}{\Sigma}\exp\left(-\frac{|F|^2}{\Sigma}\right). \qquad (2.1.5.6)$$

It is usually convenient, in structure-factor and intensity statistics, to express the results in terms of the normalized structure factor E and its magnitude |E|. If |F| has been put on an absolute scale (see Section 2.2.4.3), we have

$$|E|^2 = |F|^2/\Sigma, \qquad (2.1.5.7)$$

so that

$$p(|E|) = 2|E|\exp(-|E|^2) \qquad (2.1.5.8)$$

is the normalized-structure-factor version of (2.1.5.6).
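The acentric result can be checked by simulation. The sketch models one reflection of a random homoatomic P1 structure with unit scattering factors, drawing the atomic phase angles uniformly, exactly as assumed above. The 40 atoms and 10 000 trials are arbitrary illustrative choices. Three quantities are compared with (2.1.5.8): ⟨|E|²⟩ = 1, ⟨|E|⁴⟩ = 2, and the fraction of |E|² values below 0.5, which should be $1 - \exp(-0.5)$.

```python
import cmath
import math
import random


def random_p1_E(n_atoms, rng):
    """E = F / sqrt(Sigma) for one reflection of a random homoatomic P1 structure with unit f.
    The atomic phase angles theta_j are drawn uniformly on [0, 2 pi), as assumed in the text."""
    F = sum(cmath.exp(2j * math.pi * rng.random()) for _ in range(n_atoms))
    return F / math.sqrt(n_atoms)


rng = random.Random(2)
E2 = [abs(random_p1_E(40, rng)) ** 2 for _ in range(10000)]

mean_E2 = sum(E2) / len(E2)                       # ideal acentric value: 1
mean_E4 = sum(e * e for e in E2) / len(E2)        # ideal acentric value: 2
frac_below = sum(e <= 0.5 for e in E2) / len(E2)  # ideal: 1 - exp(-0.5), about 0.393
print(round(mean_E2, 3), round(mean_E4, 3), round(frac_below, 3))
```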
Distributions resulting from non-centrosymmetric crystals are known as acentric distributions; those arising from centrosymmetric crystals are known as centric. These adjectives are used to describe distributions, not crystal symmetry.
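When a non-dispersive crystal is centrosymmetric and the space-group origin is chosen at a crystallographic centre of symmetry, the imaginary part B of its structure amplitude is zero. In the simplest case, space group $P\bar{1}$, the contribution of the jth atom plus its centrosymmetric counterpart is $2f_j\cos\theta_j$. The calculation of ⟨A²⟩ goes through as before, with allowance for the fact that there are N/2 pairs instead of N independent atoms, giving

$$p(A) = (2\pi\Sigma)^{-1/2}\exp[-A^2/(2\Sigma)], \qquad (2.1.5.9)$$

or equivalently

$$p(|F|) = \left(\frac{2}{\pi\Sigma}\right)^{1/2}\exp\left(-\frac{|F|^2}{2\Sigma}\right), \qquad (2.1.5.10)$$

or

$$p(|E|) = \left(\frac{2}{\pi}\right)^{1/2}\exp\left(-\frac{|E|^2}{2}\right). \qquad (2.1.5.11)$$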
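A companion simulation for $P\bar{1}$ builds A from N/2 centrosymmetric pairs, each contributing $2f\cos\theta_j$ with f = 1 (20 pairs and 20 000 trials are illustrative choices). It compares the result with (2.1.5.11). The fourth moment is the classic discriminator: ⟨|E|⁴⟩ is 3 for the centric distribution but 2 for the acentric one. The fraction of |E|² values below 0.5 should be erf(0.5).

```python
import math
import random


def random_p1bar_E(n_pairs, rng):
    """E for a random homoatomic P-1 structure with the origin at a centre of symmetry:
    each pair contributes 2 f cos(theta_j) with f = 1, so Sigma = 2 * n_pairs and B = 0."""
    A = sum(2.0 * math.cos(2.0 * math.pi * rng.random()) for _ in range(n_pairs))
    return A / math.sqrt(2.0 * n_pairs)


rng = random.Random(3)
E = [random_p1bar_E(20, rng) for _ in range(20000)]

mean_E2 = sum(e * e for e in E) / len(E)            # ideal value: 1
mean_E4 = sum(e ** 4 for e in E) / len(E)           # ideal centric value: 3 (acentric: 2)
frac_below = sum(e * e <= 0.5 for e in E) / len(E)  # ideal centric: erf(0.5), about 0.521
print(round(mean_E2, 3), round(mean_E4, 3), round(frac_below, 3))
```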
Additional crystallographic symmetry elements do not produce any essential alterations in the ideal centric or acentric distribution; their main effect is to replace the parameter Σ by a `distribution parameter', called S by Wilson (1950) and Rogers (1950), in certain groups of reflections. In addition, in non-centrosymmetric space groups, the distribution of certain groups of reflections becomes centric, though the general reflections remain acentric. The changes are summarized in Tables 2.1.3.1 and 2.1.3.2. The values of S are integers for lattice centring, glide planes and those screw axes that produce absences, and approximate integers for rotation axes and mirror planes; the modulations of the average intensity in reciprocal space outlined in Section 2.1.3.2 apply.
It should be noted that if intensities are normalized to the average of the group to which they belong, rather than to the general average, the distributions given in equations (2.1.5.8) and (2.1.5.11) are not affected.
The distributions just derived are asymptotic, being limiting forms for large N. In this sense they are the only ideal distributions when there is only strict crystallographic symmetry and no dispersion. However, other ideal (asymptotic) distributions arise when there is non-crystallographic symmetry or dispersion. The subcentric distribution,

$$p(|E|) = \frac{2|E|}{(1 - k^2)^{1/2}}\exp\left(-\frac{|E|^2}{1 - k^2}\right) I_0\left(\frac{k|E|^2}{1 - k^2}\right),$$

where $I_0$ is a modified Bessel function of the first kind and k is the ratio of the scattering from the centrosymmetric part to the total scattering, arises when a non-centrosymmetric crystal contains centrosymmetric parts, or when dispersion introduces effective non-centrosymmetry into the scattering from a centrosymmetric crystal (Srinivasan & Parthasarathy, 1976, ch. III; Wilson, 1980a,b; Shmueli & Wilson, 1983). The bicentric distribution,

$$p(|E|) = \frac{2}{\pi}K_0(|E|),$$

arises, for example, when the `asymmetric unit in a centrosymmetric crystal is a centrosymmetric molecule' (Lipson & Woolfson, 1952); $K_0$ is a modified Bessel function of the second kind. There are higher hypercentric, hyperparallel and sesquicentric analogues (Wilson, 1952; Rogers & Wilson, 1953; Wilson, 1956). The ideal subcentric and bicentric distributions are expressed in terms of known functions, but the higher hypercentric and the sesquicentric distributions have so far been studied only through their moments and integral representations. Certain hypersymmetric distributions can be expressed in terms of Meijer's G functions (Wilson, 1987b).
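The subcentric density interpolates between the two ideal cases. It reduces to the acentric form (2.1.5.8) at k = 0 and tends to the centric form as k → 1. The sketch below evaluates it with a power-series $I_0$, written here to keep the example dependency-free; a library Bessel routine would normally be preferred. It then checks numerically that the density is normalized with ⟨|E|²⟩ = 1 and that its fourth moment is $2 + k^2$. That value follows from treating A and B as independent normal variables with variances $(1 \pm k)/2$, a derivation assumed here rather than taken from the text.

```python
import math


def bessel_i0(x):
    """Modified Bessel function of the first kind, I_0(x) = sum_k (x/2)^(2k) / (k!)^2, for x >= 0."""
    total, term, k = 1.0, 1.0, 0
    while True:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
        if k > x and term < 1e-16 * total:
            return total


def subcentric_pdf(e, k):
    """Subcentric p(|E|) for 0 <= k < 1; k = 0 gives the acentric 2|E| exp(-|E|^2)."""
    a = 1.0 - k * k
    return 2.0 * e / math.sqrt(a) * math.exp(-e * e / a) * bessel_i0(k * e * e / a)


def moment(pdf, power, upper=7.0, m=2000):
    """Midpoint-rule estimate of the integral of |E|^power p(|E|) over [0, upper]."""
    step = upper / m
    return step * sum(((i + 0.5) * step) ** power * pdf((i + 0.5) * step) for i in range(m))


for k in (0.0, 0.5, 0.9):
    p = lambda e, k=k: subcentric_pdf(e, k)
    print(k, round(moment(p, 0), 5), round(moment(p, 2), 5), round(moment(p, 4), 5), 2 + k * k)
```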
When only the intrinsic probability distributions are being considered, it does not greatly matter whether the variable chosen is the intensity of reflection (I) or its positive square root, the modulus of the structure factor |F|, since both are necessarily real and non-negative. In an obvious notation, the relation between the intensity distribution and the structure-factor distribution is

$$p_I(I) = \frac{p_{|F|}(I^{1/2})}{2I^{1/2}}$$

or

$$p_{|F|}(|F|) = 2|F|\, p_I(|F|^2).$$

Statistical fluctuations in counting rates, however, introduce a small but finite probability of negative observed intensities (Wilson, 1978a, 1980a) and thus of imaginary structure factors. This practical complication is treated in IT C (2004, Parts 7 and 8).
Both the ideal centric and acentric distributions are simple members of the family of gamma distributions, defined by

$$\gamma_n(x) = \frac{x^{n-1}\exp(-x)}{\Gamma(n)}, \qquad x > 0,$$

where n is a parameter, not necessarily integral, and Γ(n) is the gamma function. Thus the ideal acentric intensity distribution is

$$p(I) = \Sigma^{-1}\gamma_1(I/\Sigma) = \Sigma^{-1}\exp(-I/\Sigma)$$

and the ideal centric intensity distribution is

$$p(I) = (2\Sigma)^{-1}\gamma_{1/2}[I/(2\Sigma)] = (2\pi\Sigma I)^{-1/2}\exp[-I/(2\Sigma)].$$

The properties of gamma distributions and of the related beta distributions, summarized in Table 2.1.5.1, are used in Section 2.1.6 to derive the probability density functions of sums and of ratios of intensities drawn from one of the ideal distributions.

The integral of the probability density function from the lower end of its range up to an arbitrary value x is called the cumulative probability distribution, or simply the distribution function, P(x), of x. It can always be written

$$P(x) = \int_{-\infty}^{x} p(t)\, dt; \qquad (2.1.5.21)$$

if the lower end of the range is not actually −∞, one takes p(t) as identically zero between −∞ and the lower end of its range. For the distribution of A [equation (2.1.5.4) or (2.1.5.9)] the lower limit is in fact −∞; for the distributions of |F|, |E|, I and |E|² the lower end of the range is zero. In such cases, equation (2.1.5.21) becomes

$$P(x) = \int_{0}^{x} p(t)\, dt.$$

In crystallographic applications the cumulative distribution is usually denoted by N(x), rather than by the capital letter corresponding to the probability density function designation. The cumulative forms of the ideal acentric and centric distributions (Howells et al., 1950) have found many applications. For the acentric distribution of |E| [equation (2.1.5.8)] the integration is readily carried out:

$$N(|E|) = \int_0^{|E|} 2t\exp(-t^2)\, dt = 1 - \exp(-|E|^2).$$

The integral for the centric distribution of |E| [equation (2.1.5.11)] cannot be expressed in terms of elementary functions, but the integral required has so many important applications in statistics that it has been given a special name and symbol, the error function erf(x), defined by

$$\mathrm{erf}(x) = \frac{2}{\pi^{1/2}}\int_0^x \exp(-t^2)\, dt.$$

For the centric distribution, then,

$$N(|E|) = \mathrm{erf}(|E|/2^{1/2}).$$

The error function is extensively tabulated [see e.g. Abramowitz & Stegun (1972), pp. 310–311, and a closely related function on pp. 966–973].
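In practice the cumulative distributions are usually tabulated against $z = |E|^2$, where they read $N(z) = 1 - \exp(-z)$ (acentric) and $N(z) = \mathrm{erf}[(z/2)^{1/2}]$ (centric). The short sketch below evaluates both with the standard-library error function. Comparing observed fractions against such a table is the usual way of testing for a centre of symmetry.

```python
import math


def n_acentric(z):
    """N(z): fraction of reflections with |E|^2 <= z for the ideal acentric case, 1 - exp(-z)."""
    return 1.0 - math.exp(-z)


def n_centric(z):
    """N(z) for the ideal centric case: erf(|E| / sqrt(2)) with |E| = sqrt(z)."""
    return math.erf(math.sqrt(z / 2.0))


for z in (0.1, 0.2, 0.5, 1.0, 2.0):
    print(f"z = {z:3.1f}   acentric {n_acentric(z):.3f}   centric {n_centric(z):.3f}")
```

At small z the centric curve lies well above the acentric one (0.248 against 0.095 at z = 0.1), reflecting the excess of weak reflections in centrosymmetric crystals.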
In Section 2.1.2.1, it was shown that the average intensity of a sufficient number of reflections is Σ [equation (2.1.2.4)]. When the number of reflections is not `sufficient', their mean value will show statistical fluctuations about Σ. Such statistical fluctuations are in addition to any systematic variation resulting from nonindependence of atomic positions, as discussed in Sections 2.1.2.1–2.1.2.3. We thus need to consider the probability density functions of sums like
$$Y = \sum_{i=1}^{n} I_i \qquad (2.1.6.1)$$
and averages like
$$\langle Y\rangle = Y/n, \qquad (2.1.6.2)$$
where $I_i$ is the intensity of the ith reflection. The probability density distributions are easily obtained from a property of gamma distributions: if $x_1, \ldots, x_n$ are independent gamma-distributed variables with parameters $p_1, \ldots, p_n$, their sum is a gamma-distributed variable with parameter p equal to the sum of the parameters. The sum of n intensities drawn from an acentric distribution thus has the distribution
$$p(Y) = \Sigma^{-1}\gamma_n(Y/\Sigma);$$
the parameters of the variables added are all equal to unity, so that their sum is p = n. Similarly, the sum of n intensities drawn from a centric distribution has the distribution
$$p(Y) = (2\Sigma)^{-1}\gamma_{n/2}(Y/2\Sigma);$$
each parameter has the value of one-half. The corresponding distributions of the averages of n intensities are then
$$p(\langle Y\rangle) = n\Sigma^{-1}\gamma_n(n\langle Y\rangle/\Sigma)$$
for the acentric case, and
$$p(\langle Y\rangle) = n(2\Sigma)^{-1}\gamma_{n/2}(n\langle Y\rangle/2\Sigma)$$
for the centric. In both cases the expected value of ⟨Y⟩ is Σ and the variances are $\Sigma^2/n$ and $2\Sigma^2/n$, respectively, just as would be expected.
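A quick Monte Carlo check makes the two variances concrete. The sketch below assumes ideal statistics. Acentric intensities are drawn as exponential variables with mean Σ. Centric intensities are drawn as $\Sigma g^2$ with g a standard normal variable, which is equivalent to a normally distributed centric structure factor. The function names, seed and trial count are arbitrary choices made for illustration.

```python
import random
import statistics


def simulated_average(n: int, sigma: float, centric: bool, rng: random.Random) -> float:
    """Average of n independent ideal intensities whose expected value is sigma."""
    if centric:
        # Centric F is normal with variance sigma, so I = F**2 = sigma * g**2 with g ~ N(0, 1).
        draws = [sigma * rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
    else:
        # Acentric I is exponential with mean sigma (gamma_1 in I / sigma).
        draws = [rng.expovariate(1.0 / sigma) for _ in range(n)]
    return sum(draws) / n


def average_moments(n: int, sigma: float, centric: bool, trials: int = 20000, seed: int = 1):
    """Sample mean and variance of <Y> over many simulated sets of n reflections."""
    rng = random.Random(seed)
    samples = [simulated_average(n, sigma, centric, rng) for _ in range(trials)]
    return statistics.fmean(samples), statistics.pvariance(samples)


if __name__ == "__main__":
    n, sigma = 10, 1.5
    for centric in (False, True):
        mean, var = average_moments(n, sigma, centric)
        expected_var = (2.0 if centric else 1.0) * sigma**2 / n
        print("centric " if centric else "acentric", round(mean, 3), round(var, 4), expected_var)
```

With n = 10 and Σ = 1.5 the simulated variances should come out close to 0.225 (acentric) and 0.45 (centric). Both means should be close to 1.5.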
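Ratios like
$$Y/Z, \qquad (2.1.6.7)$$
where Y is given by equation (2.1.6.1),
$$Z = \sum_{j=1}^{m} J_j, \qquad (2.1.6.8)$$
and the $J_j$'s are the intensities of a set of reflections (which may or may not overlap with those included in Y), are used in correlating intensities measured under different conditions. They arise in several contexts:
- correlating reflections on different layer lines from the same or different specimens;
- correlating the same reflections from different crystals;
- normalizing intensities to the local average or to Σ;
- certain systematic trial-and-error methods of structure determination (see Rabinovich & Shakked, 1984, and references therein).

There are three main cases:
(i) the I's and the J's refer to the same reflections, measured, for example, on different crystals;
(ii) the I's and the J's have no reflections in common, so that Y and Z are independent;
(iii) the I's are a subset of the J's.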
Aside from the scale factor, in case (i) Y and Z will differ chiefly through relatively small statistical fluctuations and uncorrected systematic errors, whereas in case (ii) the differences will be relatively large because of the inherent differences in the intensities. Here we are concerned only with cases (ii) and (iii); the practical problems of case (i) are postponed to IT C (2004).
There is little in the crystallographic literature concerning the probability distribution of sums like (2.1.6.1) or ratios like (2.1.6.7); certain results are reviewed by Srinivasan & Parthasarathy (1976, ch. 5), but with a bias toward partially related structures that makes it difficult to apply them to the immediate problem.
In case (ii) (Y and Z independent), for the acentric distribution, Table 2.1.5.1 shows that the ratio of the sums, Y/Z, has a beta distribution of the second kind with parameters n and m. Here n is the number of intensities included in the numerator and m is the number in the denominator. The ratio of the averages is $R = \langle Y\rangle/\langle Z\rangle = (m/n)(Y/Z)$, where ⟨Y⟩ is given by equation (2.1.6.2) and $\langle Z\rangle = Z/m$. The expected value of R is then
$$\langle R\rangle = \frac{m}{m-1}$$
with variance
$$\operatorname{var}(R) = \frac{m^2(n+m-1)}{n(m-1)^2(m-2)}.$$
One sees that R is a biased estimate of the scaling factor between two sets of intensities. The bias, of the order of 1/m, depends only on the number of intensities averaged in the denominator. This may seem odd at first sight. It becomes plausible when one remembers that the mean of a quantity is an unbiased estimator of itself, but the reciprocal of a mean is not an unbiased estimator of the mean of a reciprocal. The mean exists only if m > 1 and the variance only for m > 2.
In the centric case, Y/Z has a beta distribution of the second kind with parameters n/2 and m/2. The expected value of the ratio R of the two means is then
$$\langle R\rangle = \frac{m}{m-2},$$
and its variance is
$$\operatorname{var}(R) = \frac{2m^2(n+m-2)}{n(m-2)^2(m-4)}.$$
For the same number of reflections, the bias in R and the variance for the centric distribution are considerably larger than for the acentric. For both distributions the variance of the scaling factor approaches zero when n and m become large. The variances are large for small m. They are in fact `infinite' if the number of terms averaged in the denominator is sufficiently small (m ≤ 2 for the acentric and m ≤ 4 for the centric distribution). These biases are readily removed by multiplying R by (m − 1)/m or (m − 2)/m, respectively. Many methods of estimating scaling factors – perhaps most – also introduce bias (Wilson, 1975; Lomer & Wilson, 1975; Wilson, 1976, 1978c) that is not so easily removed. Wilson (1986a) has given reasons for supposing that the bias of the ratio (2.1.6.7) approximates to whatever the intensity distribution. Equations (2.1.6.12) and (2.1.6.15) are consistent with this.
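The closed-form mean and variance of the scale ratio follow from the standard moments of a beta distribution of the second kind with parameters (p, q). The mean is p/(q − 1) and the variance is p(p + q − 1)/[(q − 1)²(q − 2)]. These are then rescaled by m/n to pass from the ratio of sums to the ratio of averages. The sketch below assumes both sets are drawn from populations with the same expected intensity. Function names are illustrative.

```python
import math
from typing import Tuple


def scale_ratio_moments(n: int, m: int, centric: bool) -> Tuple[float, float]:
    """Mean and variance of R = <Y>/<Z> for independent averages of n and m ideal
    intensities drawn from populations with the same expected intensity."""
    # The ratio of sums Y/Z follows a beta distribution of the second kind with parameters (p, q).
    p, q = (n / 2.0, m / 2.0) if centric else (float(n), float(m))
    scale = m / n  # converts the ratio of sums into the ratio of averages
    mean = scale * p / (q - 1.0) if q > 1.0 else math.inf
    var = (scale**2 * p * (p + q - 1.0) / ((q - 1.0) ** 2 * (q - 2.0))
           if q > 2.0 else math.inf)
    return mean, var


def unbiased_scale(r: float, m: int, centric: bool) -> float:
    """Remove the bias m/(m-1) (acentric) or m/(m-2) (centric) from an observed ratio."""
    return r * ((m - 2) / m if centric else (m - 1) / m)


if __name__ == "__main__":
    for centric in (False, True):
        print("centric " if centric else "acentric", scale_ratio_moments(10, 20, centric))
```

For n = 10 and m = 20 the acentric mean is 20/19. The centric mean is 20/18, and the centric variance is more than twice the acentric one. The function returns infinity where the corresponding moment does not exist.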
When the I's are a subset of the J's, the beta distributions of the second kind are replaced by beta distributions of the first kind, with means and variances readily found from Table 2.1.5.1. The distribution of such a ratio is chiefly of interest when Y relates to a single reflection and Z relates to a group of m intensities including Y. This corresponds to normalizing intensities to the local average. In the acentric case Y/Z then has a beta distribution of the first kind with parameters 1 and m − 1, and the normalized intensity mY/Z has an expected value of unity. There is no bias, as is obvious a priori. The variance of mY/Z is
$$\frac{m-1}{m+1} = 1 - \frac{2}{m+1},$$
which is less than the variance of the intensities normalized to an `infinite' population by a fraction of the order of 2/m. Unlike the variance of the scaling factor, the variance of the normalized intensity approaches unity as m becomes large. For intensities having a centric distribution, Y/Z has a beta distribution of the first kind with parameters 1/2 and (m − 1)/2. The intensity normalized to the local average then has an expected value of unity and a variance of 2(m − 1)/(m + 2). This is less than the variance for an `infinite' population (namely 2) by a fraction of about 3/m.
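The local-average result can be checked numerically. The sketch below compares the exact variances quoted above with a simulation in which one intensity is divided by the mean of a group of m ideal intensities that includes it. The group size, seed and function names are arbitrary illustrative choices. Intensities are drawn with unit expected value.

```python
import random
import statistics


def local_normalized_variance(m: int, centric: bool) -> float:
    """Exact variance of m*Y/Z when Y is one of the m intensities summed in Z."""
    # Y/Z follows a beta distribution of the first kind:
    # parameters (1, m-1) for acentric data, (1/2, (m-1)/2) for centric data.
    return 2.0 * (m - 1) / (m + 2) if centric else (m - 1) / (m + 1)


def simulate_local_normalization(m: int, centric: bool, trials: int = 40000, seed: int = 7):
    """Monte Carlo mean and variance of an intensity divided by the mean of its group of m."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        if centric:
            group = [rng.gauss(0.0, 1.0) ** 2 for _ in range(m)]
        else:
            group = [rng.expovariate(1.0) for _ in range(m)]
        values.append(group[0] / statistics.fmean(group))
    return statistics.fmean(values), statistics.pvariance(values)


if __name__ == "__main__":
    for centric in (False, True):
        print(local_normalized_variance(5, centric), simulate_local_normalization(5, centric))
```

For m = 5 the exact variances are 2/3 (acentric) and 8/7 (centric). These are noticeably below the `infinite'-population values of 1 and 2, while the mean stays at unity.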
Similar considerations apply to intensities normalized to Σ in the usual way, since they are equal to those normalized to the local average $\langle I\rangle$ multiplied by $\langle I\rangle/\Sigma$.
Since Y and Z [equations (2.1.6.1) and (2.1.6.8)] are sums of identically distributed variables conforming to the conditions of the central-limit theorem, it is tempting to approximate their distributions by normal distributions with the correct mean and variance. This would be reasonably satisfactory for the distributions of Y and Z themselves for quite small values of n and m, but unsatisfactory for the distribution of their ratio for any values of n and m, even large. The ratio of two variables with normal distributions is notorious for its rather indeterminate mean and `infinite' variance, resulting from the `tail' of the denominator distribution extending through zero to negative values. The leading terms of the ratio distribution are given by Kendall & Stuart (1977, p. 288).
The probability density functions (p.d.f.'s) of the magnitude of the structure factor, presented in Section 2.1.5, are based on the central-limit theorem discussed above. In particular, the centric and acentric p.d.f.'s given by equations (2.1.5.11) and (2.1.5.8), respectively, are expected to account for the statistical properties of diffraction patterns obtained from crystals consisting of nearly equal atoms. Such crystals obey the fundamental assumptions of uniformity and independence of the atomic contributions and are not affected by noncrystallographic symmetry and dispersion. It is also assumed there that the number of atoms in the asymmetric unit is large. Distributions of structure-factor magnitudes which are based on the central-limit theorem, and thus obey the above assumptions, have been termed `ideal'. The subjects of the following sections are those distributions for which some of the above assumptions/restrictions are not fulfilled; the latter distributions will be called `nonideal'.
We recall that the assumption of uniformity consists of the requirement that the fractional part of the scalar product $\mathbf{h}\cdot\mathbf{r}_j$ be uniformly distributed over the [0, 1] interval. This holds well if the atomic coordinates are rationally independent (Hauptman & Karle, 1953), and it permits one to regard the atomic contribution to the structure factor as a random variable. This is of course a necessary requirement for any statistical treatment. If, however, the atomic composition of the asymmetric unit is widely heterogeneous, the structure factor is a sum of unequally distributed random variables, and the Lindeberg–Lévy version of the central-limit theorem (cf. Section 2.1.4.4) cannot be expected to apply. Other versions of this theorem might still predict a normal p.d.f. of the sum, but at the expense of a correspondingly large number of terms (atoms). It is well known that atomic heterogeneity gives rise to severe deviations from ideal behaviour (e.g. Howells et al., 1950). One of the aims of crystallographic statistics has therefore been the introduction of a correct dependence on the atomic composition into the nonideal p.d.f.'s [for a review of the early work on nonideal distributions see Srinivasan & Parthasarathy (1976)]. A somewhat less well known fact is that the dependence of the p.d.f.'s of |E| on space-group symmetry becomes more conspicuous as the composition becomes more heterogeneous (e.g. Shmueli, 1979; Shmueli & Wilson, 1981). Hence both the composition and the symmetry dependence of the intensity statistics are of interest. Other problems likewise give rise to nonideal p.d.f.'s: the presence of heavy atoms in (variable) special positions, heterogeneous structures with complete or partial noncrystallographic symmetry, and the presence of outstandingly heavy dispersive scatterers.
The need for theoretical representations of nonideal p.d.f.'s is exemplified in Fig. 2.1.7.1(a). It shows the ideal centric and acentric p.d.f.'s together with a frequency histogram of |E| values, recalculated for a centrosymmetric structure containing a platinum atom in the asymmetric unit (Faggiani et al., 1980). Clearly, the deviation from the Gaussian p.d.f. predicted by the central-limit theorem is here very large, and a comparison with the possible ideal distributions can (in this case) lead to wrong conclusions.
Two general approaches have so far been employed in derivations of nonideal p.d.f.'s which account for the above-mentioned problems. The first is the correction-factor approach, dealt with in the following sections. The second is the more recently introduced Fourier method, to which Section 2.1.8 is dedicated. In what follows, we briefly introduce the mathematical background of the correction-factor approach and apply this formalism to centric and acentric nonideal p.d.f.'s. We then present the numerical values of the moments of the trigonometric structure factor, which permit an approximate evaluation of such p.d.f.'s for all the three-dimensional space groups.
Suppose that p(x) is a p.d.f. which accurately describes the experimental distribution of the random variable x. Here x is related to a sum of random variables and can be assumed to obey (to some approximation) an ideal p.d.f., say $p_0(x)$, based on the central-limit theorem. In the correction-factor approach we seek to represent p(x) as
$$p(x) = p_0(x)\sum_{k=0}^{\infty} a_k f_k(x),$$
where the $a_k$ are coefficients which depend on the cause of the deviation of p(x) from the central-limit theorem approximation, and the $f_k(x)$ are suitably chosen functions of x. A choice of the set $\{f_k(x)\}$ is deemed suitable, if only from a practical point of view, if it allows the cause of the above deviation of p(x) to be conveniently introduced into the expansion coefficients $a_k$. This requirement is satisfied – also from a theoretical point of view – by taking $\{f_k(x)\}$ as a set of polynomials which are orthogonal with respect to the ideal p.d.f., taken as their weight function (e.g. Cramér, 1951). That is, the functions so chosen have to obey the relationship
$$\int_D f_k(x)\,f_l(x)\,p_0(x)\,dx = \delta_{kl},$$
where D is the range of existence of all the functions involved. It can be readily shown that the coefficients $a_k$ are given by
$$a_k = \langle f_k(x)\rangle = \sum_{n=0}^{k} c_{kn}\,\langle x^n\rangle, \qquad (2.1.7.3)$$
where the brackets in equation (2.1.7.3) denote averaging with respect to the unknown p.d.f. p(x), and $c_{kn}$ is the coefficient of the nth power of x in the polynomial $f_k(x)$. The coefficients $a_k$ are thus directly related to the moments of the nonideal distribution and to the coefficients of the powers of x in the orthogonal polynomials. The latter coefficients can be obtained for any weight function that has finite moments, either by the Gram–Schmidt procedure (e.g. Spiegel, 1974) or by direct use of the Szegö determinants (e.g. Cramér, 1951). However, the feasibility of the present approach depends on our ability to obtain the moments $\langle x^n\rangle$ without the knowledge of the nonideal p.d.f., p(x).
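The coefficients $c_{kn}$ and $a_k$ can be computed from moments alone. The NumPy sketch below builds polynomials orthonormal with respect to a weight whose moments are known. It uses the Cholesky factor of the Hankel moment matrix, $H_{ij} = \langle x^{i+j}\rangle_0$, which is algebraically equivalent to Gram–Schmidt applied to 1, x, x², …. It then forms the $a_k$ of equation (2.1.7.3) from the moments of the target distribution. The Cholesky route and the function names are our choices, not the procedure prescribed in the text.

```python
import numpy as np


def orthonormal_polynomial_coefficients(weight_moments, degree: int) -> np.ndarray:
    """Row k holds c_k0 ... c_kk for f_k(x) = sum_n c_kn x**n, orthonormal with respect
    to a weight with moments mu_j = <x**j>_0 (mu_0 = 1)."""
    mu = np.asarray(weight_moments, dtype=float)
    if mu.size < 2 * degree + 1:
        raise ValueError("moments mu_0 ... mu_{2*degree} are required")
    hankel = np.array([[mu[i + j] for j in range(degree + 1)] for i in range(degree + 1)])
    lower = np.linalg.cholesky(hankel)  # hankel = L @ L.T
    # C = L^{-1} is lower triangular and satisfies C @ hankel @ C.T = identity.
    return np.linalg.inv(lower)


def correction_coefficients(coefficients: np.ndarray, target_moments) -> np.ndarray:
    """a_k = sum_n c_kn <x**n>, with <x**n> taken over the non-ideal distribution."""
    mu = np.asarray(target_moments, dtype=float)[: coefficients.shape[1]]
    return coefficients @ mu


if __name__ == "__main__":
    gaussian = [1, 0, 1, 0, 3, 0, 15]  # moments of the standard normal weight
    c = orthonormal_polynomial_coefficients(gaussian, 3)
    print(np.round(c, 4))  # rows are He_k(x)/sqrt(k!)
    print(correction_coefficients(c, [1, 0, 2, 0, 12, 0, 120]))  # target: normal, variance 2
```

For the standard normal weight the rows reproduce $He_k(x)/\sqrt{k!}$; for example, $f_2 = (x^2-1)/\sqrt{2}$. When the target distribution equals the weight, every $a_k$ with k ≥ 1 vanishes. A broader normal target with variance 2 gives $a_2 = 1/\sqrt{2}$.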
We shall summarize here the nonideal centric and acentric distributions of the magnitude of the normalized structure factor E (e.g. Shmueli & Wilson, 1981; Shmueli, 1982). We assume that (i) all the atoms are located in general positions and have rationally independent coordinates, (ii) all the scatterers are dispersionless, and (iii) there is no noncrystallographic symmetry. Arbitrary atomic composition and space-group symmetry are admitted. The appropriate weight functions and the corresponding orthogonal polynomials are
$$p_0^{c}(|E|) = (2/\pi)^{1/2}\exp(-|E|^2/2), \qquad f_k^{c}(|E|) = [(2k)!]^{-1/2}\,He_{2k}(|E|)$$
for the centric case and
$$p_0^{a}(|E|) = 2|E|\exp(-|E|^2), \qquad f_k^{a}(|E|) = L_k(|E|^2) \qquad (2.1.7.4)$$
for the acentric case. Here $He_{2k}$ and $L_k$ are Hermite and Laguerre polynomials, respectively, as defined, for example, by Abramowitz & Stegun (1972). Equations (2.1.7.2), (2.1.7.3) and (2.1.7.4) suffice for the general formulation of the above nonideal p.d.f.'s of |E|. Their full derivation entails (i) the expression of a sufficient number of moments of |E| in terms of absolute moments of the trigonometric structure factor (e.g. Shmueli & Wilson, 1981; Shmueli, 1982) and (ii) calculation of the latter moments for the various symmetries (Wilson, 1978b; Shmueli & Kaldor, 1981, 1983). The notation below is similar to that employed by Shmueli (1982).
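It is easy to confirm numerically that the Hermite and Laguerre functions form orthonormal sets under the two ideal weights. The normalization factor $[(2k)!]^{-1/2}$ for the centric set is our assumption, chosen so that $a_k = \langle f_k\rangle$ holds as in equation (2.1.7.3). The printed equations may use a different normalization. The sketch uses NumPy's `hermite_e` (probabilists' Hermite) and `laguerre` modules with Gauss quadrature. The centric integral over the half-line equals half the full-line integral because the integrand is even. The acentric integral becomes a Gauss–Laguerre integral after the substitution u = |E|².

```python
import math

import numpy as np
from numpy.polynomial import hermite_e, laguerre


def centric_basis(k: int, e_abs):
    """f_k(|E|) = He_{2k}(|E|) / sqrt((2k)!) (normalization assumed for orthonormality)."""
    coef = np.zeros(2 * k + 1)
    coef[2 * k] = 1.0
    return hermite_e.hermeval(e_abs, coef) / math.sqrt(math.factorial(2 * k))


def acentric_basis(k: int, e_abs):
    """f_k(|E|) = L_k(|E|**2)."""
    coef = np.zeros(k + 1)
    coef[k] = 1.0
    return laguerre.lagval(np.asarray(e_abs) ** 2, coef)


def gram_matrices(kmax: int, nodes: int = 40):
    """Gram matrices <f_k f_l> under the two ideal weights, by Gauss quadrature."""
    # Centric: hermegauss integrates against exp(-x**2/2) on (-inf, inf); halve for (0, inf).
    x, w = hermite_e.hermegauss(nodes)
    fc = np.array([centric_basis(k, x) for k in range(kmax + 1)])
    g_centric = 0.5 * math.sqrt(2.0 / math.pi) * (fc * w) @ fc.T
    # Acentric: with u = |E|**2, the weight 2|E| exp(-|E|**2) d|E| becomes exp(-u) du.
    u, v = laguerre.laggauss(nodes)
    fa = np.array([acentric_basis(k, np.sqrt(u)) for k in range(kmax + 1)])
    g_acentric = (fa * v) @ fa.T
    return g_centric, g_acentric


if __name__ == "__main__":
    gc, ga = gram_matrices(5)
    print(np.round(gc, 10))
    print(np.round(ga, 10))
```

Both printed matrices should be the 6 × 6 identity, up to rounding. This is the orthonormality relation required of the $f_k$ in the correction-factor expansion.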
These nonideal p.d.f.'s of |E|, for which the first five expansion terms are available, are given by equations (2.1.7.5) and (2.1.7.6) for centrosymmetric and noncentrosymmetric space groups, respectively. In these equations the ideal centric and acentric p.d.f.'s [see (2.1.7.4)] are multiplied by the correction factors. The unified form of the expansion coefficients for k = 2, 3, 4 and 5 is given by equation (2.1.7.7) (Shmueli, 1982). In that equation, U = 35 or 18, V = 210 or 100 and W = 3150 or 900 according as the centric or the acentric coefficient is required, respectively; the other quantities in equation (2.1.7.7) are described below. The composition-dependent terms in equation (2.1.7.7) are functions of the scattering factors $f_j$ of the m atoms in the asymmetric unit. The symmetry dependence is expressed by the remaining coefficients in equation (2.1.7.7), whose form depends on whether the space group is centrosymmetric or noncentrosymmetric. The normalized moments appearing in equation (2.1.7.9) are obtained from the kth absolute moments of the trigonometric structure factor, which is defined in equation (2.1.7.12). In equation (2.1.7.12), g is the number of general equivalent positions listed in IT A (2005) for the space group in question, times the multiplicity of the Bravais lattice; the index s labels the space-group operators; and $\mathbf{r}$ is an atomic position vector.
The cumulative distribution functions, obtained by integrating equations (2.1.7.5) and (2.1.7.6), are given by equations (2.1.7.13) and (2.1.7.14) for centrosymmetric and noncentrosymmetric space groups, respectively, where the coefficients are defined in equations (2.1.7.7)–(2.1.7.12). Note that the first term on the right-hand side of equation (2.1.7.13) and the first two terms on the right-hand side of equation (2.1.7.14) are just the cumulative distributions derived from the ideal centric and acentric p.d.f.'s in Section 2.1.5.6.
The moments of the trigonometric structure factor were compiled for all the space groups by Wilson (1978b) for the first two even moments, and by Shmueli & Kaldor (1981, 1983) for the first four. These results are presented in Table 2.1.7.1. Closed expressions for the normalized moments were obtained by Shmueli (1982) for the triclinic, monoclinic and orthorhombic space groups, with two exceptions (see Table 2.1.7.2). The composition-dependent terms are most conveniently computed as weighted averages over the ranges of sin θ/λ which were used in the construction of the Wilson plot for the computation of the |E| values.
Note. hkl subsets: (1) ; (2) ; (3) ; (4) ; (5) ; (6) ; (7) ; (8) ; (9) ; (10) ; (11) ; (12) ; (13) ; (14) ; (15) hkl all even; (16) only one index odd; (17) only one index even; (18) hkl all odd; (19) two indices odd; (20) ; (21) .
† And the enantiomorphous space group.


As noted in Section 2.1.8.7 below, the Fourier representation of the probability distribution of |E| is usually much better than the particular orthogonal-function representation discussed in Section 2.1.7.3. Many, perhaps most, nonideal centric distributions look like slight distortions of the ideal (Gaussian) distribution and bear no resemblance to a cosine function, so this empirical observation seems paradoxical. The probable explanation has been pointed out by Wilson (1986b). A truncated Fourier series is a best approximation, in the least-squares sense, to the function represented. The particular orthogonal-function approach used in equation (2.1.7.5), on the other hand, is not a least-squares approximation to p(|E|). It is instead a least-squares approximation to $p(|E|)\,[p_0(|E|)]^{-1/2}$. The usual expansions (often known as Gram–Charlier or Edgeworth) thus give great weight to fitting the distribution of the (comparatively few) strong reflections, at the expense of a poor fit for the (much more numerous) weak-to-medium ones. Presumably, a similar situation exists for the representation of acentric distributions, but this has not been investigated in detail. Since the centric distributions often look nearly Gaussian, one is led to ask whether there is an expansion in orthogonal functions that (i) has the leading term $p_0(|E|)$ and (ii) is a least-squares (as well as an orthogonal-function) fit to p(|E|). One does exist, based on orthogonal functions formed from Hermite polynomials and the Gaussian distribution $p_0$ (Myller-Lebedeff, 1907). Unfortunately, no reasonably simple relationship has been found between its coefficients and readily evaluated properties of p(|E|), and the Myller-Lebedeff expansion has not, as yet, been applied in crystallography. Although Stuart & Ord (1994, p. 112) dismiss it in a three-line footnote, it does have important applications in astronomy (van der Marel & Franx, 1993; Gerhard, 1993).
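Wilson's point can be written out in two lines. The derivation below assumes orthonormal $f_k$, i.e. $\int_D f_k f_l\,p_0\,dx = \delta_{kl}$. Under that assumption, the truncated correction-factor expansion minimizes a squared error weighted by $1/p_0(x)$, and this weight is largest where the ideal density is smallest.

```latex
\begin{aligned}
Q(a_0,\dots,a_K) &= \int_D \frac{\bigl[p(x) - p_0(x)\sum_{k=0}^{K} a_k f_k(x)\bigr]^2}{p_0(x)}\,dx
 = \int_D \Bigl[\frac{p(x)}{\sqrt{p_0(x)}} - \sum_{k=0}^{K} a_k \sqrt{p_0(x)}\,f_k(x)\Bigr]^2 dx ,\\
\frac{\partial Q}{\partial a_l} = 0
 &\;\Longrightarrow\; \sum_{k=0}^{K} a_k \int_D p_0 f_k f_l\,dx = \int_D p\, f_l\,dx
 \;\Longrightarrow\; a_l = \langle f_l(x)\rangle .
\end{aligned}
```

The minimizing coefficients are exactly those of equation (2.1.7.3). The implicit weight $1/p_0$ grows rapidly in the tail of the ideal distribution, that is, for the strong reflections.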
The starting point of the method described in the previous section is the central-limit theorem approximation, and the method consists of finding correction factors which result in better approximations to the actual p.d.f. Conceptually, this is equivalent to improving the approximation of the characteristic function [cf. equation (2.1.4.10)] over that which led to the central-limit theorem result.
The method to be described in this section does not depend on any initial approximation. It will be shown to utilize the dependence of the exact value of the characteristic function on the space-group symmetry, atomic composition and other factors. This approach has its origin in a simple but ingenious observation by Barakat (1974). He noted that if a random variable has lower and upper bounds, then the corresponding p.d.f. can be nonzero only within these bounds. It can therefore be expanded in an ordinary Fourier series and set to zero (identically) outside the bounded interval. Barakat's (1974) work dealt with intensity statistics of laser speckle, where sinusoidal waves are involved, as in the present problem. This method was applied by Weiss & Kiefer (1983) to testing the accuracy of a steepest-descents approximation to the exact solution of the problem of random walk, and its first application to crystallographic intensity statistics soon followed (Shmueli et al., 1984). Crystallographic (e.g. Shmueli & Weiss, 1987; Rabinovich et al., 1991a,b) and noncrystallographic (Shmueli et al., 1985; Shmueli & Weiss, 1985a; Shmueli, Weiss & Wilson, 1989; Shmueli et al., 1990) symmetry was found to be tractable by this approach. So were joint conditional p.d.f.'s of several structure factors (Shmueli & Weiss, 1985b, 1986; Shmueli, Rabinovich & Weiss, 1989). The Fourier method is illustrated below by deriving the exact counterparts of equations (2.1.7.5) and (2.1.7.6) and specifying them for some simple symmetries. We shall then indicate a method of treating higher symmetries and present results which will suffice for evaluation of Fourier p.d.f.'s of |E| for a wide range of space groups.
We assume, as before, that (i) the atomic phase factors [cf. equation (2.1.1.2)] are uniformly distributed on (0, 2π) and (ii) the atomic contributions to the structure factor are independent. For a centrosymmetric space group, with the origin chosen at a centre of symmetry, the random variable is the (real) normalized structure factor E. Its bounds are −a and a, where
$$a = E_{\max} = \sum_{j=1}^{N} f_j \Big/ \Big(\sum_{j=1}^{N} f_j^2\Big)^{1/2}.$$
Here, $E_{\max}$ is the maximum possible value of E, N is the number of atoms in the unit cell and $f_j$ is the conventional scattering factor of the jth atom, including its temperature factor. The p.d.f., p(E), can be nonzero in the range (−a, a) only and can thus be expanded in the Fourier series
$$p(E) = \frac{1}{2a}\sum_{k=-\infty}^{\infty} C_k \exp(-i\pi kE/a).$$
Only the real part of this series is relevant, since p(E) is real. The Fourier coefficients $C_k$ can be obtained in the conventional manner by integrating over the range (−a, a):
$$C_k = \int_{-a}^{a} p(E)\exp(i\pi kE/a)\,dE. \qquad (2.1.8.3)$$
Since, however, p(E) = 0 for E < −a and E > a, it is possible and convenient to replace the limits of integration in equation (2.1.8.3) by infinity. Thus
$$C_k = \int_{-\infty}^{\infty} p(E)\exp(i\pi kE/a)\,dE = \langle\exp(i\pi kE/a)\rangle. \qquad (2.1.8.4)$$
Equation (2.1.8.4) shows that $C_k$ is a Fourier transform of the p.d.f. p(E). As such, it is the value of the corresponding characteristic function at the point $t = \pi k/a$ [i.e., $C_k = C(\pi k/a)$, where the characteristic function is defined by equation (2.1.4.1)]. It is also seen that $C_k$ is the expected value of the exponential $\exp(i\pi kE/a)$. It follows that the feasibility of the present approach depends on one's ability to evaluate the characteristic function in closed form without the knowledge of the p.d.f. This is analogous to the problem of evaluating absolute moments of the structure factor for the correction-factor approach, discussed in Section 2.1.7. Fortunately, in crystallographic applications these calculations are feasible, provided individual isotropic motion is assumed. The formal expression for the p.d.f. of |E|, for any centrosymmetric space group, is therefore
$$p(|E|) = \frac{1}{a}\Big[1 + 2\sum_{k=1}^{\infty} C_k\cos(\pi k|E|/a)\Big], \qquad (2.1.8.5)$$
where use is made of the assumption that p(E) = p(−E), and the Fourier coefficients are evaluated from equation (2.1.8.4).
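The P1̄ case makes a compact worked example. There, with independent, uniformly distributed atomic phases, each coefficient is $C_k = \prod_j J_0(2\pi k n_j/E_{\max})$, with $n_j = f_j/(\sum f^2)^{1/2}$ and $E_{\max} = 2\sum_j n_j$ summed over the asymmetric unit. The sketch below evaluates the series for |E|. It assumes that each atom of the asymmetric unit is listed once in `f_asym` and occurs twice in the cell. The helper `bessel_j` evaluates Bessel's integral with the equally spaced (periodic trapezoidal) rule, which converges very rapidly for this integrand. It is our illustrative substitute for a library Bessel routine, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def bessel_j(order: int, x: float) -> float:
    """J_order(x) from (1/2pi) * integral over (0, 2pi) of cos(order*t - x*sin t) dt,
    evaluated with the periodic trapezoidal rule."""
    npts = 64 + 2 * int(abs(x))
    return sum(math.cos(order * t - x * math.sin(t))
               for t in (2.0 * math.pi * i / npts for i in range(npts))) / npts


def p1bar_fourier_coefficients(f_asym: List[float], n_terms: int = 60) -> Tuple[float, List[float]]:
    """E_max and C_k = prod_j J_0(2 pi k n_j / E_max), k = 1..n_terms, for P-1."""
    norm = math.sqrt(2.0 * sum(f * f for f in f_asym))  # (sum of f_j**2 over the cell)**0.5
    n = [f / norm for f in f_asym]
    e_max = 2.0 * sum(n)
    coeffs = []
    for k in range(1, n_terms + 1):
        c_k = 1.0
        for n_j in n:
            c_k *= bessel_j(0, 2.0 * math.pi * k * n_j / e_max)
        coeffs.append(c_k)
    return e_max, coeffs


def p1bar_pdf(e_abs: float, e_max: float, coeffs: List[float]) -> float:
    """Truncated series p(|E|) = (1/a)[1 + 2 sum_k C_k cos(pi k |E| / a)], with a = E_max."""
    if not 0.0 <= e_abs <= e_max:
        return 0.0
    total = 1.0 + 2.0 * sum(c * math.cos(math.pi * (k + 1) * e_abs / e_max)
                            for k, c in enumerate(coeffs))
    return total / e_max


if __name__ == "__main__":
    a, c = p1bar_fourier_coefficients([1.0] * 20)  # 20 equal atoms in the asymmetric unit
    for e in (0.0, 0.5, 1.0, 2.0):
        print(e, round(p1bar_pdf(e, a, c), 4), round(math.sqrt(2 / math.pi) * math.exp(-e * e / 2), 4))
```

For 20 equal atoms the result should lie close to the ideal centric curve. Replacing one entry of `f_asym` with a much larger value shows the kind of departure from the ideal curve illustrated in Fig. 2.1.7.1. More terms may then be needed before the coefficients become negligible.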
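The p.d.f. of |E| for a noncentrosymmetric space group is obtained by first deriving the joint p.d.f. of the real and imaginary parts of E and then integrating out its phase. The general expression for E is
$$E = A + iB = |E|\exp(i\varphi),$$
where φ is the phase of E. With $a = E_{\max}$ as above, both A and B lie in (−a, a), and the required joint p.d.f. is
$$p(A, B) = \frac{1}{4a^2}\sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} C_{mn}\exp[-i\pi(mA + nB)/a].$$
We now introduce polar coordinates $A = |E|\cos\varphi$, $B = |E|\sin\varphi$ and $m = \rho\cos\psi$, $n = \rho\sin\psi$, where $\rho = (m^2+n^2)^{1/2}$. This gives
$$p(|E|, \varphi) = \frac{|E|}{4a^2}\sum_{m}\sum_{n} C_{mn}\exp[-i\pi\rho|E|\cos(\varphi - \psi)/a]. \qquad (2.1.8.8)$$
Integrating out the phase φ, we obtain
$$p(|E|) = \frac{\pi|E|}{2a^2}\sum_{m=-\infty}^{\infty}\sum_{n=-\infty}^{\infty} C_{mn}\,J_0\Big[\frac{\pi|E|(m^2+n^2)^{1/2}}{a}\Big], \qquad (2.1.8.9)$$
where $J_0$ is the Bessel function of the first kind (e.g. Abramowitz & Stegun, 1972). This is a general form of the p.d.f. of |E| for a noncentrosymmetric space group. The Fourier coefficients are obtained, similarly to the above, as
$$C_{mn} = \langle\exp[i\pi(mA + nB)/a]\rangle. \qquad (2.1.8.10)$$
The average in equation (2.1.8.10), just as that in equation (2.1.8.4), is evaluated in terms of integrals over the appropriate trigonometric structure factors. In terms of the characteristic function C(u, v) for a joint p.d.f. of A and B, the Fourier coefficient in equation (2.1.8.10) is given by $C_{mn} = C(\pi m/a, \pi n/a)$.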
We shall denote the characteristic function by C(t) if it corresponds to a Fourier coefficient of a Fourier series for a centrosymmetric space group. If it corresponds to a Fourier series for a noncentrosymmetric space group, we denote it by C(u, v), or by C(ω, ψ) with $u = \omega\cos\psi$ and $v = \omega\sin\psi$.
Equations (2.1.8.5) and (2.1.8.9) are the exact counterparts of equations (2.1.7.5) and (2.1.7.6), respectively. The computational effort required to evaluate equation (2.1.8.9) is somewhat greater than that for (2.1.8.5), because a double Fourier series has to be summed. The p.d.f. for any noncentrosymmetric space group can be expressed by a double Fourier series. This can be simplified if the characteristic function depends on $(m^2+n^2)^{1/2}$ alone, rather than on m and n separately. In such cases the p.d.f. of |E| for a noncentrosymmetric space group can be expanded in a single Fourier–Bessel series (Barakat, 1974; Weiss & Kiefer, 1983; Shmueli et al., 1984). The general form of this expansion is given by equations (2.1.8.11)–(2.1.8.13). In these equations $J_0$ and $J_1$ are Bessel functions of the first kind, and $\gamma_u$ is the uth root of the equation $J_1(\gamma) = 0$; the atomic contributions to the characteristic function in equation (2.1.8.13) are evaluated at points determined by these roots. The roots $\gamma_u$ are tabulated in the literature (e.g. Abramowitz & Stegun, 1972), but they can be most conveniently computed as follows. The first five roots are
$$\gamma_1 = 3.8317059702,\quad \gamma_2 = 7.0155866698,\quad \gamma_3 = 10.1734681351,\quad \gamma_4 = 13.3236919363,\quad \gamma_5 = 16.4706300509,$$
and the higher ones can be obtained from McMahon's approximation (cf. Abramowitz & Stegun, 1972)
$$\gamma_u \simeq \beta - \frac{3}{8\beta} + \frac{3}{128\beta^3} - \frac{1179}{5120\beta^5} + \frac{1951209}{1146880\beta^7}, \qquad (2.1.8.15)$$
where $\beta = (u + 1/4)\pi$. For u > 5 the values given by equation (2.1.8.15) have a relative error less than $10^{-11}$, so that no refinement of roots of higher orders is needed (Shmueli et al., 1984). Numerical computations of single Fourier–Bessel series are of course faster than those of the double Fourier series, but both representations converge fairly rapidly.
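The root recipe translates directly into a few lines of Python. The sketch below hard-codes the first five roots of $J_1$ and uses McMahon's expansion for ν = 1 beyond them. The expansion's coefficients reduce to the fractions shown in equation (2.1.8.15) when the general formula in Abramowitz & Stegun (9.5.12) is specialized to μ = 4ν² = 4. A small Bessel-integral helper, which is our illustrative stand-in for a library routine, is included only to check that the returned values really are zeros of $J_1$.

```python
import math

# First five positive roots of J_1(gamma) = 0 (tabulated values).
FIRST_FIVE = (3.8317059702, 7.0155866698, 10.1734681351, 13.3236919363, 16.4706300509)


def j1_root(u: int) -> float:
    """u-th positive root of J_1; McMahon's asymptotic expansion (nu = 1) for u > 5."""
    if u < 1:
        raise ValueError("u must be a positive integer")
    if u <= 5:
        return FIRST_FIVE[u - 1]
    beta = (u + 0.25) * math.pi
    return (beta - 3.0 / (8.0 * beta) + 3.0 / (128.0 * beta**3)
            - 1179.0 / (5120.0 * beta**5) + 1951209.0 / (1146880.0 * beta**7))


def bessel_j(order: int, x: float) -> float:
    """J_order(x) by Bessel's integral with the periodic trapezoidal rule (used as a check)."""
    npts = 64 + 2 * int(abs(x))
    return sum(math.cos(order * t - x * math.sin(t))
               for t in (2.0 * math.pi * i / npts for i in range(npts))) / npts


if __name__ == "__main__":
    for u in range(1, 11):
        g = j1_root(u)
        print(u, f"{g:.10f}", f"J1 = {bessel_j(1, g):.1e}")
```

The residuals $J_1(\gamma_u)$ printed for u = 6 onwards should be as small as those for the tabulated roots. This is consistent with the statement that no refinement of the higher roots is needed.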
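Consider the Fourier coefficient of the p.d.f. of |E| for the centrosymmetric space group $P\bar{1}$. The normalized structure factor is given by
$$E = 2\sum_{j=1}^{N/2} n_j\cos\theta_j, \qquad \theta_j = 2\pi\mathbf{h}\cdot\mathbf{r}_j, \qquad n_j = f_j\Big/\Big(\sum_{l=1}^{N} f_l^2\Big)^{1/2},$$
with $a = E_{\max} = 2\sum_{j=1}^{N/2} n_j$, and the Fourier coefficient is
$$C_k = \Big\langle\exp\Big(\frac{2\pi i k}{a}\sum_{j=1}^{N/2} n_j\cos\theta_j\Big)\Big\rangle \qquad (2.1.8.19)$$
$$= \prod_{j=1}^{N/2}\Big\langle\exp\Big(\frac{2\pi i k n_j}{a}\cos\theta_j\Big)\Big\rangle \qquad (2.1.8.20)$$
$$= \prod_{j=1}^{N/2}\Big\{\frac{1}{2\pi}\int_0^{2\pi}\exp\Big(\frac{2\pi i k n_j}{a}\cos\theta\Big)\,d\theta\Big\} \qquad (2.1.8.21)$$
$$= \prod_{j=1}^{N/2} J_0\Big(\frac{2\pi k n_j}{a}\Big). \qquad (2.1.8.22)$$
Equation (2.1.8.20) is obtained from equation (2.1.8.19) if we make use of the assumption of independence. The assumption of uniformity then allows us to rewrite equation (2.1.8.20) as (2.1.8.21), and the expression in the braces in the latter equation is just a definition of the Bessel function $J_0$ (e.g. Abramowitz & Stegun, 1972).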
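Let us now consider the Fourier coefficient of the p.d.f. of |E| for the noncentrosymmetric space group P1. We have
$$A = \sum_{j=1}^{N} n_j\cos\theta_j \quad\text{and}\quad B = \sum_{j=1}^{N} n_j\sin\theta_j,$$
with $a = E_{\max} = \sum_{j=1}^{N} n_j$. These expressions for A and B are substituted in equation (2.1.8.10), resulting in
$$C_{mn} = \Big\langle\exp\Big[\frac{i\pi}{a}\sum_{j=1}^{N} n_j(m\cos\theta_j + n\sin\theta_j)\Big]\Big\rangle \qquad (2.1.8.24)$$
$$= \Big\langle\exp\Big[\frac{i\pi\rho}{a}\sum_{j=1}^{N} n_j\cos(\theta_j - \psi)\Big]\Big\rangle \qquad (2.1.8.25)$$
$$= \prod_{j=1}^{N} J_0\Big[\frac{\pi n_j(m^2+n^2)^{1/2}}{a}\Big]. \qquad (2.1.8.26)$$
Equation (2.1.8.24) leads to (2.1.8.25) by introducing polar coordinates $m = \rho\cos\psi$, $n = \rho\sin\psi$, analogous to those leading to equation (2.1.8.8). Equation (2.1.8.26) is then obtained by making use of the assumptions of independence and uniformity in an analogous manner to that detailed in equations (2.1.8.12)–(2.1.8.22) above.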
The right-hand side of equation (2.1.8.26) is to be used as a Fourier coefficient of the double Fourier series given by (2.1.8.9). Since, however, this coefficient depends on $(m^2+n^2)^{1/2}$ alone rather than on m and n separately, the p.d.f. of |E| for P1 can also be represented by a Fourier–Bessel series [cf. equation (2.1.8.11)] with coefficient
$$\prod_{j=1}^{N} J_0(\gamma_u n_j/a),$$
where $\gamma_u$ is the uth root of the equation $J_1(\gamma) = 0$.
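The P1 case can also be made concrete. The sketch below assumes the series takes the standard Fourier–Bessel (Dini) form on [0, a] for a function whose derivative vanishes at the boundary:
$$p(|E|) = \frac{2|E|}{a^2}\Big[1 + \sum_{u\ge1} D_u\,\frac{J_0(\gamma_u|E|/a)}{J_0^2(\gamma_u)}\Big],$$
with $D_u = \prod_j J_0(\gamma_u n_j/a)$. We derived this arrangement from the orthogonality of $J_0(\gamma_u r/a)$. It may be grouped differently from the printed equations (2.1.8.11)–(2.1.8.13). The roots are refined here by Newton's method from a McMahon-type starting value, rather than taken from equation (2.1.8.15), so that the block is self-contained. All function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def bessel_j(order: int, x: float) -> float:
    """J_order(x) via Bessel's integral and the periodic trapezoidal rule."""
    npts = 64 + 2 * int(abs(x))
    return sum(math.cos(order * t - x * math.sin(t))
               for t in (2.0 * math.pi * i / npts for i in range(npts))) / npts


def j1_roots(count: int) -> List[float]:
    """First `count` positive roots of J_1, by Newton's method from a McMahon-type start."""
    roots = []
    for u in range(1, count + 1):
        beta = (u + 0.25) * math.pi
        x = beta - 3.0 / (8.0 * beta)
        for _ in range(6):
            j1 = bessel_j(1, x)
            x -= j1 / (bessel_j(0, x) - j1 / x)  # J_1'(x) = J_0(x) - J_1(x)/x
        roots.append(x)
    return roots


def p1_fourier_bessel_pdf(f_cell: List[float], n_terms: int = 30) -> Tuple[float, Callable[[float], float]]:
    """E_max and a truncated Fourier-Bessel p.d.f. of |E| for P1.
    f_cell: scattering factors of all atoms in the cell (independent, uniform phases assumed)."""
    norm = math.sqrt(sum(f * f for f in f_cell))
    n = [f / norm for f in f_cell]
    a = sum(n)  # E_max
    gammas = j1_roots(n_terms)
    weights = []
    for g in gammas:
        d_u = 1.0
        for n_j in n:
            d_u *= bessel_j(0, g * n_j / a)  # atomic contribution J_0(gamma_u n_j / a)
        weights.append(d_u / bessel_j(0, g) ** 2)

    def pdf(e_abs: float) -> float:
        if not 0.0 <= e_abs <= a:
            return 0.0
        series = 1.0 + sum(w * bessel_j(0, g * e_abs / a) for w, g in zip(weights, gammas))
        return 2.0 * e_abs / a**2 * series

    return a, pdf


if __name__ == "__main__":
    a, pdf = p1_fourier_bessel_pdf([1.0] * 20)
    for e in (0.5, 1.0, 1.5):
        print(e, round(pdf(e), 4), round(2 * e * math.exp(-e * e), 4))
```

For 20 equal atoms the series should lie close to the ideal acentric curve $2|E|\exp(-|E|^2)$. Structures with one dominant scatterer need more terms before the coefficients $D_u$ become negligible.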
We now illustrate the methodology of deriving characteristic functions for space groups of higher symmetries, following the method of Rabinovich et al. (1991a,b). The derivation is performed for the space group $P\bar{6}$ [No. 174]. According to Table A1.4.3.6, the real and imaginary parts of the normalized structure factor are sums of trigonometric terms whose arguments are not all independent. Three of the angular variables involved sum to zero, i.e., one of these contributions depends on the other two; this is a recurring problem in calculations pertaining to the trigonal and hexagonal systems. For brevity, we write directly the general form of the characteristic function, from which the functional form of the Fourier coefficient can be readily obtained. By the assumption of independence, the characteristic function is a product of atomic contributions. We further employ the assumption of uniformity, while remembering that the angular variables are not independent. The characteristic function can then be written as an average over the angular variables that contains a periodic delta function expressing their dependence, the delta function being introduced through its Fourier representation. Equation (2.1.8.34) is then simplified by a change of the angular variables. The imaginary part of the resulting summation, involving Bessel functions of odd orders, vanishes upon integration, and the integration is restricted to the positive quadrant. Thus, upon replacing cosines by sines (this is permissible at this stage), the atomic contribution to the characteristic function is obtained, and a double Fourier series must be used for the p.d.f.
Expressions for the atomic contributions to the characteristic functions were obtained by Rabinovich et al. (1991a) for a wide range of space groups, by methods similar to those described above. These expressions are collected in Table 2.1.8.1 in terms of symbols that are defined with the aid of a set of abbreviations.
† And the enantiomorphous space group.

The averages appearing in the above summary are, in general, computed as except and which are computed as where is any of the atomic characteristic functions indicated above. The superscripts preceding the symbols in the above summary are appended to the corresponding symbols in Table 2.1.8.1 on their first occurrence.
As pointed out above, the representation of the p.d.f.'s by Fourier series is also applicable to effects of noncrystallographic symmetry. Thus, Shmueli et al. (1985) obtained a Fourier coefficient [expression (2.1.8.41)] for the bicentric distribution, to be used with equation (2.1.8.5). Furthermore, we can use the important property of the characteristic function outlined in Section 2.1.4.1: the characteristic function of a sum of independent variables is the product of their characteristic functions. With it, it is easy to write down the Fourier coefficient for an asymmetric unit containing a centrosymmetric fragment centred at a noncrystallographic centre and a number of atoms not related by symmetry. This Fourier coefficient for the above partially bicentric arrangement is a product of expressions (2.1.8.17) and (2.1.8.41), with the appropriate number of atoms in each factor (Shmueli & Weiss, 1985a). The purely bicentric p.d.f., obtained by using (2.1.8.41) with (2.1.8.5), differs significantly from the ideal bicentric p.d.f. given by equation (2.1.5.13) only when the atomic composition is sufficiently heterogeneous. The above partially bicentric p.d.f., by contrast, appears to be a useful development even for an equal-atom structure.
The problem of the coexistence of several noncrystallographic centres of symmetry within the asymmetric unit of $P\bar{1}$, and its effect on the p.d.f. of |E|, was examined by Shmueli, Weiss & Wilson (1989) by the Fourier method. The latter study indicates that the strongest effect is produced by the presence of a single noncrystallographic centre.
Another kind of noncrystallographic symmetry is that arising from the presence of centrosymmetric fragments in a noncentrosymmetric structure – the subcentric arrangement already discussed in Section 2.1.5.4. A Fourier-series representation of a nonideal p.d.f. corresponding to this case was developed by Shmueli, Rabinovich & Weiss (1989). It was also applied to the mathematically equivalent effects of dispersion and of the presence of heavy scatterers in centrosymmetric special positions in a noncentrosymmetric space group.
A variety of other nonideal p.d.f.'s occur when heavy atoms are present in special positions (Shmueli & Weiss, 1988). Without going into the details of this development, it can be noted that the atoms may be distributed among k types of Wyckoff positions. In that case, the characteristic function corresponding to the p.d.f. of |E| is a product of k characteristic functions, each of which is related to one of these special positions. The same property of the characteristic function as that in Section 2.1.4.1 is utilized here.
The need for theoretical nonideal distributions was exemplified by Fig. 2.1.7.1(a), referred to above. The performance of the two approaches described above, for this particular example, is shown in Fig. 2.1.7.1(b). Briefly, the Fourier p.d.f. shows excellent agreement with the histogram of recalculated |E| values. The agreement attained by the Hermite correction factor is much less satisfactory, even for the five-term expansion (the longest available to us). It must be pointed out that (i) the inadequacy of `short' correction factors, in the example shown, is due to the large deviation from the ideal behaviour and (ii) the number of terms used there in the Fourier summation is twenty, whereafter the summation is terminated. Obviously, the computation of twenty (or more) Fourier coefficients is easier than that of five terms in the correction factor. The convergence of the Fourier series is very satisfactory. It appears that the (analytically) exact Fourier approach is the preferred one in cases of large or intermediate deviations, while the correction-factor approach may cope well with small ones. As far as the availability of symmetry-dependent centric and acentric p.d.f.'s is concerned, correction factors are available for all the space groups (see Table 2.1.7.1). Fourier coefficients of p.d.f.'s are available for the first 206 space groups (see Table 2.1.8.1). It should be pointed out that p.d.f.'s based on the correction-factor method cope very well with the cubic symmetries not covered by Table 2.1.8.1, even if the asymmetric unit of the space group is strongly heterogeneous (Rabinovich et al., 1991b).
Both approaches described in this section are related to the characteristic function of the required p.d.f. The correction-factor p.d.f.'s (2.1.7.5) and (2.1.7.6) can be obtained by expanding the logarithm of the appropriate characteristic function in a series of cumulants [e.g. equation (2.1.4.13); see also Shmueli & Wilson (1982)], truncating the series and performing its term-by-term Fourier inversion. The Fourier p.d.f., on the other hand, is computed by forming a Fourier series whose coefficients are exact analytical forms of the characteristic function at points related to the summation indices [e.g. equations (2.1.8.5), (2.1.8.9) and (2.1.8.11), and Table 2.1.8.1]. The series is truncated when the terms become small enough.
References
Abramowitz, M. & Stegun, I. A. (1972). Handbook of mathematical functions. New York: Dover.
Barakat, R. (1974). First-order statistics of combined random sinusoidal waves with application to laser speckle patterns. Opt. Acta, 21, 903–921.
Bernstein, S. (1922). Sur le théorème limite du calcul des probabilités. Math. Ann. 85, 237–241.
Bernstein, S. (1927). Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann. 97, 1–59.
Cramér, H. (1951). Mathematical methods of statistics. Princeton University Press.
Faggiani, R., Lippert, B. & Lock, C. J. L. (1980). Heavy transition metal complexes of biologically important molecules. 4. Crystal and molecular structure of pentahydroxonium chloro(uracilato-N(1))(ethylenediamine)platinum(II) chloride (H₅O₂)[PtCl(NH₂CH₂CH₂NH₂)(C₄H₅N₂O₂)]Cl, and chloro(thyminato-N(1))(ethylenediamine)platinum(II), PtCl(NH₂CH₂CH₂NH₂)(C₅H₅N₂O₂). Inorg. Chem. 19, 295–300.
French, S. & Wilson, K. (1978). On the treatment of negative intensity observations. Acta Cryst. A34, 517–525.
Gerhard, O. E. (1993). Line-of-sight velocity profiles in spherical galaxies: breaking the degeneracy between anisotropy and mass. Mon. Not. R. Astron. Soc. 265, 213–230.
Giacovazzo, C. (1977). On different probabilistic approaches to quartet theory. Acta Cryst. A33, 50–54.
Giacovazzo, C. (1980). Direct methods in crystallography. London: Academic Press.
Harker, D. (1953). The meaning of the average of for large values of interplanar spacing. Acta Cryst. 6, 731–736.
Hauptman, H. & Karle, J. (1953). Solution of the phase problem. I. The centrosymmetric crystal. Am. Crystallogr. Assoc. Monograph No. 3. Dayton, Ohio: Polycrystal Book Service.
Howells, E. R., Phillips, D. C. & Rogers, D. (1950). The probability distribution of X-ray intensities. II. Experimental investigation and the X-ray detection of centers of symmetry. Acta Cryst. 3, 210–214.
International Tables for Crystallography (2005). Vol. A. Space-group symmetry, edited by Th. Hahn. Heidelberg: Springer.
International Tables for Crystallography (2004). Vol. C. Mathematical, physical and chemical tables, edited by E. Prince. Dordrecht: Kluwer Academic Publishers.
Kendall, M. & Stuart, A. (1977). The advanced theory of statistics, Vol. 1, 4th ed. London: Griffin.
Lipson, H. & Woolfson, M. M. (1952). An extension of the use of intensity statistics. Acta Cryst. 5, 680–682.
Lomer, T. R. & Wilson, A. J. C. (1975). Scaling of intensities. Acta Cryst. B31, 646–647.
Marel, R. P. van der & Franx, M. (1993). A new method for the identification of non-Gaussian line profiles in elliptical galaxies. Astrophys. J. 407, 525–539.
Myller-Lebedeff, W. (1907). Die Theorie der Integralgleichungen in Anwendung auf einige Reihenentwicklungen. Math. Ann. 64, 388–416.
Nigam, G. D. (1972). On the compensation of X-ray intensity. Indian J. Pure Appl. Phys. 10, 655–656.
Nigam, G. D. & Wilson, A. J. C. (1980). Compensation of excess intensity in space group P2. Acta Cryst. A36, 832–833.
Rabinovich, D. & Shakked, Z. (1984). A new approach to structure determination of large molecules by multidimensional search methods. Acta Cryst. A40, 195–200.
Rabinovich, S., Shmueli, U., Stein, Z., Shashua, R. & Weiss, G. H. (1991a). Exact random-walk models in crystallographic statistics. VI. P.d.f.'s of for all plane groups and most space groups. Acta Cryst. A47, 328–335.
Rabinovich, S., Shmueli, U., Stein, Z., Shashua, R. & Weiss, G. H. (1991b). Exact random-walk models in crystallographic statistics. VII. An all-space-group study of the effects of atomic heterogeneity on the p.d.f.'s of . Acta Cryst. A47, 336–340.
Rayleigh, Lord (1879). Investigations in optics with special reference to the spectroscope. Philos. Mag. 8, 261–274.
Rogers, D. (1950). The probability distribution of X-ray intensities. IV. New methods of determining crystal classes and space groups. Acta Cryst. 3, 455–464.
Rogers, D. & Wilson, A. J. C. (1953). The probability distribution of X-ray intensities. V. A note on some hypersymmetric distributions. Acta Cryst. 6, 439–449.
Shmueli, U. (1979). Symmetry- and composition-dependent cumulative distributions of the normalized structure amplitude for use in intensity statistics. Acta Cryst. A35, 282–286.
Shmueli, U. (1982). A study of generalized intensity statistics: extension of the theory and practical examples. Acta Cryst. A38, 362–371.
Shmueli, U. & Kaldor, U. (1981). Calculation of even moments of the trigonometric structure factor. Methods and results. Acta Cryst. A37, 76–80.
Shmueli, U. & Kaldor, U. (1983). Moments of the trigonometric structure factor. Acta Cryst. A39, 615–621.
Shmueli, U., Rabinovich, S. & Weiss, G. H. (1989). Exact conditional distribution of a three-phase invariant in the space group P1. I. Derivation and simplification of the Fourier series. Acta Cryst. A45, 361–367.
Shmueli, U., Rabinovich, S. & Weiss, G. H. (1990). Exact random-walk models in crystallographic statistics. V. Non-symmetrically bounded distributions of structure-factor magnitudes. Acta Cryst. A46, 241–246.
Shmueli, U. & Weiss, G. H. (1985a). Centric, bicentric and partially bicentric intensity statistics. In Structure and statistics in crystallography, edited by A. J. C. Wilson, pp. 53–66. Guilderland: Adenine Press.
Shmueli, U. & Weiss, G. H. (1985b). Exact joint probability distributions for centrosymmetric structure factors. Derivation and application to the Σ₁ relationship in the space group . Acta Cryst. A41, 401–408.
Shmueli, U. & Weiss, G. H. (1986). Exact joint distribution of Eₕ, Eₖ and Eₕ₊ₖ, and the probability for the positive sign of the triple product in the space group . Acta Cryst. A42, 240–246.
Shmueli, U. & Weiss, G. H. (1987). Exact random-walk models in crystallographic statistics. III. Distributions of for space groups of low symmetry. Acta Cryst. A43, 93–98.
Shmueli, U. & Weiss, G. H. (1988). Exact random-walk models in crystallographic statistics. IV. P.d.f.'s of allowing for atoms in special positions. Acta Cryst. A44, 413–417.
Shmueli, U., Weiss, G. H. & Kiefer, J. E. (1985). Exact random-walk models in crystallographic statistics. II. The bicentric distribution in the space group . Acta Cryst. A41, 55–59.
Shmueli, U., Weiss, G. H., Kiefer, J. E. & Wilson, A. J. C. (1984). Exact random-walk models in crystallographic statistics. I. Space groups and . Acta Cryst. A40, 651–660.
Shmueli, U., Weiss, G. H. & Wilson, A. J. C. (1989). Explicit Fourier representations of non-ideal hypercentric p.d.f.'s of . Acta Cryst. A45, 213–217.
Shmueli, U. & Wilson, A. J. C. (1981). Effects of space-group symmetry and atomic heterogeneity on intensity statistics. Acta Cryst. A37, 342–353.
Shmueli, U. & Wilson, A. J. C. (1982). Intensity statistics: non-ideal distributions in theory and practice. In Crystallographic statistics: progress and problems, edited by S. Ramaseshan, M. F. Richardson & A. J. C. Wilson, pp. 83–97. Bangalore: Indian Academy of Sciences.
Shmueli, U. & Wilson, A. J. C. (1983). Generalized intensity statistics: the subcentric distribution and effects of dispersion. Acta Cryst. A39, 225–233.
Spiegel, M. R. (1974). Theory and problems of Fourier analysis. Schaum's Outline Series. New York: McGraw-Hill.
Srinivasan, R. & Parthasarathy, S. (1976). Some statistical applications in X-ray crystallography. Oxford: Pergamon Press.
Stuart, A. & Ord, K. (1994). Kendall's advanced theory of statistics. Vol. 1. Distribution theory, 6th ed. London: Edward Arnold.
Weiss, G. H. & Kiefer, J. E. (1983). The Pearson random walk with unequal step sizes. J. Phys. A, 16, 489–495.
Wilson, A. J. C. (1942). Determination of absolute from relative intensity data. Nature (London), 150, 151–152.
Wilson, A. J. C. (1949). The probability distribution of X-ray intensities. Acta Cryst. 2, 318–320.
Wilson, A. J. C. (1950). The probability distribution of X-ray intensities. III. Effects of symmetry elements on zones and rows. Acta Cryst. 3, 258–261.
Wilson, A. J. C. (1952). Hypercentric and hyperparallel distributions of X-ray intensities. Research (London), 5, 588–589.
Wilson, A. J. C. (1956). The probability distribution of X-ray intensities. VII. Some sesquicentric distributions. Acta Cryst. 9, 143–144.
Wilson, A. J. C. (1964). The probability distribution of X-ray intensities. VIII. A note on compensation for excess average intensity. Acta Cryst. 17, 1591–1592.
Wilson, A. J. C. (1975). Effect of neglect of dispersion on apparent scale and temperature factors. In Anomalous scattering, edited by S. Ramaseshan & S. C. Abrahams, pp. 325–332. Copenhagen: Munksgaard.
Wilson, A. J. C. (1976). Statistical bias in least-squares refinement. Acta Cryst. A32, 994–996.
Wilson, A. J. C. (1978a). On the probability of measuring the intensity of a reflection as negative. Acta Cryst. A34, 474–475.
Wilson, A. J. C. (1978b). Variance of X-ray intensities: effect of dispersion and higher symmetries. Acta Cryst. A34, 986–994.
Wilson, A. J. C. (1978c). Statistical bias in scaling factors: Erratum. Acta Cryst. B34, 1749.
Wilson, A. J. C. (1979). Problems of resolution and bias in the experimental determination of the electron density and other densities in crystals. Acta Cryst. A35, 122–130.
Wilson, A. J. C. (1980a). Relationship between 'observed' and 'true' intensity: effects of various counting modes. Acta Cryst. A36, 929–936.
Wilson, A. J. C. (1980b). Effect of dispersion on the probability distribution of X-ray reflections. Acta Cryst. A36, 945–946.
Wilson, A. J. C. (1981). Can intensity statistics accommodate stereochemistry? Acta Cryst. A37, 808–810.
Wilson, A. J. C. (1986a). Distributions of sums and ratios of sums of intensities. Acta Cryst. A42, 334–339.
Wilson, A. J. C. (1986b). Fourier versus Hermite representations of probability distributions. Acta Cryst. A42, 81–83.
Wilson, A. J. C. (1987a). Treatment of enhanced zones and rows in normalizing intensities. Acta Cryst. A43, 250–252.
Wilson, A. J. C. (1987b). Functional form of the ideal hypersymmetric distributions of structure factors. Acta Cryst. A43, 554–556.
Wilson, A. J. C. (1993). Space groups rare for organic structures. III. Symmorphism and inherent symmetry. Acta Cryst. A49, 795–806.