International Tables for Crystallography, Volume C: Mathematical, physical and chemical tables. Edited by E. Prince. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. C, ch. 8.4, pp. 702–706
https://doi.org/10.1107/97809553602060000612
Chapter 8.4. Statistical significance tests
Chapter 8.4 introduces the $\chi^{2}$ distribution and shows how it can be used to assess whether a model produced by a least-squares fit is consistent with the data. The F distribution, the distribution of the ratio of two independent random variables that both have $\chi^{2}$ distributions, is derived. The F distribution and two others derived from it, Student's t distribution and Hamilton's R-factor ratio distribution, can be used to decide whether one model is significantly better than another. The projection matrix is defined and the concept of leverage is introduced. The projection matrix can be used to determine which data points have the most influence in determining particular refined parameters.
In Chapter 8.1, we discussed the method of least squares and procedures for estimating the values of the adjustable parameters of a model that predicts the mean of a population from which experimental observations are drawn at random. Any model, however, will have some set of parameter values that gives the best least-squares fit. We must now address the question of whether that best fit is adequate, that is, whether it is plausible, given the precision of the data, to accept the hypothesis that the model really is a correct representation of the phenomena that have been measured in the collection of the data. In this chapter, we discuss the probability density function for the sum of squared residuals if the individual residuals are drawn from a normal distribution, the $\chi^{2}$ distribution, and the conditions under which this p.d.f. may be assumed to approximate a practical case. Next, we discuss the F distribution, which is the distribution of the ratio of two independent random variables, each of which has a $\chi^{2}$ distribution, and its use in comparing the fits of constrained and unconstrained versions of a model. We also discuss a test that is useful for a more general comparison of models. Finally, we discuss the variation among data points of their effectiveness in improving the precision of parameter estimates and the application of this analysis to the optimum design of experiments.
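We have seen [equation (8.1.2.1)] that the least-squares estimate is derived by finding the minimum value of a sum of terms of the form
$$R_i = w_i [y_i - M_i(\mathbf{x})]^{2}, \qquad (8.4.1.1)$$
and, further, that the precision of the estimate is optimized if the weight, $w_i$, is the reciprocal of the variance of the population from which the observation is drawn, $w_i = 1/\sigma_i^{2}$. Using this relation, (8.4.1.1) can be written
$$R_i = \{[y_i - M_i(\mathbf{x})]/\sigma_i\}^{2}. \qquad (8.4.1.2)$$
Each term is the square of a difference between observed and calculated values, expressed as a fraction of the standard uncertainty of the observed value. But, by definition,
$$\sigma_i^{2} = \langle [y_i - M_i(\mathbf{x})]^{2} \rangle, \qquad (8.4.1.3)$$
where $\mathbf{x}$ has its unknown `correct' value, so that $\langle R_i \rangle = 1$, and the expected value of the sum of n such terms is n. It can be shown (Draper & Smith, 1981) that each parameter estimated reduces this expected sum by one, so that, for p estimated parameters,
$$\langle S \rangle = \left\langle \sum_{i=1}^{n} \{[y_i - M_i(\hat{\mathbf{x}})]/\sigma_i\}^{2} \right\rangle = n - p, \qquad (8.4.1.4)$$
where $\hat{\mathbf{x}}$ is the least-squares estimate. The standard uncertainty of an observation of unit weight, also referred to as the goodness-of-fit parameter, is defined by
$$G = \left[\frac{S}{n - p}\right]^{1/2} = \left\{\frac{\sum_{i=1}^{n} w_i [y_i - M_i(\hat{\mathbf{x}})]^{2}}{n - p}\right\}^{1/2}. \qquad (8.4.1.5)$$
From (8.4.1.4), it follows that $\langle G \rangle = 1$ for a correct model with weights assigned in accordance with (8.4.1.2).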
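The goodness-of-fit parameter can be illustrated with a minimal sketch. The function name `goodness_of_fit` and the sample numbers are hypothetical, chosen only for illustration; the weights are assumed to be the reciprocal variances of the observations.

```python
import math

def goodness_of_fit(y_obs, y_calc, sigma, n_params):
    """Return (S, G): the weighted sum of squared residuals and the
    goodness-of-fit parameter, assuming weights w_i = 1/sigma_i**2."""
    n = len(y_obs)
    if n <= n_params:
        raise ValueError("need more observations than refined parameters")
    # Each term is a residual expressed in units of its standard uncertainty.
    s = sum(((yo - yc) / sg) ** 2 for yo, yc, sg in zip(y_obs, y_calc, sigma))
    g = math.sqrt(s / (n - n_params))
    return s, g

# Hypothetical data: five observations, a two-parameter model.
y_obs = [1.1, 1.9, 3.2, 3.9, 5.1]
y_calc = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma = [0.1] * 5
S, G = goodness_of_fit(y_obs, y_calc, sigma, n_params=2)
print(S, G)
```

With so few degrees of freedom the value of G is only weakly informative; the sketch is meant to show the bookkeeping, not a realistic refinement.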
A value of G that is close to one, if the weights have been assigned by $w_i = 1/\sigma_i^{2}$, is an indicator that the model is consistent with the data. It should be noted that it is not necessarily an indicator that the model is `correct', because it does not rule out the existence of an alternative model that fits the data as well or better. An assessment of the adequacy of the fit of a given model depends, however, on what is meant by `close to one', which depends in turn on the spread of a probability density function for G. We saw in Chapter 8.1 that least squares with this weighting scheme would give the best linear unbiased estimate of the model parameters, with no restrictions on the p.d.f.s of the populations from which the observations are drawn except for the implicit assumption that the variances of these p.d.f.s are finite. To construct a p.d.f. for G, however, it is necessary to make an assumption about the shapes of the p.d.f.s for the observations. The usual assumption is that these p.d.f.s can be described by the normal p.d.f.,
$$\Phi(x) = (2\pi\sigma^{2})^{-1/2} \exp[-(x - \mu)^{2}/(2\sigma^{2})]. \qquad (8.4.1.6)$$
The justification for this assumption comes from the central-limit theorem, which states that, under rather broad conditions, the p.d.f. of the arithmetic mean of n observations drawn from a population with mean $\mu$ and variance $\sigma^{2}$ tends, for large n, to a normal distribution with mean $\mu$ and variance $\sigma^{2}/n$. [For a discussion of the central-limit theorem, see Cramér (1951).]
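If we make the assumption of a normal distribution of errors and make the substitution $z = (x - \mu)/\sigma$, (8.4.1.6) becomes
$$\Phi(z) = (2\pi)^{-1/2} \exp(-z^{2}/2). \qquad (8.4.1.7)$$
The probability that $z^{2}$ will be less than $\chi^{2}$ is equal to the probability that z will lie in the interval $-\chi \le z \le \chi$, or
$$\Psi(\chi^{2}) = \int_{-\chi}^{\chi} \Phi(z)\,dz. \qquad (8.4.1.8)$$
Letting $\zeta = z^{2}$ and substituting in (8.4.1.7), this becomes
$$\Psi(\chi^{2}) = (2\pi)^{-1/2} \int_{0}^{\chi^{2}} \zeta^{-1/2} \exp(-\zeta/2)\,d\zeta. \qquad (8.4.1.9)$$
$\Phi(\chi^{2}) = d\Psi(\chi^{2})/d(\chi^{2})$, so that
$$\Phi(\chi^{2}) = (2\pi\chi^{2})^{-1/2} \exp(-\chi^{2}/2) \ \ (\chi^{2} > 0), \qquad \Phi(\chi^{2}) = 0 \ \ (\chi^{2} \le 0). \qquad (8.4.1.10)$$
The joint p.d.f. of the squares of two random variables, $z_1^{2}$ and $z_2^{2}$, drawn independently from the same population with a normal p.d.f. is
$$\Phi(z_1^{2}, z_2^{2}) = \frac{1}{2\pi} (z_1^{2} z_2^{2})^{-1/2} \exp[-(z_1^{2} + z_2^{2})/2], \qquad (8.4.1.11)$$
and the p.d.f. of the sum, $\chi^{2}$, of these two terms is the integral over the joint p.d.f. of all pairs of $z_1^{2}$ and $z_2^{2}$ that add up to $\chi^{2}$:
$$\Phi(\chi^{2}) = \frac{\exp(-\chi^{2}/2)}{2\pi} \int_{0}^{\chi^{2}} [z_1^{2}(\chi^{2} - z_1^{2})]^{-1/2}\,d(z_1^{2}). \qquad (8.4.1.12)$$
This integral can be evaluated by use of the gamma and beta functions. The gamma function is defined for positive real x by
$$\Gamma(x) = \int_{0}^{\infty} t^{x-1} \exp(-t)\,dt. \qquad (8.4.1.13)$$
Although this function is continuous for all $x > 0$, its value is of interest in the context of this analysis only for x equal to positive, integral multiples of 1/2. It can be shown that $\Gamma(1/2) = \pi^{1/2}$, $\Gamma(1) = 1$, and $\Gamma(x + 1) = x\Gamma(x)$. It follows that, for a positive integer, n, $\Gamma(n) = (n - 1)!$, and that $\Gamma(3/2) = \pi^{1/2}/2$, $\Gamma(5/2) = 3\pi^{1/2}/4$, etc. The beta function is defined by
$$B(x, y) = \int_{0}^{1} t^{x-1} (1 - t)^{y-1}\,dt. \qquad (8.4.1.14)$$
It can be shown (Prince, 1994) that $B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x + y)$. Making the substitution $t = z_1^{2}/\chi^{2}$, (8.4.1.12) becomes
$$\Phi(\chi^{2}) = \frac{\exp(-\chi^{2}/2)}{2\pi} B(1/2, 1/2) = \tfrac{1}{2} \exp(-\chi^{2}/2), \qquad \chi^{2} \ge 0. \qquad (8.4.1.15)$$
By a similar procedure, it can be shown that, if $\chi^{2}$ is the sum of $\nu$ terms, $z_1^{2}$, $z_2^{2}$, $z_3^{2}$, $\ldots$, $z_\nu^{2}$, where all are drawn independently from a population with the p.d.f. given in (8.4.1.10), $\chi^{2}$ has the p.d.f.
$$\Phi(\chi^{2}) = [2^{\nu/2}\,\Gamma(\nu/2)]^{-1} (\chi^{2})^{\nu/2 - 1} \exp(-\chi^{2}/2) \ \ (\chi^{2} > 0), \qquad \Phi(\chi^{2}) = 0 \ \ (\chi^{2} \le 0). \qquad (8.4.1.16)$$
The parameter $\nu$ is known as the number of degrees of freedom, but this use of that term must not be confused with the conventional use in physics and chemistry. The p.d.f. in (8.4.1.16) is the chi-squared distribution with $\nu$ degrees of freedom. Table 8.4.1.1 gives the values of $\chi^{2}/\nu$ for which the cumulative distribution function (c.d.f.) $\Psi(\chi^{2}, \nu)$ has various values for various choices of $\nu$. This table is provided to enable verification of computer codes that may be used to generate more extensive tables. It was generated using a program included in the statistical library DATAPAC (Filliben, unpublished). Fortran code for this program appears in Prince (1994).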
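The chi-squared p.d.f. and c.d.f. can be evaluated numerically with a short sketch. The function names are hypothetical, and the c.d.f. here uses the power-series expansion of the regularized incomplete gamma function, which is an assumption of this sketch and not necessarily the method used by the DATAPAC program mentioned above.

```python
import math

def chi2_pdf(x, nu):
    """Chi-squared p.d.f. with nu degrees of freedom (zero for x <= 0)."""
    if x <= 0.0:
        return 0.0
    return (x ** (nu / 2.0 - 1.0) * math.exp(-x / 2.0)
            / (2.0 ** (nu / 2.0) * math.gamma(nu / 2.0)))

def chi2_cdf(x, nu):
    """Chi-squared c.d.f. via the series
    P(a, z) = exp(-z) z**a / Gamma(a) * sum_k z**k / (a (a+1) ... (a+k)),
    with a = nu/2 and z = x/2. Convergence is slow for very large x."""
    if x <= 0.0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-17 * total and k < 100000:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

# Spot checks against the closed forms for nu = 2 and nu = 1.
print(chi2_pdf(3.0, 2), 0.5 * math.exp(-1.5))
print(chi2_cdf(3.0, 2), 1.0 - math.exp(-1.5))
print(chi2_cdf(2.0, 1), math.erf(1.0))
```

For two degrees of freedom the p.d.f. reduces to the exponential form derived in the text, which gives a convenient closed-form check on any implementation.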

The quantity $(n - p)G^{2}$ is the sum of n terms that have mean value $(n - p)/n$. Because the process of determining the least-squares fit establishes p relations among them, however, only $(n - p)$ of the terms are independent. The number of degrees of freedom is therefore $\nu = (n - p)$, and, if the model is correct, and the terms have been properly weighted, $\chi^{2} = (n - p)G^{2}$ has the chi-squared distribution with $(n - p)$ degrees of freedom. In crystallography, the number of degrees of freedom tends to be large, and the p.d.f. for G correspondingly sharp, so that even rather small deviations from $G^{2} = 1$ should cause one or both of the hypotheses of a correct model and appropriate weights to be rejected. It is common practice to assume that the model is correct, and that the weights have correct relative values, that is, that they have been assigned by $w_i = k/\sigma_i^{2}$, where k is a number different from, usually greater than, one. $G^{2}$ is then taken to be an estimate of k, and all elements of $(\mathbf{A}^{T}\mathbf{W}\mathbf{A})^{-1}$ (Section 8.1.2) are multiplied by $G^{2}$ to get an estimated variance–covariance matrix. The range of validity of this procedure is limited at best. It is discussed further in Chapter 8.5.
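How sharp the p.d.f. becomes can be estimated from the first two moments of the chi-squared distribution. The following worked step is a supplementary illustration, not part of the original derivation; it uses the standard result that a chi-squared variable with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$, and the value $\nu = 2000$ is a hypothetical example.

```latex
\langle G^{2} \rangle = \frac{\langle \chi^{2} \rangle}{\nu} = 1,
\qquad
\sigma(G^{2}) = \frac{(2\nu)^{1/2}}{\nu} = \left(\frac{2}{\nu}\right)^{1/2}.
% Example: \nu = n - p = 2000 gives \sigma(G^{2}) = (0.001)^{1/2} \approx 0.032,
% so a value such as G^{2} = 1.2 lies roughly six standard deviations from 1.
```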
Consider an unconstrained model with p parameters and a constrained one with q parameters, where $q < p$. We wish to decide whether the constrained model represents an adequate fit to the data, or if the additional parameters in the unconstrained model provide, in some important sense, a better fit to the data. Provided the $(p - q)$ additional columns of the design matrix, $\mathbf{A}$, are linearly independent of the previous q columns, the sum of squared residuals must be reduced by some finite amount by adjusting the additional parameters, but we must decide whether this improved fit would have occurred purely by chance, or whether it represents additional information.
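Let $S_c$ and $S_u$ be the weighted sums of squared residuals for the constrained and unconstrained models, respectively. If the constrained and unconstrained models are equally good representations of the data, and the weights have been assigned by $w_i = 1/\sigma_i^{2}$, the expected values of the sums of squares are $\langle S_c \rangle = n - q$ and $\langle S_u \rangle = n - p$, and, further, they should be distributed as $\chi^{2}$ with $(n - q)$ and $(n - p)$ degrees of freedom, respectively. Also, $\langle S_c - S_u \rangle = p - q$, and $(S_c - S_u)$ is distributed as $\chi^{2}$ with $(p - q)$ degrees of freedom. $S_c$ and $S_u$ are not independent, but $(S_c - S_u)$ is the squared magnitude of a vector in a $(p - q)$-dimensional subspace that is orthogonal to the $(n - p)$-dimensional space of $S_u$. Therefore, $(S_c - S_u)$ and $S_u$ are independent random variables, each with a $\chi^{2}$ distribution. Let $\chi_1^{2} = S_c - S_u$, $\chi_2^{2} = S_u$, $\nu_1 = p - q$, and $\nu_2 = n - p$. The ratio $F = (\chi_1^{2}/\nu_1)/(\chi_2^{2}/\nu_2)$ should have a value close to one, even if the weights have relative rather than absolute values, but we need a measure of how far away from one this ratio can be before we must reject the hypothesis that the two models are equally good representations of the data. The conditional p.d.f. for F, given a value of $\chi_2^{2}$, is
$$\Phi_C(F \mid \chi_2^{2}) = \frac{\nu_1 \chi_2^{2}}{\nu_2}\,\frac{(\nu_1 \chi_2^{2} F/\nu_2)^{\nu_1/2 - 1}}{2^{\nu_1/2}\,\Gamma(\nu_1/2)} \exp\left(-\frac{\nu_1 \chi_2^{2} F}{2\nu_2}\right),$$
and the marginal p.d.f. for $\chi_2^{2}$ is
$$\Phi_M(\chi_2^{2}) = \frac{(\chi_2^{2})^{\nu_2/2 - 1}}{2^{\nu_2/2}\,\Gamma(\nu_2/2)} \exp(-\chi_2^{2}/2).$$
The marginal p.d.f. for F is obtained by integration of the joint p.d.f., $\Phi_C(F \mid \chi_2^{2})\,\Phi_M(\chi_2^{2})$, over all values of $\chi_2^{2}$, yielding the result
$$\Phi(F, \nu_1, \nu_2) = \frac{\Gamma[(\nu_1 + \nu_2)/2]}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{F^{\nu_1/2 - 1}}{(1 + \nu_1 F/\nu_2)^{(\nu_1 + \nu_2)/2}}.$$
This p.d.f. is known as the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom. Table 8.4.2.1 gives the values of F for which the c.d.f. $\Psi(F, \nu_1, \nu_2)$ is equal to 0.95 for various choices of $\nu_1$ and $\nu_2$. Fortran code for the program from which the table was generated appears in Prince (1994).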
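The F ratio and its p.d.f. can be computed as in the following sketch. The function names and the sample sums of squares are hypothetical; `s_c` and `s_u` stand for the weighted sums of squared residuals of the constrained and unconstrained models, and the p.d.f. is evaluated through logarithms of gamma functions to avoid overflow when the numbers of degrees of freedom are large.

```python
import math

def f_ratio(s_c, s_u, n, p, q):
    """F ratio for a constrained (q parameters) versus an unconstrained
    (p parameters) model fitted to n observations."""
    nu1, nu2 = p - q, n - p
    return ((s_c - s_u) / nu1) / (s_u / nu2)

def f_pdf(f, nu1, nu2):
    """p.d.f. of the F distribution with nu1 and nu2 degrees of freedom."""
    if f <= 0.0:
        return 0.0
    log_pdf = (math.lgamma((nu1 + nu2) / 2.0)
               - math.lgamma(nu1 / 2.0) - math.lgamma(nu2 / 2.0)
               + (nu1 / 2.0) * math.log(nu1 / nu2)
               + (nu1 / 2.0 - 1.0) * math.log(f)
               - ((nu1 + nu2) / 2.0) * math.log1p(nu1 * f / nu2))
    return math.exp(log_pdf)

# Hypothetical refinement: 110 observations, 10 versus 8 parameters.
F = f_ratio(120.0, 100.0, n=110, p=10, q=8)
print(F, f_pdf(F, 2, 100))
```

For $\nu_1 = 2$ the p.d.f. simplifies to $(1 + 2F/\nu_2)^{-(\nu_2 + 2)/2}$, which is a useful closed-form check.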

The cumulative distribution function $\Psi(F, \nu_1, \nu_2)$ gives the probability that the F ratio will be less than some value by chance if the models are equally consistent with the data. It is therefore a necessary, but not sufficient, condition for concluding that the unconstrained model gives a significantly better fit to the data that $\Psi(F, \nu_1, \nu_2)$ be greater than $1 - \alpha$, where $\alpha$ is the desired level of significance. For example, if $\Psi(F, \nu_1, \nu_2) = 0.95$, the probability is only 0.05 that a value of F this large or greater would have been observed if the two models were equally good representations of the data.
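The significance criterion can be sketched numerically as follows. The function names are hypothetical, and the c.d.f. is obtained here by simple midpoint-rule integration of the p.d.f., an assumption of this sketch that is adequate only when $\nu_1 \ge 2$ (for $\nu_1 = 1$ the p.d.f. is unbounded at the origin and a different method should be used).

```python
import math

def f_pdf(f, nu1, nu2):
    """p.d.f. of the F distribution with nu1 and nu2 degrees of freedom."""
    if f <= 0.0:
        return 0.0
    log_pdf = (math.lgamma((nu1 + nu2) / 2.0)
               - math.lgamma(nu1 / 2.0) - math.lgamma(nu2 / 2.0)
               + (nu1 / 2.0) * math.log(nu1 / nu2)
               + (nu1 / 2.0 - 1.0) * math.log(f)
               - ((nu1 + nu2) / 2.0) * math.log1p(nu1 * f / nu2))
    return math.exp(log_pdf)

def f_cdf(f, nu1, nu2, steps=20000):
    """c.d.f. by midpoint-rule integration from 0 to f (assumes nu1 >= 2)."""
    if f <= 0.0:
        return 0.0
    h = f / steps
    return h * sum(f_pdf((k + 0.5) * h, nu1, nu2) for k in range(steps))

def passes_f_test(f, nu1, nu2, alpha=0.05):
    """True if the c.d.f. exceeds 1 - alpha: the necessary (not sufficient)
    condition for preferring the unconstrained model."""
    return f_cdf(f, nu1, nu2) > 1.0 - alpha

print(f_cdf(3.0, 2, 10), passes_f_test(10.0, 2, 10), passes_f_test(1.0, 2, 10))
```

A production code would more likely evaluate the c.d.f. through the regularized incomplete beta function; the integration above is used only because it follows directly from the p.d.f.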
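Hamilton (1964) observed that the F ratio could be expressed in terms of the crystallographic weighted R index, which is defined, for refinement on F (and similarly for refinement on $F^{2}$; here F denotes the structure factor, not the F ratio), by
$$R_w = \left[\frac{\sum_i w_i (|F_o|_i - |F_c|_i)^{2}}{\sum_i w_i |F_o|_i^{2}}\right]^{1/2}.$$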
Denoting by $R_c$ and $R_u$ the weighted R indices for the constrained and unconstrained models, respectively,
$$F = \frac{n - p}{p - q}\left[\left(\frac{R_c}{R_u}\right)^{2} - 1\right],$$
and a c.d.f. for $R_c/R_u$ can be readily derived from this relation. A significance test based on $R_c/R_u$ is known as Hamilton's R-ratio test; it is entirely equivalent to a test on the F ratio.
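The equivalence between the R ratio and the F ratio can be made concrete with a small conversion sketch. The function names and the R values are hypothetical, and both weighted R indices are assumed to have been computed with the same observations and weights.

```python
import math

def f_from_r_ratio(r_c, r_u, n, p, q):
    """F ratio corresponding to weighted R indices r_c (constrained,
    q parameters) and r_u (unconstrained, p parameters), n observations."""
    return ((n - p) / (p - q)) * ((r_c / r_u) ** 2 - 1.0)

def r_ratio_from_f(f, n, p, q):
    """Inverse relation: the R ratio r_c/r_u that corresponds to a given F."""
    return math.sqrt(1.0 + f * (p - q) / (n - p))

# Hypothetical case: R_w rises from 0.0500 to 0.0525 when 2 of 10 parameters
# are constrained, with 110 observations.
F = f_from_r_ratio(0.0525, 0.0500, n=110, p=10, q=8)
print(F, r_ratio_from_f(F, n=110, p=10, q=8))
```

Because the mapping is monotonic, a critical value of F at a chosen significance level translates directly into a critical value of the R ratio.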
Tests based on F or the R ratio have several limitations. An important one is that they are applicable only when the parameters of one model form a subset of the parameters of the other. Also, the F test makes no distinction between improvement in fit as a result of small improvements throughout the entire data set and a large improvement in a small number of critically sensitive data points. A test that can be used for comparing arbitrary pairs of models, and that focuses attention on those data points that are most sensitive to differences in the models, was introduced by Williams & Kloot (1953; also Himmelblau, 1970; Prince, 1982).
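Consider a set of observations, $y_{0i}$, and two models that predict values for these observations, $y_{1i}$ and $y_{2i}$, respectively. We determine the slope of the regression line $z = \lambda x$, where $z_i = [y_{0i} - \tfrac{1}{2}(y_{1i} + y_{2i})]/\sigma_i$, and $x_i = (y_{1i} - y_{2i})/\sigma_i$. Suppose model 1 is a perfect fit to the data, which have been measured with great precision, so that $y_{0i} = y_{1i}$ for all i. Under these conditions, $\lambda = +1/2$. Similarly, if model 2 is a perfect fit, $\lambda = -1/2$. Real experimental data, of course, are subject to random error, and $|\lambda|$ in general would be expected to be less than 1/2. A least-squares estimate of $\lambda$ is
$$\hat{\lambda} = \frac{\sum_{i=1}^{n} z_i x_i}{\sum_{i=1}^{n} x_i^{2}},$$
and it has an estimated variance
$$\hat{\sigma}^{2} = \frac{\sum_{i=1}^{n} (z_i - \hat{\lambda} x_i)^{2}}{(n - 1)\sum_{i=1}^{n} x_i^{2}}.$$
The hypothesis that the two models give equally good fits to the data can be tested by considering $\lambda$ to be an unconstrained, one-parameter fit that is to be compared with a constrained, zero-parameter fit for which $\lambda = 0$. A p.d.f. for making this comparison can be derived from an F distribution with $\nu_1 = 1$ and $\nu_2 = \nu = (n - 1)$. If we let $|t| = F^{1/2}$, and use the change-of-variable relation
$$\Phi(t, \nu) = |t|\,\Phi(F, 1, \nu), \qquad F = t^{2},$$
we can derive a p.d.f. for t, which is
$$\Phi(t, \nu) = \frac{\Gamma[(\nu + 1)/2]}{(\pi\nu)^{1/2}\,\Gamma(\nu/2)} \left(1 + \frac{t^{2}}{\nu}\right)^{-(\nu + 1)/2}.$$
This p.d.f. is known as Student's t distribution with $\nu$ degrees of freedom. Setting $t = \hat{\lambda}/\hat{\sigma}$, the c.d.f. $\Psi(t, \nu)$ can be used to test the alternative hypotheses $\lambda = 0$ and $\lambda = \pm 1/2$. Table 8.4.3.1 gives the values of t for which the c.d.f. $\Psi(t, \nu)$ has various values for various values of $\nu$. Fortran code for the program from which this table was generated appears in Prince (1994).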
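The comparison of two arbitrary models can be sketched as follows. The function names and the data are hypothetical; the sketch computes the regression slope, its estimated variance and the t statistic, and evaluates Student's p.d.f. through logarithms of gamma functions.

```python
import math

def williams_kloot(y0, y1, y2, sigma):
    """Return (lam, var, t) for observations y0 and model predictions y1, y2.
    lam near +1/2 favours model 1, lam near -1/2 favours model 2."""
    n = len(y0)
    z = [(a - 0.5 * (b + c)) / s for a, b, c, s in zip(y0, y1, y2, sigma)]
    x = [(b - c) / s for b, c, s in zip(y1, y2, sigma)]
    sxx = sum(v * v for v in x)
    lam = sum(u * v for u, v in zip(z, x)) / sxx
    resid = sum((u - lam * v) ** 2 for u, v in zip(z, x))
    var = resid / ((n - 1) * sxx)
    # Guard against a zero variance (an exactly perfect fit).
    t = lam / math.sqrt(var) if var > 0.0 else math.copysign(math.inf, lam)
    return lam, var, t

def student_t_pdf(t, nu):
    """p.d.f. of Student's t distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(math.pi * nu))
    return math.exp(log_norm - ((nu + 1) / 2.0) * math.log1p(t * t / nu))

# Hypothetical data: four observations and two competing sets of predictions.
y0 = [1.0, 2.1, 2.9, 4.2]
y1 = [1.0, 2.0, 3.0, 4.0]
y2 = [1.2, 1.8, 3.3, 3.7]
lam, var, t = williams_kloot(y0, y1, y2, [0.1] * 4)
print(lam, var, t, student_t_pdf(t, len(y0) - 1))
```

The t statistic is then compared with tabulated values for $\nu = n - 1$ degrees of freedom; the same statistic with $\hat{\lambda} \mp 1/2$ in the numerator tests the alternative hypotheses.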

Again, it must be understood that the results of these statistical comparisons do not imply that either model is a correct one. A statistical indication of a good fit says only that, given the model, the experimenter should not be surprised at having observed the data values that were observed. It says nothing about whether the model is plausible in terms of compatibility with the laws of physics and chemistry. Nor does it rule out the existence of other models that describe the data as well as or better than any of the models tested.
When the method of least squares, or any variant of it, is used to refine a crystal structure, it is implicitly assumed that a model with adjustable parameters makes an unbiased prediction of the experimental observations for some (a priori unknown) set of values of those parameters. The existence of any reflection whose observed intensity is inconsistent with this assumption, that is, that it differs from the predicted value by an amount that cannot be reconciled with the precision of the measurement, must cause the model to be rejected, or at least modified. In making precise estimates of the values of the unknown parameters, however, different reflections do not all carry the same amount of information (Shoemaker, 1968; Prince & Nicholson, 1985). For an obvious example, consider a space-group systematic absence. Except for possible effects of multiple diffraction or twinning, any observed intensity at a position corresponding to a systematic absence is proof that the screw axis or glide plane is not present. If no intensity is observed for any such reflection, however, any parameter values that conform to the space group are equally acceptable. It is to be expected, on the other hand, that some intensities will be extremely sensitive to small changes in some parameter, and that careful measurement of those intensities will lead to correspondingly precise estimates of the parameter values. For the purpose of precise structure refinement, it is useful to be able to identify the influential reflections.
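Consider a vector of observations, $\mathbf{y}$, and a model $\mathbf{M}(\mathbf{x})$. The elements of $\mathbf{y}$ define an n-dimensional space, and the model values, $M_i(\mathbf{x})$, define a p-dimensional subspace within it. The least-squares solution [equation (8.1.2.7)],
$$\hat{\mathbf{x}} = \mathbf{x}_0 + (\mathbf{A}^{T}\mathbf{W}\mathbf{A})^{-1}\mathbf{A}^{T}\mathbf{W}[\mathbf{y} - \mathbf{M}(\mathbf{x}_0)], \qquad (8.4.4.1)$$
is such that $\hat{\mathbf{y}} = \mathbf{M}(\hat{\mathbf{x}})$ is the closest point to $\mathbf{y}$ that corresponds to some possible value of $\mathbf{x}$. In (8.4.4.1), $\mathbf{W} = \mathbf{V}^{-1}$ is the inverse of the variance–covariance matrix for the joint p.d.f. of the elements of $\mathbf{y}$, and $\mathbf{x}_0$ is a point in the p-dimensional subspace close enough to $\hat{\mathbf{x}}$ so that the linear approximation
$$\mathbf{M}(\mathbf{x}) = \mathbf{M}(\mathbf{x}_0) + \mathbf{A}(\mathbf{x} - \mathbf{x}_0) \qquad (8.4.4.2)$$
[where $A_{ij} = \partial M_i(\mathbf{x})/\partial x_j$] is a good one. Let $\mathbf{R}$ be the Cholesky factor of $\mathbf{W}$, so that $\mathbf{W} = \mathbf{R}^{T}\mathbf{R}$, and let $\mathbf{Z} = \mathbf{R}\mathbf{A}$, $\mathbf{y}' = \mathbf{R}[\mathbf{y} - \mathbf{M}(\mathbf{x}_0)]$, and $\hat{\mathbf{y}}' = \mathbf{R}[\hat{\mathbf{y}} - \mathbf{M}(\mathbf{x}_0)]$. The least-squares estimate may then be written
$$\hat{\mathbf{x}} = \mathbf{x}_0 + (\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\mathbf{y}' \qquad (8.4.4.3)$$
and
$$\hat{\mathbf{y}}' = \mathbf{Z}(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\mathbf{y}'. \qquad (8.4.4.4)$$
Thus, the matrix $\mathbf{P} = \mathbf{Z}(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}$, the projection matrix, is a linear relation between the observed data values and the corresponding calculated values. (Because $\hat{\mathbf{y}}' = \mathbf{P}\mathbf{y}'$, the matrix $\mathbf{P}$ is frequently referred to in the statistical literature as the hat matrix.) $\mathbf{P}^{2} = \mathbf{Z}(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T}\mathbf{Z}(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T} = \mathbf{Z}(\mathbf{Z}^{T}\mathbf{Z})^{-1}\mathbf{Z}^{T} = \mathbf{P}$, so that $\mathbf{P}$ is idempotent. $\mathbf{P}$ is an $n \times n$ positive semidefinite matrix with rank p, and its eigenvalues are either 1 (p times) or 0 ($n - p$ times). Its diagonal elements lie in the range $0 \le P_{ii} \le 1$, and the trace of $\mathbf{P}$ is p, so that the average value of $P_{ii}$ is p/n. Furthermore,
$$P_{ii} = \sum_{j=1}^{n} P_{ij}^{2}. \qquad (8.4.4.5)$$
A diagonal element of $\mathbf{P}$ is a measure of the influence that an observation has on its own calculated value. If $P_{ii}$ is close to one, the model is forced to fit the ith data point, which puts a constraint on the value of the corresponding function of the parameters. A very small value of $P_{ii}$, because of (8.4.4.5), implies that all elements of the row must be small, and that observation has little influence on its own or any other calculated value. Because it is a measure of influence on the fit, $P_{ii}$ is sometimes referred to as the leverage of the ith observation. Note that, because $(\mathbf{Z}^{T}\mathbf{Z})^{-1} = \mathbf{V}_{\mathbf{x}}$, the variance–covariance matrix for the elements of $\hat{\mathbf{x}}$, $\mathbf{P}$ is the variance–covariance matrix for $\hat{\mathbf{y}}'$, whose elements are functions of the elements of $\hat{\mathbf{x}}$.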
A large value of $P_{ii}$ means that $\hat{y}'_i$ is poorly defined by the elements of $\hat{\mathbf{x}}$, which implies in turn that some elements of $\hat{\mathbf{x}}$ must be precisely defined by a precise measurement of $y'_i$.
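The properties of the projection matrix can be checked numerically for a simple case. The sketch below assumes a hypothetical straight-line model, y = a + bx, with uncorrelated observations, so that each row of the weighted design matrix is (1/σ_i, x_i/σ_i); the function name and data are illustrative only.

```python
def projection_matrix_line(x, sigma):
    """Projection (hat) matrix P = Z (Z^T Z)^-1 Z^T for the two-parameter
    model y = a + b*x with independent observations of uncertainty sigma."""
    z_rows = [(1.0 / s, xi / s) for xi, s in zip(x, sigma)]
    # Normal-equations matrix Z^T Z (2 x 2) and its explicit inverse.
    n11 = sum(a * a for a, _ in z_rows)
    n12 = sum(a * b for a, b in z_rows)
    n22 = sum(b * b for _, b in z_rows)
    det = n11 * n22 - n12 * n12
    inv = ((n22 / det, -n12 / det), (-n12 / det, n11 / det))

    def quad(r, c):
        # r (Z^T Z)^-1 c^T for two rows r and c of Z.
        return (r[0] * (inv[0][0] * c[0] + inv[0][1] * c[1])
                + r[1] * (inv[1][0] * c[0] + inv[1][1] * c[1]))

    return [[quad(ri, rj) for rj in z_rows] for ri in z_rows]

# Four closely spaced points and one isolated point at x = 10.
x = [0.0, 1.0, 2.0, 3.0, 10.0]
P = projection_matrix_line(x, [1.0] * len(x))
leverage = [P[i][i] for i in range(len(x))]
print(leverage, sum(leverage))
```

The isolated point carries most of the information about the slope, so its leverage is close to one, while the trace of the matrix equals the number of parameters.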
It is apparent that, in a real experiment, there will be appreciable variation among observations in their leverage. It can be shown (Fedorov, 1972; Prince & Nicholson, 1985) that the observations with the greatest leverage also have the largest effect on the volume of the p-dimensional confidence region for the parameter estimates. Because this volume is a rather gross measure, however, it is useful to have a measure of the influence of individual observations on individual parameters. Let $\mathbf{V}_n$ be the variance–covariance matrix for a refinement including n observations, and let $\mathbf{z}$ be a row vector whose elements are $z_j = [\partial M(\mathbf{x})/\partial x_j]/\sigma$ for an additional observation. $\mathbf{V}_{n+1}$, the variance–covariance matrix with the additional observation included, is, by definition,
$$\mathbf{V}_{n+1} = (\mathbf{Z}^{T}\mathbf{Z} + \mathbf{z}^{T}\mathbf{z})^{-1}, \qquad (8.4.4.6)$$
which, in the linear approximation, can be shown to be
$$\mathbf{V}_{n+1} = \mathbf{V}_n - \mathbf{V}_n\mathbf{z}^{T}\mathbf{z}\mathbf{V}_n/(1 + \mathbf{z}\mathbf{V}_n\mathbf{z}^{T}). \qquad (8.4.4.7)$$
The diagonal elements of the rank-one matrix $\mathbf{D} = \mathbf{V}_n\mathbf{z}^{T}\mathbf{z}\mathbf{V}_n/(1 + \mathbf{z}\mathbf{V}_n\mathbf{z}^{T})$ are therefore the amounts by which the variances of the estimates of individual parameters will be reduced by inclusion of the additional observation.
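The rank-one update lends itself to a short numerical sketch. The function name, the matrix and the row vector below are hypothetical; the variance–covariance matrix is assumed to be symmetric, so that the row vector times the matrix is the transpose of the matrix times the column vector.

```python
def variance_reduction(v_n, z):
    """Rank-one matrix D = V z^T z V / (1 + z V z^T); its diagonal gives the
    reduction in each parameter variance from one additional observation.
    Assumes v_n is symmetric."""
    p = len(z)
    u = [sum(v_n[i][j] * z[j] for j in range(p)) for i in range(p)]  # V z^T
    denom = 1.0 + sum(z[i] * u[i] for i in range(p))                 # 1 + z V z^T
    return [[u[i] * u[j] / denom for j in range(p)] for i in range(p)]

# Hypothetical two-parameter refinement and one candidate observation whose
# derivatives (divided by its standard uncertainty) are z = (2, 1).
V_n = [[0.04, 0.01], [0.01, 0.09]]
z = [2.0, 1.0]
D = variance_reduction(V_n, z)
V_next = [[V_n[i][j] - D[i][j] for j in range(2)] for i in range(2)]
print([D[i][i] for i in range(2)], [V_next[i][i] for i in range(2)])
```

Evaluating the diagonal of D for each candidate observation before measuring it gives a way to rank observations by how much they would sharpen a particular parameter, within the limits of the linear approximation.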
This result depends on the elements of $\mathbf{Z}$ and $\mathbf{z}$ not changing significantly in the (presumably small) shift from $\hat{\mathbf{x}}_n$ to $\hat{\mathbf{x}}_{n+1}$. That this condition is satisfied may be verified by the following procedure. Find an approximation to $\hat{\mathbf{x}}_{n+1}$ by a line search along the line , and then evaluate $\mathbf{B}$, a quasi-Newton update such as the BFGS update (Subsection 8.1.4.3) at that point. If $\alpha = 1$, and the gradient of the sum of squares vanishes, then the linear approximation is exact, and $\mathbf{B}$ is null. If for all i and j, then (8.4.4.7) can be expected to be an excellent approximation for a nonlinear model.
References
Cramér, H. (1951). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Draper, N. & Smith, H. (1981). Applied regression analysis. New York: John Wiley.
Fedorov, V. V. (1972). Theory of optimal experiments, translated by W. J. Studden & E. M. Klimko. New York: Academic Press.
Hamilton, W. C. (1964). Statistics in physical science: estimation, hypothesis testing and least squares. New York: Ronald Press.
Himmelblau, D. M. (1970). Process analysis by statistical methods. New York: John Wiley.
Prince, E. (1982). Comparison of the fits of two models to the same data set. Acta Cryst. B38, 1099–1100.
Prince, E. (1994). Mathematical techniques in crystallography and materials science, 2nd ed. Berlin/Heidelberg/New York/London/Paris/Tokyo/Hong Kong/Barcelona/Budapest: Springer-Verlag.
Prince, E. & Nicholson, W. L. (1985). Influence of individual reflections on the precision of parameter estimates in least squares refinement. Structure and statistics in crystallography, edited by A. J. C. Wilson, pp. 183–195. Guilderland, NY: Adenine Press.
Shoemaker, D. P. (1968). Optimization of counting time in computer controlled X-ray and neutron single-crystal diffractometry. Acta Cryst. A24, 136–142.
Williams, E. J. & Kloot, N. H. (1953). Interpolation in a series of correlated observations. Aust. J. Appl. Sci. 4, 1–17.