International Tables for Crystallography Volume C: Mathematical, physical and chemical tables. Edited by E. Prince. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. C, ch. 8.4, pp. 702–703

We have seen [equation (8.1.2.1)] that the least-squares estimate is derived by finding the minimum value of a sum of terms of the form

$S = \sum_{i=1}^{n} w_i [y_i - M_i(\mathbf{x})]^2,$   (8.4.1.1)

and, further, that the precision of the estimate is optimized if the weight, $w_i$, is the reciprocal of the variance of the population from which the observation is drawn,

$w_i = 1/\sigma_i^2.$   (8.4.1.2)

Using this relation, (8.4.1.1) can be written

$S = \sum_{i=1}^{n} R_i, \quad\text{where } R_i = \{[y_i - M_i(\mathbf{x})]/\sigma_i\}^2.$   (8.4.1.3)

Each term is the square of a difference between observed and calculated values, expressed as a fraction of the standard uncertainty of the observed value. But, by definition,

$\sigma_i^2 = \langle [y_i - M_i(\mathbf{x})]^2 \rangle,$

where $\mathbf{x}$ has its unknown `correct' value, so that $\langle R_i \rangle = 1$, and the expected value of the sum of $n$ such terms is $n$. It can be shown (Draper & Smith, 1981) that each parameter estimated reduces this expected sum by one, so that, for $p$ estimated parameters,

$\left\langle \sum_{i=1}^{n} \{[y_i - M_i(\hat{\mathbf{x}})]/\sigma_i\}^2 \right\rangle = n - p,$   (8.4.1.4)

where $\hat{\mathbf{x}}$ is the least-squares estimate. The standard uncertainty of an observation of unit weight, also referred to as the goodness-of-fit parameter, is defined by

$G = \left( \frac{1}{n-p} \sum_{i=1}^{n} \{[y_i - M_i(\hat{\mathbf{x}})]/\sigma_i\}^2 \right)^{1/2}.$   (8.4.1.5)

From (8.4.1.4), it follows that $\langle G^2 \rangle = 1$ for a correct model with weights assigned in accordance with (8.4.1.2).
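The computation of the goodness-of-fit parameter can be sketched numerically. The following is a minimal illustration of (8.4.1.5) for a simulated weighted least-squares fit; the straight-line model, the data, and all names are invented for the example and are not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated experiment: n observations of a straight-line model
# y = a + b*t (p = 2 parameters), each with known standard
# uncertainty sigma_i.  Model and values are illustrative only.
n, p = 100, 2
t = np.linspace(0.0, 1.0, n)
sigma = 0.1 * np.ones(n)
y = 1.0 + 2.0 * t + rng.normal(0.0, sigma)

# Weighted least squares with w_i = 1/sigma_i^2, eq. (8.4.1.2):
# minimize S = sum w_i [y_i - M_i(x)]^2 of eq. (8.4.1.1).
A = np.column_stack([np.ones(n), t])   # design matrix
W = np.diag(1.0 / sigma**2)
x_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)

# Goodness of fit, eq. (8.4.1.5): the weighted sum of squared
# residuals divided by (n - p); <G^2> = 1 for a correct model.
residuals = (y - A @ x_hat) / sigma
G = np.sqrt(np.sum(residuals**2) / (n - p))
print(round(G, 3))
```

Because the simulated weights are correct, repeated runs with different seeds scatter G around one with a spread of roughly $1/\sqrt{2(n-p)}$.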
A value of G that is close to one, if the weights have been assigned by (8.4.1.2), is an indicator that the model is consistent with the data. It should be noted that it is not necessarily an indicator that the model is `correct', because it does not rule out the existence of an alternative model that fits the data as well or better. An assessment of the adequacy of the fit of a given model depends, however, on what is meant by `close to one', which depends in turn on the spread of a probability density function for G. We saw in Chapter 8.1 that least squares with this weighting scheme would give the best linear unbiased estimate of the model parameters, with no restrictions on the p.d.f.s of the populations from which the observations are drawn except for the implicit assumption that the variances of these p.d.f.s are finite. To construct a p.d.f. for G, however, it is necessary to make an assumption about the shapes of the p.d.f.s for the observations. The usual assumption is that these p.d.f.s can be described by the normal p.d.f.,

$\Phi(x, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right].$   (8.4.1.6)

The justification for this assumption comes from the central-limit theorem, which states that, under rather broad conditions, the p.d.f. of the arithmetic mean of n observations drawn from a population with mean μ and variance σ^{2} tends, for large n, to a normal distribution with mean μ and variance σ^{2}/n. [For a discussion of the central-limit theorem, see Cramér (1951).]
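The central-limit theorem is easily illustrated numerically: sample means drawn from a distinctly non-normal population have mean μ and variance σ^{2}/n. A short sketch (sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw many samples of size n from a non-normal population:
# uniform on [0, 1), with mu = 1/2 and sigma^2 = 1/12.
n, trials = 50, 20000
means = rng.random((trials, n)).mean(axis=1)

# The central-limit theorem predicts that the sample means are
# approximately normal with mean mu and variance sigma^2 / n.
print(means.mean())      # ~ 0.5
print(means.var() * n)   # ~ 1/12 = 0.0833...
```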
If we make the assumption of a normal distribution of errors and make the substitution $z = (x - \mu)/\sigma$, (8.4.1.6) becomes

$\Phi(z) = (2\pi)^{-1/2} \exp(-z^2/2).$   (8.4.1.7)

The probability that $z^2$ will be less than χ^{2} is equal to the probability that $z$ will lie in the interval $-\chi \leq z \leq +\chi$, or

$\Psi(\chi^2, 1) = (2\pi)^{-1/2} \int_{-\chi}^{+\chi} \exp(-z^2/2)\,\mathrm{d}z.$   (8.4.1.8)

Letting $u = z^2$ and substituting in (8.4.1.7), this becomes

$\Psi(\chi^2, 1) = (2\pi)^{-1/2} \int_{0}^{\chi^2} u^{-1/2} \exp(-u/2)\,\mathrm{d}u,$   (8.4.1.9)

so that the p.d.f. of χ^{2} with one degree of freedom is

$\psi(\chi^2, 1) = (2\pi)^{-1/2} (\chi^2)^{-1/2} \exp(-\chi^2/2).$   (8.4.1.10)

The joint p.d.f. of the squares of two random variables, $z_1^2$ and $z_2^2$, drawn independently from the same population with a normal p.d.f. is

$\psi(z_1^2, z_2^2) = (2\pi)^{-1} (z_1^2 z_2^2)^{-1/2} \exp[-(z_1^2 + z_2^2)/2],$

and the p.d.f. of the sum, $\chi^2 = z_1^2 + z_2^2$, of these two terms is the integral over the joint p.d.f. of all pairs of $z_1^2$ and $z_2^2$ that add up to χ^{2}. This integral can be evaluated by use of the gamma and beta functions. The gamma function is defined for positive real $x$ by

$\Gamma(x) = \int_{0}^{\infty} t^{x-1} \exp(-t)\,\mathrm{d}t.$   (8.4.1.11)

Although this function is continuous for all $x > 0$, its value is of interest in the context of this analysis only for $x$ equal to positive, integral multiples of 1/2. It can be shown that Γ(1/2) = $\sqrt{\pi}$, Γ(1) = 1, and Γ(x + 1) = xΓ(x). It follows that, for a positive integer, n, Γ(n) = (n − 1)!, and that Γ(3/2) = $\sqrt{\pi}/2$, Γ(5/2) = $3\sqrt{\pi}/4$, etc. The beta function is defined by

$B(p, q) = \int_{0}^{1} x^{p-1} (1-x)^{q-1}\,\mathrm{d}x.$   (8.4.1.12)

It can be shown (Prince, 1994) that $B(p, q) = \Gamma(p)\Gamma(q)/\Gamma(p+q)$. Making the substitution $x = \sin^2\varphi$, (8.4.1.12) becomes

$B(p, q) = 2\int_{0}^{\pi/2} \sin^{2p-1}\varphi \cos^{2q-1}\varphi\,\mathrm{d}\varphi.$   (8.4.1.13)

The p.d.f. of the sum of two terms is then

$\psi(\chi^2, 2) = (2\pi)^{-1} \exp(-\chi^2/2) \int_{0}^{\chi^2} [u(\chi^2 - u)]^{-1/2}\,\mathrm{d}u,$   (8.4.1.14)

and the substitution $u = \chi^2 \sin^2\varphi$ reduces the integral to $B(1/2, 1/2) = \pi$, giving

$\psi(\chi^2, 2) = {\textstyle\frac{1}{2}} \exp(-\chi^2/2).$   (8.4.1.15)

By a similar procedure, it can be shown that, if χ^{2} is the sum of ν terms, $z_1^2$, $z_2^2$, …, $z_\nu^2$, where all are drawn independently from a population with the p.d.f. given in (8.4.1.10), χ^{2} has the p.d.f.

$\psi(\chi^2, \nu) = \frac{(\chi^2)^{(\nu - 2)/2} \exp(-\chi^2/2)}{2^{\nu/2}\,\Gamma(\nu/2)}.$   (8.4.1.16)

The parameter ν is known as the number of degrees of freedom, but this use of that term must not be confused with the conventional use in physics and chemistry. The p.d.f. in (8.4.1.16) is the chi-squared distribution with ν degrees of freedom. Table 8.4.1.1 gives the values of χ^{2}/ν for which the cumulative distribution function (c.d.f.) Ψ(χ^{2}, ν) has various values for various choices of ν. This table is provided to enable verification of computer codes that may be used to generate more extensive tables. It was generated using a program included in the statistical library DATAPAC (Filliben, unpublished). Fortran code for this program appears in Prince (1994).
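The p.d.f. of (8.4.1.16) can be checked against a Monte Carlo construction of χ^{2} as a sum of ν squared standard-normal variates, exactly as in the derivation above. A sketch (the choice of ν, the sample size, and the grid are arbitrary):

```python
import numpy as np
from math import gamma

def chi2_pdf(x, nu):
    """Chi-squared p.d.f. of eq. (8.4.1.16) with nu degrees of freedom."""
    return x ** (nu / 2 - 1) * np.exp(-x / 2) / (2 ** (nu / 2) * gamma(nu / 2))

# Build chi^2 empirically as the sum of nu squared standard-normal
# variates, as in the text.
rng = np.random.default_rng(2)
nu = 5
samples = (rng.standard_normal((100000, nu)) ** 2).sum(axis=1)

# Compare the empirical c.d.f. at a few points with a numerical
# integral (rectangle rule) of chi2_pdf.
x = np.linspace(1e-6, 30.0, 2001)
cdf = np.cumsum(chi2_pdf(x, nu)) * (x[1] - x[0])
for point in (2.0, 5.0, 10.0):
    empirical = (samples <= point).mean()
    analytic = cdf[np.searchsorted(x, point)]
    print(point, round(empirical, 3), round(analytic, 3))
```

For ν = 5 the c.d.f. at χ^{2} = 2, 5 and 10 is approximately 0.151, 0.584 and 0.925, and both columns printed above reproduce these values to within sampling and quadrature error.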

The quantity (n − p)G^{2} is the sum of n terms that have mean value (n − p)/n. Because the process of determining the least-squares fit establishes p relations among them, however, only (n − p) of the terms are independent. The number of degrees of freedom is therefore ν = (n − p), and, if the model is correct, and the terms have been properly weighted, χ^{2} = (n − p)G^{2} has the chi-squared distribution with (n − p) degrees of freedom. In crystallography, the number of degrees of freedom tends to be large, and the p.d.f. for G correspondingly sharp, so that even rather small deviations from G^{2} = 1 should cause one or both of the hypotheses of a correct model and appropriate weights to be rejected. It is common practice to assume instead that the model is correct, and that the weights have correct relative values, that is, that they have been assigned by $w_i = k/\sigma_i^2$, where k is a number different from, and usually greater than, one. G^{2} is then taken to be an estimate of k, and all elements of (A^{T}WA)^{−1} (Section 8.1.2) are multiplied by G^{2} to get an estimated variance–covariance matrix. The range of validity of this procedure is limited at best. It is discussed further in Chapter 8.5.
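How sharp the distribution of G becomes for crystallographic numbers of degrees of freedom can be sketched by sampling χ^{2} directly; the value of ν below is merely representative of a refinement with many more observations than parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

# For a correct model with properly assigned weights,
# chi^2 = (n - p) G^2 has the chi-squared distribution with
# nu = n - p degrees of freedom, so G^2 has mean 1, variance 2/nu.
nu = 2000
G = np.sqrt(rng.chisquare(nu, 100000) / nu)

print(round(G.mean(), 3))   # close to 1
print(round(G.std(), 4))    # close to 1/sqrt(2*nu), about 0.016
# A value such as G = 1.10 therefore lies several standard
# deviations from 1 and would lead to rejection of the joint
# hypothesis of correct model and correct weights.
```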
References
Cramér, H. (1951). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Draper, N. & Smith, H. (1981). Applied regression analysis. New York: John Wiley.
Prince, E. (1994). Mathematical techniques in crystallography and materials science, 2nd ed. Berlin: Springer-Verlag.