International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F, ch. 11.3, pp. 218–225
doi: 10.1107/97809553602060000676
Chapter 11.3. Integration, scaling, space-group assignment and post refinement
Max-Planck-Institut für medizinische Forschung, Abteilung Biophysik, Jahnstrasse 29, 69120 Heidelberg, Germany
The key steps in the processing of diffraction data from single crystals are described. The topics covered include: the modelling of the positions of all the reflections recorded in the images; the integration of diffraction intensities; data correction, scaling and post refinement; and space-group assignment. The principles of the methods are described as they are employed by the program XDS (Section 25.2.9).
Key steps in the processing of diffraction data from single crystals involve: (a) accurate modelling of the positions of all the reflections recorded in the images; (b) integration of diffraction intensities; (c) data correction, scaling and post refinement; and (d) space-group assignment. Much of the theory and many of the methods for carrying out these steps were developed about two decades ago for processing rotation data recorded on film and were later extended to exploit fully the capabilities of a variety of electronic area detectors; some CCD (charge-coupled device) and multiwire detectors allow the recording of finely sliced rotation data because of their fast data read-out. In this chapter, the principles of the methods are described as they are employed by the program XDS (Section 25.2.9). These apply equally well to rotation images covering small or large oscillation ranges. A large number of other systems have been developed which differ in the details of the implementations. Some of these packages are described in Chapter 25.2. The theory and practice of processing fine-sliced data have recently been discussed by Pflugrath (1997).
The observed diffraction pattern, i.e., the positions of the reflections recorded in the rotation-data images, is controlled by a small set of parameters which must be accurately determined before integration can start. Approximate values for some of these parameters are given by the experimental setup, whereas others may be completely unknown and must be obtained from the rotation images. This is achieved by automatic location of strong diffraction spots, extraction of a primitive lattice basis that yields integer indices for the observed reflections, and subsequent refinement of all parameters to minimize the discrepancies between observed and calculated spot positions in the data images.
In the rotation method, the incident beam wave vector $\mathbf{S}_0$ of length $1/\lambda$ ($\lambda$ is the wavelength) is fixed while the crystal is rotated around a fixed axis described by a unit vector $\mathbf{m}_2$. $\mathbf{S}_0$ points from the X-ray source towards the crystal. It is assumed that the incident beam and the rotation axis intersect at one point at which the crystal must be located. This point is defined as the origin of a right-handed orthonormal laboratory coordinate system $\{\mathbf{l}_1, \mathbf{l}_2, \mathbf{l}_3\}$. This fixed but otherwise arbitrary system is used as a reference frame to specify the setup of the diffraction experiment.
Diffraction data are assumed to be recorded on a fixed planar detector. A right-handed orthonormal detector coordinate system $\{\mathbf{d}_1, \mathbf{d}_2, \mathbf{d}_3\}$ is defined such that a point with coordinates X, Y in the detector plane is represented by the vector $(X - X_0)\mathbf{d}_1 + (Y - Y_0)\mathbf{d}_2 + F\mathbf{d}_3$ with respect to the laboratory coordinate system. The origin $X_0, Y_0$ of the detector plane is found at a distance $|F|$ from the crystal position. It is assumed that the diffraction data are recorded on adjacent non-overlapping rotation images, each covering a constant oscillation range $\Delta_\phi$ with image No. 1 starting at spindle angle $\phi_0$.
Diffraction geometry is conveniently expressed with respect to a right-handed orthonormal goniostat system $\{\mathbf{m}_1, \mathbf{m}_2, \mathbf{m}_3\}$. It is constructed from the rotation axis and the incident beam direction such that $\mathbf{m}_1 = (\mathbf{m}_2 \times \mathbf{S}_0)/|\mathbf{m}_2 \times \mathbf{S}_0|$ and $\mathbf{m}_3 = \mathbf{m}_1 \times \mathbf{m}_2$. The origin of the goniostat system is defined to coincide with the origin of the laboratory system.
Finally, a right-handed crystal coordinate system $\{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$ and its reciprocal basis $\{\mathbf{b}_1^*, \mathbf{b}_2^*, \mathbf{b}_3^*\}$ are defined to represent the unrotated crystal, i.e., at rotation angle $\phi = 0^\circ$, such that any reciprocal-lattice vector can be expressed as $\mathbf{p}_0^* = h\mathbf{b}_1^* + k\mathbf{b}_2^* + l\mathbf{b}_3^*$, where h, k, l are integers.
Using a Gaussian model, the shape of the diffraction spots is specified by two parameters: the standard deviations of the reflecting range $\sigma_M$ and the beam divergence $\sigma_D$ (see Section 11.3.2.3). This leads to an integration region around the spot defined by the parameters $\delta_M$ and $\delta_D$, which are typically chosen to be 6–10 times larger than $\sigma_M$ and $\sigma_D$, respectively.
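Knowledge of the parameters $\mathbf{S}_0$, $\mathbf{m}_2$, $\mathbf{b}_1$, $\mathbf{b}_2$, $\mathbf{b}_3$, $X_0$, $Y_0$, $F$, $\mathbf{d}_1$, $\mathbf{d}_2$, $\mathbf{d}_3$, $\phi_0$ and $\Delta_\phi$ is sufficient to compute the location of all diffraction peaks recorded in the data images. Determination and refinement of these parameters are described in the following sections.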
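It is assumed here that accurate values of all parameters describing the diffraction experiment are available, permitting prediction of the positions of all diffraction peaks recorded in the data images. Let $\mathbf{p}_0^*$ denote any arbitrary reciprocal-lattice vector if the crystal has not been rotated, i.e., at rotation angle $\phi = 0^\circ$. $\mathbf{p}_0^*$ can be expressed by its components with respect to the orthonormal goniostat system as
$$\mathbf{p}_0^* = \mathbf{m}_1(\mathbf{m}_1\cdot\mathbf{p}_0^*) + \mathbf{m}_2(\mathbf{m}_2\cdot\mathbf{p}_0^*) + \mathbf{m}_3(\mathbf{m}_3\cdot\mathbf{p}_0^*).$$
Depending on the diffraction geometry, $\mathbf{p}_0^*$ may be rotated into a position fulfilling the reflecting condition. The required rotation angle $\phi$ and the coordinates X, Y of the diffracted beam at its intersection with the detector plane can be found from $\mathbf{p}_0^*$ as follows.
Rotation by $\phi$ around axis $\mathbf{m}_2$ changes $\mathbf{p}_0^*$ into $\mathbf{p}^*$. The incident and diffracted beam wave vectors, $\mathbf{S}_0$ and $\mathbf{S}$, have their termini on the Ewald sphere and satisfy the Laue equations
$$\mathbf{S} = \mathbf{S}_0 + \mathbf{p}^*, \qquad \mathbf{S}^2 = \mathbf{S}_0^2 \;\Longrightarrow\; \mathbf{p}^{*2} = -2\,\mathbf{S}_0\cdot\mathbf{p}^* = \mathbf{p}_0^{*2}.$$
If $\rho = [\mathbf{p}_0^{*2} - (\mathbf{p}_0^*\cdot\mathbf{m}_2)^2]^{1/2}$ denotes the distance of $\mathbf{p}_0^*$ from the rotation axis, solutions for $\mathbf{p}^*$ and $\phi$ can be obtained in terms of $\mathbf{p}_0^*$ as
$$\mathbf{p}^*\cdot\mathbf{m}_3 = \frac{-\mathbf{p}_0^{*2}/2 - (\mathbf{p}_0^*\cdot\mathbf{m}_2)(\mathbf{S}_0\cdot\mathbf{m}_2)}{\mathbf{S}_0\cdot\mathbf{m}_3}, \qquad \mathbf{p}^*\cdot\mathbf{m}_2 = \mathbf{p}_0^*\cdot\mathbf{m}_2, \qquad \mathbf{p}^*\cdot\mathbf{m}_1 = \pm[\rho^2 - (\mathbf{p}^*\cdot\mathbf{m}_3)^2]^{1/2},$$
$$\cos\phi = \frac{(\mathbf{p}^*\cdot\mathbf{m}_1)(\mathbf{p}_0^*\cdot\mathbf{m}_1) + (\mathbf{p}^*\cdot\mathbf{m}_3)(\mathbf{p}_0^*\cdot\mathbf{m}_3)}{\rho^2}, \qquad \sin\phi = \frac{(\mathbf{p}^*\cdot\mathbf{m}_1)(\mathbf{p}_0^*\cdot\mathbf{m}_3) - (\mathbf{p}^*\cdot\mathbf{m}_3)(\mathbf{p}_0^*\cdot\mathbf{m}_1)}{\rho^2}.$$
In general, there are two solutions according to the sign of $\mathbf{p}^*\cdot\mathbf{m}_1$. If $\rho^2 < (\mathbf{p}^*\cdot\mathbf{m}_3)^2$ or $\mathbf{p}_0^{*2} > 4\mathbf{S}_0^2$, the Laue equations have no solution and the reciprocal-lattice point $\mathbf{p}_0^*$ is in the `blind' region.
If $F\,\mathbf{S}\cdot\mathbf{d}_3 > 0$, the diffracted beam intersects the detector plane at the point
$$(X - X_0)\mathbf{d}_1 + (Y - Y_0)\mathbf{d}_2 + F\mathbf{d}_3 = F\,\mathbf{S}/(\mathbf{S}\cdot\mathbf{d}_3),$$
which leads to a diffraction spot recorded at detector coordinates
$$X = X_0 + F\,\mathbf{S}\cdot\mathbf{d}_1/(\mathbf{S}\cdot\mathbf{d}_3), \qquad Y = Y_0 + F\,\mathbf{S}\cdot\mathbf{d}_2/(\mathbf{S}\cdot\mathbf{d}_3).$$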
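The spot-prediction steps of this section can be sketched in a few lines of Python. This is only an illustration, not the XDS code: the function name `predict_spots`, the tuple-based vectors and the argument order are assumptions made here. It builds the goniostat axes from the rotation axis and the incident beam, solves the Laue equations for both signs of the component along the first goniostat axis, and projects each diffracted beam onto the detector plane.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def predict_spots(p0, s0, m2, d1, d2, d3, x0, y0, f):
    """Return [(phi_degrees, X, Y), ...] for the unrotated vector p0 (0 to 2 entries)."""
    m1 = unit(cross(m2, s0))          # goniostat axes built from m2 and S0
    m3 = cross(m1, m2)
    p0_sq = dot(p0, p0)
    if p0_sq > 4.0 * dot(s0, s0):     # beyond the resolution limit of the Ewald sphere
        return []
    p0_m1, p0_m2, p0_m3 = dot(p0, m1), dot(p0, m2), dot(p0, m3)
    rho_sq = p0_sq - p0_m2 ** 2       # squared distance from the rotation axis
    p_m3 = (-0.5 * p0_sq - p0_m2 * dot(s0, m2)) / dot(s0, m3)
    if rho_sq <= p_m3 ** 2:           # blind region (tangent case treated as blind)
        return []
    spots = []
    for sign in (1.0, -1.0):          # the two solutions of the Laue equations
        p_m1 = sign * math.sqrt(rho_sq - p_m3 ** 2)
        cos_phi = (p_m1 * p0_m1 + p_m3 * p0_m3) / rho_sq
        sin_phi = (p_m1 * p0_m3 - p_m3 * p0_m1) / rho_sq
        s = tuple(s0[i] + p_m1*m1[i] + p0_m2*m2[i] + p_m3*m3[i] for i in range(3))
        s_d3 = dot(s, d3)
        if f * s_d3 <= 0.0:           # diffracted beam does not reach the detector
            continue
        phi = math.degrees(math.atan2(sin_phi, cos_phi))
        spots.append((phi, x0 + f * dot(s, d1) / s_d3, y0 + f * dot(s, d2) / s_d3))
    return spots
```

Lengths are in whatever unit F and the detector coordinates share, and reciprocal vectors are in units of the inverse wavelength unit; the sketch does no unit conversion.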
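A reciprocal-lattice point crosses the Ewald sphere by the shortest route only if the crystal happens to be rotated about an axis perpendicular to both the diffracted and incident beam wave vectors, the `β-axis' $\mathbf{e}_1 = \mathbf{S}\times\mathbf{S}_0/|\mathbf{S}\times\mathbf{S}_0|$, as introduced by Schutt & Winkler (1977). Rotation around the fixed axis $\mathbf{m}_2$, as enforced by the rotation camera, thus leads to an increase in the length of the shortest path by the factor $1/|\mathbf{e}_1\cdot\mathbf{m}_2|$. This has motivated the introduction of a coordinate system $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$, specific for each reflection, which has its origin on the surface of the Ewald sphere at the terminus of the diffracted beam wave vector $\mathbf{S}$,
$$\mathbf{e}_1 = \mathbf{S}\times\mathbf{S}_0/|\mathbf{S}\times\mathbf{S}_0|, \qquad \mathbf{e}_2 = \mathbf{S}\times\mathbf{e}_1/|\mathbf{S}\times\mathbf{e}_1|, \qquad \mathbf{e}_3 = (\mathbf{S} + \mathbf{S}_0)/|\mathbf{S} + \mathbf{S}_0|.$$
The unit vectors $\mathbf{e}_1$ and $\mathbf{e}_2$ are tangential to the Ewald sphere, while $\mathbf{e}_3$ is perpendicular to $\mathbf{e}_1$ and $\mathbf{p}^* = \mathbf{S} - \mathbf{S}_0$. The shape of a reflection, as represented with respect to $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$, then no longer contains geometrical distortions resulting from the fixed rotation axis of the camera and the oblique incidence of the diffracted beam on a flat detector. Instead, all reflections appear as if they had followed the shortest path through the Ewald sphere and had been recorded on the surface of the sphere.
A detector pixel at X′, Y′ in the neighbourhood of the reflection centre X, Y, when the crystal is rotated by $\phi'$ instead of $\phi$, is mapped to the profile coordinates $\varepsilon_1, \varepsilon_2, \varepsilon_3$ by the following procedure:
$$\mathbf{S}' = \frac{(X' - X_0)\mathbf{d}_1 + (Y' - Y_0)\mathbf{d}_2 + F\mathbf{d}_3}{\lambda\,[(X' - X_0)^2 + (Y' - Y_0)^2 + F^2]^{1/2}},$$
$$\varepsilon_1 = \mathbf{e}_1\cdot(\mathbf{S}' - \mathbf{S})\,\frac{180}{\pi|\mathbf{S}|}, \qquad \varepsilon_2 = \mathbf{e}_2\cdot(\mathbf{S}' - \mathbf{S})\,\frac{180}{\pi|\mathbf{S}|},$$
$$\varepsilon_3 = \mathbf{e}_3\cdot[D(\mathbf{m}_2, \phi' - \phi)\,\mathbf{p}^* - \mathbf{p}^*]\,\frac{180}{\pi|\mathbf{p}^*|} \simeq \zeta\,(\phi' - \phi), \qquad \zeta = \mathbf{m}_2\cdot\mathbf{e}_1,$$
where $D(\mathbf{m}_2, \phi' - \phi)$ denotes a rotation by $\phi' - \phi$ about $\mathbf{m}_2$. ζ corrects for the increased path length of the reflection through the Ewald sphere and is closely related to the reciprocal Lorentz correction factor
$$L^{-1} = |\mathbf{S}\cdot(\mathbf{m}_2\times\mathbf{S}_0)|/(|\mathbf{S}|\cdot|\mathbf{S}_0|) = |\zeta\cdot\sin\angle(\mathbf{S}, \mathbf{S}_0)|.$$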
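The relation between ζ and the reciprocal Lorentz factor can be checked numerically. The sketch below assumes the reflection-specific axes are built as e1 ∝ S × S0, e2 ∝ S × e1 and e3 ∝ S + S0, with ζ taken as the projection of the rotation axis on e1; the function names are illustrative choices, not XDS routines.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def unit(a):
    n = norm(a)
    return tuple(x / n for x in a)

def profile_axes(s, s0):
    """Axes of the reflection-specific system at the terminus of S (assumed construction)."""
    e1 = unit(cross(s, s0))                        # perpendicular to S and S0
    e2 = unit(cross(s, e1))                        # tangential to the Ewald sphere
    e3 = unit(tuple(a + b for a, b in zip(s, s0))) # perpendicular to e1 and to S - S0
    return e1, e2, e3

def zeta(s, s0, m2):
    """Projection of the rotation axis on e1; scales rotation angle to path length."""
    return dot(m2, profile_axes(s, s0)[0])

def inverse_lorentz(s, s0, m2):
    """Reciprocal Lorentz factor computed directly from the triple product."""
    return abs(dot(s, cross(m2, s0))) / (norm(s) * norm(s0))
```

For any diffracted beam, `inverse_lorentz` should agree with `abs(zeta) * sin(angle between S and S0)`, which is the identity stated in the text.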
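Because of crystal mosaicity and beam divergence, the intensity of a reflection is smeared around the diffraction maximum. The fraction of total reflection intensity found in the volume element $d\varepsilon_1\,d\varepsilon_2\,d\varepsilon_3$ at $\varepsilon_1, \varepsilon_2, \varepsilon_3$ can be approximated by Gaussian functions:
$$\omega(\varepsilon_1, \varepsilon_2, \varepsilon_3)\,d\varepsilon_1\,d\varepsilon_2\,d\varepsilon_3 = \frac{\exp(-\varepsilon_1^2/2\sigma_D^2)}{(2\pi)^{1/2}\sigma_D}\,d\varepsilon_1 \times \frac{\exp(-\varepsilon_2^2/2\sigma_D^2)}{(2\pi)^{1/2}\sigma_D}\,d\varepsilon_2 \times \frac{\exp(-\varepsilon_3^2/2\sigma_M^2)}{(2\pi)^{1/2}\sigma_M}\,d\varepsilon_3.$$
The intensity of a reflection can be completely recorded on one image, or distributed among several adjacent images. The fraction $R_j$ of total intensity recorded on image j, the `partiality' of the reflection, can be derived from the distribution function $\omega$ as
$$R_j = \int_{\phi_0 + (j-1)\Delta_\phi}^{\phi_0 + j\Delta_\phi} \frac{\exp[-(\phi' - \phi)^2/2(\sigma_M/|\zeta|)^2]}{(2\pi)^{1/2}(\sigma_M/|\zeta|)}\,d\phi' = [\mathrm{erf}(z_1) - \mathrm{erf}(z_2)]/2,$$
$$z_1 = |\zeta|\,(\phi_0 + j\Delta_\phi - \phi)/(2^{1/2}\sigma_M), \qquad z_2 = |\zeta|\,(\phi_0 + (j-1)\Delta_\phi - \phi)/(2^{1/2}\sigma_M).$$
The integral is evaluated by using a numerical approximation of the error function, erf (Abramowitz & Stegun, 1972).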
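As a minimal illustration of the partiality calculation, the sketch below assumes a Gaussian rocking curve of standard deviation σ_M/|ζ| in the rotation angle and image j covering the range from φ0 + (j − 1)Δφ to φ0 + jΔφ. The function name and argument names are choices made here; `math.erf` from the Python standard library stands in for the numerical approximation cited in the text.

```python
import math

def partiality(j, phi, phi0, dphi, sigma_m, zeta):
    """Fraction of a reflection's intensity recorded on image j (j = 1, 2, ...).

    phi      -- rotation angle at which the Laue equations are satisfied
    phi0     -- spindle angle at the start of image 1
    dphi     -- oscillation range per image
    sigma_m  -- standard deviation of the reflecting range
    zeta     -- geometric factor relating rotation angle to path length
    """
    k = abs(zeta) / (math.sqrt(2.0) * sigma_m)
    z1 = k * (phi0 + j * dphi - phi)         # upper image boundary
    z2 = k * (phi0 + (j - 1) * dphi - phi)   # lower image boundary
    return 0.5 * (math.erf(z1) - math.erf(z2))
```

Summed over all images the partialities add up to one, and a reflection centred exactly on the boundary between two images is shared equally between them.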
While the spot centroids in the detector plane are usually good estimates for the detector position of the diffraction maximum, the angular centroid about the rotation axis, $Z = \phi_0 + \Delta_\phi\sum_j (j - 1/2)R_j$, can be a rather poor guess for the true $\phi$ angle of the maximum. Its accuracy depends strongly on the value of $\phi$ and the size of the oscillation range relative to the mosaicity of the crystal. For a reflection fully recorded on image j, the value $Z = \phi_0 + (j - 1/2)\Delta_\phi$ will always be obtained, which is correct only if $\phi$ accidentally happens to be close to the centre of the rotation range of the image. In contrast, the $\phi$ angle of a partial reflection recorded on images j and $j + 1$ is closely approximated by $Z = \phi_0 + [j + (R_{j+1} - R_j)/2]\Delta_\phi$. If many images contribute to the spot intensity, $Z$ is always an excellent approximation to the ideal angular position $\phi$ when the Laue equations are satisfied; in fact, in the limiting case of infinitely fine-sliced data, it can be shown that $\lim_{\Delta_\phi\to 0} Z = \phi$.
Most refinement routines minimize the discrepancies between the predicted ϕ angles and their approximations obtained from the observed Z centroids, and must therefore carefully distinguish between fully and partially recorded reflections. This distinction is unnecessary, however, if observed Z centroids are compared with their analytic forms instead, because the sensitivity of the centroid positions to the diffraction parameters is correctly weighted in either case (see Section 11.3.2.8).
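The behaviour of the angular centroid in the three regimes discussed above (fully recorded, equally split, fine-sliced) can be reproduced with a short sketch. It assumes the analytic centroid is the partiality-weighted mean of the image-centre angles, Z = φ0 + Δφ Σ (j − 1/2) R_j; the function names and the ±10σ summation window are assumptions made for this illustration.

```python
import math

def partiality(j, phi, phi0, dphi, sigma_m, zeta):
    """Fraction of intensity on image j for a Gaussian rocking curve."""
    k = abs(zeta) / (math.sqrt(2.0) * sigma_m)
    return 0.5 * (math.erf(k * (phi0 + j * dphi - phi))
                  - math.erf(k * (phi0 + (j - 1) * dphi - phi)))

def angular_centroid(phi, phi0, dphi, sigma_m, zeta, n_sigma=10.0):
    """Analytic Z centroid: partiality-weighted mean of the image-centre angles."""
    half_width = n_sigma * sigma_m / abs(zeta)
    # first and last image touched by the window phi +/- half_width
    j_first = math.floor((phi - half_width - phi0) / dphi) + 1
    j_last = math.ceil((phi + half_width - phi0) / dphi)
    total = sum((j - 0.5) * partiality(j, phi, phi0, dphi, sigma_m, zeta)
                for j in range(j_first, j_last + 1))
    return phi0 + dphi * total
```

With a 1° oscillation range and a narrow rocking curve, a reflection at φ = 0.2° returns a centroid close to 0.5° (the image centre), whereas the same reflection sampled in 0.01° slices returns a centroid very close to 0.2°.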
Recognition and refinement of the parameter values controlling the observed diffraction pattern begins with the extraction of a list of coordinates of strong spots occurring in the images. As implemented in XDS, this list is obtained by the following procedure. First, each pixel value is compared with the mean value and standard deviation of surrounding pixels in the same image and classified as a strong pixel if its value exceeds the mean by a given multiple (typically 3 to 5) of the standard deviation. Values of the strong pixels and their location addresses and image running numbers are stored in a hash table during spot search [for a discussion of the hash technique, see Wirth (1976)]. After processing a fixed number of images, or when the table is full, all strong pixels are labelled by a unique number identifying the spot to which they belong. By definition, any two such pixels which can be connected by direct strong neighbours in two or three dimensions (if there are adjacent images) belong to the same spot (equivalence class). The labelling is achieved by the highly efficient algorithm for the recording of equivalence classes developed by Rem (see Dijkstra, 1976). At the end of this procedure, the table is searched for spots that have no contributing strong pixel on the current or the previous image. These spots are complete and their centroids are evaluated and saved in a file. To make room for new strong pixels as the spot search proceeds, all entries of strong pixels that are no longer needed are removed from the hash table and the remaining ones are rehashed. On termination, a list of the centroids of strong spots is available.
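The spot-search idea can be sketched on a single image as follows. This is a simplified stand-in rather than the XDS procedure: it uses global image statistics instead of the local neighbourhood of each pixel, a plain dictionary instead of a hash table that is periodically emptied, a generic union-find structure in place of Rem's algorithm, and only two dimensions. The function name `find_spots` and the default threshold are assumptions.

```python
import math

def find_spots(image, n_sigma=4.0):
    """Return sorted intensity-weighted centroids (x, y) of connected strong pixels.

    image is a list of rows, so a pixel value is image[y][x].
    """
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    strong = {(x, y): v
              for y, row in enumerate(image)
              for x, v in enumerate(row)
              if v > mean + n_sigma * std}

    parent = {p: p for p in strong}       # union-find forest over strong pixels

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for (x, y) in strong:                 # join direct neighbours (right and below)
        for q in ((x + 1, y), (x, y + 1)):
            if q in strong:
                parent[find(q)] = find((x, y))

    sums = {}                             # per-spot sums for the weighted centroid
    for (x, y), v in strong.items():
        root = find((x, y))
        sx, sy, sv = sums.get(root, (0.0, 0.0, 0.0))
        sums[root] = (sx + v * x, sy + v * y, sv + v)
    return sorted((sx / sv, sy / sv) for sx, sy, sv in sums.values())
```

Extending the neighbour test to the previous and next image would give the three-dimensional equivalence classes described in the text.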
Any reciprocal-lattice vector can be written in the form $\mathbf{p}_0^* = h\mathbf{b}_1^* + k\mathbf{b}_2^* + l\mathbf{b}_3^*$, where h, k, l are integer numbers and $\mathbf{b}_1^*, \mathbf{b}_2^*, \mathbf{b}_3^*$ are basis vectors of the lattice. The basis vectors which describe the orientation, metric and symmetry of the crystal, as well as the reflection indices h, k, l, have to be determined from the list of strong diffraction spots $X'_i, Y'_i, Z'_i$ $(i = 1, \ldots, n)$. Ideally, each spot corresponds to a reciprocal-lattice vector $\mathbf{p}^*$ which satisfies the Laue equations after a crystal rotation by $\phi$. Substituting the observed value Z′ for the unknown $\phi$ angle (see Section 11.3.2.4), $\mathbf{p}_0^*$ is found from the observed spot coordinates as
$$\mathbf{S}' = \frac{(X' - X_0)\mathbf{d}_1 + (Y' - Y_0)\mathbf{d}_2 + F\mathbf{d}_3}{\lambda\,[(X' - X_0)^2 + (Y' - Y_0)^2 + F^2]^{1/2}}, \qquad \mathbf{p}_0^* = D(\mathbf{m}_2, -Z')\,(\mathbf{S}' - \mathbf{S}_0),$$
where $D(\mathbf{m}_2, -Z')$ denotes a rotation by $-Z'$ about $\mathbf{m}_2$. Unfortunately, the reciprocal-lattice vectors derived from the above list of strong diffraction spots often contain a number of `aliens' (spots arising from fluctuations of the background, from ice, or from satellite crystals) and a robust method has to be used which is still capable of recognizing the dominant lattice. One approach, suggested by Bricogne (1986) and implemented in a number of variants (Otwinowski & Minor, 1997; Steller et al., 1997), is to identify a lattice basis as the three shortest linearly independent vectors $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3$, each at a maximum of the Fourier transform $\sum_{i=1}^{n}\cos(2\pi\,\mathbf{b}\cdot\mathbf{p}_{0i}^*)$. Alternatively, a reciprocal basis for the dominant lattice can be determined from short differences between the reciprocal-lattice vectors (Howard, 1986; Kabsch, 1988a). As implemented in XDS, a lattice basis is found by the following procedure.
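The list of given reciprocal-lattice points is first reduced to a small number m of low-resolution difference-vector clusters $\mathbf{v}_\mu^*$ $(\mu = 1, \ldots, m)$. $f_\mu$ is the population of a difference-vector cluster $\mathbf{v}_\mu^*$, that is the number of times the difference between any two reciprocal-lattice vectors $\mathbf{p}_{0i}^* - \mathbf{p}_{0j}^*$ is approximately equal to $\mathbf{v}_\mu^*$. In a second step, three linearly independent vectors $\mathbf{b}_1^*, \mathbf{b}_2^*, \mathbf{b}_3^*$ are selected among all possible triplets of difference-vector clusters that maximize the function Q:
$$Q(\mathbf{b}_1^*, \mathbf{b}_2^*, \mathbf{b}_3^*) = \sum_{\mu=1}^{m} f_\mu\,q(\xi_1^\mu, \xi_2^\mu, \xi_3^\mu),$$
$$q(\xi_1, \xi_2, \xi_3) = \exp\left(-2\sum_{k=1}^{3}\left\{\left[\frac{\max(|\xi_k - h_k| - \varepsilon, 0)}{\varepsilon}\right]^2 + \left[\max(|h_k| - \delta, 0)\right]^2\right\}\right),$$
$$\xi_k^\mu = \mathbf{v}_\mu^*\cdot\mathbf{b}_k, \qquad h_k^\mu = \text{nearest integer to } \xi_k^\mu.$$
The absolute maximum of Q is assumed if all difference vectors can be expressed as small integral multiples of the best triplet. Deviations from this ideal situation are quantified by the quality measure q. The value of q declines sharply if the expansion coefficients $\xi_k^\mu$ deviate by more than ε from their nearest integers $h_k^\mu$ or if the indices are absolutely larger than δ. The constraint on the allowed range of indices prevents the selection of a spurious triplet of very short difference-vector clusters which might be present in the set. Excellent results have been obtained using $\varepsilon = 0.05$ and $\delta = 5$. The best vector triplet thus found is refined against the observed difference-vector clusters. Finally, a reduced cell is derived from the refined reciprocal-base vector triplet as defined in IT A (2005), Chapter 9.2.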
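The scoring of a candidate triplet can be illustrated with the sketch below. The functional form of the quality measure used here, an exponential penalty on the deviation of each expansion coefficient from its nearest integer beyond a tolerance `eps` and on indices larger than `delta`, is an assumption chosen to match the behaviour described in the text; the default values of `eps` and `delta`, and all function names, are likewise assumptions of this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dual_basis(a1, a2, a3):
    """Real-space vectors b_k satisfying b_k . a_l = 1 if k == l, else 0."""
    vol = dot(a1, cross(a2, a3))
    return tuple(tuple(c / vol for c in cross(u, v))
                 for u, v in ((a2, a3), (a3, a1), (a1, a2)))

def quality(xi, eps=0.05, delta=5):
    """Close to 1 for near-integer, small coefficients; declines sharply otherwise."""
    penalty = 0.0
    for x in xi:
        h = round(x)
        penalty += (max(abs(x - h) - eps, 0.0) / eps) ** 2 + max(abs(h) - delta, 0) ** 2
    return math.exp(-2.0 * penalty)

def triplet_score(clusters, populations, triplet, eps=0.05, delta=5):
    """Population-weighted sum of quality measures over all difference-vector clusters."""
    basis = dual_basis(*triplet)
    return sum(f * quality([dot(v, b) for b in basis], eps, delta)
               for v, f in zip(clusters, populations))
```

A triplet that indexes every cluster with small integers scores the total population; a triplet with one axis doubled loses the clusters that would need half-integer coefficients.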
Once a basis $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3$ of the lattice is available, integral indices $h_i, k_i, l_i$ must be assigned to each reciprocal-lattice vector $\mathbf{p}_{0i}^*$ $(i = 1, \ldots, n)$. Using the integers nearest to $\mathbf{p}_{0i}^*\cdot\mathbf{b}_k$ $(k = 1, 2, 3)$ as indices of the reciprocal-lattice vectors could easily lead to a misindexing of longer vectors because of inaccuracies in the basis vectors and the initial values of the parameters describing the instrumental setup. A more robust solution of the indexing problem is provided by the local indexing method, which assigns only small index differences between pairs of neighbouring reciprocal-lattice vectors (Kabsch, 1993).
The reciprocal-lattice points can be considered as the nodes of a tree. The tree connects the n points to each other with the $n - 1$ connections as its branches. The length $\ell_{ij}$ of a possible branch between nodes i and j is defined here as
$$\ell_{ij} = 1 - \exp\left(-2\sum_{k=1}^{3}\left\{\left[\frac{\max(|\xi_k^{ij} - h_k^{ij}| - \varepsilon, 0)}{\varepsilon}\right]^2 + \left[\max(|h_k^{ij}| - \delta, 0)\right]^2\right\}\right),$$
$$\xi_k^{ij} = (\mathbf{p}_{0i}^* - \mathbf{p}_{0j}^*)\cdot\mathbf{b}_k, \qquad h_k^{ij} = \text{nearest integer to } \xi_k^{ij}.$$
Reliable index differences are indicated by short branches; in fact, $\ell_{ij}$ is 0 if none of the indices $h_k^{ij}$ is absolutely larger than δ and the $\xi_k^{ij}$ are integer values to within ε. Typical values of ε and δ are $\varepsilon = 0.05$ and $\delta = 5$. Defining the length of a tree as the sum of the lengths of its branches, a shortest tree among all possible trees is determined by the elegant algorithm described by Dijkstra (1976). Starting with arbitrary indices 0, 0, 0 for the root node, the local indexing method then consists of traversing the shortest tree and thereby assigning each node the indices of its predecessor plus the small index differences between the two nodes.
During traversal of the tree, each node is also given a subtree number. Starting with subtree number 1 for the root node, each successor node is given the same subtree number as its predecessor if the length of the connecting branch is below a minimal length $\ell_{\min}$. Otherwise its subtree number is incremented by 1. Thus all nodes in the same subtree have internally consistent reflection indices. Defining the size of a subtree by the number of its nodes, aliens are usually found in small subtrees. Finally, a constant index offset is determined such that the centroids of the observed reciprocal-lattice points belonging to the largest subtree and their corresponding grid vectors are as close as possible. This offset is added to the indices of each reciprocal-lattice point.
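The core of the local indexing method, growing a shortest tree and propagating small index differences along its branches, can be sketched as follows. Several things here are assumptions of the illustration: the branch length is taken as one minus an exponential penalty of the same kind as the quality measure q, the tree is grown with a simple O(n²) nearest-node strategy, and subtree numbering and the final index offset are omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def branch(p_i, p_j, basis, eps, delta):
    """Return (length, index difference) of the branch leading from node j to node i."""
    diff = tuple(a - b for a, b in zip(p_i, p_j))
    xi = [dot(diff, b) for b in basis]
    h = tuple(round(x) for x in xi)
    penalty = sum((max(abs(x - k) - eps, 0.0) / eps) ** 2 + max(abs(k) - delta, 0) ** 2
                  for x, k in zip(xi, h))
    return 1.0 - math.exp(-2.0 * penalty), h

def local_indexing(points, basis, eps=0.05, delta=5):
    """Assign indices relative to points[0] by walking a shortest spanning tree."""
    indices = {0: (0, 0, 0)}               # arbitrary indices for the root node
    best = {}                               # node -> (length, index difference, predecessor)
    for k in range(1, len(points)):
        length, h = branch(points[k], points[0], basis, eps, delta)
        best[k] = (length, h, 0)
    while best:
        j = min(best, key=lambda k: best[k][0])   # closest node not yet in the tree
        _, h, pred = best.pop(j)
        indices[j] = tuple(a + b for a, b in zip(indices[pred], h))
        for k in best:                             # update links through the new node
            length, h_k = branch(points[k], points[j], basis, eps, delta)
            if length < best[k][0]:
                best[k] = (length, h_k, j)
    return [indices[i] for i in range(len(points))]
```

Here `basis` is the real-space basis, so that the dot product of a reciprocal-lattice difference vector with each basis vector gives its expansion coefficient.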
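For a fixed detector, the diffraction pattern depends on the parameters $\mathbf{S}_0$, $\mathbf{m}_2$, $\mathbf{b}_1$, $\mathbf{b}_2$, $\mathbf{b}_3$, $X_0$, $Y_0$ and F. Starting values for the parameters can be obtained by the procedures described above that do not rely on prior knowledge of the crystal orientation, space-group symmetry or unit-cell metric. Better estimates of the parameter values, as required for the subsequent integration step, can be obtained by the method of least squares from the list of n observed indexed reflection centroids $h_i$, $k_i$, $l_i$, $X'_i$, $Y'_i$, $Z'_i$ $(i = 1, \ldots, n)$. In this method, the parameters are chosen to minimize a weighted sum of squares of the residuals
$$E = \sum_{i=1}^{n}\left[w_{iX}\Delta_{iX}^2 + w_{iY}\Delta_{iY}^2 + w_{iZ}\Delta_{iZ}^2\right].$$
The residuals between the calculated and observed spot centroids are
$$\Delta_{iX} = X_i - X'_i, \qquad \Delta_{iY} = Y_i - Y'_i, \qquad \Delta_{iZ} = Z_i - Z'_i.$$
Let $s_\mu$ $(\mu = 1, \ldots, k)$ denote the k independent parameters for which initial estimates are available. Expanding the residuals to first order in the parameter changes $\delta s_\mu$ gives
$$\Delta(s_\mu + \delta s_\mu) \approx \Delta(s_\mu) + \sum_{\mu=1}^{k}\frac{\partial\Delta}{\partial s_\mu}\,\delta s_\mu.$$
The parameters should be changed in such a way as to minimize $E(s_\mu + \delta s_\mu)$, which implies $\partial E/\partial\,\delta s_\mu = 0$ for $\mu = 1, \ldots, k$. The $\delta s_\mu$ are found as the solution of the k normal equations
$$\sum_{\mu'=1}^{k} A_{\mu\mu'}\,\delta s_{\mu'} = b_\mu, \qquad A_{\mu\mu'} = \sum_{i}\sum_{c=X,Y,Z} w_{ic}\,\frac{\partial\Delta_{ic}}{\partial s_\mu}\frac{\partial\Delta_{ic}}{\partial s_{\mu'}}, \qquad b_\mu = -\sum_{i}\sum_{c=X,Y,Z} w_{ic}\,\Delta_{ic}\,\frac{\partial\Delta_{ic}}{\partial s_\mu}.$$
The parameters are corrected by $s_\mu \to s_\mu + \delta s_\mu$ and a new cycle of refinement is started until a minimum of E is reached. The weights are calculated with the current guess for $s_\mu$ at the beginning of each cycle.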
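One cycle of this least-squares scheme amounts to building and solving the normal equations. The sketch below is a generic illustration under stated assumptions: derivatives are taken by finite differences rather than analytically, the weights are held fixed, and the small linear solver and the function names are introduced only for this example.

```python
def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def refine(residuals, params, weights, cycles=5, step=1e-6):
    """Repeat: linearize the residuals, solve the normal equations, correct the parameters."""
    params = list(params)
    k = len(params)
    for _ in range(cycles):
        r0 = residuals(params)
        grad = []                                  # grad[mu][i] = d residual_i / d s_mu
        for mu in range(k):
            shifted = list(params)
            shifted[mu] += step
            grad.append([(a - b) / step for a, b in zip(residuals(shifted), r0)])
        normal = [[sum(w * gi * gj for w, gi, gj in zip(weights, grad[mu], grad[nu]))
                   for nu in range(k)] for mu in range(k)]
        rhs = [-sum(w * d * g for w, d, g in zip(weights, r0, grad[mu])) for mu in range(k)]
        shifts = solve(normal, rhs)
        params = [p + d for p, d in zip(params, shifts)]
    return params
```

As a toy usage, a detector origin and distance can be recovered from spot positions of the form X = X0 + F·t, where t stands for the ratio of the diffracted-beam components along two detector axes.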
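The derivatives appearing in the normal equations can be worked out from the definitions given in Sections 11.3.2.2 and 11.3.2.4, and only the form of the gradient of the Z residuals is shown. Assuming $\zeta$ is constant for each reflection, the gradients of the Z residuals are obtained from the chain rule and the relation $d\,\mathrm{erf}(z)/dz = (2/\pi^{1/2})\exp(-z^2)$:
$$\frac{\partial\Delta_Z}{\partial s_\mu} = \frac{\partial Z}{\partial\phi}\,\frac{\partial\phi}{\partial s_\mu}, \qquad \frac{\partial Z}{\partial\phi} = \frac{|\zeta|\,\Delta_\phi}{(2\pi)^{1/2}\sigma_M}\sum_{j}\exp\left[-\frac{\zeta^2(\phi_0 + j\Delta_\phi - \phi)^2}{2\sigma_M^2}\right].$$
Obviously, $\partial Z/\partial\phi$ is small for a fully recorded reflection because of the small values of all exponentials appearing in $\partial Z/\partial\phi$. In contrast, the gradient for a partial reflection, equally recorded on two adjacent images, is most sensitive to parameter variations because one of the exponentials assumes its maximum value. In the limiting case of infinitely fine-sliced data, it can be shown that $\lim_{\Delta_\phi\to 0}\partial Z/\partial\phi = 1$. Thus, the refinement scheme based on observed Z centroids, as described here and implemented in XDS, is applicable to fine-sliced data, and to data recorded with a large oscillation range as well.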
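The three regimes of the centroid sensitivity can be checked numerically. The sketch assumes the analytic centroid Z = φ0 + Δφ Σ (j − 1/2) R_j with error-function partialities and a fixed ζ; differentiating term by term then leaves a sum of Gaussians evaluated at the image boundaries. The function name and the ±10σ summation window are illustrative assumptions.

```python
import math

def centroid_sensitivity(phi, phi0, dphi, sigma_m, zeta, n_sigma=10.0):
    """dZ/dphi as a sum of Gaussians over the image boundaries phi0 + j*dphi."""
    k = abs(zeta) / (math.sqrt(2.0) * sigma_m)
    half_width = n_sigma * sigma_m / abs(zeta)
    j_first = math.floor((phi - half_width - phi0) / dphi)
    j_last = math.ceil((phi + half_width - phi0) / dphi)
    total = sum(math.exp(-(k * (phi0 + j * dphi - phi)) ** 2)
                for j in range(j_first, j_last + 1))
    return abs(zeta) * dphi / (math.sqrt(2.0 * math.pi) * sigma_m) * total
```

For a 1° oscillation range and σ_M = 0.05°, the value is negligible at the centre of an image, large on an image boundary, and close to 1 when the same reflection is sampled in 0.01° slices.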
A fundamental requirement for a general integration method is that it should distinguish carefully between signal and background points within its integration domain. For weak reflections, this distinction cannot be made reliably because of the errors superimposed on the signal. The problem can be solved, however, provided that both weak and strong reflections share the same profile shape, an assumption that has been adopted by most data-processing packages.
The intensity distribution of a reflection can be modelled analytically or derived from the observed profiles of neighbouring strong spots. For the rotation method, the profile shape depends strongly on the specific path of the reflection through the Ewald sphere and on variations in the angle of incidence of the diffracted beam on a flat detector. These geometrical distortions can be eliminated by mapping the reflections onto the coordinate system defined in Section 11.3.2.3, which simplifies the task of modelling the expected intensity distribution as all reflection profiles become similar.
The region around a spot is defined by the two parameters $\delta_D$ and $\delta_M$, which represent spot diameter and reflecting range, respectively. It is assumed that the coordinates of all image pixels contributing to the intensity of a spot satisfy $|\varepsilon_1| \le \delta_D/2$, $|\varepsilon_2| \le \delta_D/2$ and $|\varepsilon_3| \le \delta_M/2$ when mapped to the profile coordinate system defined in Section 11.3.2.3. Regions of neighbouring reflections may overlap. As implemented in XDS, potential overlap is dealt with by a simple strategy: pixels within the overlap region are assigned to the nearest spot. This is carried out in two steps. First, reflections predicted to occur on a given rotation image are found by generating and testing all possible indices h, k, l up to the highest resolution recorded by the detector. Reflection indices, coordinates of the diffracted beam wave vector and the expected fraction of spot intensity recorded on the image are saved in a table. In the second step, each reflection boundary is traced in the image and corrected to exclude pixels belonging to overlapping reflections, which are rapidly located in the table by the hash technique. The image scaling factor obtained from the mean image background and the neighbourhood pixel values belonging to the reflections recorded in the image are saved on a scratch file dedicated to the currently processed data image.
At regular intervals, these files are merged such that all pixel values belonging to a spot found in the contributing images follow each other. Reflections for which contributing pixels are expected further ahead in data processing are just copied to a scratch output file. The other reflections are mapped to the Ewald sphere, as described below, and their three-dimensional profiles and accompanying information are routed to the main output file of the spot-extraction step. After the file-merging procedure, spot extraction continues.
The region around a spot is assumed to have been chosen to be large enough to include a sufficient number of pixels which can be used for determination of the background. Background determination, as implemented in XDS, begins by sorting all pixels belonging to a reflection by increasing intensity. For weak or absent reflections, these values should represent a random sample drawn from a normal distribution. If this is not the case, the pixel with the largest intensity is removed until the sampling distribution of the remaining smaller items satisfies the expected distribution. This method will also exclude pixels with unexpectedly high values, such as ice reflections. The background, determined as the mean value of the accepted pixels, is systematically overestimated for strong spots because of some residual intensity extending into the accepted background pixels. This residual intensity is estimated from the expected distribution defined in Section 11.3.2.3 and removed from the final background value.
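Reflection profiles are represented on the Ewald sphere within a domain $D_0$ comprising $2n_1 + 1$, $2n_2 + 1$, $2n_3 + 1$ equidistant grid points along $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$, respectively. The sampling distances between adjacent grid points are then $\Delta_1 = \delta_D/(2n_1 + 1)$, $\Delta_2 = \delta_D/(2n_2 + 1)$, $\Delta_3 = \delta_M/(2n_3 + 1)$. Thus, grid coordinate $\nu_3$ $(\nu_3 = -n_3, \ldots, n_3)$ covers the set of rotation angles
$$\Gamma_{\nu_3} = \{\phi' \mid (\nu_3 - 1/2)\Delta_3 \le \zeta\,(\phi' - \phi) \le (\nu_3 + 1/2)\Delta_3\}.$$
Contributions to the spot intensity come from one or several adjacent data images j $(j = j_1, \ldots, j_2)$, each covering the set of rotation angles
$$\Gamma_j = \{\phi' \mid \phi_0 + (j - 1)\Delta_\phi \le \phi' \le \phi_0 + j\Delta_\phi\}.$$
Assuming Gaussian profiles along $\mathbf{e}_3$ for all reflections (see Section 11.3.2.3), the fraction of counts (after subtraction of the background) contributed by data frame j to grid coordinate $\nu_3$ is
$$f_{\nu_3 j} \approx \int_{\Gamma_j \cap \Gamma_{\nu_3}}\exp[-(\phi' - \phi)^2/2\sigma^2]\,d\phi' \Big/ \int_{\Gamma_j}\exp[-(\phi' - \phi)^2/2\sigma^2]\,d\phi',$$
where $\sigma = \sigma_M/|\zeta|$. The integrals can be expressed in terms of the error function, for which efficient numerical approximations are available (Abramowitz & Stegun, 1972). Finally, each pixel on data image j belonging to the reflection is subdivided into $5 \times 5$ areas of equal size, and $f_{\nu_3 j}/25$ of the pixel signal is added to the profile value at grid coordinates $\nu_1, \nu_2, \nu_3$ corresponding to each subdivision.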
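The distribution of the counts of one image over the angular grid points can be sketched as a ratio of Gaussian integrals. The sketch assumes a rocking curve of standard deviation σ_M/|ζ| in the rotation angle, a grid of 2·n3 + 1 cells spanning the reflecting range δ_M, and image j covering φ0 + (j − 1)Δφ to φ0 + jΔφ; the function and argument names are choices made for this example.

```python
import math

def gauss_integral(lo, hi, centre, sigma):
    """Integral of a normalized Gaussian between lo and hi."""
    s = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((hi - centre) / s) - math.erf((lo - centre) / s))

def frame_to_grid_fraction(j, nu3, phi, phi0, dphi, sigma_m, zeta, delta_m, n3):
    """Fraction of the counts of image j that belongs to angular grid cell nu3."""
    sigma = sigma_m / abs(zeta)
    cell = delta_m / (2 * n3 + 1)                     # sampling distance along e3
    # rotation-angle interval covered by the grid cell (zeta may be negative)
    g_lo, g_hi = sorted((phi + (nu3 - 0.5) * cell / zeta,
                         phi + (nu3 + 0.5) * cell / zeta))
    j_lo, j_hi = phi0 + (j - 1) * dphi, phi0 + j * dphi
    lo, hi = max(g_lo, j_lo), min(g_hi, j_hi)         # overlap of cell and image
    whole = gauss_integral(j_lo, j_hi, phi, sigma)
    if hi <= lo or whole <= 0.0:
        return 0.0
    return gauss_integral(lo, hi, phi, sigma) / whole
```

When an image lies entirely inside the angular range of the grid, its fractions over all grid cells add up to one, so no counts are lost in the transformation.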
This complicated procedure leads to more uniform intensity profiles for all reflections than using their untransformed shape. This simplifies the task of modelling the expected intensity distribution needed for integration by profile fitting. As implemented in XDS, reference profiles are learnt every 5° of crystal rotation at nine positions on the detector, each covering an equal area of the detector face. In the learning phase, profile boxes of the strong reflections are normalized and added to their nearest reference profile boxes. The contributions are weighted according to the distance from the location of the reference profile. Each grid point within the average profile boxes is classified as signal if it is above 2% of the peak maximum. Finally, each profile is scaled such that the sum of its signal pixels normalizes to one. The analytic expression defined in Section 11.3.2.3 for the expected intensity distribution is only a rough initial approximation which is now replaced by the empirical reference profiles.
If an expected intensity distribution $\{p_i \mid i \in D_0\}$ of the observed profile is given in a domain $D_0$, the reflection intensity I can be estimated as
$$I = \sum_{i\in D}(c_i - b_i)\,p_i/v_i \Big/ \sum_{i\in D} p_i^2/v_i,$$
which minimizes the function
$$\psi(I) = \sum_{i\in D}(c_i - I\,p_i - b_i)^2/v_i, \qquad \sum_{i\in D_0} p_i = 1.$$
$b_i$, $c_i$, $v_i$ $(i \in D)$ are background, contents and variance of pixels observed in a subdomain $D \subseteq D_0$ of the expected distribution. The background underneath a diffraction spot is often assumed to be a constant which is estimated from the neighbourhood around the reflection. Determination of reflection intensities by profile fitting has a long tradition (Diamond, 1969; Ford, 1974; Kabsch, 1988b; Otwinowski, 1993). Implementations of the method differ mainly in their assumptions about the variances $v_i$. Ford uses constant variances, which works well for films, which have a high intrinsic background. In XDS, which was originally designed for a multiwire detector, $v_i \propto p_i$ was assumed, which results in a straight summation of background-subtracted counts within the expected profile region, $I = \sum_{i\in D}(c_i - b_i)/\sum_{i\in D} p_i$. This particularly simple formula is very satisfactory for the low background typical of these detectors. For the general case, however, better results can be obtained by using $v_i = b_i + I\,p_i$ for the pixel variances as shown by Otwinowski and implemented in DENZO and in the later version of XDS. Starting with $v_i = b_i$, the intensity is now found by an iterative process which is terminated if the new intensity estimate becomes negative or does not change within a small tolerance, which is usually reached after three cycles. It can be shown that the solution thus obtained is unique.
Usually, many statistically independent observations of symmetry-related reflections are recorded in the rotation images taken from one or several similar crystals of the same compound. The squared structure-factor amplitudes of equivalent reflections should be equal and the idea of scaling is to exploit this a priori knowledge to determine a correction factor for each observed intensity. These correction factors compensate to some extent for effects such as radiation damage, absorption, and variations in detector sensitivity and exposure times, as well as variations in size and disorder between different crystals.
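The iterative profile-fitting estimate can be sketched as below. The sketch assumes pixel variances of the form background plus intensity times profile value, starts from variances equal to the background, and stops when the estimate turns negative or stops changing; it also assumes a strictly positive background so that the first cycle is well defined. The function names and the tolerance are illustrative choices, not values taken from XDS or DENZO.

```python
def summation_intensity(counts, background, profile):
    """Straight summation of background-subtracted counts, divided by the profile sum."""
    return sum(c - b for c, b in zip(counts, background)) / sum(profile)

def fit_intensity(counts, background, profile, max_cycles=10, tol=1e-6):
    """Weighted least-squares intensity with variances v_i = b_i + I * p_i."""
    variance = list(background)          # first cycle: v_i = b_i (must be positive)
    intensity = None
    for _ in range(max_cycles):
        num = sum((c - b) * p / v
                  for c, b, p, v in zip(counts, background, profile, variance))
        den = sum(p * p / v for p, v in zip(profile, variance))
        new = num / den
        if new < 0.0:                    # negative estimate: stop iterating
            return new
        if intensity is not None and abs(new - intensity) <= tol * max(abs(new), 1.0):
            return new
        intensity = new
        variance = [b + new * p for b, p in zip(background, profile)]
    return intensity
```

For noise-free data generated from a known intensity, both estimators return that intensity; they differ in how they weight noisy pixels.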
The usual methods of scaling split the data into batches of roughly the same size, each covering one or more adjacent rotation images, and then determine a single scaling factor for all reflections in each batch. Neighbouring reflections may then receive quite different corrections if they are assigned to different batches. Since the selection of batch boundaries is to some extent arbitrary, a more continuous correction function would be preferable. This function could be modelled analytically (for example by using spherical harmonics) or empirically, as implemented in XSCALE and described below.
For each reflection, observational equations are defined as
$$\psi_{hl\alpha} = (I_{hl} - g_\alpha I_h)/\sigma_{hl}.$$
The subscript h represents the unique reflection indices and l enumerates all symmetry-related reflections to h. By definition, the unique reflection indices have the largest h, then k, then l value occurring in the set of all indices related by symmetry to the original indices, including Friedel mates. Thus, two reflections are symmetry-related if and only if their unique indices are identical. $I_h$ is the unknown `true' intensity and $I_{hl}$, $\sigma_{hl}$ are symmetry-related observed intensities and their standard deviations, respectively. The subscript α denotes the coordinates at which the scaling function $g_\alpha$ should be evaluated. As implemented in XDS and XSCALE, $\alpha = 1, \ldots, 9$ denotes nine positions uniformly distributed in the detector plane at the beginning of data collection, $\alpha = 10, \ldots, 18$ the same positions on the detector but after the crystal has been rotated by, say, 5°, and so on. The scaling factors $g_\alpha$ and the estimated intensities $I_h$ are found at the minimum of the function
$$\Psi = \sum_{hl\alpha} w_{hl\alpha}\,\psi_{hl\alpha}^2.$$
The main difference from the method of Fox & Holmes (1966) is the introduction of the weights $w_{hl\alpha}$. These weights depend upon the distance between each reflection hl and the positions α. They are monotonically decreasing functions of this distance, implemented as Gaussians in XDS and XSCALE. This results in a smoothing of the scaling factors since each reflection contributes to the observational equations in proportion to the weights $w_{hl\alpha}$.
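The definition of unique reflection indices used to group symmetry-related observations before scaling (largest h, then k, then l among all symmetry mates, Friedel mates included) translates directly into code. In the sketch below the symmetry operations are passed as 3×3 integer matrices acting on the index triple treated as a row vector; that convention and the function name are assumptions of this example.

```python
def unique_index(hkl, rotations):
    """Return the symmetry mate of hkl with the largest h, then k, then l."""
    mates = []
    for r in rotations:
        # apply the operation to the row vector (h, k, l)
        t = tuple(sum(hkl[i] * r[i][j] for i in range(3)) for j in range(3))
        mates.append(t)
        mates.append(tuple(-x for x in t))   # Friedel mate
    return max(mates)                        # tuples compare in the order h, k, l

# Example operations for point group 2 with the twofold axis along b.
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
TWOFOLD_B = ((-1, 0, 0), (0, 1, 0), (0, 0, -1))
```

Two observations can then be treated as symmetry-related exactly when `unique_index` returns the same triple for both, for instance (−1, 2, 3) and (1, 2, −3) under the two operations listed.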
Minimization of Ψ is done iteratively. After each step, the $g_\alpha$ are replaced by $g_\alpha + \Delta g_\alpha$ and rescaled to a mean value of 1. The corrections $\Delta g_\alpha$ are determined from the normal equations $\sum_\beta A_{\alpha\beta}\,\Delta g_\beta = b_\alpha$, where the normal matrix $A_{\alpha\beta}$ and the right-hand side $b_\alpha$ are sums over the unique reflections h.
In case a `true' intensity $I_h$ is available from a reference data set, the non-diagonal elements are omitted from the sum over h in the normal matrix $A_{\alpha\beta}$. The corrections $\Delta g_\alpha$ are expanded in terms of the eigenvectors of the normal matrix, thereby avoiding shifts along eigenvectors with very small eigenvalues (Diamond, 1966). This filtering method is essential since the normal matrix has zero determinant if no reference data set is available.
The number of fully recorded reflections on each single image rapidly declines for small oscillation ranges and the complete intensities of the partially recorded reflections have to be estimated. This presented a serious obstacle in early structural work on virus crystals, as the crystal had to be replaced after each exposure on account of radiation damage. A solution of this problem, the `post refinement' technique, was found by Schutt, Winkler and Harrison, and variants of this powerful method have been incorporated into most data-reduction programs [for a detailed discussion, see Harrison et al. (1985); Rossmann (1985)]. The method derives complete intensities of reflections only partially recorded on an image from accurate estimates for the fractions of observed intensity, the `partiality'. The partiality of each reflection can always be calculated as a function of orientation, unit-cell metric, mosaic spread of the crystal and model intensity distributions. Obviously, the accuracy of the estimated full reflection intensity then strongly depends on a precise knowledge of the parameters describing the diffraction experiment. Usually, for many of the partial reflections, symmetry-related fully recorded ones can be found, and the list of such pairs of intensity observations can be used to refine the required parameters by a least-squares procedure. Clearly, this refinement is carried out after all images have been processed, which explains why the procedure is called `post refinement'.
Adjustments of the diffraction parameters $s_\mu$ $(\mu = 1, \ldots, k)$ are determined by minimization of the function E, which is defined as the weighted sum of squared residuals between calculated and observed partial intensities,
$$E = \sum_{hj} w_{hj}\,\Delta_{hj}^2, \qquad \Delta_{hj} = I_{hj} - g_j\,R_j(\phi_{hj})\,I_h.$$
Here, $I_{hj}$ is the intensity recorded on image j of a partial reflection with indices summarized as hj; $I_h$ is the mean of the observed intensities of all fully recorded reflections symmetry-equivalent to hj; $g_j$ is the inverse scaling factor of image j; $\phi_{hj}$ is the calculated spindle angle of reflection hj at diffraction; and $R_j$ is the computed fraction of total intensity recorded on image j.
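Expansion of the residuals to first order in the parameter changes $\delta s_\mu$ and minimization of $E(s_\mu + \delta s_\mu)$ leads to the k normal equations
$$\sum_{\mu'=1}^{k} A_{\mu\mu'}\,\delta s_{\mu'} = b_\mu, \qquad A_{\mu\mu'} = \sum_{hj} w_{hj}\,\frac{\partial\Delta_{hj}}{\partial s_\mu}\frac{\partial\Delta_{hj}}{\partial s_{\mu'}}, \qquad b_\mu = -\sum_{hj} w_{hj}\,\Delta_{hj}\,\frac{\partial\Delta_{hj}}{\partial s_\mu}.$$
Often, the normal matrix $A_{\mu\mu'}$ is ill-conditioned, since changes in some unit-cell parameters or small rotations of the crystal about the incident X-ray beam do not significantly affect the calculated partiality $R_j$. To take care of these difficulties, the system of equations is rescaled to yield unit diagonal elements for the normal matrix and the correction vector $\delta s_\mu$ is filtered by projection into a subspace defined by the eigenvectors of the normal matrix with sufficiently large eigenvalues (Diamond, 1966).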
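The parameters are corrected by the filtered $\delta s_\mu$ and a new cycle of refinement is started until a minimum of E is reached. The weights, residuals and their gradients are calculated using the current values for $s_\mu$ and $g_j$ at the beginning of each cycle. The derivatives appearing in the normal equations can be worked out from the definitions given in Sections 11.3.2.2 and 11.3.2.4 (to simplify the following equations, the subscript hj is omitted). The fraction of the total intensity can be expressed in terms of the error function (see Section 11.3.2.4) as
$$R = [\mathrm{erf}(z_1) - \mathrm{erf}(z_2)]/2, \qquad z_1 = |\zeta|\,(\phi_0 + j\Delta_\phi - \phi)/(2^{1/2}\sigma_M), \qquad z_2 = |\zeta|\,(\phi_0 + (j-1)\Delta_\phi - \phi)/(2^{1/2}\sigma_M).$$
Using the relation $d\,\mathrm{erf}(z)/dz = (2/\pi^{1/2})\exp(-z^2)$, the derivatives of R are
$$\frac{\partial R}{\partial s_\mu} = \left[\exp(-z_1^2)\,\frac{\partial z_1}{\partial s_\mu} - \exp(-z_2^2)\,\frac{\partial z_2}{\partial s_\mu}\right]\Big/\pi^{1/2}, \qquad \frac{\partial z_i}{\partial s_\mu} = -\frac{|\zeta|}{2^{1/2}\sigma_M}\frac{\partial\phi}{\partial s_\mu} + \frac{z_i}{|\zeta|}\frac{\partial|\zeta|}{\partial s_\mu} - \frac{z_i}{\sigma_M}\frac{\partial\sigma_M}{\partial s_\mu} \quad (i = 1, 2).$$
It remains to work out the derivatives $\partial\phi/\partial s_\mu$, $\partial|\zeta|/\partial s_\mu$ and $\partial\sigma_M/\partial s_\mu$ (not shown here). As discussed in detail by Greenhough & Helliwell (1982), spectral dispersion and asymmetric beam cross fire lead to some variation of $\sigma_M$, which makes it necessary to include additional parameters in the list $s_\mu$. The effect of these parameters on the partiality is dealt with easily by the derivatives $\partial\sigma_M/\partial s_\mu$.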
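The partial derivatives of the partiality with respect to the three intermediate quantities (the diffraction angle, |ζ| and the standard deviation of the reflecting range) follow from the error-function form by the chain rule, and can be verified against finite differences. The sketch below assumes the error-function expression with image j covering φ0 + (j − 1)Δφ to φ0 + jΔφ; the function name and the returned tuple layout are choices made here.

```python
import math

def partiality_with_gradient(j, phi, phi0, dphi, sigma_m, zeta):
    """Return R and its derivatives with respect to (phi, |zeta|, sigma_m)."""
    abs_zeta = abs(zeta)
    k = abs_zeta / (math.sqrt(2.0) * sigma_m)
    z1 = k * (phi0 + j * dphi - phi)
    z2 = k * (phi0 + (j - 1) * dphi - phi)
    r = 0.5 * (math.erf(z1) - math.erf(z2))
    e1, e2 = math.exp(-z1 * z1), math.exp(-z2 * z2)
    c = 1.0 / math.sqrt(math.pi)
    d_phi = c * (e1 - e2) * (-k)                    # dz_i/dphi = -k for both limits
    d_zeta = c * (e1 * z1 - e2 * z2) / abs_zeta     # dz_i/d|zeta| = z_i / |zeta|
    d_sigma = -c * (e1 * z1 - e2 * z2) / sigma_m    # dz_i/dsigma_m = -z_i / sigma_m
    return r, (d_phi, d_zeta, d_sigma)
```

The derivative with respect to a refined parameter is then obtained by multiplying each of the three terms by the corresponding derivative of the angle, |ζ| or σ_M with respect to that parameter and adding them.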
The refinement scheme described above requires initial scaling factors $g_j$ for the images. With the now improved estimates for the partialities $R_{hj}$, a new set of scaling factors can be obtained by the method outlined in Section 11.3.4. This alternating procedure of scaling and post refinement usually converges within three cycles.
The use of error functions for modelling partiality, as implied by a Gaussian model for describing spot shape, was chosen here for reasons of conceptual simplicity and coherence. The results are unlikely to differ significantly from those of post refinement based on other functions of similar form [see the discussion by Rossmann (1985)].
Identification of the correct space group is not always an easy task and should be postponed for as long as possible. Fortunately, all data processing as implemented in the program XDS can be carried out even in the absence of any knowledge of crystal symmetry and cell constants. In this case, a reduced cell is extracted from the observed diffraction pattern and processing of the data images continues to completion as if the crystal were triclinic. Clearly, the reflection indices then refer to the reduced cell and must be reindexed once the space group is known. For all space groups, the required reindexing transformation is linear and involves only whole numbers as shown in Part 9 of IT A. The following description and example are taken from Kabsch (1993).
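Because the reindexing transformation is linear with whole-number coefficients, it can be applied with exact integer arithmetic. The sketch below is only an illustration: the function name `reindex` is hypothetical, and the matrix `P_TO_C` is an assumed example of a change from a primitive to a C-centred description ($h' = h + k$, $k' = -h + k$, $l' = l$), not a transformation taken from the tables of this chapter.

```python
def reindex(hkl, matrix):
    """Apply an integer 3x3 matrix to a triple of reflection indices."""
    return tuple(sum(row[j] * hkl[j] for j in range(3)) for row in matrix)

# Assumed example: primitive cell to C-centred conventional cell.
P_TO_C = ((1, 1, 0),
          (-1, 1, 0),
          (0, 0, 1))

new_hkl = reindex((1, 2, 3), P_TO_C)
```

With this particular matrix the sum $h' + k'$ is always even, which is the reflection condition expected for a C-centred lattice.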
Space-group assignment is carried out in two steps under control of the crystallographer once integrated intensities of all reflections are available. First, the Bravais lattices that are compatible with the observed reduced cell are identified. In the second step, any of the plausible space groups may be tested and rated according to symmetry R factors and systematic absences of integrated reflection intensities after reindexing. Additional acceptance criteria are obtained from refinement, which now uses a reduced set of independent parameters describing the conventional unit cell; this restriction should not lead to a significant increase of the r.m.s. deviations between observed and calculated reflection positions and angles.
The determination of possible Bravais lattices is based upon the concept of the reduced cell, whose metric parameters characterize 44 lattice types as described in Part 9 of IT A. A primitive basis $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ of a given lattice is defined there as a reduced cell if it is right-handed and if the components of its metric tensor satisfy a number of conditions (inequalities). The main conditions state that the basis vectors are the shortest three linearly independent lattice vectors with either all acute or all non-acute angles between them. As specified in IT A, each of the 44 lattice types is characterized by additional equality relations among the six components of the reduced-cell metric tensor. As an example, for lattice character 13 (Bravais type oC) the components of the metric tensor of the reduced cell must satisfy

$$\mathbf{a}\cdot\mathbf{a} = \mathbf{b}\cdot\mathbf{b}, \qquad \mathbf{b}\cdot\mathbf{c} = 0, \qquad \mathbf{a}\cdot\mathbf{c} = 0.$$
Any primitive triclinic cell describing a given lattice can be converted into a reduced cell. It is well known, however, that the reduced cell thus derived is sensitive to experimental error. Hence, the direct approach of first deriving the correct reduced cell and then finding the lattice type is unstable and may in certain cases even prevent the identification of the correct Bravais lattice.
A suitable solution of the problem has been found that avoids any decision about what the 'true' reduced cell is. The essential requirements of this procedure are: (a) a database of possible reduced cells and (b) a backward search strategy that finds the best-fitting cell in the database for each lattice type.
The database is derived from a seed cell which strictly satisfies the definitions for a reduced cell. All cells of the same volume as the seed cell whose basis vectors can be linearly expressed in terms of the seed vectors by indices −1, 0 or +1 are included in the database. Each unit cell in the database is considered as a potential reduced cell even though some of the defining conditions as given in Part 9 of IT A may be violated. These violations are treated as being due to experimental error.
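The construction of such a database can be sketched as an enumeration of all integer matrices with entries −1, 0 or +1 that preserve the cell volume. The code below is a minimal illustration, not the procedure as implemented in XDS: the function names are hypothetical, and the restriction to determinant +1 (rather than ±1) is an assumption made here so that every candidate cell stays right-handed.

```python
from itertools import product

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def candidate_matrices():
    """All 3x3 matrices with entries in {-1, 0, +1} and determinant +1."""
    mats = []
    for e in product((-1, 0, 1), repeat=9):
        m = (e[0:3], e[3:6], e[6:9])
        if det3(m) == 1:
            mats.append(m)
    return mats

def transform_basis(seed, m):
    """New basis vector i is sum_j m[i][j] * seed[j]; rows of seed are vectors."""
    return tuple(
        tuple(sum(m[i][j] * seed[j][x] for j in range(3)) for x in range(3))
        for i in range(3)
    )
```

Applying every matrix returned by `candidate_matrices()` to the seed basis with `transform_basis` yields the set of potential reduced cells, each with the same volume as the seed cell.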
The backward search strategy starts with the hypothesis that the lattice type is already known and identifies the best-fitting cell in the database of possible reduced cells. In contrast to a forward-directed search, it is now always possible to decide which conditions have to be satisfied by the components of the metric tensor of the reduced cell. The total amount by which all these equality and inequality conditions are violated is used as a quality index. For lattice type 13 (oC), this measure is obtained by testing a potential reduced cell from the database for agreement with the conditions defining that lattice type. Positive values of the quality index indicate that some conditions are not satisfied. All potential reduced cells in the database are tested and the smallest value of the quality index is assigned to lattice type 13. This test is carried out for all 44 possible lattice types using quality indices derived in a similar way from the defining conditions as listed in Part 9 of IT A. For each of the 44 lattice types thus tested, the procedure described here returns the quality index, the conventional cell parameters and a transformation matrix relating original indices with respect to the seed cell to the new indices with respect to the conventional cell. These index-transformation matrices are derived from those given in Table 9.2.5.1 in IT A.
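A quality index of this kind can be sketched as a sum of violations of the conditions for one lattice type. The penalty below for lattice character 13 is an illustrative assumption and not the expression used in the source: it adds the violations of the equalities $\mathbf{a}\cdot\mathbf{a} = \mathbf{b}\cdot\mathbf{b}$, $\mathbf{b}\cdot\mathbf{c} = 0$, $\mathbf{a}\cdot\mathbf{c} = 0$ to those of three inequalities assumed here for a reduced cell with non-acute angles. The function names are hypothetical.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def quality_index_oc(cell):
    """Illustrative violation measure for lattice character 13 (oC).

    cell is a triple of basis vectors (a, b, c) of a candidate reduced cell.
    """
    a, b, c = cell
    A, B, C = dot(a, a), dot(b, b), dot(c, c)
    D, E, F = dot(b, c), dot(a, c), dot(a, b)
    penalty = abs(A - B) + abs(D) + abs(E)   # equalities A = B, D = 0, E = 0
    penalty += max(0.0, B - C)               # assumed ordering B <= C
    penalty += max(0.0, F)                   # assumed non-acute angle: F <= 0
    penalty += max(0.0, abs(F) - A / 2.0)    # assumed bound |F| <= A/2
    return penalty

def best_fit(cells, quality):
    """Return (smallest quality index, cell) over all candidate cells."""
    return min(((quality(cell), cell) for cell in cells), key=lambda t: t[0])
```

A cell that fulfils all conditions scores zero, and any deviation caused by experimental error raises the score, so the candidate with the smallest value is the best explanation of the data under the assumed lattice type.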
The results obtained by this method are shown in Table 11.3.6.1 for the example of a 1.5° oscillation data film containing 1313 strong diffraction spots which were located automatically. The space group of the crystal is $C222_1$ and the cell constants are . The entry for the correct Bravais lattice oC, with derived cell constants close to the true ones, has a low value for its quality index and thus appears as a possible explanation of the observed diffraction pattern.

Inspection of the table rating the likelihood of each of the 44 lattice types usually reveals a rather limited set of possible space groups. Furthermore, the absence of parity-changing symmetry operators required for protein crystals restricts the number of possible space groups to 65 instead of 230. Any space group can be tested by repeating only the final steps of data processing. These steps include a comparison of symmetry-related reflection intensities, as well as a refinement of the parameters controlling the diffraction pattern after reindexing the reflections by the appropriate transformation. Low r.m.s. deviations between the observed and refined spot positions, as well as small R factors for symmetry-related reflection intensities, indicate that the constraints imposed by the tentatively chosen space group are satisfied. The space group with highest symmetry compatible with the data is almost certainly correct if the data set is sufficiently complete and redundant, which requires that each symmetry element relates a sufficient number of reflections to one another.
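The comparison of symmetry-related intensities can be illustrated by a simple symmetry R factor. The sketch below assumes the common definition $R = \sum_h \sum_i |I_{hi} - \langle I_h \rangle| / \sum_h \sum_i I_{hi}$, which may differ in detail from the statistic reported by a given program; the names are hypothetical, and for brevity the symmetry operators are restricted to sign changes of the indices, as is sufficient for point group 222.

```python
from collections import defaultdict

# Sign changes of (h, k, l) generated by the three twofold axes of point group 222.
ROTATIONS_222 = ((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))

def unique_index(hkl, sign_ops):
    """Representative of the set of symmetry-equivalent indices."""
    return max(tuple(s * x for s, x in zip(op, hkl)) for op in sign_ops)

def symmetry_r_factor(observations, sign_ops):
    """R factor over groups of symmetry-related observations (hkl, intensity)."""
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[unique_index(hkl, sign_ops)].append(intensity)
    num = den = 0.0
    for values in groups.values():
        if len(values) < 2:
            continue  # a single observation carries no symmetry information
        mean = sum(values) / len(values)
        num += sum(abs(v - mean) for v in values)
        den += sum(values)
    return num / den if den > 0 else None
```

The function returns `None` when no reflection has a symmetry mate in the data, which reflects the requirement that each symmetry element must relate a sufficient number of reflections before a space group can be rated.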
For the example of a 1.5° oscillation data film given above, space-group determination consists of the following steps. Inspection of Table 11.3.6.1 indicates that lattice characters 10, 13, 14 and 34, besides the triclinic characters 31 and 44, are approximately compatible with the observed diffraction pattern. The highest lattice symmetry is orthorhombic (character 13, Bravais type oC), which limits the possible space groups for protein crystals to either $C222_1$ or $C222$. Processing of all films in the data set was completed in space group $P1$ using the cell constants shown for lattice character 44. To test whether the crystal has space-group symmetry $C222$ and conventional cell constants , the final steps of data processing were repeated after reindexing the reflections by the transformation , , as specified for lattice character 13. Note that the transformation also provides a simple tool for correcting the indices if all reflections are misindexed by a constant. The results clearly show that the crystal has space-group symmetry $C222_1$. The presence of the $2_1$ axis was deduced from the rather weak intensities observed for reflections of type $00l$ with $l$ odd.
References
Abramowitz, M. & Stegun, I. A. (1972). Handbook of mathematical functions. New York: Dover Publications.
Bricogne, G. (1986). Indexing and the Fourier transform. In Proceedings of the EEC cooperative workshop on position-sensitive detector software (phase III), p. 28. LURE, 12–19 November.
Diamond, R. (1966). A mathematical model-building procedure for proteins. Acta Cryst. 21, 253–266.
Diamond, R. (1969). Profile analysis in single crystal diffractometry. Acta Cryst. A25, 43–55.
Dijkstra, E. W. (1976). A discipline of programming, pp. 154–167. New Jersey: Prentice-Hall.
Ford, G. C. (1974). Intensity determination by profile fitting applied to precession photographs. J. Appl. Cryst. 7, 555–564.
Fox, G. C. & Holmes, K. C. (1966). An alternative method of solving the layer scaling equations of Hamilton, Rollett and Sparks. Acta Cryst. 20, 886–891.
Greenhough, T. J. & Helliwell, J. R. (1982). Oscillation camera data processing: reflecting range and prediction of partiality. I. Conventional X-ray sources. J. Appl. Cryst. 15, 338–351.
Harrison, S. C., Winkler, F. K., Schutt, C. E. & Durbin, R. M. (1985). Oscillation method with large unit cells. Methods Enzymol. 114A, 211–237.
Howard, A. (1986). Autoindexing. In Proceedings of the EEC cooperative workshop on position-sensitive detector software (phases I & II), pp. 89–94. LURE, 26 May–7 June.
International Tables for Crystallography (2005). Vol. A. Space-group symmetry, edited by Th. Hahn, ch. 9.2. Heidelberg: Springer.
Kabsch, W. (1988a). Automatic indexing of rotation diffraction patterns. J. Appl. Cryst. 21, 67–71.
Kabsch, W. (1988b). Evaluation of single-crystal X-ray diffraction data from a position-sensitive detector. J. Appl. Cryst. 21, 916–924.
Kabsch, W. (1993). Automatic processing of rotation diffraction data from crystals of initially unknown symmetry and cell constants. J. Appl. Cryst. 26, 795–800.
Otwinowski, Z. (1993). Oscillation data reduction program. In Proceedings of the CCP4 study weekend. Data collection and processing, edited by L. Sawyer, N. Isaacs & S. Bailey, pp. 56–62. Warrington: Daresbury Laboratory.
Otwinowski, Z. & Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326.
Pflugrath, J. W. (1997). Diffraction-data processing for electronic detectors: theory and practice. Methods Enzymol. 276A, 286–306.
Rossmann, M. G. (1985). Determining the intensity of Bragg reflections from oscillation photographs. Methods Enzymol. 114A, 237–280.
Schutt, C. & Winkler, F. K. (1977). The oscillation method for very large unit cells. In The rotation method in crystallography, edited by U. W. Arndt & A. J. Wonacott, pp. 173–186. Amsterdam, New York, Oxford: North-Holland.
Steller, I., Bolotovsky, R. & Rossmann, M. G. (1997). An algorithm for automatic indexing of oscillation images using Fourier analysis. J. Appl. Cryst. 30, 1036–1040.
Wirth, N. (1976). Algorithms + data structures = programs, pp. 264–274. New York: Prentice-Hall.