International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by E. Arnold, D. M. Himmel and M. G. Rossmann

International Tables for Crystallography (2012). Vol. F, ch. 9.1, pp. 226–228

Section 9.1.13. Relating data collection to the problem in hand

Z. Dauterᵃ* and K. S. Wilsonᵇ

ᵃ NCI Frederick & Argonne National Laboratory, Building 202, Argonne, IL 60439, USA, and ᵇ York Structural Biology Laboratory, Department of Chemistry, University of York, York YO10 5YW, England



The data-collection protocol should be matched to the purposes for which the data are to be used. Different applications present a range of different needs, requiring the intensities (or structure-factor amplitudes) to be exploited in different ways. In this section a representative set of applications is outlined in terms of how the tactics and strategies of data collection can vary.

Isomorphous-anomalous derivatives


The phasing of proteins by isomorphous replacement requires the collection of data from crystals of one or more heavy-atom derivatives of the protein that are isomorphous to the parent native crystal. Preparation of derivatives involves either soaking of native crystals in the heavy-atom solution or co-crystallization with the heavy-atom reagent (Part 12). Data collection can be split into two parts. The first step is to establish whether a potential derivative is isomorphous and contains the expected heavy atoms. The second is to collect the data on this derivative to provide the necessary phase information for the native structure factors. The problems of how to utilize the phase information are addressed in Part 12. Here, strategies applicable to the two steps are described.

Screening of derivatives can be carried out by collecting data to the resolution limits of the crystals. This can consume substantial data-collection resources and lead to irrelevant data that are not from isomorphous crystals or do not contain the anticipated heavy-atom signal. It is preferable to record the minimum data sufficient to identify a potential derivative in order to save time and resources, as many samples may need to be screened. A minimal strategy can exploit some or all of the following protocols:

  • (1) An essentially complete native-data reference set should be available, although not necessarily to the ultimate resolution limit.

  • (2) Preparation of a set of crystals with a selected set of potential heavy atoms, the number depending on crystal availability.

  • (3) Collection of a small number of images from each potential derivative crystal, ideally on the home-laboratory rotating-anode source, or on a synchrotron-radiation (SR) beamline if necessary. These data can be recorded to low resolution: in principle, 4 Å or even lower resolution should be enough. The resulting partial derivative data are scaled with the complete native set. The fractional isomorphous difference can be evaluated easily and compared with the expected agreement with the native data. In general, values less than 10% suggest that the heavy atom is not bound. Values higher than about 30% suggest an unacceptable level of non-isomorphism. Intermediate values suggest, but do not guarantee, that the derivative is worth pursuing. Normal probability plots can be helpful in this respect (Howell & Smith, 1992).

  • (4) Given a positive result from point (3), complete data may be recorded on the same or an equivalent derivative crystal. Again, it may be useful to record data to low resolution in the first instance. A resolution of 4 Å is quite sufficient to solve the structure of a heavy-atom constellation using direct or Patterson methods, allowing a more complete characterization of the potential derivative.

  • (5) If the compound proves to be a useful derivative, data can then be recorded to higher resolution for the computation of phase information. It may not be appropriate to record data to the highest resolution as for the native protein. In this context, the strength of the data is of primary importance, and relatively weak data at high resolution may be less relevant.
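The screening test in point (3) can be sketched in a few lines of Python. The section does not define the fractional isomorphous difference explicitly, so the sketch assumes the commonly used form Σ|F_PH − F_P| / Σ F_P over the reflections common to both data sets, and it assumes that the derivative amplitudes have already been scaled to the native set. The function names and the dictionary layout are hypothetical; the 10% and 30% thresholds are the rough guides quoted above, not sharp limits.

```python
def fractional_isomorphous_difference(native, derivative):
    """Return sum|F_PH - F_P| / sum(F_P) over the common reflections.

    native, derivative: dicts mapping (h, k, l) -> amplitude.
    Assumption: the derivative is already on the native scale.
    """
    common = native.keys() & derivative.keys()
    if not common:
        raise ValueError("no reflections in common")
    numerator = sum(abs(derivative[hkl] - native[hkl]) for hkl in common)
    denominator = sum(native[hkl] for hkl in common)
    return numerator / denominator


def classify_derivative(r_iso, low=0.10, high=0.30):
    """Apply the rough 10% / 30% guides from the text to a fractional value."""
    if r_iso < low:
        return "heavy atom probably not bound"
    if r_iso > high:
        return "probably non-isomorphous"
    return "possibly useful, worth pursuing"


# A partial derivative set compared with a more complete native set.
native = {(1, 0, 0): 100.0, (0, 1, 0): 200.0, (0, 0, 1): 50.0}
derivative = {(1, 0, 0): 120.0, (0, 1, 0): 180.0}
r_iso = fractional_isomorphous_difference(native, derivative)
print(f"{r_iso:.3f}", classify_derivative(r_iso))
```

Only reflections present in both sets contribute, which is what allows a handful of derivative images to be compared against the complete native reference.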

Some practical points are highly relevant here. The ability to store and reuse vitrified crystals means that potential derivatives can first be screened at the lowest possible resolution, and the crystal can be preserved and used later only if the derivative proves to provide useful phase information. The final resolution for data collection will then depend on the degree of isomorphism. The wavelength, if tunable, should be set to a value just below that of the absorption edge (that is, to an energy just above the edge) in order to maximize the anomalous signal. Redundancy can play an important role: a large number of independent measurements allows outliers in the native or derivative data to be excluded, and such outliers can cause major problems in either the Patterson or direct-methods approaches for locating the heavy atom (Part 12).

Anomalous scattering, MAD and SAD


The requirements for collecting data with an intrinsically weak anomalous signal are several. The highest possible resolution should not be the primary consideration. The emphasis is on data quality, as it is necessary to measure very small differences in structure-factor amplitudes, which are already in themselves relatively weak. Important considerations include the following.

  • (1) Optimization of the wavelength, particularly for MAD experiments.

  • (2) Ensuring that the anomalous data are complete in terms of all possible Bijvoet pairs. This is not always addressed by the currently available data-processing software.

  • (3) High redundancy of measurements significantly enhances the quality of the signal, as this provides effective averaging of errors and allows the rejection of statistical outliers. The latter is especially important for direct-methods solution of the anomalous-scattering constellation.

  • (4) However, the crystal lifetime is finite owing to the effects of radiation damage, which can introduce changes in intensities of the same order as the anomalous signal. For SAD (single-wavelength anomalous dispersion)/MAD data, the exposures should be limited to ensure the data are complete before the onset of substantial damage. This may well mean that the resolution limit should be set more modestly than for native data.
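Point (2) can be checked directly when the processing software does not report it. The following is a minimal sketch that counts the fraction of expected unique reflections for which both members of the Bijvoet pair were measured. It makes a simplifying assumption: indices have already been reduced so that one mate is stored as (h, k, l) and the other as (−h, −k, −l), which is strictly true only in space group P1. The function name and the input layout are hypothetical.

```python
def bijvoet_completeness(observed, expected_unique):
    """Fraction of expected reflections with both Bijvoet mates measured.

    observed: iterable of measured (h, k, l) indices.
    expected_unique: iterable of unique acentric (h, k, l) indices.
    Assumption: the mate of (h, k, l) is stored as (-h, -k, -l).
    """
    observed = set(observed)
    expected = list(expected_unique)
    if not expected:
        raise ValueError("no expected reflections given")
    paired = sum(
        1
        for (h, k, l) in expected
        if (h, k, l) in observed and (-h, -k, -l) in observed
    )
    return paired / len(expected)


observed = [(1, 0, 0), (-1, 0, 0), (0, 1, 0)]
expected = [(1, 0, 0), (0, 1, 0)]
print(bijvoet_completeness(observed, expected))
```

In this toy case every unique reflection has been measured at least once, yet only one of the two has a complete pair, so ordinary completeness would overstate how much anomalous information is available.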

For MAD experiments (Hendrickson, 1999; Smith, 1991), which can only be carried out at SR sites, the optimum number of wavelengths at which data should be recorded remains unclear. Given finite beam time, the trade-off may be between measuring with limited redundancy at several wavelengths as against higher redundancy at a smaller number of wavelengths, or even at one wavelength.
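The trade-off can be made concrete with a small calculation. The sketch below assumes a fixed budget of equivalent sweeps that is divided equally among the wavelengths, and it assumes purely random, independent errors, so that the error of a merged measurement falls as the inverse square root of the redundancy. It deliberately ignores radiation damage and the extra phase information that additional wavelengths provide, so it illustrates only one side of the balance. The function name and the budget of 12 sweeps are hypothetical.

```python
import math


def redundancy_tradeoff(total_sweeps, n_wavelengths):
    """Return (redundancy per wavelength, relative error of merged values).

    Assumptions: the sweeps are shared equally among the wavelengths and
    the errors are random and independent, so the error of a mean of n
    measurements scales as 1 / sqrt(n).
    """
    if total_sweeps <= 0 or n_wavelengths <= 0:
        raise ValueError("both arguments must be positive")
    redundancy = total_sweeps / n_wavelengths
    return redundancy, 1.0 / math.sqrt(redundancy)


for n in (1, 2, 3, 4):
    redundancy, rel_error = redundancy_tradeoff(12, n)
    print(f"{n} wavelength(s): redundancy {redundancy:.1f}, "
          f"relative error {rel_error:.2f}")
```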

SAD represents the limiting case. All data are recorded at one wavelength, reducing the requirement for fine monochromatization and for fine tunability and stability. Here data quality, especially in the form of redundancy, is the dominating factor, since all phasing is based purely on a single anomalous difference for each reflection. In recent years, SAD phasing has come to predominate among the novel structures deposited in the PDB.

Molecular replacement


For the initial data required for molecular replacement (MR), high resolution is not essential. Firstly, the method depends on homologous models that are usually only an imperfect representation of the structure under investigation, hence high-resolution data cannot be accurately modelled and will only introduce noise into the analysis. Secondly, the rotation function, the first step in MR, is based on the representation of the Patterson function in terms of spherical harmonics, which is limited in its accuracy.

In contrast, it is vital for MR applications that the most intense low-resolution terms are measured. The lack of such reflections strongly affects the rotation- and translation-function computations, as the functions are based on Patterson syntheses involving the square of the structure-factor amplitudes, and are dominated by the largest terms. Elimination of the strongest few per cent of the low-resolution data may well prevent a successful solution by MR.
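The dominance of the largest terms is easy to quantify for a given data set. The sketch below computes the share of the total Σ|F|² that is carried by the strongest few per cent of reflections; the function name and the 5% default are illustrative assumptions rather than values taken from the text.

```python
def strongest_share(amplitudes, fraction=0.05):
    """Share of sum(F^2) carried by the strongest `fraction` of reflections."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    squares = sorted((f * f for f in amplitudes), reverse=True)
    total = sum(squares)
    if total == 0:
        raise ValueError("need at least one non-zero amplitude")
    # Always include at least one reflection in the 'strongest' group.
    n_top = max(1, int(fraction * len(squares)))
    return sum(squares[:n_top]) / total


# One very strong low-resolution term among nineteen weak ones.
amplitudes = [10.0] + [1.0] * 19
print(f"{strongest_share(amplitudes):.2f}")
```

In this toy set a single reflection out of twenty accounts for more than 80% of the Patterson weight, which is why losing such terms to detector saturation or the beam-stop shadow can be so damaging for MR.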

However, for refinement of structures solved by MR, it is important that data be recorded to a resolution sufficient to allow escape from the phase bias introduced by the model. This is a key point. There are many examples where collection of data to a higher resolution has enabled the refinement of an MR solution which would not refine at the lower resolution.

Definitive data for refinement of protein models


All structures benefit from the highest accuracy in their atomic coordinates to shed light on the details of their biological function. These may include substrate or inhibitor complexes and mutants as well as native proteins where the analysis requires the full potential of X-ray crystallography. Many of these crystals will not diffract to atomic resolution; nevertheless, all steps in a detailed crystal structure analysis are made simpler as the resolution and quality of the data are increased. This includes solution of the phase problem, interpretation of the electron-density maps and refinement of the model.

The most appropriate strategy for data collection involves decisions based on a complex and mutually dependent set of parameters including:

  • (1) Crystal quality and availability. If only one crystal is available, the choices are limited. If many are available, then some experimentation is recommended to select a high-quality sample. This is greatly aided by the recent introduction of automated sample changers and strategy software on a number of beamlines.

  • (2) Cryogenic vitrification. In many cases, this allows collection of data from a single crystal. If appropriate cryogenic conditions cannot be established, making it necessary to record room-temperature data, this can affect strategy dramatically, in that several crystals might well be required to record the target resolution and completeness.

  • (3) X-ray source and detector. The availability of these again places restrictions on the experiments that are tractable. An SR source will always provide better data, but has logistical problems of availability and access. For some problems, SR becomes a sine qua non and a rotating anode is simply insufficient. These include the use of MAD techniques, very small crystals, large and complex structures with large unit cells such as viruses, and cases where atomic resolution data are needed.

  • (4) Overall data-collection time allocated. This has an obvious overlap with point (3). In particular, if SR is to be used later, then the resolution limit on the home source may be modest. If SR is not likely to be employed, then a higher resolution may be aimed for, requiring more time, and again dependent on the pressure on local resources.

Whatever the resource, it is advisable to define a strategy that will provide high completeness of the unique amplitudes at the highest resolution, with the realization that there may be some conflict between these two requirements owing to radiation damage.

A series of mutant or complex structures


The detailed geometry of the molecule is already known and the rather general effects of ligand binding or mutation can be initially identified at a relatively modest resolution and completeness. As with heavy-atom screening, it is often advisable to check that the desired complex or structural modification has been achieved by first recording data at low resolution.

However, if the analysis then proves to be of real chemical interest, with a need for accurate definition of structural features, the data should be subsequently extended in resolution and quality. As with the identification of isomorphous derivatives, this approach has benefited greatly from cryogenic vitrification, where the sample can be screened at low resolution and then preserved for subsequent use.

Atomic resolution applications


As for MAD data, the needs for atomic resolution data are extreme, but rather different in nature. Atomic resolution refinement is addressed in Chapter 18.4. Suffice it to say that by atomic resolution it is meant that meaningful experimental data extend close to 1 Å resolution. There are two principal reasons for recording such data. Firstly, they allow the refinement of a full anisotropic atomic model, leading to a more complete description of subtle structural features. Secondly, direct methods of phasing are dependent upon the principle of atomicity.

The problems to be faced include:

  • (1) The high contrast in intensities between the low- and high-angle reflections. This may be much larger than the dynamic range of the detector. If exposure times are long enough to give good counting statistics at high resolution, then the low-resolution spots will be saturated. The solution is to use more than one pass with different effective exposure times.

  • (2) The overall exposure time is often considerable and substantial radiation damage may finally result. The completeness of the low-resolution data is crucial, and it is strongly recommended to collect the low-resolution pass first as the time taken for this is relatively small.

  • (3) The close spacing between adjacent spots within the lunes on the detector, dependent on the cell dimensions. The only aid is to use fine collimation.

  • (4) The overlap of adjacent lunes at high diffraction angle, especially if a long cell axis lies along the beam direction. Using an alternative mount of the crystal is the simplest solution. Otherwise, the rotation range per image must be reduced, increasing the number of exposures. This was a problem with slow read-out detectors, but is largely alleviated with CCDs.

  • (5) For direct-methods applications, a liberal judgement of resolution limit should be adopted. Even a small percentage of meaningful reflections in the outer shells can assist the phasing. These weak shells can be rejected or given appropriate low weights in the refinement. The strong, low-resolution terms are vital for direct methods.
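For point (4), the reduction in rotation range can be estimated before the experiment. The sketch below uses a commonly quoted approximation that is not stated in this section: the largest rotation per image that avoids overlap of adjacent lunes is about (180/π)(d/a) − η degrees, where d is the high-resolution limit, a is the cell dimension lying along the beam and η is the mosaicity. The function and parameter names are hypothetical, and the result should be treated as a starting estimate rather than a guarantee.

```python
import math


def max_rotation_per_image(d_min, cell_along_beam, mosaicity_deg):
    """Approximate largest rotation per image (degrees) without lune overlap.

    Assumed formula: (180 / pi) * d_min / cell_along_beam - mosaicity_deg,
    with d_min and cell_along_beam in the same units (for example Å).
    A zero or negative result means overlap cannot be avoided by narrowing
    the images alone, and a different crystal mount should be considered.
    """
    if d_min <= 0 or cell_along_beam <= 0:
        raise ValueError("d_min and cell_along_beam must be positive")
    return math.degrees(d_min / cell_along_beam) - mosaicity_deg


# 1.0 Å data with a 100 Å axis along the beam and 0.2 degrees mosaicity.
print(f"{max_rotation_per_image(1.0, 100.0, 0.2):.2f} degrees per image")
```

Remounting the crystal so that a shorter axis lies along the beam increases the allowed rotation in direct proportion, which is why an alternative mount is described above as the simplest solution.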


Hendrickson, W. A. (1999). Maturation of MAD phasing for the determination of macromolecular structures. J. Synchrotron Rad. 6, 845–851.
Howell, P. L. & Smith, G. D. (1992). Identification of heavy-atom derivatives by normal probability methods. J. Appl. Cryst. 25, 81–86.
Smith, J. L. (1991). Determination of three-dimensional structure by multiwavelength anomalous diffraction. Curr. Opin. Struct. Biol. 1, 1002–1011.
