International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F, ch. 14.2, pp. 299-309

Chapter 14.2. MAD and MIR

J. L. Smith (a), W. A. Hendrickson (b), T. C. Terwilliger (c) and J. Berendzen (d)

(a) Department of Biological Sciences, Purdue University, West Lafayette, IN 47907-1392, USA; (b) Department of Biochemistry, College of Physicians & Surgeons of Columbia University, 630 West 168th Street, New York, NY 10032, USA; (c) Bioscience Division, Mail Stop M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA; and (d) Biophysics Group, Mail Stop D454, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

The first part of this chapter discusses multiwavelength anomalous diffraction (MAD). In the second part of the chapter, the Solve software is described. Solve is designed to automate the solution of macromolecular X-ray structures in straightforward cases. The overall approach is to link together, in a seamless procedure, all the analysis steps that a crystallographer would normally carry out. In the process, the software converts each decision-making step into an optimization problem. A key element of the procedure is the scoring and ranking of possible solutions. For MAD data, a second key element is the conversion of MAD data to pseudo-SIRAS data, allowing for much faster structure solution. The automated procedure has been used to determine structures with as many as 56 selenium atoms in the asymmetric unit.

14.2.1. Multiwavelength anomalous diffraction

J. L. Smith (a)* and W. A. Hendrickson (b)

Anomalous-scattering effects measured at several X-ray wavelengths can provide a direct solution to the crystallographic phase problem. For many years this was appreciated as a hypothetical possibility (Okaya & Pepinsky, 1956), but, until tunable synchrotron radiation became available, experimental investigation with the weakly diffracting crystals of biological macromolecules was limited to one heroic experiment (Hoppe & Jakubowski, 1975). Multiwavelength anomalous diffraction (MAD) became a dominant phasing method in macromolecular crystallography with the advent of reliable, brilliant synchrotron-radiation sources, the adoption of cryopreservation techniques for crystals of macromolecules, and the development of general anomalous-scatterer labels for proteins and nucleic acids.

Anomalous scattering, first recognized as a source of phase information by Bijvoet (1949), has been employed since the early days of macromolecular crystallography (Blow, 1958). It has been used to locate positions of anomalous scatterers (Rossmann, 1961), to supplement phase information from isomorphous replacement (North, 1965; Matthews, 1966a) and to identify the enantiomorph of the heavy-atom partial structure in multiple isomorphous replacement (MIR) phasing (Matthews, 1966b). Anomalous scattering at a single wavelength was the sole source of phase information in the structure determination of crambin (Hendrickson & Teeter, 1981), an important precursor to development of MAD. MAD differs from these other applications in using anomalous scattering at several wavelengths for complete phase determination without approximations or simplifying assumptions.

Anomalous scattering factors

The scattering of X-rays by an isolated atom is described by the atomic scattering factor, [f^{0}], based on the assumption that the electrons in the atom oscillate as free electrons in response to X-ray stimulation. The magnitude of [f^{0}] is normalized to the scattering by a single electron. Thus the `normal' scattering factor [f^{0}] is a real number, equal to the Fourier transform of the electron-density distribution of the atom. At zero scattering angle ([s = \sin\theta/\lambda = 0]), [f^{0}] equals Z, the atomic number. [f^{0}] falls off rapidly with increasing scattering angle due to weak scattering by the diffuse parts of the electron-density distribution. In reality, electrons in an atom do not oscillate freely because they are bound in atomic orbitals. Deviation from the free-electron model of atomic scattering is known as anomalous scattering. Using a classical mechanical model (James, 1948), an atom scatters as a set of damped oscillators with resonant frequencies matched to the absorption frequencies of the electronic shells. The total atomic scattering factor, f, is thus a complex number. f is denoted as a sum of `normal' and `anomalous' components, where the anomalous components are corrections to the free-electron model: [f = f^{0} + f' + if''.] f′ and f″ are expressed in electron units, as is [f^{0}]. The real component of anomalous scattering, f′, is in phase with the normal scattering, [f^{0}], whilst the imaginary component, f″, is out of phase by [\pi/2].

The imaginary component of anomalous scattering, f″, is proportional to the atomic absorption coefficient of the atom, [\mu_{a}], at X-ray energy E: [f'' (E) = (mc / 4 \pi e^{2} \hbar ) E\mu_{a} (E),] where m is the electronic mass, c is the speed of light, e is the electronic charge and h [(=2\pi\hbar)] is Planck's constant. Thus, f″ can be determined experimentally by measurement of the atomic absorption coefficient. The relationship between f″ and f′ is known as the Kramers–Kronig dispersion relation (James, 1948; Als-Nielsen & McMorrow, 2001): [f'(E) = \left({2\over\pi}\right) P\int\limits_{0}^{\infty}{E'f''(E')\over E^2-E'^2}\;{\rm d}E',] where P represents the Cauchy principal value of the integral such that integration over [E'] is performed from 0 to [(E-\varepsilon)] and from [(E+\varepsilon)] to [\infty], and then the limit [\varepsilon\rightarrow 0] is taken. The principal value of the integral can be evaluated numerically from limited spectral data that have been scaled to theoretical [f''] scattering factors (or [\mu_a] absorption coefficients) at points remote from the absorption edge.
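As a rough illustration of how the principal-value integral might be evaluated numerically, the Python sketch below samples the integrand at points placed symmetrically about E, so that the divergent contributions on either side of E cancel in pairs. The function name `kramers_kronig_fprime`, the step size and the rectangular test spectrum are assumptions made for this example and do not come from any MAD software; a real calculation would use a measured f″ spectrum scaled to theoretical values far from the edge.

```python
import math

def kramers_kronig_fprime(f2, e_eval, e_min, e_max, step=0.5):
    """Approximate f'(E) = (2/pi) P-int E' f''(E') / (E^2 - E'^2) dE'.

    f2 is a callable returning f'' at energy E' (hypothetical interface).
    Sample points sit at E +/- (k + 1/2) * step, so E itself is never
    sampled and the divergent terms close to E cancel in pairs.
    """
    k_lo = math.ceil((e_min - e_eval) / step - 0.5)
    k_hi = math.floor((e_max - e_eval) / step - 0.5)
    total = 0.0
    for k in range(k_lo, k_hi + 1):
        e_prime = e_eval + (k + 0.5) * step
        total += e_prime * f2(e_prime) / (e_eval**2 - e_prime**2)
    return (2.0 / math.pi) * step * total

# Test spectrum (an assumption, chosen because it has a closed-form answer):
# f'' = 4 e between 12000 and 13000 eV and zero elsewhere.
E0, E1, F2_VALUE = 12000.0, 13000.0, 4.0

def box_f2(energy):
    return F2_VALUE if E0 <= energy <= E1 else 0.0

E = 12500.0
fprime_num = kramers_kronig_fprime(box_f2, E, E0, E1)
# Closed form for the box spectrum: (A/pi) ln(|E^2 - E0^2| / |E^2 - E1^2|).
fprime_exact = (F2_VALUE / math.pi) * math.log(abs(E**2 - E0**2) / abs(E**2 - E1**2))
print(fprime_num, fprime_exact)
```

The box spectrum is used only because it allows the numerical result to be checked against an analytic value; it is not a realistic edge shape.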

Anomalous scattering is present for all atomic types at all X-ray energies. However, the magnitudes of f′ and f″ are negligible at X-ray energies far removed from the resonant frequencies of the atom. This includes all light atoms (H, C, N, O) of biological macromolecules at all X-ray energies commonly used for crystallography. f′ and f″ are rather insensitive to scattering angle, unlike [f^{0}], because the electronic resonant frequencies pertain to inner electron shells, which have radii much smaller than the X-ray wavelengths used for anomalous-scattering experiments. The magnitudes of f′ and f″ are greatest at X-ray energies very near resonant frequencies, and are also highly energy-dependent (see the accompanying figure). This property of anomalous scattering is exploited in MAD.


Figure

Anomalous scattering factors for Se in a protein labelled with selenomethionine. The spectra are a hybrid of experimental values derived from an absorption spectrum of a SeMet protein for energies near the Se K absorption edge and calculated values for energies remote from the edge. The Se K edge occurs at 12 660 eV, corresponding to a wavelength of 0.9793 Å. Anomalous-scattering effects are enhanced by a white line just above the edge. The position of the absorption edge ([E_{\rm edge}]) is the inflection point of the [\mu_a] (and [f'']) spectrum, and [E_{\rm peak}] is the energy of peak absorption just above the edge. These energies correspond to the wavelengths [\lambda_{\rm edge}] and [\lambda_{\rm peak}] in a MAD experiment because the magnitudes of [f'] and [f''] are greatest at [E_{\rm edge}] and [E_{\rm peak}], respectively.

Three means are available for evaluating anomalous scattering factors, f′ and f″. Calculations from first principles on isolated elemental atoms are accurate for energies remote from resonant frequencies (Cromer & Liberman, 1970a,b). However, these calculated values do not apply to the energies most critical in a MAD experiment. f′ and f″ can also be estimated by fitting to diffraction data measured at different energies (Templeton et al., 1982). Finally, f″ can be obtained from X-ray absorption spectra through its proportionality to [\mu_{a}], and f′ from f″ by the Kramers–Kronig transform (Hendrickson et al., 1988; Smith, 1998). Both the precise position of a resonant frequency and the values of f′ and f″ near resonance generally depend on transitions to unoccupied molecular orbitals, and are quite sensitive to the electronic environment surrounding the atom. Complexities in the X-ray absorption edge, particularly so-called `white lines', can enhance the anomalous scattering considerably (see the figure above). Thus, experimental measurements are needed to select wavelengths for optimal signals, and the values of [f'] and [f''] should be determined either from an absorption spectrum or by refinement against the diffraction data.

X-ray spectra near absorption edges of anomalous scatterers depend on the orientation of the local chemical environment in the X-ray beam, which is polarized for synchrotron radiation. The anisotropy of anomalous scattering may affect both the edge position and the magnitude of absorption. In such cases, f′ and f″ for individual atoms are also dependent on orientation. Orientational averaging due to multiple anomalous scatterer sites or crystallographic symmetry may prevent macroscopic detection of polarization effects in crystals. A formalism to describe anisotropic anomalous scattering in which f′ and f″ are tensors has been developed by Templeton & Templeton (1988). Fanchon & Hendrickson (1990) have developed a technique to refine f′ and f″ tensors against MAD data. Although anomalous scattering labels such as the commonly used selenomethionine and related compounds are strongly anisotropic (Templeton & Templeton, 1988; Hendrickson et al., 1990), anisotropy is generally ignored in MAD for biological macromolecules.

A phase equation for MAD

The impact of anomalous scattering on diffraction measurements can be evaluated by substituting the expression for the total scattering factor, [f = f^{0} + f' + if''], into the structure-factor equation. [^{\lambda}F_{\rm obs} ({\bf h}) = \textstyle\sum\limits_{i=1}^{\rm No.\ of\ atoms}\displaystyle (\;f^{0} + f'_{\lambda} + if''_{\lambda})_{i} \exp [-B_{i} s^{2} ({\bf h})] \exp (2\pi i{\bf h} \cdot {\bf x}_{i}).] It is convenient to separate terms in the structure-factor expression according to wavelength dependence (Karle, 1980). A wavelength-independent structure factor [{}^{0}F_{T}] with phase [\varphi_{T}] is defined to represent the total normal scattering from all atoms (Hendrickson, 1985) as [\eqalign{^{0}F_{T} ({\bf h}) &= \textstyle\sum\limits_{i=1}^{\rm No.\ of\ atoms}\displaystyle f^{0}_{i} \exp \left[-B_{i} s^{2} ({\bf h})\right] \exp (2\pi i{\bf h} \cdot {\bf x}_{i}) \cr &= |^{0}F_{T}| \exp (i\varphi_{T}).}] The anomalous scattering is confined to wavelength-dependent `anomalous' structure factors [^{\lambda}F'] and [^{\lambda}F''], representing the real and imaginary components of anomalous scattering for all atoms. In general, the anomalous structure factors are considered for only the subset of atoms with detectable anomalous scattering, leading to the anomalous structure factors [^{\lambda}F'_{A}] and [^{\lambda}F''_{A}]: [\eqalign{^{\lambda}F'_{A} ({\bf h}) &= \textstyle\sum\limits_{j=1}^{N_{\rm ano}}\displaystyle f'_{\lambda j} \exp \left[-B_{j} s^{2} ({\bf h})\right] \exp (2\pi i{\bf h} \cdot {\bf x}_{j}) \cr ^{\lambda}F''_{A} ({\bf h}) &= \textstyle\sum\limits_{j=1}^{N_{\rm ano}}\displaystyle if''_{\lambda j} \exp \left[-B_{j} s^{2} ({\bf h})\right] \exp (2\pi i{\bf h} \cdot {\bf x}_{j}),}] where [N_{\rm ano}] is the number of anomalous scatterers.

The anomalous structure factors [^{\lambda}F'_{A}] and [^{\lambda}F''_{A}] can also be expressed in terms of the normal structure-factor components, [^{0}F_{A_k}] with phase [\varphi_{A_k}], for the subsets of atoms that comprise each kind of anomalous scatterer (Karle, 1980), [\eqalign{^{\lambda}F'_{A} &= \textstyle\sum\limits_{k=1}^{\rm No.\ of\ kinds}\displaystyle (f'_{\lambda}/f^{0})_{k} \ {^{0}F}_{A_k} \cr ^{\lambda}F''_{A} &= \textstyle\sum\limits_{k=1}^{\rm No.\ of\ kinds}\displaystyle (i f''_{\lambda}/f^{0})_{k} \ {^{0}F}_{A_k}.}] Thus, the wavelength-dependent experimental structure factor [^{\lambda}F_{\rm obs}] can be represented using normal structure factors only: [^{\lambda}F_{\rm obs} = \ {^{0}F_{T}} + \textstyle\sum\limits_{k=1}^{\rm No.\ of\ kinds}\displaystyle \left({f'_{\lambda} \over f^{0}} + i{f''_{\lambda} \over f^{0}}\right)_{k} \ {^{0}F}_{A_k}.] This factorization is convenient because all wavelength dependence is confined to the anomalous scattering factors f′ and f″, which are independent of atomic positions, occupancies and thermal parameters. In addition, an electron-density map for the total structure should be based on normal scattering by all atoms, represented by the structure factor [^{0}F_{T}] with phase [\varphi_{T}]. As described in the section on anomalous scattering factors, the normal scattering factor [f^{0}] is strongly dependent on scattering angle, whereas f′ and f″ are nearly invariant with s.
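The factorization can be checked numerically. The Python sketch below builds a small invented structure (three light atoms and two Se-like anomalous scatterers, with made-up coordinates and scattering factors, and B factors set to zero) and compares the direct atom-by-atom sum for the observed structure factor with the factored form built from the normal structure factors of all atoms and of the anomalous scatterers. Every name and number here is an assumption chosen for illustration.

```python
import cmath
import math

def phase_factor(h, x):
    """exp(2*pi*i h.x) for Miller indices h and fractional coordinates x."""
    return cmath.exp(2j * math.pi * sum(hi * xi for hi, xi in zip(h, x)))

# Hypothetical atoms as (f0, fractional coordinates); B factors are omitted.
normal_atoms = [(6.0, (0.10, 0.20, 0.30)),
                (7.0, (0.45, 0.15, 0.80)),
                (8.0, (0.70, 0.60, 0.25))]
anomalous_atoms = [(34.0, (0.25, 0.40, 0.10)),
                   (34.0, (0.80, 0.05, 0.55))]
f_prime, f_dprime = -8.0, 5.0   # illustrative f' and f'' at one wavelength
f0_anom = 34.0                  # f0 of the single kind of anomalous scatterer
h = (1, 2, 3)

# Direct sum: every atom scatters with f0, anomalous atoms add f' + i f''.
f_obs_direct = sum(f0 * phase_factor(h, x) for f0, x in normal_atoms)
f_obs_direct += sum((f0 + f_prime + 1j * f_dprime) * phase_factor(h, x)
                    for f0, x in anomalous_atoms)

# Factored form: normal scattering of all atoms plus a scaled copy of the
# normal scattering of the anomalous scatterers alone.
f_t = sum(f0 * phase_factor(h, x) for f0, x in normal_atoms + anomalous_atoms)
f_a = sum(f0 * phase_factor(h, x) for f0, x in anomalous_atoms)
f_obs_factored = f_t + (f_prime / f0_anom + 1j * f_dprime / f0_anom) * f_a

print(abs(f_obs_direct - f_obs_factored))
```

The two results agree to rounding error because only the ratios f′/f⁰ and f″/f⁰ change with wavelength, while the positional information stays in the normal structure factors.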

A very useful observational equation is obtained by applying the law of cosines to [^{0}F_{T}], [^{\lambda}F'_{A}] and [^{\lambda}F''_{A}] (Karle, 1980; Hendrickson, 1985). Anomalous structure factors are treated separately for each kind of anomalous scatterer. The number of terms in the resulting expression is [(q + 1)^{2}] for q kinds of anomalous scatterers. For the commonest case of one kind of anomalous scatterer [\eqalign{|^{\lambda}F_{\rm obs} (\pm {\bf h})|^{2} &= |^{0}F_{T}|^{2} + a_{\lambda} |^{0}F_{A}|^{2} \cr &\quad + b_{\lambda} |^{0}F_{T}| \ |^{0}F_{A}| \cos (\varphi_{T} - \varphi_{A}) \cr &\quad \pm c_{\lambda} |^{0}F_{T}| \ |^{0}F_{A}| \sin (\varphi_{T} - \varphi_{A}),}] where [a_{\lambda} = (f^{''2}_{\lambda} + f^{'2}_{\lambda})/(f^{0})^{2}], [b_{\lambda} = 2f'_{\lambda}/f^{0}] and [c_{\lambda} = 2f''_{\lambda}/f^{0}]. [|^{\lambda}F_{\rm obs}(\pm {\bf h})|] refers to the Bijvoet mate reflections [+{\bf h}] and [-{\bf h}]. The MAD observational equation illustrates the orthogonal contributions to phasing made by the real (f′) and imaginary (f″) components of anomalous scattering. `Dispersive' phase information derives from differences between [|F_{\rm obs}|] at wavelengths having different values of f′ and contributes to [\cos(\varphi_{T} - \varphi_{A})]. `Bijvoet' phase information derives from the Friedel difference [|F_{\rm obs}(+{\bf h})| - |F_{\rm obs}(-{\bf h})|] at a wavelength with substantial f″ and contributes to [\sin(\varphi_{T} - \varphi_{A})]. Phase information is enhanced by selection of wavelengths for data measurement that maximize the magnitudes of both f′ and f″. Apart from ignoring the very weakest anomalous-scattering effects, the MAD observational equation involves no approximations.

Diffraction ratios for estimating the MAD phasing signal

The first consideration in design of a MAD experiment is the choice of anomalous scatterer(s) with consideration of the magnitude of the phasing signal. Estimation of the total scattering by the macromolecule and the potential phasing signal generated by the anomalous scatterer(s) under consideration is informative.

The magnitude of the MAD phasing signal is estimated as the ratio of the expected dispersive or Bijvoet difference to the expected total scattering of the macromolecule. This is based on calculation of the expected root-mean-square structure amplitude [({\rm rms}|F|)] (Wilson, 1942). [{\rm rms}|F| = \langle |F|^2 \rangle^{1/2} = \left(\textstyle\sum\displaystyle f_{i}^{2}\right)^{1/2} = N^{1/2}f] for N identical atoms. The expected total scattering of the macromolecule is estimated at [s = 0] using an average non-hydrogen atom. Based on atomic frequencies in biological macromolecules, the average values of [f^{0}] are 6.70 e for proteins, 7.20 e for DNA and 7.26 e for RNA. The average number of non-hydrogen atoms and molecular mass per residue are 7.7 atoms and 110 Da for proteins, 21.8 atoms and 292 Da for DNA, and 22.4 atoms and 304 Da for RNA. These averages result in the following expressions for estimated total scattering of biological macromolecules: [\eqalign{{\rm rms} |^{0}F_{T}| _{\rm protein} &\approx 6.70 (\hbox{No. of atoms})^{1/2} \approx (346 \times \hbox{No. of amino acids})^{1/2} \approx (3.14 \times \hbox{molecular mass})^{1/2} \cr {\rm rms}|^{0}F_{T}| _{\rm DNA} &\approx 7.20 (\hbox{No. of atoms})^{1/2} \approx (1128 \times \hbox{No. of nucleotides})^{1/2} \approx (3.87 \times \hbox{molecular mass})^{1/2} \cr {\rm rms}|^{0}F_{T}| _{\rm RNA} &\approx 7.26 (\hbox{No. of atoms})^{1/2} \approx (1183 \times \hbox{No. of nucleotides})^{1/2} \approx (3.89 \times \hbox{molecular mass})^{1/2}.}] Note: the estimated total scattering of a protein is coincidentally [\approx (\pi \times \hbox{molecular mass})^{1/2}].
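For a quick estimate, the per-residue expressions above can be wrapped in a few lines of Python. The function name and dictionary are hypothetical; the constants are the per-residue values quoted in the text, and the 300-residue protein is an invented example.

```python
import math

# Mean-square scattering per residue at s = 0, taken from the expressions above.
PER_RESIDUE = {"protein": 346.0, "DNA": 1128.0, "RNA": 1183.0}

def rms_total_scattering(kind, n_residues):
    """Estimate rms |0F_T| in electrons for n_residues of the given kind."""
    return math.sqrt(PER_RESIDUE[kind] * n_residues)

rms_protein = rms_total_scattering("protein", 300)
print(round(rms_protein, 1))
```

For the 300-residue protein this gives roughly 322 e, which serves as the denominator of the diffraction ratios.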

The diffraction ratios relevant to a MAD experiment with N anomalous-scatterer sites are [{{\rm rms}\|^{\lambda 1}F_{\rm obs} | - | ^{\lambda 2}F_{\rm obs}\| \over {\rm rms}|^{0}F_{T}| } \approx (N/2)^{1/2} {|f'_{\lambda 1} - f'_{\lambda 2}| \over {\rm rms}|^{0}F_{T}| }] for the dispersive signal and [{{\rm rms}\|^{\lambda}F_{\rm obs}^{+} | - | ^{\lambda}F_{\rm obs}^{-}\| \over {\rm rms}|^{0}F_{T}| } \approx (N/2)^{1/2} {2f''_{\lambda} \over {\rm rms}|^{0}F_{T}| }] for the Bijvoet signal. The diffraction ratios, analogous to similar relations for isomorphous replacement (Crick & Magdoff, 1956), are equivalent to the expected fractional changes in structure amplitude due to anomalous scattering, and, as such, can be compared with the [R_{\rm sym}] estimate of error in the experimental data for evaluation of the phasing signal. Of course, the phasing signal may be diminished by partial occupancy or thermal motion, as for normal scattering.

Experimental considerations

The design and execution of a MAD experiment are distinguished from monochromatic experiments in macromolecular crystallography primarily by the stringent criteria for wavelength selection.

The largest MAD phasing signal is obtained at energies with the most extreme values of f′ and f″, which correspond to the sharpest features of the absorption edge (see the figure of Se anomalous scattering factors). The energy of peak absorption just above the edge [(E_{\rm peak})] corresponds to the wavelength of maximum f″ and optimal Bijvoet signal [(\lambda_{\rm peak})]. Typically, the orthogonal dispersive signal is optimized by recording one data set at the wavelength corresponding to the inflection point of the absorption edge (minimum f′, [\lambda_{\rm edge}]), and one or more data sets at remote wavelengths having f′ with smaller magnitudes [(\lambda_{\rm remote})]. The choice of the remote wavelength(s) is experiment dependent. If only one remote wavelength is used, it is typically on the high-energy side of the absorption edge due to the larger Bijvoet signal. The remote wavelength(s) may also be chosen to avoid complications from other edges or to obtain data at a wavelength optimal for model refinement. In the case of anomalous scatterers that exhibit sharp `white line' features, the dispersive signal may be optimized between the minimum of [f'] at the ascending edge [(\lambda_{\rm edge})] and the local maximum of [f'] at the descending inflection point [(\lambda_{\rm descent})].

The features of an X-ray absorption edge are in many cases very sharp, with the energies of the inflection point and peak absorption separated by as little as 2 eV. Therefore, it is critical to determine [\lambda_{\rm edge}] and [\lambda_{\rm peak}] experimentally by recording the absorption edge from the labelled macromolecule at the time of a MAD experiment. Even when the position of the edge is well known, small unanticipated chemical changes in the sample or calibration errors in the X-ray beam can reduce the MAD signal very significantly.
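To give a sense of the wavelength precision this implies, the short Python sketch below converts photon energy to wavelength and evaluates the wavelength shift for a 2 eV separation. The edge energy of 12 660 eV is the value quoted in the figure caption; placing the peak exactly 2 eV above it is an assumption for illustration, as are the function and constant names.

```python
# h*c expressed in eV * Angstrom (rounded value of the physical constant).
HC_EV_ANGSTROM = 12398.42

def energy_to_wavelength(energy_ev):
    """Convert photon energy in eV to wavelength in Angstrom."""
    return HC_EV_ANGSTROM / energy_ev

e_edge = 12660.0          # Se K edge energy quoted in the figure caption
e_peak = e_edge + 2.0     # assumed peak position, 2 eV above the inflection

lam_edge = energy_to_wavelength(e_edge)
lam_peak = energy_to_wavelength(e_peak)
shift = lam_edge - lam_peak
print(lam_edge, lam_peak, shift)
```

The resulting shift is of the order of 0.00015 Å, which is why the edge must be located from a spectrum recorded on the sample itself and not from tabulated values.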

The MAD phasing signal is derived from intensity differences that may be similar in magnitude to measurement errors. Thus a general philosophy in the design of a MAD experiment is to equalize systematic errors among the measurements whose differences will contribute to each phase determination. This is achieved for each unique reflection by recording Bijvoet measurements at all wavelengths from the same asymmetric portion of diffraction space at nearly the same time. If crystal decay necessitates use of multiple crystals in a MAD experiment, blocks of Bijvoet data should be recorded identically at all of the selected wavelengths from each crystal contributing to the data set. Bijvoet mates can be recorded simultaneously by alignment of the crystal with a mirror plane of diffraction symmetry perpendicular to the rotation axis, or Friedel images can be recorded in an `inverse beam' experiment. Inverse-beam geometry is a hypothetical method for measurement of Friedel data using both the forward and reverse directions of the incident X-ray beam. In a real experiment, diffraction images and their Friedel equivalents are recorded at crystal positions related by 180° rotation about any axis perpendicular to the incident beam, usually the data-collection axis. The inverse-beam experiment requires neither crystal symmetry nor crystal alignment, and is well suited to crystals mounted in random orientations.

The multiwavelength measurements for each unique reflection will be identically redundant and have nearly equal systematic errors if identical blocks of Bijvoet data are collected, as described above. When such a data-collection strategy is followed, the resulting MAD data set will include all multiwavelength Bijvoet measurements for all regions of the reciprocal lattice that are covered in the experiment.
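One way to picture such a strategy is as an ordered list of data-collection blocks. The Python sketch below generates an inverse-beam schedule in which every rotation wedge is recorded at every wavelength, together with its 180°-related inverse wedge, before the next wedge is started. The function name, wedge size, wavelength labels and ordering are assumptions for illustration; actual beamline software and optimal orderings will differ.

```python
def inverse_beam_schedule(wavelengths, phi_start, phi_end, wedge):
    """Return (wavelength, phi_from, phi_to, label) blocks in collection order.

    Each wedge is measured at all wavelengths, direct and inverse, before
    moving on, so measurements that are differenced are close in time.
    """
    schedule = []
    phi = phi_start
    while phi < phi_end:
        top = min(phi + wedge, phi_end)
        for wavelength in wavelengths:
            schedule.append((wavelength, phi, top, "direct"))
            schedule.append((wavelength, phi + 180.0, top + 180.0, "inverse"))
        phi = top
    return schedule

plan = inverse_beam_schedule(["edge", "peak", "remote"], 0.0, 30.0, 10.0)
for block in plan[:4]:
    print(block)
```

Smaller wedges keep the Bijvoet and dispersive partners closer in time at the cost of more wavelength changes, so the wedge size is a trade-off set by crystal decay and beamline stability.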

Cryopreservation of crystals is of enormous benefit to MAD. Systematic error due to radiation damage is eliminated or greatly diminished. Systematic differences between crystals are eliminated in cases where a complete MAD data set is measured from a single frozen crystal. Intensities of weak reflections are estimated more accurately because less material contributes to diffuse background scatter in the mounts used for frozen crystals than for unfrozen crystals.

Measurement errors are of major importance in all areas of macromolecular crystallography, but are the limiting factor in phase determination by MAD. MAD data should be of high quality by the usual measures ([R_{\rm sym}], redundancy, completeness), especially in experiments where the phasing signal is weak. Good counting statistics are of paramount importance. Experimental error, estimated by [R_{\rm sym}], increases with increasing scattering angle because of the strong fall-off of [f^{0}] with s. In a carefully designed experiment, the effect of increasing [R_{\rm sym}] with s is mitigated somewhat by equalizing systematic errors and by averaging highly redundant data. Disappearance of the phasing signal into [R_{\rm sym}] noise is the major reason that useful MAD phases are not obtained to the diffraction limit of crystals, even though anomalous scattering does not diminish with increasing s.
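The diffraction ratios defined earlier for the dispersive and Bijvoet signals give a quick way to judge whether the expected signal is likely to rise above the measurement error. The Python sketch below evaluates both ratios for an invented case: a 300-residue protein with six Se sites, illustrative Se scattering factors and an assumed error level of 4%, taken here to be expressed on the same amplitude scale as the ratios. None of these numbers comes from a real experiment.

```python
import math

def dispersive_ratio(n_sites, fp_1, fp_2, rms_ft):
    """Expected dispersive ratio: (N/2)^(1/2) |f'_1 - f'_2| / rms|0F_T|."""
    return math.sqrt(n_sites / 2.0) * abs(fp_1 - fp_2) / rms_ft

def bijvoet_ratio(n_sites, fpp, rms_ft):
    """Expected Bijvoet ratio: (N/2)^(1/2) 2 f'' / rms|0F_T|."""
    return math.sqrt(n_sites / 2.0) * 2.0 * fpp / rms_ft

rms_ft = math.sqrt(346.0 * 300)   # 300-residue protein, estimate at s = 0
n_se = 6                          # assumed number of Se sites
disp = dispersive_ratio(n_se, -10.0, -3.0, rms_ft)   # f' at edge vs remote
bijv = bijvoet_ratio(n_se, 6.0, rms_ft)              # f'' at peak
error_level = 0.04                # assumed R_sym-like error on amplitudes

for name, ratio in (("dispersive", disp), ("Bijvoet", bijv)):
    status = "above" if ratio > error_level else "at or below"
    print(f"{name}: {ratio:.3f} ({status} the assumed error level)")
```

Because the ratios are evaluated at s = 0 with full occupancy, they are best read as optimistic planning estimates.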

The optimal number of data-collection wavelengths for successful phase determination by MAD has been debated. In most cases, it is desirable to measure data at [\lambda_{\rm edge}], [\lambda_{\rm peak}] and a [\lambda_{\rm remote}] in order to take advantage of the most extreme values of f′ and f″. If f′ values at [\lambda_{\rm edge}] and [\lambda_{\rm peak}] are different enough to produce a detectable dispersive signal, then phases can be obtained from three measurements: [|F^{+}|] and [|F^{-}|] at [\lambda_{\rm peak}], and either [|F^{+}|] or [|F^{-}|] at [\lambda_{\rm edge}]. However, redundancy is one of the best ways to minimize the effects of measurement error in macromolecular crystallography. Redundant Bijvoet signals can be obtained at [\lambda_{\rm peak}] and at any [\lambda_{\rm remote}] above the absorption edge if both [|F^{+}|] and [|F^{-}|] are measured at each wavelength. Likewise, the dispersive signal between measurements at [\lambda_{\rm edge}] and [\lambda_{\rm remote}] is also redundant if both [|F^{+}|] and [|F^{-}|] measurements are taken at each wavelength. More highly redundant four- or five-wavelength MAD experiments may be advantageous, although greater redundancy should not be gained at the cost of good counting statistics. Brilliant synchrotron sources and high-speed detectors make rapid measurement of complete multiwavelength data sets possible, but the practical feasibility is often compromised by radiation damage. Phase information from the Bijvoet signal at a single wavelength (preferably [\lambda_{\rm peak}]) can also be used as a basis for structure determination. The phase probability distribution from single-wavelength anomalous scattering is in general bimodal, and must be resolved with additional phase information.
This could be the partial structure of anomalous scatterers, as in the classic crambin experiment (Hendrickson & Teeter, 1981), or the real-space constraints, such as solvent flattening or redundancy averaging, that are applied in common schemes for phase refinement by density modification.

Data handling

Two general approaches to data handling for MAD have been employed.

An extreme interpretation of the scheme for equalizing systematic errors is known as `phase first, merge later' (Hendrickson, 1985; Hendrickson & Ogata, 1997). The idea is that systematic errors may be amplified by merging data, and that this may obscure a weak phasing signal. In this approach, the individual observations constituting a multiwavelength Bijvoet set, as determined by the data-collection strategy, are grouped together and scaled. There may be redundant multiwavelength sets of observations, but these are merged only after individual phase evaluations have been made. Error estimates from the phasing, or the agreement of redundant phase determinations, can be incorporated into weights for averaging, or can be used to reject outliers. Complicated, experiment-dependent book-keeping is required to assemble exactly the correct observations into each unmerged set of multiwavelength measurements. However, the `phase first, merge later' approach may be advantageous for MAD data sets from multiple crystals, or when minor disasters disrupt the experiment and thwart the data-collection strategy.

A second approach, known as `merge first, phase later', is to scale and merge data at each wavelength, keeping Bijvoet pairs separate, and then to scale data at all wavelengths to one another (Ramakrishnan & Biou, 1997). The idea is that the multiwavelength Bijvoet measurements are identically redundant for each unique reflection if the MAD data were measured according to the strategy outlined in the section on experimental considerations. Thus, merging the redundancies should reduce systematic errors in the amplitude differences used for phasing. The `merge first, phase later' approach is computationally simpler than the `phase first, merge later' approach because it is experiment independent. However, unanticipated experimental disasters may be more difficult to overcome in the `merge first, phase later' approach to data handling.
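A minimal sketch of the merging step in `merge first, phase later' is given below in Python. It assumes that observations have already been scaled and reduced to the unique index, and it uses a plain inverse-variance weighted mean; the record layout and the function name `merge_first` are hypothetical, and real merging programs also handle outlier rejection and error-model adjustment.

```python
from collections import defaultdict

def merge_first(observations):
    """Merge redundant measurements, keeping wavelength and Bijvoet sign apart.

    observations: iterable of (wavelength, hkl, sign, intensity, sigma),
    where hkl is the unique index and sign is '+' or '-' (assumed layout).
    Returns {(wavelength, hkl, sign): (weighted_mean, sigma_of_mean)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for wavelength, hkl, sign, intensity, sigma in observations:
        weight = 1.0 / sigma**2
        acc = sums[(wavelength, hkl, sign)]
        acc[0] += weight * intensity
        acc[1] += weight
    return {key: (wi / w, w ** -0.5) for key, (wi, w) in sums.items()}

# Invented measurements of one reflection.
obs = [
    ("peak", (1, 2, 3), "+", 1040.0, 20.0),
    ("peak", (1, 2, 3), "+", 1000.0, 20.0),
    ("peak", (1, 2, 3), "-", 950.0, 20.0),
    ("edge", (1, 2, 3), "+", 980.0, 20.0),
]
merged = merge_first(obs)
for key, value in sorted(merged.items()):
    print(key, value)
```

Keeping the Bijvoet sign in the merge key is what preserves the anomalous differences; merging across it would average the Bijvoet signal away.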

Of course, if the MAD signal is strong relative to the experimental error, either approach to data handling should be successful. Data scaling in both approaches may be done most easily and reliably by scaling all data against a standard data set, such as the unique data from one wavelength with Bijvoet mates averaged. In general a dogmatic approach to data handling is best avoided in favour of whichever computational technique or combination of techniques is most suited to the problem at hand. Factors such as the strength of the MAD signal, data-collection strategy, number of crystals contributing to the data set, crystal quality and experimental disasters should be taken into account.

Approaches to MAD phasing

There are two general approaches to MAD phasing. In the explicit approach, the MAD observational equation is solved directly (Hendrickson et al., 1988; Hendrickson & Ogata, 1997). In the pseudo-MIR approach, MAD phasing is treated as a special case of multiple isomorphous replacement (Burling et al., 1996; Terwilliger, 1997; Ramakrishnan & Biou, 1997). Both approaches have been quite successful, and each has advantages and disadvantages. For complete phase determination by either method, the partial structure of the anomalous scatterers must be determined. The explicit and pseudo-MIR approaches differ in when the partial structure is determined and in how it is refined.

The explicit approach provides the quantities [|^{0}F_{T}|], [|^{0}F_{A}|] and [(\varphi_{T} - \varphi_{A})] by direct fit of the [|^{\lambda}F_{\rm obs}|] to the MAD observational equation. No anomalous-scatterer partial structure model is required in this first step of phasing. Estimates of the anomalous scattering factors at the wavelengths of data collection are required. These estimates can be refined (Weis et al., 1991), so they need not be highly accurate. Redundancies are merged to produce a unique data set at the level of the derived quantities [|^{0}F_{T}|], [|^{0}F_{A}|], [(\varphi_{T} - \varphi_{A})] and their error estimates. The anomalous-scatterer partial structure is determined from the derived estimates of [|^{0}F_{A}|] and refined against these amplitudes. In the second step of phasing, [\varphi_{T}] is derived from the phase difference [(\varphi_{T} - \varphi_{A})] and weights are calculated for a Fourier synthesis from [|^{0}F_{T}|] and [\varphi_{T}]. Phase probability distributions (ABCD coefficients; Hendrickson & Lattman, 1970) derived from the MAD observational equation can be used directly in the explicit approach (Pähler et al., 1990). A probabilistic treatment based on maximum likelihood theory has also been developed (de La Fortelle & Bricogne, 1997). There are two advantages to the explicit approach. First, it is amenable to the `phase first, merge later' scheme of data handling because refinement of the anomalous-scatterer partial structure is entirely separate from phase calculation. The second principal advantage of the explicit approach is the calculation of an experimentally derived estimate of the normal structure amplitude [|^{0}F_{A}|] for the anomalous scatterer. This is the quantity with which the partial structure of anomalous scatterers is most directly solved and refined. However, extraction of reliable [|^{0}F_{A}|] estimates from data with low signal-to-noise can be difficult. Bayesian methods of [|^{0}F_{A}|] estimation (Terwilliger, 1994a; Krahn et al., 1999) have been shown to be more robust than least-squares methods.
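The first step of the explicit approach can be sketched for a single reflection. For one kind of anomalous scatterer, the MAD observational equation is linear in four quantities: the two squared amplitudes and the products of the amplitudes with the cosine and sine of the phase difference. The Python example below simulates noise-free Bijvoet pairs at three wavelengths by complex arithmetic and recovers these quantities by unconstrained linear least squares. This is a simplified illustration under stated assumptions, not the algorithm of any published program: the scattering factors, amplitudes and function names are invented, and the constraint linking the four parameters is ignored.

```python
import math

F0 = 34.0  # normal scattering factor of Se at s = 0 (atomic number)
# Illustrative (f', f'') pairs in electrons; real values come from a spectrum.
WAVELENGTHS = {"edge": (-10.0, 3.0), "peak": (-7.5, 6.0), "remote": (-3.0, 4.0)}

def coefficients(fp, fpp):
    """Return a, b, c of the MAD observational equation."""
    return (fp**2 + fpp**2) / F0**2, 2.0 * fp / F0, 2.0 * fpp / F0

def simulate_pair(f_t, f_a, fp, fpp):
    """|F(+h)|^2 and |F(-h)|^2 from complex 0F_T and 0F_A."""
    factor = complex(fp, fpp) / F0
    plus = f_t + factor * f_a
    minus = f_t.conjugate() + factor * f_a.conjugate()
    return abs(plus) ** 2, abs(minus) ** 2

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_reflection(data):
    """data: list of (f', f'', I_plus, I_minus). Returns |F_T|, |F_A|, dphi."""
    rows, obs = [], []
    for fp, fpp, i_plus, i_minus in data:
        a, b, c = coefficients(fp, fpp)
        rows += [[1.0, a, b, c], [1.0, a, b, -c]]
        obs += [i_plus, i_minus]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    vector = [sum(r[i] * y for r, y in zip(rows, obs)) for i in range(4)]
    p = solve_linear(normal, vector)
    # With noisy data p[0] or p[1] could turn negative; not handled here.
    return math.sqrt(p[0]), math.sqrt(p[1]), math.atan2(p[3], p[2])

# Invented "true" values: |F_T| = 100, phi_T = 1.0 rad; |F_A| = 10, phi_A = 0.3 rad.
true_ft = 100.0 * complex(math.cos(1.0), math.sin(1.0))
true_fa = 10.0 * complex(math.cos(0.3), math.sin(0.3))
data = [(fp, fpp) + simulate_pair(true_ft, true_fa, fp, fpp)
        for fp, fpp in WAVELENGTHS.values()]
ft_amp, fa_amp, dphi = fit_reflection(data)
print(ft_amp, fa_amp, dphi)
```

With noise-free input the fit returns the amplitudes and phase difference used in the simulation. With real data the small coefficient multiplying the squared anomalous-scatterer amplitude makes that term poorly determined, which is consistent with the difficulty noted above for data with low signal-to-noise.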

In the pseudo-MIR approach, data at one wavelength are designated as `native' data, which include anomalous scattering, and data at the other wavelengths as `derivative' data. This approach has the advantage that nothing need be known about the anomalous scattering factors at any time during phasing. These quantities are incorporated into heavy-atom atomic `occupancies' and refined along with other parameters. Of course, the partial structure of anomalous scatterers must be known, and its refinement is concurrent with phasing. This may be a principal advantage of the pseudo-MIR approach, because the anomalous-scatterer parameter refinement may be more reliable when incorporated into phasing than when done against [|^{0}F_{A}|] estimates. Greater weight is given to the data set selected as `native' in refinement of the `heavy-atom' parameters in some implementations of the pseudo-MIR approach, although others treat data at all wavelengths equivalently (Terwilliger & Berendzen, 1997[link]). The amplitudes [|^{0}F_{A}|] are not a by-product of the pseudo-MIR approach. Determination of the anomalous-scatterer partial structure

Determination of the partial structure of anomalous scatterers is a prerequisite for MAD-phased electron density, regardless of the phasing technique. As described above, the optimal quantities for solving and refining the partial structure of anomalous scatterers are the normal structure amplitudes [|^{0}F_{A}|]. Frequently [|^{0}F_{A}|] values are not extracted from the MAD measurements, and the largest Bijvoet or dispersive differences are used instead. This involves the approximation of representing structure amplitudes [(|^{0}F_{A}|)] as the subset of larger differences [(||F^{+}| - |F^{-} || \hbox{ or } ||F_{\lambda ^{1}} | - |F_{\lambda ^{2}}||)]. The approximation is accurate for only a small fraction of reflections because there is little correlation between [\varphi_{A}] and [\varphi_{T}]. However, it suffices for a suitably strong signal and a suitably small number of sites. Patterson methods are quite successful in locating anomalous scatterers when the number of sites is small. However, the aim of MAD is to solve the macromolecule structure from one MAD data set using any number of anomalous scatterer sites. For larger numbers of sites, statistical direct methods may be employed.

The correct enantiomorph for the anomalous-scatterer partial structure also must be determined ([\varphi_{A}] versus [-\varphi_{A}]) in order to obtain an electron-density image of the macromolecule. However, it cannot be determined directly from MAD data. The correct enantiomorph is chosen by comparison of electron-density maps based on both enantiomorphs of the partial structure. Unlike the situation for pure MIR, the density based on the incorrect enantiomorph of the anomalous-scatterer partial structure is not the mirror image of that based on the correct enantiomorph and contains no image of the macromolecule. The correct map is distinguished by features such as a clear solvent boundary, positive correlation of redundant densities and a macromolecule-like density histogram. If the anomalous-scattering centres form a centric array, then the two enantiomorphs are identical and both maps are correct.

General anomalous-scatterer labels for biological macromolecules

MAD requires a suitable anomalous scatterer, of which none are generally present in naturally occurring proteins or nucleic acids. However, selenomethionine (SeMet) substituted for the natural amino acid methionine (Met) is a general anomalous-scattering label for proteins (Hendrickson, 1985[link]), and is the anomalous scatterer most frequently used in MAD. The K edge of Se is the most accessible for MAD experiments (λ = 0.98 Å).

The SeMet label is especially general and convenient because it is introduced by biological substitution of SeMet for methionine. This is achieved by blocking methionine biosynthesis and substituting SeMet for Met in the growth medium of the cells in which the protein is produced. Production of SeMet protein in bacteria is generally straightforward (Hendrickson et al., 1990[link]; Doublié, 1997[link]) and has also been accomplished in eukaryotic cells (Lustbader et al., 1995[link]; Bellizzi et al., 1999[link]).

Methionine is a particularly attractive target for anomalous-scatterer labelling because the side chain is usually buried in the hydrophobic core of globular proteins where it is relatively better ordered than are surface side chains. The labelling experiment provides direct evidence for isostructuralism of Met and SeMet proteins. All proteins in the biological expression system have SeMet substituted for Met at levels approaching 100%. The cells are viable, therefore the proteins are functional and isostructural with their unlabelled counterparts to the extent required by function.

The natural abundance of methionine in soluble proteins is approximately one in fifty amino acids, providing a typical MAD phasing signal of 4–6% of [|F|] [equations ([link]) and ([link])]. Typical extreme values for the anomalous scattering factors are [f'_{\min} \sim - 10 \hbox{ e}] and [f''_{\max} \sim 6 \hbox{ e}] (Fig. [link]). SeMet is more sensitive to oxidation than is Met, and care must be taken to maintain a homogeneous oxidation state. Generally, the reduced state is maintained by addition of disulfide reducing agents to SeMet protein and crystals. However, the oxidized forms of Se have sharper K-edge features and [f'] and [f''] values of greater magnitude than does the reduced form (Smith & Thompson, 1998[link]). This property has been exploited to enhance anomalous signals by intentional oxidation of SeMet protein (Sharff et al., 2000[link]). SeMet is also a useful isomorphous-replacement label with a signal of [\sim\! 10]% of [|F|]. Prior knowledge of the sites of labelling is extremely useful during initial fitting of a protein sequence to electron density. Also, noncrystallographic symmetry operators can usually be defined more reliably from Se positions in SeMet protein than by heavy-atom positions in MIR due to the uniformity and completeness of labelling (Tesmer et al., 1996[link]).
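
As a rough check on the size of signal quoted above, the sketch below evaluates the commonly used estimates of the expected Bijvoet and dispersive diffraction ratios, (N_A/2N_T)^{1/2}(2f''/Z_eff) and (N_A/2N_T)^{1/2}(|Δf'|/Z_eff). The formulas as written here, the effective scattering Z_eff ≈ 6.7 e per non-hydrogen atom, the figure of about 7.8 non-hydrogen atoms per residue, the dispersive difference of 7 e and the function name are all assumptions introduced for illustration; they are not taken from this section.

```python
import math


def diffraction_ratios(residues_per_scatterer, f_double_prime, delta_f_prime,
                       atoms_per_residue=7.8, z_eff=6.7):
    """Estimate expected Bijvoet and dispersive diffraction ratios.

    Assumes one anomalous scatterer per `residues_per_scatterer` residues.
    atoms_per_residue and z_eff are illustrative assumptions.
    """
    # Number of non-hydrogen atoms per anomalous scatterer (N_T / N_A)
    atoms_per_scatterer = residues_per_scatterer * atoms_per_residue
    root = math.sqrt(1.0 / (2.0 * atoms_per_scatterer))
    bijvoet = root * 2.0 * f_double_prime / z_eff
    dispersive = root * abs(delta_f_prime) / z_eff
    return bijvoet, dispersive


# One Se per fifty residues, f'' ~ 6 e, and an assumed |delta f'| of 7 e
bijvoet, dispersive = diffraction_ratios(50, 6.0, 7.0)
print(f"Bijvoet ratio ~ {bijvoet:.3f}, dispersive ratio ~ {dispersive:.3f}")
```

Under these assumptions both ratios come out at a few per cent of [|F|], which is the order of magnitude stated in the text.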

An analogous general label is available for nucleic acids in the form of brominated bases, particularly 5-bromouridine, which is isostructural with thymidine. The K edge of Br corresponds to a wavelength of 0.92 Å, which is quite favourable for data collection.

14.2.2. Automated MAD and MIR structure solution

T. C. Terwilligerc* and J. Berendzend

Introduction

In favourable cases, structure solution by X-ray crystallography using the MAD or MIR methods can be a straightforward, though often lengthy, process. The recently developed Solve software (Terwilliger & Berendzen, 1999b[link]) is designed to fully automate this class of structure solution. The overall approach is to link together all the analysis steps that a crystallographer would normally carry out into a seamless procedure, and in the process to convert each decision-making step into an optimization problem.

In the case of both MAD and MIR data, a key element of the procedure is the scoring and ranking of possible solutions. This scoring procedure makes it possible to treat structure solution as an optimization procedure, rather than a decision-making one. In the case of MAD data, a second key element of the procedure is the conversion of MAD data to a pseudo-SIRAS form (Terwilliger, 1994b[link]) that allows much more rapid analysis than one involving the full MAD data set.

MAD and MIR structure solution

The MAD and MIR approaches to structure solution are conceptually very similar and share several important steps. Two of these are the identification of possible locations of heavy or anomalously scattering atoms and an analysis of the quality of each of these potential heavy-atom solutions. In each method, trial partial structures for these heavy or anomalously scattering atoms are often obtained by inspection of difference Patterson functions or by semi-automated analysis (e.g. Terwilliger et al., 1987[link]; Chang & Lewis, 1994[link]; Vagin & Teplyakov, 1998[link]). In other cases, direct-methods approaches have been used to find heavy-atom sites (Sheldrick, 1990[link]; Miller et al., 1994[link]). Potential heavy-atom solutions found in any of these approaches are often just a starting point for structure solution, with additional sites found by difference Fourier or other approaches.

The analysis of the quality of potential heavy-atom solutions is also very similar in the MIR and MAD methods. In both cases a partial structure is used to calculate native phases for the entire structure, and the electron density that results is examined to see if the expected features of the macromolecule are found. Additionally, the agreement of the heavy-atom model with the difference Patterson function and the figure of merit of phasing are commonly used to evaluate the quality of a solution. In many cases, an analysis of heavy-atom sites by sequential deletion of individual sites or derivatives is an important criterion of quality as well (Dickerson et al., 1961[link]).

Decision making and structure solution

The process of structure solution can be thought of largely as a decision-making process. In the early stages of solution, a crystallographer must choose which of several potential trial solutions may be worth pursuing. At a later stage, the crystallographer must choose which peaks in a heavy-atom difference Fourier are to be included in the heavy-atom model, and which hand of the solution is correct. At a final stage, the crystallographer must decide whether the solution process is complete and which of the possible heavy-atom models is the best. The most important feature of the Solve software is the use of a consistent scoring algorithm as the basis for making all these decisions.

The need for rapid refinement and phasing during automated structure solution

In order to make automated structure solution practical, it was necessary to be able to evaluate heavy-atom solutions very rapidly. This is because the automated approach used by Solve requires analysis of many heavy-atom solutions (typically 300–1000). For each heavy-atom solution examined, the heavy-atom sites have to be refined and phases calculated. In implementing automated structure solution, it was important to recognize the need for a trade-off between the most accurate heavy-atom refinement and phasing at all stages of structure solution and the time required to carry it out. The balance chosen for Solve was to use the most accurate available methods for final phase calculations, and to use approximate but much faster methods for all intermediate refinements and phase calculations. The refinement method chosen on this basis was origin-removed Patterson refinement (Terwilliger & Eisenberg, 1983[link]), which treats each derivative in an MIR data set independently and which is very fast because it does not require phase calculation. The phasing approach used for MIR data throughout Solve is Bayesian correlated phasing (Terwilliger & Berendzen, 1996[link]; Terwilliger & Eisenberg, 1987[link]), which takes into account the correlation of non-isomorphism among derivatives without substantially slowing down phase calculations.

For MAD data, Bayesian calculations of phase probabilities are very slow (e.g. Terwilliger & Berendzen, 1997[link]; de La Fortelle & Bricogne, 1997[link]). Consequently, we have used an alternative procedure for all MAD phase calculations except those done at the very final stage. This alternative is to convert the MAD data set into a form that is similar to one obtained in the single isomorphous replacement with anomalous scattering (SIRAS) method. In this way, a single data set with isomorphous and anomalous differences is obtained that can be used in heavy-atom refinement by the origin-removed Patterson refinement method and in phasing by conventional SIRAS phasing (Terwilliger & Eisenberg, 1987[link]).

Conversion of MAD data to a pseudo-SIRAS form

The conversion of MAD data to a pseudo-SIRAS form that has almost the same information content requires two important assumptions. The first assumption is that the structure factor corresponding to anomalously scattering atoms in a structure varies in magnitude but not in phase at various X-ray wavelengths. This assumption will hold when there is one dominant type of anomalously scattering atom. The second is that the structure factor corresponding to anomalously scattering atoms is small compared to the structure factor from all other atoms. As long as these two assumptions hold, the information in a MAD experiment is largely contained in just three quantities: a structure factor ([F_{o}]) corresponding to the scattering from non-anomalously scattering atoms, a dispersive or isomorphous difference at a standard wavelength [\lambda_{o}] ([\Delta^{\rm ISO}_{\lambda_{o}}]), and an anomalous difference ([\Delta^{\rm ANO}_{\lambda_{o}}]) at the same standard wavelength (Terwilliger, 1994b[link]). It is easy to see that these three quantities could be treated just like a SIRAS data set with the `native' structure factor [F_{P}] replaced by [F_{o}], the derivative structure factor [F_{PH}] replaced by [F_{o} + \Delta^{\rm ISO}_{\lambda_{o}}], and the anomalous difference replaced by [\Delta^{\rm ANO}_{\lambda_{o}}] (Terwilliger, 1994b[link]). This is the approach taken by Solve. In this section, it is briefly shown how these three quantities can be estimated from MAD data.

For a particular reflection and a particular wavelength [\lambda_{j}], we can write the total normal (i.e., non-anomalous) scattering from a structure ([{\bf F}_{{\rm tot},\,\lambda_{j}}]) as the sum of two components. One is the scattering from all non-anomalously scattering atoms ([{\bf F}_{o}]). This scattering is wavelength-independent. The second is the normal scattering from anomalously scattering atoms ([{\bf F}_{H_{\lambda_{j}}}]) at wavelength [\lambda_{j}]. This term includes wavelength-dependent dispersive shifts in atomic scattering due to the f′ term in the scattering factor, but not the anomalous part due to the f″ term. The magnitude of the total scattering factor can then be written in the form

[F_{{\rm tot},\, \lambda_{j}} = |{\bf F}_{o} + {\bf F}_{H_{\lambda_{j}}}|.]

Here [{\bf F}_{o}] and [{\bf F}_{{\rm tot},\,\lambda_{j}}] can be thought of as corresponding, respectively, to the native structure factor, [F_{P}], and the derivative structure factor, [F_{PH}], as used in the method of isomorphous replacement (Blundell & Johnson, 1976[link]). If the scattering from anomalously scattering atoms is small compared to that from all other atoms, equation ([link]) can be rewritten in the approximate form

[F_{{\rm tot},\, \lambda_{j}} \simeq F_{o} + F_{H_{\lambda_{j}}} \cos (\alpha),]

where α is the phase difference between the structure factors corresponding to non-anomalously and anomalously scattering atoms in the unit cell, [{\bf F}_{o}] and [{\bf F}_{H_{\lambda_{j}}}], respectively, at this X-ray wavelength.

The data in a MAD experiment consist of observations of structure-factor amplitudes for Bijvoet pairs, [F^{+}_{\lambda_{j}}] and [F^{-}_{\lambda_{j}}], for several X-ray wavelengths [\lambda_{j}]. These can be rewritten in terms of an average structure-factor amplitude [\overline{F}_{\lambda_{j}}] and an anomalous difference [\Delta^{\rm ANO}_{\lambda_{j}}] (cf. Blundell & Johnson, 1976[link]). We would like to convert these into estimates of the amplitude of the structure factor corresponding to the non-anomalously scattering atoms alone, the amplitude of the structure factor corresponding to the entire structure at a standard wavelength, and the anomalous difference at the standard wavelength.

The normal scattering due to anomalously scattering atoms ([{\bf F}_{H_{\lambda_{j}}}]) changes in magnitude but not direction as a function of X-ray wavelength. We can therefore write (Terwilliger, 1994b[link])

[{\bf F}_{H_{\lambda_{j}}} = {\bf F}_{H_{\lambda_{o}}} {f_{o} + f'(\lambda_{j}) \over f_{o} + f'(\lambda_{o})},]

where [\lambda_{o}] is an X-ray wavelength arbitrarily defined as a standard, and the real part of the scattering factor for the anomalously scattering atoms at wavelength [\lambda_{j}] is [f_{o} + f'(\lambda_{j})]. A corresponding approximation for the anomalous differences at various wavelengths can also be written (Terwilliger & Eisenberg, 1987[link])

[\Delta^{\rm ANO}_{\lambda_{j}} = \Delta^{\rm ANO}_{\lambda_{o}} {f''(\lambda_{j}) \over f''(\lambda_{o})},]

where [f''(\lambda_{j})] is the imaginary part of the scattering factor for the anomalously scattering atoms at wavelength [\lambda_{j}]. Based on equation ([link]), anomalous differences at any wavelength can be estimated using measurements at the standard wavelength.

An estimate of the structure-factor amplitude ([F_{o}]) corresponding to the scattering from non-anomalously scattering atoms and of the dispersive difference at standard wavelength [\lambda_{o}] ([\Delta^{\rm ISO}_{\lambda_{o}}]) can be obtained from average structure-factor amplitudes ([\overline{F}_{\lambda_{j}}]) at any pair of wavelengths [\lambda_{i}] and [\lambda_{j}] by proceeding in two steps. Using equations ([link]) and ([link]), the component of [{\bf F}_{H_{\lambda_{o}}}] along [{\bf F}_{o}], which we term [\Delta^{\rm ISO}_{\lambda_{o}}], can be estimated as

[\Delta^{\rm ISO}_{\lambda_{o}} \simeq F_{H_{\lambda_{o}}} \cos(\alpha)]

or

[\Delta^{\rm ISO}_{\lambda_{o}} \simeq (\overline{F}_{\lambda_{i}} - \overline{F}_{\lambda_{j}}) {f_{o} + f'(\lambda_{o}) \over f'(\lambda_{i}) - f'(\lambda_{j})}.]

Then, in turn, this estimate of [\Delta^{\rm ISO}_{\lambda_{o}}] can be used to obtain [F_{o}]:

[F_{o} \simeq \overline{F}_{\lambda_{j}} - \Delta^{\rm ISO}_{\lambda_{o}} {f_{o} + f'(\lambda_{j}) \over f_{o} + f'(\lambda_{o})}.]

This set of [F_{o}], [F_{o} + \Delta^{\rm ISO}_{\lambda_{o}}] and [\Delta^{\rm ANO}_{\lambda_{o}}] can then be used just as [F_{P}], [F_{PH}] and [\Delta^{\rm ANO}] are used in the SIRAS (single isomorphous replacement with anomalous scattering) method.
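
The two-step estimate above can be sketched for a single reflection and a single pair of wavelengths as follows. This is only an illustration of the algebra, not the MADMRG code: the function and argument names are hypothetical, and the example values (a zero-angle normal scattering factor of 34 e for Se and the f′ values) are assumptions chosen to make the round trip easy to follow.

```python
def pseudo_siras_amplitudes(fbar_i, fbar_j, fp_i, fp_j, fp_o, f0_atom):
    """Estimate (F_o, F_o + delta_iso) at the standard wavelength.

    fbar_i, fbar_j : mean Bijvoet amplitudes at wavelengths i and j
    fp_i, fp_j     : f' of the anomalous scatterer at wavelengths i and j
    fp_o           : f' at the standard wavelength
    f0_atom        : normal scattering factor f_o of the anomalous scatterer
    """
    dispersive_gap = fp_i - fp_j
    if dispersive_gap == 0:
        raise ValueError("the two wavelengths must differ in f'")
    # Dispersive ('isomorphous') difference referred to the standard wavelength
    delta_iso = (fbar_i - fbar_j) * (f0_atom + fp_o) / dispersive_gap
    # Amplitude from the non-anomalously scattering atoms alone
    f_o = fbar_j - delta_iso * (f0_atom + fp_j) / (f0_atom + fp_o)
    return f_o, f_o + delta_iso


def anomalous_at_standard(delta_ano_j, fpp_j, fpp_o):
    """Rescale an anomalous difference measured at wavelength j to the standard."""
    return delta_ano_j * fpp_o / fpp_j


# Round trip on synthetic values: F_o = 100, delta_iso = 5 at the standard wavelength
f0_atom, fp_o, fp_i, fp_j = 34.0, -10.0, -10.0, -3.0
fbar_i = 100.0 + 5.0 * (f0_atom + fp_i) / (f0_atom + fp_o)
fbar_j = 100.0 + 5.0 * (f0_atom + fp_j) / (f0_atom + fp_o)
print(pseudo_siras_amplitudes(fbar_i, fbar_j, fp_i, fp_j, fp_o, f0_atom))
```

The returned pair plays the role of [F_{P}] and [F_{PH}], and the rescaled anomalous difference plays the role of [\Delta^{\rm ANO}], in a conventional SIRAS calculation.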

The algorithm described above is implemented in the program segment MADMRG as part of Solve (Terwilliger, 1994b[link]). In most cases, there is more than one pair of X-ray wavelengths corresponding to a particular reflection. The estimates from each pair of wavelengths are averaged, using weighting factors based on the uncertainties in each estimate. Data from various pairs of X-ray wavelengths and from various Bijvoet pairs can have very different weights in their contributions to the total. This can be understood by noting that pairs of wavelengths that yield a large value of the denominator in equation ([link]) (i.e., those that differ considerably in dispersive contributions) would yield relatively accurate estimates of [\Delta^{\rm ISO}_{\lambda_{o}}]. In the same way, Bijvoet differences measured at the wavelength with the largest value of f″ will contribute the most to estimates of [\Delta^{\rm ANO}_{\lambda_{o}}].
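
One way to picture this weighting is with simple error propagation followed by an inverse-variance weighted mean. The text states only that the weights are based on the uncertainties, so the inverse-variance form, the assumption of independent errors and all names below are illustrative choices rather than a description of MADMRG.

```python
import math


def delta_iso_estimate(fbar_i, sig_i, fbar_j, sig_j, fp_i, fp_j, fp_o, f0_atom):
    """Return (delta_iso, sigma) from one wavelength pair.

    The uncertainty is propagated from the two amplitude uncertainties,
    assuming they are independent.
    """
    scale = (f0_atom + fp_o) / (fp_i - fp_j)
    value = (fbar_i - fbar_j) * scale
    sigma = math.hypot(sig_i, sig_j) * abs(scale)
    return value, sigma


def weighted_mean(estimates):
    """Combine (value, sigma) pairs using inverse-variance weights."""
    weights = [1.0 / sigma ** 2 for _, sigma in estimates]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    return mean, 1.0 / math.sqrt(total)


# A pair with a large dispersive gap (7 e) versus one with a small gap (2 e)
wide = delta_iso_estimate(105.0, 1.0, 106.5, 1.0, -10.0, -3.0, -10.0, 34.0)
narrow = delta_iso_estimate(105.0, 1.0, 105.4, 1.0, -10.0, -8.0, -10.0, 34.0)
print(wide, narrow, weighted_mean([wide, narrow]))
```

With equal measurement errors, the pair with the larger difference in f′ gives the smaller propagated uncertainty and therefore dominates the average, as described above.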

The standard wavelength choice in this analysis is arbitrary, because values at any wavelength can be converted to values at any other wavelength. The standard wavelength does not even have to be one of the wavelengths in the experiment, though it is convenient to choose one of them.

Scoring of trial heavy-atom solutions

Scoring of potential heavy-atom solutions is an essential part of the Solve algorithm because it allows ranking of solutions and appropriate decision making. Solve scores trial heavy-atom solutions (or anomalously scattering atom solutions) using four criteria: agreement with the Patterson function, cross-validation of heavy-atom sites, figure of merit, and non-randomness of the electron-density map. The scores for each criterion are normalized to those for a group of starting solutions (most of which are incorrect) to obtain Z scores. The total score for a solution is the sum of its Z scores after correction for anomalously high scores in any category.
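
The normalization to Z scores can be sketched as below. The criterion names, the use of the sample standard deviation and the data layout are assumptions made for illustration, and the correction for anomalously high scores mentioned above is deliberately left out because its form is not given here.

```python
import statistics


def z_scores(raw, reference):
    """Convert raw criterion scores to Z scores.

    raw       : dict mapping criterion name -> raw score of the trial solution
    reference : dict mapping criterion name -> raw scores of the starting
                solutions (at least two values per criterion)
    """
    result = {}
    for name, value in raw.items():
        mean = statistics.mean(reference[name])
        spread = statistics.stdev(reference[name])
        # A criterion with no spread among the starting solutions carries no information
        result[name] = (value - mean) / spread if spread > 0 else 0.0
    return result


def total_score(raw, reference):
    """Sum of the Z scores (no correction for outlying categories applied)."""
    return sum(z_scores(raw, reference).values())


criteria = ("patterson", "cross_validation", "figure_of_merit", "map")
reference = {name: [1.0, 2.0, 3.0] for name in criteria}
trial = {name: 4.0 for name in criteria}
print(total_score(trial, reference))
```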

The first criterion used by Solve for evaluating a trial heavy-atom solution is the agreement between calculated and observed Patterson functions. Comparisons of this type have always been important in the MIR and MAD methods (Blundell & Johnson, 1976[link]). The score for Patterson-function agreement is the average value of the Patterson function at predicted locations of peaks, after multiplication by a weighting factor based on the number of heavy-atom sites in the trial solution. The weighting factor (Terwilliger & Berendzen, 1999b[link]) is adjusted so that if two solutions have the same mean value at predicted Patterson peaks, the one with the larger number of sites receives the higher score. Typically the weighting factor is approximately given by [(N)^{1/2}], where there are N sites in the solution.

In some cases, predicted Patterson vectors fall on high peaks that are not related to the heavy-atom solution. To exclude these contributions, occupancies of each heavy-atom site are refined so that the predicted peak heights approximately match the observed peak heights at the predicted interatomic positions. Then all peaks with heights more than 1σ higher than their predicted values are truncated at this height. The average values are further corrected for instances where more than one predicted Patterson vector falls on the same location by scaling that peak height by the fraction of predicted vectors that are unique.
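
A minimal sketch of the Patterson score described in the two preceding paragraphs follows. It reads the truncation rule as capping each observed peak at its predicted height plus 1σ, which is an interpretation; the correction for coincident predicted vectors is omitted, and the function name and arguments are hypothetical.

```python
import math


def patterson_score(observed, predicted, sigma, n_sites):
    """Weighted mean Patterson value at predicted peak positions.

    observed  : Patterson values at the predicted vector positions
    predicted : peak heights predicted from the refined occupancies
    sigma     : r.m.s. value of the Patterson function
    n_sites   : number of heavy-atom sites in the trial solution
    """
    # Cap peaks that exceed their predicted height by more than 1 sigma,
    # so that unrelated high peaks do not inflate the score.
    capped = [min(obs, pred + sigma) for obs, pred in zip(observed, predicted)]
    mean_height = sum(capped) / len(capped)
    # Approximate weighting factor of sqrt(N) favours more complete solutions
    return math.sqrt(n_sites) * mean_height


print(patterson_score([3.0, 10.0], [3.0, 4.0], sigma=1.0, n_sites=4))
```

With the same mean peak height, a solution with more sites receives the higher score through the square-root factor.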

A `cross-validation' difference Fourier analysis is the basis of the second criterion used to evaluate heavy-atom solutions. One at a time, each site in a solution (and any equivalent sites in other derivatives for MIR solutions) is omitted from the heavy-atom model and phases are recalculated. These phases are used in a difference Fourier analysis and the peak height at the location of the omitted site is noted. A similar analysis where a derivative is omitted from phasing and all other derivatives are used to phase a difference Fourier has been used for many years (Dickerson et al., 1961[link]). The score for cross-validation difference Fouriers is the average peak height, after weighting by the same factor used in the difference Patterson analysis.

The mean figure of merit of phasing (m) (Blundell & Johnson, 1976[link]) can be a remarkably useful measure of the quality of phasing despite its susceptibility to systematic error (Terwilliger & Berendzen, 1999b[link]). The overall figure of merit is essentially a measure of the internal consistency of the heavy-atom solution and the data, and is used as the third criterion for solution quality in Solve. As heavy-atom refinement in Solve is carried out using origin-removed Patterson refinement (Terwilliger & Eisenberg, 1983[link]), occupancies of heavy-atom sites are relatively unbiased. This minimizes the problem of high occupancies leading to inflated figures of merit. Additionally, using a single procedure for phasing allows comparison between solutions. The score based on figure of merit is simply the unweighted mean for all reflections included in phasing.

The most important criterion used by a crystallographer in evaluating the quality of a heavy-atom solution is the interpretability of the resulting electron-density map. Although a full implementation of such a criterion is difficult, it is quite straightforward to evaluate instead whether the electron-density map has features that are expected for a crystal of a macromolecule. A number of features of electron-density maps could be used for this purpose, including the connectivity of electron density in the maps (Baker et al., 1993[link]), the presence of clearly defined regions of protein and solvent (Wang, 1985[link]; Podjarny et al., 1987[link]; Zhang & Main, 1990[link]; Xiang et al., 1993[link]; Abrahams et al., 1994[link]; Terwilliger & Berendzen, 1999a[link],c[link]), and histogram matching of electron densities (Zhang & Main, 1990[link]; Goldstein & Zhang, 1998[link]). We have used the identification of solvent and protein regions as the measure of map quality in Solve. This requires that there be both solvent and protein regions in the electron-density map, but for most macromolecular structures the fraction of the unit cell that is occupied by the macromolecule is in the suitable range of 30–70%. The criterion used in scoring by Solve is based on the connectivity of the solvent and protein regions (Terwilliger & Berendzen, 1999c[link]). The unit cell is divided into boxes approximately twice the resolution of the map on a side, and within each box the r.m.s. electron density is calculated, without including the [F_{000}] term in the Fourier synthesis. For boxes within the protein region, this r.m.s. electron density will typically be high (as there are some points where atoms are located and other points between atoms), while for those in the solvent region it will be low (as the electron density is fairly uniform). The score based on the connectivity of the protein and solvent regions is simply the correlation coefficient of this r.m.s. electron density for adjacent boxes. If there is a large contiguous protein region and a large contiguous solvent region, then adjacent boxes will have highly correlated values of their r.m.s. electron densities. If the electron density is random, there will be little or no correlation. In practice, for a very good electron-density map, this correlation of local r.m.s. electron density may be as high as 0.5 or 0.6.

Automated MIR and MAD structure determination
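
The map-connectivity criterion of the scoring scheme (the correlation of local r.m.s. electron density between adjacent boxes) can be sketched as follows. The nested-list map representation, the periodic treatment of neighbours, the choice of face-sharing neighbours only and the function names are assumptions for illustration; the toy map is artificial and is not meant to resemble real density.

```python
import math


def box_rms(density, box):
    """R.m.s. density in cubic boxes of `box` grid points on a side.

    `density` is a nested list indexed [x][y][z]. The overall mean is
    subtracted first, which corresponds to omitting the F000 term.
    """
    nx, ny, nz = len(density), len(density[0]), len(density[0][0])
    mean = sum(v for plane in density for row in plane for v in row) / (nx * ny * nz)
    result = []
    for i in range(0, nx - box + 1, box):
        plane_out = []
        for j in range(0, ny - box + 1, box):
            row_out = []
            for k in range(0, nz - box + 1, box):
                total = sum(
                    (density[i + a][j + b][k + c] - mean) ** 2
                    for a in range(box) for b in range(box) for c in range(box)
                )
                row_out.append(math.sqrt(total / box ** 3))
            plane_out.append(row_out)
        result.append(plane_out)
    return result


def adjacent_box_correlation(rms):
    """Pearson correlation of box r.m.s. values between face-sharing neighbours."""
    bx, by, bz = len(rms), len(rms[0]), len(rms[0][0])
    xs, ys = [], []
    for i in range(bx):
        for j in range(by):
            for k in range(bz):
                # Neighbours along +x, +y and +z, wrapping around the unit cell
                neighbours = (((i + 1) % bx, j, k), (i, (j + 1) % by, k), (i, j, (k + 1) % bz))
                for ni, nj, nk in neighbours:
                    if (ni, nj, nk) != (i, j, k):
                        xs.append(rms[i][j][k])
                        ys.append(rms[ni][nj][nk])
    if not xs:
        return 0.0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # no variation between boxes, so no measurable contrast
    return cov / math.sqrt(vx * vy)


# Toy map: fluctuating 'protein' slab for x < 4, flat 'solvent' elsewhere
n = 8
toy = [[[(1.0 if (x + y + z) % 2 == 0 else -1.0) if x < 4 else 0.0
         for z in range(n)] for y in range(n)] for x in range(n)]
print(adjacent_box_correlation(box_rms(toy, 2)))
```

A map with contiguous high-variance and low-variance regions gives a clearly positive correlation, whereas a map with the same local variation everywhere gives none.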

The four-point scoring scheme described above provides the foundation for automated structure solution. To make it practical, the conversion of MAD data to a pseudo-SIRAS form and the use of rapid origin-removed Patterson-based heavy-atom refinement have been nearly essential. The remainder of the Solve algorithm for automated structure solution is largely a standardized form of local scaling, an integrated set of routines to carry out all of the calculations required for heavy-atom searching, refinement and phasing, and routines to keep track of the lists of current solutions being examined and past solutions that have already been tested.

Scaling of data in the Solve algorithm is done by a local scaling procedure (Matthews & Czerwinski, 1975[link]). Systematic errors are minimized by scaling [F^{+}] and [F^{-}], native and derivative, and wavelengths of MAD data in very similar ways and by keeping different data sets separate until the end of scaling. The scaling procedure is optimized for cases where the data are collected in a systematic fashion. For both MIR and MAD data, the overall procedure is to construct a reference data set that is as complete as possible and that contains information either from a native data set (for MIR) or for all wavelengths (for MAD data). This reference data set is constructed for just the asymmetric unit of data and is essentially the average of all measurements obtained for each reflection. The reference data set is then expanded to the entire reciprocal lattice and used as the basis for local scaling of each individual data set [see Terwilliger & Berendzen (1999b)[link] for additional details].

Once MIR data have been scaled, or MAD data have been scaled and converted to a pseudo-SIRAS form, difference Patterson functions are used to identify plausible one-site or two-site heavy-atom solutions. For MIR data, difference Patterson functions are calculated for each derivative. For MAD data, anomalous and dispersive differences are combined to yield a Bayesian estimate of the Patterson function for the anomalously scattering atoms (Terwilliger, 1994a[link]). An automated search of the Patterson function is then used to find a large number (typically 30) of potential single-site and two-site solutions. In principle, Patterson methods could be used to solve the complete heavy-atom substructure, but the approach used in Solve is to find just the first one or two heavy-atom sites in this way and to find all others by difference Fourier analysis. This initial set of one-site and two-site solutions becomes the initial list of potential solutions (`seeds') for automated structure solution. Once each of the potential seeds is scored and ranked, the top seeds (typically five) are selected as independent starting points for the search for heavy-atom solutions.

For each starting solution (seed), the main cycle in the automated structure-solution algorithm used by Solve consists of two basic steps. The first is to refine heavy-atom parameters and rank all existing solutions generated so far from this seed based on the four criteria discussed above. The second is to take the highest-ranking solution that has not yet been exhaustively analysed and use it in an attempt to generate a more complete solution. Generation of new solutions is carried out in three ways: by deletion of sites, by addition of sites from difference Fouriers, and by inversion. A partial solution is considered to have been exhaustively analysed when all single-site deletions have been considered, when no more peaks in a difference Fourier can be found that improve upon it, and when inversion does not improve it, or when the maximum number of sites input by the user has been reached. In each case, new solutions generated in these three ways are refined, scored and ranked, and the cycle is continued until all the top solutions have been fully analysed and no new solutions are found. Throughout this process, a tally of the solutions that have already been considered is kept, and any time a solution is a duplicate of a previously examined solution it is dropped.
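
The cycle described above can be pictured with the following toy sketch. It is not the Solve implementation: sites are represented by integer labels, the scoring function and the source of candidate sites are supplied by the caller as stand-ins for refinement, four-criterion scoring and difference Fourier peak picking, inversion is omitted, and only solutions that score better than their parent are retained.

```python
def search_from_seed(seed, score, propose_sites, max_sites):
    """Grow a heavy-atom solution from a seed by site deletion and addition.

    seed          : frozenset of site labels
    score         : callable giving a score for a solution (hypothetical)
    propose_sites : callable returning candidate new sites for a solution,
                    standing in for difference Fourier peaks (hypothetical)
    max_sites     : maximum number of sites allowed in a solution
    """
    seen = {seed}        # tally of every solution already considered
    pool = {seed}        # solutions retained so far
    exhausted = set()    # solutions that have been fully analysed
    while True:
        pending = [s for s in pool if s not in exhausted]
        if not pending:
            break
        best = max(pending, key=score)
        trial = set()
        if len(best) > 1:
            for site in best:                      # single-site deletions
                trial.add(best - {site})
        if len(best) < max_sites:
            for site in propose_sites(best):       # additions of new sites
                trial.add(best | {site})
        exhausted.add(best)
        for candidate in trial:
            if candidate in seen:
                continue                           # drop duplicates
            seen.add(candidate)
            if score(candidate) > score(best):
                pool.add(candidate)
    return max(pool, key=score)


# Toy problem: three true sites among six candidate positions
true_sites = frozenset({1, 2, 3})
toy_score = lambda s: len(s & true_sites) - len(s - true_sites)
toy_peaks = lambda s: [p for p in range(6) if p not in s]
print(sorted(search_from_seed(frozenset({1, 5}), toy_score, toy_peaks, max_sites=3)))
```

In the toy problem the seed contains one correct and one incorrect site, and the loop ends with the three correct sites once no deletion or addition improves the score.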

In some cases, one very clear solution appears early in the structure-solution process, while in others, there are several solutions that have similar scores at early (and sometimes even late) stages of structure solution. In cases where no one solution is much better than the others, all the seeds are exhaustively analysed. On the other hand, if a very promising solution emerges from one seed, then the search is narrowed to focus on that seed, deletions are not carried out until the end of the analysis, and many peaks from the difference Fourier analysis are added at a time so as to build up the solution as quickly as possible. Once the expected number of heavy-atom sites has been found, each site is deleted in turn to see if the solution can be further improved. If this occurs, then the new solutions are analysed in the same way by addition and deletion of sites and by inversion until no improvement is obtained.

At the conclusion of the Solve algorithm, an electron-density map and phases for the top solution are reported in a form that is compatible with the CCP4 suite (Collaborative Computational Project, Number 4, 1994[link]). Additionally, command files that can be modified to look for additional heavy-atom sites or to construct other electron-density maps are produced. If more than one possible solution is found, the heavy-atom sites and phasing statistics for all of them are reported.

Generation of model X-ray data sets

An important feature of Solve is the inclusion of modules for the generation of model data. Solve can construct model raw X-ray data for either MIR or MAD cases. The macromolecular structure can be defined by a file in PDB format (Bernstein et al., 1977[link]) with heavy-atom parameters defined by the user. Any degree of `experimental' uncertainty in measurement of intensities can be included, and limited non-isomorphism for MIR data in which cell dimensions differ for native and any of the derivative data sets (but in which the macromolecular structure is identical) can be included. This automatic generation of model data is very useful in evaluating what can and what cannot be solved. Once a data set has been generated, the Solve algorithm can be used to attempt to solve it. Solve generates a model electron-density map based on the input coordinates, and during the structure-solution process all maps calculated with trial solutions can be compared to the model map. In many cases, heavy-atom solutions can be related to different origins (and to different handedness as well). The origin shift is identified by Solve by finding the shift that best maps the trial solution onto the (known) correct solution.

Conclusions

The Solve algorithm is very useful for solving macromolecular structures by the MIR and MAD methods. It has been used to solve MAD structures with as many as 56 selenium atoms in the asymmetric unit (W. Smith & C. Janson, personal communication). From the user's point of view, the algorithm is very simple. Only a few input parameters are needed in most cases, and the Solve algorithm carries out the entire process automatically. In principle, the procedure can be very thorough as well, so that many trial starting solutions can be examined and difficult heavy-atom structures can be found. Additionally, for the most difficult structure-solution cases, the failure to find a solution can be useful in confirming that additional information is needed.

Software availability

The Solve software and complete documentation can be obtained from the web site .


TCT and JB gratefully acknowledge support from the National Institutes of Health and the US Department of Energy.


Abrahams, J. P., Leslie, A. G. W., Lutter, R. & Walker, J. E. (1994). Structure at 2.8-angstrom resolution of f1-ATPase from bovine heart-mitochondria. Nature (London), 370, 621–628.
Als-Nielsen, J. & McMorrow, D. F. (2001). Elements of modern X-ray physics. New York: John Wiley & Sons.
Baker, D., Krukowski, A. E. & Agard, D. A. (1993). Uniqueness and the ab initio phase problem in macromolecular crystallography. Acta Cryst. D49, 186–192.
Bellizzi, J. J. III, Widom, J., Kemp, C. W. & Clardy, J. (1999). Producing selenomethionine-labeled proteins with a baculovirus expression vector system. Structure, 7, R263–R267.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). Protein data bank: computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.
Bijvoet, J. M. (1949). Phase determination in direct Fourier-synthesis of crystal structures. Proc. Acad. Sci. Amst. B52, 313–314.
Blow, D. M. (1958). The structure of haemoglobin. VII. Determination of phase angles in the non-centrosymmetric [100] zone. Proc. R. Soc. London Ser. A, 247, 302–335.
Blundell, T. L. & Johnson, L. N. (1976). Protein crystallography. p. 368. New York: Academic Press.
Burling, F. T., Weis, W. I., Flaherty, K. M. & Brünger, A. T. (1996). Direct observation of protein solvation and discrete disorder with experimental crystallographic phases. Science, 271, 72–77.
Chang, G. & Lewis, M. (1994). Using genetic algorithms for solving heavy-atom sites. Acta Cryst. D50, 667–674.
Collaborative Computational Project, Number 4 (1994). The CCP4 suite: programs for protein crystallography. Acta Cryst. D50, 760–763.
Crick, F. H. C. & Magdoff, B. S. (1956). The theory of the method of isomorphous replacement for protein crystals. I. Acta Cryst. 9, 901–908.
Cromer, D. T. & Liberman, D. (1970a). Relativistic calculation of anomalous scattering factors for X-rays. J. Chem. Phys. 53, 1891–1898.
Cromer, D. T. & Liberman, D. (1970b). Relativistic calculation of anomalous scattering factors for X-rays. Report LA-4403. Los Alamos National Laboratory, USA.
Dickerson, R. E., Kendrew, J. C. & Strandberg, B. E. (1961). The crystal structure of myoglobin: phase determination to a resolution of 2 Å by the method of isomorphous replacement. Acta Cryst. 14, 1188–1195.
Doublié, S. (1997). Preparation of selenomethionyl proteins for phase determination. Methods Enzymol. 276, 523–530.
Fanchon, E. & Hendrickson, W. A. (1990). Effect of the anisotropy of anomalous scattering on the MAD phasing method. Acta Cryst. A46, 809–820.
Goldstein, A. & Zhang, K. Y. J. (1998). The two-dimensional histogram as a constraint for protein phase improvement. Acta Cryst. D54, 1230–1244.
Hendrickson, W. A. (1985). Analysis of protein structure from diffraction measurement at multiple wavelengths. Trans. Am. Crystallogr. Assoc. 21, 11–21.
Hendrickson, W. A., Horton, J. R. & LeMaster, D. M. (1990). Selenomethionyl proteins produced for analysis by multiwavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure. EMBO J. 9, 1665–1672.
Hendrickson, W. A. & Lattman, E. E. (1970). Representation of phase probability distributions for simplified combination of independent phase information. Acta Cryst. B26, 136–143.
Hendrickson, W. A. & Ogata, C. M. (1997). Phase determination from multiwavelength anomalous diffraction measurements. Methods Enzymol. 276, 494–523.
Hendrickson, W. A., Smith, J. L., Phizackerley, R. P. & Merritt, E. A. (1988). Crystallographic structure analysis of lamprey hemoglobin from anomalous dispersion of synchrotron radiation. Proteins Struct. Funct. Genet. 4, 77–88.
Hendrickson, W. A. & Teeter, M. M. (1981). Structure of the hydrophobic protein crambin determined directly from the anomalous scattering of sulphur. Nature (London), 290, 107–113.
Hoppe, W. & Jakubowski, U. (1975). The determination of phases of erythrocruorin using the two-wavelength method with iron as anomalous scatterer. In Anomalous scattering, edited by S. Ramaseshan & S. C. Abrahams, 3–11. Copenhagen: Munksgaard.
James, R. W. (1948). The optical principles of the diffraction of X-rays. Reprinted (1982) Ox Bow Press, Woodbridge, CT.
Karle, J. (1980). Some developments in anomalous dispersion for the structural investigation of macromolecular systems in biology. Int. J. Quantum Chem. Quantum Biol. Symp. 7, 357–367.
Krahn, J. M., Sinha, S. & Smith, J. L. (1999). Successes and prospects for SeMet MAD and large structures. Trans. Am. Crystallogr. Assoc. 35, 27–38.
La Fortelle, E. de & Bricogne, G. (1997). Maximum-likelihood heavy-atom parameter refinement for multiple isomorphous replacement and multiwavelength anomalous diffraction methods. Methods Enzymol. 276, 472–494.
Lustbader, J. W., Wu, H., Birken, S., Pollak, S., Kolks-Gawinowicz, M. A., Pound, A. M., Austen, D., Hendrickson, W. A. & Canfield, R. E. (1995). The expression, characterization and crystallization of wild-type and selenomethionyl human chorionic gonadotropin. Endocrinology, 136, 640–650.
Matthews, B. W. (1966a). The extension of the isomorphous replacement method to include anomalous scattering measurements. Acta Cryst. 20, 82–86.
Matthews, B. W. (1966b). The determination of the position of anomalously scattering heavy atom groups in protein crystals. Acta Cryst. 20, 230–239.
Matthews, B. W. & Czerwinski, E. W. (1975). Local scaling: a method to reduce systematic errors in isomorphous replacement and anomalous scattering measurements. Acta Cryst. A31, 480–487.
Miller, R., Gallo, S. M., Khalak, H. G. & Weeks, C. M. (1994). SnB: crystal structure determination via shake-and-bake. J. Appl. Cryst. 27, 613–621.
North, A. C. T. (1965). The combination of isomorphous replacement and anomalous scattering data in phase determination of non-centrosymmetric reflexions. Acta Cryst. 18, 212–216.
Okaya, Y. & Pepinsky, R. (1956). New formulation and solution of the phase problem in X-ray analysis of noncentric crystals containing anomalous scatterers. Phys. Rev. 103, 1645–1647.
Pähler, A., Smith, J. L. & Hendrickson, W. A. (1990). A probability representation for phase information from multiwavelength anomalous dispersion. Acta Cryst. A46, 537–540.
Podjarny, A. D., Bhat, T. N. & Zwick, M. (1987). Improving crystallographic macromolecular images: the real-space approach. Annu. Rev. Biophys. Biophys. Chem. 16, 351–373.
Ramakrishnan, V. & Biou, V. (1997). Treatment of multiwavelength anomalous diffraction data as a special case of multiple isomorphous replacement. Methods Enzymol. 276, 538–557.
Rossmann, M. G. (1961). The position of anomalous scatterers in protein crystals. Acta Cryst. 14, 383–388.
Sharff, A. J., Koronakis, E., Luisi, B. & Koronakis, V. (2000). Oxidation of selenomethionine: some MADness in the method! Acta Cryst. D56, 785–788.
Sheldrick, G. M. (1990). Phase annealing in SHELX-90: direct methods for larger structures. Acta Cryst. A46, 467–473.
Smith, J. L. (1998). Multiwavelength anomalous diffraction in macromolecular crystallography. In Direct methods for solving macromolecular structures, edited by S. Fortier, pp. 221–225. The Netherlands: CCLRC.
Smith, J. L. & Thompson, A. (1998). Reactivity of selenomethionine – dents in the magic bullet? Structure, 15, 815–819.
Templeton, L. K. & Templeton, D. H. (1988). Biaxial tensors for anomalous scattering of X-rays in selenolanthionine. Acta Cryst. A44, 1045–1051.
Templeton, L. K., Templeton, D. H., Phizackerley, R. P. & Hodgson, K. O. (1982). L3-edge anomalous scattering by gadolinium and samarium measured at high resolution with synchrotron radiation. Acta Cryst. A38, 74–78.
Terwilliger, T. C. (1994a). MAD phasing: Bayesian estimates of F_A. Acta Cryst. D50, 11–16.
Terwilliger, T. C. (1994b). MAD phasing: treatment of dispersive differences as isomorphous replacement information. Acta Cryst. D50, 17–23.
Terwilliger, T. C. (1997). Multiwavelength anomalous diffraction phasing of macromolecular structures: analysis of MAD data as single isomorphous replacement with anomalous scattering data using the MADMRG program. Methods Enzymol. 276, 530–537.
Terwilliger, T. C. & Berendzen, J. (1996). Correlated phasing of multiple isomorphous replacement data. Acta Cryst. D52, 749–757.
Terwilliger, T. C. & Berendzen, J. (1997). Bayesian correlated MAD phasing. Acta Cryst. D53, 571–579.
Terwilliger, T. C. & Berendzen, J. (1999a). Discrimination of solvent from protein regions in native Fouriers as a means of evaluating heavy-atom solutions in the MIR and MAD methods. Acta Cryst. D55, 501–505.
Terwilliger, T. C. & Berendzen, J. (1999b). Automated MIR and MAD structure solution. Acta Cryst. D55, 849–861.
Terwilliger, T. C. & Berendzen, J. (1999c). Evaluation of macromolecular electron-density map quality using the correlation of local r.m.s. density. Acta Cryst. D55, 1872–1877.
Terwilliger, T. C. & Eisenberg, D. (1983). Unbiased three-dimensional refinement of heavy-atom parameters by correlation of origin-removed Patterson functions. Acta Cryst. A39, 813–817.
Terwilliger, T. C. & Eisenberg, D. (1987). Isomorphous replacement: effects of errors on the phase probability distribution. Acta Cryst. A43, 6–13.
Terwilliger, T. C., Kim, S.-H. & Eisenberg, D. (1987). Generalized method of determining heavy-atom positions using the difference Patterson function. Acta Cryst. A43, 1–5.
Tesmer, J. J. G., Klem, T. J., Deras, M. L., Davisson, V. J. & Smith, J. L. (1996). The crystal structure of GMP synthetase reveals a novel catalytic triad and is a structural paradigm for two enzyme families. Nature Struct. Biol. 3, 74–86.
Vagin, A. & Teplyakov, A. (1998). A translation-function approach for heavy-atom location in macromolecular crystallography. Acta Cryst. D54, 400–402.
Wang, B.-C. (1985). Resolution of phase ambiguity in macromolecular crystallography. Methods Enzymol. 115, 90–112.
Weis, W. I., Kahn, R., Fourme, R., Drickamer, K. & Hendrickson, W. A. (1991). Structure of the calcium-dependent lectin domain from a rat mannose-binding protein determined by MAD phasing. Science, 254, 1608–1615.
Wilson, A. J. C. (1942). Determination of absolute from relative X-ray intensity data. Nature (London), 150, 151–152.
Xiang, S., Carter, C. W. Jr, Bricogne, G. & Gilmore, C. J. (1993). Entropy maximization constrained by solvent flatness: a new method for macromolecular phase extension and map improvement. Acta Cryst. D49, 193–212.
Zhang, K. Y. J. & Main, P. (1990). The use of Sayre's equation with solvent flattening and histogram matching for phase extension and refinement of protein structures. Acta Cryst. A46, 377–381.
