International Tables for Crystallography, Volume B: Reciprocal space. Edited by U. Shmueli. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. B, ch. 2.2, pp. 231-234
## Section 2.2.10. Direct methods in macromolecular crystallography

Protein structures cannot be solved *ab initio* by traditional direct methods (*i.e.*, by application of the tangent formula alone). Accordingly, the first applications were focused on two tasks:

The application of standard tangent techniques to (*a*) and (*b*) has not been found to be very satisfactory (Coulter & Dewar, 1971; Hendrickson *et al.*, 1973; Weinzierl *et al.*, 1969). Tangent methods, in fact, require atomicity and non-negativity of the electron density. Neither property is satisfied if the data do not extend to atomic resolution. Because of series termination and other errors, the electron-density map at non-atomic resolution presents large negative regions, which will appear as false peaks in the squared structure. However, tangent methods use only a part of the information given by the Sayre equation (2.2.6.5). In fact, (2.2.6.5) expresses two equations, relating the radial and the angular parts of its two sides, and so provides a large degree of overdetermination of the phases. To exploit this, Sayre (1972) [see also Sayre & Toupin (1975)] suggested minimizing, by least squares and as a function of the phases, the residual between the two sides of the Sayre equation [equation (2.2.10.1)]. Even if tests on rubredoxin (extension of phases from 2.5 to 1.5 Å resolution) and insulin (Cutfield *et al.*, 1975) (from 1.9 to 1.5 Å resolution) were successful, the limitations of the method are its high cost and, especially, the higher efficiency of the least-squares method. Equivalent considerations hold for the application of determinantal methods to proteins [see Podjarny *et al.* (1981); de Rango *et al.* (1985) and literature cited therein].
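The idea of a phase-dependent Sayre residual can be illustrated on a toy problem. The sketch below is not the formulation of Sayre (1972): it assumes a residual of the form $\sum_\mathbf{h} |a F_\mathbf{h} - \sum_\mathbf{k} F_\mathbf{k} F_{\mathbf{h}-\mathbf{k}}|^2$ with a single constant scale factor `a`, and a hypothetical one-dimensional structure of equal point atoms on a grid, for which Sayre's equation holds exactly. All names (`dft`, `sayre_residual`, the atom positions) are illustrative.

```python
import cmath
import math

def dft(values):
    """Forward discrete Fourier transform: F_h = sum_x rho_x exp(-2 pi i h x / n)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * h * x / n) for x, v in enumerate(values))
            for h in range(n)]

def sayre_residual(F, a):
    """Sum over h of |a F_h - sum_k F_k F_{h-k}|^2, with indices taken modulo n."""
    n = len(F)
    total = 0.0
    for h in range(n):
        conv = sum(F[k] * F[(h - k) % n] for k in range(n))
        total += abs(a * F[h] - conv) ** 2
    return total

# Hypothetical toy structure: four equal point atoms of unit height on a 16-point grid.
n = 16
rho = [0.0] * n
for x in (1, 4, 6, 11):
    rho[x] = 1.0

F_true = dft(rho)
# For this density rho**2 == rho, so the self-convolution of F equals n * F_h.
a = float(n)

# Same moduli, but the phase of one reflection (h = 1) is shifted by 1 radian.
F_wrong = list(F_true)
F_wrong[1] = abs(F_true[1]) * cmath.exp(1j * (cmath.phase(F_true[1]) + 1.0))

res_true = sayre_residual(F_true, a)
res_wrong = sayre_residual(F_wrong, a)
print("residual with true phases:     ", res_true)
print("residual with one wrong phase: ", res_wrong)
```

With the true phases the residual vanishes to rounding error, while a single perturbed phase raises it. At protein resolution the atoms are neither equal nor resolved, so the residual of the true structure is no longer zero, which is one way to see why the approach is harder there.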

A question now arises: why is the tangent formula unable to solve protein structures? Fan *et al.* (1991) considered the question from a first-principles approach and concluded that:

Sheldrick (1990) suggested that direct methods are not expected to succeed if fewer than half of the reflections in the range 1.1–1.2 Å are observed with $|F| > 4\sigma(|F|)$ (a condition seldom satisfied by protein data).

The most complete analysis of the problem has been made by Giacovazzo, Guagliardi *et al*. (1994). They observed that the expected value of α (see Section 2.2.7) suggested by the tangent formula for proteins is comparable with the variance of the α parameter. In other words, for proteins the signal determining the phase is comparable with the noise, and therefore the phase indication is expected to be unreliable.
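A rough feeling for why the signal weakens with structure size comes from the well-known Cochran-type reliability of a single triplet, $G = 2N^{-1/2}|E_\mathbf{h}E_\mathbf{k}E_{\mathbf{h}-\mathbf{k}}|$ for $N$ equal atoms in the cell. The sketch below only tabulates this scaling; it is not the analysis of α and its variance made by Giacovazzo, Guagliardi *et al.* (1994), and the function name and the sample values are illustrative assumptions.

```python
import math

def triplet_reliability(e_h, e_k, e_hk, n_atoms):
    """Cochran-type reliability G = 2 |E_h E_k E_{h-k}| / sqrt(N) for N equal atoms."""
    return 2.0 * abs(e_h * e_k * e_hk) / math.sqrt(n_atoms)

# The same (strong) normalized amplitudes in cells of increasing size.
for n_atoms in (50, 500, 5000):
    g = triplet_reliability(2.0, 2.0, 2.0, n_atoms)
    print(f"N = {n_atoms:5d}  G = {g:.3f}")
```

For a fixed triple of amplitudes, G falls as $N^{-1/2}$, so individual triplet indications for a protein-sized cell carry little information.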

Section 2.2.10.1 suggests that the mere use of the tangent formula or the Sayre equation cannot solve *ab initio* protein structures of usual size. However, even in an *ab initio* situation, there is a source of supplementary information which may be used. Good examples are the 'peaklist optimization' procedure (Sheldrick & Gould, 1995) and the *SIR*97 procedure (Altomare *et al.*, 1999) for refining and completing the trial structure offered by the first *E* map.

In both cases there are reasons to suspect that the correct structure is sometimes extracted from a totally incorrect direct-methods solution. These results suggest that a direct-space procedure can provide some form of structural information complementary to that used in reciprocal space by the tangent or similar formulae. The combination of real- and reciprocal-space techniques could therefore enlarge the size of crystal structures solvable by direct methods. The first program to explicitly propose the combined use of direct and reciprocal space was *Shake and Bake* (*SnB*), which inspired a second package, *half-bake* (*HB*). A third program, *SIR*99, uses a different algorithm.

The *SnB* method (DeTitta *et al.*, 1994; Weeks *et al.*, 1994; Hauptman, 1995) is the heir of the *cosine least-squares method* described in Section 2.2.8, point (4). A residual function $R(\Phi)$ of the triplet phases $\Phi$ (the *minimal function*) is expected to have a global minimum, provided the number of phases involved is sufficiently large, when all the phases are equal to their true values for some choice of origin and enantiomorph. Thus the phasing problem reduces to that of finding the global minimum of $R(\Phi)$ (the *minimum principle*).
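A minimal sketch of a function of this kind follows. It assumes the triplet-only form $R = \sum_j A_j[\cos\Phi_j - I_1(A_j)/I_0(A_j)]^2 / \sum_j A_j$, in which each triplet cosine is compared with its expected value; the weights `A_j`, the reflection labels and the helper names are illustrative, and quartet terms used in some formulations are omitted.

```python
import math

def bessel_ratio(a, steps=2000):
    """I1(a)/I0(a), using I_n(a) = (1/2pi) * integral of exp(a cos t) cos(n t) over a period."""
    thetas = [2.0 * math.pi * i / steps for i in range(steps)]
    i0 = sum(math.exp(a * math.cos(t)) for t in thetas) / steps
    i1 = sum(math.exp(a * math.cos(t)) * math.cos(t) for t in thetas) / steps
    return i1 / i0

def minimal_function(phases, triplets):
    """Weighted mean-square misfit between triplet cosines and their expected values.

    phases:   dict mapping a reflection label to its phase in radians
    triplets: list of (h, k, l, weight), the invariant being phi_h + phi_k + phi_l
    """
    num = 0.0
    den = 0.0
    for h, k, l, a in triplets:
        phi = phases[h] + phases[k] + phases[l]
        num += a * (math.cos(phi) - bessel_ratio(a)) ** 2
        den += a
    return num / den

# Hypothetical single triplet with weight 2.0.
triplets = [("h", "k", "l", 2.0)]
target = math.acos(bessel_ratio(2.0))

r_matched = minimal_function({"h": target, "k": 0.0, "l": 0.0}, triplets)
r_zero = minimal_function({"h": 0.0, "k": 0.0, "l": 0.0}, triplets)
print(r_matched, r_zero)
```

The function is zero when every triplet cosine equals its expected value and grows as the cosines depart from it; note that setting all triplet phases to zero does not minimize it, in contrast to the behaviour of an unconstrained tangent refinement.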

*SnB* comprises a *shake* step (phase refinement) and a *bake* step (electron-density modification), the second step aiming to impose phase constraints implicit in real space. Accordingly, the program requires two Fourier transforms per cycle, and numerous cycles. Thus it may be very time-consuming and it is not competitive with other direct methods for the solution of the crystal structures of small molecules. However, it demonstrated the great value of intensive computation for the direct solution of complex crystal structures.

*HB*, due to Sheldrick (1997), does most of its work in direct space. Random atomic positions are generated, to which a modified *peaklist optimization* process is applied. A number of peaks are eliminated subject to the condition that a figure of merit based on the normalized structure-factor amplitudes remains as large as possible (only reflections with large normalized amplitudes are involved). The phases of a suitable subset of reflections are then used as input for a tangent expansion. Then an *E* map is calculated from which peaks are selected: these are submitted to the elimination procedure.

Typically 5–20 cycles of this internal loop are performed. Then a correlation coefficient (CC) between the observed and calculated normalized structure-factor amplitudes is calculated for all the data. If the CC is good (*i.e.* larger than a given threshold), then a new loop is performed: a new *E* map is obtained, from which a list of peaks is selected for submission to the elimination procedure. The criterion now is the value of the CC, which is calculated for all the reflections. Typically two to five cycles of this external loop are performed.
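The acceptance test described above can be sketched as follows. This assumes a plain (unweighted) Pearson correlation between observed and calculated amplitudes; the program itself may use a weighted variant, and the threshold value shown is purely hypothetical.

```python
import math

def correlation_coefficient(e_obs, e_calc):
    """Unweighted Pearson correlation coefficient between two amplitude lists."""
    n = len(e_obs)
    mean_o = sum(e_obs) / n
    mean_c = sum(e_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(e_obs, e_calc))
    var_o = sum((o - mean_o) ** 2 for o in e_obs)
    var_c = sum((c - mean_c) ** 2 for c in e_calc)
    return cov / math.sqrt(var_o * var_c)

# Illustrative amplitudes only; not data from any real structure.
e_obs = [2.1, 0.4, 1.3, 0.9, 1.8, 0.2]
e_calc = [1.9, 0.6, 1.1, 1.0, 1.6, 0.5]

CC_THRESHOLD = 0.5  # hypothetical value, chosen for the example
cc = correlation_coefficient(e_obs, e_calc)
print("CC =", round(cc, 3), "-> continue with external loop:", cc > CC_THRESHOLD)
```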

The program works indefinitely, restarting from random atoms until interrupted. It may work either by applying the true space-group symmetry or after having expanded the data to *P*1.

The *SIR*99 procedure (Burla *et al.*, 1999) may be divided into two distinct parts: the tangent section (*i.e.*, a double tangent process using triplet and quartet invariants) is followed by a real-space refinement procedure. As in *SIR*97, the reciprocal-space part is followed by real-space refinement, but here the real-space part is much more complex. It involves three different techniques: EDM (an electron-density modification process), the HAFR part (in which all the peaks are associated with the heaviest atomic species) and the DLSQ procedure (a least-squares Fourier refinement process). The atomicity is gradually introduced into the procedure. The entire process requires, for each trial, several cycles of EDM and HAFR: the real-space part is able to lead to the correct solution even when the tangent formula does not provide favourable phase values.

The modulus of the isomorphous difference, $|\Delta F| = \big||F_{PH}| - |F_P|\big|$, may be taken, to a first approximation, as an estimate of the heavy-atom structure-factor modulus $|F_H|$. Normalization of the $|\Delta F|$'s and application of the tangent formula may reveal the heavy-atom structure (Wilson, 1978).
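The first step of this procedure can be sketched as below. The example assumes that the native and derivative amplitudes are already on a common scale, and it normalizes within a single shell so that the mean of $|E|^2$ is 1; a real normalization would work in resolution shells and include the statistical weights of the reflections. The amplitude values and function names are illustrative.

```python
import math

def isomorphous_differences(f_ph, f_p):
    """| |F_PH| - |F_P| | for each reflection (amplitudes assumed on a common scale)."""
    return [abs(a - b) for a, b in zip(f_ph, f_p)]

def normalize(delta_f):
    """Scale the differences so that the mean of |E|^2 is 1 (single-shell assumption)."""
    mean_sq = sum(d * d for d in delta_f) / len(delta_f)
    scale = math.sqrt(mean_sq)
    return [d / scale for d in delta_f]

# Illustrative amplitudes only.
f_p = [120.0, 85.0, 40.0, 210.0, 66.0]
f_ph = [131.0, 80.0, 52.0, 204.0, 67.5]

e_delta = normalize(isomorphous_differences(f_ph, f_p))
print([round(e, 3) for e in e_delta])
```

The largest normalized differences would then be the ones passed to a tangent-formula program to look for the heavy-atom positions.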

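The theoretical basis for integrating the techniques of direct methods and isomorphous replacement was introduced by Hauptman (1982*a*). In his notation, let $f_j$ and $g_j$ denote the atomic scattering factors for the atom labelled *j* in a pair of isomorphous structures, and let $E_\mathbf{h}$ and $G_\mathbf{h}$ denote the corresponding normalized structure factors. Then

$$E_\mathbf{h} = |E_\mathbf{h}|\exp(i\varphi_\mathbf{h}) = \alpha_{20}^{-1/2}\sum_{j=1}^{N} f_j \exp(2\pi i\,\mathbf{h}\cdot\mathbf{r}_j), \qquad G_\mathbf{h} = |G_\mathbf{h}|\exp(i\psi_\mathbf{h}) = \alpha_{02}^{-1/2}\sum_{j=1}^{N} g_j \exp(2\pi i\,\mathbf{h}\cdot\mathbf{r}_j),$$

where

$$\alpha_{mn} = \sum_{j=1}^{N} f_j^{m} g_j^{n}.$$

The conditional probability of the two-phase structure invariant $\Phi = \varphi_\mathbf{h} - \psi_\mathbf{h}$, given $|E_\mathbf{h}|$ and $|G_\mathbf{h}|$, is (Hauptman, 1982*a*)

$$P(\Phi \mid |E_\mathbf{h}|, |G_\mathbf{h}|) \simeq [2\pi I_0(Q)]^{-1}\exp(Q\cos\Phi),$$

where

$$Q = |E_\mathbf{h} G_\mathbf{h}|\,\frac{2\alpha}{1-\alpha^{2}}, \qquad \alpha = \frac{\alpha_{11}}{\alpha_{20}^{1/2}\alpha_{02}^{1/2}}.$$

Three-phase structure invariants were evaluated by considering that eight invariants exist for a given triple of indices **h**, **k**, **l** (with $\mathbf{h}+\mathbf{k}+\mathbf{l}=0$):

$$\begin{aligned}
\Phi_1 &= \varphi_\mathbf{h}+\varphi_\mathbf{k}+\varphi_\mathbf{l}, & \Phi_2 &= \varphi_\mathbf{h}+\varphi_\mathbf{k}+\psi_\mathbf{l},\\
\Phi_3 &= \varphi_\mathbf{h}+\psi_\mathbf{k}+\varphi_\mathbf{l}, & \Phi_4 &= \psi_\mathbf{h}+\varphi_\mathbf{k}+\varphi_\mathbf{l},\\
\Phi_5 &= \varphi_\mathbf{h}+\psi_\mathbf{k}+\psi_\mathbf{l}, & \Phi_6 &= \psi_\mathbf{h}+\varphi_\mathbf{k}+\psi_\mathbf{l},\\
\Phi_7 &= \psi_\mathbf{h}+\psi_\mathbf{k}+\varphi_\mathbf{l}, & \Phi_8 &= \psi_\mathbf{h}+\psi_\mathbf{k}+\psi_\mathbf{l}.
\end{aligned}$$

So, for the estimation of any $\Phi_j$, the joint probability distribution $P(E_\mathbf{h}, E_\mathbf{k}, E_\mathbf{l}, G_\mathbf{h}, G_\mathbf{k}, G_\mathbf{l})$ has to be studied, from which eight conditional probability densities can be obtained:

$$P(\Phi_j \mid |E_\mathbf{h}|, |E_\mathbf{k}|, |E_\mathbf{l}|, |G_\mathbf{h}|, |G_\mathbf{k}|, |G_\mathbf{l}|) \simeq [2\pi I_0(Q_j)]^{-1}\exp(Q_j\cos\Phi_j)$$

for $j = 1, \ldots, 8$.

The analytical expressions of the $Q_j$ are too intricate and are not given here (the reader is referred to the original paper). We note only that $Q_j$ may be positive or negative, so that reliable triplet phase estimates near 0 or near π are possible: the larger $|Q_j|$, the more reliable the phase estimate.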
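The role of the sign and size of the reliability parameter can be illustrated numerically. The sketch assumes that the conditional density has the von Mises form $[2\pi I_0(Q)]^{-1}\exp(Q\cos\Phi)$ usual in direct methods; the values of `q` are arbitrary, and no attempt is made to evaluate Hauptman's actual expressions for the parameter.

```python
import math

def bessel_i0(q, steps=2000):
    """Modified Bessel function I0(q) as the mean of exp(q cos t) over one period."""
    return sum(math.exp(q * math.cos(2.0 * math.pi * i / steps)) for i in range(steps)) / steps

def invariant_density(phi, q):
    """Von Mises density exp(q cos phi) / (2 pi I0(q)); q may be positive or negative."""
    return math.exp(q * math.cos(phi)) / (2.0 * math.pi * bessel_i0(q))

def most_probable_value(q):
    """Mode of the density: 0 for positive q, pi for negative q."""
    return 0.0 if q >= 0 else math.pi

for q in (-3.0, -0.5, 0.5, 3.0):
    peak = invariant_density(most_probable_value(q), q)
    print(f"q = {q:+.1f}  estimate = {most_probable_value(q):.3f} rad  peak density = {peak:.3f}")
```

A positive parameter favours an invariant near 0, a negative one favours π, and the peak sharpens as the absolute value grows.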

A useful interpretation of the formulae in terms of experimental parameters was suggested by Fortier *et al.* (1984): according to them, the distributions do not depend, as in the case of the traditional three-phase invariants, on the total number of atoms per unit cell but rather on the scattering difference between the native protein and the derivative (that is, on the scattering of the heavy atoms in the derivative).

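Hauptman's formulae were generalized by Giacovazzo *et al.* (1988): the new expressions take into account the effects of resolution on the distribution parameters. The formulae are completely general and include as special cases native protein and heavy-atom isomorphous derivatives as well as X-ray and neutron diffraction data. Their complicated algebraic forms reduce to a simple expression in the case of a native protein and a heavy-atom derivative: in particular, the reliability parameter for the protein triplet invariant $\Phi = \varphi_\mathbf{h} + \varphi_\mathbf{k} - \varphi_{\mathbf{h}+\mathbf{k}}$ is

$$A = 2\left[\sigma_3/\sigma_2^{3/2}\right]_P |E_\mathbf{h} E_\mathbf{k} E_{\mathbf{h}+\mathbf{k}}| + 2\left[\sigma_3/\sigma_2^{3/2}\right]_H \Delta_\mathbf{h}\Delta_\mathbf{k}\Delta_{\mathbf{h}+\mathbf{k}}, \qquad (2.2.10.2)$$

where the indices *P* and *H* indicate that the parameters have to be calculated over protein atoms and over heavy atoms, respectively, the $E$'s are normalized structure factors of the native protein, and $\Delta = (|F_d| - |F_p|)/(\sum f_j^2)_H^{1/2}$ is a pseudo-normalized difference (with respect to the heavy-atom structure) between the moduli of the derivative and native structure factors.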

Equation (2.2.10.2) may be compared with Karle's (1983) qualitative rule: if the sign of the triple product of structure-factor differences, $\Delta F_\mathbf{h}\,\Delta F_\mathbf{k}\,\Delta F_{\mathbf{h}+\mathbf{k}}$, is plus then the value of the triplet invariant $\Phi$ is estimated to be zero; if its sign is minus then the expected value of $\Phi$ is close to π. In practice Karle's rule agrees with (2.2.10.2) only if the Cochran-type term in (2.2.10.2) may be neglected. Furthermore, (2.2.10.2) shows that large reliability values do not depend on the triple product of structure-factor differences, but on the triple product of pseudo-normalized differences. A series of papers (Giacovazzo, Siliqi & Ralph, 1994; Giacovazzo, Siliqi & Spagna, 1994; Giacovazzo, Siliqi & Platas, 1995; Giacovazzo, Siliqi & Zanotti, 1995; Giacovazzo *et al.*, 1996) shows how equation (2.2.10.2) may be implemented in a direct procedure which proved able to estimate the protein phases correctly without any preliminary information on the heavy-atom substructure.
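Karle's qualitative rule is simple enough to state as code. The sketch assumes that each difference is the signed quantity $|F_{PH}| - |F_P|$ for the reflection concerned; the function name and the numerical values are illustrative, and no reliability weighting is attempted.

```python
import math

def karle_estimate(df_h, df_k, df_hk):
    """Estimate a triplet invariant from the sign of the triple product of differences.

    Returns 0.0 for a positive product, pi for a negative product and None when
    the product is zero (no indication).
    """
    product = df_h * df_k * df_hk
    if product > 0:
        return 0.0
    if product < 0:
        return math.pi
    return None

# Illustrative signed differences |F_PH| - |F_P| for three related reflections.
print(karle_estimate(11.0, 5.0, 6.5))    # positive product
print(karle_estimate(11.0, -5.0, 6.5))   # negative product
```

Because only the sign is used, a triple of small, poorly measured differences gives the same answer as a triple of large ones; the pseudo-normalized product in (2.2.10.2) is what distinguishes the two situations.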

Combination of direct methods with the two-derivative case is also possible (Fortier *et al.*, 1984) and leads to more accurate estimates of triplet invariants provided experimental data are of sufficient accuracy.

If the frequency of the radiation is close to an absorption edge of an atom, then that atom will scatter the X-rays anomalously (see Chapter 2.4) according to $f = f_0 + \Delta f' + i f''$, where $f_0$ is the normal scattering factor and $\Delta f'$ and $f''$ are the real and imaginary dispersion corrections. This results in the breakdown of Friedel's law. It was soon realized that the Bijvoet difference could also be used in the determination of phases (Peerdeman & Bijvoet, 1956; Ramachandran & Raman, 1956; Okaya & Pepinsky, 1956). Since then, a great deal of work has been done both from algebraic (see Chapter 2.4) and from probabilistic points of view. In this section we are only interested in the second.
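The breakdown of Friedel's law can be checked on a toy model. The sketch below uses a hypothetical one-dimensional, non-centrosymmetric arrangement of three atoms; the coordinates and scattering factors are arbitrary illustrative numbers, and the complex scattering factor of the third atom stands for $f_0 + \Delta f' + i f''$.

```python
import cmath
import math

def structure_factor(h, atoms):
    """F(h) = sum_j f_j exp(2 pi i h x_j) for atoms given as (x, f) pairs."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

def bijvoet_difference(h, atoms):
    """|F(h)| - |F(-h)|; zero whenever Friedel's law holds."""
    return abs(structure_factor(h, atoms)) - abs(structure_factor(-h, atoms))

# Three atoms, all with real scattering factors (no anomalous scattering).
normal_atoms = [(0.10, 6.0 + 0.0j), (0.37, 6.0 + 0.0j), (0.62, 8.0 + 0.0j)]
# Same positions; the third atom now scatters anomalously: f0 = 8, delta f' = -1, f'' = 3.
anomalous_atoms = [(0.10, 6.0 + 0.0j), (0.37, 6.0 + 0.0j), (0.62, 8.0 - 1.0 + 3.0j)]

print("no anomalous scatterer:  ", bijvoet_difference(1, normal_atoms))
print("one anomalous scatterer: ", bijvoet_difference(1, anomalous_atoms))
```

With real scattering factors the two moduli coincide; once one atom acquires an imaginary component they differ, and that difference is the signal exploited by the methods discussed in this section.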

We will mention the following different cases:

Probability distributions of diffraction intensities and of selected functions of diffraction intensities for dispersive structures have been given by various authors [Parthasarathy & Srinivasan (1964), see also Srinivasan & Parthasarathy (1976) and relevant literature cited therein]. We describe here some probabilistic formulae for estimating invariants of low order.

Let us now describe some practical aspects of the integration of direct methods with OAS techniques.

Anomalous difference structure factors can be used for locating the positions of the anomalous scatterers (Mukherjee *et al.*, 1989). Tests prove that accuracy in the difference magnitudes is critical for the success of the phasing process.

Suppose now that the positions of the heavy atoms have been found. How do we estimate the phase values for the protein? The phase ambiguity inherent in OAS techniques can be overcome by different methods: we quote the $P_s$-function method of Hao & Woolfson (1989), the Wilson-distribution method and the MPS method of Ralph & Woolfson (1991), and the Bijvoet–Ramachandran–Raman method of Peerdeman & Bijvoet (1956), Raman (1959) and Moncrief & Lipscomb (1966). More recently, a probabilistic method by Fan & Gu (1985) provided additional insight into the problem.

Isomorphous replacement and anomalous scattering are discussed in Chapter 2.4 and in *IT* F (2001). We observe here only that the SIRAS case can lead algebraically to unambiguous phase determination provided the experimental data are sufficiently good. Thus, any probabilistic treatment must take into consideration errors in the measurements.

In the MIRAS and MAD cases the system is overconditioned: again any probabilistic treatment must consider errors in the measurements, but now overconditioning allows the reduction of the perverse effects of the experimental errors and (in MIRAS) of the lack of isomorphism.

A particular application of extreme relevance concerns the location of anomalous scatterers when selenomethionine-substituted proteins and MAD data are available (Hendrickson & Ogata, 1997; Smith, 1998). In this case, many selenium sites should be identified and usual Patterson-interpretation methods can be expected to fail. The successes of *SnB* and *HB* prove the essential role of direct methods in this important area.

### References

*International Tables for Crystallography* (2001). Vol. F. *Macromolecular crystallography*, edited by M. G. Rossmann & E. Arnold. Dordrecht: Kluwer Academic Publishers.

Altomare, A., Burla, M. C., Camalli, M., Cascarano, G. L., Giacovazzo, C., Guagliardi, A., Moliterni, A. G. G., Polidori, G. & Spagna, R. (1999). *SIR97: a new tool for crystal structure determination and refinement.* *J. Appl. Cryst.* **32**, 115–119.

Burla, M. C., Camalli, M., Carrozzini, B., Cascarano, G. L., Giacovazzo, C., Polidori, G. & Spagna, R. (1999). *SIR99, a program for the automatic solution of small and large crystal structures.* *Acta Cryst.* A**55**, 991–999.

Cascarano, G. & Giacovazzo, C. (1985). *One-wavelength technique: some probabilistic formulas using the anomalous dispersion effect.* *Acta Cryst.* A**41**, 408–413.

Coulter, C. L. & Dewar, R. B. K. (1971). *Tangent formula applications in protein crystallography: an evaluation.* *Acta Cryst.* B**27**, 1730–1740.

Cutfield, J. F., Dodson, E. J., Dodson, G. G., Hodgkin, D. C., Isaacs, N. W., Sakabe, K. & Sakabe, N. (1975). *The high resolution structure of insulin: a comparison of results obtained from least-squares phase refinement and difference Fourier refinement.* *Acta Cryst.* A**31**, S21.

DeTitta, G. T., Weeks, C. M., Thuman, P., Miller, R. & Hauptman, H. A. (1994). *Structure solution by minimal-function phase refinement and Fourier filtering. I. Theoretical basis.* *Acta Cryst.* A**50**, 203–210.

Fan, H. F., Hao, Q. & Woolfson, M. M. (1991). *Proteins and direct methods.* *Z. Kristallogr.* **197**, 197–208.

Fan, H.-F. & Gu, Y.-X. (1985). *Combining direct methods with isomorphous replacement or anomalous scattering data. III. The incorporation of partial structure information.* *Acta Cryst.* A**41**, 280–284.

Fortier, S., Weeks, C. M. & Hauptman, H. (1984). *On integrating the techniques of direct methods and isomorphous replacement. III. The three-phase invariant for the native and two-derivative case.* *Acta Cryst.* A**40**, 646–651.

Giacovazzo, C. (1983*b*). *The estimation of two-phase invariants in* *when anomalous scatterers are present.* *Acta Cryst.* A**39**, 585–592.

Giacovazzo, C. (1987). *One wavelength technique: estimation of centrosymmetrical two-phase invariants in dispersive structures.* *Acta Cryst.* A**43**, 73–75.

Giacovazzo, C., Cascarano, G. & Zheng, C.-D. (1988). *On integrating the techniques of direct methods and isomorphous replacement. A new probabilistic formula for triplet invariants.* *Acta Cryst.* A**44**, 45–51.

Giacovazzo, C., Guagliardi, A., Ravelli, R. & Siliqi, D. (1994). *Ab initio direct phasing of proteins: the limits.* *Z. Kristallogr.* **209**, 136–142.

Giacovazzo, C., Siliqi, D. & Platas, J. G. (1995). *The ab initio crystal structure solution of proteins by direct methods. V. A new normalizing procedure.* *Acta Cryst.* A**51**, 811–820.

Giacovazzo, C., Siliqi, D., Platas, J. G., Hecht, H.-J., Zanotti, G. & York, B. (1996). *The ab initio crystal structure solution of proteins by direct methods. VI. Complete phasing up to derivative resolution.* *Acta Cryst.* D**52**, 813–825.

Giacovazzo, C., Siliqi, D. & Ralph, A. (1994). *The ab initio crystal structure solution of proteins by direct methods. I. Feasibility.* *Acta Cryst.* A**50**, 503–510.

Giacovazzo, C., Siliqi, D. & Spagna, R. (1994). *The ab initio crystal structure solution of proteins by direct methods. II. The procedure and its first applications.* *Acta Cryst.* A**50**, 609–621.

Giacovazzo, C., Siliqi, D. & Zanotti, G. (1995). *The ab initio crystal structure solution of proteins by direct methods. III. The phase extension process.* *Acta Cryst.* A**51**, 177–188.

Hao, Q. & Woolfson, M. M. (1989). *Application of the $P_s$-function method to macromolecular structure determination.* *Acta Cryst.* A**45**, 794–797.

Hauptman, H. (1982*a*). *On integrating the techniques of direct methods and isomorphous replacement. I. The theoretical basis.* *Acta Cryst.* A**38**, 289–294.

Hauptman, H. (1982*b*). *On integrating the techniques of direct methods with anomalous dispersion. I. The theoretical basis.* *Acta Cryst.* A**38**, 632–641.

Hauptman, H. (1995). *Looking ahead.* *Acta Cryst.* B**51**, 416–422.

Heinermann, J. J. L., Krabbendam, H., Kroon, J. & Spek, A. L. (1978). *Direct phase determination of triple products from Bijvoet inequalities. II. A probabilistic approach.* *Acta Cryst.* A**34**, 447–450.

Hendrickson, W. A., Love, W. E. & Karle, J. (1973). *Crystal structure analysis of sea lamprey hemoglobin at 2 Å resolution.* *J. Mol. Biol.* **74**, 331–361.

Hendrickson, W. A. & Ogata, C. M. (1997). *Phase determination from multiwavelength anomalous diffraction measurements.* *Methods Enzymol.* **276**, 494–523.

Karle, J. (1983). *A simple rule for finding and distinguishing triplet phase invariants with values near 0 or π with isomorphous replacement data.* *Acta Cryst.* A**39**, 800–805.

Karle, J. (1984). *Rules for evaluating triplet phase invariants by use of anomalous dispersion data.* *Acta Cryst.* A**40**, 4–11.

Karle, J. (1985). *Many algebraic formulas for the evaluation of triplet phase invariants from isomorphous replacement and anomalous dispersion data.* *Acta Cryst.* A**41**, 182–189.

Kroon, J., Spek, A. L. & Krabbendam, H. (1977). *Direct phase determination of triple products from Bijvoet inequalities.* *Acta Cryst.* A**33**, 382–385.

Moncrief, J. W. & Lipscomb, W. N. (1966). *Structure of leurocristine methiodide dihydrate by anomalous scattering methods; relation to leurocristine (vincristine) and vincaleukoblastine (vinblastine).* *Acta Cryst.* A**21**, 322–331.

Mukherjee, A. K., Helliwell, J. R. & Main, P. (1989). *The use of MULTAN to locate the positions of anomalous scatterers.* *Acta Cryst.* A**45**, 715–718.

Okaya, Y. & Pepinsky, R. (1956). *New formulation and solution of the phase problem in X-ray analysis of non-centric crystals containing anomalous scatterers.* *Phys. Rev.* **103**, 1645–1647.

Parthasarathy, S. & Srinivasan, R. (1964). *The probability distribution of Bijvoet differences.* *Acta Cryst.* **17**, 1400–1407.

Peerdeman, A. F. & Bijvoet, J. M. (1956). *The indexing of reflexions in investigations involving the use of the anomalous scattering effect.* *Acta Cryst.* **9**, 1012–1015.

Podjarny, A. D., Schevitz, R. W. & Sigler, P. B. (1981). *Phasing low-resolution macromolecular structure factors by matricial direct methods.* *Acta Cryst.* A**37**, 662–668.

Ralph, A. C. & Woolfson, M. M. (1991). *On the application of one-wavelength anomalous scattering. III. The Wilson-distribution and MPS methods.* *Acta Cryst.* A**47**, 533–537.

Ramachandran, G. N. & Raman, S. (1956). *A new method for the structure analysis of non-centrosymmetric crystals.* *Curr. Sci. (India)*, **25**, 348.

Raman, S. (1959). *Syntheses for the deconvolution of the Patterson function. Part II. Detailed theory for non-centrosymmetric crystals.* *Acta Cryst.* **12**, 964–975.

Rango, C. de, Mauguen, Y., Tsoucaris, G., Dodson, E. J., Dodson, G. G. & Taylor, D. J. (1985). *The extension and refinement of the 1.9 Å spacing isomorphous phases to 1.5 Å spacing in 2Zn insulin by determinantal methods.* *Acta Cryst.* A**41**, 3–17.

Sayre, D. (1972). *On least-squares refinement of the phases of crystallographic structure factors.* *Acta Cryst.* A**28**, 210–212.

Sayre, D. & Toupin, R. (1975). *Major increase in speed of least-squares phase refinement.* *Acta Cryst.* A**31**, S20.

Sheldrick, G. M. (1990). *Phase annealing in SHELX-90: direct methods for larger structures.* *Acta Cryst.* A**46**, 467–473.

Sheldrick, G. M. (1997). In *Direct methods for solving macromolecular structures.* NATO Advanced Study Institute, Erice, Italy.

Sheldrick, G. M. & Gould, R. O. (1995). *Structure solution by iterative peaklist optimization and tangent expansion in space group P1.* *Acta Cryst.* B**51**, 423–431.

Smith, J. L. (1998). *Multiwavelength anomalous diffraction in macromolecular crystallography.* In *Direct methods for solving macromolecular structures*, edited by S. Fortier, pp. 221–225. Dordrecht: Kluwer Academic Publishers.

Srinivasan, R. & Parthasarathy, S. (1976). *Some statistical applications in X-ray crystallography.* Oxford: Pergamon Press.

Weeks, C. M., DeTitta, G. T., Hauptman, H. A., Thuman, P. & Miller, R. (1994). *Structure solution by minimal-function phase refinement and Fourier filtering. II. Implementation and applications.* *Acta Cryst.* A**50**, 210–220.

Weinzierl, J. E., Eisenberg, D. & Dickerson, R. E. (1969). *Refinement of protein phases with the Karle–Hauptman tangent formula.* *Acta Cryst.* B**25**, 380–387.

Wilson, K. S. (1978). *The application of MULTAN to the analysis of isomorphous derivatives in protein crystallography.* *Acta Cryst.* B**34**, 1599–1608.