International Tables for Crystallography, Volume F: Crystallography of biological macromolecules. Edited by M. G. Rossmann and E. Arnold. © International Union of Crystallography 2006
International Tables for Crystallography (2006). Vol. F, ch. 16.1, pp. 333–345
https://doi.org/10.1107/97809553602060000689
Chapter 16.1. Ab initio phasing
^{a}Institut für Anorganische Chemie, Universität Göttingen, Tammannstrasse 4, D-37077 Göttingen, Germany, ^{b}Hauptman–Woodward Medical Research Institute, Inc., 73 High Street, Buffalo, NY 14203-1196, USA, and ^{c}Lehrstuhl für Strukturchemie, Universität Göttingen, Tammannstrasse 4, D-37077 Göttingen, Germany
The background and use of dual-space direct methods (also known as Shake-and-Bake) for the ab initio phasing of small proteins as well as the phasing of heavy-atom substructures of large proteins is described. Basic concepts include normalized structure factors, multisolution procedures, random trial structures, phase-refinement formulas, peak-picking techniques, alternation of phase improvement in reciprocal and real space, and recognizing solutions. Two independent computer programs which implement the Shake-and-Bake algorithm, SnB and SHELXD, are compared and typical parameters are given. Other topics discussed are the use of Patterson information to get better starting phases, avoiding false minima, the effects of data resolution and completeness, special features of space group P1, and refinement strategies.
Ab initio methods for solving the crystallographic phase problem rely on diffraction amplitudes alone and do not require prior knowledge of any atomic positions. General features that are not specific to the structure in question (e.g. the presence of disulfide bridges or solvent regions) can, however, be utilized. For the last three decades, most small-molecule structures have been routinely solved by direct methods, a class of ab initio methods in which probabilistic phase relations are used to derive reflection phases from the measured amplitudes. Direct methods, implemented in widely used, highly automated computer programs such as MULTAN (Main et al., 1980), SHELXS (Sheldrick, 1990), SAYTAN (Debaerdemaeker et al., 1985) and SIR (Burla et al., 1989), provide computationally efficient solutions for structures containing fewer than approximately 100 independent non-H atoms. However, larger structures are not consistently amenable to these programs and, in fact, few unknown structures with more than 200 independent equal atoms have ever been solved using these programs.
Successful applications to native data for structures that could legitimately be regarded as small macromolecules awaited the development of a direct-methods procedure (Weeks et al., 1993) that has come to be known as Shake-and-Bake. The distinctive feature of this procedure is the repeated and unconditional alternation of reciprocal-space phase refinement (Shaking) with a complementary real-space process that seeks to improve phases by applying constraints (Baking). Consequently, it yields a computer-intensive algorithm, requiring two Fourier transformations during each cycle, which has been made feasible in recent years due to the tremendous increases in computer speed. The first previously unknown structures determined by Shake-and-Bake were two forms of the 100-atom peptide ternatin (Miller et al., 1993). Subsequent applications of the Shake-and-Bake algorithm have involved structures containing as many as 2000 independent non-H atoms (Frazão et al., 1999) provided that accurate diffraction data have been measured to a resolution of 1.2 Å or better.
The basic theory underlying direct methods has been summarized in an excellent chapter (Giacovazzo, 2001) in IT B (Chapter 2.2) to which the reader is referred for details. The present chapter focuses on those aspects of direct methods that have proven useful for larger molecules (more than 250 independent non-H atoms) or are unique to the macromolecular field. These include direct-methods applications that utilize anomalous-dispersion measurements or multiple diffraction patterns [i.e., single isomorphous replacement (SIR), single anomalous scattering (SAS) and multiple-wavelength data]. The easiest way to combine isomorphous or anomalous-scattering information with direct methods is to first compute difference structure factors and then to apply direct methods to the difference data. Using this approach, the dual-space Shake-and-Bake procedure has been used to solve the anomalously scattering substructure of the selenomethionine derivative of an epimerase enzyme that has 70 selenium sites (Deacon & Ealick, 1999). Substructure applications require only the 2.5–3.0 Å data normally included in multiple-wavelength anomalous dispersion (MAD) measurements, and data sets truncated even to 5 Å have led to solutions.
A formal integration of the probabilistic machinery of direct methods with isomorphous replacement and anomalous dispersion was initiated in 1982 (Hauptman, 1982a,b). Although practical applications of this and subsequent related theory have been limited so far, such applications are likely to have greater importance in the future, and progress is described in Sections 16.1.9.1 and 16.1.9.2. Similarly, the combination of direct methods with multiple-beam diffraction is still in its infancy. However, preliminary studies indicate that the information gleaned from multiple-beam data will greatly strengthen existing techniques (Weckert et al., 1993). Progress in this area is summarized in Section 16.1.9.3.
For purposes of direct-methods computations, the usual structure factors, $F_{H}$, are replaced by the normalized structure factors (Hauptman & Karle, 1953),
$$E_{H} = |E_{H}|\exp(i\varphi_{H}), \qquad |E_{H}| = \frac{|F_{H}|}{\langle|F_{H}|^{2}\rangle^{1/2}} = \frac{k\exp[B_{\rm iso}(\sin\theta)^{2}/\lambda^{2}]\,|F_{H}|_{\rm meas}}{\left(\varepsilon_{H}\sum_{j=1}^{N} f_{j}^{2}\right)^{1/2}}, \qquad (16.1.2.1)$$
where the angle brackets indicate probabilistic or statistical expectation values, the $|F_{H}|$ and $|E_{H}|$ are structure-factor magnitudes, the $\varphi_{H}$ are the corresponding phases, k is the absolute scaling factor for the measured magnitudes, $B_{\rm iso}$ is an overall isotropic atomic mean-square displacement parameter, the $f_{j}$ are the atomic scattering factors for the N atoms in the unit cell, and the $\varepsilon_{H}$ are factors that account for multiple enhancement of the average intensities for certain special reflection classes due to space-group symmetry (Shmueli & Wilson, 2001). The condition $\langle|E|^{2}\rangle = 1$ is always imposed. Unlike $\langle|F_{H}|\rangle$, which decreases as $\sin\theta/\lambda$ increases, the values of $\langle|E_{H}|\rangle$ are constant for concentric resolution shells. Thus, the normalization process places all reflections on a common basis, and this is a great advantage with regard to the probability distributions that form the foundation for direct methods. Normalizing a set of reflections by means of equation (16.1.2.1) does not require any information about atomic positions. However, if some structural information, such as the configuration, orientation, or position of certain atomic groupings, is available, then this information can be applied to obtain a better model for the expected intensity distribution (Main, 1976). The distribution of $|E|$ values is, in principle and often in practice, independent of the unit-cell size and contents, but it does depend on whether a centre of symmetry is present, as shown in Table 16.1.2.1.
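As a rough illustration of the normalization step, the following sketch rescales magnitudes so that the average of |E|² is unity within each resolution shell. This is an empirical shell-averaging variant rather than the scale-factor and displacement-parameter form of equation (16.1.2.1), and it is not claimed to be the procedure of any particular program. The function name `normalize_in_shells`, the tuple layout and the default shell count are assumptions made for the example.

```python
import math

def normalize_in_shells(reflections, n_shells=10):
    """Return |E| values such that <|E|^2> = 1 in every resolution shell.

    reflections: list of (s, f_mag, eps) tuples (hypothetical layout), where
    s = sin(theta)/lambda, f_mag = |F| and eps is the enhancement factor.
    """
    if not reflections:
        return []
    # Indices of the reflections ordered by increasing sin(theta)/lambda.
    order = sorted(range(len(reflections)), key=lambda i: reflections[i][0])
    size = math.ceil(len(order) / n_shells)
    e_mags = [0.0] * len(reflections)
    for start in range(0, len(order), size):
        shell = order[start:start + size]
        # Mean intensity in this shell with the epsilon factor divided out.
        mean_i = sum(reflections[i][1] ** 2 / reflections[i][2] for i in shell) / len(shell)
        for i in shell:
            _, f_mag, eps = reflections[i]
            e_mags[i] = f_mag / math.sqrt(eps * mean_i)
    return e_mags
```

Shells of equal population are used here so that every local average rests on the same number of reflections; a fitted smooth curve would be an equally reasonable choice.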

Direct-methods applications having the objective of locating SIR or SAS substructures require the computation of normalized difference structure-factor magnitudes, $|E_{\Delta}|$. This can, for example, be accomplished with the following series of programs from Blessing's data-reduction and error-analysis routines (DREAR): LEVY and EVAL for structure-factor normalization as specified by equation (16.1.2.1) (Blessing et al., 1996), LOCSCL for local scaling of the SIR and SAS magnitudes (Matthews & Czerwinski, 1975; Blessing, 1997), and DIFFE for computing the actual difference magnitudes (Blessing & Smith, 1999). The SnB program (see Section 16.1.7) provides a convenient interface to the DREAR suite.
Given the individual normalized structure-factor magnitudes and the atomic scattering factors which allow for the possibility of anomalous scattering, then greatest-lower-bound estimates of SIR difference-E magnitudes are where is a least-squares-fitted empirical renormalization scaling function, dependent on $\sin\theta/\lambda$, that imposes the condition $\langle|E_{\Delta}|^{2}\rangle = 1$ and serves to define and .
The phase problem of X-ray crystallography may be defined as the problem of determining the phases ϕ of the normalized structure factors E when only the magnitudes $|E|$ are given. Owing to the atomicity of crystal structures and the redundancy of the known magnitudes, the phase problem is overdetermined and is, therefore, solvable in principle. This overdetermination implies the existence of relationships among the E's and, since the magnitudes are presumed to be known, the existence of identities among the phases that are dependent on the known magnitudes alone. The techniques of probability theory lead to the joint probability distributions of arbitrary collections of E from which the conditional probability distributions of selected sets of phases, given the values of suitably chosen magnitudes $|E|$, may be inferred.
The magnitude-dependent entities that constitute the foundation of direct methods are linear combinations of phases called structure invariants. The term `structure invariant' stems from the fact that the values of these quantities are independent of the choice of origin. The most useful of the structure invariants are the three-phase or triplet invariants,
$$\Phi_{HK} = \varphi_{H} + \varphi_{K} + \varphi_{-H-K},$$
the conditional probability distribution (Cochran, 1955), given $A_{HK}$, of which is
$$P(\Phi_{HK}) = [2\pi I_{0}(A_{HK})]^{-1}\exp(A_{HK}\cos\Phi_{HK}),$$
where
$$A_{HK} = (2/N^{1/2})\,|E_{H}E_{K}E_{-H-K}|$$
and N is the number of atoms, here presumed to be identical, in the asymmetric unit of the corresponding primitive unit cell. This distribution is illustrated in Fig. 16.1.3.1. The expected value of the cosine of a particular triplet, $\Phi_{HK}$, is given by the ratio of modified Bessel functions, $I_{1}(A_{HK})/I_{0}(A_{HK})$.
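To make the dependence on structure size concrete, the sketch below evaluates the expected triplet cosine as a ratio of modified Bessel functions for a reliability parameter of the Cochran form A = 2|E_H E_K E_{-H-K}|/N^{1/2}. The Bessel functions are summed from their power series to keep the example free of external libraries; the function names and the chosen magnitudes are illustrative assumptions.

```python
import math

def bessel_i(order, x, terms=60):
    """Modified Bessel function of the first kind, I_order(x), by power series."""
    return sum((x / 2) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def triplet_a(e_h, e_k, e_hk, n_atoms):
    """Reliability parameter A for a triplet of equal-atom structure size n_atoms."""
    return 2.0 * e_h * e_k * e_hk / math.sqrt(n_atoms)

def expected_cosine(a):
    """Expected value of cos(triplet) for reliability parameter a: I1(a)/I0(a)."""
    return bessel_i(1, a) / bessel_i(0, a)

# The same three strong reflections become steadily less informative as N grows.
for n_atoms in (50, 500, 5000):
    a = triplet_a(2.0, 2.0, 2.0, n_atoms)
    print(n_atoms, round(a, 3), round(expected_cosine(a), 3))
```

The series is adequate for the moderate values of A that arise in practice; for very large arguments a library routine would be the safer choice.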
Estimates of the invariant values are most reliable when the normalized structure-factor magnitudes ($|E_{H}|$, $|E_{K}|$ and $|E_{-H-K}|$) are large and the number of atoms in the unit cell, N, is small. This is the primary reason why direct phasing is more difficult for macromolecules than it is for small molecules. Four-phase or quartet invariants have proven helpful in small-molecule structure determination, particularly when used passively as the basis for a figure of merit (DeTitta et al., 1975). However, the reliability of these invariants, as given by their conditional probability distribution (Hauptman, 1975), is proportional to $1/N$, and they have not as yet been shown to be useful for macromolecular phasing. The reliability of higher-order invariants decreases even more rapidly as structure size increases.
Successful crystal structure determination requires that sufficient phases be found such that a Fourier map computed using the corresponding structure factors will reveal the atomic positions. It is particularly important that the biggest terms (i.e., largest $|E|$) be included in the Fourier series. Thus, the first step in the phasing process is to sort the reflections in decreasing order according to their $|E|$ values and to choose the number of large reflections that are to be phased. The second step is to generate the possible invariants involving these intense reflections and then to sort them in decreasing order according to their $A_{HK}$ values. Those invariants with the largest $A_{HK}$ values are retained in sufficient number to achieve the desired overdetermination. Ab initio phase determination by direct methods requires not only a set of invariants, the average values of the cosines of which are presumed to be known, but also a set of starting phases. Therefore, the third step in the phasing process is the assignment of initial phase values. If enough pairs of phases, $\varphi_{K}$ and $\varphi_{-H-K}$, are known, the structure invariants can then be used to generate further phases which, in turn, can be used to evaluate still more phases. Repeated iterations will permit most reflections with large $|E|$ to be phased.
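The second step, generating triplets among the strongest reflections and ranking them by reliability, can be sketched as follows. The example assumes space group P1 with Friedel symmetry only, takes the reliability parameter as A = 2|E_H E_K E_{-H-K}|/N^{1/2}, and uses a brute-force pair search; the function name and data layout are hypothetical, and production programs use far more efficient indexing and full space-group symmetry.

```python
import math

def generate_triplets(e_mags, n_atoms, max_triplets):
    """Return the max_triplets most reliable triplets as (A, H, K, L), with H + K + L = 0.

    e_mags: dict mapping an (h, k, l) index tuple to |E| for the chosen strong
    reflections; the (0, 0, 0) term is assumed to be absent.
    """
    # Add Friedel mates, which share the same magnitude.
    full = dict(e_mags)
    for hkl, e in e_mags.items():
        full[tuple(-x for x in hkl)] = e
    hkls = sorted(full)
    seen = set()
    triplets = []
    for i, h in enumerate(hkls):
        for k in hkls[i:]:
            l = tuple(-(a + b) for a, b in zip(h, k))
            if l not in full:
                continue
            key = tuple(sorted((h, k, l)))
            if key in seen:  # the same three reflections in another order
                continue
            seen.add(key)
            a_value = 2.0 * full[h] * full[k] * full[l] / math.sqrt(n_atoms)
            triplets.append((a_value, h, k, l))
    triplets.sort(reverse=True)  # largest A first
    return triplets[:max_triplets]
```

Truncating the sorted list is what fixes the phase:invariant ratio; the caller chooses `max_triplets` according to the overdetermination required.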
Depending on the space group, a small number of phases can be assigned arbitrarily in order to fix the origin position and, in noncentrosymmetric space groups, the enantiomorph. However, except for the simplest structures, these reflections provide an inadequate foundation for further phase development. Consequently, a `multisolution' or multi-trial approach (Germain & Woolfson, 1968) is normally taken in which other reflections are each assigned many different starting values in the hope that one or more of the resultant phase combinations will lead to a solution. Solutions, if they occur, must be identified on the basis of some suitable figure of merit. Although phases can be evaluated sequentially, in an order determined by a so-called convergence map (Germain et al., 1970), it has become standard in recent years to use a random-number generator to assign initial values to all available phases from the outset (Baggio et al., 1978; Yao, 1981). A variant of this procedure is to use the random-number generator to assign initial coordinates to the atoms in the trial structures and then to obtain initial phases from a structure-factor calculation.
Once a set of initial phases has been chosen, it must be refined against the set of structure invariants whose values are presumed known. In theory, any of a variety of optimization methods could be used to extract phase information in this way. However, so far only two (tangent refinement and parameter-shift optimization of the minimal function) have been shown to be of practical value.
The tangent formula,
$$\tan(\varphi_{H}) = \frac{-\sum_{K}|E_{K}E_{-H-K}|\sin(\varphi_{K}+\varphi_{-H-K})}{\sum_{K}|E_{K}E_{-H-K}|\cos(\varphi_{K}+\varphi_{-H-K})} \qquad (16.1.4.1)$$
(Karle & Hauptman, 1956), is the relationship used in conventional direct-methods programs to compute $\varphi_{H}$ given a sufficient number of pairs ($\varphi_{K}$, $\varphi_{-H-K}$) of known phases. It can also be used within the phase-refinement portion of the dual-space Shake-and-Bake procedure (Weeks, Hauptman et al., 1994; Sheldrick & Gould, 1995). The variance associated with $\varphi_{H}$ depends on $\sum_{K}E_{H}E_{K}E_{-H-K}/N^{1/2}$ and, in practice, the estimate is only reliable for $|E_{H}| \gg 1$ and for structures with a limited number of atoms (N). If equation (16.1.4.1) is used to redetermine previously known phases, the phasing process is referred to as tangent-formula refinement; if only new phases are determined, the phasing process is tangent expansion.
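A minimal sketch of a single tangent-formula evaluation is given below. It assumes the sign convention in which the triplet φ_H + φ_K + φ_{-H-K} is expected to be close to zero, so the new phase is the negative of the weighted mean of the pair sums; the function name and argument layout are assumptions made for the example.

```python
import math

def tangent_phase(contributors):
    """Estimate phi_H (radians) from pairs of known phases.

    contributors: list of (|E_K|, |E_-H-K|, phi_K, phi_-H-K) tuples, one per
    triplet that involves reflection H.
    """
    numerator = -sum(ek * el * math.sin(pk + pl) for ek, el, pk, pl in contributors)
    denominator = sum(ek * el * math.cos(pk + pl) for ek, el, pk, pl in contributors)
    # atan2 resolves the quadrant, which a bare tangent would leave ambiguous.
    return math.atan2(numerator, denominator)
```

Feeding the function a set of pairs whose phases are all zero returns zero, which is the self-consistent all-zero phase set that the formula cannot distinguish from a true solution.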
The tangent formula can be derived using the assumption of equal resolved atoms. Nevertheless, it suffers from the disadvantage that, in space groups without translational symmetry, it is perfectly fulfilled by a false solution with all phases equal to zero, thereby giving rise to the so-called `uranium-atom' solution with one dominant peak in the corresponding Fourier synthesis. In conventional direct-methods programs, the tangent formula is often modified in various ways to include (explicitly or implicitly) information from the so-called `negative' quartet invariants (Schenk, 1974; Hauptman, 1974; Giacovazzo, 1976) that are dependent on the smallest as well as the largest E magnitudes. Such modified tangent formulas do indeed largely overcome the problem of pseudosymmetric solutions for small N, but because of the dependence of quartet-term probabilities on $1/N$, they are little more effective than the normal tangent formula for large N.
Constrained minimization of an objective function like the minimal function,
$$R(\Phi) = \frac{\sum_{H,K}A_{HK}\{\cos\Phi_{HK} - [I_{1}(A_{HK})/I_{0}(A_{HK})]\}^{2}}{\sum_{H,K}A_{HK}} \qquad (16.1.4.2)$$
(Debaerdemaeker & Woolfson, 1983; Hauptman, 1991; DeTitta et al., 1994), in which $\Phi_{HK}$ denotes a triplet invariant and $A_{HK} = (2/N^{1/2})|E_{H}E_{K}E_{-H-K}|$, provides an alternative approach to phase refinement or phase expansion. $R(\Phi)$ is a measure of the mean-square difference between the values of the triplets calculated using a particular set of phases and the expected values of the same triplets as given by the ratio of modified Bessel functions. The minimal function is expected to have a constrained global minimum when the phases are equal to their correct values for some choice of origin and enantiomorph (the minimal principle). Experimentation has thus far confirmed that, when the minimal function is used actively in the phasing process and solutions are produced, the final trial structure corresponding to the smallest value of $R(\Phi)$ is a solution provided that $R(\Phi)$ is calculated directly from the atomic positions before the phase-refinement step (Weeks, DeTitta et al., 1994). Therefore, $R(\Phi)$ is also an extremely useful figure of merit. The minimal function can also include contributions from higher-order (e.g. quartet) invariants, although their use is not as imperative as with the tangent formula because the minimal function does not have a minimum when all phases are zero. In practice, quartets are rarely used in the minimal function because they increase the CPU time while adding little useful information for large structures. The cosine function in equation (16.1.4.2) can also be replaced by other functions of the phases giving rise to alternative minimal functions. In particular, an exponential expression has been found to give superior results for several P1 structures (Hauptman et al., 1999).
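The triplet-only minimal function can be evaluated along the following lines: an A-weighted mean-square difference between each calculated triplet cosine and its expected value I1(A)/I0(A). The data layout (a list of `(A, H, K, L)` tuples and a dictionary of phases in radians) and the function names are assumptions made for this sketch, and quartet terms are omitted.

```python
import math

def bessel_ratio(a, terms=60):
    """I1(a)/I0(a) for the modified Bessel functions, summed from power series."""
    def series(order):
        return sum((a / 2) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
                   for k in range(terms))
    return series(1) / series(0)

def minimal_function(triplets, phases):
    """A-weighted mean-square misfit between triplet cosines and their expectations.

    triplets: list of (A, H, K, L) with H + K + L = 0.
    phases: dict mapping each reflection label to its phase in radians.
    """
    numerator = 0.0
    denominator = 0.0
    for a, h, k, l in triplets:
        triplet_value = phases[h] + phases[k] + phases[l]
        numerator += a * (math.cos(triplet_value) - bessel_ratio(a)) ** 2
        denominator += a
    return numerator / denominator
```

Because the expected cosine is always below unity, a phase set with every triplet equal to zero gives a positive misfit, which is why the all-zero phase set is not a minimum of this function.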
In principle, any minimization technique could be used to minimize $R(\Phi)$ by varying the phases. So far, a seemingly simple algorithm, known as parameter shift (Bhuiya & Stanley, 1963), has proven to be quite powerful and efficient as an optimization method when used within the Shake-and-Bake context to reduce the value of the minimal function. For example, a typical phase-refinement stage consists of three iterations or scans through the reflection list, with each phase being shifted a maximum of two times by 90° in either the positive or negative direction during each iteration. The refined value for each phase is selected, in turn, through a process which involves evaluating the minimal function using the original phase and each of its shifted values (Weeks, DeTitta et al., 1994). The phase value that results in the lowest minimal-function value is chosen at each step. Refined phases are used immediately in the subsequent refinement of other phases. It should be noted that the parameter-shift routine is similar to that used in ψ-map refinement (White & Woolfson, 1975) and XMY (Debaerdemaeker & Woolfson, 1989).
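The scan described above can be sketched as follows, under several simplifying assumptions: the minimal function contains triplets only, it is recomputed in full for every trial shift (a real program would update only the triplets that involve the shifted phase), and the negative direction is tried only when the first positive shift fails to lower the function. All names are hypothetical.

```python
import math

def bessel_ratio(a, terms=60):
    """I1(a)/I0(a) for the modified Bessel functions, summed from power series."""
    def series(order):
        return sum((a / 2) ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
                   for k in range(terms))
    return series(1) / series(0)

def minimal_function(triplets, phases):
    """A-weighted mean-square misfit of triplet cosines; triplets are (A, H, K, L)."""
    num = sum(a * (math.cos(phases[h] + phases[k] + phases[l]) - bessel_ratio(a)) ** 2
              for a, h, k, l in triplets)
    return num / sum(a for a, _, _, _ in triplets)

def parameter_shift(triplets, phases, shift=math.pi / 2, max_steps=2, n_scans=3):
    """Refine phases by trial shifts, keeping the value that gives the lowest misfit."""
    phases = dict(phases)  # leave the caller's dictionary untouched
    for _ in range(n_scans):
        for h in list(phases):
            start = phases[h]
            best_phi, best_r = start, minimal_function(triplets, phases)
            for direction in (+1, -1):
                improved = False
                for step in range(1, max_steps + 1):
                    phases[h] = (start + direction * step * shift) % (2 * math.pi)
                    r = minimal_function(triplets, phases)
                    if r >= best_r:
                        break  # no gain from shifting further in this direction
                    best_phi, best_r, improved = phases[h], r, True
                if improved:
                    break  # the positive direction worked; skip the negative one
            # The accepted value is used at once when refining later phases.
            phases[h] = best_phi
    return phases
```

With the defaults this corresponds to two 90° steps and three scans; a single larger shift can be requested through `shift` and `max_steps`.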
Peak picking is a simple but powerful way of imposing an atomicity constraint. The potential for real-space phase improvement in the context of small-molecule direct methods was recognized by Jerome Karle (1968). He found that even a relatively small, chemically sensible, fragment extracted by manual interpretation of an electron-density map could be expanded into a complete solution by transformation back to reciprocal space and then performing additional iterations of phase refinement with the tangent formula. Automatic real-space electron-density map interpretation in the Shake-and-Bake procedure consists of selecting an appropriate number of the largest peaks in each cycle to be used as an updated trial structure without regard to chemical constraints other than a minimum allowed distance between atoms. If markedly unequal atoms are present, appropriate numbers of peaks (atoms) can be weighted by the proper atomic numbers during transformation back to reciprocal space in a subsequent structure-factor calculation. Thus, a priori knowledge concerning the chemical composition of the crystal is utilized, but no knowledge of constitution is required or used during peak selection. It is useful to think of peak picking in this context as simply an extreme form of density modification appropriate when atomic-resolution data are available. In theory, under appropriate conditions it should be possible to substitute alternative density-modification procedures such as low-density elimination (Shiono & Woolfson, 1992; Refaat & Woolfson, 1993) or solvent flattening (Wang, 1985), but no practical applications of such procedures have yet been made. The imposition of physical constraints counteracts the tendency of phase refinement to propagate errors or produce overly consistent phase sets. Several variants of peak picking, which are discussed below, have been successfully employed within the framework of Shake-and-Bake.
In its simplest form, peak picking consists of simply selecting the top $N_{u}$ E-map peaks where $N_{u}$ is the number of unique non-H atoms in the asymmetric unit. This is adequate for true small-molecule structures. It has also been shown to work well for heavy-atom or anomalously scattering substructures where $N_{u}$ is taken to be the number of expected substructure atoms (Smith et al., 1998; Turner et al., 1998). For larger structures ($N_{u} > 100$), it is likely to be better to select about $0.8N_{u}$ peaks, thereby taking into account the probable presence of some atoms that, owing to high thermal motion or disorder, will not be visible during the early stages of a structure determination. Furthermore, a recent study (Weeks & Miller, 1999b) has shown that structures in the 250–1000-atom range which contain a half dozen or more moderately heavy atoms (i.e., S, Cl, Fe) are more easily solved if only $0.4N_{u}$ peaks are selected. The only chemical information used at this stage is a minimum interpeak distance, generally taken to be 1.0 Å. For substructure applications, a larger minimum distance (e.g. 3 Å) is more appropriate.
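Simple peak picking with a minimum interpeak distance can be sketched as below. The example assumes that peaks are supplied in Cartesian coordinates (Å) and ignores lattice translations and symmetry-equivalent positions, which a real implementation must take into account; the function name and data layout are hypothetical.

```python
import math

def pick_peaks(peaks, n_peaks, min_dist=1.0):
    """Keep the n_peaks highest peaks that are at least min_dist apart.

    peaks: list of (height, (x, y, z)) tuples with Cartesian coordinates in Å.
    """
    chosen = []
    for height, xyz in sorted(peaks, key=lambda p: p[0], reverse=True):
        # Reject a peak that lies too close to a higher peak already accepted.
        if all(math.dist(xyz, other) >= min_dist for _, other in chosen):
            chosen.append((height, xyz))
            if len(chosen) == n_peaks:
                break
    return chosen
```

For a substructure search the same routine would be called with a larger `min_dist`, for example 3.0.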
An alternative approach to peak picking is to select approximately $N_{u}$ peaks as potential atoms and then eliminate some of them, one by one, while maximizing a suitable figure of merit such as
$$P = \sum_{H}|E_{c}|^{2}(|E_{o}|^{2} - 1). \qquad (16.1.5.1)$$
The top $N_{u}$ peaks are used as potential atoms to compute $|E_{c}|$. The atom that leaves the highest value of P is then eliminated. Typically, this procedure, which has been termed iterative peaklist optimization (Sheldrick & Gould, 1995), is repeated until only $2N_{u}/3$ atoms remain. Use of equation (16.1.5.1) may be regarded as a reciprocal-space method of maximizing the fit to the origin-removed sharpened Patterson function, and it is used for this purpose in molecular replacement (Beurskens, 1981). Subject to various approximations, maximum-likelihood considerations also indicate that it is an appropriate function to maximize (Bricogne, 1998). Iterative peaklist optimization provides a higher percentage of solutions than simple peak picking, but it suffers from the disadvantage of requiring much more CPU time.
A third peak-picking strategy also involves selecting approximately $1.3N_{u}$ of the top peaks and eliminating some, but, in this case, the deleted peaks are chosen at random. Typically, one-third of the potential atoms are removed, and the remaining atoms are used to compute $E_{c}$. By analogy to the common practice in macromolecular crystallography of omitting part of a structure from a Fourier calculation in hopes of finding an improved position for the deleted fragment, this version of peak picking is described as making a random omit map. This procedure is a little faster than simply picking $N_{u}$ atoms because fewer atoms are used in the structure-factor calculation. More important is the fact that, like iterative peaklist optimization, it has the potential for being a more efficient search algorithm.
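The random deletion step behind a random omit map can be sketched in a few lines. The size of the candidate list is left to the caller, and the function name, the `omit_fraction` parameter and the optional random-number generator argument are assumptions made for the example.

```python
import random

def random_omit(candidates, omit_fraction=1 / 3, rng=None):
    """Return the candidate atoms that survive deletion of a random subset.

    candidates: list of potential atoms (any objects) taken from the peak list.
    omit_fraction: fraction of the candidates to delete, one-third by default.
    """
    rng = rng or random.Random()
    n_omit = round(len(candidates) * omit_fraction)
    omitted = set(rng.sample(range(len(candidates)), n_omit))
    # The survivors keep their original peak-height ordering.
    return [atom for i, atom in enumerate(candidates) if i not in omitted]
```

Passing a seeded `random.Random` instance makes a trial reproducible, which is convenient when a promising trial has to be rerun.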
E-map recycling, but without phase refinement (Sheldrick, 1982, 1990; Kinneging & de Graaff, 1984), has been frequently used in conventional direct-methods programs to improve the completeness of the solutions after phase refinement. It is important to apply Fourier refinement to Shake-and-Bake solutions also because such processing significantly increases the number of resolved atoms, thereby making the job of map interpretation much easier. Since phase refinement via either the tangent formula or the minimal function requires relatively accurate invariants that can only be generated using the larger E magnitudes, a limited number of reflections are phased during the actual dual-space cycles. Working with a limited amount of data has the added advantage that less CPU time is required. However, if the current trial structure is the `best' so far based on a figure of merit (either the minimal function or a real-space criterion), then it makes sense to subject this structure to Fourier refinement using additional data, thereby reducing series-termination errors. The correlation coefficient
$$\mathrm{CC} = \frac{\sum wE_{o}^{2}E_{c}^{2}\sum w - \sum wE_{o}^{2}\sum wE_{c}^{2}}{\{[\sum wE_{o}^{4}\sum w - (\sum wE_{o}^{2})^{2}][\sum wE_{c}^{4}\sum w - (\sum wE_{c}^{2})^{2}]\}^{1/2}}$$
(Fujinaga & Read, 1987), where weights $w = 1/[0.04 + \sigma^{2}(E_{o})]$, has been found to be an especially effective figure of merit when used with all the data and is, therefore, suited for identifying the most promising trial structure at the end of Fourier refinement. Either simple peak picking or iterative peaklist optimization can be employed during the Fourier-refinement cycles in conjunction with weighted E maps (Sim, 1959). The final model can be further improved by isotropic displacement parameter refinement for the individual atoms (Usón et al., 1999) followed by calculation of the Sim (1959) or sigma-A (Read, 1986) weighted map. This is particularly useful when the requirement of atomic resolution is barely fulfilled, and it makes it easier to interpret the resulting maps by classical macromolecular methods.
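A weighted correlation coefficient between observed and calculated squared normalized magnitudes can be computed as in the following sketch. It is written as a standard weighted Pearson correlation of E_o² against E_c²; the weights are supplied by the caller, and the function name and argument layout are assumptions made for the example. The value is returned as a fraction rather than a percentage.

```python
import math

def correlation_coefficient(e_obs, e_calc, weights):
    """Weighted correlation between E_obs^2 and E_calc^2, as a fraction in [-1, 1]."""
    sw = sum(weights)
    swo = sum(w * o * o for w, o in zip(weights, e_obs))
    swc = sum(w * c * c for w, c in zip(weights, e_calc))
    swoc = sum(w * o * o * c * c for w, o, c in zip(weights, e_obs, e_calc))
    swoo = sum(w * o ** 4 for w, o in zip(weights, e_obs))
    swcc = sum(w * c ** 4 for w, c in zip(weights, e_calc))
    numerator = swoc * sw - swo * swc
    denominator = math.sqrt((swoo * sw - swo ** 2) * (swcc * sw - swc ** 2))
    return numerator / denominator
```

Unit weights reduce the expression to the ordinary correlation coefficient; a caller with standard uncertainties available would pass weights that down-weight poorly measured reflections.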
The Shake-and-Bake algorithm has been implemented independently in two computer programs. These are (1) SnB written in Buffalo at the Hauptman–Woodward Institute, principally by Charles Weeks and Russ Miller (Miller et al., 1994; Weeks & Miller, 1999a), and (2) SHELXD (which is also known by the alias `Half-baked'), written in Göttingen by George Sheldrick (Sheldrick, 1997, 1998). SHELXD attempts to do more during the real-space (baking) stage than is available to the user with the current version of SnB. The most recent public release of SnB is available at http://www.hwi.buffalo.edu/SnB/ along with documentation, test data and other pertinent information. SHELXD will be released when testing is complete; for details see the SHELX homepage at http://shelx.uniac.gwdg.de/SHELX/ .
A flowchart for the generic Shake-and-Bake algorithm, which provides the foundation for both programs, is presented in Fig. 16.1.7.1. It contains two refinement loops embedded in the trial-structure loop. The first of these loops (steps 5–9) is a dual-space phase-improvement loop entered by all trial structures, and the second (steps 11–14) is a real-space Fourier-refinement loop entered only by those trial structures that are currently judged to be the best on the basis of some figure of merit. These loops have been called the internal and external loops, respectively, in previous descriptions of the SHELXD program (e.g. Sheldrick & Gould, 1995; Sheldrick, 1997, 1998). Currently, the major algorithmic differences between the programs are the following:
All of the major parameters of the Shake-and-Bake procedure (i.e., the numbers of refinement cycles, phases, triplet invariant relationships and peaks selected) are a function of structure size and can be expressed in terms of $N_{u}$, the number of unique non-H atoms in the asymmetric unit. These parameters have been fine-tuned in a series of tests using data for both small and large molecules (Weeks, DeTitta et al., 1994; Chang et al., 1997; Weeks & Miller, 1999b). Default (recommended) parameter values used in the SnB program are summarized in Table 16.1.7.1. At resolutions in the 1.1–1.4 Å range, recalcitrant data sets can sometimes be made to yield solutions if (1) the phase:invariant ratio is increased from 1:10 to values ranging between 1:20 and 1:50 or (2) the number of dual-space refinement cycles is doubled or tripled. The presence of moderately heavy atoms (e.g. S, Cl, Fe) greatly increases the probability of success at resolutions less than 1.2 Å; in general, the higher the fraction of such atoms the more the resolution requirement can be relaxed, provided that these atoms have low B values. Thus, disulfide bridges are much more helpful than methionine sulfur atoms because they tend to have lower B values. Parameter recommendations for substructures are based on an analysis of the peak-wavelength anomalous-difference data for S-adenosylhomocysteine (AdoHcy) hydrolase (Turner et al., 1998). Parameter shift with a maximum of two 90° steps [indicated by the shorthand notation PS(90°, 2)] is the default phase-refinement mode. However, some structures (especially large P1 structures) may respond better to a single larger shift [e.g. PS(157.5°, 1)] (Deacon et al., 1998). This seems to reduce the frequency of false minima (see Section 16.1.8.2).

In general, the parameter values used in SHELXD are similar to those used in SnB. However, the combination of random omit maps with tangent extension has been found to be the most effective strategy within the context of SHELXD. Consequently, it is used as the default operational mode (see Section 16.1.8.4 for details).
On account of the intensive nature of the computations involved, SnB and SHELXD are designed to run unattended for long periods while also providing ways for the user to check the status of jobs in progress. The progress of current SnB jobs can be followed by monitoring a figure-of-merit histogram for the trial structures that have been processed (Fig. 16.1.7.2). A clear bimodal distribution of figure-of-merit values is a strong indication that a solution has, in fact, been found. However, not all solutions are so obvious, and it sometimes pays to inspect the best trial even when the histogram is unimodal. The course of a typical solution as a function of SnB cycle is contrasted with that of a nonsolution in Fig. 16.1.7.3. Minimal-function values for a solution usually decrease abruptly over the course of just a few cycles, and a tool is provided within SnB that allows the user to visually inspect the trace of minimal-function values for the best trial completed so far. Fig. 16.1.7.3 shows that the abrupt decrease in minimal-function values corresponds to a simultaneous abrupt increase in the number of peaks close to true atomic positions. In this example, a second abrupt increase in correct peaks occurs when Fourier refinement is started.

Fig. 16.1.7.2. A histogram of figure-of-merit values (minimal function) for 378 scorpion toxin II trials. This bimodal histogram suggests that ten trials are solutions.

Fig. 16.1.7.3. Tracing the history of a solution and a nonsolution trial for scorpion toxin II as a function of Shake-and-Bake cycle. (a) Minimal-function figure of merit, and (b) number of peaks closer than 0.5 Å to true atomic positions. Simple peak picking (200 or 0.4N_{u} peaks) was used for 500 (N_{u}) cycles, and 500 peaks (N_{u}) were then selected for an additional 50 (0.1N_{u}) dual-space cycles. The solution (which had the lowest minimal-function value) was then subjected to 50 cycles of Fourier refinement.
Since the correlation coefficient is a relatively absolute figure of merit (given atomic resolution, values greater than 65% almost invariably correspond to correct solutions), it is usually clear when SHELXD has solved a structure. The current version of SHELXD includes an option for calculating it using the full data every 10 or 20 internal loop cycles, and jumping to the external loop if the value is high enough. Recalculating it every cycle would be computationally less efficient overall.
The solution of the (known) structure of triclinic lysozyme by SHELXD and shortly afterwards by SnB (Deacon et al., 1998) finally broke the 1000-atom barrier for direct methods (there happen to be 1001 protein atoms in this structure!). Both programs have also solved a large number of previously unsolved structures that had defeated conventional direct methods; some examples are listed in Table 16.1.8.1. The overall quality of solutions is generally very good, especially if appropriate action is taken during the Fourier-refinement stage. Most of the time, the Shake-and-Bake method works remarkably well, even for rather large structures. However, in problematic situations, the user needs to be aware of options that can increase the chance of success.
References: [1] Loll et al. (1997); [2] Schäfer et al. (1996); [3] Schäfer (1998); [4] Schäfer, Sheldrick, Bahner & Lackner (1998); [5] Langs (1988); [6] Drouin (1998); [7] Anderson et al. (1996); [8] Schäfer & Prange (1998); [9] Stec et al. (1995); [10] Weeks et al. (1995); [11] Usón et al. (1999); [12] Aree et al. (1999); [13] Prive et al. (1999); [14] Dauter et al. (1992); [15] Loll et al. (1998); [16] Schneider (1998); [17] Reibenspiess (1998); [18] Schäfer, Sheldrick, Schneider & Vértesy (1998); [19] Teichert (1998); [20] Smith et al. (1997); [21] Gessler et al. (1999); [22] Schneider et al. (2000); [23] Parisini et al. (1999); [24] Deacon et al. (1998); [25] Walsh et al. (1998); [26] Frazão et al. (1999); [27] Ekstrom et al. (1999); [28] Li et al. (1999); [29] Radfar et al. (2000); [30] Turner et al. (1998); [31] Deacon & Ealick (1999).

When slightly heavier atoms such as sulfur are present, it is possible to start the Shake-and-Bake recycling procedure from a set of atomic positions that are consistent with the Patterson function. For large structures, the vectors between such atoms will correspond to Patterson densities around or even below the noise level, so classical methods of locating the positions of these atoms unambiguously from the Patterson are unlikely to succeed. Nevertheless, the Patterson function can still be used to filter sets of starting atoms. This filter is currently implemented as follows in SHELXD. First, a sharpened Patterson function (Sheldrick et al., 1993) is calculated, and the top 200 (for example) non-Harker peaks further than a given minimum distance from the origin are selected, in turn, as two-atom translation-search fragments, one such fragment being employed per solution attempt. For each of a large number of random translations, all unique Patterson vectors involving the two atoms and their symmetry equivalents are found and sorted in order of increasing Patterson density. The sum of the smallest third of these values is used as a figure of merit (PMF). Tests showed that although the globally highest PMF for a given two-atom search fragment may not correspond to correct atomic positions, nevertheless, by limiting the number of trials, some correct solutions may still be found. After all the vectors have been used as search fragments (e.g. after 200 attempts), the procedure is repeated starting again with the first vector. The two atoms may be used to generate further atoms using a full Patterson superposition minimum function or a weighted difference synthesis (in the current version of SHELXD, a combination of the two is used).
In the case of the small protein BPTI (Schneider, 1998), 15 300 attempts based on 100 different search vectors led to four final solutions with mean phase error less than 18°, although none of the globally highest PMF values for any of the search vectors corresponded to correct solutions. Table 16.1.8.2 shows the effect of using different two-atom search fragments for hirustasin, a previously unsolved 55-amino-acid protein containing five disulfide bridges first solved using SHELXD (Usón et al., 1999). It is not clear why some search fragments perform so much better than others; surprisingly, one of the more effective search vectors deviates considerably (1.69 Å) from the nearest true S–S vector.
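The figure of merit used in this filter can be sketched in a few lines. The code below is an illustration only: it assumes that the Patterson densities at the unique vectors of each trial translation have already been sampled and are passed in as plain lists, and the function names are hypothetical rather than SHELXD routines.

```python
def patterson_minimum_fom(densities):
    """Sum of the lowest third of the Patterson densities at the unique vectors."""
    ordered = sorted(densities)            # increasing Patterson density
    n_low = max(1, len(ordered) // 3)      # the smallest third (at least one value)
    return sum(ordered[:n_low])

def best_translation(trial_densities):
    """Return (index, PMF) of the trial translation with the highest PMF."""
    scores = [patterson_minimum_fom(d) for d in trial_densities]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Summing only the weakest vectors rewards translations for which no predicted vector falls in empty Patterson space. Since the globally highest PMF need not belong to a correct position, a practical search would try several high-scoring translations rather than only the single best one.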

The frequent imposition of real-space constraints appears to keep dual-space methods from producing most of the false minima that plague practitioners of conventional direct methods. Translated molecules have not been observed (so far), and traditionally problematic structures with polycyclic ring systems and long aliphatic chains are readily solved (McCourt et al., 1996, 1997). False minima of the type that occur primarily in space groups lacking translational symmetry and are characterized by a single large `uranium' peak do occur frequently in P1 and occasionally in other space groups. Triclinic hen egg-white lysozyme exhibits this phenomenon regardless of whether parameter-shift or tangent-formula phase refinement is employed. An example from another space group (C222) is provided by the Se substructure data for AdoHcy hydrolase. In this case, many trials converge to false minima if the feature in the SnB program that eliminates peaks at special positions is not utilized.
The problem with false minima is most serious if they have a `better' value of the figure of merit being used for diagnostic purposes than do the true solutions. Fortunately, this is not the case with the uranium `solutions', which can be distinguished on the basis of the minimal function [equation (16.1.4.2)] or the correlation coefficient [equation (16.1.6.1)]. However, it would be inefficient to compute the latter in each dual-space cycle since it requires that essentially all reflections be used. To be an effective discriminator, the figure of merit must be computed using the phases calculated from the point-atom model, not from the phases directly after refinement. Phase refinement can and does produce sets of phases, such as the uranium phases, which do not correspond to physical reality. Hence, it should not be surprising that such phase sets might appear `better' than the true phases and could lead to an erroneous choice for the best trial. Peak picking, followed by a structure-factor calculation in which the peaks are sensibly weighted, converts the phase set back to physically allowed values. If the value of the minimal function computed from the refined or unconstrained phases is denoted by R_{unconstrained} and the value of the minimal function computed using the constrained phases resulting from the atomic model is denoted by R_{constrained}, then a function defined by R ratio = (R_{constrained} − R_{unconstrained})/(R_{constrained} + R_{unconstrained}) can be used to distinguish false minima from other nonsolutions as well as the true solutions. Once a trial falls into a false minimum, it never escapes. Therefore, the R ratio can be used, within SnB, as a criterion for early termination of unproductive trials. Based on data for several P1 structures, it appears that termination of trials with R ratio values exceeding 0.2 will eliminate most false minima without risking rejection of any potential solutions. In the case of triclinic lysozyme, false minima can be recognized, on average, by cycle 25. Since the default recommendation would be for 1000 cycles, a substantial saving in CPU time is realized by using the R ratio early-termination test. It should be noted that SHELXD optionally allows early termination of trials if the second peak is less than a specified fraction (e.g. 40%) of the height of the first. Generally, but not always, the R-ratio and peak-ratio tests eliminate the same trials.
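The two early-termination tests can be sketched as follows. The form of the R ratio used here, (R_constrained − R_unconstrained)/(R_constrained + R_unconstrained), is an assumption about the definition, and the function names and default cutoffs (0.2 and 40%, the example values quoted above) are chosen for illustration; neither function reproduces actual SnB or SHELXD code.

```python
def r_ratio(r_unconstrained, r_constrained):
    """Assumed form: (Rc - Ru) / (Rc + Ru), with R the minimal-function value."""
    return (r_constrained - r_unconstrained) / (r_constrained + r_unconstrained)

def terminate_by_r_ratio(r_unconstrained, r_constrained, cutoff=0.2):
    """Abandon the trial if the R ratio exceeds the cutoff (likely false minimum)."""
    return r_ratio(r_unconstrained, r_constrained) > cutoff

def terminate_by_peak_ratio(peak_heights, min_fraction=0.4):
    """Abandon the trial if the second peak is below a fraction of the highest peak."""
    ordered = sorted(peak_heights, reverse=True)
    return len(ordered) >= 2 and ordered[1] < min_fraction * ordered[0]
```

Both tests target the same symptom, a map dominated by one `uranium' peak whose refined phases look far better than the phases recomputed from the atomic model, which may explain why they usually reject the same trials.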
Recognizing false minima is, of course, only part of the battle. It is also necessary to find a real solution, and essentially 100% of the triclinic lysozyme trials were found to be false minima when the standard parameter-shift conditions of two 90° shifts were used. In fact, significant numbers of solutions occur only when single-shift angles in the range 140–170° are used (Fig. 16.1.8.1), and there is a surprisingly high success rate (percentage of trial structures that go to solutions) over a narrow range of angles centred about 157.5°. It is also not surprising that there is a correlated decrease in the percentage of false minima in the range 140–150°. This suggests that a fruitful strategy for structures that exhibit a large percentage of false minima would be the following. Run 100 or so trials at each of several shift angles in the range 90–180°, find the smallest angle which gives nearly zero false minima, and then use this angle as a single shift for many trials. Balhimycin is an example of a large non-P1 structure that also requires a parameter shift of around 154° to obtain a solution using the minimal function.
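The scanning strategy suggested above amounts to a simple selection rule, sketched below. The input format (a dictionary mapping each shift angle to the number of trials run and the number that ended in false minima), the function name and the 2% tolerance standing in for `nearly zero' are all assumptions made for illustration.

```python
def choose_shift_angle(scan_results, max_false_fraction=0.02):
    """Return the smallest shift angle whose false-minimum fraction is nearly zero.

    scan_results maps angle in degrees -> (n_trials, n_false_minima).
    Returns None if no scanned angle meets the tolerance.
    """
    for angle in sorted(scan_results):
        n_trials, n_false = scan_results[angle]
        if n_trials > 0 and n_false / n_trials <= max_false_fraction:
            return angle
    return None
```

With made-up scan counts such as `{90.0: (100, 100), 135.0: (100, 60), 157.5: (100, 1), 180.0: (100, 0)}`, the rule selects 157.5°, which would then be used as a single shift for the production run of many trials.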
The importance of the presence of several atoms heavier than oxygen for increasing the chance of obtaining a solution by SnB at resolutions less than 1.2 Å was noticed for truncated data from vancomycin and the 289-atom structure of conotoxin EpI (Weeks & Miller, 1999b). The results of SHELXD application to hirustasin are consistent with this (Usón et al., 1999). The 55-amino-acid protein hirustasin could be solved by SHELXD using either 1.2 Å low-temperature data or 1.4 Å room-temperature data; however, as shown in Fig. 16.1.8.2(a), the mean phase error (MPE) is significantly better for the 1.2 Å data over the whole resolution range. The MPE is determined primarily by the data-to-parameter ratio, which is reflected in the smaller number of reliable triplet invariants at lower resolution. Although small-molecule interpretation based on peak positions worked well for the 1.2 Å solution (overall ), standard protein chain tracing was required for the 1.4 Å solution (overall ). As is clear from the corresponding electron-density map (Fig. 16.1.8.2b), the Shake-and-Bake procedure produces easily interpreted protein density even when bonded atoms are barely resolved from each other. The hirustasin structure was also determined with SHELXD using 1.55 Å truncated data, and this endeavour currently holds the record for the lowest-resolution successful application of Shake-and-Bake.

(a) Mean phase error as a function of resolution for the two independent ab initio SHELXD solutions of the previously unsolved protein hirustasin. Either the 1.2 Å or the 1.4 Å native data set led to solution of the structure. (b) Part of the hirustasin molecule from the 1.4 Å room-temperature data after one round of B-value refinement with fixed coordinates.
The relative effects of accuracy, completeness and resolution on Shake-and-Bake success rates using SnB for three large P1 structures were studied by computing error-free data using the known atomic coordinates. The results of these studies, presented in Table 16.1.8.3, show that experimental error contributed nothing of consequence to the low success rates for vancomycin and lysozyme. However, completing the vancomycin data up to the maximum measured resolution of 0.97 Å resulted in a substantial increase in success rate which was further improved to an astounding success rate of 80% when the data were expanded to 0.85 Å.

On account of overload problems, the experimental vancomycin data did not include any data at 10 Å resolution or lower. A total of 4000 reflections were phased in the dual-space loop in the process of solving this structure with the experimental data. Some of these data were then replaced with the largest error-free magnitudes chosen from the missing reflections at several different resolution limits. The results in Table 16.1.8.4 show a tenfold increase in success rate when only 200 of the largest missing magnitudes were supplied, and it made no difference whether these reflections had a maximum resolution of 2.8 Å or were chosen randomly from the whole 0.97 Å sphere. The moral of this story is that, when collecting data for Shake-and-Bake, it pays to take a second pass using a shorter exposure to fill in the low-resolution data.

Variations in the computational details of the dual-space loop can make major differences in the efficacy of SnB and SHELXD. Recently, several strategies were combined in SHELXD and applied to a 148-atom P1 test structure (Karle et al., 1989) with the results shown in Fig. 16.1.8.3. The CPU time requirements of parameter-shift (PS) and tangent-formula expansion (TE) are similar, both being slower than no phase refinement (NR). In real space, the random-omit-map strategy (RO) was slightly faster than simple peak picking (PP) because fewer atoms were used in the structure-factor calculations. Both of these procedures were much faster than iterative peak-list optimization (PO). The original SHELXD algorithm (TE + PO) performs quite well in comparison with the SnB algorithm (PS + PP) in terms of the percentage of correct solutions, but less well when the efficiency is compared in terms of CPU time per solution. Surprisingly, the two strategies involving random omit maps (PS + RO and TE + RO), which had been calculated to give reference curves, are much more effective than the other algorithms, especially in terms of CPU efficiency. Indeed these two runs appear to approach a 100% success rate as the number of cycles becomes large. The combination of random omit maps and Karle-type tangent expansion appears to be even more effective (Fig. 16.1.8.4) for gramicidin A, a P2_{1}2_{1}2_{1} structure (Langs, 1988). It should be noted that conventional direct methods incorporating the tangent formula tend to perform better in P2_{1}2_{1}2_{1} than in P1, perhaps because there is less risk of a uranium-atom pseudosolution.

(a) Success rates and (b) cost effectiveness for several dual-space strategies as applied to a 148-atom P1 structure. The phase-refinement strategies are: (PS) parameter-shift reduction of the minimal-function value, (TE) Karle-type tangent expansion (holding the top 40% highest fixed) and (NR) no phase refinement but Sim (1959) weights applied in the E map (these depend on and so cannot be employed after phase refinement). The real-space strategies are: (PP) simple peak picking using peaks, (PO) peak-list optimization (reducing peaks to ), and (RO) random omit maps (also reducing peaks to ). A total of about 10 000 trials of 400 internal loop cycles each were used to construct this diagram.
Subsequent tests using SHELXD on several other structures have shown that the use of random omit maps is much more effective than picking the same final number of peaks from the top of the peak list. However, it should be stressed that it is the combination TE + RO that is particularly effective. A possible special case is when a very small number of atoms is sought (e.g. Se atoms from MAD data). Preliminary tests indicate that peak-list optimization (PO) is competitive in such cases because the CPU time penalty associated with it is much smaller than when many atoms are involved.
With hindsight, it is possible to understand why the random omit maps provide such an efficient search algorithm. In macromolecular structure refinement, it is standard practice to omit parts of the model that do not fit the current electron density well, to perform some refinement or simulated annealing (Hodel et al., 1992) on the rest of the model to reduce memory effects, and then to calculate a new weighted electron-density map (omit map). If the original features reappear in the new density, they were probably correct; in other cases the omit map may enable a new and better interpretation. Thus, random omit maps should not lead to the loss of an essentially correct solution, but enable efficient searching in other cases. It is also interesting to note that the results presented in Figs. 16.1.8.3 and 16.1.8.4 show that it is possible, albeit much less efficiently, to solve both structures using random omit maps without the use of any phase relationships based on probability theory (curves NR + RO).
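The difference between random omission and simple truncation of the peak list can be sketched as follows. This is an illustration under stated assumptions: peaks are represented as (height, x, y, z) tuples, the number of peaks retained is left to the caller, and the function names are hypothetical rather than taken from SHELXD.

```python
import random

def top_peaks(peaks, n_keep):
    """Simple peak picking: keep the n_keep highest peaks."""
    return sorted(peaks, key=lambda p: p[0], reverse=True)[:n_keep]

def random_omit(peaks, n_keep, rng=None):
    """Random omit: keep a random subset of n_keep peaks, whatever their heights."""
    if rng is None:
        rng = random.Random()  # pass a seeded Random for reproducible trials
    if n_keep >= len(peaks):
        return list(peaks)
    return rng.sample(peaks, n_keep)
```

Both functions return the same number of atoms for the next structure-factor calculation, so their cost per cycle is the same. The random version simply gives weaker but possibly correct peaks a chance to survive while some strong but possibly wrong peaks are dropped, which is the omit-map idea described above.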
The results shown in Table 16.1.8.4 and Fig. 16.1.8.3 indicate that success rates in space group P1 can be anomalously high. This suggests that it might be advantageous to expand all structures to P1 and then to locate the symmetry elements afterwards. However, this is more computationally expensive than performing the whole procedure in the true space group, and in practice such a strategy is only competitive in low-symmetry space groups such as , C2 or (Chang et al., 1997). Expansion to P1 also offers some opportunities for starting from `slightly better than random' phases. One possibility, successfully demonstrated by Sheldrick & Gould (1995), is to use a rotation search for a small fragment (e.g. a short piece of α-helix) to generate many sets of starting phases; after expansion to P1 the translational search usually required for molecular replacement is not needed. Various Patterson superposition minimum functions (Sheldrick & Gould, 1995; Pavelčík, 1994) can also provide an excellent start for phase determination for data expanded to P1. Drendel et al. (1995) were successful in solving small organic structures ab initio by a Fourier recycling method using data expanded to P1 without the use of probability theory.
It has been known for some time that conventional direct methods can be a valuable tool for locating the positions of heavy-atom substructures using isomorphous (Wilson, 1978) and anomalous (Mukherjee et al., 1989) difference structure factors. Experience has shown that successful substructure applications are highly dependent on the accuracy of the difference magnitudes. As the technology for producing selenomethionine-substituted proteins and collecting accurate multiple-wavelength (MAD) data has improved (Hendrickson & Ogata, 1997; Smith, 1998), there has been an increased need to locate many selenium sites. For larger structures (e.g. more than about 30 Se atoms), automated Patterson interpretation methods can be expected to run into difficulties since the number of unique peaks to be analysed increases with the square of the number of atoms. Experimentally measured difference data are an approximation to the data for the hypothetical substructure, and it is reasonable to expect that conventional direct methods might run into difficulties sooner when applied to such data. Dual-space direct methods provide a more robust foundation for handling such data, which are often extremely noisy. Dual-space methods also have the added advantage that the expected number of Se atoms, N_{u}, which is usually known, can be exploited directly by picking the top N_{u} peaks. Successful applications require great care in data processing, especially if the values resulting from a MAD experiment are to be used.
All successful applications of SnB to previously unknown SeMet data sets, as reported in Table 16.1.8.1, actually involved the use of peak-wavelength anomalous difference data . The amount of data available for substructure problems is much larger than for full-structure problems with a comparable number of atoms to be located. Consequently, the user can afford to be stringent in eliminating data with uncertain measurements. Guidelines for rejecting uncertain data have been suggested (Smith et al., 1998). Consideration should be limited to those data pairs [i.e., isomorphous pairs and anomalous pairs ] for which and where typically and . The final choice of maximum resolution to be used should be based on inspection of the spherical shell averages versus . The purpose of this precaution is to avoid spuriously large values for high-resolution data pairs measured with large uncertainties due to imperfect isomorphism or general fall-off of scattering intensity with increasing scattering angle. Only those for which (typically ) should be deemed sufficiently reliable for subsequent phasing. The probability of very large difference |E|'s (e.g. ) is remote, and data sets that appear to have many such measurements should be examined critically for measurement errors. If a few such data remain even after the adoption of rigorous rejection criteria, it may be best to eliminate them individually. A later paper (Blessing & Smith, 1999) elaborates further data-selection criteria.
On the other hand, it is also important that the phase:invariant ratio be maintained at 1:10 in order to ensure that the phases are overdetermined. Since the largest |E|'s for the substructure cell are more widely separated than they are in a true small-molecule cell, the relative number of possible triplets involving the largest reciprocal-lattice vectors may turn out to be too small. Consequently, a relatively small number of substructure phases (e.g. 10N_{u}) may not have a sufficient number (i.e., 100N_{u}) of invariants. Since the number of triplets increases rapidly with the number of reflections considered, the appropriate action in such cases is to increase the number of reflections as suggested in Table 16.1.7.1. This will typically produce the desired overdetermination.
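The bookkeeping behind the 1:10 phase:invariant ratio can be written down explicitly. The sketch below only restates the arithmetic of the example figures above (10N_{u} phases and 100N_{u} invariants); the function names and keyword defaults are assumptions for illustration, and the recommended reflection counts should still be taken from Table 16.1.7.1.

```python
def target_counts(n_sites, phases_per_site=10, invariants_per_phase=10):
    """Return (number of phases, number of triplet invariants) for n_sites atoms."""
    n_phases = phases_per_site * n_sites
    n_invariants = invariants_per_phase * n_phases
    return n_phases, n_invariants

def is_overdetermined(n_phases, n_invariants_available, ratio=10):
    """True if at least `ratio` invariants are available per phase."""
    return n_invariants_available >= ratio * n_phases
```

For example, `target_counts(30)` gives 300 phases and 3000 invariants for a 30-site substructure; if fewer triplets than that can be generated among the chosen reflections, the remedy described above is to increase the number of reflections and count again.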
It is rare for Se atoms to be closer to each other than 5 Å, and the application of SnB to AdoHcy data truncated to 4 and 5 Å has been successful. Success rates were less for lower-resolution data, but the CPU time required per trial was also reduced, primarily because much smaller Fourier grids were necessary. Consequently, there was no net increase in the CPU time needed to find a solution.
A special version of SHELXD is being developed that makes extensive use of the Patterson function both in generating starting atoms and in providing an independent figure of merit. It has already successfully located the anomalous scatterers in a number of structures using MAD data or simple anomalous differences. A recent example was the unexpected location of 17 anomalous scatterers (sulfur atoms and chloride ions) from the 1.5 Å-wavelength anomalous differences of tetragonal HEW lysozyme (Dauter et al., 1999).
The Shake-and-Bake approach has increased, by an order of magnitude, the size of structures solvable by direct methods. In addition, a routine application of the SnB program to peak-wavelength anomalous difference data has revealed 64 of the 70 Se sites in a selenomethionine-substituted protein (Deacon & Ealick, 1999). Although there is no indication that maximum size limitations have been reached, the fact that the reliability of invariant estimates is known to decrease with increasing structure size suggests that such limitations may exist; based on preliminary tests, it is conjectured that the limit is a few thousand unique atoms for conventional full-structure experiments. Thus, it is natural to wonder what can be done in situations where direct methods are not now routinely applicable. These cases include (1) macromolecules that lack heavy-atom or anomalous-scattering sites with sufficient phasing power for present techniques, (2) macromolecules for which no derivatives are available or for which selenium substitution is impossible, and (3) structures of any size which fail to diffract at sufficiently high resolution. `Sufficiently high' typically means about 1.2 Å in non-substructure situations.
The requirement for data to very high resolution is, of course, troublesome for macromolecules. One approach to lowering resolution requirements might be to replace the peak search by a search for small common fragments (e.g. the five atoms of a peptide unit or an aromatic residue). Furthermore, it should also be possible to integrate the wARP procedure (Lamzin & Wilson, 1993; Perrakis et al., 1997) into the real-space part of the Shake-and-Bake cycle. The Patterson function (Pavelčík, 1994; Sheldrick & Gould, 1995) and large Karle–Hauptman determinants (Vermin & de Graaff, 1978) might also improve the success rate in borderline cases by providing better-than-random starting coordinates or phases.
However, it is not necessarily true that peak picking is the primary limitation to lower-resolution applications. The lack of enough sufficiently accurate triplet-invariant values appears to be a more fundamental problem. Simulation experiments have shown that the SnB program can solve the crambin structure even at 2.0 Å if the invariants used are accurate enough (Weeks et al., 1998). Therefore, the primary breakdown of Shake-and-Bake occurs in reciprocal space and could likely be overcome if correct individual invariant values were used instead of the rather crude estimates provided by the Cochran (1955) distribution for the cosines of the triplet invariants. Individual invariant estimates, , can be accommodated by a modified tangent formula, or by a modified minimal function, where are appropriately chosen weights. Either of these relationships can serve as the basis for a modified Shake-and-Bake procedure.
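The role played by the invariant estimates can be illustrated with a small sketch. It assumes the usual form of the minimal function, a weighted mean-square difference between the cosines of the current triplet values and their expected values, with the Cochran expectation I_1(A)/I_0(A) for a triplet of weight A. The optional `cos_targets` argument, which substitutes individually estimated cosines, is an assumption about how such estimates might be accommodated and is not the modified formula referred to above; all names are hypothetical.

```python
import math

def bessel_ratio(a, terms=40):
    """I1(a)/I0(a) from the power series of the modified Bessel functions."""
    half = a / 2.0
    i0 = i1 = 0.0
    for k in range(terms):
        i0 += half ** (2 * k) / (math.factorial(k) ** 2)
        i1 += half ** (2 * k + 1) / (math.factorial(k) * math.factorial(k + 1))
    return i1 / i0

def minimal_function(triplets, cos_targets=None):
    """Weighted mean-square misfit between triplet cosines and their targets.

    triplets: list of (A, phi) with A the triplet weight and phi its current
    value in radians. cos_targets: optional per-triplet cosine estimates; if
    omitted, the Cochran expectation I1(A)/I0(A) is used.
    """
    numerator = denominator = 0.0
    for i, (a, phi) in enumerate(triplets):
        target = bessel_ratio(a) if cos_targets is None else cos_targets[i]
        numerator += a * (math.cos(phi) - target) ** 2
        denominator += a
    return numerator / denominator
```

With the Cochran targets, every triplet of a given weight is pushed towards the same cosine regardless of its true value, which is the crudeness mentioned above; supplying accurate individual targets removes that bias in this simplified picture.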
One approach to providing better invariant values is to estimate them individually from the known structure-factor magnitudes (|E|'s). Several methods for doing this have been proposed over the years for the small-molecule case (e.g. Hauptman et al., 1969; Langs, 1993), and this approach has met with limited success. In the macromolecular case, however, better options for estimating invariant values are available whenever supplemental information in the form of isomorphous-replacement or anomalous-dispersion data is provided. In addition, the development of multiple-beam diffraction raises the possibility of measuring invariant values experimentally. The modified tangent and minimal-function formulas provide the foundation for a unified treatment of all such supplemental information.
The integration of traditional direct methods with isomorphous replacement was initiated by Hauptman (1982a), who studied the conditional probability distribution of triplet invariants comprised jointly of native and derivative phases assuming as known the six magnitudes associated with reciprocal-lattice vectors H, K and −H − K. It was shown that many triplets, whose true values were near either 0 or π, could be identified and reliably estimated. Later it was shown that cosine estimates could be obtained anywhere in the range −1 to +1 (Fortier et al., 1985). In a series of six recent papers, Giacovazzo and collaborators utilized a combined direct-methods/isomorphous-replacement approach, with limited success, to devise procedures for the ab initio solution of the phase problem for macromolecules (Giacovazzo, Siliqi & Ralph, 1994; Giacovazzo, Siliqi & Spagna, 1994; Giacovazzo, Siliqi & Zanotti, 1995; Giacovazzo & Platas, 1995; Giacovazzo, Siliqi & Platas, 1995; Giacovazzo et al., 1996). Their methods depend only on diffraction data for a pair of isomorphous structures and do not require any prior structural knowledge. Hu & Liu (1997) have generalized the earlier work to obtain the conditional distribution of the general (n-phase) structure invariant when diffraction data are available for any number (m) of isomorphous structures. Finally, it has been shown that, provided the heavy-atom substructure is known, Hauptman's triplet distribution leads to unique values for the triplets and the individual phases (Langs et al., 1995).
In a manner analogous to the SIR case, Hauptman (1982b) derived the conditional probability distribution for triplet invariants given six magnitudes in the presence of anomalous dispersion. It was shown that unique estimates, lying anywhere in the whole interval 0–2π, could be obtained for the triplet values. This result was unanticipated since all earlier work had led to the conclusion that a twofold ambiguity in the value of an individual phase was intrinsic to the SAS approach. Later, it was demonstrated how the probabilistic estimates led to individual phases by means of a system of SAS tangent equations (Hauptman, 1996). Although the initial application of this tangent-based approach to the previously known macromomycin structure (750 non-H protein atoms plus 150 solvent molecules) was encouraging, it has not yet been applied to unknown macromolecules.
The conditional probability distributions of the quartet invariants, in both the SIR and SAS cases, have been derived based on corresponding difference structure factors rather than on the individual structure factors themselves (Kyriakidis et al., 1996). Fan and his collaborators (Fan et al., 1984; Fan & Gu, 1985; Fan et al., 1990; Sha et al., 1995; Zheng et al., 1996) have also extensively studied the use of direct methods in the SAS case. Applications to the known small protein avian pancreatic polypeptide at 2 Å revealed the essential features of the molecule. The direct-methods approach was used to break the phase ambiguity for core streptavidin and azurin II (proteins of moderate size) using SAS data at 3 Å. Although the direct-methods maps in these cases did not reveal the structures, the phases were good enough to serve as successful starting points for solvent flattening.
Recent experimental work in the field of multiple-beam diffraction provides grounds for hope that a generally applicable solution to the problem of obtaining individual invariant values can be found. It has been shown that triplet invariants can be measured for lysozyme with a mean error of approximately 20° (Weckert et al., 1993; Weckert & Hümmer, 1997). In addition, direct methods strengthened by simulated triplet invariants have been used to redetermine the structure of BPTI at resolutions as low as 2.0 Å (Mathiesen & Mo, 1997, 1998). Currently, the one-at-a-time methods used to measure triplet phases seriously limit practical applications, but faster methods of data collection have been proposed (Shen, 1998). If the means can, in fact, be found for measuring significant numbers of triplet phases quickly and accurately, dual-space direct methods may become routinely applicable to much lower resolution data than is currently possible.
Acknowledgements
The development, in Buffalo, of the Shake-and-Bake algorithm and the SnB program has been supported by grants GM-46733 from NIH and ACI-9721373 from NSF, and computing time from the Center for Computational Research at SUNY Buffalo. HAH, CMW and RM would also like to thank the following individuals: Chun-Shi Chang, Ashley Deacon, George DeTitta, Adam Fass, Steve Gallo, Hanif Khalak, Andrew Palumbo, Jan Pevzner, Thomas Tang and Hongliang Xu, who have aided the development of SnB, and Steve Ealick, P. Lynne Howell, Patrick Loll, Jennifer Martin and Gil Privé, who have generously supplied data sets. The development, in Göttingen, of SHELXD has been supported by HCM Institutional Grant ERB CHBG CT 940731 from the European Commission. GMS and IU wish to thank Thammarat Aree, Zbigniew Dauter, Judith Flippen-Anderson, Carlos Frazão, Jörg Kärcher, Katrin Gessler, Håkon Hope, Victor Lamzin, David Langs, Lukatz Lebioda, Paolo Lubini, Peer Mittl, Emilio Parisini, Erich Paulus, Ehmke Pohl, Thierry Prange, Joe Reibenspiess, Martina Schäfer, Thomas Schneider, Markus Teichert, László Vértesy and Martin Walsh for discussions and/or generously providing data for structures referred to in this manuscript. The authors would also like to thank Melda Tugac, Gloria Del Bel and Sandra Finken, who assisted in the preparation of the manuscript.
References
Anderson, D. H., Weiss, M. S. & Eisenberg, D. (1996). A challenging case for protein crystal structure determination: the mating pheromone Er-1 from Euplotes raikovi. Acta Cryst. D52, 469–480.
Aree, T., Usón, I., Schulz, B., Reck, G., Hoier, H., Sheldrick, G. M. & Saenger, W. (1999). Variation of a theme: crystal structure with four octakis(2,3,6-tri-O-methyl)-gamma-cyclodextrin molecules hydrated differently by a total of 19.3 water. J. Am. Chem. Soc. 121, 3321–3327.
Baggio, R., Woolfson, M. M., Declercq, J.-P. & Germain, G. (1978). On the application of phase relationships to complex structures. XVI. A random approach to structure determination. Acta Cryst. A34, 883–892.
Beurskens, P. T. (1981). A statistical interpretation of rotation and translation functions in reciprocal space. Acta Cryst. A37, 426–430.
Bhuiya, A. K. & Stanley, E. (1963). The refinement of atomic parameters by direct calculation of the minimum residual. Acta Cryst. 16, 981–984.
Blessing, R. H. (1997). LOCSCL: a program to statistically optimize local scaling of single-isomorphous-replacement and single-wavelength-anomalous-scattering data. J. Appl. Cryst. 30, 176–177.
Blessing, R. H., Guo, D. Y. & Langs, D. A. (1996). Statistical expectation value of the Debye–Waller factor and E(hkl) values for macromolecular crystals. Acta Cryst. D52, 257–266.
Blessing, R. H. & Smith, G. D. (1999). Difference structure-factor normalization for heavy-atom or anomalous-scattering substructure determinations. J. Appl. Cryst. 32, 664–670.
Bricogne, G. (1998). Bayesian statistical viewpoint on structure determination: basic concepts and examples. Methods Enzymol. 276, 361–423.
Burla, M. C., Camalli, M., Cascarano, G., Giacovazzo, C., Polidori, G., Spagna, R. & Viterbo, D. (1989). SIR88 – a direct-methods program for the automatic solution of crystal structures. J. Appl. Cryst. 22, 389–393.
Chang, C.-S., Weeks, C. M., Miller, R. & Hauptman, H. A. (1997). Incorporating tangent refinement in the Shake-and-Bake formalism. Acta Cryst. A53, 436–444.
Cochran, W. (1955). Relations between the phases of structure factors. Acta Cryst. 8, 473–478.
Dauter, Z., Dauter, M., de La Fortelle, E., Bricogne, G. & Sheldrick, G. M. (1999). Can anomalous signal of sulfur become a tool for solving protein crystal structures? J. Mol. Biol. 289, 83–92.
Dauter, Z., Sieker, L. C. & Wilson, K. S. (1992). Refinement of rubredoxin from Desulfovibrio vulgaris at 1.0 Å with and without restraints. Acta Cryst. B48, 42–59.
Deacon, A. M. & Ealick, S. E. (1999). Selenium-based MAD phasing: setting the sites on larger structures. Structure, 7, R161–R166.
Deacon, A. M., Weeks, C. M., Miller, R. & Ealick, S. E. (1998). The Shake-and-Bake structure determination of triclinic lysozyme. Proc. Natl Acad. Sci. USA, 95, 9284–9289.
Debaerdemaeker, T., Tate, C. & Woolfson, M. M. (1985). On the application of phase relationships to complex structures. XXIV. The Sayre tangent formula. Acta Cryst. A41, 286–290.
Debaerdemaeker, T. & Woolfson, M. M. (1983). On the application of phase relationships to complex structures. XXII. Techniques for random phase refinement. Acta Cryst. A39, 193–196.
Debaerdemaeker, T. & Woolfson, M. M. (1989). On the application of phase relationships to complex structures. XXVIII. XMY as a random approach to the phase problem. Acta Cryst. A45, 349–353.
DeTitta, G. T., Edmonds, J. W., Langs, D. A. & Hauptman, H. (1975). Use of the negative quartet cosine invariants as a phasing figure of merit: NQEST. Acta Cryst. A31, 472–479.
DeTitta, G. T., Weeks, C. M., Thuman, P., Miller, R. & Hauptman, H. A. (1994). Structure solution by minimal-function phase refinement and Fourier filtering. I. Theoretical basis. Acta Cryst. A50, 203–210.
Drendel, W. B., Dave, R. D. & Jain, S. (1995). Forced coalescence phasing: a method for ab initio determination of crystallographic phases. Proc. Natl Acad. Sci. USA, 92, 547–551.
Drouin, M. (1998). Personal communication.
Ekstrom, J. L., Mathews, I. I., Stanley, B. A., Pegg, A. E. & Ealick, S. E. (1999). The crystal structure of human S-adenosylmethionine decarboxylase at 2.25 Å resolution reveals a novel fold. Structure, 7, 583–595.
Fan, H.-F. & Gu, Y.-X. (1985). Combining direct methods with isomorphous replacement or anomalous scattering data. III. The incorporation of partial structure information. Acta Cryst. A41, 280–284.
Fan, H.-F., Han, F.-S. & Qian, J.-Z. (1984). Combining direct methods with isomorphous replacement or anomalous scattering data. II. The treatment of errors. Acta Cryst. A40, 495–498.
Fan, H.-F., Hao, Q., Gu, Y.-X., Qian, J.-Z., Zheng, C.-D. & Ke, H. (1990). Combining direct methods with isomorphous replacement or anomalous scattering data. VII. Ab initio phasing of one-wavelength anomalous scattering data from a small protein. Acta Cryst. A46, 935–939.
Fortier, S., Moore, N. J. & Fraser, M. E. (1985). A direct-methods solution to the phase problem in the single isomorphous replacement case: theoretical basis and initial applications. Acta Cryst. A41, 571–577.
Frazão, C., Sieker, L., Sheldrick, G. M., Lamzin, V., LeGall, J. & Carrondo, M. A. (1999). Ab initio structure solution of a dimeric cytochrome c3 from Desulfovibrio gigas containing disulfide bridges. J. Biol. Inorg. Chem. 4, 162–165.
Fujinaga, M. & Read, R. J. (1987). Experiences with a new translation-function program. J. Appl. Cryst. 20, 517–521.
Germain, G., Main, P. & Woolfson, M. M. (1970). On the application of phase relationships to complex structures. II. Getting a good start. Acta Cryst. B26, 274–285.
Germain, G. & Woolfson, M. M. (1968). On the application of phase relationships to complex structures. Acta Cryst. B24, 91–96.
Gessler, K., Usón, I., Takaha, T., Krauss, N., Smith, S. M., Okada, S., Sheldrick, G. M. & Saenger, W. (1999). V-Amylose at atomic resolution: X-ray structure of a cycloamylose with 26 glucoses. Proc. Natl Acad. Sci. USA, 96, 4246–4251.
Giacovazzo, C. (1976). A probabilistic theory of the cosine invariant . Acta Cryst. A32, 91–99.
Giacovazzo, C. (2001). Direct methods. In International tables for crystallography, Vol. B. Reciprocal space, edited by U. Shmueli, ch. 2.2. Dordrecht: Kluwer Academic Publishers.
Giacovazzo, C. & Platas, J. G. (1995). The ab initio crystal structure solution of proteins by direct methods. IV. The use of the partial structure. Acta Cryst. A51, 398–404.
Giacovazzo, C., Siliqi, D. & Platas, J. G. (1995). The ab initio crystal structure solution of proteins by direct methods. V. A new normalizing procedure. Acta Cryst. A51, 811–820.
Giacovazzo, C., Siliqi, D., Platas, J. G., Hecht, H.-J., Zanotti, G. & York, B. (1996). The ab initio crystal structure solution of proteins by direct methods. VI. Complete phasing up to derivative resolution. Acta Cryst. D52, 813–825.
Giacovazzo, C., Siliqi, D. & Ralph, A. (1994). The ab initio crystal structure solution of proteins by direct methods. I. Feasibility. Acta Cryst. A50, 503–510.
Giacovazzo, C., Siliqi, D. & Spagna, R. (1994). The ab initio crystal structure solution of proteins by direct methods. II. The procedure and its first applications. Acta Cryst. A50, 609–621.
Giacovazzo, C., Siliqi, D. & Zanotti, G. (1995). The ab initio crystal structure solution of proteins by direct methods. III. The phase extension process. Acta Cryst. A51, 177–188.
Hauptman, H. (1974). On the theory and estimation of the cosine invariants . Acta Cryst. A30, 822–829.
Hauptman, H. (1975). A new method in the probabilistic theory of the structure invariants. Acta Cryst. A31, 680–687.
Hauptman, H. (1982a). On integrating the techniques of direct methods and isomorphous replacement. I. The theoretical basis. Acta Cryst. A38, 289–294.
Hauptman, H. (1982b). On integrating the techniques of direct methods with anomalous dispersion. I. The theoretical basis. Acta Cryst. A38, 632–641.
Hauptman, H., Fisher, J., Hancock, H. & Norton, D. A. (1969). Phase determination for the estriol structure. Acta Cryst. B25, 811–814.
Hauptman, H. A. (1991). A minimal principle in the phase problem. In Crystallographic computing 5: from chemistry to biology, edited by D. Moras, A. D. Podjarny & J. C. Thierry, pp. 324–332. Oxford: International Union of Crystallography and Oxford University Press.
Hauptman, H. A. (1996). The SAS maximal principle: a new approach to the phase problem. Acta Cryst. A52, 490–496.
Hauptman, H. A. & Karle, J. (1953). Solution of the phase problem. I. The centrosymmetric crystal. Am. Crystallogr. Assoc. Monograph No. 3. Dayton, Ohio: Polycrystal Book Service.
Hauptman, H. A., Xu, H., Weeks, C. M. & Miller, R. (1999). Exponential Shake-and-Bake: theoretical basis and applications. Acta Cryst. A55, 891–900.
Hendrickson, W. A. & Ogata, C. M. (1997). Phase determination from multiwavelength anomalous diffraction measurements. Methods Enzymol. 276, 494–523.
Hodel, A., Kim, S.-H. & Brünger, A. T. (1992). Model bias in macromolecular crystal structures. Acta Cryst. A48, 851–858.
Hu, N.-H. & Liu, Y.-S. (1997). General expression for probabilistic estimation of multiphase structure invariants in the case of a native protein and multiple derivatives. Application to estimates of the three-phase structure invariants. Acta Cryst. A53, 161–167.
Karle, I. L., Flippen-Anderson, J. L., Uma, K., Balaram, H. & Balaram, P. (1989). α-Helix and mixed 3_{10}/α-helix in cocrystallized conformers of Boc-Aib-Val-Aib-Aib-Val-Val-Val-Aib-Val-Aib-OMe. Proc. Natl Acad. Sci. USA, 86, 765–769.
Karle, J. (1968). Partial structural information combined with the tangent formula for non-centrosymmetric crystals. Acta Cryst. B24, 182–186.
Karle, J. & Hauptman, H. (1956). A theory of phase determination for the four types of non-centrosymmetric space groups 1P222, 2P22, 3P_{1}2, 3P_{2}2. Acta Cryst. 9, 635–651.
Kinneging, A. J. & de Graaff, R. A. G. (1984). On the automatic extension of incomplete models by iterative Fourier calculation. J. Appl. Cryst. 17, 364–366.
Kyriakidis, C. E., Peschar, R. & Schenk, H. (1996). The estimation of four-phase structure invariants using the single difference of isomorphous structure factors. Acta Cryst. A52, 77–87.
Lamzin, V. S. & Wilson, K. S. (1993). Automatic refinement of protein models. Acta Cryst. D49, 129–147.
Langs, D. A. (1988). Three-dimensional structure at 0.86 Å of the uncomplexed form of the transmembrane ion channel peptide gramicidin A. Science, 241, 188–191.
Langs, D. A. (1993). Frequency statistical method for evaluating cosine invariants of three-phase relationships. Acta Cryst. A49, 545–557.
Langs, D. A., Guo, D.-Y. & Hauptman, H. A. (1995). TDSIR phasing: direct use of phase-invariant distributions in macromolecular crystallography. Acta Cryst. A51, 535–542.
Li, C., Kappock, T. J., Stubbe, J., Weaver, T. M. & Ealick, S. E. (1999). X-ray crystal structure of aminoimidazole ribonucleotide synthetase (PurM), from the Escherichia coli purine biosynthetic pathway at 2.5 Å resolution. Structure, 7, 1155–1166.
Loll, P. J., Bevivino, A. E., Korty, B. D. & Axelsen, P. H. (1997). Simultaneous recognition of a carboxylate-containing ligand and an intramolecular surrogate ligand in the crystal structure of an asymmetric vancomycin dimer. J. Am. Chem. Soc. 119, 1516–1522.
Loll, P. J., Miller, R., Weeks, C. M. & Axelsen, P. H. (1998). A ligand-mediated dimerization mode for vancomycin. Chem. Biol. 5, 293–298.
McCourt, M. P., Ashraf, K., Miller, R., Weeks, C. M., Li, N., Pangborn, W. A. & Dorset, D. L. (1997). X-ray crystal structures of cytotoxic oxidized cholesterols: 7-ketocholesterol and 25-hydroxycholesterol. J. Lipid Res. 38, 1014–1021.
McCourt, M. P., Li, N., Pangborn, W., Miller, R., Weeks, C. M. & Dorset, D. L. (1996). Crystallography of linear molecule binary solids. X-ray structure of a cholesteryl myristate/cholesteryl pentadecanoate solid solution. J. Phys. Chem. 100, 9842–9847.
Main, P. (1976). Recent developments in the MULTAN system – the use of molecular structure. In Crystallographic computing techniques, edited by F. R. Ahmed, pp. 97–105. Copenhagen: Munksgaard.
Main, P., Fiske, S. J., Hull, S. E., Lessinger, L., Germain, G., Declercq, J.-P. & Woolfson, M. M. (1980). MULTAN80: a system of computer programs for the automatic solution of crystal structures from X-ray diffraction data. Universities of York, England, and Louvain, Belgium.
Mathiesen, R. H. & Mo, F. (1997). Application of known triplet phases in the crystallographic study of bovine pancreatic trypsin inhibitor. I: studies at 1.55 and 1.75 Å resolution. Acta Cryst. D53, 262–268.
Mathiesen, R. H. & Mo, F. (1998). Application of known triplet phases in the crystallographic study of bovine pancreatic trypsin inhibitor. II: study at 2.0 Å resolution. Acta Cryst. D54, 237–242.
Matthews, B. W. & Czerwinski, E. W. (1975). Local scaling: a method to reduce systematic errors in isomorphous replacement and anomalous scattering measurements. Acta Cryst. A31, 480–497.
Miller, R., DeTitta, G. T., Jones, R., Langs, D. A., Weeks, C. M. & Hauptman, H. A. (1993). On the application of the minimal principle to solve unknown structures. Science, 259, 1430–1433.
Miller, R., Gallo, S. M., Khalak, H. G. & Weeks, C. M. (1994). SnB: crystal structure determination via Shake-and-Bake. J. Appl. Cryst. 27, 613–621.
Mukherjee, A. K., Helliwell, J. R. & Main, P. (1989). The use of MULTAN to locate the positions of anomalous scatterers. Acta Cryst. A45, 715–718.
Parisini, E., Capozzi, F., Lubini, P., Lamzin, V., Luchinat, C. & Sheldrick, G. M. (1999). Ab initio solution and refinement of two high potential iron protein structures at atomic resolution. Acta Cryst. D55, 1773–1784.
Pavelčík, F. (1994). Patterson-oriented automatic structure determination. Deconvolution techniques in space group P1. Acta Cryst. A50, 467–474.
Perrakis, A., Sixma, T. K., Wilson, K. S. & Lamzin, V. S. (1997). wARP: improvement and extension of crystallographic phases by weighted averaging of multiple-refined dummy atomic models. Acta Cryst. D53, 448–455.
Privé, G. G., Anderson, D. H., Wesson, L., Cascio, D. & Eisenberg, D. (1999). Packed protein bilayers in the 0.9 Å resolution structure of a designed alpha helical bundle. Protein Sci. 8, 1400–1409.
Radfar, R., Shin, R., Sheldrick, G. M., Minor, W., Lovell, C. R., Odom, J. D., Dunlap, R. B. & Lebioda, L. (2000). The crystal structure of N10-formyltetrahydrofolate synthetase from Moorella thermoacetica. Biochemistry, 39, 3920–3926.
Read, R. J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Cryst. A42, 140–149.
Refaat, L. S. & Woolfson, M. M. (1993). Direct-space methods in phase extension and phase determination. II. Developments of low-density elimination. Acta Cryst. D49, 367–371.
Reibenspiess, J. (1998). Personal communication.
Schäfer, M. (1998). Personal communication.
Schäfer, M. & Prange, T. (1998). Personal communication.
Schäfer, M., Schneider, T. R. & Sheldrick, G. M. (1996). Crystal structure of vancomycin. Structure, 4, 1509–1515.
Schäfer, M., Sheldrick, G. M., Bahner, I. & Lackner, H. (1998). Crystal structures of actinomycin D and Z3. Angew. Chem. 37, 2381–2384.
Schäfer, M., Sheldrick, G. M., Schneider, T. R. & Vértesy, L. (1998). Structure of balhimycin and its complex with solvent molecules. Acta Cryst. D54, 175–183.
Schenk, H. (1974). On the use of negative quartets. Acta Cryst. A30, 477–481.
Schneider, T. R. (1998). Personal communication.
Schneider, T. R., Kärcher, J., Pohl, E., Lubini, P. & Sheldrick, G. M. (2000). Ab initio structure determination of the lantibiotic mersacidin. Acta Cryst. D56, 705–713.
Sha, B.-D., Liu, S.-P., Gu, Y.-X., Fan, H.-F., Ke, H., Yao, J.-X. & Woolfson, M. M. (1995). Direct phasing of one-wavelength anomalous-scattering data of the protein core streptavidin. Acta Cryst. D51, 342–346.
Sheldrick, G. M. (1982). Crystallographic algorithms for mini- and maxi-computers. In Computational crystallography, edited by D. Sayre, pp. 506–514. Oxford: Clarendon Press.
Sheldrick, G. M. (1990). Phase annealing in SHELX90: direct methods for larger structures. Acta Cryst. A46, 467–473.
Sheldrick, G. M. (1997). Direct methods based on real/reciprocal space iteration. In Proceedings of the CCP4 study weekend. Recent advances in phasing, edited by K. S. Wilson, G. Davies, A. S. Ashton & S. Bailey, pp. 147–158. DL-CONF-97-001. Warrington: Daresbury Laboratory.
Sheldrick, G. M. (1998). SHELX: applications to macromolecules. In Direct methods for solving macromolecular structures, edited by S. Fortier, pp. 401–411. Dordrecht: Kluwer Academic Publishers.
Sheldrick, G. M., Dauter, Z., Wilson, K. S., Hope, H. & Sieker, L. C. (1993). The application of direct methods and Patterson interpretation to high-resolution native protein data. Acta Cryst. D49, 18–23.
Sheldrick, G. M. & Gould, R. O. (1995). Structure solution by iterative peaklist optimization and tangent expansion in space group P1. Acta Cryst. B51, 423–431.
Shen, Q. (1998). Solving the phase problem using reference-beam X-ray diffraction. Phys. Rev. Lett. 80, 3268–3271.
Shiono, M. & Woolfson, M. M. (1992). Direct-space methods in phase extension and phase determination. I. Low-density elimination. Acta Cryst. A48, 451–456.
Shmueli, U. & Wilson, A. J. C. (2001). Statistical properties of the weighted reciprocal lattice. In International tables for crystallography, Vol. B. Reciprocal space, edited by U. Shmueli, ch. 2.1. Dordrecht: Kluwer Academic Publishers.
Sim, G. A. (1959). The distribution of phase angles for structures containing heavy atoms. II. A modification of the normal heavy-atom method for non-centrosymmetrical structures. Acta Cryst. 12, 813–815.
Smith, G. D., Blessing, R. H., Ealick, S. E., Fontecilla-Camps, J. C., Hauptman, H. A., Housset, D., Langs, D. A. & Miller, R. (1997). Ab initio structure determination and refinement of a scorpion protein toxin. Acta Cryst. D53, 551–557.
Smith, G. D., Nagar, B., Rini, J. M., Hauptman, H. A. & Blessing, R. H. (1998). The use of SnB to determine an anomalous scattering substructure. Acta Cryst. D54, 799–804.
Smith, J. L. (1998). Multiwavelength anomalous diffraction in macromolecular crystallography. In Direct methods for solving macromolecular structures, edited by S. Fortier, pp. 211–225. Dordrecht: Kluwer Academic Publishers.
Stec, B., Zhou, R. & Teeter, M. M. (1995). Full-matrix refinement of the protein crambin at 0.83 Å and 130 K. Acta Cryst. D51, 663–681.
Teichert, M. (1998). Personal communication.
Turner, M. A., Yuan, C.-S., Borchardt, R. T., Hershfield, M. S., Smith, G. D. & Howell, P. L. (1998). Structure determination of selenomethionyl S-adenosylhomocysteine hydrolase using data at a single wavelength. Nature Struct. Biol. 5, 369–375.
Usón, I., Sheldrick, G. M., de La Fortelle, E., Bricogne, G., di Marco, S., Priestle, J. P., Grütter, M. G. & Mittl, P. R. E. (1999). The 1.2 Å crystal structure of hirustasin reveals the intrinsic flexibility of a family of highly disulphide bridged inhibitors. Structure, 7, 55–63.
Vermin, W. J. & de Graaff, R. A. G. (1978). The use of Karle–Hauptman determinants in small-structure determinations. Acta Cryst. A34, 892–894.
Walsh, M. A., Schneider, T. R., Sieker, L. C., Dauter, Z., Lamzin, V. S. & Wilson, K. S. (1998). Refinement of triclinic hen egg-white lysozyme at atomic resolution. Acta Cryst. D54, 522–546.
Wang, B.C. (1985). Solvent flattening. Methods Enzymol. 115, 90–112.
Weckert, E. & Hümmer, K. (1997). Multiple-beam X-ray diffraction for physical determination of reflection phases and its applications. Acta Cryst. A53, 108–143.
Weckert, E., Schwegle, W. & Hümmer, K. (1993). Direct phasing of macromolecular structures by three-beam diffraction. Proc. R. Soc. Lond. Ser. A, 442, 33–46.
Weeks, C. M., DeTitta, G. T., Hauptman, H. A., Thuman, P. & Miller, R. (1994). Structure solution by minimal-function phase refinement and Fourier filtering. II. Implementation and applications. Acta Cryst. A50, 210–220.
Weeks, C. M., DeTitta, G. T., Miller, R. & Hauptman, H. A. (1993). Applications of the minimal principle to peptide structures. Acta Cryst. D49, 179–181.
Weeks, C. M., Hauptman, H. A., Chang, C.-S. & Miller, R. (1994). Structure determination by Shake-and-Bake with tangent refinement. ACA Trans. Symp. 30, 153–161.
Weeks, C. M., Hauptman, H. A., Smith, G. D., Blessing, R. H., Teeter, M. M. & Miller, R. (1995). Crambin: a direct solution for a 400-atom structure. Acta Cryst. D51, 33–38.
Weeks, C. M. & Miller, R. (1999a). The design and implementation of SnB version 2.0. J. Appl. Cryst. 32, 120–124.
Weeks, C. M. & Miller, R. (1999b). Optimizing Shake-and-Bake for proteins. Acta Cryst. D55, 492–500.
Weeks, C. M., Miller, R. & Hauptman, H. A. (1998). Extending the resolving power of Shake-and-Bake. In Direct methods for solving macromolecular structures, edited by S. Fortier, pp. 463–468. Dordrecht: Kluwer Academic Publishers.
White, P. S. & Woolfson, M. M. (1975). The application of phase relationships to complex structures. VII. Magic integers. Acta Cryst. A31, 53–56.
Wilson, K. S. (1978). The application of MULTAN to the analysis of isomorphous derivatives in protein crystallography. Acta Cryst. B34, 1599–1608.
Yao, J.-X. (1981). On the application of phase relationships to complex structures. XVIII. RANTAN – random MULTAN. Acta Cryst. A37, 642–644.
Zheng, X.-F., Fan, H.-F., Hao, Q., Dodd, F. E. & Hasnain, S. S. (1996). Direct method structure determination of the native azurin II protein using one-wavelength anomalous scattering data. Acta Cryst. D52, 937–941.