International
Tables for
Crystallography
Volume F
Crystallography of biological macromolecules
Edited by E. Arnold, D. M. Himmel and M. G. Rossmann

International Tables for Crystallography (2012). Vol. F, ch. 16.3, pp. 437-442
https://doi.org/10.1107/97809553602060000852

Chapter 16.3. Ab initio phasing of low-resolution Fourier syntheses

V. Y. Lunin,a A. G. Urzhumtsevb and A. Podjarnyc*

aLaboratory of Macromolecular Crystallography, Institute of Mathematical Problems of Biology of the Russian Academy of Sciences, Institutskaia, 4 Region Russia, Pushchino, Moscow Region 142290, Russian Federation, bFaculty of Sciences, University of Nancy 1, Vandoeuvre-lès-Nancy, 54506, France, and cStructural Biology, IGBMC, BP 163 Cedex, Illkirch, 67404, France
Correspondence e-mail: apodjarny@gmail.com

Low-resolution phasing addresses the cases where experimental X-ray diffraction intensities are only available to a low-resolution limit, or when the standard phasing methods to solve macromolecular structures fail. Ab initio phasing is based on general properties of macromolecular objects (connectivity, electron-density histograms, likelihood of molecular masks etc.) and does not require extra diffraction experiments. The Monte Carlo phasing procedure includes generation of a large `population' of trial solutions, enrichment of this population by filtering with selection criteria, clustering and averaging. The results of low-resolution phasing allow one to get information on the packing of particles in a crystal and on the shape (envelope) of the molecules, and to get an insight into the architecture of multidomain complexes.

16.3.1. Introduction

Low-resolution macromolecular structural information may be obtained for non-crystalline macromolecular objects by electron microscopy (see Chapter 19.6 and references therein) or small-angle X-ray scattering (see Chapter 19.3 and references therein), or by using X-ray free-electron lasers (Chapman et al., 2011; Seibert et al., 2011) and iterative density-reconstruction techniques (Sayre, 2008). While giving similar structural information, low-resolution crystallographic images are a more natural starting point on the way towards atomic models. The need to work at low resolution may be due to the limited diffraction power of the crystals or the failure of standard phasing methods.

A number of techniques have been suggested to estimate the phases of structure factors using various complementary sources of information, specific to a given crystal:

  • (i) Phasing by isomorphous replacement or anomalous scattering techniques using heavy atoms or their clusters may lead to good results, ranging from the pioneering structures reported at nearly 6 Å resolution (Green et al., 1954; Perutz et al., 1960; Black et al., 1962) to viruses (for example, Harrison & Jack, 1975) and ribosomes (Ban et al., 1998). However, working at low resolution requires special effort, high data quality and several sets of structure-factor magnitudes.

  • (ii) Molecular-replacement-type low-resolution searches can be carried out using conventional models represented more or less coarsely (Valegård et al., 1991; Jamrog et al., 2003), with simplified models or with molecular envelopes (Jack et al., 1975; Rayment et al., 1982; Urzhumtsev & Podjarny, 1995; Hao, 2006). The simplified models, roughly reproducing molecular shapes obtained previously, may be a spherical shell for viruses (for example, Johnson et al., 1976; Chapman et al., 1992), a sphere or a cylindrical shell for proteins and their complexes (Podjarny et al., 1987; Harris, 1995; Andersson & Hovmöller, 1996; Lunin et al., 2001), or several cylinders for α-helical proteins (Kalinin, 1980; Strop et al., 2007).

  • (iii) Phasing techniques can be based on the different average diffraction power of the proteins, nucleic acids and bulk solvent, naturally or artificially introduced (Bragg & Perutz, 1952; Roth et al., 1984; Carter et al., 1990; Fourme et al., 1995; Shepard et al., 2000). These techniques require several crystals and their corresponding diffraction data sets.

  • (iv) In some cases, the classical direct phasing methods can be formally tried to obtain a molecular envelope at low resolution (Podjarny et al., 1981; Carter et al., 1990). Similarly, the phase triplets obtained experimentally by a three-beam diffraction experiment lead in some cases to relatively low-resolution images (Hölzer et al., 2000).

In this chapter we discuss only ab initio phasing methods. By this term we mean a mathematical or computational procedure to estimate the values of the structure-factor phases using a single set of structure-factor magnitudes and only a general type of information, in contrast with most of the methods mentioned above. Low-resolution ab initio phasing estimates the phase values of the structure factors of the lowest resolution for a given crystal, from several tens to several hundreds of reflections in total. Depending on the size of the unit cell, a low-resolution Fourier synthesis calculated with these structure factors can show molecular positions in the unit cell, molecular envelopes (at a resolution dmin ≃ 15 Å or lower) or secondary-structure elements (when dmin ≃ 5–6 Å).

16.3.2. General features of low-resolution images

Low-resolution ab initio phasing is usually a laborious procedure. Some features of low-resolution Fourier syntheses increase the difficulties, as follows:

  • (i) There is a common belief that low-resolution Fourier syntheses represent molecular envelopes when the cutoff level is relatively low and the centres of the molecules when this level is high. In practice, density peaks are often shifted from molecular centres toward regions of close intermolecular contact. Bulk-solvent correction decreases the contrast between two neighbouring peaks and can result in their merging. Also, due to the relatively small number of reflections used in the calculation, a small change in phase can significantly modify the image and the positions of the peaks.

  • (ii) Low-resolution envelopes cannot represent sharp molecular features accurately enough and thus cannot cover all macromolecular atoms, even when the synthesis is calculated with the exact structure-factor values.

  • (iii) An increase in the resolution of Fourier maps does not always help to interpret them. For example, an increase in resolution from 16–25 Å to 10–12 Å often makes maps even less suitable for visual inspection, since they stop showing molecular envelope features but do not reveal secondary-structure elements.

  • (iv) Usually, visual inspection of low-resolution maps does not allow one to choose the correct enantiomer. In addition, the overall features of the flipped map −ρ(r) at very low resolution are often very similar to those of the direct map ρ(r). This might complicate the choice of the correct sign of the map.

  • (v) Low-resolution Fourier maps are very sensitive to missing reflections, even if their number is small.

  • (vi) Maps calculated with too few reflections may show a superposition of images corresponding to different choices of the origin.

  • (vii) The determination of twinning and the correct choice of space group are especially complicated at low resolution.

16.3.3. Low-resolution phasing

If additional experimental information cannot be used to solve the phase problem, then the search for the correct structure has to be based on some general features of the true phase set. Such features can be formulated as a selection criterion (`score function', `figure of merit' etc.) that may have a qualitative or a quantitative form (Gilmore, 2000; Lunin, Lunina, Petrova et al., 2000). In the following, we use the term `selection criterion' to reflect a quality of the phase set as a whole, and we reserve the term `figure of merit' to estimate the accuracy of an individual phase.

Unfortunately, no selection criterion suggested so far identifies the true phase set unambiguously (Lunin, Lunina et al., 2002). Therefore, the phase problem cannot be solved by a simple search for the best value of a selection criterion. Nevertheless, it has been shown that many criteria improve the ensemble of phase sets by retaining good phase sets, i.e. those close to the true values. In other words, when the starting ensemble of phase sets (e.g. randomly generated) is reduced to those phase sets with a reasonable value of a selection criterion, the percentage of good phase sets increases. Therefore, a multifiltering cyclic phasing procedure was suggested (Lunin et al., 1990):

  • (i) A set of reflections Swork is chosen for phasing.

  • (ii) A large number of random phase sets are generated either directly or using models.

  • (iii) The generated ensemble of phase sets is filtered by applying different selection rules.

  • (iv) The phase sets that fit all selection rules (the selected phase sets) form the output of the cycle.

The output is used to produce estimates of individual phase values (averaged or median) and to modify the mode of generation of random phases in the next cycle. The cycle is repeated several times, varying the phase-generating mode and the selection rules, extending the set of reflections Swork to be phased etc.

A key feature of this procedure is that at every cycle we do not search for the phase set with the best possible fit to the selection rules, but we filter out obviously poor phase sets.
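The cycle described above can be sketched in a few lines of Python. This is only an illustrative skeleton under stated assumptions: the names `uniform_generator`, `run_cycle` and `toy_rule` are hypothetical, and the placeholder rule stands in for a real selection criterion, which would score a Fourier map or a model.

```python
import math
import random

def uniform_generator(n_reflections):
    """Return a function that draws n_reflections independent uniform phases."""
    def generate(rng):
        return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_reflections)]
    return generate

def run_cycle(generate, selection_rules, n_trials, rng):
    """One cycle: generate trial phase sets and keep those that pass every rule."""
    selected = []
    for _ in range(n_trials):
        phases = generate(rng)
        # Filter out obviously poor sets; no attempt is made to find the best one.
        if all(rule(phases) for rule in selection_rules):
            selected.append(phases)
    return selected

def toy_rule(phases):
    """Placeholder rule (an assumption for illustration only)."""
    return math.cos(phases[0]) > 0.0

selected = run_cycle(uniform_generator(5), [toy_rule], 1000, random.Random(0))
```

In a real application the selected sets would then be aligned, clustered and averaged, and the resulting phase statistics would replace `uniform_generator` in the next cycle.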

16.3.4. Phase generation and selection

16.3.4.1. Generation of the phase sets

This step defines how the configuration space of all phase sets is explored. First, all phases may be considered as independent variables. At the beginning of phasing when no information is available, all values are considered equally probable. When some phase information for a reflection is available, for example in the form of a probability distribution, it may be used to generate new random phases, taking into account the restrictions of the space-group symmetry. Second, the phases may be calculated through models. For example, a phase set may be calculated from a model composed of a small number of large Gaussian spheres (`globs'), the coordinates of which are now considered as primary random variables (Lunin et al., 1995; Guo et al., 2000). The sphere coordinates may be generated uniformly in the whole unit cell at the very beginning of phasing and inside a molecular mask later, when some phase information becomes available. The use of models parameterizes the phase space and thus reduces the search. At the same time, if the models are inappropriate, they cannot generate a good approximation to the correct solution.

A further development of the second approach is a multiple-model-based molecular-replacement search (Buehler et al., 2009). In this case, the output of the procedure is no longer the rigid-body parameters of the search model, as in traditional molecular replacement, but a phase set.

16.3.4.2. Search targets – overview

The two ways of generating phase variants lead to two different types of search targets. First, the phase selection can be applied at the level of the Fourier maps. For each trial phase set, a Fourier map can be calculated using experimental magnitudes. Then, the selection rule checks whether it has the features of a correct macromolecular Fourier synthesis or not. This approach has the advantage that it avoids model-related problems. Second, if the phase sets are calculated from a model, then the corresponding calculated structure-factor magnitudes are also available. Their similarity to the experimental ones can be used to select the associated phase set.

16.3.4.3. Histogram of a Fourier synthesis

The histogram of a Fourier synthesis ρ(r) indicates which values are present and how frequently these values appear in the synthesis (Lunin, 1988, 1993; Harrison, 1988; Luzzati et al., 1988; Zhang & Main, 1990). To present the histogram numerically, the interval (ρmin, ρmax) of the possible values of ρ(r) is divided into K equal parts (bins) and for every bin the frequency

\[ \nu_k = n_k / N, \quad k = 1, \ldots, K, \tag{16.3.4.1} \]

is calculated. Here, N is the total number of grid points at which ρ(r) is calculated and $n_k$ is the number of grid points with ρ(r) values belonging to the kth bin. The set of frequencies $\{\nu_k\}_{k=1}^{K}$ is called the histogram corresponding to the function ρ(r). It depends on the resolution of the synthesis and is sensitive to phase errors. The `standard histogram' $\{\nu_k^{\rm exact}\}_{k=1}^{K}$ at a particular resolution corresponds to the Fourier synthesis calculated with the observed magnitudes and the correct phases. It can be predicted before the phases are determined (Zhang & Main, 1990; Lunin & Skovoroda, 1991). When the standard histogram is known, it can be used to select appropriate Fourier syntheses. For example, a selection criterion may be defined as the correlation of the standard histogram with that from the Fourier synthesis calculated with the observed magnitudes and trial phases. Similar low-resolution phasing ideas have been used in protein electron crystallography (Dorset, 2000).

16.3.4.4. Map connectivity

The quality of phase sets may also be judged by the topological properties of regions of high electron density, e.g. their connectivity. A visual inspection of continuous regions in Fourier maps and the absence of noisy peaks were used for many years to estimate map quality. A formal scheme of the use of connectivity for phase improvement was discussed by Baker et al. (1993).

These ideas can be incorporated into low-resolution phasing as follows (Lunin, Lunina & Urzhumtsev, 2000). For a Fourier synthesis, ρ(r), calculated with the observed magnitudes and trial phases, each trial phase set is associated with a mask region

\[ \Omega_\kappa = \left\{ \mathbf{r} : \rho(\mathbf{r}) > \kappa \right\} \tag{16.3.4.2} \]

composed of the grid points with the highest values of ρ(r). The simplest characteristics of the region are the number of connected components in the region $\Omega_\kappa$ and their size. If the synthesis resolution is low and the cutoff level is high enough, it is expected that the region $\Omega_\kappa$ will consist of a small number of globs corresponding to isolated molecules. The correspondence between the connected components in the $\Omega_\kappa$ mask and the desired number (for example, a known number of molecules in the unit cell) may be used as a connectivity-based selection criterion. More sophisticated criteria can be introduced.
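The simplest connectivity characteristics can be computed as in the following sketch. It assumes a map sampled on a regular grid covering one unit cell (a nested list `rho[i][j][k]`), periodic boundary conditions and six-neighbour connectivity; these choices and the function name are assumptions made for illustration.

```python
from collections import deque

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def component_sizes(rho, kappa):
    """Sizes of the connected components of {r: rho(r) > kappa}, largest first."""
    nx, ny, nz = len(rho), len(rho[0]), len(rho[0][0])
    mask = {(i, j, k)
            for i in range(nx) for j in range(ny) for k in range(nz)
            if rho[i][j][k] > kappa}
    seen = set()
    sizes = []
    for start in mask:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:                      # breadth-first flood fill
            i, j, k = queue.popleft()
            size += 1
            for di, dj, dk in NEIGHBOURS:
                # The modulo wraps neighbours across the unit-cell faces.
                nb = ((i + di) % nx, (j + dj) % ny, (k + dk) % nz)
                if nb in mask and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)
```

A selection rule could then require, for example, that `len(component_sizes(rho, kappa))` equals the expected number of molecules in the unit cell and that the components have similar sizes.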

16.3.4.5. Few-atoms model approach

At low resolution, the content of a macromolecular crystal may be represented by a set of large isotropic Gaussian scatterers (pseudo-atoms). The number of scatterers necessary for appropriate modelling depends on the molecular shape and the resolution. Their size, represented by their isotropic displacement parameter (B factor), can be estimated at the first step of the procedure. The coordinates of the spheres can be generated randomly and uniformly in the unit cell. For an advanced search, when the molecular envelope is already known, the spheres can only be generated inside this envelope. More complicated generation rules can be applied as well. For a generated model, a set of complex structure factors is calculated and their magnitudes Fcalc(s) are compared with the experimental magnitudes Fobs(s) (Lunin et al., 1995). When these values are close enough, the corresponding phase set is selected for further analysis.
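The following sketch illustrates the few-atoms calculation under simplifying assumptions that are not taken from the source: space group P1, a cubic cell of edge `cell_edge`, identical pseudo-atoms of unit weight, and the Pearson correlation of magnitudes as the measure of closeness. All function names are hypothetical.

```python
import cmath
import math
import random

def structure_factors(atoms, hkls, b_factor, cell_edge):
    """Complex structure factors of equal Gaussian pseudo-atoms (cubic P1 cell).

    atoms: fractional coordinates (x, y, z); hkls: Miller indices (h, k, l).
    """
    result = []
    for h, k, l in hkls:
        s_squared = (h * h + k * k + l * l) / cell_edge ** 2   # |s|^2 = 1/d^2
        damping = math.exp(-b_factor * s_squared / 4.0)        # Gaussian smearing
        total = sum(cmath.exp(2j * math.pi * (h * x + k * y + l * z))
                    for x, y, z in atoms)
        result.append(damping * total)
    return result

def magnitude_correlation(f_calc, f_obs):
    """Pearson correlation between |F_calc| and the observed magnitudes."""
    a = [abs(f) for f in f_calc]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(f_obs) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, f_obs))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in f_obs)
    return cov / math.sqrt(var_a * var_b)

def random_model(n_atoms, rng):
    """Pseudo-atom centres drawn uniformly in the unit cell."""
    return [(rng.random(), rng.random(), rng.random()) for _ in range(n_atoms)]
```

For a model that passes the magnitude test, the trial phases are obtained as `cmath.phase(f)` for each calculated structure factor.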

In the simplest case, when the molecule is approximated by a single sphere, a systematic search (instead of a random one) of the position of this sphere in the unit cell becomes possible (Jack et al., 1975; Podjarny et al., 1987; Harris, 1995; Andersson & Hovmöller, 1996). On the other hand, some attempts to obtain ab initio molecular envelopes when working with many pseudo-atoms have also been reported (Subbiah, 1991, 1993; David & Subbiah, 1994).

16.3.4.6. Likelihood-based selection

Statistical likelihood estimates the probability of reproducing experimental values within a framework of a statistical hypothesis. For example, the likelihood of a trial mask region may be estimated as the probability of obtaining the values of the observed magnitudes when placing atoms randomly and uniformly into the mask and calculating the structure factors from such a random model (Lunin et al., 1998):

\[ L = \mathrm{Probability}\left\{ \left\{ F^{\rm calc}(\mathbf{s}) \right\} \text{ are close to } \left\{ F^{\rm obs}(\mathbf{s}) \right\} \right\}, \quad \mathbf{s} \in S_{\rm test}. \tag{16.3.4.3} \]

The most reasonable masks are those with the highest probability. In an advanced approach, more general prior probability distributions pprior(r) for atomic coordinates are considered, and a search is conducted for the one that maximizes the probability given by equation (16.3.4.3) (Bricogne & Gilmore, 1990). Such an approach may be used to select a mask (or prior) from several alternatives.

The likelihood given by equation (16.3.4.3) becomes a selection criterion for phase sets if every trial phase set is associated with a mask or more general prior. A mask may be constructed as a region of highest values in a Fourier synthesis calculated with observed magnitudes and trial phases (Petrova et al., 2000). A more advanced prior distribution may be associated with trial phases. For example, a search for the most featureless distribution (maximum entropy) among all the distributions that are consistent with the trial phases and observed magnitudes has been proposed (Bricogne & Gilmore, 1990). The likelihood given by equation (16.3.4.3) may be calculated in a straightforward Monte Carlo-type procedure (Petrova et al., 2000) or with the use of an analytical approximation of the likelihood function (Lunin & Urzhumtsev, 1984; Bricogne & Gilmore, 1990). It should be noted that the set of reflections Stest used to calculate the likelihood given by equation (16.3.4.3) is generally different from the set of reflections Swork for trial phases.

16.3.4.7. Binary functions

Binary integer programming (BIP) is an effective approach for solving a system of linear inequalities in binary unknowns (0 or 1). Crystallographers are often interested in the study of a region with density values above a certain level, i.e. in a binary function representing this region.

The unknown phase values are linked to the observed magnitudes by nonlinear equations, but the equations may be reduced to linear ones if the phase values are restricted to a grid (Lunin, Urzhumtsev & Bockmayr, 2002). On the other hand, the use of binary representations of electron density and phases introduces additional errors in the calculations, reducing the advantages of the BIP approach. Another approach for obtaining binary masks at a grid uses genetic algorithms (Webster & Hilgenfeld, 2001).

16.3.4.8. Common features of selection criteria

All the targets reported above (as well as some others tried in order to select phase sets) show similar problems when applied at low resolution (Lunin, Lunina, Petrova et al., 2000):

  • (i) The best value of a target may correspond to quite a poor phase set.

  • (ii) The target value for a good phase set may be significantly worse than the best target value.

  • (iii) Local refinement of model parameters or phases may lead to significant improvement of the target value without any improvement in the phases.

Owing to these problems, attempts to find the solution of the low-resolution phase problem by minimizing (or maximizing) a target are unreliable. At the same time, the selection of phase sets from a random ensemble by an appropriate target significantly increases the fraction of good ones.

16.3.5. Processing of the output

The output of the selection procedure may be presented as a two-dimensional table. In this table, each column represents one selected phase set and each row corresponds to one structure factor, represented by the phase value in different selected phase sets. If a selected phase set is considered as a point in multidimensional configuration space of all phase sets, the output becomes a `cloud' in this space. A definition of the distance between two points in this space is required to study the details of the distribution of these points (see Section 16.3.5.5).

16.3.5.1. Formal comparison of phase sets

Most frequently, two quantities are used to compare phase sets. One is the mean phase error,

\[ D_\varphi = \frac{1}{M} \sum_{\mathbf{s} \in S} \left| \varphi_1(\mathbf{s}) - \varphi_2(\mathbf{s}) \right|, \tag{16.3.5.1} \]

which takes into account the phase values only. The second quantity is the map correlation coefficient (Lunin & Woolfson, 1993),

\[ \begin{aligned} C_\varphi &= \frac{\int \left[ \rho_1(\mathbf{r}) - \langle \rho_1 \rangle \right] \left[ \rho_2(\mathbf{r}) - \langle \rho_2 \rangle \right] \, {\rm d}V_{\mathbf{r}}}{\left\{ \int \left[ \rho_1(\mathbf{r}) - \langle \rho_1 \rangle \right]^2 {\rm d}V_{\mathbf{r}} \int \left[ \rho_2(\mathbf{r}) - \langle \rho_2 \rangle \right]^2 {\rm d}V_{\mathbf{r}} \right\}^{1/2}} \\ &= \frac{\sum_{\mathbf{s} \in S} F^{\rm obs}(\mathbf{s})^2 \cos\left[ \varphi_1(\mathbf{s}) - \varphi_2(\mathbf{s}) \right]}{\sum_{\mathbf{s} \in S} F^{\rm obs}(\mathbf{s})^2}, \end{aligned} \tag{16.3.5.2} \]

which weights the phase differences by the squared structure-factor magnitudes. Here, M is the number of reflections in the studied set S, ϕ1(s) and ϕ2(s) are phase values corresponding to reflection s in two phase sets, ρ1(r) and ρ2(r) are Fourier syntheses calculated using the observed magnitudes and phases to be compared, and the sums in reciprocal space are calculated without the F(0) term. Each of these quantities has its own disadvantage. The first is influenced by a number of weak reflections with poor phases that do not strongly affect the Fourier map but do significantly increase the value of Dϕ. On the other hand, Cϕ is influenced mostly by a small number of very strong reflections and is not so sensitive to errors in the phases of other structure factors. Sometimes, additional weighting in equations (16.3.5.1) and (16.3.5.2) or statistical calculations in resolution shells are applied to attenuate these effects.
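The two measures can be sketched as follows. The function names are hypothetical, phases are assumed to be in radians, and the phase difference in the mean phase error is wrapped to the interval from −π to π before taking the absolute value, which is an assumption made here so that, for example, phases of 1° and 359° count as 2° apart.

```python
import math

def wrap(delta):
    """Map a phase difference to the interval (-pi, pi]."""
    return math.atan2(math.sin(delta), math.cos(delta))

def mean_phase_error(phi1, phi2):
    """Unweighted mean absolute phase difference, as in equation (16.3.5.1)."""
    return sum(abs(wrap(a - b)) for a, b in zip(phi1, phi2)) / len(phi1)

def map_correlation(f_obs, phi1, phi2):
    """Reciprocal-space form of the map correlation, equation (16.3.5.2)."""
    num = sum(f * f * math.cos(a - b) for f, a, b in zip(f_obs, phi1, phi2))
    den = sum(f * f for f in f_obs)
    return num / den

# One strong reflection phased correctly, one weak reflection phased wrongly.
f_obs = [10.0, 1.0]
d = mean_phase_error([0.0, 0.0], [0.0, math.pi])        # pi / 2
c = map_correlation(f_obs, [0.0, 0.0], [0.0, math.pi])  # 99 / 101
```

The small example shows the different sensitivities discussed above: the weak reflection with a wrong phase raises the mean phase error to 90° while the map correlation stays close to 1.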

If a collection of phase sets is considered as a cloud in multidimensional space, a formal distance may be defined on the basis of equation (16.3.5.2) as

\[ \begin{aligned} d_\varphi &= \left\{ \frac{\int \left[ \rho_1(\mathbf{r}) - \rho_2(\mathbf{r}) \right]^2 {\rm d}V_{\mathbf{r}}}{\int \left[ \rho_1(\mathbf{r}) \right]^2 {\rm d}V_{\mathbf{r}}} \right\}^{1/2} \\ &= \left\{ \frac{\int \left[ \rho_1(\mathbf{r}) - \rho_2(\mathbf{r}) \right]^2 {\rm d}V_{\mathbf{r}}}{\int \left[ \rho_2(\mathbf{r}) \right]^2 {\rm d}V_{\mathbf{r}}} \right\}^{1/2} \\ &= \left[ 2\left( 1 - C_\varphi \right) \right]^{1/2}, \end{aligned} \tag{16.3.5.3} \]

varying from 0 to 2.
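The last equality in equation (16.3.5.3) can be checked in a few lines. The sketch below assumes, as stated for equation (16.3.5.2), that both syntheses are calculated with the same observed magnitudes and without the F(0) term, so that both maps have zero mean and, by Parseval's theorem, the same integral of the squared density.

```latex
\begin{aligned}
\int (\rho_1 - \rho_2)^2 \, {\rm d}V_{\mathbf{r}}
  &= \int \rho_1^2 \, {\rm d}V_{\mathbf{r}} + \int \rho_2^2 \, {\rm d}V_{\mathbf{r}}
     - 2 \int \rho_1 \rho_2 \, {\rm d}V_{\mathbf{r}} \\
  &= 2 \int \rho_1^2 \, {\rm d}V_{\mathbf{r}} \, (1 - C_\varphi),
  \qquad \text{since} \quad
  \int \rho_1^2 \, {\rm d}V_{\mathbf{r}} = \int \rho_2^2 \, {\rm d}V_{\mathbf{r}},
  \quad
  C_\varphi = \frac{\int \rho_1 \rho_2 \, {\rm d}V_{\mathbf{r}}}{\int \rho_1^2 \, {\rm d}V_{\mathbf{r}}} .
\end{aligned}
```

Dividing by the integral of the squared density and taking the square root gives the stated result. The limiting values are 0 for identical maps (Cϕ = 1), √2 for uncorrelated maps (Cϕ = 0) and 2 for maps of opposite sign (Cϕ = −1).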

These formal measures of closeness are convenient when comparing phase sets with small differences but may be misleading at the initial stages of ab initio phasing, as discussed below.

16.3.5.2. Phase ambiguity and alignment of phase sets

Two randomly generated phase sets may result in Fourier maps ρ1(r) and ρ2(r) that differ only by an origin shift t permitted for the given space group: ρ2(r) = ρ1(r − t). These two maps present the same object, they are calculated with the same experimental magnitudes and they are indistinguishable by the criteria discussed above. Therefore, they could both appear in the output of the selection procedure. However, the corresponding phases in these phase sets are different,

\[ \varphi_2(\mathbf{s}) = \varphi_1(\mathbf{s}) + 2\pi(\mathbf{s}, \mathbf{t}), \tag{16.3.5.4} \]

and averaging such phase sets, or superposing the corresponding maps, without accounting for the shift gives meaningless results. To compare and average these phase sets correctly they (or the corresponding maps) should be aligned (Lunin et al., 1990; Hašek & Schenk, 1992; Lunin & Lunina, 1996). This alignment may be done on the basis of any formal measure of phase closeness, for example equations (16.3.5.1) or (16.3.5.2). It searches for the space-group-permitted shift of the second synthesis that results in the smallest phase difference between the two sets.

The enantiomer density transformation ρ(r) → ρ(−r) conserves the features of the Fourier maps used in the selection criteria, conserves the magnitudes of the structure factors and substitutes the phases ϕ by −ϕ. Only high-resolution images might be able to resolve the ambiguity and reveal the `handedness' at later steps through the features of the secondary-structure elements. At the initial stage of phasing, the two enantiomer solutions may be considered equivalent and the enantiomer change ϕ → −ϕ may be added as an allowed operation in the alignment process.

Finally, a density transformation ρ(r) → −ρ(r) preserves the magnitudes, changes the phases by π and flips the envelopes for a given cutoff level. In order to assess which of the two molecular envelopes is more probable, ρ(r) or −ρ(r), a generalized maximum-likelihood technique can be used (Lunin et al., 1998).

The problem of phase alignment may be solved to some extent by fixing in advance the phase values for a small number (up to four, depending on the space group) of specially chosen reflections (reflections fixing the origin and/or the enantiomer). Nevertheless, for large phase sets the influence of these reflections may be too small to produce a reliable alignment.

In the following, when we discuss the closeness of two phase sets, we always suppose that they have already been aligned.
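The alignment described in this section can be sketched as an exhaustive search over candidate origin shifts and, optionally, the enantiomer change. In this illustration the list of permitted shifts is supplied by the caller (it depends on the space group and is not derived here), the map correlation of equation (16.3.5.2) is used as the measure of closeness, and the function names are hypothetical.

```python
import math

def map_correlation(f_obs, phi1, phi2):
    """Reciprocal-space map correlation, equation (16.3.5.2)."""
    num = sum(f * f * math.cos(a - b) for f, a, b in zip(f_obs, phi1, phi2))
    return num / sum(f * f for f in f_obs)

def align(f_obs, hkls, phi_ref, phi, shifts, allow_enantiomer=True):
    """Return (correlation, shift, sign, aligned phases) closest to phi_ref.

    shifts: candidate origin shifts t in fractional coordinates.
    sign = -1 means that the enantiomer change phi -> -phi was applied.
    """
    best = None
    for sign in ((1, -1) if allow_enantiomer else (1,)):
        for t in shifts:
            # Equation (16.3.5.4): an origin shift t adds 2*pi*(s, t) to each phase.
            trial = [sign * p + 2.0 * math.pi * (h * t[0] + k * t[1] + l * t[2])
                     for p, (h, k, l) in zip(phi, hkls)]
            c = map_correlation(f_obs, phi_ref, trial)
            if best is None or c > best[0]:
                best = (c, t, sign, trial)
    return best
```

Only the phase set being aligned is modified; the reference set is left unchanged, so the same routine can be applied to every member of a population in turn.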

16.3.5.3. Multiple alignment and `central' phase set

The problem of alignment becomes more complicated when applied to several phase sets simultaneously. The choice of a `central' point of reference seems to be a reasonable solution, provided it minimizes the dispersion of the phase sets. To do so, let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N\}$ be a `cloud' of selected phase sets considered as points in multidimensional configuration space. The point v* is defined as the centre of the cloud if it minimizes the size of the cloud after all phase sets have been aligned to it:

\[ Q(\mathbf{v}^*) = \sum_{n=1}^{N} d_\varphi^2(\mathbf{v}_n, \mathbf{v}^*) \Rightarrow \min, \quad \mathbf{v}^* \in \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N\}, \tag{16.3.5.5} \]

or

\[ Q_1(\mathbf{v}^*) = \sum_{n=1}^{N} d_\varphi(\mathbf{v}_n, \mathbf{v}^*) \Rightarrow \min. \tag{16.3.5.6} \]

This phase set may be used as the reference point to align the phase sets or as a current solution of the phase problem.
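Given a matrix of pairwise distances, the choice of the central phase set reduces to a one-line search. The sketch below assumes that `dist[i][n]` already holds the distance between set n and set i after set n has been aligned to set i; the function name is hypothetical.

```python
def central_index(dist, squared=True):
    """Index of the phase set minimizing Q (squared=True) or Q1 (squared=False)."""
    power = 2 if squared else 1
    return min(range(len(dist)), key=lambda i: sum(d ** power for d in dist[i]))
```

With `squared=True` the function minimizes the criterion of equation (16.3.5.5), and with `squared=False` that of equation (16.3.5.6); the latter is less sensitive to a few distant outliers.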

16.3.5.4. Phase averaging and assignment of the probability distribution

The simplest way to process the selected phase sets is to average them. For every reflection s from the set S, the best phase ϕbest(s) and its figure of merit m(s) are calculated as

\[ m(\mathbf{s}) \exp\left[ i \varphi^{\rm best}(\mathbf{s}) \right] = \frac{1}{N} \sum_{n=1}^{N} \exp\left[ i \varphi_n(\mathbf{s}) \right], \tag{16.3.5.7} \]

where ϕn(s) is the phase in the nth selected set corresponding to the reflection s, and N is the number of selected sets. The figure of merit

\[ m(\mathbf{s}) = \frac{1}{N} \sum_{n=1}^{N} \cos\left[ \varphi_n(\mathbf{s}) - \varphi^{\rm best}(\mathbf{s}) \right] \tag{16.3.5.8} \]

reflects the divergence of the phase values of the reflection s in different selected phase sets.
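Equation (16.3.5.7) amounts to averaging unit vectors in the complex plane, as in the following sketch. It assumes that the phase sets have already been aligned and are stored as lists of phases in radians, one list per selected set; the function name is hypothetical.

```python
import cmath

def average_phases(phase_sets):
    """Return a list of (best phase, figure of merit), one pair per reflection."""
    n_sets = len(phase_sets)
    result = []
    for phases in zip(*phase_sets):          # all values for one reflection
        mean = sum(cmath.exp(1j * p) for p in phases) / n_sets
        # The modulus of the mean unit vector is m(s), its argument is the best phase.
        result.append((cmath.phase(mean), abs(mean)))
    return result
```

When the phases of a reflection are scattered uniformly, the figure of merit is close to zero and the best phase is essentially arbitrary, so such reflections carry little weight in the averaged synthesis.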

The output of the phasing cycle may also be used to derive an approximate unimodal distribution,

\[ \begin{aligned} P(\varphi) &= \frac{1}{I_0(T)} \exp\left[ T \cos(\varphi - \theta) \right] \\ &= \Omega(A, B) \exp\left[ A \cos\varphi + B \sin\varphi \right], \end{aligned} \tag{16.3.5.9} \]

(also called `von Mises', `circular normal', `Sim' etc.), with θ = ϕbest, I1(T)/I0(T) = m, or a bimodal distribution,

\[ P(\varphi) = \Omega(A, B, C, D) \exp\left[ A \cos\varphi + B \sin\varphi + C \cos 2\varphi + D \sin 2\varphi \right] \tag{16.3.5.10} \]

(Hendrickson & Lattman, 1970). These distributions may be used to generate phases at the next cycle of phasing.
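For the unimodal case, the parameter T can be found from the figure of merit by solving I1(T)/I0(T) = m numerically, and new phases can then be drawn with the von Mises sampler of the Python standard library. The sketch below is an illustration only: the series evaluation of the Bessel functions, the bisection and the upper limit `t_max` are choices made here, and the function names are hypothetical.

```python
import math
import random

def bessel_ratio(t, terms=200):
    """I1(t) / I0(t) from the power series of the modified Bessel functions."""
    half = t / 2.0
    term = 1.0                    # (t/2)^(2k) / (k!)^2, starting at k = 0
    i0 = i1 = 0.0
    for k in range(terms):
        i0 += term
        i1 += term * half / (k + 1)
        term *= half * half / ((k + 1) ** 2)
    return i1 / i0

def concentration(m, t_max=50.0):
    """Solve I1(T)/I0(T) = m for T by bisection (T is capped at t_max)."""
    lo, hi = 0.0, t_max
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if bessel_ratio(mid) < m:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sample_phase(phi_best, m, rng):
    """Draw one phase (radians) from the distribution of equation (16.3.5.9)."""
    return rng.vonmisesvariate(phi_best % (2.0 * math.pi), concentration(m))
```

A figure of merit close to zero gives a nearly uniform distribution, so reflections that are still undetermined continue to be sampled over the whole circle in the next cycle.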

16.3.5.5. Cluster analysis

Accurate processing of the selected phase sets may involve methods of cluster analysis (Lunin et al., 1990, 1995; Buehler et al., 2009). Cluster analysis is a branch of applied mathematics aimed at classifying a set of points in a multidimensional space into several compact groups called clusters (or classes), so that the points inside a particular cluster are close to each other while different clusters are distant in space. Methods of cluster analysis use the matrix of point-to-point distances as input information. If the cluster analysis shows that the selected population can be split into several significantly different clusters, then the averaging is performed in each cluster separately, resulting in several alternatives for the solution of the phase problem and requiring multisolution strategies.
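As an illustration of clustering from a distance matrix, the sketch below groups phase sets by single linkage with a fixed distance cutoff. The choice of single linkage, the cutoff and the function name are assumptions made for this example; the text does not prescribe a particular clustering method.

```python
def threshold_clusters(dist, cutoff):
    """Single-linkage clusters: sets closer than cutoff end up in one cluster."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)     # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned list of indices can then be averaged separately, giving one candidate solution per cluster.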

16.3.6. Conclusions and examples

The study of large complexes has renewed research interest in low-resolution phasing, since the corresponding crystals often diffract only to low resolution. Ab initio low-resolution phasing can give shape information in the unit cell and, in favourable cases, secondary-structure information. This information can be combined with other imaging techniques to obtain a reliable description of the structure.

The low-resolution maps for the ribosomal T50S particle obtained as early as the mid-1990s (Volkmann et al., 1995; Urzhumtsev et al., 1996; see also Lunin, Lunina et al., 2002), for the low-density lipoprotein particle (Lunin et al., 2001) and for some smaller macromolecules (Fokine et al., 2003; Müller et al., 2006) are some examples. In favourable cases, such images show not only the molecular envelope but also the secondary structure (Lunina et al., 2003). At the same time, the current ab initio low-resolution phasing procedures are not yet completely automated and require significant human intervention.

References

Andersson, K. M. & Hovmöller, S. (1996). Phasing proteins at low resolution. Acta Cryst. D52, 1174–1180.
Baker, D., Krukowski, A. E. & Agard, D. A. (1993). Uniqueness and the ab initio phase problem in macromolecular crystallography. Acta Cryst. D49, 186–192.
Ban, N., Freeborn, B., Nissen, P., Penczek, P., Grassucci, R. A., Sweet, R., Frank, J., Moore, P. B. & Steitz, T. A. (1998). A 9 Å resolution X-ray crystallographic map of the large ribosomal subunit. Cell, 93, 1105–1115.
Black, C. C. F., Fenn, R. H., North, A. C. T., Phillips, D. C. & Poljak, R. J. (1962). Structure of lysozyme. A Fourier map of the electron density at 6 Å resolution obtained by X-ray diffraction. Nature (London), 196, 1173–1178.
Bragg, W. L. & Perutz, M. F. (1952). The external form of the haemoglobin molecule. Acta Cryst. 5, 277–283.
Bricogne, G. & Gilmore, C. J. (1990). A multisolution method of phase determination by combined maximization of entropy and likelihood. I. Theory, algorithms and strategy. Acta Cryst. A46, 284–297.
Buehler, A., Urzhumtseva, L., Lunin, V. Y. & Urzhumtsev, A. (2009). Cluster analysis for phasing with molecular replacement: a feasibility study. Acta Cryst. D65, 644–650.
Carter, C. W., Crumley, K. V., Coleman, D. E., Hage, F. & Bricogne, G. (1990). Direct phase determination for the molecular envelope of tryptophanyl-tRNA synthetase from Bacillus stearothermophilus by X-ray contrast variation. Acta Cryst. A46, 57–68.
Chapman, H. N., Fromme, P. & Spence, J. C. H. (2011). Femtosecond X-ray protein nanocrystallography. Nature (London), 470, 73–77.
Chapman, M. S., Tsao, J. & Rossmann, M. G. (1992). Ab initio phase determination for spherical viruses: parameter determination for spherical-shell models. Acta Cryst. A48, 301–312.
David, P. R. & Subbiah, S. (1994). Low-resolution real-space envelopes: the application of the condensing-protocol approach to the ab initio macromolecular phase problem of a variety of examples. Acta Cryst. D50, 132–138.
Dorset, D. L. (2000). Low-resolution direct phase determination in protein electron crystallography – breaking globular constraints. Acta Cryst. A56, 529–535.
Fokine, A., Morales, R., Contreras-Martel, C., Carpentier, P., Renault, F., Rochu, D. & Chabriere, E. (2003). Direct phasing at low resolution of a protein copurified with human paraoxonase (PON1). Acta Cryst. D59, 2083–2087.
Fourme, R., Shepard, W., Kahn, R., l'Hermite, G. & Li de La Sierra, I. (1995). The multiwavelength anomalous solvent contrast (MASC) method in macromolecular crystallography. J. Synchrotron Rad. 2, 36–48.
Gilmore, C. J. (2000). Direct methods and protein crystallography at low resolution. Acta Cryst. D56, 1205–1214.
Green, D. W., Ingram, V. M. & Perutz, M. F. (1954). The structure of haemoglobin. IV. Sign determination by the isomorphous replacement method. Proc. R. Soc. London Ser. A, 225, 287–307.
Guo, D. Y., Blessing, R. H. & Langs, D. A. (2000). Globbic approximation in low-resolution direct-methods phasing. Acta Cryst. D56, 1148–1155.
Hao, Q. (2006). Macromolecular envelope determination and envelope-based phasing. Acta Cryst. D62, 909–914.
Harris, G. W. (1995). Fast ab initio calculation of solvent envelopes for protein structures. Acta Cryst. D51, 695–702.
Harrison, R. W. (1988). Histogram specification as a method of density modification. J. Appl. Cryst. 21, 949–952.
Harrison, S. C. & Jack, A. (1975). Structure of tomato bushy stunt virus. Three-dimensional X-ray diffraction analysis at 16 Å resolution. J. Mol. Biol. 97, 173–191.
Hašek, J. & Schenk, H. (1992). On the comparison of different sets of structure-factor phases. Acta Cryst. A48, 693–695.
Hendrickson, W. A. & Lattman, E. E. (1970). Representation of phase probability distributions for simplified combination of independent phase information. Acta Cryst. B26, 136–143.
Hölzer, K., Weckert, E. & Schroer, K. (2000). Properties of an electron-density map derived from a limited number of experimentally determined triplet phases. Acta Cryst. D56, 322–327.
Jack, A., Harrison, S. C. & Crowther, R. A. (1975). Structure of tomato bushy stunt virus. II. Comparison of results obtained by electron microscopy and X-ray diffraction. J. Mol. Biol. 97, 163–172.
Jamrog, D. C., Zhang, Y. & Phillips, G. N. Jr (2003). SOMoRe: a multi-dimensional search and optimization approach to molecular replacement. Acta Cryst. D59, 304–314.
Johnson, J. E., Akimoto, T., Suck, D., Rayment, I. & Rossmann, M. G. (1976). The structure of southern bean mosaic virus at 22.5 Å resolution. Virology, 75, 394–400.
Kalinin, D. I. (1980). Use of a cylindrical model of a protein to determine the spatial structure of the rhombic modification of leghaemoglobin. Sov. Phys. Crystallogr. 25, 307–313.
Lunin, V. Y. (1988). Use of the information on electron density distribution in macromolecules. Acta Cryst. A44, 144–150.
Lunin, V. Y. (1993). Electron-density histograms and the phase problem. Acta Cryst. D49, 90–99.
Lunin, V. Y. & Lunina, N. L. (1996). The map correlation coefficient for optimally superposed maps. Acta Cryst. A52, 365–368.
Lunin, V. Y., Lunina, N. L., Petrova, T. E., Skovoroda, T. P., Urzhumtsev, A. G. & Podjarny, A. D. (2000). Low-resolution ab initio phasing: problems and advances. Acta Cryst. D56, 1223–1232.
Lunin, V. Y., Lunina, N. L., Petrova, T. E., Urzhumtsev, A. G. & Podjarny, A. D. (1998). On the ab initio solution of the phase problem for macromolecules at very low resolution. II. Generalized likelihood based approach to cluster discrimination. Acta Cryst. D54, 726–734.
Lunin, V. Y., Lunina, N. L., Petrova, T. E., Vernoslova, E. A., Urzhumtsev, A. G. & Podjarny, A. D. (1995). On the ab initio solution of the phase problem for macromolecules at very low resolution: the few atoms model method. Acta Cryst. D51, 896–903.
Lunin, V. Y., Lunina, N., Podjarny, A., Bockmayr, A. & Urzhumtsev, A. (2002). Ab initio phasing starting from low resolution. Z. Kristallogr. 217, 668–685.
Lunin, V. Y., Lunina, N. L., Ritter, S., Frey, I., Berg, A., Diederichs, K., Podjarny, A. D., Urzhumtsev, A. & Baumstark, M. W. (2001). Low-resolution data analysis for low-density lipoprotein particle. Acta Cryst. D57, 108–121.
Lunin, V. Y., Lunina, N. L. & Urzhumtsev, A. G. (2000). Connectivity properties of high-density regions and ab initio phasing at low resolution. Acta Cryst. A56, 375–382.
Lunin, V. Y. & Skovoroda, T. P. (1991). Frequency-restrained structure-factor refinement. I. Histogram simulation. Acta Cryst. A47, 45–52.
Lunin, V. Y. & Urzhumtsev, A. G. (1984). Improvement of protein phases by coarse model modification. Acta Cryst. A40, 269–277.
Lunin, V. Y., Urzhumtsev, A. & Bockmayr, A. (2002). Direct phasing by binary integer programming. Acta Cryst. A58, 283–291.
Lunin, V. Y., Urzhumtsev, A. G. & Skovoroda, T. P. (1990). Direct low-resolution phasing from electron-density histograms in protein crystallography. Acta Cryst. A46, 540–544.
Lunin, V. Y. & Woolfson, M. M. (1993). Mean phase error and the map-correlation coefficient. Acta Cryst. D49, 530–533.
Lunina, N., Lunin, V. & Urzhumtsev, A. (2003). Connectivity-based ab initio phasing: from low resolution to a secondary structure. Acta Cryst. D59, 1702–1715.
Luzzati, V., Mariani, P. & Delacroix, H. (1988). X-ray crystallography at macromolecular resolution: a solution of the phase problem. Makromol. Chem. Macromol. Symp. 15, 1–17.
Müller, J. J., Lunina, N. L., Urzhumtsev, A., Weckert, E., Heinemann, U. & Lunin, V. Y. (2006). Low-resolution ab initio phasing of Sarcocystis muris lectin SML-2. Acta Cryst. D62, 533–540.
Perutz, M. F., Rossmann, M. G., Cullis, A. F., Muirhead, H., Will, G. & North, A. C. T. (1960). Structure of haemoglobin. A three-dimensional Fourier synthesis at 5.5 Å resolution obtained by X-ray analysis. Nature (London), 185, 416–422.
Petrova, T. E., Lunin, V. Y. & Podjarny, A. D. (2000). Ab initio low-resolution phasing in crystallography of macromolecules by maximization of likelihood. Acta Cryst. D56, 1245–1252.
Podjarny, A. D., Rees, B., Thierry, J.-C., Cavarelli, J., Jesior, J. C., Roth, M., Lewit-Bentley, A., Kahn, R., Lorber, B., Ebel, J.-P., Giegé, R. & Moras, D. (1987). Yeast tRNAAsp–Aspartyl-tRNA synthetase complex: low resolution crystal structure. J. Biomol. Struct. Dyn. 5, 187–198.
Podjarny, A. D., Schevitz, R. W. & Sigler, P. B. (1981). Phasing low-resolution macromolecular structure factors by matricial direct methods. Acta Cryst. A37, 662–668.
Rayment, I., Baker, T. S., Caspar, D. L. & Murakami, W. T. (1982). Polyoma virus capsid structure at 22.5 Å resolution. Nature (London), 295, 110–115.
Roth, M., Lewit-Bentley, A. & Bentley, G. A. (1984). Scaling and phase-difference determination in solvent contrast variation experiments. J. Appl. Cryst. 17, 77–84.
Sayre, D. (2008). Report on a project on three-dimensional imaging of the biological cell by single-particle X-ray diffraction. Acta Cryst. A64, 33–35.
Seibert, M. M., Ekeberg, T. & Hajdu, J. (2011). Single mimivirus particles intercepted and imaged with an X-ray laser. Nature (London), 470, 78–81.
Shepard, W., Kahn, R., Ramin, M. & Fourme, R. (2000). Low-resolution phase information in multiple-wavelength anomalous solvent contrast variation experiments. Acta Cryst. D56, 1288–1303.
Strop, P., Brzustowicz, M. R. & Brunger, A. T. (2007). Ab initio molecular-replacement phasing for symmetric helical membrane proteins. Acta Cryst. D63, 188–196.
Subbiah, S. (1991). Low-resolution real-space envelopes: an approach to the ab initio macromolecular phase problem. Science, 252, 128–133.
Subbiah, S. (1993). Low-resolution real-space envelopes: improvements to the condensing protocol approach and a new method to fix the sign of such envelopes. Acta Cryst. D49, 108–119.
Urzhumtsev, A. & Podjarny, A. (1995). On the solution of the molecular-replacement problem at very low resolution: application to large complexes. Acta Cryst. D51, 888–895.
Urzhumtsev, A. G., Vernoslova, E. A. & Podjarny, A. D. (1996). Approaches to very low resolution phasing of the ribosome 50S particle from Thermus thermophilus by the few-atoms-models and molecular-replacement methods. Acta Cryst. D52, 1092–1097.
Valegård, K., Liljas, L., Fridborg, K. & Unge, T. (1991). Structure determination of the bacteriophage MS2. Acta Cryst. B47, 949–960.
Volkmann, N., Schlunzen, F., Urzhumtsev, A. G., Vernoslova, E. A., Podjarny, A. D., Roth, M., Pebay-Peyroula, E., Berkovitch-Yellin, Z., Zaytsev-Bashan, A. & Yonath, A. (1995). On ab initio phasing of ribosomal particles at very low resolution. CCP4 Newslett. 31, 23–32.
Webster, G. & Hilgenfeld, R. (2001). An evolutionary computational approach to the phase problem in macromolecular X-ray crystallography. Acta Cryst. A57, 351–358.
Zhang, K. Y. J. & Main, P. (1990). Histogram matching as a new density modification technique for phase refinement and extension of protein molecules. Acta Cryst. A46, 41–46.







































