International Tables for Crystallography, Volume B: Reciprocal space. Edited by U. Shmueli

International Tables for Crystallography (2010). Vol. B, ch. 2.5, pp. 375-388

Section 2.5.7. Single-particle reconstruction

P. A. Penczek

2.5.7.1. Formation of projection images in single-particle reconstruction

Cryo-electron microscopy (cryo-EM) in combination with the single-particle approach is a new method of structure determination for large macromolecular assemblies. Currently, resolution in the range 10 to 30 Å can be reached routinely, although in a number of pilot studies it has been possible to obtain structures at 4 to 8 Å. Theoretically, electron microscopy can yield data exceeding atomic resolution, but the difficulties in overcoming the very low signal-to-noise ratio (SNR) and low contrast in the data, combined with the adverse effects of the contrast transfer function (CTF) of the microscope, hamper progress in fulfilling the potential of the technique. However, in recent years, cryo-EM has proven its power in the structure determination of large macromolecular assemblies and machines which are too large and complex for the more traditional techniques of structural biology, i.e., X-ray crystallography and NMR spectroscopy.

Single-particle reconstruction is based on the assumption that a protein exists in solution in multiple copies of the same basic structure. Unlike in crystallography, no ordering of the structure within a crystal lattice is required; the enhancement of the SNR is achieved by bringing projection images of different (but structurally identical) proteins into register and averaging them. This is why the technique is sometimes called 'crystallography without crystals'.

Within the linear weak-phase-object approximation of the image formation process in the microscope [see equation (2.5.2.43) in Section 2.5.2], 2D projections represent line integrals of the Coulomb potential of the particle under examination convoluted with the point-spread function of the microscope, s, as introduced in Section 2.5.1. In addition, we have to consider the translation t of the projection in the plane of the micrograph, suppression of high-frequency information by the envelope function E of the microscope, and two additive noises [m^{\rm B}] and [m^{\rm S}]. The first one is a coloured background noise, while the second is attributed to the residual scattering by the solvent or the supporting thin layer of carbon, if used, and is assumed to be white and affected by the transfer function of the microscope in the same way as the imaged protein. In order to have the image formation model correspond more closely to the physical reality of data collection, we write equation (2.5.6.4) from Section 2.5.6 such that the projection operation is always realized in the z direction of the coordinate system (corresponding to the direction of propagation of the electron beam), while the molecule is rotated arbitrarily by three Eulerian angles:[\eqalignno{d_n ( {\bf{x}}) &= s_n ( {\bf{x}} ) * e_n( {\bf{x}} ) * \left[ {\textstyle\int {f\left( {{\bf{T}}_n {\bf{r}}} \right)\,{\rm d}z} + m_n^{\rm S} ( {\bf{x}} )} \right] + m_n^{\rm B} ( {\bf{x}} ),&\cr&\quad n = 1, \ldots ,N.&(2.5.7.1)}]Here [f \in R^{n^3 } ] represents the three-dimensional (3D) electron density of the imaged macromolecule and [d_n \in R^{n^2 } ] is the nth observed two-dimensional (2D) projection image. The total number of projection images N depends on the structure determination project, and can vary from a few hundred to hundreds of thousands. 
Further, e is the inverse Fourier transform of the envelope function, [{\bf{x}} = [ x \quad y ]^T ] is a vector of coordinates in the plane of projections, [{\bf{r}} = [ {r_x } \quad {r_y } \quad {r_z } \quad 1 ]^T ] is a vector of coordinates associated with the nth macromolecule, [{\bf{T}}] is the 4 × 4 transformation matrix given by[{\bf{T}}( {{\bf{R}},{\bf{t}}} ) = \left[ {\matrix{ {\bf{R}} & {\bf{t}} \cr 0 & 1 \cr } } \right], \quad\left[ {\matrix{ {\bf{x}} \cr z \cr 1 \cr } } \right] = {\bf{Tr}}, \eqno(2.5.7.2)]with [{\bf{t}} = [ { t_x \quad t_y \quad 1} ]^T ] being the shift vector of translation of the object (and its projection) in the xy plane (translation in z is irrelevant due to the projection operation) and [{\bf{R}}( {\psi ,\theta ,\varphi } )] is the 3 × 3 rotation matrix specified by three Eulerian angles. As in Section 2.5.6, two of the angles define the direction of projection [{\boldtau}( {\theta ,\varphi } )], while the third angle [\psi ] results in rotation of the projection image in the plane of the formed image xy; changing this angle does not provide any additional information about the structure f. Both types of noise are assumed to be mutually uncorrelated and independent between projection images (i.e., [\left\langle {m_i^k m_j^l } \right\rangle_{i \ne j} =0]; k, l = S, B) and also uncorrelated with the signal ([\left\langle {d_i m_i^k } \right\rangle = 0]; k = S, B). Model (2.5.7.1) is semi-empirical in that, unlike in the standard model, we have two contributions to the noise. Although in principle amorphous ice should not be affected by the CTF, so the term [m^{\rm S}] should be absorbed into [m^{\rm B}], in practice the buffer in which the protein is purified is not pure water and it is possible to observe CTF effects by imaging frozen buffer alone. Moreover, if a thin support carbon is used, it will be a source of very strong CTF-affected noise, which is also included in [m^{\rm S}].

In Fourier space, (2.5.7.1)[link] is written by taking advantage of the central section theorem [equation (2.5.6.8)[link] of Section 2.5.6[link]]: the Fourier transform of a projection is extracted as a Fourier plane uv of a rotated Fourier transform of a 3D object:[D_n ( {\bf{u}}) = {\rm CTF}( {{\bf{u}}\semi\Delta f_n ,q} )E_n ( {\bf{u}})\left\{ {\left. {\left[ {F( {{\bf{Tv}}})} \right]} \right|_{u_z = 0} + M_n^{\rm S}( {\bf{u}} )} \right\} + M_n^{\rm B} ( {\bf{u}} ).\eqno(2.5.7.3)]The capital letters denote Fourier transforms of objects appearing in (2.5.7.1)[link] while CTF (a Fourier transform of s) depends, among other parameters that are set very accurately (such as the accelerating voltage of the microscope), on the defocus setting [\Delta f_n ] and the amplitude contrast ratio [0 \le q \,\lt\, 1] that reflects the presence of the amplitude contrast that is due to the removal of widely scattered electrons [the real term in (2.5.5.14)[link]]. For the range of frequency considered, q is assumed to be constant and the CTF is written in terms of the phase perturbation function [\chi ] [given by equation (2.5.2.33)[link]] as[\eqalignno{{\rm CTF}( {{\bf{u}}\semi \Delta f} ) &= \left[1 + 2q( q - 1 )\right]^{ - 1/2} \big\{ ( 1 - q )\sin \left[ \chi \left( \left| {\bf u} \right|\semi\Delta f \right) \right] &\cr &\quad - q\cos \left[ \chi \left( \left| {\bf u} \right|\semi\Delta f \right) \right] \big\} &\cr & = \sin \left\{ {\chi \left( {\left| {\bf{u}} \right|\semi\Delta f} \right) - \arctan[{{q /({1 - q})}} ]} \right\}, &\cr &&(2.5.7.4)}]where for simplicity we assumed no astigmatism. Finally, the rotationally averaged power spectrum of the observed image, calculated as the expectation value of its squared Fourier intensities (2.5.7.3)[link], is given by[P_d ( u ) = {\rm CTF}^2( u )\, E^2( u ) \left[ {P_f ( u ) + P_{\rm S} ( u )} \right] + P_{\rm B}( u ),\eqno(2.5.7.5)]where [u =|{\bf u}|] is the modulus of spatial frequency.
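The equivalence of the two forms of the CTF in (2.5.7.4) and the structure of the power-spectrum model (2.5.7.5) can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the function names are illustrative, and the phase perturbation χ is passed in as a number because its explicit form, equation (2.5.2.33), is not reproduced in this section.

```python
import math


def ctf_two_term(chi, q):
    """First form of (2.5.7.4): normalized phase- and amplitude-contrast terms."""
    norm = (1.0 + 2.0 * q * (q - 1.0)) ** -0.5
    return norm * ((1.0 - q) * math.sin(chi) - q * math.cos(chi))


def ctf_phase_shifted(chi, q):
    """Second form of (2.5.7.4): a single sine with offset arctan[q/(1-q)]."""
    return math.sin(chi - math.atan(q / (1.0 - q)))


def observed_power(ctf, envelope, p_f, p_s, p_b):
    """Power-spectrum model (2.5.7.5) at one spatial frequency.

    ctf and envelope are the CTF and envelope values at that frequency;
    p_f, p_s and p_b are the power of the structure, of the CTF-affected
    noise and of the background noise, respectively.
    """
    return ctf ** 2 * envelope ** 2 * (p_f + p_s) + p_b
```

For any χ and any 0 ≤ q < 1 the two CTF functions should agree to rounding error, which is a quick way to catch a sign or normalization mistake when transcribing (2.5.7.4).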

2.5.7.2. Structure determination in single-particle reconstruction

The goal of single-particle reconstruction is to determine the 3D electron-density map f of a biological macromolecule such that its projections agree in a least-squares sense with a large number of collected 2D electron-microscopy projection images, [d_n \in R^{n^2 } ](n = 1, 2, …, N), of isolated (single) particles with random and unknown orientations. Thus, we seek a least-squares solution to the problem stated by (2.5.7.1)[link] [or, equivalently, in Fourier space, to (2.5.7.3)[link]]. This is formally written as a nonlinear optimization problem (Yang et al., 2005[link]),[\eqalignno{\min \limits_{\psi _n ,\theta _n ,\varphi _n ,t_{x_n } ,t_{y_n } ,f,\Delta f_n ,q, \ldots } L(\psi _n ,\theta _n ,\varphi _n ,t_{x_n } ,t_{y_n } ,f,\Delta f_n ,q, \ldots ) &\cr \equiv{\textstyle {1 \over 2}}\textstyle\sum\limits_{n = 1}^N {\left\| {s_n ( {\bf{x}}) * e_n ( {\bf{x}} ) * \textstyle\int {f( {{\bf{T}}_n {\bf{r}}} )\,{\rm d}z} - d_n ( {\bf{x}} )} \right\|^2 } . &\cr&&(2.5.7.6)} ]The factor of ½ is included merely for convenience. The objective function in (2.5.7.6)[link] is clearly nonlinear due to the coupling between the orientation parameters [\psi _n ,\theta _n ,\varphi _n ,t_{x_n } ,t_{y_n } ] (n = 1, 2, …, N) and the 3D density f.

The parameters in (2.5.7.6) to be determined can be separated into two groups. (1) The orientation parameters [\psi _n ,\theta _n ,\varphi _n ,t_{x_n } ,t_{y_n } ] that have to be determined entirely by solving (2.5.7.6) and for which there are no initial guesses, and the structure f itself, for which we may or may not have an initial guess. The number of parameters in this group is very large: [n^3 + 5m], where m is the number of projection images [denoted N in (2.5.7.1)]. Note that in single-particle reconstruction, m is far greater than the linear size of the data in pixels, i.e., [m\gg n]. (2) Various parameters which we will broadly call the parameters of the image formation model (2.5.7.1)–(2.5.7.4): the defocus settings of the microscope [\Delta f_n ], the amplitude contrast ratio q and, if analytical forms of the envelope function E, the power spectrum of the background noise M, or the structure F are adopted, the parameters of these equations. Some of the parameters in the second group are usually known very accurately or can be estimated from micrograph data before one attempts to solve (2.5.7.6) (see Section 2.5.7.4), but they can also be refined during the structure determination process [for the method for correcting the defocus settings, see Mouche et al. (2001)].

Owing to the very large number of parameters in (2.5.7.6)[link] and the nonlinearities present, one almost never attempts to solve the problem directly. Instead, structure determination using the single-particle technique involves several steps. (i) The macromolecular complex is prepared with a purity of at least 90%. (ii) The sample is flash-frozen in liquid ethane. Alternatively, cryo-negative stain techniques or traditional negative stain methods can be used. (iii) Pictures of the macromolecular complexes are taken. (iv) Exhaustive analysis of 2D particle images aimed at increasing the SNR of the data and evaluation of the homogeneity of the sample is performed. (v) An initial low-resolution model of the structure is established using either experimental techniques or computational methods. (vi) The initial structure is refined in order to increase the resolution using an enlarged data set. Only in this step does one attempt to minimize (2.5.7.6)[link] more or less directly. (vii) Visualization and interpretation of the resulting 3D electron-density map is the last step; it often involves docking of X-ray structures of molecules into EM density maps in order to reveal the arrangement of known molecules within the EM envelope (Fig. 2.5.7.1[link]). As within the weak-phase-object approximation of the image formation in EM the relation between densities in collected images and the 3D electron density of the imaged macromolecule is linear [(2.5.7.1)[link]], all data-processing methods employed in the structure determination project should be linear, so the densities in the cryo-EM 3D model can be interpreted in terms of the electron density of the protein.

Figure 2.5.7.1. Typical steps performed in a single-particle cryo-EM structure determination project.

In an actual single-particle project not all the steps have to be executed in the order outlined above. The technique has proved to be particularly useful in studies of functional complexes of proteins whose base state is known to a certain resolution or even of functional complexes whose atomic (X-ray crystallographic) structure is known. In these cases, steps (iv) and (v) can be omitted and the structure of the functional complex (for example, with ligands bound to it) can be relatively easily determined using the native structure as a starting point for step (vi).

In addition to difficulties with obtaining good cryo-EM data, the technique is computationally intensive. The reason is that in order to obtain a sufficient SNR in the 3D structure, processing of hundreds of thousands of EM projection images of the molecule might be necessary. For each, five orientation parameters have to be determined, and this is in addition to determination of the image-formation parameters required for the optimization of correlation searches. In effect, it is not unusual for single-particle projects to consume weeks of the computer time of multiprocessing clusters. This also explains why the knowledge of the base structure simplifies the work to a large degree: when it is known, initial values of the orientation parameters can be easily established, reducing not only the computational time, but also possibilities of errors in the structure-determination process.

2.5.7.3. Electron microscopy and data digitization

The electron microscope is a phase imaging system; i.e., in order to create contrast in images, they have to be underfocused. Owing to the particular form of the CTF of the microscope [(2.5.7.4)], not only are the amplitudes of the image in Fourier space modified, but information in some ranges of spatial frequencies is set to zero and some phases have reversed sign. Therefore, in order to obtain as uniform a coverage of Fourier space as possible, the standard practice is to take pictures using different defocus settings and merge them computationally in order to fill gaps in Fourier space. The problem is compounded by the relation between underfocus and the envelope function of the microscope. Far-from-focus images have high contrast, but the envelope function has a relatively steep fall-off, limiting the range of useful spatial frequencies. Conversely, close-to-focus images have little contrast, but the envelope function decreases slowly, extending useful information to high spatial frequencies. In effect, it is easier to process far-from-focus data computationally and to obtain accurate alignment of particles, but the results have severely limited resolution. Processing of close-to-focus data is challenging and results tend to be less accurate, but there is the potential to obtain high-resolution information.

The experimental techniques of initial structure determination (random conical tilt, tomography) require collection of tilt data. This is facilitated by dedicated microscope stages that can be rotated inside the microscope column yielding additional views of the same field. However, collection of high-quality tilt images is difficult. The quality of tilted images tends to be adversely affected by charging and drift effects. Moreover, as the stage is tilted the effective ice thickness increases (inversely proportionally to the cosine of the tilt angle, so at 60° the factor is two) and the contrast of the images decreases correspondingly. Finally, the defocus in tilted micrographs varies depending on the position in the field, often forcing users to restrict the particle selection only to regions in the vicinity of the tilt axis. However, tilting establishes geometrical relations between different projections of the same particle, unambiguously allowing for robust determination of an initial 3D model and the handedness of the quaternary structure of the complex.

Electron microscope images can either be recorded on film and subsequently converted to digital format, or be recorded directly in digital format on the microscope using a charge-coupled device (CCD) camera. In either case, it is necessary to select the magnification of the microscope and the eventual pixel size of the digitized data before the data-collection session. High magnification can potentially yield high-resolution data, but at the same time it decreases the yield of particles. Lower magnification values can be used when images are recorded on film, which does not attenuate high spatial frequencies to the same extent as CCD cameras tend to do.

The pixel size has to be adjusted according to the expected resolution of the final structure. Although it is tempting to adopt a small pixel size (in the hope of achieving high resolution of the results), in most cases this is counterproductive, as it results in very large computer files that are difficult to handle and in excessively long data-processing times. Theoretically, the optimum pixel size is tied to the maximum frequency present in the data by Shannon's sampling theorem, which states that no information is lost if the signal is sampled at twice the maximum frequency present in the signal, and no additional information is gained by sampling using higher frequency. Thus, if the expected resolution is 12 Å, it should be sufficient to use a pixel size (on the specimen scale) of 6 Å. In practice, various image-processing operations performed during alignment of the data and 3D reconstruction of the complex significantly lower the range of useful frequencies. This is because in currently available single-particle reconstruction software packages rather unsophisticated interpolation schemes are employed, which were selected mainly for the speed of calculations. Therefore, it is advisable to oversample the data by a factor of 1.5 or even 3.0. For an expected resolution of 12 Å this corresponds to pixel sizes of 4 and 2 Å, respectively.
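The arithmetic in the preceding paragraph can be written as a one-line rule. The sketch below is illustrative only (the function name and its arguments are not taken from any package): the sampling theorem gives a pixel size of half the expected resolution, and the oversampling factor divides that value further.

```python
def pixel_size_for_resolution(resolution, oversampling=1.0):
    """Pixel size on the specimen scale, in the same units as `resolution`.

    With oversampling = 1 the result is the sampling-theorem limit
    (half the expected resolution); larger factors give finer sampling.
    """
    if resolution <= 0.0 or oversampling < 1.0:
        raise ValueError("resolution must be positive and oversampling >= 1")
    return resolution / (2.0 * oversampling)
```

For an expected resolution of 12 Å, oversampling factors of 1, 1.5 and 3 give pixel sizes of 6, 4 and 2 Å, matching the values quoted in the text.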

The windowed particles have to be normalized to adjust the image densities to a common frame of reference. The reason for this step is that microscopy conditions are never exactly the same and also within the same micrograph field the background densities can vary by a significant margin due to uneven ice thickness and other factors. A sensible approach to normalization is to assume that the statistical distribution of noise in areas surrounding particles should be the same (Boisset et al., 1993). Hence a large portion of one of the micrographs from the processed set is selected and a reference histogram of its pixel values is generated. Next, assuming a linear transformation of pixel values, the two parameters of this transformation are found in such a way that the histogram of the transformed pixel values surrounding the particle optimally matches the reference histogram using χ² statistics as a discrepancy measure.
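The histogram-matching normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of Boisset et al. (1993): the symmetric form of the χ² discrepancy, the number of bins and the exhaustive grid search over the two parameters are all choices made here for brevity, and the function names are hypothetical.

```python
def histogram(values, lo, hi, nbins):
    """Fraction of `values` falling in each of `nbins` equal bins on [lo, hi]."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    return [c / len(values) for c in counts]


def chi2(obs, ref):
    """Symmetric chi-squared discrepancy between two normalized histograms."""
    return sum((o - r) ** 2 / (o + r) for o, r in zip(obs, ref) if o + r > 0.0)


def fit_linear_normalization(background, reference, a_grid, b_grid, nbins=20):
    """Find (a, b) such that a*x + b applied to the background pixels gives a
    histogram that best matches the reference histogram (grid search)."""
    lo, hi = min(reference), max(reference)
    ref_hist = histogram(reference, lo, hi, nbins)
    best = None
    for a in a_grid:
        for b in b_grid:
            transformed = [a * v + b for v in background]
            score = chi2(histogram(transformed, lo, hi, nbins), ref_hist)
            if best is None or score < best[0]:
                best = (score, a, b)
    return best[1], best[2]
```

In use, `background` would hold the pixel values surrounding one windowed particle and `reference` the pixel values of the selected micrograph region; the fitted (a, b) is then applied to the whole window. A practical implementation would replace the grid search by a proper minimizer.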

2.5.7.4. Assessment of the data quality and estimation of the image formation parameters

The initial assessment of the quality of the micrographs is usually performed during the data collection and in most cases before the micrographs are digitized. The micrographs are examined visually and those that have noticeable drift, astigmatism or contamination, or too few particles to justify further analysis, are discarded. After digitization of the accepted micrographs, the first step is estimation of the power spectrum, which will be examined for the presence of Thon rings (thus confirming that the micrograph is indeed usable) and astigmatism.

The method of averaged overlapping periodograms (Welch, 1967) is commonly used in EM to calculate the power spectrum. It is designed to improve the statistical properties of the estimate by taking advantage of the fact that when K identically distributed independent measurements are averaged, the variance of the average is decreased with respect to the individual variance by the ratio 1/K. Thus, instead of calculating a periodogram (squared moduli of the discrete Fourier transform) of the entire micrograph field, one subdivides it into much smaller windows, calculates their periodograms and averages them. Typically, one would choose a window size of 512 × 512 pixels and an overlap of 50%, which will result in the reduction of the variance of the estimate to a few percent with respect to the variance of the periodogram of the entire field (Fernandez et al., 1997; Zhu et al., 1997). Further reduction of the variance is achieved by rotational averaging of the 2D power-spectrum estimate. The resulting one-dimensional (1D) profile is finally used in the third step of our procedure.
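The averaging of overlapping periodograms can be sketched in a few lines of Python. This is an illustration of the principle only: it uses a direct (slow) discrete Fourier transform from the standard library so that the example is self-contained, whereas real software would use an FFT library and windows of 512 × 512 pixels; the function names and the 1/n² scaling are choices made here.

```python
import cmath
import math


def periodogram2d(win):
    """Squared DFT moduli of a small square window, scaled by 1/n^2."""
    n = len(win)
    tw = [[cmath.exp(-2j * math.pi * k * x / n) for x in range(n)]
          for k in range(n)]
    # Transform along rows first, then along columns.
    rows = [[sum(row[x] * tw[k][x] for x in range(n)) for k in range(n)]
            for row in win]
    return [[abs(sum(rows[y][k] * tw[l][y] for y in range(n))) ** 2 / (n * n)
             for k in range(n)] for l in range(n)]


def welch_power_spectrum(image, win_size, overlap=0.5):
    """Average the periodograms of overlapping win_size x win_size windows.

    Returns the averaged 2D power spectrum and the number of windows used.
    """
    step = max(1, int(round(win_size * (1.0 - overlap))))
    ny, nx = len(image), len(image[0])
    acc = [[0.0] * win_size for _ in range(win_size)]
    count = 0
    for y0 in range(0, ny - win_size + 1, step):
        for x0 in range(0, nx - win_size + 1, step):
            win = [row[x0:x0 + win_size] for row in image[y0:y0 + win_size]]
            p = periodogram2d(win)
            for l in range(win_size):
                for k in range(win_size):
                    acc[l][k] += p[l][k]
            count += 1
    return [[v / count for v in row] for row in acc], count
```

For a field of white noise the averaged estimate fluctuates far less from one frequency to the next than the periodogram of any single window, which is the variance reduction the method is designed to deliver. Rotational averaging of the result to obtain the 1D profile is omitted from this sketch.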

For a set of micrographs the power spectra can be evaluated either visually or computationally in an automated fashion. Of main concern are the presence of Thon rings, the astigmatism and the extent to which Thon rings can be detected. Although in principle astigmatic data could be used in subsequent analysis (in fact, astigmatism could be considered advantageous, as particles from the same micrograph would contain complementary information in Fourier space), in practice they are discarded as currently there is no software that can process astigmatic data efficiently. The extent of Thon rings indicates the 'resolution' of the data, i.e., the maximum frequency to which information in the data can be present.

A number of well-established programs can assist the user in the calculation of power spectra and automated estimation of defocus and astigmatism (Huang et al., 2003; Mindell & Grigorieff, 2003; Sander et al., 2003; Mallick et al., 2005). Given the analytical form of the CTF [(2.5.7.4)], the problem is solved by a robust fitting of the CTF parameters such that the analytical form of the CTF matches the power spectrum of the micrograph. Usually, the steps employed are: (1) robust estimation of the power spectrum; (2) calculation of the rotational average of the power spectrum; (3) subtraction from this rotational average of the slowly decreasing background (roughly corresponding to [P_{\rm B}] in (2.5.7.5)); (4) fitting of the defocus value [\Delta f_n ] using known settings of the microscope (voltage, spherical aberration constant, …) and usually assuming a constant and known value of the amplitude contrast ratio q (for cryo-EM data, q should be in the range 0.02–0.10); and (5) using the established defocus value [\Delta f_n ], analysis of the 2D power spectrum and fitting of the astigmatism amplitude and angle while refining the defocus. As long as the defocus value is not too small and there are at least two detectable zeros of the CTF, all available programs give very good and comparable results.
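Step (4) of the procedure above, the fitting of the defocus, can be illustrated with a simple grid search. The sketch below makes several assumptions that are not part of the source and should be read as such: the phase perturbation is taken as χ(u) = πλΔf u² (defocus term only, spherical aberration neglected, sign convention chosen here for illustration rather than taken from equation (2.5.2.33)), the match is scored by a correlation coefficient, and all function names are hypothetical. The programs cited in the text use more elaborate and more robust schemes.

```python
import math


def ctf_value(u, defocus, wavelength, q):
    """CTF of (2.5.7.4) with an ASSUMED chi = pi * lambda * defocus * u^2."""
    chi = math.pi * wavelength * defocus * u * u
    return math.sin(chi - math.atan(q / (1.0 - q)))


def correlation(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def fit_defocus(freqs, profile, candidates, wavelength, q=0.07):
    """Return the candidate defocus whose CTF^2 correlates best with the
    background-subtracted, rotationally averaged power spectrum `profile`."""
    best_df, best_cc = None, -2.0
    for df in candidates:
        model = [ctf_value(u, df, wavelength, q) ** 2 for u in freqs]
        cc = correlation(model, profile)
        if cc > best_cc:
            best_df, best_cc = df, cc
    return best_df
```

Here `freqs` are spatial frequencies in Å⁻¹, `defocus` and `wavelength` are in Å (about 0.0197 Å for 300 kV electrons), and q is held fixed within the range quoted in the text. The subsequent refinement of astigmatism on the 2D power spectrum, step (5), is not covered.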

In some single-particle packages, the automated calculation of defocus is integrated with the estimation of additional characteristics of the image-formation parameters that are required for advanced application of a Wiener filter [(2.5.7.18)] (Saad et al., 2001; Huang et al., 2003), i.e., the power spectra of the two noise distributions [P_{\rm S}] and [P_{\rm B}] and the envelope function of the microscope [E_n ] for each micrograph. A possible approach is to select slowly varying functions and fit their parameters to match the estimates of [P_{\rm S}], [P_{\rm B}] and [P_d ] obtained from the data. Finally, it is necessary to have a description of the 1D rotationally averaged power spectrum of the complex [P_f ]. One possibility is to carry out X-ray solution scattering experiments (Gabashvili et al., 2000; Saad et al., 2001) that yield a 1D power spectrum of the complex in solution. However, these experiments require large amounts of purified sample and the accuracy of the results in terms of the overall fall-off of the power spectrum can be disputed. For the purpose of cryo-EM, a simple approximation of the protein power spectrum by analytical functions is satisfactory.

2.5.7.5. 2D data analysis – particle picking

Depending on the properties of the imaged complex and the magnification used, a single micrograph can yield from a few to thousands of individual particle projections. The first step of the data processing is identification of particle projections in micrographs and their selection. The particles have to be windowed (boxed) using a window size exceeding the particle size by a 30–50% margin. Thus, for example, in order to determine to 12 Å resolution the structure of a 550 kDa complex that has a diameter of ~120 Å, it is appropriate to choose a pixel size of 3 Å and a window size of 60 pixels (the particle spans 40 pixels, and a 50% margin gives 60).
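The window-size rule above is simple arithmetic; a minimal sketch follows (the function name and the rounding up to a whole pixel are choices made here for illustration).

```python
import math


def window_size_pixels(diameter, pixel_size, margin=0.5):
    """Window edge length in pixels for a particle of the given diameter.

    `diameter` and `pixel_size` share the same length unit; `margin` is the
    fractional excess over the particle size (0.3 to 0.5 in the text).
    """
    particle_pixels = diameter / pixel_size
    return math.ceil(particle_pixels * (1.0 + margin))
```

For the 120 Å particle sampled at 3 Å per pixel, a 50% margin gives a window of 60 pixels, in agreement with the example in the text.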

The selection of particles is a labour-intensive process; however, the quality of selected particle projections is a major factor in the subsequent steps of analysis and the inclusion of too many imperfect images may preclude successful determination of the 3D structure. There are three possible approaches: (1) manual selection; (2) semi-automated selection; (3) fully automated selection. In the early stages of analysis, particularly when little is known about the shape of the protein and the distribution of projection views, the manual approach is preferable. The researcher displays the micrograph on a computer screen (usually preprocessed by Fourier filtration and contrast-adjusted for better visibility of the protein) and interactively identifies locations of particle views. A trained and careful operator can yield much better results than automated approaches. The main risk is in inherent bias of a human operator – there is a tendency to focus on more familiar and more easily visible particle projections, omitting less frequently appearing orientations and in effect jeopardizing successful structure determination. In semi-automated approaches, an initial step in which putative particle projections in a micrograph are chosen is performed by a computer, all candidates are windowed and the user screens a gallery of possible particles instead of the full micrograph. Algorithms that perform the initial identification of particle views range from very simple (for example a band-pass filtration of a micrograph with subsequent selection of peaks that are no closer to each other than half of the expected particle size) to sophisticated nonlinear noise-suppression methods [for details on various algorithms see the Special Issue of the Journal of Structural Biology (Zhu et al., 2004[link])]. Since the human operator will be responsible for the ultimate decision, preference is given to the faster method. 
In most cases, semi-automated methods are implemented within a framework of a user-friendly graphical user interface that can greatly facilitate the work. Fully automated methods are currently actively under development but, curiously, even for proteins whose high-resolution structure is known, the success rate cannot match that of a human operator (Zhu et al., 2004[link]).

The automated procedures can be divided into three groups: (1) those that rely on ad hoc steps of denoising and contrast enhancement followed by the search for regions of known size that emerge above the background level (Adiga et al., 2004[link]); (2) those that extract orientation-independent statistical features from regions of the micrograph that may contain particles and proceed with classification (Lata et al., 1995[link]; Hall & Patwardhan, 2004[link]); and (3) those that employ templates, i.e., either class averages of particles selected from micrographs or projections of a known 3D structure of the complex (Huang & Penczek, 2004[link]; Sigworth, 2004[link]).

The advantage of the first two approaches is that they do not require template images, i.e., since they are based on a very broadly defined description of particles (general size, shape or abstract features derived from examples of typical particles), they are applicable in cases when no 3D structure of the complex is available. Methods from the second category usually require a training session for the algorithm to construct a set of weights for the predefined features. The methods that take advantage of the availability of templates vary greatly in complexity from straightforward cross-correlation with a generic shape (a Gaussian function, a low-pass-filtered circle) (Frank & Wagenknecht, 1984) to matched filters with a large number of templates and parameters derived from the image-formation model of the micrograph (CTF and envelope functions) (Huang & Penczek, 2004; Sigworth, 2004). The motivation is clear: given an ideal object and image-formation parameters, it should be only a matter of sheer computer power and the user's patience to have all particles matching the template selected. In practice, the problem is much more challenging and the success rate of template-based methods does not necessarily exceed the success rate of carefully tuned ad hoc methods.

One of the difficulties with the application of correlation techniques to the particle-picking problem is the unevenness of micrographs, which is caused by uneven illumination by the electron beam and, to a much larger degree, by the uneven thickness of the ice layer and, when used, the supporting carbon. A possible remedy is to calculate a 'locally normalized' cross-correlation function, in which the total variance of the micrograph is replaced by the local variance of the micrograph calculated within a window of n pixels centred on the current location l. This method has a fast implementation in Fourier space (van Heel, 1982; Roseman, 2003). A faster method is simply to apply a high-pass Gaussian Fourier filter to the micrograph with a half-width of 1/(np) Å⁻¹, where p is the pixel size. This simple step will all but eliminate the unevenness of the micrograph background.

The main difficulty with the correlation technique is the computational complexity of the problem arising from the very large number of templates that have to be considered. The particles in the micrograph are projections of a 3D object with arbitrary in-plane rotations. In effect, to perform an exhaustive search, it is necessary to sample quasi-uniformly three Eulerian angles [equation (2.5.7.17)[link] with [\Delta \psi = \delta \theta ]]. For example, a very crude angular step of [\delta \theta = 10^ \circ ] results in ~13 000 2D templates! A reduction in the number of templates can be achieved either using clustering techniques (Huang & Penczek, 2004[link]; Wong et al., 2004[link]) or by exploring the eigenstructure of the whole set of templates (Sigworth, 2004[link]).

2.5.7.6. 2D alignment of EM images

Alignment of pairs of 2D images is a fundamental step in single-particle reconstruction. It is aimed at bringing into register various particle projections by determining three orientation parameters (the rotation angle and the x and y translations) and is employed in 2D alignment of large sets of 2D noisy data and in 3D structure-refinement algorithms. The computational efficiency and numerical accuracy of this step are deciding factors in achieving high-quality structural results in an acceptable time.

All 2D alignment methods considered are aimed at finding transformation parameters such that the least-squares discrepancy between two images f and g is minimized,[\textstyle\int {\left| {f( {\bf{x}} ) - g( {{\bf{Tx}}} )} \right|} ^2 \,{\rm d}{\bf{x}} \to \min, \eqno(2.5.7.7)]where [{\bf{x}} = \left[x \quad y \quad 1 \right]^T ] is a vector containing the coordinates and T is the transformation matrix given by[{\bf{T}}(\alpha ,t_x ,t_y ) = \left[ {\matrix{ {\cos \alpha } & { - \sin \alpha } & {t_x } \cr {\sin \alpha } & {\cos \alpha } & {t_y } \cr 0 & 0 & 1 \cr } } \right],\eqno(2.5.7.8)]which depends on three transformation parameters: the rotation angle [\alpha ] and two translations [t_x ] and [t_y ]. Note that a minimum of (2.5.7.7) can be found rapidly using the fast Fourier transform (FFT) algorithm if only the xy translation is sought (2D FFT) or if only the rotation angle is needed (1D FFT).
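The action of the transformation matrix (2.5.7.8) on homogeneous coordinates can be made concrete with a short Python sketch. The function names are illustrative; the matrix is written exactly as in (2.5.7.8), so a point is first rotated by α about the origin and then translated.

```python
import math


def transform_matrix(alpha, tx, ty):
    """3 x 3 matrix of (2.5.7.8) for rotation alpha (radians) and shift (tx, ty)."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]


def apply_transform(T, x, y):
    """Apply T to the homogeneous coordinate vector [x, y, 1]^T."""
    v = (x, y, 1.0)
    xt = sum(T[0][i] * v[i] for i in range(3))
    yt = sum(T[1][i] * v[i] for i in range(3))
    return xt, yt
```

For example, with α = 90°, a shift of (2, −1) maps the point (1, 0) to (2, 0): the rotation takes it to (0, 1) and the translation is then added.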

2D alignment methods can be divided into three classes: (1) those that employ exhaustive searches in order to find three orientation parameters; (2) those that perform exhaustive searches by using either simplifications (separate searches for translation and rotation parameters) (Penczek et al., 1992[link]) or by taking advantage of invariant image representations (Schatz & van Heel, 1990[link]; Frank et al., 1992[link] and the following discussion; Schatz & van Heel, 1992[link]; Marabini & Carazo, 1996[link]); or finally (3) those that are aimed at improvement of previously determined parameters and employ local searches.

In practice, as the windowed particles are approximately centred, the search for translation parameters can be restricted to relatively small values. A very efficient algorithm that takes advantage of the geometry is based on resampling to polar coordinates of the area of the image that roughly corresponds to the particle size. The resampling is done around centres placed on pixels located within a distance from the image centre that corresponds to a preset maximum translation (Joyeux & Penczek, 2002[link]) (Fig. 2.5.7.2[link]). For each translation, a 1D rotational cross-correlation function in polar coordinates is calculated. Overall, the alignment method based on resampling to polar coordinates comprises the following steps: (1) the image is resampled to polar coordinates; (2) 1D FFTs of various lengths are calculated, appropriately weighted and padded with zeros to equalize their lengths; (3) complex multiplications with 1D Fourier transforms of the similarly processed reference image are calculated; (4) the inverse 1D FFT is calculated and the position of the maximum is found. The last step yields the rotation angle. Steps (1)–(4) are repeated with the image that is being aligned shifted to account for translations. In addition, the rotation angle for one of the images being mirrored is efficiently calculated in parallel with step (3) by repeating the multiplication with the 1D Fourier transforms of the reference image complex conjugated. This additional check is a necessity in the analysis of single-particle data sets, as usually one can expect on average half of the images to be mirrored versions of the other half in the data set. Overall, the method is very accurate, because only data under the circular mask enter the calculation.
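Steps (1)–(4) and the mirror check can be sketched as follows. This is a simplified illustration rather than the published implementation: every ring is sampled with the same number of angular points (instead of ring-dependent lengths padded with zeros), rings are weighted by their radius, and the names `to_polar` and `rotation_by_polar_fft` are hypothetical. A trial translation is emulated by moving the `centre` argument.

```python
import numpy as np

def to_polar(img, radii, n_ang, centre=None):
    """Bilinear resampling of img on rings of the given radii.
    centre is (row, column); rings must stay inside the image frame."""
    radii = np.asarray(radii, dtype=float)
    ny, nx = img.shape
    cy, cx = (ny // 2, nx // 2) if centre is None else centre
    ang = 2.0 * np.pi * np.arange(n_ang) / n_ang
    y = cy + radii[:, None] * np.sin(ang)[None, :]
    x = cx + radii[:, None] * np.cos(ang)[None, :]
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x0 + 1]
            + wy * (1 - wx) * img[y0 + 1, x0] + wy * wx * img[y0 + 1, x0 + 1])

def rotation_by_polar_fft(ref, img, radii, n_ang=180, centre=None):
    """Return (angle_deg, mirrored, peak) for img relative to ref."""
    radii = np.asarray(radii, dtype=float)
    P = np.fft.fft(to_polar(ref, radii, n_ang), axis=1)
    Q = np.fft.fft(to_polar(img, radii, n_ang, centre), axis=1)
    w = radii[:, None]  # ring weight proportional to circumference
    # Direct case: one transform is conjugated (angular cross-correlation)
    direct = np.fft.ifft((w * np.conj(P) * Q).sum(axis=0)).real
    # Mirrored case: same products without the conjugation
    mirror = np.fft.ifft((w * P * Q).sum(axis=0)).real
    kd, km = int(np.argmax(direct)), int(np.argmax(mirror))
    step = 360.0 / n_ang
    if mirror[km] > direct[kd]:
        return km * step, True, float(mirror[km])
    return kd * step, False, float(direct[kd])
```

In the direct case the returned angle is the in-plane rotation of `img` relative to `ref`, measured in the polar-angle sense used for the resampling. In the mirrored case it is the offset s for which `img` at polar angle b matches `ref` at polar angle s - b.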

Figure 2.5.7.2

The geometrical constraints of the 2D alignment problem. (a) The reference 2D particle is placed within a square image frame n × n pixels and its size is such that it can be bounded by a circle with a radius r no larger than n/2. (b) The particle projection, the size of which is bounded by the same radius as the reference view, can be located within a circle centred on discrete locations within the image frame, such that the maximum translation is k = (n/2) − r. The number of possible translations is [(2k + 1)^2]. Reprinted from Joyeux & Penczek (2002)[link] with permission from Elsevier.

For a set of N images containing the same object in various orientations and corrupted by an additive noise, the problem of alignment would be relatively simple. For proteins that have strong preferred orientation and particularly when a staining technique is used for grid preparation, this is certainly the case. In the procedure called reference-based alignment, one of the images that appears `typical' is selected and used as a reference to align the remaining images. After all available images are aligned their average is calculated and used as a reference in a repeated alignment of all images. The process is iterated until the orientations of the images stabilize (Frank et al., 1982[link]).

More formally, Frank et al. (1988[link]) proposed that a set of N images [f_k ], k = 1, …, N, be considered aligned if a set of transformations [{\bf{T}}_k ], k = 1, …, N (rotation angles and translations), is found such that all pairs of images are mutually brought into register, so that the expression[\eqalignno{L_1( {\{ f \},\{ {\bf{T}} \}} ) &= \textstyle\sum\limits_{k = 1}^{N - 1} {\textstyle\sum\limits_{l = k + 1}^N {\left\| {f_k ( {{\bf{T}}_k {\bf{x}}} ) - f_l ( {{\bf{T}}_l {\bf{x}}} )} \right\|^2 } } &\cr &= \textstyle\sum\limits_{k = 1}^{N - 1} {\textstyle\sum\limits_{l = k + 1}^N {\left( {\left\| {f_k ( {{\bf{T}}_k {\bf{x}}} )} \right\|^2 + \left\| {f_l ( {{\bf{T}}_l {\bf{x}}} )} \right\|^2 - 2f_k ( {{\bf{T}}_k {\bf{x}}} )f_l ( {{\bf{T}}_l {\bf{x}}})} \right)} } &\cr &&(2.5.7.9)} ]is minimized. Although there is no simple way to minimize [L_1 ], the interesting observation is that there is no requirement for the images to represent the same particle, or even a similar one. This leads to the conclusion that if the minimum of [L_1 ] could be found, a set of diverse images could be aligned; moreover, upon alignment similar images would have similar orientation and subsequent classification of such an aligned data set would reveal subsets of similar images.

A practical method of minimizing [L_1 ], called reference-free alignment, was proposed by Penczek et al. (1992[link]), who showed that minimization of [L_1 ] is equivalent to minimization of[L_2 ( {\{ f \},\{ {\bf{T}} \}} ) = \textstyle\sum\limits_{k = 1}^{N} {\left\| {f_k ( {{\bf{T}}_k {\bf{x}}} ) - \left\langle f \right\rangle _k } \right\|^2 } ,\eqno(2.5.7.10)]where[\left\langle f \right\rangle _k = {1 \over {N - 1}}\displaystyle\sum\limits_{l = 1, l \ne k}^N {f_l ( {{\bf{T}}_l {\bf{x}}} )} \eqno(2.5.7.11)]is the partial average of the set of images calculated with the exclusion of the kth image. The method is based on the observation that given a set of approximately aligned images, it should be possible to minimize [L_2 ] by sequentially correcting alignments of individual images using the cross-correlation function between each image and the average of the remaining ones. On each step, depending on whether the orientation of the image changes or not, (2.5.7.10)[link] will decrease or remain constant.
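The sequential scheme can be illustrated on a toy problem. In the sketch below, 1D signals stand in for images and cyclic shifts stand in for the transformations [{\bf{T}}_k ]; both are assumptions made to keep the example short, and the function names are hypothetical. With the sum in [L_2 ] taken over all N images, the two criteria are proportional, [L_2 = N L_1 /(N - 1)^2 ], which follows from [f_k - \left\langle f \right\rangle _k = N(f_k - \bar f)/(N - 1)] with [\bar f] the average of all N images.

```python
import numpy as np

def L1(imgs):
    """Sum of squared differences over all pairs of images, as in (2.5.7.9)."""
    n = len(imgs)
    return sum(float(np.sum((imgs[k] - imgs[l]) ** 2))
               for k in range(n - 1) for l in range(k + 1, n))

def L2(imgs):
    """Sum over all images of the squared distance to the partial average."""
    n = len(imgs)
    total = imgs.sum(axis=0)
    return sum(float(np.sum((imgs[k] - (total - imgs[k]) / (n - 1)) ** 2))
               for k in range(n))

def best_cyclic_shift(avg, img):
    """Shift s such that np.roll(img, s) has maximum correlation with avg."""
    ccf = np.fft.ifft(np.fft.fft(avg) * np.conj(np.fft.fft(img))).real
    return int(np.argmax(ccf))

def reference_free_align(imgs, n_iter=10):
    """Sequentially re-align each image to the average of the others."""
    imgs = np.array(imgs, dtype=float)
    history = [L2(imgs)]
    for _ in range(n_iter):
        changed = False
        for k in range(len(imgs)):
            partial = (imgs.sum(axis=0) - imgs[k]) / (len(imgs) - 1)
            s = best_cyclic_shift(partial, imgs[k])
            if s:
                imgs[k] = np.roll(imgs[k], s)
                changed = True
        history.append(L2(imgs))
        if not changed:
            break
    return imgs, history
```

Because a cyclic shift preserves the norm of an image, each update can only raise the correlation of that image with the partial average, so the recorded values of [L_2 ] do not increase. The result still depends on the starting orientations, since only a local minimum is reached.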

The outcome of the reference-free alignment algorithm is an aligned set of N images, so all particles that have similar shapes will have similar orientations. Thus, it is natural (and, because of the alignment, possible) to divide the data set into classes of images that have similar shapes and orientations, i.e., to cluster them. A number of well known clustering algorithms have been adopted for EM applications (Frank, 1990[link]). The general purpose of clustering is to organize objects (in the case of EM, images) into classes whose members are similar to each other, while dissimilar to objects from other classes.

Reference-free alignment with subsequent clustering works well as long as all particles share the same overall shape (i.e., the very low frequency component), as is the case for ribosomes. However, some molecules yield projections that have quite different shapes, as for example is the case for barrel-like proteins GroEL (Roseman et al., 1996[link]) with rectangular views and circular end views or flat and rectangular hemocyanin (Boisset et al., 1995[link]). In this case, the reference-free alignment tends to be unstable, as (2.5.7.10)[link] has multiple local minima, which in practice means that the global average of the whole data set can vary significantly depending on the initiation of the procedure. In general, reference-free alignment is an `alignment first, classification second' approach. It is possible to reverse this order by using invariants with the supporting rationale that once approximately homogeneous classes of images were found, it should be easy to align them subsequently as within each class they will share the same motif.

A practical approach to reference-free alignment known as alignment by classification (Dube et al., 1993[link]) is based on the observation that for a very large data set and centred particles one can expect that although the in-plane rotation is arbitrary, there is a high chance that at least some of the similar images will be in the same rotational orientation. Therefore, in this approach the images are first (approximately) centred, then subjected to classification, and subsequently aligned.

In its simplest form, the multireference alignment belongs to the class of supervised classification methods: given a set of templates (i.e., reference images; these can be selected unprocessed particle projections, or class averages that resulted from preceding analysis, or projections of a previously determined EM structure, or projections of an X-ray crystallographic structure), each of the images from the available data sets is compared (using a selected discrepancy measure) with all templates and assigned to the class represented by the most similar one. Equally often multireference alignment is understood as a form of unsupervised classification, more precisely K-means classification, even if the description is not formalized in terms of the latter. Given a number of initial 2D templates, the images are compared with all templates and assigned to the most similar one. New templates are calculated by averaging images assigned to their predecessors and the whole procedure is repeated until a stable solution is reached.
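The K-means reading of multireference alignment can be sketched in a few lines. As a deliberate simplification, 1D signals and cyclic shifts again stand in for images and their in-plane transformations, and the similarity measure is the correlation peak divided by the template norm; the names `align_to` and `multireference_alignment` are hypothetical.

```python
import numpy as np

def align_to(template, img):
    """Best cyclic shift of img against template and a normalized score."""
    ccf = np.fft.ifft(np.fft.fft(template) * np.conj(np.fft.fft(img))).real
    s = int(np.argmax(ccf))
    return np.roll(img, s), float(ccf[s]) / float(np.linalg.norm(template))

def multireference_alignment(imgs, templates, n_iter=20):
    """Alternate between assignment to the most similar template and
    recalculation of templates as averages of their aligned members."""
    templates = np.array(templates, dtype=float)
    labels = np.full(len(imgs), -1)
    for _ in range(n_iter):
        aligned, new_labels = [], []
        for img in imgs:
            candidates = [align_to(t, img) for t in templates]
            best = int(np.argmax([score for _, score in candidates]))
            aligned.append(candidates[best][0])
            new_labels.append(best)
        aligned, new_labels = np.array(aligned), np.array(new_labels)
        for c in range(len(templates)):
            if np.any(new_labels == c):  # leave empty classes unchanged
                templates[c] = aligned[new_labels == c].mean(axis=0)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels, templates
```

Stopping after a single pass with fixed templates gives the supervised variant described first; iterating until the assignments stop changing gives the unsupervised one.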

2.5.7.7. Initial determination of 3D structure using tilt experiments

The 2D analysis of projection images provides insight into the behaviour of the protein on the grid in terms of the structural consistency and the number and shape of projection images. In order to obtain 3D information, it is necessary to find geometrical relations between different observed 2D images. The most robust and historically the earliest approach is based on tilt experiments. By tilting the stage in the microscope and acquiring additional pictures of the same area of the grid it is possible to collect projection images of the same molecule with some of the required Eulerian angles determined accurately by the setting of the goniometer of the microscope.

In random conical tilt (RCT) reconstruction (Radermacher et al., 1987[link]), two micrographs of the same specimen area are collected: the first one is recorded at a tilt angle of ~50° while the second one is recorded at 0° (Fig. 2.5.7.3[link]). If particles have preferred orientation on the support carbon film (or within the amorphous ice layer, if no carbon support is used), the projections of particles in the tilted micrographs form a conical tilt series. Since in-plane rotations of particles are random, the azimuthal angles of the projections of tilted particles are also randomly distributed; hence the name of the method. The untilted image is required for two reasons: (i) the particle projections from the untilted image are classified, thus a subset corresponding to possibly identical images can be selected ensuring that the projections originated from similar and similarly oriented structures; and (ii) the in-plane rotation angle found during alignment corresponds to the azimuthal angles in three dimensions (one of the three Eulerian angles needed). The second Eulerian angle, the tilt, is either taken from the microscope setting of the goniometer or calculated based on geometrical relations between tilted and untilted micrographs. The third Eulerian angle corresponds to the angle of the tilt axis of the microscope stage and is also calculated using the geometrical relations between two micrographs. In addition, it is necessary to centre the particle projections selected from tilted micrographs; although various correlation-based schemes have been proposed, the problem is difficult as the tilt data tend to be very noisy and have very low contrast.

Figure 2.5.7.3

Principle of random conical tilt reconstruction. A tilt pair of images of the same grid area is collected. By aligning the particle images in the untilted micrograph (left), the Eulerian angles of their counterparts in the tilted micrograph (right) are established. The particle images from the tilted micrograph are used for 3D reconstruction of the molecule (bottom). The set of projections form a cone in Fourier space; information within the cone remains undetermined.

Given three Eulerian angles and centred tilted projections, a 3D reconstruction is calculated. There are numerous advantages of the RCT method. (i) Assuming the sign of the tilt angle is read correctly (it can be confirmed by analysing the defocus gradient in the tilted micrographs), the method yields a correct hand of the structure. (ii) With the exception of the in-plane rotation of untilted projections, which can be found relatively easily using alignment procedures, the remaining parameters are determined by the experimental settings. Even if they are not extremely accurate, the possibility of a gross error is eliminated, which positively distinguishes the method from the ab initio computational approaches that use only untilted data. (iii) The computational analysis is entirely done using the untilted data, which have high contrast. (iv) The RCT method is often the only method of obtaining 3D information if the molecule has strongly preferential orientation and only one view is observed in untilted micrographs. The main disadvantage is that the conical projection series leaves a significant portion of the Fourier space undetermined. This follows from the central section theorem [equation (2.5.6.8)[link] of Section 2.5.6[link]]: as the tilt angle is less than 90°, the undetermined region can be thought to form a cone in three dimensions and is referred to as the missing cone. The problem can be overcome if the molecule has more than one preferred orientation. Subsets of particles that have similar untilted appearance (as determined by clustering) are processed independently and for each a separate 3D structure is calculated. 
If the preferred orientations are sufficiently different, i.e., the orientations of the original particles in three dimensions are sufficiently different in terms of their angles with respect to the z axis, the 3D structures can be aligned and merged, all but eliminating the problem of the missing cone and yielding a robust, if resolution-limited, initial model of the molecule (Penczek et al., 1994[link]). It should be noted that RCT by itself almost never results in a high-resolution 3D model of the molecule. This is due to a variety of reasons, the main ones being the already mentioned poor quality of high-tilt data and difficulties with the collection of large numbers of high-quality tilted micrographs (they are often marred by drift).

In cases when the molecule does not have well defined preferred orientations, it is possible to use electron tomography to obtain the initial model. In this method, a single-axis tilt series of projection images of the same specimen area is collected using an angular step of ~2° and a maximum tilt angle not exceeding 60° (Crowther, DeRosier & Klug, 1970[link]). The single-axis tilt data-collection geometry yields worse coverage of the Fourier space than the RCT method, leaving missing wedges uncovered (Penczek & Frank, 2006[link]). This results in severe artifacts in real space, which make smaller objects virtually unrecognizable. The situation can be largely rectified using so-called double-axis tomography, in which a second single-axis tilt series of data are collected after rotating the specimen grid in-plane by 90° (Penczek et al., 1995[link]). This reduces the undetermined region to a missing pyramid and makes the resolution almost isotropic in the xy plane.
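The relative size of the undetermined regions can be estimated with a short calculation. The sketch below assumes idealized geometries (continuous angular coverage, central sections of zero thickness) and measures the fraction of directions in Fourier space that are left unsampled; the function names are hypothetical.

```python
import math

def missing_cone_fraction(tilt_deg):
    """Conical tilt series at a fixed tilt angle: the missing double cone has
    a half-angle of (90 - tilt) degrees around the z axis."""
    return 1.0 - math.sin(math.radians(tilt_deg))

def missing_wedge_fraction(max_tilt_deg):
    """Single-axis tilt series covering -max_tilt to +max_tilt: the missing
    wedge spans 2 * (90 - max_tilt) degrees out of 180."""
    return 1.0 - max_tilt_deg / 90.0

print(round(missing_cone_fraction(50.0), 3))   # conical series at 50 degrees
print(round(missing_wedge_fraction(60.0), 3))  # single-axis series to 60 degrees
```

Under these assumptions a conical series at 50° leaves roughly 23% of directions undetermined, whereas a single-axis series reaching 60° leaves one third, in line with the poorer coverage of the single-axis geometry.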

The tomographic projection data have to be aligned. This is done using either correlation techniques that enforce pairwise alignment of images (Frank & McEwen, 1992[link]) or by taking advantage of fiducial markers and enforcing their consistency with respect to a 3D model (Lawrence, 1992[link]; Penczek et al., 1995[link]). In the application to single-particle work, it is possible to use locations of protein in the micrographs as markers. After the 3D reconstruction is calculated, regions containing individual molecules are windowed from the volume and all molecules are aligned in three dimensions (Walz et al., 1997[link]). While generally robust, the procedure is labour- and computer-intensive. Unlike RCT, where only two exposures of the same field are required, electron tomography may require over one hundred images, raising serious concerns about radiation damage. Moreover, most of the data have to be collected at high tilt angle, and thus are of lower quality. Particularly troublesome is alignment of 3D molecules deteriorated by the missing wedge/pyramid artifacts, with the directions of artifacts different for each object. However, when successful, electron tomography yields a very good initial model of the molecule, free from missing-Fourier-space-related artifacts and with defined handedness.

2.5.7.8. Ab initio 3D structure determination using computational methods

The experiment-based methods of initial 3D structure determination (RCT and electron tomography) are quite powerful, but rather challenging to employ in practice. Particularly frustrating is the fact that a large volume of difficult-to-record tilt data have to be collected, even though they cannot be used for subsequent high-resolution work. Therefore, whenever possible, preference is given to computational methods in which 3D geometrical relations between particle projections are established by various mathematical approaches from untilted data alone.

The most straightforward approach and historically the earliest is based on the central section theorem [equation (2.5.6.8)[link] of Section 2.5.6[link]]: because Fourier transforms of 2D projections of a 3D object are the central section of the 3D Fourier transform of this object, it is a straightforward consequence that Fourier transforms of any two projections intersect along a line, henceforth called a common line. (Two trivial exceptions are the case of projections in the same direction, in which their Fourier transforms coincide with possible differences in in-plane rotation, and the case of projections in opposite directions, in which they are mirror versions of each other.) This fact was originally used by Crowther, DeRosier & Klug (1970[link]) to solve the structure of viruses with icosahedral (60-fold) symmetry. In this case, the Fourier transform of each projection intersects itself (or rather the symmetry-related copies of itself) 37 times with the exception of degenerate cases of projections in directions of one of three symmetry axes, in which cases the number of common lines is less. Thus, it is possible to find the orientation of a single projection with respect to the chosen system of symmetry axes.

For asymmetric objects a set of three projections that do not intersect along the same line (which would correspond to the single-axis tilt geometry) uniquely determine their respective orientations (with the exception of the overall rotation, which remains arbitrary, and the handedness of the solution, which remains undetermined). Indeed, three projections span three common lines, and each common line yields two angles: for each of the intersecting sections it is the angle between the x axis in the system of coordinates of this section and the common line in the plane of this section. Thus, we have a total of six angles. At the same time, by arbitrarily setting the orientation of the first projection in 3D space to three Eulerian angles equal to zero (or the corresponding rotation matrix [{\bf{R}}_1 = {\bf{I}}]), we need to determine two rotation matrices [{\bf{R}}_2 ] and [{\bf{R}}_3 ] (or two sets of three Eulerian angles) for the remaining two projections, respectively. So, given six in-plane angles we have to find a solution for six Eulerian angles. Let the angle of the common line between the ith and jth projection in the plane of the ith projection be [\alpha _{ij} ] and the corresponding unit vector in the plane of the ith projection be[{\bf{n}}_{ij} = [\matrix{ {\cos \alpha _{ij} } & { - \sin \alpha _{ij} } & 0 \cr } ]^T,\quad i,j = 1,2,3,\quad i \ne j,\eqno(2.5.7.12)]where we added the third coordinate for convenience. The orientations of unit vectors [{\bf{n}}_{ij} ] in 3D space have to be related by rotation matrices; for example, vector [{\bf{n}}_{21} ] (the direction of the common line between the first and second projection in the plane of the second projection) should coincide with vector [{\bf{n}}_{12} ] (the direction of the same common line, but in the plane of the first projection) upon rotation by (unknown) matrix [{\bf{R}}_2 ].
All possible relations are[\eqalignno{{\bf{R}}_2 {\bf{n}}_{21} &= {\bf{n}}_{12}&\cr {\bf{R}}_2 {\bf{n}}_{23} &= {\bf{R}}_3 {\bf{n}}_{32}&\cr {\bf{R}}_3 {\bf{n}}_{31} &= {\bf{n}}_{13}.&(2.5.7.13)}]Equations (2.5.7.13)[link] have two solutions corresponding to two different hands of the molecule [for details see Farrow & Ottensmeyer (1992[link])].

The common-lines method works very well in the absence of noise. However, even a modest amount of noise can yield quite erroneous results or no results at all. The reason is that the solution of (2.5.7.13)[link] is highly nonlinear with respect to the six given angles [\alpha _{ij} ] and small errors in the location of peaks in cross-correlation functions can lead to quite large discrepancies from the correct solution. The main difficulty with the application of the common-lines method is that the analytical solution in the form of (2.5.7.13)[link] exists only for three projections (Goncharov et al., 1987[link]).

As a working approach to ab initio structure determination, the common-lines method has been implemented under the name of angular reconstitution in IMAGIC (van Heel, 1987a[link]; van Heel et al., 1996[link]). In order to reduce sensitivity to noise, the method is applied not to individual projection images, but to class averages resulting from the multireference alignment of input data (van Heel et al., 2000[link]). To overcome the lack of an analytical solution for more than three projections, the user has to begin by selecting three judiciously chosen class averages, obtain the solution using (2.5.7.13)[link] and subsequently include additional class averages (i.e., assign angles to them) using a brute-force approach, in which the Eulerian angles of the new projection are calculated using a similarity measure based on common lines with the already-angled set serving as a reference.

Some of the disadvantages of the angular reconstitution were addressed in the common-lines-based method for determining orientations for [N\,\gt\,3] particle projections simultaneously (Penczek et al., 1996[link]). In this method, the problem is formulated in terms of minimization of the variance of the 3D structure, as expressed in terms of common-lines discrepancy between N projections. In a sense, the design of the method is the exact opposite of the `standard' common-lines approach: instead of trying to determine the Eulerian angles (rotation matrices [{\bf{R}}_i ]) based on angles [\alpha _{ij} ] of common lines in the planes of the projections, one assumes that rotation matrices [{\bf{R}}_i ] are known, finds the set of angles [\alpha _{ij} ] of common lines and computes the overall discrepancy along these lines. For a pair of projections i and j, the in-plane angles of common lines are found by solving the system of equations[{\bf{R}}_i {\bf{n}}_{ij} = {\bf{R}}_j {\bf{n}}_{ji} \eqno(2.5.7.14)]for [\alpha _{ij} ] and [\alpha _{ji} ]. The discrepancy minimized in the method is the variance of the 3D structure that, by analogy to the 2D case (2.5.7.10)[link] and (2.5.7.11)[link], is[\eqalignno{L_{\rm cl} \left( {\{ F\},\{ {\bf{R}} \}} \right) &= \textstyle\sum\limits_{m = 0}^M {\textstyle\sum\limits_{k = 1}^N {\left\| {F_k ( {u_m ,\alpha \semi{\bf{R}}_k } ) - \left\langle F \right\rangle _k } \right\|^2 u_m^2 \Delta u\Delta \Omega _{kl} } },&\cr\left\langle F \right\rangle _k &= {1 \over {N - 1}}\displaystyle\sum\limits_{l = 1 , l \ne k } ^N {F_l ( {u_m ,\alpha \semi{\bf{R}}_l } )}, &(2.5.7.15)}]where [L_{\rm cl} ] is written in Fourier 3D polar coordinates and [F_k ( {u_m ,\alpha ;{\bf{R}}_k })] is the Fourier transform of the kth projection in 2D polar coordinates [( {u_m ,\alpha })] with the orientation in 3D Fourier space given by the rotation matrix [{\bf{R}}_k ].
All Fourier planes Fk are considered to have zero thickness, so all discrepancies are calculated only along common lines and the `partial average' [\left\langle F \right\rangle _k ] is in fact an arrangement of N − 1 Fourier planes in 3D space. An approximation to [u_m^2 \Delta u\Delta \Omega _{kl} ] is calculated by equating the values of [\Delta \Omega _{kl} ] to the areas of the Voronoi diagram cells constructed on a unit sphere for points of intersection of common lines with this unit sphere (see Section 2.5.6.6[link]). Generally the method performs very well, particularly if the projection images cover 3D angular space evenly.
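The forward step of this method, finding the in-plane angles of a common line from two known rotation matrices by solving (2.5.7.14), can be sketched as follows. The sketch assumes that each matrix maps the coordinate frame of its section into 3D, so that its third column is the normal of the central section, and it uses the in-plane unit vector exactly as written in (2.5.7.12). The helper names and the ZYZ construction of test rotations are illustrative assumptions.

```python
import numpy as np

def unit_vector(alpha):
    """In-plane unit vector of a common line, as written in (2.5.7.12)."""
    return np.array([np.cos(alpha), -np.sin(alpha), 0.0])

def common_line_angles(Ri, Rj):
    """Solve R_i n_ij = R_j n_ji for the in-plane angles (alpha_ij, alpha_ji)."""
    c = np.cross(Ri[:, 2], Rj[:, 2])  # direction lying in both central sections
    norm = np.linalg.norm(c)
    if norm < 1e-12:
        raise ValueError("parallel projection directions: no unique common line")
    c = c / norm
    ni, nj = Ri.T @ c, Rj.T @ c  # the same 3D vector in each section's frame
    return np.arctan2(-ni[1], ni[0]), np.arctan2(-nj[1], nj[0])

def rotation_zyz(phi, theta, psi):
    """Rotation matrix from three Eulerian angles (ZYZ order assumed here)."""
    def rz(a):
        return np.array([[np.cos(a), -np.sin(a), 0.0],
                         [np.sin(a), np.cos(a), 0.0],
                         [0.0, 0.0, 1.0]])
    def ry(a):
        return np.array([[np.cos(a), 0.0, np.sin(a)],
                         [0.0, 1.0, 0.0],
                         [-np.sin(a), 0.0, np.cos(a)]])
    return rz(phi) @ ry(theta) @ rz(psi)
```

A common line is a line rather than a direction, so each returned angle is defined only up to the addition of 180°; the function returns one consistent choice for the pair.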

Some macromolecules, particularly those that have an elongated barrel-like shape, will have a strongly preferential orientation with respect to the direction of electron beam showing only what are often called `side views', i.e., projections perpendicular to rotation along one axis corresponding to single-axis tilt data-collection geometry. These orthoaxial projections form single-axis tilt reconstruction geometry. In this case, Fourier transforms of all projections share only one common line, the line coinciding with the rotation axis, and clearly the common-lines-based method is not applicable to the ab initio structure determination. To cope with this situation, a method termed Sidewinder was developed (Pullan et al., 2006[link]). It is based on the observation that a Fourier transform of a finite object with a diameter D can be considered to have a nonzero thickness 1/D (Fig. 2.5.6.5[link]). Thus, if the angle between two central sections of the 3D Fourier transform of this object, as derived from 2D Fourier transforms of its projections, is not too large, then these two sections will share information in Fourier space that is proportional to the amount of overlap of the two `slabs' in Fourier space. Using this observation, the general idea employed in Sidewinder is to calculate pairwise cross-correlation coefficients (CCCs) between class averages of side views and to use this information to deduce the values of the azimuthal Eulerian angles using the Monte Carlo minimization method (Fishman, 1995[link]).

For structures that have reasonably high symmetry and for those for which it is possible to collect high-quality EM data, it is sometimes possible to determine the initial structure using the 3D projection alignment method, which will be described in the next section. However, the approach is extremely computationally intensive and it is virtually impossible to try the method repeatedly to verify that the approach converges to more-or-less the same 3D structure, as is recommended for other ab initio methods described in this section. When the method is successful it is quite powerful, as an intermediate resolution structure can be obtained without going through intermediate and quite laborious steps of analysis of the data. A word of caution is warranted: with the direct method, unless there is external evidence that the obtained structure is correct, it is possible to obtain a self-consistent but entirely incorrect model of the molecule.

In the absence of reliable objective measures of the correctness of the structure, one can apply common sense in order to spot definitely improbable 3D maps. Given the mass of the complex it is possible to calculate the corresponding volume, and thus the threshold at which the map should be examined (Section 2.5.7.11[link]). If at this threshold the mass density is discontinuous or there are pieces of mass surrounding the structures, the map is most likely to be incorrect. Similarly, strong directional artifacts appearing as streaks permeating the structure indicate that either the collected projection images are dominated by one or two views of the structure or that the angular assignment is incorrect. In addition, the 3D map should be centred in the window box; although the centring is not strictly speaking a mathematical requirement for a successful reconstruction, all single-particle structure-determination software packages take advantage of the fact that for centred objects orientation searches are easier to perform. So, if the map is not centred it is a clear indication of the failure of the procedure. Finally, for symmetric structures there should be no large pieces of mass on the symmetry axes.
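The mass-based choice of threshold can be illustrated with a small sketch. It assumes a typical protein density of about 0.81 Da Å⁻³ (roughly 1.35 g cm⁻³), a value that is not taken from this section and should be adjusted for nucleic acid content; the procedure in Section 2.5.7.11 may differ in detail. The function names are hypothetical.

```python
def volume_from_mass(mass_da, density_da_per_A3=0.81):
    """Volume in cubic angstroms occupied by a complex of the given mass.
    The default density is an assumed typical value for protein."""
    return mass_da / density_da_per_A3

def threshold_for_mass(voxel_values, mass_da, voxel_size_A, density_da_per_A3=0.81):
    """Density threshold at which the map encloses the expected volume."""
    volume = volume_from_mass(mass_da, density_da_per_A3)
    n_inside = int(round(volume / voxel_size_A ** 3))
    n_inside = max(1, min(n_inside, len(voxel_values)))
    ordered = sorted(voxel_values, reverse=True)
    # The n_inside highest voxels are at or above the returned value
    return ordered[n_inside - 1]
```

The map would then be displayed at the returned threshold and inspected for the continuity of the enclosed density.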

2.5.7.9. Refinement of a 3D structure

Given an initial low-resolution model of the 3D structure and the data set of 2D projection images of the complex that have Fourier-space information extending beyond the resolution of the model, it is possible to refine the structure such that the full extent of the resolution information in the data will be utilized. In some cases, it is also possible to use as an initial structure in the refinement procedure a structure of a homologous protein, thus avoiding the process of ab initio structure determination altogether. The goal of the refinement is to find such orientation parameters for each of the particle projections for which (2.5.7.6)[link] is minimized. There exist various implementations of the structure-refinement strategy and they can be roughly divided into those that perform exhaustive searches for all five orientation parameters (two translations and three Eulerian angles per 2D projection image) and those that perform local searches, usually by employing gradient information. Finally, the strategies may differ in how the correction for the CTF is implemented.

The original 3D projection-matching strategy (Penczek et al., 1994[link]) is based on the observation that given an ideal structure f and the necessary parameters of the CTF and image-formation model, it is straightforward to find five orientation parameters for each projection image. One begins with the determination of the sufficient angular step: assuming the structure is properly sampled at the Nyquist frequency and has a real-space radius of r voxels, the angular step is given by[\delta \theta \cong \arctan(1/r).\eqno(2.5.7.16)]Next, keeping in mind that projection directions are parametrized by two Eulerian angles [( {\varphi ,\theta })], one generates a set of projection directions quasi-uniformly distributed over half a unit sphere (or, in the case of a symmetric structure, over an asymmetric subunit) by taking fixed steps along the altitude or tilt angle [\theta] and a number of samples azimuthally in proportion to [\sin \theta ] (Penczek et al., 1994[link]). So, for a chosen constant increment [\delta \theta ] and given [\theta ] angle the increment of the [\varphi ] angle varies according to[\Delta \varphi = {{\delta \theta }/{\left| {\sin \theta } \right|}}.\eqno(2.5.7.17)]If all three Eulerian angles are to be sampled, as is necessary in some applications, then [\psi ] is sampled uniformly in steps of [\delta \theta ].
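Equations (2.5.7.16) and (2.5.7.17) translate directly into a short routine. The sketch below rounds the number of azimuthal samples at each [\theta ] so that they span the full circle evenly, which is one possible convention among several; the function names are hypothetical.

```python
import math

def angular_step_deg(radius_voxels):
    """Angular step from (2.5.7.16) for a structure of the given radius."""
    return math.degrees(math.atan(1.0 / radius_voxels))

def quasi_uniform_directions(delta_theta_deg, theta_max_deg=90.0):
    """Projection directions (phi, theta) in degrees, with fixed steps in
    theta and an azimuthal increment of about delta_theta / |sin(theta)|."""
    directions = []
    n_theta = int(round(theta_max_deg / delta_theta_deg))
    for i in range(n_theta + 1):
        theta = i * delta_theta_deg
        s = abs(math.sin(math.radians(theta)))
        # One sample at a pole, otherwise a count proportional to sin(theta)
        n_phi = 1 if s < 1e-9 else max(1, int(round(360.0 * s / delta_theta_deg)))
        for j in range(n_phi):
            directions.append((j * 360.0 / n_phi, theta))
    return directions

print(round(angular_step_deg(32), 2))
print(len(quasi_uniform_directions(10.0)))
```

With a 10° step this yields a few hundred directions over the half sphere. Sampling the full sphere and multiplying by 36 in-plane rotations gives a template count of the order of 10^4, although the exact number depends on the rounding convention.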

In order to find the orientation parameters of projection images, one step of projection matching is performed. The reference structure is projected in all directions given by (2.5.7.17)[link], yielding a set of reference images. Next, for each projection image, 2D cross-correlation functions with all reference images are calculated using one of the methods described in Section 2.5.7.6[link] and the overall maximum yields the translation, the in-plane rotation angle, the number of the most similar reference image (thus the remaining two Eulerian angles) and information about whether the image should be mirrored. Given this, a new 3D structure can be calculated using a 3D reconstruction algorithm (see Section 2.5.6[link]). This simple protocol constitutes the core of 3D projection alignment (Fig. 2.5.7.4[link]).

[Figure 2.5.7.4]

Figure 2.5.7.4

Schematic of the 3D projection-alignment procedure.

In a simple implementation of the 3D projection-matching procedure, all projection data are assembled into defocus groups, i.e., groups of projection images that have similar defocus settings (Frank et al., 2000[link]). During refinement, for each defocus group the reference volume is multiplied by the CTF with the appropriate defocus value, one step of projection matching is performed and a refined structure is reconstructed for this group (Fig. 2.5.7.5[link]). In addition, the within-group resolution is estimated using the Fourier shell correlation (FSC) approach (2.5.7.19)[link] applied to two volumes calculated from two subsets of projection images randomly split into halves. After all defocus groups have been processed, the individual refined volumes are merged in Fourier space with a CTF correction using Wiener-filter methodology (Penczek et al., 1997[link]),[F_{\rm merged} = {{\textstyle\sum_k {{\rm CTF}_k {\rm SSNR}_k F_k } } \over {\textstyle\sum_k {{\rm CTF}_k^2 {\rm SSNR}_k } + 1}},\eqno(2.5.7.18)]where [{\rm SSNR}_k ] is the spectral signal-to-noise ratio estimated for each defocus group using (2.5.7.22)[link]. Subsequently, the resolution of the merged volume is estimated by merging the half-volumes into two half-merged volumes using (2.5.7.18)[link] and comparing them using (2.5.7.19)[link]. Next, the merged volume is filtered using (2.5.7.25)[link] and the structure is centred so that its centre of mass is placed at the centre of the volume in which it is embedded.
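The merging rule (2.5.7.18) acts independently on each Fourier coefficient, which the following sketch makes explicit. The function name `merge_defocus_groups` and its argument layout are assumptions made for illustration; in practice the CTF and SSNR values would be evaluated at the spatial frequency of the coefficient being merged.

```python
def merge_defocus_groups(F, ctf, ssnr):
    """Merge one Fourier coefficient across defocus groups, eq. (2.5.7.18).

    F    -- complex Fourier coefficients F_k, one per defocus group
    ctf  -- real CTF values CTF_k at the same spatial frequency
    ssnr -- SSNR_k estimates at the same spatial frequency
    """
    numerator = sum(c * s * f for f, c, s in zip(F, ctf, ssnr))
    denominator = sum(c * c * s for c, s in zip(ctf, ssnr)) + 1.0
    return numerator / denominator
```

The `+ 1` in the denominator keeps the result bounded where all CTFs are close to zero, and a group with a vanishing SSNR contributes nothing to the merged coefficient.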

[Figure 2.5.7.5]

Figure 2.5.7.5

Schematic of 3D projection alignment with CTF correction performed on the level of 3D maps reconstructed from projection images sorted into groups that share similar defocus settings.

The 3D projection-matching approach works very well during the initial stages of refinement because it is an efficient way to search exhaustively for the orientation parameters of all projection data. Once the orientation parameters are approximately known, it is straightforward to modify the procedure such that only subsets of reference projections are generated at a time and projection images are compared only with reference projections within a specified angular distance from the angular direction established during the previous iteration. This modification speeds up the procedure significantly and makes it possible to refine structures to very high resolution by using a very small angular step [\delta \theta ]. Another possible modification is to introduce an additional step of 2D alignment of the projection data that share the same angular direction (Ludtke et al., 1999[link]). The advantage is that this can correct possible errors of alignment to the projection of a limited-resolution reference structure and also, to an extent, reduce the danger of bias from artifacts in the reference structure. Finally, it is also possible to incorporate into the refinement strategy a correction for the envelope function of the microscope (Ludtke et al., 1999[link]). The 3D projection-matching strategy is widely used and most EM software packages have implementations of various versions of the basic strategies outlined above (Frank et al., 1996[link]; Ludtke et al., 1999[link]; Hohn et al., 2007[link]).
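The restriction of the search to reference projections within a specified angular distance can be sketched as follows. The helper names are hypothetical, directions are given as [( {\varphi ,\theta })] pairs in degrees, and the sketch ignores mirror-related directions and symmetry, which a real implementation would have to handle.

```python
import math

def direction_vector(phi_deg, theta_deg):
    """Unit vector of the projection direction given by two Eulerian angles."""
    phi, theta = math.radians(phi_deg), math.radians(theta_deg)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def angular_distance(d1, d2):
    """Angle in degrees between two projection directions (phi, theta)."""
    v1, v2 = direction_vector(*d1), direction_vector(*d2)
    dot = sum(a * b for a, b in zip(v1, v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def nearby_references(previous, references, max_angle_deg):
    """Indices of reference directions within max_angle_deg of the
    direction assigned to the image in the previous iteration."""
    return [i for i, ref in enumerate(references)
            if angular_distance(previous, ref) <= max_angle_deg]
```

Only the reference projections returned by `nearby_references` would then be generated and correlated with the image, so the cost per image no longer grows with the total number of directions on the sphere.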

A possible improvement over the 3D projection-matching procedure can be achieved by working in transformed spaces in which the distinction between orientation search and 3D reconstruction is removed: (1) spherical harmonics (Provencher & Vogel, 1988[link]; Vogel & Provencher, 1988[link]), which have found applications exclusively in the determination of icosahedral structures (Yin et al., 2001[link], 2003[link]); (2) Radon transform (Radermacher, 1994[link]), with selected applications in the determination of asymmetric particles (Ruiz et al., 2003[link]); or (3) Fourier transform, implemented in the FREALIGN package (Grigorieff, 2007[link]). In FREALIGN, the transformation between the arbitrarily oriented Fourier 2D central section and the 3D Fourier Euclidean grid is implemented using trilinear interpolation that includes ad hoc correction for the CTF effects. In high-resolution structure-refinement mode, the program uses a gradient-based Powell optimization algorithm (Powell, 1973[link]), thus overcoming the main deficiency of 3D projection-matching algorithms.

A unified approach to direct minimization of (2.5.7.6)[link] was proposed by Yang et al. (2005[link]) and is implemented in the SPARX package as the YNP method (Hohn et al., 2007[link]). The premise of the YNP method is that the orientation parameters are approximately known (thus the initial 3D map) and both the orientation parameters and the density map are updated simultaneously in a gradient-based optimization scheme. In the YNP method, the derivatives with respect to the density distribution are calculated analytically and the derivatives with respect to orientation parameters are calculated using finite difference approximations. The YNP method is very efficient and its major advantage is that it avoids many problems associated with approximate solutions inherent in methods that work in transform spaces. The projection/backprojection operations are carried out rapidly using linear interpolation, which due to sufficient oversampling of the data does not have a significant adverse impact on the solution. Moreover, because the density map f is updated simultaneously with the orientation parameters, the computationally demanding separate step of 3D reconstruction is eliminated.

2.5.7.10. Resolution estimation and analysis of errors in single-particle reconstruction

The development of resolution measures in EM was greatly influenced by earlier work in X-ray crystallography. In EM, the problem is somewhat more difficult as, unlike in crystallography, both the amplitude and the phase information in the data are affected by alignment procedures (which we consider distant analogues of phase-extension methods in crystallography). Therefore, resolution measures in EM reflect the self-consistency of the results; however, as the data are subject to alignment, there is a significant risk of introducing artifacts resulting from the alignment of the noise component in the data. Ultimately, these artifacts will unduly `improve' the resolution of the map.

The resolution measures used in EM fall into two categories: measures based on averaging of Fourier transforms of individual images and measures based on comparisons of averages calculated for subsets of the data. In the first group, we have the Q-factor (van Heel & Hollenberg, 1980[link]; Kessel et al., 1985[link]) and the spectral signal-to-noise ratio (SSNR), introduced for the 2D case by Unser and co-workers (Unser et al., 1987[link]) and extended by Penczek (Penczek, 2002[link]) to the 3D case for the class of reconstruction algorithms that are based on direct Fourier inversion. The second group of measures includes the differential phase residual (DPR) (Frank et al., 1981[link]) and the Fourier ring correlation (FRC) (Saxton & Baumeister, 1982[link]). A marked advantage of these measures is that they are equally well applicable to 2D or 3D data. In the latter case, the volumes resulting from 3D reconstruction algorithms take the place of the 2D averages.

The resolution measures used in single-particle reconstruction are designed to evaluate the SSNR in the reconstruction as a function of spatial frequency (Penczek, 2002[link]). The `resolution' of the reconstruction is reported as a spatial frequency limit beyond which the SSNR drops below a selected level, for example below one.

The FSC is evaluated by taking advantage of the large number of single-particle images: the total data set is randomly split into halves; for each subset a 3D reconstruction is calculated (in two dimensions, a simple average); and two maps f and g are compared in Fourier space,[\eqalignno{&{\rm FSC}(f,g\semi u) &\cr &\quad= {{\textstyle\sum_{\left| {\left\| {{\bf{u}}_n } \right\| - r} \right| \le \varepsilon }^{n_r } {F( {{\bf{u}}_n } )G^* ( {{\bf{u}}_n } )} } \over {\left\{ {\left[ {\textstyle\sum_{\left| {\left\| {{\bf{u}}_n } \right\| - r} \right| \le \varepsilon }^{n_r } {\left| {F( {{\bf{u}}_n } )} \right|^2 } } \right]\left[{\textstyle\sum_{\left| {\left\| {{\bf{u}}_n } \right\| - r} \right| \le \varepsilon }^{n_r } {\left| {G( {{\bf{u}}_n } )} \right|^2 } } \right]} \right\}^{1/2}}}.&\cr&&(2.5.7.19)}]In (2.5.7.19)[link], [2\varepsilon] is a preselected ring/shell thickness, the [{\bf{u}}_n] form a uniform grid in Fourier space, [u = \left\| {{\bf{u}}_n } \right\|] is the magnitude of the spatial frequency and [n_r ] is the number of Fourier voxels in the shell of radius r that corresponds to frequency u. The FSC yields a 1D curve of correlation coefficients as a function of u. Note that the FSC is insensitive to linear transformations of the densities of the objects. An FSC curve everywhere close to one reflects strong similarity between f and g; an FSC curve with values close to zero indicates the lack of similarity between f and g.
Particularly convenient for the interpretation of the results in terms of `resolution' is the relation between the FSC and the SSNR, which is easily derived by taking the expectation of (2.5.7.19)[link] under the assumption that both f and g are sums of the same signal and different realizations of the noise, which are uncorrelated with the signal and with each other (Saxton, 1978[link]):[E[ {\rm FSC}] \cong {{\rm SSNR} \over {\rm SSNR + 1}}.\eqno(2.5.7.20)]By solving (2.5.7.20)[link] for SSNR we obtain[{\rm SSNR} = {{\rm FSC} \over {\rm 1 - FSC}},\eqno(2.5.7.21)]which, taking into account that the FSC was calculated from the data set split into halves, has to be modified to (Unser et al., 1987[link])[{\rm SSNR} = 2\left({{\rm FSC} \over {\rm 1 - FSC}}\right).\eqno(2.5.7.22)]In order to calculate the FSC that corresponds to a given SSNR, one inverts (2.5.7.22)[link] to[{\rm FSC} = {{\rm SSNR} \over {\rm SSNR + 2}}.\eqno(2.5.7.23)]Equations (2.5.7.21)[link] and (2.5.7.22)[link] serve as a basis for various `resolution criteria' used in EM. The often-used 3σ criterion (van Heel, 1987b[link]) equates resolution with the point at which the FSC is larger than zero at a 3σ level, where σ is the expected standard deviation of the FSC that has an expected value of zero, in essence finding a frequency for which the SSNR is significantly larger than zero. The 3σ criterion has the distinct disadvantage of reporting the resolution at a frequency at which there is no significant signal, while tempting the user to interpret the detail in the map at this resolution. Moreover, as the FSC approaches zero, its relative error increases, so the curve oscillates widely around the zero level, increasing the chance of selecting an incorrect resolution point. In other criteria one tries to equate the resolution with the frequency at which noise begins to dominate the signal.
A good choice of the cut-off level is SSNR = 1.0, a level at which the power of the signal in the reconstruction is equal to the power of the noise. According to (2.5.7.23)[link], this corresponds to FSC = 0.333. Another often-used cut-off level is FSC = 0.5, at which the SSNR in the reconstruction is 2.0 (Böttcher et al., 1997[link]; Conway et al., 1997[link]; Penczek, 1998[link]).
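The relations (2.5.7.19), (2.5.7.22) and (2.5.7.23) can be checked numerically with a short sketch. All function names are hypothetical. The sketch assumes that the Fourier coefficients of the two half-set maps have already been sorted into shells, and it takes the real part of the cross term, which is exact only when each shell contains Friedel-related pairs, as is the case for real-valued maps.

```python
import math

def fsc_shell(F, G):
    """FSC for a single shell, eq. (2.5.7.19).

    F, G -- complex Fourier coefficients of the two half-set maps that
            fall into the shell, in the same order.
    """
    cross = sum(f * g.conjugate() for f, g in zip(F, G))
    power_f = sum(abs(f) ** 2 for f in F)
    power_g = sum(abs(g) ** 2 for g in G)
    return cross.real / math.sqrt(power_f * power_g)

def ssnr_from_fsc(fsc):
    """Eq. (2.5.7.22): SSNR of the full data set from the half-set FSC."""
    return 2.0 * fsc / (1.0 - fsc)

def fsc_from_ssnr(ssnr):
    """Eq. (2.5.7.23): FSC value that corresponds to a given SSNR."""
    return ssnr / (ssnr + 2.0)

def resolution_shell(fsc_curve, cutoff):
    """Index of the first shell at which the FSC falls below the cutoff,
    or None if the curve never does."""
    for index, value in enumerate(fsc_curve):
        if value < cutoff:
            return index
    return None
```

With these helpers, the SSNR = 1 criterion corresponds to `resolution_shell(curve, fsc_from_ssnr(1.0))` and the FSC = 0.5 criterion to `resolution_shell(curve, 0.5)`; converting the shell index to a resolution in Å requires the pixel size and map dimensions, which are left out here.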

The main reason behind the determination of the resolution of the EM maps is the necessary step of low-pass filtration of the results before the interpretation of the map is attempted. In order to avoid mistakes, particularly the danger of overinterpretation, one has to remove from the map unreliable Fourier coefficients. Inclusion of Fourier coefficients with a low SNR will result in the creation of spurious details and artifacts in the map. Thus, the optimal filtration should be based on the SSNR distribution in the map and the solution is given by a Wiener filter:[W( u ) = {{{\rm SSNR}( u)} \over {{\rm SSNR}( u ) + 1}}.\eqno(2.5.7.24)]Based on the relation of FSC to SSNR (2.5.7.22)[link], we can write (2.5.7.24)[link] as[W( u ) = {{2{\rm FSC}( u )} \over {{\rm FSC}( u) + 1}}.\eqno(2.5.7.25)]In practice, because of the irregular shape of typical FSC curves (particularly for small values of FSC) it is preferable to approximate the shape of the Wiener filter (2.5.7.25)[link] by one of the standard low-pass filters, such as Butterworth (Gonzalez & Woods, 2002[link]) or hyperbolic tangent (Basokur, 1998[link]).
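A minimal sketch of the FSC-based Wiener weight (2.5.7.25) follows. The function name is hypothetical, and setting negative FSC values to zero is an assumption of this sketch (a simple way to avoid negative or unbounded weights where the curve oscillates around zero), not a prescription from the text.

```python
def wiener_weight(fsc):
    """Wiener filter value from the half-set FSC, eq. (2.5.7.25).

    Assumption of this sketch: negative FSC values are clamped to zero
    so that the weight always lies between 0 and 1.
    """
    fsc = max(0.0, fsc)
    return 2.0 * fsc / (fsc + 1.0)
```

Each Fourier coefficient of the merged map in the shell at frequency u would be multiplied by `wiener_weight(FSC(u))`; at FSC = 0.333 (SSNR = 1) the weight is 0.5.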

The FRC/FSC methodology can be used to compare a noise-corrupted map with a noise-free ideal version of the same object. In single-particle reconstruction this situation emerges when an X-ray crystallographic structure of either the entire EM-determined structure or of some of its domains is available (Penczek et al., 1999[link]). In this case, we assume that in (2.5.7.19)[link] f represents a sum of the signal and additive uncorrelated noise and g represents the noise-free signal, so it is straightforward to calculate the expectation of (2.5.7.19)[link] in order to obtain the relation between the cross-resolution (CRC) and the SSNR:[E[ {\rm CRC} ] \cong\left ({\rm SSNR} \over {\rm SSNR + 1}\right)^{1/2} .\eqno(2.5.7.26)]Thus[{\rm SSNR} = {{\rm CRC^2 } \over {\rm 1 - CRC^2 }}.\eqno(2.5.7.27)]Interestingly, for the same SSNR cut-off levels, corresponding values of CRC are higher than those for FSC. For example, for SSNR = 1, CRC = 0.71, while FSC = 0.33. For SSNR = 2, CRC = 0.82, while FSC = 0.5.
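The step from (2.5.7.19) to (2.5.7.26) can be written out as a short worked derivation. The symbols S (signal), N (noise) and [\sigma_N^2] (noise variance) are introduced here for illustration only; all sums run over the Fourier voxels of one shell, and the approximation replaces the expectation of a ratio by the ratio of expectations.

```latex
% Assumptions: F = S + N and G = S, with N zero-mean and uncorrelated with S.
\begin{align*}
E\Big[\textstyle\sum F G^{*}\Big] &= \textstyle\sum |S|^{2}, \qquad
E\Big[\textstyle\sum |F|^{2}\Big] = \textstyle\sum |S|^{2} + \textstyle\sum \sigma_N^{2}, \qquad
\textstyle\sum |G|^{2} = \textstyle\sum |S|^{2}, \\
E[{\rm CRC}] &\cong
\frac{\sum |S|^{2}}
     {\left\{\left[\sum |S|^{2} + \sum \sigma_N^{2}\right]\left[\sum |S|^{2}\right]\right\}^{1/2}}
= \left(\frac{\sum |S|^{2}}{\sum |S|^{2} + \sum \sigma_N^{2}}\right)^{1/2}
= \left(\frac{{\rm SSNR}}{{\rm SSNR} + 1}\right)^{1/2},
\qquad {\rm SSNR} = \frac{\sum |S|^{2}}{\sum \sigma_N^{2}} .
\end{align*}
```

Because the noise enters only one of the two maps, the square root appears in place of the factor of two that distinguishes (2.5.7.22) from (2.5.7.21).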

2.5.7.11. Analysis of 3D cryo-EM maps

The amount of structural information that can be derived from a structure of a macromolecular complex determined by cryo-EM depends on two factors: the resolution of the map and the availability of additional structural information about the system. Generally, we will refer to complexes at a resolution better than 7 Å as high-resolution structures, as at this resolution the elements of secondary structure become directly visible. Maps at lower resolution we will call intermediate-resolution maps, as at this scale of detail one can only determine a general arrangement of subunits. However, it is important to realize that there is a huge difference between the amount and reliability of information derived from a map of the same complex determined at 10 Å as compared to a map determined at 30 Å resolution. Similarly, very large complexes determined at 50 Å resolution will yield more information than very small complexes determined at 15 Å resolution. On the other hand, even intermediate-resolution EM maps provide extremely valuable information if they can be placed in the context of other structural work. The single-particle structure can also be investigated within the context of a more complex system using other, lower-resolution techniques, for example electron tomography. In this case, by using docking approaches one can determine the distribution, orientation and general arrangement of smaller cryo-EM determined complexes within larger subcellular systems. On a different scale of resolution, it is quite common to have structures of some domains or even of the entire complexes determined to atomic resolution by X-ray crystallography. Again, by using docking techniques it is possible to determine whether the conformation of the EM structure differs from that determined by X-ray crystallography or to map subunits and domains of the larger complex by fitting available atomic resolution structures.

The basic mode of visualization of cryo-EM maps is surface representation. The first step involves the choice of an appropriate threshold level for the displayed surface, particularly when the scaling of the cryo-EM data is arbitrary. A good guide is provided by the total molecular mass of the complex: given a pixel size of p Å, an average protein density [d = 1.36 \times 10^{-24}\ {\rm g}\ {\rm \AA}^{-3}] and the total molecular mass of the complex M Da, the number of voxels [N_{\rm v}] occupied by the complex is[N_{\rm v} = M/\left(p^3 dN_{\rm A} \right),\eqno(2.5.7.28)]where [N_{\rm A}] is Avogadro's number ([6.02 \times 10^{23}\ {\rm mol}^{-1}]). Based on that, one can find the threshold that for a given structure encompasses the determined number of voxels [N_{\rm v}] [appropriate functions are implemented in SPIDER (Agrawal et al., 1996[link]; Frank et al., 1996[link]) and SPARX (Hohn et al., 2007[link])]. At a sufficiently high resolution, cryo-EM maps can be analysed in the same manner as X-ray crystallographic maps and using the same graphical/analytical packages (`backbone tracing') (Jones et al., 1991[link]) (Fig. 2.5.7.6[link]).
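The mass-based choice of threshold can be sketched in a few lines. The function names are hypothetical and are not the SPIDER or SPARX functions mentioned above; the map is assumed to be available as a flat list of voxel densities, and the threshold is taken as the density of the voxel ranked at position given by (2.5.7.28).

```python
AVOGADRO = 6.02e23          # mol^-1, value quoted in the text
PROTEIN_DENSITY = 1.36e-24  # g per cubic angstrom, value quoted in the text

def voxels_for_mass(mass_da, pixel_size):
    """Eq. (2.5.7.28): number of voxels occupied by a complex of mass
    mass_da (Da) sampled with the given pixel size (angstroms)."""
    return mass_da / (pixel_size ** 3 * PROTEIN_DENSITY * AVOGADRO)

def threshold_for_voxels(densities, n_voxels):
    """Density threshold that encloses the n_voxels highest-valued voxels.

    densities -- flat list of all voxel values of the map
    """
    ranked = sorted(densities, reverse=True)
    n = max(1, min(len(ranked), int(round(n_voxels))))
    return ranked[n - 1]
```

A call such as `threshold_for_voxels(map_values, voxels_for_mass(M, p))` then gives a starting threshold for surface rendering; sorting the whole map is the simplest approach and is adequate for typical map sizes.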

[Figure 2.5.7.6]

Figure 2.5.7.6

A high-resolution cryo-EM map allows backbone tracing. (a) Cryo-EM map of the cricket paralysis virus (CrPV) IRES RNA in complex with the yeast 80S ribosome determined at 7.3 Å resolution. The map is shown from the L1 protuberance side with the ribosomal 40S subunit in yellow, the 60S subunit in blue and the CrPV IRES in magenta. Landmarks for the 40S subunit: b, body; h, head; p, platform. Landmarks for the 60S subunit: CP, central protuberance; L1, L1 protuberance. PKIII denotes helix PKIII of the CrPV IRES RNA and SL the two stem loops present in the secondary structure of the RNA. (b) Structure of the CrPV IRES RNA. Based on the cryo-EM map and additional biochemical knowledge, the complete chain of the RNA is found (189 nucleotides could be traced). The IRES molecular model is shown as a coloured ribbon docked into the cryo-EM density (grey mesh). PK and P denote the individual helical elements of the IRES and SL denotes the stem loops (Schüler et al., 2006[link]).

The complexity of cryo-EM maps of large macromolecular assemblies combined with their limited resolution invites attempts to automate some of the steps of analysis in order to make the results more robust and less dependent on the researcher's bias. A good example of semi-automated analysis is the nucleic acid–protein separation in an 11.5 Å cryo-EM map of the 70S E. coli ribosome (Spahn et al., 2000[link]). In the procedure, the (continuous-valued) densities were analysed making use of (i) the difference in scattering density between protein and nucleic acids; (ii) continuity constraints that the image of any nucleic acid molecule must obey and (iii) knowledge of the molecular volumes of all proteins. As a result, it was possible to reproduce, with a high degree of accuracy, boundary assignments between ribosomal RNA (rRNA) and proteins made from higher-resolution X-ray maps of the ribosomal subunits, and to make plausible predictions for the placements of proteins and RNA components as yet unassigned. One of the conclusions derived from this separation was that the 23S rRNA is solely responsible for the catalysis of peptide-bond formation; thus, the ribosome is a ribozyme. The same conclusion was reached independently in the studies of the X-ray crystallographic structure of the 70S ribosome (Nissen et al., 2000[link]). The method by Spahn et al. cannot be easily extended to other macromolecules that comprise only protein, and generally it is very difficult to delineate subunits of large macromolecular assemblies at intermediate resolution, automatically or not, in the absence of independent knowledge about their shape. The reason is that both the density and shape of the subunits are affected by the limited resolution differently depending on their spatial context.
In general, subunits that are isolated and located on the surface or protruding from the structure will have relatively lower density while at the same time their overall shape will be better preserved and easier to discern. Subunits located inside the structure and surrounded by other structural elements, while having higher density, are more difficult to recognize, as they fuse with the surrounding mass densities. Therefore, it is difficult to provide a general method that could cope with the problem of automated mass-density analysis.

As most cryo-EM structures are determined at intermediate resolution, the most common mode of analysis is either to compare the map with the available X-ray crystallographic structures of its domains or to consider the result in the context of larger, subcellular structures obtained by electron tomography. In both cases correlation techniques are used extensively to obtain objective results or to validate the results obtained by manual fitting.

In docking of X-ray crystallographic structures into EM maps, the first step is the conversion of atomic coordinates from X-ray molecular models, as given in Protein Data Bank (PDB) files, into an electron-density map in a way that would mimic the physical image formation process. Although sophisticated methods of computational emulation of the image-formation process in the electron microscope are available, very simple approaches to conversion yield quite satisfactory results at the resolution of the EM results. The most common one is to assume that the Coulomb potential of an atom is proportional to its atomic number and add these atomic numbers within a Euclidean grid with a cell size equal to the EM pixel size in Å. The atomic coordinates of atoms are interpolated within the grid using trilinear interpolation. After such conversion, the X-ray map can be handled using the general image-processing tools of a single-particle software package. Initial orientation (or orientations, if the general placement is not immediately visually apparent) of the X-ray map can be easily performed manually within any number of graphical packages, for example Chimera (Pettersen et al., 2004[link]). The initial six orientation parameters (three translations and three Eulerian angles) are next transferred to the EM package (for details see Baldwin & Penczek, 2007[link]) and the manual docking is refined using correlation techniques (Fig. 2.5.7.7[link]). Similarly, the handedness of the EM map can be established or confirmed by performing fitting of the X-ray determined structure to two EM maps that differ by their hand.
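The simple conversion described above, in which atomic numbers are added to a grid with trilinear interpolation, can be sketched as follows. The function name and the input format are assumptions made for illustration: reading PDB files is omitted, and coordinates are taken to be already shifted so that they are measured in Å from the corner voxel of the grid.

```python
import math

def atoms_to_grid(atoms, size, pixel_size):
    """Deposit atomic numbers on a cubic grid using trilinear interpolation.

    atoms      -- iterable of (x, y, z, atomic_number), coordinates in
                  angstroms measured from the grid origin (voxel 0, 0, 0)
    size       -- number of voxels along each edge
    pixel_size -- voxel edge length in angstroms (the EM pixel size)
    """
    grid = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for x, y, z, atomic_number in atoms:
        gx, gy, gz = x / pixel_size, y / pixel_size, z / pixel_size
        ix, iy, iz = (int(math.floor(gx)), int(math.floor(gy)),
                      int(math.floor(gz)))
        fx, fy, fz = gx - ix, gy - iy, gz - iz
        # Distribute the atom over the eight surrounding grid points.
        for dx, wx in ((0, 1.0 - fx), (1, fx)):
            for dy, wy in ((0, 1.0 - fy), (1, fy)):
                for dz, wz in ((0, 1.0 - fz), (1, fz)):
                    i, j, k = ix + dx, iy + dy, iz + dz
                    if 0 <= i < size and 0 <= j < size and 0 <= k < size:
                        grid[i][j][k] += atomic_number * wx * wy * wz
    return grid
```

The eight weights sum to one, so the total of the grid equals the sum of atomic numbers for atoms that lie inside the box. The resulting map would normally be low-pass filtered to the resolution of the EM map before correlation-based refinement of the docking.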

[Figure 2.5.7.7]

Figure 2.5.7.7

Hrs (blue) embedded into the membrane (yellow) of an early endosome. Two functional domains have been docked into the cryo-EM density map of hexameric Hrs, determined to 16 Å resolution, and are shown as ribbons coloured by secondary structure (Pullan et al., 2006[link]). The structures of the VHS and FYVE domains have been determined crystallographically (Mao et al., 2000[link]) and are docked into the EM density map of the Hrs. The knowledge of the location of the FYVE and VHS domains, which are reported to bind to PI(3)P molecules found within the endosomal membrane (Kutateladze et al., 1999[link]), has guided the hypothetical placement of Hrs within the endosomal membrane. The immersion of the end caps of Hrs into the endosomal membrane demonstrates an `end-on' binding model of the Hrs particles with the membrane. According to this model, either end cap can embed into the membrane, allowing the other end cap to carry out other essential protein trafficking functions, or to embed into another membrane, thus preventing fusion of membranes during early endosomal fusion.

Docking of EM maps into the broader cellular context of structures determined by electron tomography can provide information about the distribution of complexes and their interactions within the cell. Conceptually, the approach is very similar to that of particle picking, i.e., template matching, with the main difference being that calculations are performed in three instead of two dimensions. Given a 3D structure of a single-particle EM complex, a set of 3D templates is prepared by rotating the template around its centre of mass using the quasi-uniformly distributed three Eulerian angles [equation (2.5.7.17)[link] with [\Delta \psi = \delta \theta ]]. However, in application to tomography the angular step [\delta \theta ] can be relatively large, resulting in a much smaller number of templates than in two dimensions, the reason being the rather low resolution of typical electron tomograms (not exceeding 50 Å). Next, a brute-force 3D cross-correlation search with all templates is performed (Frangakis et al., 2002[link]). After windowing out 3D subvolumes containing putative complexes, subsequent averaging and classification can be performed.

Cryo-EM is a unique structural technique in its ability to detect conformational variability of large molecular assemblies within one sample that may contain a mixture of complexes in various conformational states. In addition to the expected conformational heterogeneity of the assemblies due to fluctuations of the structure around the ground state, one can expect to capture molecules in different functional states, especially if the binding of a ligand induces a conformational change in the macromolecular assembly. Therefore, a data set of images from an EM experiment must be interpreted as a mixture of projections from similar but not identical structures. The analysis of the extent of the resulting variability requires the calculation of the real-space distribution of 3D variance/covariance in macromolecules reconstructed from a set of their projections. The problem is difficult, as there is no clear relation between the variance in sets of projections that have the same angular direction and the variance of the 3D structure calculated from these projections. Penczek, Chao et al. (2006[link]) proposed calculating the variance in the 3D mass distribution of the structure using a statistical bootstrap resampling technique, in which a new set of projections is selected with replacement from the available whole set of N projections. In the new set, some of the original projections will appear more than once, while others will be omitted. This selection process is repeated a number of times and for each new set of projections the corresponding 3D volume is calculated. Next, the voxel-by-voxel bootstrap variance [\sigma _{\rm B}^2 ] of the resulting set of volumes is calculated.
The target variance is obtained using a relationship between the variance of arithmetic means for sampling with replacement and the sample variance,[\sigma ^2 = N\sigma _{\rm B}^2 .\eqno(2.5.7.29)]The estimated structure-variance map can be used for (i) detection of different functional states (for example, those characterized by binding of a ligand) and subsequent classification of the data set into homogeneous groups (Penczek, Frank & Spahn, 2006a[link]), (ii) analysis of the significance of small details in 3D reconstructions, (iii) analysis of the significance of details in difference maps, and (iv) docking of known structural domains into EM density maps.
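The bootstrap estimate (2.5.7.29) can be illustrated with a toy sketch. This is not the published procedure: as a loudly stated simplification, the 3D reconstruction of each resampled set is replaced by a plain arithmetic mean of the resampled items, so that the sketch only demonstrates the resampling and the scaling by N. The function name and arguments are hypothetical.

```python
import random

def bootstrap_variance(images, n_bootstrap, rng):
    """Element-by-element variance estimate sigma^2 = N * sigma_B^2,
    eq. (2.5.7.29).

    images      -- list of N equal-length lists (stand-ins for projections)
    n_bootstrap -- number of resampled data sets
    rng         -- random.Random instance used for the resampling
    Assumption of this sketch: the 'reconstruction' of each resampled
    set is its arithmetic mean rather than a true 3D reconstruction.
    """
    n = len(images)
    length = len(images[0])
    volumes = []
    for _ in range(n_bootstrap):
        # Select N items with replacement from the whole set.
        sample = [images[rng.randrange(n)] for _ in range(n)]
        volumes.append([sum(img[v] for img in sample) / n
                        for v in range(length)])
    mean = [sum(vol[v] for vol in volumes) / n_bootstrap
            for v in range(length)]
    sigma_b2 = [sum((vol[v] - mean[v]) ** 2 for vol in volumes)
                / (n_bootstrap - 1) for v in range(length)]
    return [n * s for s in sigma_b2]
```

In this toy setting an element that is identical in all input items receives zero variance, while an element that varies between items receives an estimate close to its sample variance; the accuracy of the estimate improves with the number of bootstrap sets.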

The bootstrap technique also leads to the analysis of conformational modes of macromolecular complexes, and this is due to the fact that the covariance matrix of the structure can be directly calculated from the bootstrap volumes. The covariance matrix obtained this way would be very large. One possibility is to calculate only correlation coefficients between regions of interest that have large variance (Penczek, Chao et al., 2006[link]). Another possibility is to use the iterative Lanczos technique (Parlett, 1980[link]) and calculate eigenvolumes directly from bootstrap volumes without forming the covariance matrix. These eigenvolumes are related to conformational modes of the molecule, as captured by the projection data of the sample (Penczek, Frank & Spahn, 2006b[link]). Thus, this direct relation to the actual cryo-EM projection data positively distinguishes this approach from other techniques in which conformations are postulated based on flexible models of the EM map (Ming et al., 2002[link]; Mitra et al., 2005[link]).

References

Adiga, P. S., Malladi, R., Baxter, W. & Glaeser, R. M. (2004). A binary segmentation approach for boxing ribosome particles in cryo EM micrographs. J. Struct. Biol. 145, 142–151.
Agrawal, R. K., Penczek, P., Grassucci, R. A., Li, Y., Leith, A., Nierhaus, K. H. & Frank, J. (1996). Direct visualization of A-, P-, and E-site transfer RNAs in the Escherichia coli ribosome. Science, 271, 1000–1002.
Baldwin, P. R. & Penczek, P. A. (2007). The transform class in SPARX and EMAN2. J. Struct. Biol. 157, 250–261.
Basokur, A. T. (1998). Digital filter design using the hyperbolic tangent functions. J. Balkan Geophys. Soc. 1, 14–18.
Boisset, N., Penczek, P., Pochon, F., Frank, J. & Lamy, J. (1993). Three-dimensional architecture of human alpha 2-macroglobulin transformed with methylamine. J. Mol. Biol. 232, 522–529.
Boisset, N., Penczek, P., Taveau, J. C., Lamy, J. & Frank, J. (1995). Three-dimensional reconstruction of Androctonus australis hemocyanin labeled with a monoclonal Fab fragment. J. Struct. Biol. 115, 16–29.
Böttcher, B., Wynne, S. A. & Crowther, R. A. (1997). Determination of the fold of the core protein of hepatitis B virus by electron cryomicroscopy. Nature (London), 386, 88–91.
Conway, J. F., Cheng, N., Zlotnick, A., Wingfield, P. T., Stahl, S. J. & Steven, A. C. (1997). Visualization of a 4-helix bundle in the hepatitis B virus capsid by cryo-electron microscopy. Nature (London), 386, 91–94.
Crowther, R. A., DeRosier, D. J. & Klug, A. (1970). The reconstruction of a three-dimensional structure from projections and its application to electron microscopy. Proc. R. Soc. London Ser. A, 317, 319–340.
Dube, P., Tavares, P., Lurz, R. & van Heel, M. (1993). The portal protein of bacteriophage SPP1: a DNA pump with 13-fold symmetry. EMBO J. 12, 1303–1309.
Farrow, N. A. & Ottensmeyer, F. P. (1992). A posteriori determination of relative projection directions of arbitrarily oriented macromolecules. J. Opt. Soc. Am. A, 9, 1749–1760.
Fernandez, J.-J., Sanjurjo, J. R. & Carazo, J. M. (1997). A spectral estimation approach to contrast transfer function detection in electron microscopy. Ultramicroscopy, 68, 267–295.
Fishman, G. (1995). Monte Carlo: Concepts, Algorithms, and Applications. New York: Springer.
Frangakis, A. S., Böhm, J., Förster, F., Nickell, S., Nicastro, D., Typke, D., Hegerl, R. & Baumeister, W. (2002). Identification of macromolecular complexes in cryoelectron tomograms of phantom cells. Proc. Natl Acad. Sci. USA, 99, 14153–14158.
Frank, J. (1990). Classification of macromolecular assemblies studied as `single particles'. Quart. Rev. Biophys. 23, 281–329.
Frank, J. & McEwen, B. (1992). Alignment by crosscorrelation. In Electron Tomography, edited by J. Frank, pp. 205–214. New York: Plenum.
Frank, J., Penczek, P., Agrawal, R. K., Grassucci, R. A. & Heagle, A. B. (2000). Three-dimensional cryoelectron microscopy of ribosomes. Methods Enzymol. 317, 276–291.
Frank, J., Penczek, P. & Liu, W. (1992). Alignment, classification, and three-dimensional reconstruction of single particles embedded in ice. Scan. Microsc. Suppl. 6, 11–20.
Frank, J., Radermacher, M., Penczek, P., Zhu, J., Li, Y., Ladjadj, M. & Leith, A. (1996). SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. J. Struct. Biol. 116, 190–199.
Frank, J., Radermacher, M., Wagenknecht, T. & Verschoor, A. (1988). Studying ribosome structure by electron microscopy and computer-image processing. Methods Enzymol. 164, 3–35.
Frank, J., Verschoor, A. & Boublik, M. (1981). Computer averaging of electron micrographs of 40S ribosomal subunits. Science, 214, 1353–1355.
Frank, J., Verschoor, A. & Boublik, M. (1982). Multivariate statistical analysis of ribosome electron micrographs. L and R lateral views of the 40 S subunit from HeLa cells. J. Mol. Biol. 161, 107–133.
Frank, J. & Wagenknecht, T. (1984). Automatic selection of molecular images from electron micrographs. Ultramicroscopy, 12, 169–176.
Gabashvili, I. S., Agrawal, R. K., Spahn, C. M., Grassucci, R. A., Svergun, D. I., Frank, J. & Penczek, P. (2000). Solution structure of the E. coli 70S ribosome at 11.5 Å resolution. Cell, 100, 537–549.
Goncharov, A. B., Vainshtein, B. K., Ryskin, A. I. & Vagin, A. A. (1987). Three-dimensional reconstruction of arbitrarily oriented identical particles from their electron photomicrographs. Sov. Phys. Crystallogr. 32, 504–509.
Gonzalez, R. C. & Woods, R. E. (2002). Digital Image Processing. Upper Saddle River: Prentice Hall.
Grigorieff, N. (2007). FREALIGN: high-resolution refinement of single particle structures. J. Struct. Biol. 157, 117–125.
Hall, R. J. & Patwardhan, A. (2004). A two step approach for semi-automated particle selection from low contrast cryo-electron micrographs. J. Struct. Biol. 145, 19–28.
Heel, M. van (1982). Detection of objects in quantum-noise limited images. Ultramicroscopy, 8, 331–342.
Heel, M. van (1987a). Angular reconstitution: a posteriori assignment of projection directions for 3D reconstruction. Ultramicroscopy, 21, 111–124.
Heel, M. van (1987b). Similarity measures between images. Ultramicroscopy, 21, 95–100.
Heel, M. van, Gowen, B., Matadeen, R., Orlova, E. V., Finn, R., Pape, T., Cohen, D., Stark, H., Schmidt, R., Schatz, M. & Patwardhan, A. (2000). Single-particle electron cryo-microscopy: towards atomic resolution. Quart. Rev. Biophys. 33, 307–369.
Heel, M. van, Harauz, G. & Orlova, E. V. (1996). A new generation of the IMAGIC image processing system. J. Struct. Biol. 116, 17–24.
Heel, M. van & Hollenberg, J. (1980). The stretching of distorted images of two-dimensional crystals. In Electron Microscopy at Molecular Dimensions, edited by W. Baumeister, pp. 256–260. Berlin: Springer.
Hohn, M., Tang, G., Goodyear, G., Baldwin, P. R., Huang, Z., Penczek, P. A., Yang, C., Glaeser, R. M., Adams, P. D. & Ludtke, S. J. (2007). SPARX, a new environment for cryo-EM image processing. J. Struct. Biol. 157, 47–55.
Huang, Z., Baldwin, P. R., Mullapudi, S. R. & Penczek, P. A. (2003). Automated determination of parameters describing power spectra of micrograph images in electron microscopy. J. Struct. Biol. 144, 79–94.
Huang, Z. & Penczek, P. A. (2004). Application of template matching technique to particle detection in electron micrographs. J. Struct. Biol. 145, 29–40.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst. A47, 110–119.
Joyeux, L. & Penczek, P. A. (2002). Efficiency of 2D alignment methods. Ultramicroscopy, 92, 33–46.
Kessel, M., Radermacher, M. & Frank, J. (1985). The structure of the stalk surface layer of a brine pond microorganism: correlation averaging applied to a double layered lattice structure. J. Microsc. 139, 63–74.
Lata, K. R., Penczek, P. & Frank, J. (1995). Automatic particle picking from electron micrographs. Ultramicroscopy, 58, 381–391.
Lawrence, M. C. (1992). Least-squares method of alignment using markers. In Electron Tomography, edited by J. Frank, pp. 197–204. New York: Plenum Press.
Ludtke, S. J., Baldwin, P. R. & Chiu, W. (1999). EMAN: semiautomated software for high-resolution single-particle reconstructions. J. Struct. Biol. 128, 82–97.
Mallick, S. P., Carragher, B., Potter, C. S. & Kriegman, D. J. (2005). ACE: automated CTF estimation. Ultramicroscopy, 104, 8–29.
Marabini, R. & Carazo, J. M. (1996). On a new computationally fast image invariant based on bispectral projections. Pattern Recognit. Lett. 17, 959–967.
Mindell, J. A. & Grigorieff, N. (2003). Accurate determination of local defocus and specimen tilt in electron microscopy. J. Struct. Biol. 142, 334–347.
Ming, D. M., Kong, Y. F., Lambert, M. A., Huang, Z. & Ma, J. P. (2002). How to describe the movement of protein without amino acids sequence and coordinates. Proc. Natl Acad. Sci. USA, 99, 8620–8625.
Mitra, K., Schaffitzel, C., Shaikh, T., Tama, F., Jenni, S., Brooks, C. L. III, Ban, N. & Frank, J. (2005). Structure of the E. coli protein-conducting channel bound to a translating ribosome. Nature (London), 438, 318–324.
Mouche, F., Boisset, N. & Penczek, P. A. (2001). Lumbricus terrestris hemoglobin – the architecture of linker chains and structural variation of the central toroid. J. Struct. Biol. 133, 176–192.
Nissen, P., Hansen, J., Ban, N., Moore, P. B. & Steitz, T. A. (2000). The structural basis of ribosome activity in peptide bond synthesis. Science, 289, 920–930.
Parlett, B. N. (1980). A new look at the Lanczos algorithm for solving symmetric systems of linear equations. Linear Algebr. Its Appl. 29, 323–346.
Penczek, P. (1998). Measures of resolution using Fourier shell correlation. J. Mol. Biol. 280, 115–116.
Penczek, P., Ban, N., Grassucci, R. A., Agrawal, R. K. & Frank, J. (1999). Haloarcula marismortui 50S subunit – complementarity of electron microscopy and X-ray crystallographic information. J. Struct. Biol. 128, 44–50.
Penczek, P., Marko, M., Buttle, K. & Frank, J. (1995). Double-tilt electron tomography. Ultramicroscopy, 60, 393–410.
Penczek, P., Radermacher, M. & Frank, J. (1992). Three-dimensional reconstruction of single particles embedded in ice. Ultramicroscopy, 40, 33–53.
Penczek, P. A. (2002). Three-dimensional spectral signal-to-noise ratio for a class of reconstruction algorithms. J. Struct. Biol. 138, 34–46.
Penczek, P. A., Chao, Y., Frank, J. & Spahn, C. M. T. (2006). Estimation of variance in single particle reconstruction using the bootstrap technique. J. Struct. Biol. 154, 168–183.
Penczek, P. A. & Frank, J. (2006). Resolution in electron tomography. In Electron Tomography: Methods for Three-Dimensional Visualization of Structures in the Cell, 2nd ed., edited by J. Frank, pp. 307–330. Berlin: Springer.
Penczek, P. A., Frank, J. & Spahn, C. M. T. (2006a). A method of focused classification, based on the bootstrap 3-D variance analysis, and its application to EF-G-dependent translocation. J. Struct. Biol. 154, 184–194.
Penczek, P. A., Frank, J. & Spahn, C. M. T. (2006b). Conformational analysis of macromolecules analyzed by cryo-electron microscopy. In Microscopy and Microanalysis, edited by P. Kotula, M. Marko, J.-H. Scott et al., p. CD386. Chicago: Cambridge University Press.
Penczek, P. A., Grassucci, R. A. & Frank, J. (1994). The ribosome at improved resolution: new techniques for merging and orientation refinement in 3D cryo-electron microscopy of biological particles. Ultramicroscopy, 53, 251–270.
Penczek, P. A., Zhu, J. & Frank, J. (1996). A common-lines based method for determining orientations for N > 3 particle projections simultaneously. Ultramicroscopy, 63, 205–218.
Penczek, P. A., Zhu, J., Schröder, R. & Frank, J. (1997). Three-dimensional reconstruction with contrast transfer function compensation from defocus series. Scan. Microsc. Suppl. 11, 1–10.
Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C. & Ferrin, T. E. (2004). UCSF Chimera – a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612.
Powell, M. J. D. (1973). On search directions for minimization algorithms. Math. Program. 4, 193–201.
Provencher, S. W. & Vogel, R. H. (1988). Three-dimensional reconstruction from electron micrographs of disordered specimens. I. Method. Ultramicroscopy, 25, 209–221.
Pullan, L., Mullapudi, S., Huang, Z., Baldwin, P. R., Chin, C., Sun, W., Tsujimoto, S., Kolodziej, S., Stoops, J. K., Lee, J. C., Waxham, M. N., Bean, A. J. & Penczek, P. A. (2006). The endosome-associated protein Hrs is hexameric and controls cargo sorting as a `master molecule'. Structure, 14, 661–671.
Radermacher, M. (1994). Three-dimensional reconstruction from random projections: orientational alignment via Radon transforms. Ultramicroscopy, 53, 121–136.
Radermacher, M., Wagenknecht, T., Verschoor, A. & Frank, J. (1987). Three-dimensional reconstruction from a single-exposure, random conical tilt series applied to the 50S ribosomal subunit of Escherichia coli. J. Microsc. 146, 113–136.
Roseman, A. M. (2003). Particle finding in electron micrographs using a fast local correlation algorithm. Ultramicroscopy, 94, 225–236.
Roseman, A. M., Chen, S., White, H., Braig, K. & Saibil, H. R. (1996). The chaperonin ATPase cycle: mechanism of allosteric switching and movements of substrate-binding domains in GroEL. Cell, 87, 241–251.
Ruiz, T., Mechin, I., Bar, J., Rypniewski, W., Kopperschlager, G. & Radermacher, M. (2003). The 10.8-Å structure of Saccharomyces cerevisiae phosphofructokinase determined by cryoelectron microscopy: localization of the putative fructose 6-phosphate binding sites. J. Struct. Biol. 143, 124–134.
Saad, A., Ludtke, S. J., Jakana, J., Rixon, F. J., Tsuruta, H. & Chiu, W. (2001). Fourier amplitude decay of electron cryomicroscopic images of single particles and effects on structure determination. J. Struct. Biol. 133, 32–42.
Sander, B., Golas, M. M. & Stark, H. (2003). Automatic CTF correction for single particles based upon multivariate statistical analysis of individual power spectra. J. Struct. Biol. 142, 392–401.
Saxton, W. O. (1978). Computer Techniques for Image Processing of Electron Microscopy. New York: Academic Press.
Saxton, W. O. & Baumeister, W. (1982). The correlation averaging of a regularly arranged bacterial envelope protein. J. Microsc. 127, 127–138.
Schatz, M. & van Heel, M. (1990). Invariant classification of molecular views in electron micrographs. Ultramicroscopy, 32, 255–264.
Schatz, M. & van Heel, M. (1992). Invariant recognition of molecular projections in vitreous ice preparations. Ultramicroscopy, 45, 15–22.
Sigworth, F. J. (2004). Classical detection theory and the cryo-EM particle selection problem. J. Struct. Biol. 145, 111–122.
Spahn, C. M. T., Penczek, P., Leith, A. & Frank, J. (2000). A method for differentiating proteins from nucleic acids in intermediate-resolution density maps: cryo-electron microscopy defines the quaternary structure of the Escherichia coli 70S ribosome. Struct. Fold. Des. 8, 937–948.
Unser, M., Trus, B. L. & Steven, A. C. (1987). A new resolution criterion based on spectral signal-to-noise ratios. Ultramicroscopy, 23, 39–51.
Vogel, R. H. & Provencher, S. W. (1988). Three-dimensional reconstruction from electron micrographs of disordered specimens. II. Implementation and results. Ultramicroscopy, 25, 223–239.
Walz, J., Typke, D., Nitsch, M., Koster, A. J., Hegerl, R. & Baumeister, W. (1997). Electron tomography of single ice-embedded macromolecules: three-dimensional alignment and classification. J. Struct. Biol. 120, 387–395.
Welch, P. D. (1967). The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short modified periodograms. IEEE Trans. Audio Electroacoust. AU-15, 70–73.
Wong, H. C., Chen, J., Mouche, F., Rouiller, I. & Bern, M. (2004). Model-based particle picking for cryo-electron microscopy. J. Struct. Biol. 145, 157–167.
Yang, C., Ng, E. G. & Penczek, P. A. (2005). Unified 3-D structure and projection orientation refinement using quasi-Newton algorithm. J. Struct. Biol. 149, 53–64.
Yin, Z. H., Zheng, Y. L., Doerschuk, P. C., Natarajan, P. & Johnson, J. E. (2003). A statistical approach to computer processing of cryo-electron microscope images: virion classification and 3-D reconstruction. J. Struct. Biol. 144, 24–50.
Yin, Z. Y., Zheng, Y. L. & Doerschuk, P. C. (2001). An ab initio algorithm for low-resolution 3-D reconstructions from cryoelectron microscopy images. J. Struct. Biol. 133, 132–142.
Zhu, J., Penczek, P. A., Schröder, R. & Frank, J. (1997). Three-dimensional reconstruction with contrast transfer function correction from energy-filtered cryoelectron micrographs: procedure and application to the 70S Escherichia coli ribosome. J. Struct. Biol. 118, 197–219.
Zhu, Y., Carragher, B., Glaeser, R. M., Fellmann, D., Bajaj, C., Bern, M., Mouche, F., de Haas, F., Hall, R. J., Kriegman, D. J., Ludtke, S. J., Mallick, S. P., Penczek, P. A., Roseman, A. M., Sigworth, F. J., Volkmann, N. & Potter, C. S. (2004). Automatic particle selection: results of a comparative study. J. Struct. Biol. 145, 3–14.







































