International Tables for Crystallography
Volume B
Reciprocal space
Edited by U. Shmueli

International Tables for Crystallography (2010). Vol. B, ch. 2.5, pp. 386-388

Analysis of 3D cryo-EM maps

P. A. Penczek

The amount of structural information that can be derived from a cryo-EM structure of a macromolecular complex depends on two factors: the resolution of the map and the availability of additional structural information about the system. Generally, we will refer to complexes determined at a resolution better than 7 Å as high-resolution structures, as at this resolution the elements of secondary structure become directly visible. Maps at lower resolution we will call intermediate-resolution maps, as at this scale of detail one can only determine the general arrangement of subunits. However, it is worth bearing in mind that the amount and reliability of information derived from a map of a given complex determined at 10 Å resolution differ greatly from those of a map of the same complex determined at 30 Å resolution. Similarly, very large complexes determined at 50 Å resolution will yield more information than very small complexes determined at 15 Å resolution. On the other hand, even intermediate-resolution EM maps provide extremely valuable information if they can be placed in the context of other structural work. A single-particle structure can also be investigated within the context of a more complex system using other, lower-resolution techniques, for example electron tomography. In this case, docking approaches can be used to determine the distribution, orientation and general arrangement of smaller complexes determined by cryo-EM within larger subcellular systems. On a different scale of resolution, it is quite common to have structures of some domains, or even of entire complexes, determined to atomic resolution by X-ray crystallography. Again, docking techniques make it possible to determine whether the conformation of the EM structure differs from that determined by X-ray crystallography, or to map subunits and domains of the larger complex by fitting available atomic resolution structures.

The basic mode of visualization of cryo-EM maps is surface representation. The first step is to choose an appropriate threshold level for the displayed surface, particularly when the scaling of the cryo-EM data is arbitrary. A good guide is provided by the total molecular mass of the complex: given a pixel size of $p$ Å, an average protein density $d = 1.36 \times 10^{-24}$ g Å$^{-3}$ and a total molecular mass of the complex of $M$ Da, the number of voxels $N_{\rm v}$ occupied by the complex is
\[N_{\rm v} = M/\left(p^{3} d N_{\rm A}\right),\]
where $N_{\rm A}$ is Avogadro's number ($6.02 \times 10^{23}$ mol$^{-1}$). Based on that, one can find the threshold that, for a given structure, encompasses the determined number of voxels $N_{\rm v}$ [appropriate functions are implemented in SPIDER (Agrawal et al., 1996; Frank et al., 1996) and SPARX (Hohn et al., 2007)]. At a sufficiently high resolution, cryo-EM maps can be analysed in the same manner as X-ray crystallographic maps, using the same graphical and analytical packages for tasks such as backbone tracing (Jones et al., 1991) (see the figure below).
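The mass-based threshold choice described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the SPIDER or SPARX implementation: the function names `voxels_for_mass` and `threshold_for_mass` are hypothetical, the map is assumed to be a 3D array in which higher values mean more mass, and the constants are the values quoted in the text.

```python
import numpy as np

PROTEIN_DENSITY = 1.36e-24  # g per cubic angstrom (value quoted in the text)
AVOGADRO = 6.02e23          # per mole (value quoted in the text)


def voxels_for_mass(mass_da, pixel_size):
    """Number of voxels N_v = M / (p^3 d N_A) for a mass M in Da and pixel size p in angstrom."""
    return mass_da / (pixel_size ** 3 * PROTEIN_DENSITY * AVOGADRO)


def threshold_for_mass(volume, mass_da, pixel_size):
    """Density value such that about N_v voxels of `volume` lie at or above it.

    Hypothetical helper: assumes higher density values correspond to more mass.
    """
    n_vox = int(round(voxels_for_mass(mass_da, pixel_size)))
    n_vox = min(max(n_vox, 1), volume.size)  # keep the index inside the map
    # Sort all densities in descending order and pick the n_vox-th largest.
    sorted_density = np.sort(volume, axis=None)[::-1]
    return sorted_density[n_vox - 1]
```

Sorting the whole map is the simplest way to find the cut-off; for very large maps a histogram or `np.partition` would do the same job with less work.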


Figure. A high-resolution cryo-EM map allows backbone tracing. (a) Cryo-EM map of the cricket paralysis virus (CrPV) IRES RNA in complex with the yeast 80S ribosome determined at 7.3 Å resolution. The map is shown from the L1 protuberance side with the ribosomal 40S subunit in yellow, the 60S subunit in blue and the CrPV IRES in magenta. Landmarks for the 40S subunit: b, body; h, head; p, platform. Landmarks for the 60S subunit: CP, central protuberance; L1, L1 protuberance. PKIII denotes helix PKIII of the CrPV IRES RNA and SL the two stem loops present in the secondary structure of the RNA. (b) Structure of the CrPV IRES RNA. Based on the cryo-EM map and additional biochemical knowledge, the complete chain of the RNA is found (189 nucleotides could be traced). The IRES molecular model is shown as a coloured ribbon docked into the cryo-EM density (grey mesh). PK and P denote the individual helical elements of the IRES and SL denotes the stem loops (Schüler et al., 2006).

The complexity of cryo-EM maps of large macromolecular assemblies, combined with their limited resolution, invites attempts to automate some of the steps of analysis in order to make the results more robust and less dependent on the researcher's bias. A good example of semi-automated analysis is the nucleic acid–protein separation in an 11.5 Å cryo-EM map of the 70S E. coli ribosome (Spahn et al., 2000). In this procedure, the (continuous-valued) densities were analysed making use of (i) the difference in scattering density between protein and nucleic acids; (ii) continuity constraints that the image of any nucleic acid molecule must obey; and (iii) knowledge of the molecular volumes of all proteins. As a result, it was possible to reproduce, with a high degree of accuracy, the boundary assignments between ribosomal RNA (rRNA) and proteins made from higher-resolution X-ray maps of the ribosomal subunits, and to make plausible predictions for the placements of protein and RNA components as yet unassigned. One of the conclusions derived from this separation was that the 23S rRNA is solely responsible for the catalysis of peptide-bond formation; thus, the ribosome is a ribozyme. The same conclusion was reached independently in studies of the X-ray crystallographic structure of the large (50S) ribosomal subunit (Nissen et al., 2000). The method of Spahn et al. cannot be easily extended to macromolecules that comprise only protein, and in general it is very difficult to delineate subunits of large macromolecular assemblies at intermediate resolution, automatically or not, in the absence of independent knowledge about their shape. The reason is that both the density and the shape of the subunits are affected by the limited resolution differently, depending on their spatial context. In general, subunits that are isolated and located on the surface or protruding from the structure will have relatively lower density, while at the same time their overall shape will be better preserved and easier to discern. Subunits located inside the structure and surrounded by other structural elements, while having higher density, are more difficult to recognize, as they fuse with the surrounding mass densities. Therefore, it is difficult to provide a general method that could cope with the problem of automated mass-density analysis.

As most cryo-EM structures are determined at intermediate resolution, the most common mode of analysis is either to compare the map with the available X-ray crystallographic structures of its domains or to consider the result in the context of larger, subcellular structures obtained by electron tomography. In both cases correlation techniques are used extensively to obtain objective results or to validate the results obtained by manual fitting.

In docking of X-ray crystallographic structures into EM maps, the first step is the conversion of atomic coordinates from X-ray molecular models, as given in Protein Data Bank (PDB) files, into an electron-density map in a way that mimics the physical image-formation process. Although sophisticated methods for computational emulation of the image-formation process in the electron microscope are available, very simple approaches to conversion yield quite satisfactory results at the resolution of the EM results. The most common one is to assume that the Coulomb potential of an atom is proportional to its atomic number and to add these atomic numbers within a Euclidean grid with a cell size equal to the EM pixel size in Å. The atomic coordinates are interpolated within the grid using trilinear interpolation. After such conversion, the X-ray map can be handled using the general image-processing tools of a single-particle software package. Initial orientation (or orientations, if the general placement is not immediately visually apparent) of the X-ray map can be easily performed manually in any of a number of graphical packages, for example Chimera (Pettersen et al., 2004). The initial six orientation parameters (three translations and three Eulerian angles) are next transferred to the EM package (for details see Baldwin & Penczek, 2007) and the manual docking is refined using correlation techniques (see the figure below). Similarly, the handedness of the EM map can be established or confirmed by fitting the X-ray determined structure to two EM maps that differ in their hand.
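The simple conversion described above (atomic numbers deposited on a grid with trilinear interpolation) might look as follows. This is a sketch under stated assumptions: `atoms_to_density` is a hypothetical name, the coordinates and atomic numbers are assumed to have been read from a PDB file already, the grid origin is placed at (0, 0, 0) Å, and the array is indexed as `[x, y, z]`. No resolution filtering or other emulation of image formation is attempted.

```python
import numpy as np


def atoms_to_density(coords, atomic_numbers, box, pixel_size):
    """Deposit atomic numbers on a cubic grid using trilinear interpolation.

    coords         : (n, 3) atomic positions in angstrom, grid origin at (0, 0, 0)
    atomic_numbers : (n,) weights, one per atom
    box            : number of grid points along each edge
    pixel_size     : grid spacing in angstrom (the EM pixel size)
    """
    density = np.zeros((box, box, box))
    grid_pos = np.asarray(coords, dtype=float) / pixel_size
    base = np.floor(grid_pos).astype(int)   # lower corner of the enclosing cell
    frac = grid_pos - base                  # fractional position inside the cell
    weights = np.asarray(atomic_numbers, dtype=float)

    # Each atom is shared among the 8 corners of its cell.
    for dx in (0, 1):
        wx = frac[:, 0] if dx else 1.0 - frac[:, 0]
        for dy in (0, 1):
            wy = frac[:, 1] if dy else 1.0 - frac[:, 1]
            for dz in (0, 1):
                wz = frac[:, 2] if dz else 1.0 - frac[:, 2]
                idx = base + (dx, dy, dz)
                # Contributions that fall outside the box are dropped.
                inside = np.all((idx >= 0) & (idx < box), axis=1)
                np.add.at(
                    density,
                    (idx[inside, 0], idx[inside, 1], idx[inside, 2]),
                    (weights * wx * wy * wz)[inside],
                )
    return density
```

`np.add.at` is used instead of plain fancy-index assignment so that several atoms landing on the same grid point accumulate rather than overwrite each other.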


Figure. Hrs (blue) embedded into the membrane (yellow) of an early endosome. Two functional domains have been docked into the cryo-EM density map of hexameric Hrs, determined to 16 Å resolution, and are shown as ribbons coloured by secondary structure (Pullan et al., 2006). The structures of the VHS and FYVE domains have been determined crystallographically (Mao et al., 2000) and are docked into the EM density map of the Hrs. The knowledge of the location of the FYVE and VHS domains, which are reported to bind to PI(3)P molecules found within the endosomal membrane (Kutateladze et al., 1999), has guided the hypothetical placement of Hrs within the endosomal membrane. The immersion of the end caps of Hrs into the endosomal membrane demonstrates an `end-on' binding model of the Hrs particles with the membrane. According to this model, either end cap can embed into the membrane, allowing the other end cap to carry out other essential protein trafficking functions, or to embed into another membrane, thus preventing fusion of membranes during early endosomal fusion.

Docking of EM maps into the broader cellular context of structures determined by electron tomography can provide information about the distribution of complexes and their interactions within the cell. Conceptually, the approach is very similar to that of particle picking, i.e. template matching, the main difference being that the calculations are performed in three dimensions instead of two. Given a 3D structure of a single-particle EM complex, a set of 3D templates is prepared by rotating the structure around its centre of mass using three quasi-uniformly distributed Eulerian angles [using the equation for quasi-uniformly distributed angles with $\Delta\psi = \delta\theta$]. However, in application to tomography the angular step $\delta\theta$ can be relatively large, resulting in a much smaller number of templates than in two dimensions, the reason being the rather low resolution of typical electron tomograms (not exceeding 50 Å). Next, a brute-force 3D cross-correlation search with all templates is performed (Frangakis et al., 2002). After windowing out 3D subvolumes containing putative complexes, subsequent averaging and classification can be performed.
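The brute-force search over templates can be illustrated with an FFT-based cross-correlation. The sketch below makes several simplifying assumptions that are not in the text: the function names are hypothetical, the templates are taken to be already rotated copies of the structure, the correlation is circular and not locally normalized, and no correction for the missing wedge of the tomogram is applied.

```python
import numpy as np


def cross_correlate_3d(tomogram, template):
    """Circular cross-correlation of a tomogram with a smaller template via FFT.

    The value at index s is the sum over x of tomogram[x + s] * template[x],
    so a peak marks the corner at which the template best matches.
    """
    padded = np.zeros_like(tomogram, dtype=float)
    tz, ty, tx = template.shape
    padded[:tz, :ty, :tx] = template - template.mean()
    f_tomo = np.fft.fftn(tomogram - tomogram.mean())
    f_temp = np.fft.fftn(padded)
    return np.real(np.fft.ifftn(f_tomo * np.conj(f_temp)))


def best_match(tomogram, templates):
    """Return (score, template index, peak position) of the highest correlation peak."""
    best = (-np.inf, None, None)
    for k, template in enumerate(templates):
        cc = cross_correlate_3d(tomogram, template)
        peak = np.unravel_index(np.argmax(cc), cc.shape)
        if cc[peak] > best[0]:
            best = (cc[peak], k, peak)
    return best
```

In practice one would keep many peaks per template rather than the single best one, since a tomogram is expected to contain multiple copies of the complex in different orientations.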

Cryo-EM is a unique structural technique in its ability to detect conformational variability of large molecular assemblies within one sample that may contain a mixture of complexes in various conformational states. In addition to the expected conformational heterogeneity of the assemblies due to fluctuations of the structure around the ground state, one can expect to capture molecules in different functional states, especially if the binding of a ligand induces a conformational change in the macromolecular assembly. Therefore, a data set of images from an EM experiment must be interpreted as a mixture of projections from similar but not identical structures. The analysis of the extent of the resulting variability requires the calculation of the real-space distribution of 3D variance/covariance in macromolecules reconstructed from a set of their projections. The problem is difficult, as there is no clear relation between the variance in sets of projections that have the same angular direction and the variance of the 3D structure calculated from these projections. Penczek, Chao et al. (2006) proposed calculating the variance in the 3D mass distribution of the structure using a statistical bootstrap resampling technique, in which a new set of projections is selected with replacement from the available whole set of $N$ projections. In the new set, some of the original projections will appear more than once, while others will be omitted. This selection process is repeated a number of times and for each new set of projections the corresponding 3D volume is calculated. Next, the voxel-by-voxel bootstrap variance $\sigma_{\rm B}^2$ of the resulting set of volumes is calculated. The target variance is obtained using a relationship between the variance of arithmetic means for sampling with replacement and the sample variance,
\[\sigma^2 = N\sigma_{\rm B}^2 .\]
The estimated structure-variance map can be used for (i) detection of different functional states (for example, those characterized by binding of a ligand) and subsequent classification of the data set into homogeneous groups (Penczek, Frank & Spahn, 2006a), (ii) analysis of the significance of small details in 3D reconstructions, (iii) analysis of the significance of details in difference maps, and (iv) docking of known structural domains into EM density maps.
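The resampling loop and the scaling $\sigma^2 = N\sigma_{\rm B}^2$ can be sketched as below. This is an illustration only: `bootstrap_variance` is a hypothetical name, and `reconstruct` stands in for a real 3D reconstruction program that takes a stack of projections and returns a volume. In the usage note a plain average is substituted for it, which is an assumption made purely so that the example runs.

```python
import numpy as np


def bootstrap_variance(projections, reconstruct, n_boot=200, seed=0):
    """Estimate the structure variance map by bootstrap resampling.

    projections : array whose first axis indexes the N input projections
    reconstruct : callable mapping a resampled stack to a 3D volume
                  (placeholder for an actual reconstruction algorithm)
    """
    rng = np.random.default_rng(seed)
    n = len(projections)
    volumes = []
    for _ in range(n_boot):
        # Select N projections with replacement: some repeat, some are omitted.
        picks = rng.integers(0, n, size=n)
        volumes.append(reconstruct(projections[picks]))
    # Voxel-by-voxel bootstrap variance sigma_B^2 over the set of volumes.
    sigma_b2 = np.var(np.stack(volumes), axis=0, ddof=1)
    # Target variance: sigma^2 = N * sigma_B^2.
    return n * sigma_b2
```

For a quick check one can pass `lambda stack: stack.mean(axis=0)` as `reconstruct` together with a stack of noisy copies of a volume; the returned map should then be close to the noise variance of the input.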

The bootstrap technique also leads to the analysis of conformational modes of macromolecular complexes, because the covariance matrix of the structure can be directly calculated from the bootstrap volumes. The covariance matrix obtained this way would be very large, as its dimension equals the number of voxels in the volume. One possibility is to calculate only correlation coefficients between regions of interest that have large variance (Penczek, Chao et al., 2006). Another possibility is to use the iterative Lanczos technique (Parlett, 1980) and calculate eigenvolumes directly from bootstrap volumes without forming the covariance matrix. These eigenvolumes are related to conformational modes of the molecule, as captured by the projection data of the sample (Penczek, Frank & Spahn, 2006b). This direct relation to the actual cryo-EM projection data distinguishes the approach from other techniques in which conformations are postulated based on flexible models of the EM map (Ming et al., 2002; Mitra et al., 2005).
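To illustrate how eigenvolumes can be obtained without forming the voxel-by-voxel covariance matrix, the sketch below uses the small Gram matrix of the bootstrap volumes. Note that this is a substitute for the Lanczos iteration named in the text, chosen because it fits in a few lines of NumPy; it yields the same leading eigenvectors when the number of volumes is much smaller than the number of voxels. The function name `eigenvolumes` is hypothetical, and `n_modes` is assumed to be smaller than the number of volumes.

```python
import numpy as np


def eigenvolumes(volumes, n_modes=3):
    """Leading eigenvolumes of a stack of B volumes via the B x B Gram matrix.

    volumes : array of shape (B, nz, ny, nx)
    Returns the eigenvalues (descending) and unit-norm eigenvolumes.
    """
    b = volumes.shape[0]
    flat = volumes.reshape(b, -1).astype(float)
    centered = flat - flat.mean(axis=0)          # remove the average volume
    # B x B matrix sharing its nonzero eigenvalues with the full covariance.
    gram = centered @ centered.T / (b - 1)
    evals, evecs = np.linalg.eigh(gram)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_modes]    # keep the largest n_modes
    # Map the Gram eigenvectors back to voxel space and normalize.
    modes = centered.T @ evecs[:, order]
    modes /= np.linalg.norm(modes, axis=0)
    return evals[order], modes.T.reshape((n_modes,) + volumes.shape[1:])
```

The sign of each eigenvolume is arbitrary, so comparisons with a known mode should use the absolute value of the overlap.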


Agrawal, R. K., Penczek, P., Grassucci, R. A., Li, Y., Leith, A., Nierhaus, K. H. & Frank, J. (1996). Direct visualization of A-, P-, and E-site transfer RNAs in the Escherichia coli ribosome. Science, 271, 1000–1002.
Baldwin, P. R. & Penczek, P. A. (2007). The transform class in SPARX and EMAN2. J. Struct. Biol. 157, 250–261.
Frangakis, A. S., Böhm, J., Förster, F., Nickell, S., Nicastro, D., Typke, D., Hegerl, R. & Baumeister, W. (2002). Identification of macromolecular complexes in cryoelectron tomograms of phantom cells. Proc. Natl Acad. Sci. USA, 99, 14153–14158.
Frank, J., Radermacher, M., Penczek, P., Zhu, J., Li, Y., Ladjadj, M. & Leith, A. (1996). SPIDER and WEB: processing and visualization of images in 3D electron microscopy and related fields. J. Struct. Biol. 116, 190–199.
Hohn, M., Tang, G., Goodyear, G., Baldwin, P. R., Huang, Z., Penczek, P. A., Yang, C., Glaeser, R. M., Adams, P. D. & Ludtke, S. J. (2007). SPARX, a new environment for cryo-EM image processing. J. Struct. Biol. 157, 47–55.
Jones, T. A., Zou, J.-Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst. A47, 110–119.
Ming, D. M., Kong, Y. F., Lambert, M. A., Huang, Z. & Ma, J. P. (2002). How to describe the movement of protein without amino acids sequence and coordinates. Proc. Natl Acad. Sci. USA, 99, 8620–8625.
Mitra, K., Schaffitzel, C., Shaikh, T., Tama, F., Jenni, S., Brooks, C. L. III, Ban, N. & Frank, J. (2005). Structure of the E. coli protein-conducting channel bound to a translating ribosome. Nature (London), 438, 318–324.
Nissen, P., Hansen, J., Ban, N., Moore, P. B. & Steitz, T. A. (2000). The structural basis of ribosome activity in peptide bond synthesis. Science, 289, 920–930.
Parlett, B. N. (1980). A new look at the Lanczos algorithm for solving symmetric systems of linear equations. Linear Algebra Appl. 29, 323–346.
Penczek, P. A., Chao, Y., Frank, J. & Spahn, C. M. T. (2006). Estimation of variance in single particle reconstruction using the bootstrap technique. J. Struct. Biol. 154, 168–183.
Penczek, P. A., Frank, J. & Spahn, C. M. T. (2006a). A method of focused classification, based on the bootstrap 3-D variance analysis, and its application to EF-G-dependent translocation. J. Struct. Biol. 154, 184–194.
Penczek, P. A., Frank, J. & Spahn, C. M. T. (2006b). Conformational analysis of macromolecules analyzed by cryo-electron microscopy. In Microscopy and Microanalysis, edited by P. Kotula, M. Marko, J.-H. Scott et al., p. CD386. Chicago: Cambridge University Press.
Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C. & Ferrin, T. E. (2004). UCSF Chimera – a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612.
Spahn, C. M. T., Penczek, P., Leith, A. & Frank, J. (2000). A method for differentiating proteins from nucleic acids in intermediate-resolution density maps: cryo-electron microscopy defines the quaternary structure of the Escherichia coli 70S ribosome. Struct. Fold. Des. 8, 937–948.
