International
Tables for
Crystallography
Volume F
Crystallography of biological macromolecules
Edited by E. Arnold, D. M. Himmel and M. G. Rossmann

International Tables for Crystallography (2012). Vol. F, ch. 19.10, pp. 629-632
https://doi.org/10.1107/97809553602060000876

Chapter 19.10. Single-particle reconstruction with EMAN

S. Ludtke*

National Center for Macromolecular Imaging, Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
Correspondence e-mail: sludtke@bcm.edu

Single-particle reconstruction is a technique for determining the three-dimensional structure of large sets of identical nanoscale objects, typically proteins or macromolecular assemblies, using electron cryomicroscopy (cryo-EM). This technique has now demonstrated resolutions of 3–5 Å for a number of targets and can routinely achieve subnanometre resolution. EMAN is a scientific image-processing software suite first developed to make single-particle reconstructions more reproducible and less labour-intensive. While it has since expanded to include many algorithms for other types of image processing, providing robust single-particle reconstructions with minimal effort remains one of its primary focuses. It includes a complete workflow infrastructure for this and other tasks in cryo-EM image analysis and structural biology.

19.10.1. Introduction

Single-particle reconstruction is a technique for determining the three-dimensional structure of identical nanoscale objects without requiring crystallization (Frank, 2006). Typical targets are large proteins or macromolecular assemblies, with ∼200 kDa regarded as the lower size limit for this technique owing to high image noise levels. Typical non-viral targets are in the size range from 500 kDa to 5 MDa, with icosahedral viruses extending to hundreds of megadaltons. Several structures reconstructed to ∼4 Å resolution using this technique have been published (Ludtke et al., 2008; Jiang et al., 2008; Zhang et al., 2008; Yu et al., 2008) and subnanometre resolution is frequently achieved, but in negative stain, or with unusually difficult specimens, resolutions may still be limited to 12–30 Å.

19.10.2. Overview of EMAN

EMAN is a suite of scientific image-processing tools with a particular focus on cryogenic electron microscopy (cryo-EM) and single-particle reconstruction. The first stable version was released in 1999 and it has undergone continuous development since that time. Since roughly 2005, development has focused on EMAN2, with a completely redesigned modular image-processing library, a workflow system, and a new OpenGL-based graphical user interface (GUI). EMAN2 supports virtually all image file formats used in the cryo-EM community, as well as various conventions for particle-orientation specification. The image-processing library is completely modular, so new algorithms can be made available throughout the system without any modification of end-user programs. Metadata, which refer to both experimental and derived information describing the image data, are archived using an embedded database system. These metadata can be harvested to produce information for manuscripts or for deposition to centralized databases such as the EMDB (http://www.emdatabank.org), operated by the PDBe (Europe) and RCSB PDB (USA).

EMAN2 has a tiered structure. The GUI and workflow represent the top of the hierarchy. Next are individual command-line programs, providing the user with a greater level of control and a wide range of utility functions. Below this is the Python (http://www.python.org) layer, for interactive use or high-level programming. Finally, there is the C++ core image-processing library, which handles all computationally intensive image processing. Full documentation is provided through the EMAN wiki (http://blake.bcm.tmc.edu/eman/eman2).

19.10.2.1. GUI layer and workflow

The workflow interface (Fig. 19.10.2.1) is designed to guide the end user through entire processes, while storing a complete record of all processing completed during any given task. While this interface is gradually being expanded with new tasks, such as tomographic particle averaging, for the purposes of this discussion we will focus on the single-particle reconstruction task, which is broken down into a number of discrete subtasks as discussed in detail in Section 19.10.3. The workflow is designed so that users wishing to perform only a subset of any task can easily import data into any subtask, proceed through additional subtasks, then export the results at any point. There are plans to integrate other cryo-EM software into the workflow interface to enable users to try alternative algorithms for specific subtasks. The workflow is accessed using either e2workflow.py or e2desktop.py.

Figure 19.10.2.1. The overall workflow interface, and the dialogue for the three-dimensional refinement subtask.

The GUI (Fig. 19.10.2.2) includes a number of visualization modules for displaying individual two-dimensional images, stacks of two-dimensional images, three-dimensional volumetric models, two-dimensional plots, three-dimensional plots and symmetry information. These modules are used by the interactive file/database browser available in the workflow or via the program e2display.py, but may also be used directly from the Python interface.

Figure 19.10.2.2. A representation of the Euler GUI tool. The three-dimensional reconstruction is shown (lower left), along with a projection and class average for a particular orientation identified in the asymmetric triangle on the right. Each cylinder in the asymmetric triangle represents a projection direction, with its height representing the number of particles found in that orientation.

19.10.2.2. Command-line programs

EMAN2 (Tang et al., 2007) currently contains ∼60 command-line programs for specific tasks, plus an additional ∼60 programs as part of the related SPARX project (Hohn et al., 2007), included with EMAN2 releases. High-level programs such as e2refine2d.py, e2initialmodel.py, e2eotest.py and e2refine.py are essentially wrapper scripts which call the lower-level Python scripts to complete specific tasks.

In addition, there are a number of general-purpose utility programs, such as e2proc2d.py for two-dimensional image processing, e2proc3d.py for three-dimensional image processing and e2bdb.py for access to the embedded database. Both e2proc2d.py and e2proc3d.py take similar arguments and permit application of any of the ∼170 modular image-processing algorithms in the core library via the --process option. These include filters, masks, mathematical operations and a range of other processing algorithms. The e2help.py program provides detailed documentation for all of the modular algorithms. All EMAN2 programs are able to read automatically any of the supported file formats, and will write to any supported file format based on file extension or explicit specification of output format. For example, the command to convert a three-dimensional volume from HDF format to MRC format is simply e2proc3d.py input.hdf output.mrc.
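
The command-line pattern described above can be sketched as a small helper that assembles the argument list for e2proc2d.py or e2proc3d.py. This is only an illustration: the helper name build_proc_command is hypothetical, and the processor name filter.lowpass.gauss, its cutoff_abs parameter and the name:key=value syntax are assumptions that should be checked against the output of e2help.py for the installed release. The helper builds the command but does not run it.

```python
import shlex


def build_proc_command(program, infile, outfile, processors=()):
    """Assemble an argument list such as: e2proc3d.py in.hdf out.mrc --process=...

    `processors` is a sequence of (name, params) pairs. The
    'name:key=value' spelling of --process is an assumption here.
    """
    argv = [program, infile, outfile]
    for name, params in processors:
        spec = name
        if params:
            spec += ":" + ":".join(f"{k}={v}" for k, v in params.items())
        argv.append(f"--process={spec}")
    return argv


def as_shell_line(argv):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(a) for a in argv)


# Plain format conversion, as in the text: output format follows the extension.
convert = build_proc_command("e2proc3d.py", "input.hdf", "output.mrc")

# Conversion plus one (assumed) low-pass filter processor.
filtered = build_proc_command(
    "e2proc3d.py", "input.hdf", "output.mrc",
    processors=[("filter.lowpass.gauss", {"cutoff_abs": 0.1})],
)
print(as_shell_line(convert))
print(as_shell_line(filtered))
```

The resulting list can be passed to subprocess.run on a machine where EMAN2 is installed; keeping the construction separate from the execution makes it easy to log exactly what was run.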

19.10.2.3. Python wrapper

Python is an easy-to-learn scripting language in wide use in both scientific and non-scientific disciplines. It provides both an interactive prompt and a programming interface, and is easily extensible using higher-performance languages such as C++. EMAN2 was designed such that all command-line programs and the entire GUI are written in Python. Thus, any of the distributed programs in EMAN2 can be modified by experienced end users without a full EMAN2 compilation environment. In addition, the program e2.py offers an IPython-based (http://ipython.scipy.org) interactive prompt which gives full access to all of the library functions and GUI tools of EMAN2. An introduction to this interface can be found at http://blake.bcm.edu/emanwiki/Eman2ProgQuickstart.

19.10.2.4. C++

The C++ interface contains a modular system, so algorithms of various types can be trivially made available to the entire workflow system. The C++ and Python interfaces were intentionally designed to be as similar to each other as possible, so prototypes of new algorithms can be developed in Python and then later ported to C++. See http://blake.bcm.edu/emanwiki/Eman2CProgQuickstart for an introduction.

19.10.2.5. Cross-platform support

All of the major computing platforms are supported, including Linux workstations and clusters, Mac OS X and Windows. Our primary development platforms are Linux and OS X. We are committed to full support of EMAN2 on Windows, although we have encountered idiosyncratic and as yet unexplained behaviour on some specific machines running Vista.

19.10.2.6. Parallel processing

The three-dimensional refinement process can be extremely computationally intensive. While a refinement of a small ∼1 MDa particle with some symmetry to 15 Å resolution may be accomplished on a single workstation in a few hours, an asymmetric reconstruction to 4 Å resolution could easily require hundreds of thousands of CPU hours. Updated details of the modular parallelism strategy of EMAN and its GPGPU (general-purpose graphics processing unit) methodology can be found in the wiki at http://blake.bcm.edu/emanwiki/EMAN2/Parallel.

19.10.2.7. File formats and other conventions

EMAN2 supports all cryo-EM file formats for which specifications were available, in addition to its embedded database storage mechanism. While any EMAN2 program can read and write images in any supported format, we have adopted HDF5 as an interchange format and the internal database BDB for use during workflow operations. These two permit arbitrary metadata to be associated with each image, unlike the standard cryo-EM formats. Table 19.10.2.1 contains a list of the currently supported formats.

Table 19.10.2.1. Listing of EMAN2 supported file formats and whether each has read (R) and/or write (W) support

BDB refers to the EMAN2 embedded database. DM2/3 are Gatan Digital Micrograph formats. LST files are text files from EMAN1.

Format    R or W support    Format    R or W support
BDB       R/W               TIFF      R/W
HDF5      R/W               PGM       R/W
MRC       R/W               PNG       R/W
IMAGIC    R/W               JPEG      W
SPIDER    R/W               LST       R/W
PIF       R/W               AMIRA     R/W
DM3       R                 XPLOR     W
DM2       R                 VTK       W
EM        R/W               FITS      R/W
ICOS      R/W               SAL       R
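
For scripting purposes, the contents of Table 19.10.2.1 can be captured in a small lookup so that a conversion can be rejected before any program is launched. The following is a minimal sketch; the dictionary simply transcribes the table, while the function names can_read and can_write are hypothetical and are not part of EMAN2.

```python
# Read/write support transcribed from Table 19.10.2.1.
FORMAT_SUPPORT = {
    "BDB": "R/W", "HDF5": "R/W", "MRC": "R/W", "IMAGIC": "R/W",
    "SPIDER": "R/W", "PIF": "R/W", "DM3": "R", "DM2": "R",
    "EM": "R/W", "ICOS": "R/W", "TIFF": "R/W", "PGM": "R/W",
    "PNG": "R/W", "JPEG": "W", "LST": "R/W", "AMIRA": "R/W",
    "XPLOR": "W", "VTK": "W", "FITS": "R/W", "SAL": "R",
}


def _modes(fmt):
    """Return the set of supported modes, e.g. {'R', 'W'}, for a format name."""
    return set(FORMAT_SUPPORT[fmt.upper()].split("/"))


def can_read(fmt):
    return "R" in _modes(fmt)


def can_write(fmt):
    return "W" in _modes(fmt)


# Example: DM3 micrographs can be read but must be written in another format.
print(can_read("DM3"), can_write("DM3"))
```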

The other primary convention of concern to single-particle reconstruction is three-dimensional orientation specification. While EMAN2 has its own convention, it can convert to and from the most common conventions in use in the cryo-EM community, including MRC, SPIDER, IMAGIC, quaternions and spin axis. EMAN's own convention uses Z-X-Z′ Euler angles named az, alt and phi, respectively.
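
As an illustration of a Z-X-Z′ Euler specification, the sketch below composes a rotation matrix from az, alt and phi (in degrees) as a rotation about z, then about x, then about z again. The handedness of the elementary rotations and the order of multiplication are assumptions made for this example and may differ from EMAN2's internal definition; the function names are hypothetical.

```python
import math


def rot_z(angle_deg):
    """Right-handed rotation about the z axis (sign convention assumed)."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(angle_deg):
    """Right-handed rotation about the x axis (sign convention assumed)."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def zxz_euler_matrix(az, alt, phi):
    """Apply az about z first, then alt about x, then phi about z."""
    return matmul(rot_z(phi), matmul(rot_x(alt), rot_z(az)))


# With alt = 0 the two z rotations simply add.
print(zxz_euler_matrix(20.0, 0.0, 35.0)[0][0], math.cos(math.radians(55.0)))
```

Whatever convention is chosen, the result must be a proper rotation (orthogonal, determinant +1), which is a useful sanity check when converting between packages.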

19.10.3. Single-particle reconstruction

The single-particle reconstruction process is broken down into a sequence of subtasks in the EMAN2 workflow. Each subtask is documented in the GUI, with a block of text and mouse-over popups. While the workflow is the method of choice for performing reconstructions, all of the individual subtasks can also be completed using direct command-line programs. The workflow is launched with the e2workflow.py program. For each stage, the alternative command-line program is also shown in parentheses.

19.10.3.1. Particle selection (e2boxer.py)

Locating images of individual particles in the micrograph/charge-coupled device (CCD) frame is arguably the most critical and labour-intensive step in single-particle processing. Owing to the high noise levels in typical cryo-EM images, combined with various types of contamination, correctly identifying all relevant particles within a micrograph is challenging (Zhu et al., 2004). Despite many years of effort, no reliable and generally applicable algorithms for fully automated particle picking exist. The present approach of EMAN2 is to streamline the process of semi-automated particle picking. Multiple CCD frames/micrographs can be opened simultaneously and are handled by the user as a group. At the time of writing, there is a single autopicking algorithm available, based on Woolford et al. (2007), but there are plans to expand this in the future.

19.10.3.2. Contrast transfer function/image evaluation (e2ctf.py)

EMAN1 contained a very accurate contrast transfer function (CTF) correction methodology, which was also very labour-intensive. While the mathematical basis for correction remains effectively unchanged, the correction methodology is now entirely automatic. The only substantial change to the mathematical formulation is that the spectroscopic profile of the background noise, which was originally characterized using four empirical parameters, is now specified as a data-derived non-parametric curve. In addition, this methodology permits the computation of a one-dimensional structure factor from sets of images without user intervention, a process which was quite challenging in EMAN1 (Ludtke et al., 1999). The automated process is supplemented by a fully featured GUI for manual image assessment.
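
This chapter does not state the functional form of the CTF. For readers unfamiliar with it, the sketch below evaluates a common textbook one-dimensional form (weak-phase approximation with an amplitude-contrast term and a Gaussian envelope). It is not EMAN's own parameterization: the sign conventions, the treatment of underfocus as positive, the envelope definition exp(-B s²/4) and all default parameter values are assumptions made for illustration.

```python
import math


def electron_wavelength_angstrom(voltage_kv):
    """Relativistic electron wavelength in Å for an accelerating voltage in kV."""
    v = voltage_kv * 1000.0
    return 12.2643 / math.sqrt(v + 0.97845e-6 * v * v)


def ctf_1d(s, defocus_um, voltage_kv=300.0, cs_mm=2.0,
           amp_contrast=0.1, b_factor=0.0):
    """Textbook CTF at spatial frequency s (1/Å); underfocus taken as positive.

    All conventions here are assumptions, not EMAN's definitions.
    """
    lam = electron_wavelength_angstrom(voltage_kv)
    dz = defocus_um * 1.0e4   # micrometres to Å
    cs = cs_mm * 1.0e7        # millimetres to Å
    gamma = math.pi * lam * dz * s ** 2 - 0.5 * math.pi * cs * lam ** 3 * s ** 4
    envelope = math.exp(-b_factor * s * s / 4.0)
    phase_part = math.sqrt(1.0 - amp_contrast ** 2) * math.sin(gamma)
    amp_part = amp_contrast * math.cos(gamma)
    return -envelope * (phase_part + amp_part)


# Sample the curve out to 0.25 1/Å (4 Å) for a 2 µm underfocus image.
curve = [(i * 0.005, ctf_1d(i * 0.005, 2.0, b_factor=100.0)) for i in range(51)]
print(curve[:3])
```

The oscillations and sign reversals of this curve are what CTF correction must undo; the envelope term shows why signal at high spatial frequency is progressively harder to recover.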

19.10.3.3. Grouping particles into sets

Once images have been preprocessed, the user is given another, optional, opportunity to examine the images of individual particles, now in Wiener-filtered form, to identify any `bad' particles which should be eliminated from further processing. In addition, several values are provided to the user which permit assessment of particles on a per-micrograph basis, such as the integrated signal-to-noise ratio, defocus and B factor. Particles from user-selected micrographs are then combined into a set for further processing.

19.10.3.4. Reference-free two-dimensional classification (e2refine2d.py)

This fully automated principal-component-analysis-based (Frank, 2006) process produces two-dimensional averages representative of the various particle views present in the raw images. As with the workflow as a whole, while the user can provide a large number of different options for this subtask, sensible default values are provided, which should produce good results in most cases. The two-dimensional classification process is quite similar to the corresponding process in EMAN1 (Chen et al., 2006), with subtle improvements to increase speed and accuracy.

The two-dimensional classification process generates class averages with improved signal-to-noise ratio, characteristic of the different orientations and dynamic states of the particles. These can be used to help identify the particle's symmetry and any problems with preferred orientation or particle flexibility/heterogeneity.

19.10.3.5. Initial model generation (e2initialmodel.py)

There are a variety of opinions in the single-particle reconstruction community about how initial models should be generated. Briefly, the most widely used methods are: common lines with reference-free class averages (van Heel, 1987), random conical tilt (Radermacher et al., 1987), orthogonal tilt (Leschziner & Nogales, 2006) and single-particle tomography with three-dimensional averaging (Walz et al., 1997). While not the primary method we promote for traditional single-particle processing, the EMAN2 workflow includes a task for three-dimensional alignment and averaging of tomographic subvolumes.

The primary method provided to generate initial models in EMAN2 is based on the robustness of the iterative refinement strategy used for the final high-resolution refinement (Section 19.10.3.6). For any given three-dimensional structure, there are a small number of stable structures which can result from refinement, regardless of starting model. The correct model represents the global minimum in the energy space defined by requiring that projections of the reconstruction match the particle images.

To make use of this concept in EMAN2, we downsample the two-dimensional class averages (Section 19.10.3.4) and make a large number of completely random starting models, then quickly iteratively refine each. While the speedups used in this process make it more susceptible to local minima than the full high-resolution refinement method, it will generally still produce the correct model some fraction of the time, without having to resort to additional experiments. The resulting models are easily assessed by comparing the original class averages with projections of the model, so the best answer can be readily identified. The primary risk associated with this process is with respect to particles with structural heterogeneity in solution or those with strongly preferred orientations, in which case the standard refinement algorithm is not robust. In such cases, the tomographic method can provide further insights.
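
The multi-start idea described above can be illustrated with a deliberately simple toy problem. The sketch below is not the EMAN2 algorithm: a one-dimensional function with several local minima stands in for the mismatch between model projections and class averages, plain gradient descent stands in for a quick refinement, and the best-scoring result is kept. All function names and numerical settings are hypothetical.

```python
import math
import random


def mismatch(x):
    """Toy 'energy': several local minima, global minimum at x = 0."""
    return 0.05 * x * x - math.cos(3.0 * x)


def mismatch_gradient(x):
    return 0.1 * x + 3.0 * math.sin(3.0 * x)


def refine(x, step=0.05, iterations=300):
    """Quick local refinement; it only reaches the minimum of its own basin."""
    for _ in range(iterations):
        x -= step * mismatch_gradient(x)
    return x


def refine_from_starts(starts):
    """Refine every starting point and rank the results by final mismatch."""
    refined = [refine(x0) for x0 in starts]
    return sorted((mismatch(x), x) for x in refined)


def random_starts(count, seed, low=-6.0, high=6.0):
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(count)]


# Many random starts: only some land in the correct basin, but the ranking
# by mismatch identifies them.
ranked = refine_from_starts(random_starts(40, seed=1))
print(ranked[0], ranked[-1])
```

Only starts that fall in the basin around the global minimum recover it, which mirrors the statement that the correct model is obtained some fraction of the time and is then identified by how well it agrees with the data.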

19.10.3.6. Refinement (e2refine.py)

Once particles have been prepared and an initial model has been produced, the next step is three-dimensional refinement. This process is largely unchanged from EMAN1 (Ludtke et al., 1999), apart from minor improvements designed to speed up the process and provide additional flexibility. As with two-dimensional refinement, there are a large number of options for the user to specify, but, where possible, sensible defaults are provided. Documentation for the various parameters is provided through the workflow interface.

19.10.4. Evaluating the reconstruction

Evaluating the final three-dimensional reconstruction remains one of the most problematic aspects of single-particle processing from a quantitative standpoint, and there have been numerous debates in the community over standards for model assessment. As the field is now beginning to achieve resolutions where protein side chains can be visualized, reliable new assessments based on methods from X-ray crystallography are emerging, but at lower resolutions robust assessment remains elusive.

19.10.4.1. Model accuracy

Before assessing the resolution of a reconstruction, the fundamental question of whether the model even qualitatively represents the original data must be addressed. Regardless of the reconstruction methodology used, the fundamental questions to be asked are whether computed projections of the reconstruction match both the raw particles and class averages, and whether all of the particle views are represented by the three-dimensional model in some orientation. The `Eulers' option in the workflow provides a number of tools for making such model assessments both qualitatively and quantitatively.

19.10.4.2. Measures of resolution and resolvability

Resolution in single-particle processing is distinct from the related concept of resolvability. Resolvability is a measure of the level of detail visible in a model, in terms of the shortest separation distance over which two objects can be identified as being distinct. In structural biology, however, `resolution' is a statement of the spatial frequency at which the noise level exceeds a threshold. It is important to recognize that a model with 4 Å resolution could be low-pass filtered to a resolvability of only 20 Å, and yet cryo-EM resolution measures would still (properly) show it to have 4 Å resolution. To bring the resolvability into reasonable agreement with the resolution, it is typical to apply an appropriate filter to the three-dimensional reconstruction, but there remains no consensus in the community over the optimal filter and/or filtration level for this task. EMAN2 provides a filtering technique based on the signal-to-noise ratio and structure factor, as well as two mechanisms for assessing the resolution of a reconstruction.
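
The definition of resolution as a threshold crossing can be made concrete with a short sketch. It assumes that some agreement curve is available as a function of spatial frequency, for example a correlation between two independently processed halves of the data; the chapter does not specify which curve or threshold EMAN2 uses, so the 0.5 threshold, the function name and the sample values below are purely illustrative.

```python
def resolution_at_threshold(freqs, curve, threshold=0.5):
    """Return the resolution (Å) where `curve` first drops below `threshold`.

    `freqs` are spatial frequencies in 1/Å in increasing order and `curve`
    holds the matching agreement values. Linear interpolation is used
    between the two samples that bracket the crossing. Returns None if the
    curve never falls below the threshold.
    """
    for i in range(1, len(freqs)):
        above, below = curve[i - 1], curve[i]
        if above >= threshold > below:
            frac = (above - threshold) / (above - below)
            s_cross = freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
            return 1.0 / s_cross
    return None


# Made-up curve for illustration only.
freqs = [0.05, 0.10, 0.15, 0.20, 0.25]
curve = [1.0, 0.9, 0.7, 0.3, 0.1]
print(resolution_at_threshold(freqs, curve))
```

Because the result depends directly on the chosen threshold, a quoted resolution is only meaningful alongside the criterion used to obtain it.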

19.10.4.3. Model/noise bias

The final issue to consider in a single-particle reconstruction is the well known model/noise bias problem (Stewart & Grigorieff, 2004). With a traditional iterative refinement strategy, and very high noise levels in the raw particle images, it is possible to produce a reconstruction including features derived from the initial model or from systematic algorithmic artifacts which are not represented in the raw data. There are relatively few techniques for assessing this sort of bias, and each single-particle reconstruction package handles this issue differently. In EMAN, the use of iterative class averaging during the iterative refinement process permits this bias to be greatly reduced or eliminated, when used as suggested. Ensuring that the reference-free class averages agree well with projections of the reconstruction can at least place some limits on the extent of such artifacts.

Acknowledgements

The development of EMAN2 is funded by NIH grant No. R01GM080139. I would like to thank David Woolford for his work on the figures, and Ben Bammes and Jesus Montoya for their comments.

References

Chen, D. H., Song, J. L., Chuang, D. T., Chiu, W. & Ludtke, S. J. (2006). An expanded conformation of single-ring GroEL-GroES complex encapsulates an 86 kDa substrate. Structure, 14, 1711–1722.
Frank, J. (2006). Three-dimensional electron microscopy of macromolecular assemblies: visualization of biological molecules in their native state. In Multivariate Data Analysis and Classification of Images. Oxford University Press.
Heel, M. van (1987). Angular reconstitution: a posteriori assignment of projection directions for 3D reconstruction. Ultramicroscopy, 21, 111–123.
Hohn, M., Tang, G., Goodyear, G., Baldwin, P. R., Huang, Z., Penczek, P. A., Yang, C., Glaeser, R. M., Adams, P. D. & Ludtke, S. J. (2007). SPARX, a new environment for cryo-EM image processing. J. Struct. Biol. 157, 47–55.
Jiang, W., Baker, M. L., Jakana, J., Weigele, P. R., King, J. & Chiu, W. (2008). Backbone structure of the infectious epsilon15 virus capsid revealed by electron cryomicroscopy. Nature (London), 451, 1130–1134.
Leschziner, A. E. & Nogales, E. (2006). The orthogonal tilt reconstruction method: an approach to generating single-class volumes with no missing cone for ab initio reconstruction of asymmetric particles. J. Struct. Biol. 153, 284–299.
Ludtke, S. J., Baker, M. L., Chen, D. H., Song, J. L., Chuang, D. T. & Chiu, W. (2008). De novo backbone trace of GroEL from single particle electron cryomicroscopy. Structure, 16, 441–448.
Ludtke, S. J., Baldwin, P. R. & Chiu, W. (1999). EMAN: semiautomated software for high-resolution single-particle reconstructions. J. Struct. Biol. 128, 82–97.
Radermacher, M., Wagenknecht, T., Verschoor, A. & Frank, J. (1987). Three-dimensional reconstruction from a single-exposure, random conical tilt series applied to the 50S ribosomal subunit of Escherichia coli. J. Microsc. 146, 113–136.
Stewart, A. & Grigorieff, N. (2004). Noise bias in the refinement of structures derived from single particles. Ultramicroscopy, 102, 67–84.
Tang, G., Peng, L., Baldwin, P. R., Mann, D. S., Jiang, W., Rees, I. & Ludtke, S. J. (2007). EMAN2: an extensible image processing suite for electron microscopy. J. Struct. Biol. 157, 38–46.
Walz, J., Typke, D., Nitsch, M., Koster, A. J., Hegerl, R. & Baumeister, W. (1997). Electron tomography of single ice-embedded macromolecules: three-dimensional alignment and classification. J. Struct. Biol. 120, 387–395.
Woolford, D., Ericksson, G., Rothnagel, R., Muller, D., Landsberg, M. J., Pantelic, R. S., McDowall, A., Pailthorpe, B., Young, P. R., Hankamer, B. & Banks, J. (2007). SwarmPS: rapid, semi-automated single particle selection software. J. Struct. Biol. 157, 174–188.
Yu, X., Jin, L. & Zhou, Z. H. (2008). 3.88 Å structure of cytoplasmic polyhedrosis virus by cryo-electron microscopy. Nature (London), 453, 415–419.
Zhang, X., Settembre, E., Xu, C., Dormitzer, P. R., Bellamy, R., Harrison, S. C. & Grigorieff, N. (2008). Near-atomic resolution using electron cryomicroscopy and single-particle reconstruction. Proc. Natl Acad. Sci. USA, 105, 1867–1872.
Zhu, Y., Carragher, B., Glaeser, R. M., Fellmann, D., Bajaj, C., Bern, M., Mouche, F., de Haas, F., Hall, R. J., Kriegman, D. J., Ludtke, S. J., Mallick, S. P., Penczek, P. A., Roseman, A. M., Sigworth, F. J., Volkmann, N. & Potter, C. S. (2004). Automatic particle selection: results of a comparative study. J. Struct. Biol. 145, 3–14.







































