International
Tables for
Crystallography
Volume F
Crystallography of biological macromolecules
Edited by E. Arnold, D. M. Himmel and M. G. Rossmann

International Tables for Crystallography (2012). Vol. F, ch. 17.1, pp. 443-447
https://doi.org/10.1107/97809553602060000853

Chapter 17.1. Macromolecular model building and validation using Coot

P. Emsley,[a]* B. Lohkamp[b] and K. Cowtan[c]

[a] Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, UK, [b] Department of Medical Biochemistry and Biophysics, Karolinska Institute, SE-171 77 Stockholm, Sweden, and [c] Department of Chemistry, University of York, Heslington, York YO10 5DD, UK
Correspondence e-mail: paul.emsley@bioch.ox.ac.uk

The Coot software package is a tool for interactive macromolecular model building and validation. The software is designed to be easy to learn for novice users by ensuring that tools for common tasks are `discoverable' without consulting the documentation, while also providing enhanced usability for experts through customizable key bindings and extensive scripting interfaces. These considerations have resulted in substantial use throughout the crystallographic community. Coot displays electron-density maps and atomic models, and allows model manipulations such as idealization, real-space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers and Ramachandran idealization. Several of the most important tools are described together with the underlying methods employed.

17.1.1. Introduction

Macromolecular model building using X-ray data is to some extent an interactive task, in which the iterative application of various optimization algorithms alternates with evaluation of the model and interpretation of the electron density by the scientist. Coot is an interactive three-dimensional molecular-modelling program designed in particular for the building and validation of protein structures, and it aims to facilitate each step of that process.

In recent years, the initial construction of the protein chain has often been carried out using automatic model-building tools such as ARP/wARP (Langer et al., 2008), SOLVE/RESOLVE (Wang et al., 2004) and more recently Buccaneer (Cowtan, 2006). Therefore, relatively more time and emphasis have been placed on model validation than had previously been the case (Dauter, 2006). The refinement and validation steps become increasingly important and also more time consuming as data sets become poorer. Coot aims to provide convenient access to as many as possible of the tools required in the iterative refinement and validation of a macromolecular structure, in order to facilitate those aspects of the process which cannot be performed automatically. The software is also designed to be easy to learn, in order to provide a low barrier for scientists who are beginning to work with X-ray data (Fig. 17.1.1.1).

Figure 17.1.1.1

The Coot main window. The main display area shows a molecule and electron density. At the top of the window is a menu bar providing access to most of the tools. Commonly used model-manipulation tools are also available through the toolbar on the right. Below the menu bar is an area for user-definable buttons.

The principal tasks of the software are the visualization of macromolecular structures and data, the building of models into electron density, and the validation of existing models; these will be considered in the next three sections. The remaining sections of the chapter will deal with more technical aspects of the software, including interactions with external software, scripting and testing.

17.1.2. Model building

Initial building of protein structures from experimental phasing is usually accomplished by automated methods. The main focus in Coot, therefore, is the completion of initial models generated by either molecular replacement or automated model building. However, the following features are provided for cases where an initial model is not available.

17.1.2.1. Tools for general model building

17.1.2.1.1. Cα baton mode

Baton building, introduced by Jones [see for example Kleywegt & Jones (1994)], allows a protein main chain to be built by using a 3.8 Å `baton' to position successive α-carbons at the correct spacing. In Coot, this facility is reproduced and coupled with an electron-density ridge-trace skeleton (Greer, 1974). First, a skeleton is calculated following the ridges of the electron density. Then, the user selects baton-building mode, which places an initial baton with one end at the current screen centre. Candidate positions for the next α-carbon are highlighted as crosses, selected from those points on the skeleton which lie at the correct distance from the start point. The user can cycle through a list of candidate positions using the `Try Another' button, or, alternatively, rotate the baton freely by use of the mouse. Additionally, the length of the baton can be changed to accommodate moderate inaccuracies in the α-carbon positions. Once a new position is accepted, the baton moves so that its base is on the new α-carbon. In this way, a chain may be traced manually at a rate of between 1 and 10 residues per minute.

Having placed the α-carbons, the rest of the main-chain atoms may be generated automatically using an implementation of the method of CALPHA (Esnouf, 1997).
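The candidate-selection step of baton building described above can be illustrated with a minimal Python sketch. It is not Coot's implementation: the function name `baton_candidates`, the representation of the skeleton as a list of coordinate tuples and the 0.3 Å tolerance are assumptions made for illustration; only the 3.8 Å baton length comes from the text.

```python
import math

BATON_LENGTH = 3.8  # Å, the Cα-Cα spacing quoted in the text


def baton_candidates(start, skeleton_points, length=BATON_LENGTH, tolerance=0.3):
    """Return skeleton points lying roughly one baton length from `start`.

    `tolerance` (Å) is a hypothetical parameter; candidates are ordered so
    that the point closest to the ideal spacing comes first.
    """
    scored = []
    for point in skeleton_points:
        error = abs(math.dist(start, point) - length)
        if error <= tolerance:
            scored.append((error, point))
    scored.sort(key=lambda item: item[0])
    return [point for _, point in scored]
```

Cycling through the returned list corresponds loosely to repeated use of the `Try Another' button, and widening `length` or `tolerance` mimics changing the baton length.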

17.1.2.1.2. Find secondary structure

Protein secondary-structure elements, including α-helices and β-strands, can be located by their repeating electron-density features, which lead to high and low electron-density values in characteristic positions relative to the consecutive α-carbons. The `Find Secondary Structure' tool performs a six-dimensional rotational and translational search to find the likely positions of helical and strand elements within the electron density. This search has been highly optimized in order to achieve interactive performance for moderately sized structures, and, as a result, is less exhaustive than the corresponding tools employed in the automated model-building packages. However, it can provide a very rapid indication of map quality and a starting point for model building.

17.1.2.1.3. Place helix here and Place strand here

At low resolution it is sometimes possible to identify secondary-structure features in the electron density when the Cα positions are not obvious. If the user can identify that a helix or strand feature is present at a position in the electron-density map, Coot can place that feature automatically. This involves several stages. First, the orientation of the helix or strand is identified by using a rotational search to orient a cylindrical template to contain the maximum amount of density. Then, an ensemble of helix or strand fragments is placed in the density to determine the direction and registration of the helix or strand. The best fragment is extended to provide the longest element with a good match to the density.

17.1.2.1.4. Find ligands

Rapid fitting of ligands into electron-density maps is a frequently used technique that is particularly useful to pharmaceutical companies (e.g. Williams et al., 2005). The mechanism in Coot addresses a number of ligand-fitting scenarios and is a modified form of an algorithm described previously (Oldfield, 2001). It is common practice in `fragment screening' to soak different ligand types into the same crystal (e.g. Blundell et al., 2002). Using Coot, one can either specify a region in space or search a whole map for either a single ligand type or a number of different ligand types. In the `whole map' scenario, candidate ligand sites are found by cluster analysis of a residual map and the candidate ligands are fitted in turn to each site, the candidate orientations being generated by matching the eigenvectors of the ligand to those of the cluster. Each candidate ligand is fitted and scored against the electron density. The best-fitting orientation of the ligand candidates is chosen.
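The cluster-analysis step of the `whole map' search can be sketched as a connected-component search over grid points of the residual map. The following Python fragment is an illustration only: the sparse dictionary representation of the map, the density cutoff and the six-neighbour connectivity are assumptions, and the subsequent eigenvector matching and scoring are not shown.

```python
from collections import deque

# Face-sharing neighbours on a cubic grid (an assumed connectivity rule).
NEIGHBOUR_STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def density_clusters(residual_map, cutoff):
    """Group grid points with density above `cutoff` into connected clusters.

    `residual_map` maps integer grid indices (i, j, k) to density values.
    Clusters are returned largest first, each as a sorted list of indices.
    """
    unvisited = {index for index, rho in residual_map.items() if rho > cutoff}
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, queue = [seed], deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBOUR_STEPS:
                neighbour = (i + di, j + dj, k + dk)
                if neighbour in unvisited:
                    unvisited.remove(neighbour)
                    cluster.append(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(cluster))
    clusters.sort(key=len, reverse=True)
    return clusters
```

Each returned cluster would then serve as a candidate site; the number of grid points in a cluster gives a rough measure of its volume.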

17.1.2.2. Rebuilding and refinement

The rebuilding and refinement tools are the primary means of model manipulation in Coot, and are all grouped together in the `Model/Fit/Refine' tool set. These tools may be accessed either through a toolbar which is usually docked on the right-hand side of the main window, or through a separate `Model/Fit/Refine' window containing buttons for each of the toolbar functions.

The core of the rebuilding and refinement tools is the real-space refinement (RSR) engine, which handles refinement of the atomic model against an electron-density map and regularization of the atomic model against geometric restraints. Refinement may be invoked both interactively when executed by the user, and non-interactively as part of some of the automated fitting tools. The refinement and regularization tools are supplemented by a range of additional tools aimed at assisting the fitting of protein chains. These features are discussed below.

17.1.2.3. Tools for moving existing atoms

17.1.2.3.1. Real-space refine zone

The real-space refine tool is the most frequently used tool in the refinement and rebuilding of atomic models, and is also incorporated as a final stage in a number of other tools, e.g. `Add Terminal Residue…'. In interactive mode, the user selects the RSR button and then two atoms bounding a range of monomers (amino acids or otherwise). All atoms in the selected range of monomers will be refined, including any flanking residues. Atoms of the flanking residues are marked as `fixed' but are required to be added to the refinement so that the geometry (e.g. peptide bonds, angles and planes) between fixed and moving parts is also optimized.

The selected atoms are refined to optimize a target consisting of two terms: the first being the Z-weighted sum of the electron-density values over all the atomic centres and the second being the stereochemical restraints on bond lengths, angles etc. The progress of the refinement is shown with a new set of atoms, displayed in white/pale colours. Once convergence is reached, the user is offered a dialogue box with a set of r.m.s. deviation scores and coloured `traffic lights' indicating the current geometry scores in each of the geometrical criteria. Additionally, a warning is issued if the refined range contains new cis-peptide bonds.
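The two-term target described above can be written down in a few lines. The sketch below is a simplified illustration rather than Coot's refinement code: it includes only bond-length restraints, treats the density as a callable `density_at` and introduces a hypothetical `map_weight` to balance the two terms. The sign convention (lower is better) is also an assumption.

```python
import math


def rsr_target(atoms, density_at, bond_restraints, map_weight=1.0):
    """Evaluate a simplified real-space refinement target (lower is better).

    atoms:           list of dicts with keys 'xyz' (tuple) and 'Z' (atomic number)
    density_at:      callable returning the map value at a position
    bond_restraints: list of (i, j, ideal_length, sigma) over atom indices
    map_weight:      hypothetical relative weight of the density term
    """
    # Z-weighted sum of density values at the atomic centres.
    density_term = sum(atom["Z"] * density_at(atom["xyz"]) for atom in atoms)

    # Stereochemical term, here restricted to harmonic bond-length restraints.
    geometry_term = 0.0
    for i, j, ideal, sigma in bond_restraints:
        length = math.dist(atoms[i]["xyz"], atoms[j]["xyz"])
        geometry_term += ((length - ideal) / sigma) ** 2

    return geometry_term - map_weight * density_term
```

A minimizer acting on the atomic coordinates would lower this value either by moving atoms into stronger density or by reducing restraint violations.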

The user may adjust the model by selecting an atom with the mouse and dragging it. The other atoms will move with the dragged atom. Alternatively, a single atom may be dragged by holding the Ctrl key. As soon as the atoms are released, the model will refine from the dragged position. Additional atoms may be selected to be fixed during the refinement. This selection is made before starting the refinement.

17.1.2.3.2. Sphere refinement

One problem with the refinement mode described above is that it considers only a linear range of residues. This can cause problems, particularly at lower resolutions, with some side chains inappropriately falling into the electron density of neighbouring residues. Additionally, a linear residue selection precludes the refinement of entities such as disulfide bonds. Therefore, a new residue-selection mechanism was introduced to address these issues: the so-called `Sphere Refinement'. This mode selects residues that have atoms within a given radius of a specified position (typically within 4 Å of the centre of the screen). The residues are pairwise matched against the dictionary to find the appropriate restraint type.
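The residue-selection rule behind sphere refinement is simple enough to sketch directly. In the Python fragment below, the flat tuple representation of atoms and the function name `residues_in_sphere` are assumptions for illustration; the 4 Å default radius is the typical value quoted above.

```python
import math


def residues_in_sphere(atoms, centre, radius=4.0):
    """Return residues having at least one atom within `radius` Å of `centre`.

    atoms: iterable of (chain_id, residue_number, atom_name, (x, y, z)).
    The result is a sorted list of unique (chain_id, residue_number) pairs.
    """
    selected = set()
    for chain_id, residue_number, _atom_name, xyz in atoms:
        if math.dist(xyz, centre) <= radius:
            selected.add((chain_id, residue_number))
    return sorted(selected)
```

Because the selection is spatial rather than sequential, two cysteine residues that are far apart in sequence but joined by a disulfide bond can both fall inside the same sphere.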

17.1.2.3.3. Ramachandran restraints

At lower resolutions it is sometimes difficult to obtain an acceptable fit to the density and at the same time produce a Ramachandran plot of high quality (most residues in favourable regions and less than 1% outliers). If a Ramachandran score is added to the target function then the Ramachandran plot can be improved.

The analytical form for torsion gradients (∂θ/∂x₁ and so on) for each of the x, y, z positions of the four atoms contributing to the torsion angle has been reported previously (Emsley & Cowtan, 2004); in the case of Ramachandran restraints, the θ torsions are φ and ψ. The extension of the torsion gradients for use in Ramachandran restraints is performed using a two-dimensional log Ramachandran plot. This is generated as tables (one each for Pro, Gly and residues that are neither Pro nor Gly), containing the log of the Ramachandran distribution. Where the Ramachandran probability goes to zero, the log probability would become infinite, and so is replaced by values which become increasingly negative with distance from the nearest nonzero value. This provides a weak gradient in the disallowed regions towards the nearest allowed region.
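The treatment of zero-probability cells in the log Ramachandran table can be illustrated as follows. This is a sketch under stated assumptions, not the tabulation used in Coot: the grid is taken to be square and periodic in both φ and ψ, the substitute values start from the lowest finite log probability, and the linear `falloff` per grid step is a hypothetical parameter.

```python
import math


def log_rama_table(prob, falloff=1.0):
    """Build a log-probability table from a square, periodic (phi, psi) grid.

    Cells with zero probability receive values that decrease linearly with
    the grid distance to the nearest nonzero cell, so that a weak gradient
    points back towards the allowed regions.
    """
    n = len(prob)
    allowed = [(i, j) for i in range(n) for j in range(n) if prob[i][j] > 0.0]
    if not allowed:
        raise ValueError("the distribution has no nonzero cells")
    floor = min(math.log(prob[i][j]) for i, j in allowed)

    def periodic_gap(a, b):
        # Shortest separation on a ring of n cells (angles wrap at 360 degrees).
        gap = abs(a - b)
        return min(gap, n - gap)

    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if prob[i][j] > 0.0:
                table[i][j] = math.log(prob[i][j])
            else:
                nearest = min(
                    math.hypot(periodic_gap(i, a), periodic_gap(j, b))
                    for a, b in allowed
                )
                table[i][j] = floor - falloff * nearest
    return table
```

The brute-force nearest-cell search is adequate for a small illustrative grid; a production table would more likely use a distance transform.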

17.1.2.3.4. Rotamer tools

Four tools are available for the fitting of protein side chains. For a side chain whose amino-acid type is already correctly assigned, the best rotamer may be chosen to fit the density either automatically or manually. If the automatic option is chosen, then the side-chain rotamer from the MolProbity library (Lovell et al., 2000) that gives rise to the highest density values at the atomic centres is selected and refined. Otherwise, the user is presented with a list of rotamers for that side-chain type, sorted by frequency in the database. Rotamers are named according to the MolProbity system (Lovell et al., 2000).

The other two options, `Mutate & Auto Fit' and `Simple Mutate', allow the amino-acid type to be assigned or changed. The `Mutate & Auto Fit Rotamer' option allows an amino-acid type to be selected from a list and then immediately performs the autofit rotamer operation as above. The `Simple Mutate' option changes the amino-acid type and builds the side-chain atoms in the most frequently occurring rotamer, without refinement.
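The difference between the automatic and manual rotamer choices described above can be sketched in Python. The data layout (a dictionary per rotamer with hypothetical keys `name`, `frequency` and `atoms`) and the callable `density_at` are assumptions made for illustration, and the refinement that follows the automatic choice is omitted.

```python
def auto_fit_rotamer(rotamers, density_at):
    """Pick the rotamer whose atom centres sit in the highest total density."""
    return max(
        rotamers,
        key=lambda rotamer: sum(density_at(xyz) for xyz in rotamer["atoms"]),
    )


def manual_rotamer_list(rotamers):
    """Order rotamers for manual selection, most frequent first."""
    return sorted(rotamers, key=lambda rotamer: rotamer["frequency"], reverse=True)
```

In this picture, `Simple Mutate' corresponds to taking the first entry of the frequency-ordered list without consulting the density at all.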

17.1.2.3.5. Torsion editing

Side-chain (or ligand) torsion angles need to be defined prior to editing. Either the user manually defines the four atoms forming the torsion angle (`Torsion General') or the torsion angles are determined from a dictionary description and the user selects the one to edit. In the latter case, the bond around which the selected torsion angle is edited is visually marked.

17.1.2.4. Tools for adding atoms to the model

17.1.2.4.1. Find waters

The water-finding tool in Coot uses the same mechanism for detecting unmodelled density as is used in ligand fitting. However, only those features below a certain volume (by default 4.2 Å³) are considered as candidate sites for water molecules. The centre of each feature is computed and then a distance check is made to the potential hydrogen-bond donors or acceptors in the protein molecule (or other waters). The distance criteria for an acceptable hydrogen-bond length are under user control. Additionally, a test for acceptable sphericity of the electron density is performed.
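The volume and distance filters described above can be sketched as follows. The fragment is illustrative only: the distance limits of 2.4 and 3.2 Å are assumed values standing in for the user-controlled criteria, the sphericity test is omitted, and previously placed waters are not added to the list of hydrogen-bond partners.

```python
import math


def plausible_waters(blobs, hbond_partners, max_volume=4.2, min_dist=2.4, max_dist=3.2):
    """Filter unmodelled density features down to plausible water sites.

    blobs:          list of (centre_xyz, volume in Å^3) for each feature
    hbond_partners: positions of potential hydrogen-bond donors or acceptors
    min_dist/max_dist: assumed limits (Å) on the nearest-partner distance
    """
    waters = []
    for centre, volume in blobs:
        if volume >= max_volume:
            continue  # too large for a single water molecule
        nearest = min(
            (math.dist(centre, partner) for partner in hbond_partners),
            default=math.inf,
        )
        if min_dist <= nearest <= max_dist:
            waters.append(centre)
    return waters
```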

17.1.2.4.2. Add terminal residue

The MolProbity φ, ψ distribution is used to generate a set of randomly selected φ, ψ pairs. These random angles are used to generate the position of the N, Cα, O and C atoms of the next two residues. The conformation of these new atoms is then scored against the electron-density map and recorded. This procedure is carried out a number of times. The best-fitting conformation is offered as a candidate to the user (only the nearest of the two residues is kept).
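The sample-and-score loop described above can be sketched in Python. Everything specific here is an assumption: the binned `distribution`, the callables `build_atoms` and `score_fit` (which stand in for the geometry construction and the density scoring) and the default of 100 trials are hypothetical, and only the overall procedure follows the text.

```python
import random


def sample_phi_psi(distribution, rng):
    """Draw one (phi, psi) pair from a list of ((phi, psi), weight) bins."""
    angles = [pair for pair, _ in distribution]
    weights = [weight for _, weight in distribution]
    return rng.choices(angles, weights=weights, k=1)[0]


def fit_terminal_residue(distribution, build_atoms, score_fit, n_trials=100, seed=None):
    """Try random two-residue conformations and keep the best-scoring one.

    Returns the (phi, psi) pair of the nearer residue and the best score;
    the second residue only helps to judge the fit and is then discarded.
    """
    if n_trials < 1:
        raise ValueError("at least one trial is required")
    rng = random.Random(seed)
    best_score, best_trial = None, None
    for _ in range(n_trials):
        trial = (sample_phi_psi(distribution, rng), sample_phi_psi(distribution, rng))
        score = score_fit(build_atoms(trial))
        if best_score is None or score > best_score:
            best_score, best_trial = score, trial
    return best_trial[0], best_score
```

Scoring two residues at a time makes the choice for the nearer residue depend on whether the chain can plausibly continue beyond it.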

17.1.2.5. Tools for handling noncrystallographic symmetry (NCS)

Noncrystallographic symmetry (NCS) can be exploited during the building of an atomic model, and also in the analysis of an existing model. In addition to some specific validation tools, Coot provides the following tools to help with the building and visualization of NCS-related molecules:

  • (i) NCS ghost molecules. In order to visualize the similarities and differences between NCS-related molecules, a `ghost' copy of any or all NCS-related chains may be superimposed over a chain in the model. The `ghost' copies are displayed in thin lines and coloured differently as well as uniformly in order to distinguish them from the original. The superposition may be performed automatically by secondary-structure matching (Krissinel & Henrick, 2004), or by least-squares superposition.

  • (ii) NCS-averaged maps. In addition to viewing NCS-related copies of the electron density, the average density of the related regions may be computed and viewed. In noisy maps, this may provide a clearer starting point for model building.

  • (iii) NCS rebuilding. When building an atomic model of a molecule with NCS, it is often more convenient to work on one chain and then replicate the changes made to every NCS-related copy of that chain (at least in the early stages of model building).

  • (iv) NCS `jumping'. The view centre jumps to the next NCS-related peer chain and, at the same time, the NCS operators are taken into account so that the relative view remains the same. This provides a means for rapid visual comparison of NCS-related entities.

17.1.3. Validation

Coot incorporates a range of validation tools, including generic validation of the model against density, comprehensive geometrical validation tools for protein structures and additional validation tools specific to nucleotides. It also provides convenient interfaces to external validation tools, most notably the MolProbity suite (Davis et al., 2007; Chapter 21.6), but also to the Refmac refinement software (Murshudov et al., 1997) and dictionary (Vagin et al., 2004).

Many of the internal validation tools provide a uniform interface in the form of colour-coded bar charts, for example the `Density fit analysis' chart (Fig. 17.1.3.1). This window contains one bar chart for each chain in the structure. Each chart contains one bar for each residue in the chain. The height and colour of the bar both indicate the `goodness' of the monomer, with small green bars being good or expected/conventional and large red bars being bad or unconventional. The chart is active, i.e. on mousing over the bar, tooltips provide relevant statistics, and clicking on a bar changes the view in the main graphics window to centre on the selected residue. In this way, a rapid overview of model quality is obtained and problem areas can be easily accessed. In order to obtain a good structure for submission, the user may simply cycle through the validation options, correcting any problems found.

Figure 17.1.3.1

A typical validation graph. Bars represent individual residues in a chain, with an indication of quality for that residue given by both size and colour. The plot is interactive, i.e. clicking on a bar takes the user to the corresponding residue in the main window.

The available validation tools are described in more detail in the following sections.

17.1.3.1. Ramachandran plot

The Ramachandran plot tool (Fig. 17.1.3.2) launches a new window in which the Ramachandran plot for the active molecule is displayed. A data point appears in this plot for each residue in the protein, with different symbols distinguishing Gly and Pro residues. The background of the plot shows frequency data for Ramachandran angles using the Richardsons' data (Lovell et al., 2003).

Figure 17.1.3.2

Interactive Ramachandran plot. The axes show the φ and ψ angles. Preferred regions are coloured in pink, allowed regions in yellow, and the background in grey for disallowed regions. Standard residues are shown as dark-blue squares, Pro residues as light-blue squares and Gly residues as light-blue open triangles. Residues in the disallowed regions are coloured red. Below the graph, in the Coot dialogue box, a summary of the number (and percentages) of residues in the different regions of the Ramachandran plot is given.

The plot is interactive: clicking on a data point moves the view in the three-dimensional canvas to centre on the corresponding residue. Similarly, selecting an atom in the model highlights the corresponding data point. Moving the mouse over a data point corresponding to a Gly or Pro residue causes the Ramachandran frequency data for that residue type to be displayed.

17.1.3.2. Geometry analysis

The geometry (bonds, angles, planes) for each residue in the selected molecule is compared to dictionary values (typically provided by the mmCIF Refmac dictionary). Torsion angles are not analysed (there are other validation tools for torsion angles).

The statistics for the geometry graph are the average Z value for each of the geometry terms for that residue (peptide geometry distortion is shared between neighbouring residues). The tooltip on the geometry graph describes the geometry features giving rise to the highest Z value.
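The per-residue statistic described above can be sketched as follows. The sketch assumes that each restraint is supplied as a hypothetical tuple of label, observed value, dictionary value and standard deviation, and that the absolute value of Z is averaged; the sharing of peptide distortion between neighbouring residues is not modelled.

```python
def residue_geometry_score(restraints):
    """Summarize the geometry of one residue against dictionary values.

    restraints: list of (label, observed, ideal, sigma) tuples.
    Returns (mean |Z|, label of the worst term, |Z| of the worst term).
    """
    if not restraints:
        raise ValueError("at least one restraint is required")
    z_values = [
        (abs(observed - ideal) / sigma, label)
        for label, observed, ideal, sigma in restraints
    ]
    mean_z = sum(z for z, _ in z_values) / len(z_values)
    worst_z, worst_label = max(z_values, key=lambda item: item[0])
    return mean_z, worst_label, worst_z
```

The mean value would set the bar height in the geometry graph, while the worst term is the kind of information reported in the tooltip.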

17.1.3.3. Peptide ω analysis

This validation tool for peptide ω torsion angles produces a graph marking the deviation of the peptide ω angle from 180°. The deviation is assigned to the residue that contains the C and O atoms of the peptide link. Thus, peptide ω angles of 90° are very bad. Optionally, ω angles of 0° can be considered ideal (for the case of intentional cis-peptide bonds).
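The deviation plotted by this tool must respect the periodicity of the torsion angle, so that, for example, -175° counts as 5° away from 180°. A minimal Python sketch, with the function name and the `cis_is_ideal` flag chosen here for illustration:

```python
def omega_deviation(omega_deg, cis_is_ideal=False):
    """Return the absolute deviation (degrees) of omega from its ideal value.

    The ideal is 180 degrees for a trans peptide, or 0 degrees when an
    intentional cis-peptide bond is to be treated as ideal.
    """
    ideal = 0.0 if cis_is_ideal else 180.0
    difference = abs(omega_deg - ideal) % 360.0
    # Take the shorter way around the circle.
    return min(difference, 360.0 - difference)
```

With the trans ideal, the largest possible deviation is 180°, reached by a cis peptide at 0°.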

17.1.3.4. Rotamer analysis

The rotamer statistics are generated from the nearest conformation in the MolProbity rotamer probability distribution (Lovell et al., 2000). The height of the bar is inversely proportional to the rotamer probability.

17.1.3.5. Density-fit analysis

The bars in the density-fit graphs are inversely proportional to the average Z-weighted electron density at the atom centres and to the grid sampling of the map (i.e. maps with coarser grid sampling will have lower bars than a more finely gridded map, all other things being equal). Accounting for the grid sampling allows lower-resolution maps to have a density-fit graph without many or most residues being marked as worrisome owing to their atoms being in generally low levels of density.

17.1.4. Scripting

Many internal functions in Coot are accessible via an automatically generated (SWIG) interface to the scripting languages Python (http://www.python.org) and Guile (a Scheme interpreter; Kelsey et al., 1998; http://www.gnu.org/software/guile/guile.html). Some of Coot's graphics widgets are available for scripting via the same interface.

17.1.5. Discussion

Coot combines a range of modern methods in macromolecular model building and validation with a modern graphical user interface, following current design conventions to achieve ease of use, productivity, aesthetics and forgiveness. This is an ongoing process and although improvements can still be made, we believe that Coot provides a shallow learning curve combined with a high level of crystallographic awareness to aid the novice and non-specialist user, while providing a powerful and configurable tool set for the expert.

References

Blundell, T. L., Jhoti, H. & Abell, C. (2002). High-throughput crystallography for lead discovery in drug design. Nat. Rev. Drug Discov. 1, 45–54.
Cowtan, K. (2006). The Buccaneer software for automated model building. 1. Tracing protein chains. Acta Cryst. D62, 1002–1011.
Dauter, Z. (2006). Current state and prospects of macromolecular crystallography. Acta Cryst. D62, 1–11.
Davis, I. W., Leaver-Fay, A., Chen, V. B., Block, J. N., Kapral, G. J., Wang, X., Murray, L. W., Arendall, W. B., Snoeyink, J., Richardson, J. S. & Richardson, D. C. (2007). MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 35, W375–W383.
Emsley, P. & Cowtan, K. (2004). Coot: model-building tools for molecular graphics. Acta Cryst. D60, 2126–2132.
Esnouf, R. M. (1997). Polyalanine reconstruction from Cα positions using the program CALPHA can aid initial phasing of data by molecular replacement procedures. Acta Cryst. D53, 665–672.
Greer, J. (1974). Three-dimensional pattern recognition: An approach to automated interpretation of electron density maps of proteins. J. Mol. Biol. 82, 279–301.
Kelsey, R., Clinger, W. & Rees, J. (1998). Higher-Order Symb. Comput. 11, 7–105.
Kleywegt, G. J. & Jones, T. A. (1994). In From First Map to Final Model. Proceedings of the CCP4 Study Weekend, edited by S. Bailey, R. Hubbard & D. Waller, pp. 59–66. SERC Daresbury Laboratory.
Krissinel, E. & Henrick, K. (2004). Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Cryst. D60, 2256–2268.
Langer, G., Cohen, S. X., Lamzin, V. S. & Perrakis, A. (2008). Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nat. Protoc. 3, 1171–1179.
Lovell, S. C., Davis, I. W., Arendall, W. B. III, de Bakker, P. I. W., Word, J. M., Prisant, M. G., Richardson, J. S. & Richardson, D. C. (2003). Structure validation by Cα geometry: ϕ, ψ and Cβ deviation. Proteins, 50, 437–450.
Lovell, S. C., Word, J. M., Richardson, J. S. & Richardson, D. C. (2000). The penultimate rotamer library. Proteins, 40, 389–408.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Refinement of macromolecular structures by the maximum-likelihood method. Acta Cryst. D53, 240–255.
Oldfield, T. J. (2001). X-LIGAND: an application for the automated addition of flexible ligands into electron density. Acta Cryst. D57, 696–705.
Vagin, A. A., Steiner, R. A., Lebedev, A. A., Potterton, L., McNicholas, S., Long, F. & Murshudov, G. N. (2004). REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use. Acta Cryst. D60, 2184–2195.
Wang, J. W., Chen, J. R., Gu, Y. X., Zheng, C. D., Jiang, F., Fan, H. F., Terwilliger, T. C. & Hao, Q. (2004). SAD phasing by combination of direct methods with the SOLVE/RESOLVE procedure. Acta Cryst. D60, 1244–1253.
Williams, S. P., Kuyper, L. F. & Pearce, K. H. (2005). Recent applications of protein crystallography and structure-guided drug design. Curr. Opin. Chem. Biol. 9, 371–380.







































