International Tables for Crystallography, Volume F, Crystallography of biological macromolecules. Edited by E. Arnold, D. M. Himmel and M. G. Rossmann. © International Union of Crystallography 2012
International Tables for Crystallography (2012). Vol. F, ch. 18.6, pp. 512–519
https://doi.org/10.1107/97809553602060000860
Chapter 18.6. CNS, a program system for structure-determination and refinement
A. T. Brunger,^{a}^{*} P. D. Adams,^{b} W. L. DeLano,^{c} P. Gros,^{d} R. W. Grosse-Kunstleve,^{b} J.-S. Jiang,^{e} N. S. Pannu,^{f} R. J. Read,^{g} L. M. Rice^{h} and T. Simonson^{i}
^{a}Howard Hughes Medical Institute, and Departments of Molecular and Cellular Physiology, Neurology and Neurological Sciences, and Stanford Synchrotron Radiation Laboratory (SSRL), Stanford University, 1201 Welch Road, MSLS P210, Stanford, CA 94305, USA, ^{b}The Howard Hughes Medical Institute and Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06511, USA, ^{c}Graduate Group in Biophysics, Box 0448, University of California, San Francisco, CA 94143, USA, ^{d}Crystal and Structural Chemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands, ^{e}Protein Data Bank, Biology Department, Brookhaven National Laboratory, Upton, NY 11973–5000, USA, ^{f}Department of Mathematical Sciences, University of Alberta, Edmonton, Alberta, Canada T6G 2G1, ^{g}Department of Haematology, University of Cambridge, Wellcome Trust Centre for Molecular Mechanisms in Disease, CIMR, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 2XY, England, ^{h}Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06511, USA, and ^{i}Laboratoire de Biologie Structurale (CNRS), IGBMC, 1 rue Laurent Fries, 67404 Illkirch (CU de Strasbourg), France
The Crystallography & NMR System (CNS) is described.
We have developed a new and advanced software system, the Crystallography & NMR System (CNS), for crystallographic and NMR structure determination (Brünger et al., 1998). The goals of CNS are: (1) to create a flexible computational framework for exploration of new approaches to structure determination; (2) to provide tools for structure solution of difficult or large structures; (3) to develop models for analysing structural and dynamical properties of macromolecules; and (4) to integrate all sources of information into all stages of the structure-determination process.
To meet these goals, algorithms were moved from the source code into a symbolic structure-determination language which represents a new concept in computational crystallography. The high-level CNS computing language allows definition of symbolic target functions, data structures, procedures and modules. The CNS program acts as an interpreter for the high-level CNS language and includes hard-wired functions for efficient processing of computing-intensive tasks. Methods and algorithms are therefore more clearly defined and easier to adapt to new and challenging problems. The result is a multi-level system which provides maximum flexibility to the user (Fig. 18.6.1.1). The CNS language provides a common framework for nearly all computational procedures of structure determination. A comprehensive set of crystallographic procedures for phasing, density modification and refinement has been implemented in this language. Task-oriented input files written in the CNS language, which can also be accessed through an HTML graphical interface (Graham, 1995), are available to carry out these procedures.
One of the key features of the CNS language is symbolic data structure manipulation. For example, the statement in equation (18.6.2.1) evaluates a mathematical expression for all acentric indices h, in which `fp' is the `native' structure-factor array, `fph' is the derivative structure-factor array, `sph' is the corresponding experimental σ, v is the expectation value for the lack of closure (including lack of isomorphism and errors in the heavy-atom model), and `fh' is the calculated heavy-atom structure-factor array. This expression computes the coefficient of the phase probability distribution for single isomorphous replacement described by Hendrickson & Lattman (1970) and Blundell & Johnson (1976).
The expression in equation (18.6.2.1) is computed for the specified subset of reflections, `(acentric)', so only the selected (in this case all acentric) reflections are used. More sophisticated selections are possible, e.g. one that selects all reflections with Bragg spacing, d, greater than 3 Å for which both native (`fp') and derivative (`fph') amplitudes are greater than two times their corresponding σ values (`sh' and `sph', respectively). Extensive use of this structure-factor selection facility is made for cross-validating statistical properties, such as R values (Brünger, 1992), σ_{A} values (Kleywegt & Brünger, 1996; Read, 1997) and maximum-likelihood functions (Pannu & Read, 1996; Adams et al., 1997).
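The selection mechanism described above can be sketched outside CNS. The following Python fragment is only an illustration of the idea (a predicate that picks the subset of reflections an operation acts on); it is not CNS syntax, and the `Reflection` class, the `select` helper and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reflection:
    d: float        # Bragg spacing in angstroms
    fp: complex     # native structure factor
    sh: float       # sigma of the native amplitude
    fph: complex    # derivative structure factor
    sph: float      # sigma of the derivative amplitude
    centric: bool   # True for centric reflections


def acentric(r):
    """Predicate playing the role of the `(acentric)' selection."""
    return not r.centric


def strong_low_resolution(r):
    """d > 3 A and both amplitudes greater than twice their sigma."""
    return r.d > 3.0 and abs(r.fp) > 2.0 * r.sh and abs(r.fph) > 2.0 * r.sph


def select(reflections, predicate):
    """Return the reflections for which the predicate holds."""
    return [r for r in reflections if predicate(r)]


# Hypothetical data, for illustration only.
reflections = [
    Reflection(d=4.2, fp=30 + 10j, sh=2.0, fph=35 + 5j, sph=2.5, centric=False),
    Reflection(d=2.1, fp=50 + 0j, sh=2.0, fph=48 + 0j, sph=2.0, centric=True),
    Reflection(d=5.0, fp=3 + 1j, sh=2.0, fph=40 + 2j, sph=2.0, centric=False),
]

print(len(select(reflections, acentric)))
print(len(select(reflections, strong_low_resolution)))
```

Keeping the selection separate from the operation is what allows the same expression to be reused on a working set and on a cross-validation test set.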
Similar operations exist for electron-density maps. A truncation operation, for example, sets all map values less than 0.1 to 0. Atoms can be selected based on a number of atomic properties and descriptors; for example, a single statement can set the B factors of all polypeptide backbone atoms of residues 1 through 40 to 10 Å^{2}.
Operations exist between data structures, e.g. real- and reciprocal-space arrays, and atom properties. For example, Fourier transformations between real and reciprocal space can be accomplished by CNS commands that compute a map on a 1 Å grid by Fourier transformation of a reciprocal-space array for all acentric reflections.
Atoms can be associated with calculated structure factors. For example, a single statement can associate the reciprocal-space array `f_cal' with the atoms belonging to residues 1 through 50. These structure-factor associations are used in the symbolic target functions described below.
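As a rough illustration of the two operations just described, the Python sketch below truncates a list of map values and assigns a B factor to a selected set of atoms. It is not CNS syntax. The atom dictionaries, the helper names and the choice of N, CA, C and O as the backbone atom names are assumptions made for this example.

```python
# Assumed definition of "polypeptide backbone" for this sketch.
BACKBONE_NAMES = {"N", "CA", "C", "O"}


def truncate_map(values, threshold=0.1):
    """Set every map value below the threshold to 0."""
    return [0.0 if v < threshold else v for v in values]


def backbone_1_to_40(atom):
    """Select backbone atoms of residues 1 through 40."""
    return atom["name"] in BACKBONE_NAMES and 1 <= atom["resid"] <= 40


def set_b_factor(atoms, selection, b_value):
    """Assign b_value to all selected atoms; return how many were changed."""
    changed = 0
    for atom in atoms:
        if selection(atom):
            atom["b"] = b_value
            changed += 1
    return changed


# Hypothetical map values and atoms, for illustration only.
density = truncate_map([-0.3, 0.05, 0.1, 0.7])

atoms = [
    {"name": "CA", "resid": 12, "b": 25.0},
    {"name": "CB", "resid": 12, "b": 30.0},
    {"name": "N", "resid": 41, "b": 22.0},
]
n_changed = set_b_factor(atoms, backbone_1_to_40, 10.0)
print(density, n_changed)
```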
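To make the reciprocal-to-real-space step concrete, here is a minimal one-dimensional Fourier synthesis in Python. It is a toy sketch under simplifying assumptions (one dimension, direct summation rather than a fast Fourier transform, no reflection selection), and the function name and inputs are hypothetical rather than taken from CNS.

```python
import cmath
import math


def fourier_synthesis_1d(structure_factors, cell_length, grid_spacing=1.0):
    """Density on a regular grid from structure factors F(h).

    rho(x) = (1 / L) * sum_h F(h) * exp(-2*pi*i*h*x), with x fractional.
    structure_factors maps the integer index h to a complex F(h); Friedel
    mates must be supplied explicitly for the density to be real.
    """
    n_points = round(cell_length / grid_spacing)
    density = []
    for j in range(n_points):
        x = j / n_points  # fractional coordinate of grid point j
        total = 0j
        for h, f in structure_factors.items():
            total += f * cmath.exp(-2j * math.pi * h * x)
        density.append(total.real / cell_length)
    return density


# Hypothetical 10 A cell sampled on a 1 A grid.
rho = fourier_synthesis_1d({0: 20 + 0j, 1: 5 + 0j, -1: 5 + 0j}, cell_length=10.0)
print(len(rho), round(rho[0], 6), round(rho[5], 6))
```

With these inputs the density is 2 + cos(2πx), so the grid values range between 1 and 3.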
There are no predefined reciprocal- or real-space arrays in CNS. Dynamic memory allocation allows one to carry out operations on arbitrarily large data sets with many individual entries (e.g. derivative diffraction data) without the need to recompile the source code. The various reciprocal-space structure-factor arrays must therefore be declared and their type specified prior to invoking them. For example, a reciprocal-space array with real values, such as observed amplitudes, is declared with a real type. Reciprocal-space arrays can also be grouped. For example, Hendrickson & Lattman (1970) coefficients are represented as a group of four reciprocal-space structure-factor arrays, whose individual arrays are named `pa', `pb', `pc' and `pd'. The group statement indicates to CNS that the specified arrays need to be transformed together when reflection indices are changed, e.g. during expansion of the diffraction data to space group P1.
The CNS language supports two types of data elements which may be used to store and retrieve information. Symbols are typed variables, such as numbers, character strings of restricted length and logicals. Parameters are untyped data elements of arbitrary length that may contain collections of CNS commands, numbers, strings or symbols.
Symbols are denoted by a dollar sign ($), and parameters by an ampersand (&). Symbols and parameters may contain a single data element, or they may be a compound data structure of arbitrary complexity. The hierarchy of these data structures is denoted using a period (.). Figs. 18.6.3.1(a) and (b) demonstrate how crystal-lattice information can be stored in compound symbols and parameters, respectively. The information stored in symbols or parameters can be retrieved by simply referring to them within a CNS command: the symbol or parameter name is substituted by its content. Symbol substitution of portions of the compound names (e.g. `&crystal_lattice.unit_cell.$para') allows one to carry out conditional and iterative operations on data structures, such as matrix multiplication.
The CNS language contains a number of statistical operations, such as bin-wise averages and summations. The resolution bins are defined by a central facility in CNS.
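The combination of dotted compound names and symbol substitution can be mimicked with nested dictionaries. The Python sketch below is only an analogy to the mechanism described above: the `resolve` and `substitute` helpers are hypothetical, and the unit-cell values are invented for the example.

```python
def substitute(name, symbols):
    """Replace each `$symbol' component of a dotted name by its value."""
    parts = []
    for part in name.split("."):
        if part.startswith("$"):
            part = str(symbols[part[1:]])
        parts.append(part)
    return ".".join(parts)


def resolve(store, dotted_name):
    """Follow a dotted name through nested dictionaries."""
    node = store
    for key in dotted_name.split("."):
        node = node[key]
    return node


# Hypothetical compound parameter holding crystal-lattice information.
parameters = {
    "crystal_lattice": {
        "space_group": "P2(1)2(1)2(1)",
        "unit_cell": {"a": 61.76, "b": 40.73, "c": 26.74,
                      "alpha": 90.0, "beta": 90.0, "gamma": 90.0},
    }
}

# Iterate over cell edges by substituting the symbol `para' into the name.
edges = []
for para in ("a", "b", "c"):
    full_name = substitute("crystal_lattice.unit_cell.$para", {"para": para})
    edges.append(resolve(parameters, full_name))
print(edges)
```

Referring to the base name (`crystal_lattice.unit_cell`) returns the whole sub-structure, which is what makes it convenient to hand a complete set of lattice parameters to a module in one argument.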
Fig. 18.6.4.1 shows how σ_{A}, σ_{Δ} and D (Read, 1986, 1990) are computed from the observed structure factors (`fobs') and the calculated model structure factors (`fcalc') using the CNS statistical operations. The first five operations are performed for the reflections in the test set, while the last three operations expand the results to all reflections. The `norm' function computes normalized structure-factor amplitudes for the specified arguments. The `sigacv' function evaluates σ_{A} from the normalized structure factors. The `save' function computes a statistical average in which w is 1 and 2 for centric and acentric reflections, respectively, and ε is the statistical weight. The averages are computed bin-wise, and the result for a particular bin is stored in all selected reflections belonging to the bin.
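The bin-wise behaviour described above (compute one average per resolution bin, then store it in every reflection of that bin) can be sketched as follows. This Python fragment is a simplified illustration, not the `save' function itself: it assumes the average is the w-weighted mean with the statistical weight ε taken as 1, and it takes precomputed bin indices as input.

```python
def binwise_average(values, bins, centric):
    """Weighted mean per bin, broadcast back to every reflection.

    Assumption of this sketch: weight w = 1 for centric and w = 2 for
    acentric reflections, with the statistical weight epsilon set to 1.
    """
    weighted_sum = {}
    weight_sum = {}
    for value, b, is_centric in zip(values, bins, centric):
        w = 1.0 if is_centric else 2.0
        weighted_sum[b] = weighted_sum.get(b, 0.0) + w * value
        weight_sum[b] = weight_sum.get(b, 0.0) + w
    # Every reflection receives the average of the bin it belongs to.
    return [weighted_sum[b] / weight_sum[b] for b in bins]


# Hypothetical per-reflection values in two resolution bins.
averages = binwise_average(
    values=[1.0, 3.0, 10.0, 20.0],
    bins=[0, 0, 1, 1],
    centric=[True, False, False, False],
)
print(averages)
```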
One of the key innovative features of CNS is the ability to symbolically define target functions and their first derivatives for crystallographic searches and refinement. This makes it convenient to implement new crystallographic methodologies as they are being developed.
The power of symbolic target functions is illustrated by two examples. In the first example, a target function is defined for simultaneous heavy-atom parameter refinement of three derivatives. The sites for each of the three derivatives can be disjoint or identical, depending on the particular situation. For simplicity, the Blow & Crick (1959) approach is used, although maximum-likelihood targets are also possible (see below). The heavy-atom sites are refined against a lack-of-closure target that combines, for each of the three derivatives, the complex structure factors corresponding to its set of heavy-atom sites, the structure factors of the native crystal, the structure-factor amplitudes of the derivative and the variance of its lack-of-closure expression. The corresponding target expression and its first derivatives with respect to the calculated structure factors are shown in Fig. 18.6.5.1(a). The derivatives of the target function with respect to each of the three associated structure-factor arrays are specified with the `dtarget' expressions. The `tselection' statement specifies the selected subset of reflections to be used in the target function (e.g. excluding outliers), and the `cvselection' statement specifies a subset of reflections to be used for cross-validation (Brünger, 1992) (i.e. the subset is not used during refinement but only as a monitor for the progress of refinement).
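A lack-of-closure target of this kind, together with its first derivative, can be written down in a few lines. The Python sketch below assumes the usual Blow & Crick form, in which each derivative j contributes (|F_hj + F_p| − F_phj)^{2} / (2 v_j) per reflection, and it expresses the derivative with respect to the complex heavy-atom structure factor as ∂T/∂Re + i ∂T/∂Im. Both the functional form and the derivative convention are assumptions of this example, and the function names are hypothetical.

```python
def lack_of_closure(fp, fh, fph, v):
    """Target and gradient for one reflection of one derivative.

    fp:  complex native structure factor
    fh:  complex calculated heavy-atom structure factor
    fph: observed derivative amplitude
    v:   variance of the lack-of-closure expression
    """
    f = fp + fh
    amplitude = abs(f)
    delta = amplitude - fph
    target = delta * delta / (2.0 * v)
    if amplitude == 0.0:
        return target, 0j  # gradient direction undefined at the origin
    # d|f|/d(Re fh) + i d|f|/d(Im fh) equals f / |f|
    return target, (delta / v) * (f / amplitude)


def three_derivative_target(native, derivatives):
    """Sum the target over reflections and derivatives.

    derivatives is a list of (fh_list, fph_list, v) tuples, one per
    derivative; one gradient list is returned per derivative.
    """
    total = 0.0
    gradients = []
    for fh_list, fph_list, v in derivatives:
        gradient = []
        for fp, fh, fph in zip(native, fh_list, fph_list):
            t, g = lack_of_closure(fp, fh, fph, v)
            total += t
            gradient.append(g)
        gradients.append(gradient)
    return total, gradients


# Hypothetical single-reflection example.
value, gradient = lack_of_closure(10 + 0j, 3 + 4j, 12.0, 4.0)
print(value, gradient)
```

Returning one gradient array per derivative mirrors the idea of one `dtarget' expression per associated structure-factor array.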
The second example is the refinement of a perfectly twinned crystal with overlapping reflections from two independent crystal lattices. Refinement of the model is carried out against a residual whose symbolic definition is shown in Fig. 18.6.5.1(b). The twinning operation itself is imposed as a relationship between the two sets of selected atoms (not shown). This example assumes that the two calculated structure-factor arrays (`fcalc1' and `fcalc2') that correspond to the two lattices have been appropriately scaled with respect to the observed structure factors, and the twinning fractions have been incorporated into the scale factors. However, a more sophisticated target function could be defined which incorporates scaling.
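One plausible form of such a twinning residual is sketched below in Python. It assumes that the intensities of the two overlapping lattices add, so that the model amplitude is the square root of |fcalc1|^{2} + |fcalc2|^{2}, and that the residual is the sum of squared amplitude differences. This form is an assumption of the example rather than a transcription of the figure, and the arrays are taken to be already scaled as the text requires.

```python
import math


def twin_residual(fobs, fcalc1, fcalc2):
    """Sum over reflections of (|Fobs| - sqrt(|Fc1|^2 + |Fc2|^2))^2.

    fobs:   observed amplitudes
    fcalc1: complex calculated structure factors for the first lattice
    fcalc2: complex calculated structure factors for the second lattice
    """
    residual = 0.0
    for fo, f1, f2 in zip(fobs, fcalc1, fcalc2):
        model_amplitude = math.sqrt(abs(f1) ** 2 + abs(f2) ** 2)
        residual += (fo - model_amplitude) ** 2
    return residual


# Hypothetical two-reflection example.
print(twin_residual([5.0, 6.0], [3 + 0j, 3 + 0j], [0 + 4j, 0 + 4j]))
```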
A major advantage of the symbolic definition of the target function and its derivatives is that any arbitrary function of structure-factor arrays can be used. This means that the scope of possible targets is not limited to least-squares targets. Symbolic definition of numerical integration over unknown variables (such as phase angles) is also possible. Thus, even complicated maximum-likelihood target functions (Bricogne, 1984; Otwinowski, 1991; Pannu & Read, 1996; Pannu et al., 1998) can be defined using the CNS language. This is particularly valuable at the prototype stage. For greater efficiency, the standard maximum-likelihood targets are provided through CNS source code which can be accessed as functions in the CNS language. For example, the maximum-likelihood target function MLF (Pannu & Read, 1996) and its derivative with respect to the calculated structure factors are defined through `mlf( )' and `dmlf( )', which refer to internal maximum-likelihood functions. In these definitions, `fobs' and `sigma' are the observed structure-factor amplitudes and corresponding σ values, `fcalc' is the (complex) calculated structure-factor array, `fbulk' is the structure-factor array for a bulk solvent model, and `d' and its companion array hold the cross-validated D and σ_{Δ} functions (Read, 1990; Kleywegt & Brünger, 1996; Read, 1997), which are precomputed prior to invoking the MLF target function using the test set of reflections. The availability of internal Fortran subroutines for the most computing-intensive target functions and the symbolic definitions involving structure-factor arrays allow for maximal flexibility and efficiency. Other examples of available maximum-likelihood target functions include MLI (intensity-based maximum-likelihood refinement), MLHL [crystallographic model refinement with prior phase information (Pannu et al., 1998)], and maximum-likelihood heavy-atom parameter refinement for multiple isomorphous replacement (Otwinowski, 1991) and MAD phasing (Hendrickson, 1991; Burling et al., 1996).
Work is in progress to define target functions that include correlations between different heavy-atom derivatives (Read, 1994).
Modules exist as separate files and contain collections of CNS commands related to a particular task. In contrast, procedures can be defined and invoked from within any file. Modules and procedures share a similar parameter-passing mechanism for both input and output. Modules and procedures make it possible to write programs in the CNS language in a manner similar to that of a computing language, such as Fortran or C. CNS modules and procedures have defined sets of input (and output) parameters that are passed into them (or returned) when they are invoked. This enables long collections of CNS language statements to be broken down into modules for greater clarity of the underlying algorithm.
Parameters passed into a module or procedure inherit the scope of the calling task file or module, and thus they exhibit a behaviour analogous to most computing languages. Symbols defined within a module or procedure are purely local variables.
As an example, the unit-cell parameters defined above (Fig. 18.6.3.1b) can be passed into a module named `compute_unit_cell_volume' (Fig. 18.6.6.1), which computes the volume of the unit cell from the crystal lattice parameters using well established formulae (Stout & Jensen, 1989). The parameter `volume' is equated to a symbol upon invocation in order to return the result (the unit-cell volume) from this module. Note that the use of compound parameters to define the crystal lattice parameters (Fig. 18.6.3.1b) provides a convenient way to pass all required information into the module by referring to the base name of the compound parameter (`&crystal_lattice.unit_cell') instead of having to specify each individual data element.
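The calculation performed by such a module can be illustrated with the standard triclinic volume formula, V = abc(1 − cos^{2}α − cos^{2}β − cos^{2}γ + 2 cos α cos β cos γ)^{1/2}. The Python function below is an analogue written for this illustration, not the CNS module; it borrows the module's name and accepts the whole unit-cell structure as one argument, in the spirit of passing the base name of a compound parameter. The dictionary layout is an assumption.

```python
import math


def compute_unit_cell_volume(unit_cell):
    """Volume of a unit cell from a, b, c (angstroms) and angles (degrees)."""
    a, b, c = unit_cell["a"], unit_cell["b"], unit_cell["c"]
    cos_alpha = math.cos(math.radians(unit_cell["alpha"]))
    cos_beta = math.cos(math.radians(unit_cell["beta"]))
    cos_gamma = math.cos(math.radians(unit_cell["gamma"]))
    factor = (1.0 - cos_alpha ** 2 - cos_beta ** 2 - cos_gamma ** 2
              + 2.0 * cos_alpha * cos_beta * cos_gamma)
    return a * b * c * math.sqrt(factor)


# Hypothetical cells: one cubic, one hexagonal.
cubic = {"a": 10.0, "b": 10.0, "c": 10.0,
         "alpha": 90.0, "beta": 90.0, "gamma": 90.0}
hexagonal = {"a": 10.0, "b": 10.0, "c": 20.0,
             "alpha": 90.0, "beta": 90.0, "gamma": 120.0}
print(compute_unit_cell_volume(cubic), compute_unit_cell_volume(hexagonal))
```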
Fig. 18.6.6.2(a) shows another example of a CNS module: this module computes phase probability distributions using the Hendrickson & Lattman formalism (Hendrickson & Lattman, 1970; Hendrickson, 1979; Blundell & Johnson, 1976). An example for invoking the module is shown in Fig. 18.6.6.2(b). This module could be called from task files that need access to isomorphous phase probability distributions. It would be straightforward to change the module in order to compute different expressions for the phase probability distributions.
A large number of additional modules are available for crystallographic phasing and refinement. CNS library modules include space-group information, Gaussian atomic form factors, anomalous-scattering components, and molecular parameter and topology databases.
Task files consist of CNS language statements and module invocations. The CNS language permits the design and execution of nearly any numerical task in X-ray crystallographic structure determination using a minimal set of `hard-wired' functions and routines. A list of the currently available crystallographic procedures and features is shown in Fig. 18.6.7.1.
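In the Hendrickson & Lattman representation, the phase probability for a reflection is proportional to exp(A cos φ + B sin φ + C cos 2φ + D sin 2φ). The Python sketch below evaluates that distribution on a phase grid from four given coefficients and derives the centroid phase and figure of merit. It does not reproduce how the module obtains the coefficients from the diffraction data, and the function names and grid size are choices made for this example.

```python
import cmath
import math


def hl_distribution(a, b, c, d, n_points=360):
    """Normalized phase probabilities on an evenly spaced phase grid."""
    phases = [2.0 * math.pi * k / n_points for k in range(n_points)]
    exponents = [a * math.cos(p) + b * math.sin(p)
                 + c * math.cos(2.0 * p) + d * math.sin(2.0 * p)
                 for p in phases]
    shift = max(exponents)  # subtract the maximum to avoid overflow
    weights = [math.exp(e - shift) for e in exponents]
    norm = sum(weights)
    return phases, [w / norm for w in weights]


def centroid(phases, probabilities):
    """Figure of merit and centroid phase (radians) of a distribution."""
    m = sum(p * cmath.exp(1j * phi) for phi, p in zip(phases, probabilities))
    return abs(m), cmath.phase(m)


# Hypothetical coefficients: a sharp unimodal distribution centred at 0.
phases, probabilities = hl_distribution(50.0, 0.0, 0.0, 0.0)
figure_of_merit, best_phase = centroid(phases, probabilities)
print(figure_of_merit, best_phase)
```

Because the coefficients of independent phase sources add, combining two sources amounts to summing their A, B, C and D values before calling `hl_distribution`.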
Each task file is divided into two main sections: the initial parameter definition and the main body of the task file. The definition section contains definitions of all CNS parameters that are used in the main body of the task file. Modification of the main body of the file is not required, but may be done by experienced users in order to experiment with new algorithms. The definition section also contains the directives that specify particular HTML features, e.g. text comments, user-modifiable fields and choice boxes, each of which is flagged by its own marker. Fig. 18.6.7.2 shows a portion of the `define' section of a typical CNS refinement task file.
The task files produce a number of output files (e.g. coordinate, reflection, graphing and analysis files). Comprehensive information about input parameters and results of the task is provided in these output files. In this way, the majority of the information required to reproduce the structure determination is kept with the results. Analysis data are often given in simple columns and rows of numbers. These data files can be used for graphing, for example, by using commonly available spreadsheet programs. An HTML graphical output feature for CNS which makes use of these analysis files is planned. In addition, list files are often produced that contain a synopsis of the calculation.
The HTML graphical interface uses HTML to create a high-level menu-driven environment for CNS (Fig. 18.6.8.1). Compact and relatively simple Common Gateway Interface (CGI) conversion scripts are available that transform a task file into a form page and the edited form page back into a task file (Fig. 18.6.8.2). These conversion scripts are written in Perl.

Figure 18.6.8.1. Example of a CNS HTML form page. This particular example corresponds to the task file in Fig. 18.6.7.2.

Figure 18.6.8.2. Use of the CNS HTML form page interface, emphasizing the correspondence between input fields in the form page and parameters in the task file.
A comprehensive collection of task files is available for crystallographic phasing and refinement (Fig. 18.6.7.1). New task files can be created or existing ones modified in order to address problems that are not currently met by the distributed collection of task files. The HTML graphical interface thus provides a common interface for distributed and `personal' CNS task files (Fig. 18.6.8.2).
CNS has a comprehensive task file for simulated-annealing refinement of crystal structures using Cartesian (Brünger et al., 1987; Brünger, 1988) or torsion-angle molecular dynamics (Rice & Brünger, 1994). This task file automatically computes cross-validated σ_{A} estimates, determines the weighting scheme between the X-ray refinement target function and the geometric energy function (Brünger et al., 1989), refines a flat bulk solvent model (Jiang & Brünger, 1994) and an overall anisotropic B value for the model by least-squares minimization, and subsequently refines the atomic positions by simulated annealing. Options are available for specification of alternate conformations, multiple conformers (Burling & Brünger, 1994), noncrystallographic symmetry constraints and restraints (Weis et al., 1990), and `flat' solvent models (Jiang & Brünger, 1994). Available target functions include the maximum-likelihood functions MLF, MLI and MLHL (Pannu & Read, 1996; Adams et al., 1997; Pannu et al., 1998). The user can choose between slow cooling (Brünger et al., 1990) and constant-temperature simulated annealing, and the respective rate of cooling and length of the annealing scheme. For a review of simulated annealing in X-ray crystallography, see Brünger et al. (1997).
During simulated-annealing refinement, the model can be significantly improved. Therefore, it becomes important to recalculate the cross-validated error estimates (Kleywegt & Brünger, 1996; Read, 1997) and the weight between the X-ray diffraction target function and the geometric energy function in the course of the refinement (Adams et al., 1997). This is important for the maximum-likelihood target functions that depend on the cross-validated error estimates. In the simulated-annealing task file, the recalculation of σ_{A} values and subsequently the weight for the crystallographic energy term are carried out after initial energy minimization, and also after molecular-dynamics simulated annealing.
CNS is a general system for structure determination by X-ray crystallography and solution NMR. It covers the whole spectrum of methods used to solve X-ray or solution NMR structures. The multi-layer architecture allows use of the system with different levels of expertise. The HTML interface allows the novice to perform standard tasks. The interface provides a convenient means of editing complicated task files, even for the expert (Fig. 18.6.8.2). This graphical interface makes it less likely that an important parameter will be overlooked when editing the file. In addition, the graphical interface can be used with any task file, not just the standard distributed ones. HTML-based documentation and graphical output are planned in the future.
Most operations within a crystallographic algorithm are defined through modules and task files. This allows for the development of new algorithms and for existing algorithms to be precisely defined and easily modified without the need for source-code modifications.
The hierarchical structure of CNS allows extensive testing at each level. For example, once the source code and CNS basic commands have been tested, testing of the modules and task files is performed. A test suite consisting of more than a hundred test cases is frequently evaluated during CNS development in order to detect and correct programming errors. Furthermore, this suite is run on several hardware platforms in order to detect any machine-specific errors. This testing scheme makes CNS highly reliable.
Algorithms can be readily understood by inspecting the modules or task files. This self-documenting feature of the modules provides a powerful teaching tool. Users can easily interpret an algorithm and compare it with published methods in the literature. To our knowledge, CNS is the only system that enables one to define symbolically any target function for a broad range of applications, from heavy-atom phasing or molecular-replacement searches to atomic-resolution refinement.
Acknowledgements
Support by the Howard Hughes Medical Institute and the National Science Foundation to ATB (DBI-9514819 and ASC 93–181159), the Natural Sciences and Engineering Research Council of Canada to NSP, the Howard Hughes Medical Institute and the Medical Research Council of Canada to RJR (MT-11000), the Netherlands Foundation for Chemical Research (SON–NWO) to PG and the Howard Hughes Medical Institute to LMR is gratefully acknowledged.
References
Adams, P. D., Pannu, N. S., Read, R. J. & Brünger, A. T. (1997). Cross-validated maximum likelihood enhances crystallographic simulated annealing refinement. Proc. Natl Acad. Sci. USA, 94, 5018–5023.
Blow, D. M. & Crick, F. H. C. (1959). The treatment of errors in the isomorphous replacement method. Acta Cryst. 12, 794–802.
Blundell, T. L. & Johnson, L. N. (1976). Protein Crystallography, pp. 375–377. London: Academic Press.
Bricogne, G. (1984). Maximum entropy and the foundations of direct methods. Acta Cryst. A40, 410–445.
Brünger, A. T. (1988). Crystallographic refinement by simulated annealing: application to a 2.8 Å resolution structure of aspartate aminotransferase. J. Mol. Biol. 203, 803–816.
Brünger, A. T. (1992). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature (London), 355, 472–475.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Crystallography & NMR System (CNS): a new software suite for macromolecular structure determination. Acta Cryst. D54, 905–921.
Brünger, A. T., Adams, P. D. & Rice, L. M. (1997). New applications of simulated annealing in X-ray crystallography and solution NMR. Structure, 5, 325–336.
Brünger, A. T., Karplus, M. & Petsko, G. A. (1989). Crystallographic refinement by simulated annealing: application to crambin. Acta Cryst. A45, 50–61.
Brünger, A. T., Krukowski, A. & Erickson, J. W. (1990). Slow-cooling protocols for crystallographic refinement by simulated annealing. Acta Cryst. A46, 585–593.
Brünger, A. T., Kuriyan, J. & Karplus, M. (1987). Crystallographic R factor refinement by molecular dynamics. Science, 235, 458–460.
Burling, F. T. & Brünger, A. T. (1994). Thermal motion and conformational disorder in protein crystal structures: comparison of multi-conformer and time-averaging models. Isr. J. Chem. 34, 165–175.
Burling, F. T., Weis, W. I., Flaherty, K. M. & Brünger, A. T. (1996). Direct observation of protein solvation and discrete disorder with experimental crystallographic phases. Science, 271, 72–77.
Graham, I. S. (1995). The HTML Sourcebook. John Wiley and Sons.
Hendrickson, W. A. (1979). Phase information from anomalous-scattering measurements. Acta Cryst. A35, 245–247.
Hendrickson, W. A. (1991). Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science, 254, 51–58.
Hendrickson, W. A. & Lattman, E. E. (1970). Representation of phase probability distributions for simplified combination of independent phase information. Acta Cryst. B26, 136–143.
Jiang, J.-S. & Brünger, A. T. (1994). Protein hydration observed by X-ray diffraction: solvation properties of penicillopepsin and neuraminidase crystal structures. J. Mol. Biol. 243, 100–115.
Kleywegt, G. J. & Brünger, A. T. (1996). Checking your imagination: applications of the free R value. Structure, 4, 897–904.
Otwinowski, Z. (1991). In Proceedings of the CCP4 Study Weekend. Isomorphous Replacement and Anomalous Scattering, edited by W. Wolf, P. R. Evans & A. G. W. Leslie, pp. 80–86. Warrington: Daresbury Laboratory.
Pannu, N. S., Murshudov, G. N., Dodson, E. J. & Read, R. J. (1998). Incorporation of prior phase information strengthens maximum-likelihood structure refinement. Acta Cryst. D54, 1285–1294.
Pannu, N. S. & Read, R. J. (1996). Improved structure refinement through maximum likelihood. Acta Cryst. A52, 659–668.
Read, R. J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Cryst. A42, 140–149.
Read, R. J. (1990). Structure-factor probabilities for related structures. Acta Cryst. A46, 900–912.
Read, R. J. (1994). Maximum likelihood refinement of heavy atoms. Lecture notes for a workshop on isomorphous replacement methods in macromolecular crystallography. American Crystallographic Association Annual Meeting, 1994, Atlanta, GA, USA.
Read, R. J. (1997). Model phases: probabilities and bias. Methods Enzymol. 277, 110–128.
Rice, L. M. & Brünger, A. T. (1994). Torsion angle dynamics: reduced variable conformational sampling enhances crystallographic structure refinement. Proteins Struct. Funct. Genet. 19, 277–290.
Stout, G. H. & Jensen, L. H. (1989). X-ray Structure Determination, p. 33. New York: Wiley Interscience.
Weis, W. I., Brünger, A. T., Skehel, J. J. & Wiley, D. C. (1990). Refinement of the influenza virus haemagglutinin by simulated annealing. J. Mol. Biol. 212, 737–761.