International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by E. Arnold, D. M. Himmel and M. G. Rossmann

International Tables for Crystallography (2012). Vol. F, ch. 24.2, pp. 833–837

Chapter 24.2. The Nucleic Acid Database

B. Schneider(a), J. de la Cruz(a), S. Dutta(a), Z. Feng(a), L. Chen(a), J. Westbrook(a), H. Yang(a), J. Young(a), C. Zardecki(a) and H. M. Berman(a)*

(a) The Nucleic Acid Database, Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA
Correspondence e-mail:

The Nucleic Acid Database (NDB) was one of the earliest relational databases for use in structural biology. Today, it provides valuable resources for researchers and students. Users can search the archive using a variety of constraints, and generate quick and detailed reports. Structures can be explored through an online atlas organized by experimental and structure type. The NDB has developed and hosts standards and software tools, and offers special programs for exploring RNA structure.

24.2.1. Introduction


The Nucleic Acid Database (NDB) (Berman et al., 1992) was established in 1991 as a resource for specialists in the field of nucleic acid structure. Over the years, the NDB has developed generalized software for processing, archiving, querying and distributing structural data for nucleic acid-containing structures. The core of the NDB has been its relational database of nucleic acid-containing crystal structures; it also incorporates structures derived by nuclear magnetic resonance (NMR) spectroscopy. Recognizing the importance of a standard data representation in building a database, the NDB became an active participant in the mmCIF project (Fitzgerald et al., 2005a,b) and served as the test-bed for this format. On a foundation of well curated data, the NDB built a searchable relational database of primary and derived data with rich query and reporting capabilities. This database was unique in allowing researchers to perform comparative analyses of nucleic acid-containing structures according to the many attributes it stores.

In 1992, the NDB assumed responsibility for processing all nucleic acid crystal structures deposited in the Protein Data Bank (PDB) (Berman et al., 2003; Bernstein et al., 1977); it was the direct deposit site for nucleic acid structures from 1996 to 1998. To meet data-processing requirements, the NDB created the first validation software package for nucleic acids (Westbrook et al., 2003), which is still used today. The NDB continues to provide a high level of information about nucleic acids and serves as a speciality database for its community of researchers. When the NDB began, the world of nucleic acid structures consisted of DNA and RNA oligonucleotides, a few protein–DNA complexes and some tRNA structures. Annotation of structural features was performed manually, and structures were easily classified into a few known molecular architectures by visual inspection. In the last several years, however, a whole new universe of nucleic acid structures has emerged: there are many ribozyme structures, and many different types of protein–nucleic acid complexes are represented. The addition of ribosomal structures, which first appeared in 2000, has increased the number of nucleotide residues in the NDB several fold (Moore, 2001); see the figure below.


Figure. Growth of the NDB. The number of structures released per year in the NDB is indicated in blue (scale on the left) and the number of nucleotides in these structures in yellow (scale on the right).

24.2.2. Data processing and validation


The NDB created a robust data-processing system to produce high-quality data that are readily loaded into a database. The full capability of this system was demonstrated by the successful processing of ribosomal subunits, which are very large and complex structures.

Early on, the NDB adopted the Macromolecular Crystallographic Information File (mmCIF; Bourne et al., 1997; Fitzgerald et al., 2005a,b; Westbrook et al., 2005) as its data standard. This format has several advantages from the point of view of building a database: (1) the definitions for the data items are based on a comprehensive dictionary of crystallographic terminology and molecular structure description; (2) it is self-defining; and (3) the syntax contains explicit rules that further define the characteristics of the data items, particularly the relationships between data items (Westbrook & Bourne, 2000). The last feature is important because it allows for rigorous checking of the data.
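
To make the category/item organization of mmCIF concrete, the following is a minimal Python sketch of a reader for a simplified subset of the format: single-line `_category.item value` pairs and `loop_` tables. It is an illustration only, not the NDB's software. The sample values are invented, and real mmCIF files also use multi-line values and other syntax that this sketch does not handle.

```python
import shlex

# Invented sample data; the item names follow the mmCIF "_category.item" pattern.
SAMPLE = """\
data_EXAMPLE
_cell.length_a 24.87
_cell.length_b 40.39
loop_
_entity_poly.entity_id
_entity_poly.type
_entity_poly.pdbx_seq_one_letter_code
1 polydeoxyribonucleotide CGCGAATTCGCG
2 'polypeptide(L)' MKT
"""


def parse_simple_mmcif(text):
    """Parse a simplified mmCIF subset into {category: [row dictionaries]}."""
    categories = {}
    loop_items = None  # None outside a loop; list of (category, item) inside one
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("data_"):
            continue
        if line == "loop_":
            loop_items = []
            continue
        if line.startswith("_"):
            tokens = shlex.split(line)  # honours single-quoted values
            category, _, item = tokens[0][1:].partition(".")
            if len(tokens) == 1 and loop_items is not None:
                loop_items.append((category, item))  # loop header line
            else:
                loop_items = None  # a key-value pair ends any open loop
                categories.setdefault(category, [{}])[0][item] = tokens[1]
            continue
        # Otherwise the line is one data row of the current loop.
        values = shlex.split(line)
        category = loop_items[0][0]
        row = {item: value for (_, item), value in zip(loop_items, values)}
        categories.setdefault(category, []).append(row)
    return categories


if __name__ == "__main__":
    for name, rows in parse_simple_mmcif(SAMPLE).items():
        print(name, rows)
```

Because every value is addressed by its category and item name, each category maps naturally onto one table of a relational database.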

The tools first developed by the NDB project are used by the Research Collaboratory for Structural Bioinformatics PDB (Berman et al., 2000) and the Protein Data Bank Japan (both members of the Worldwide PDB) for processing both proteins and nucleic acids. The validation tool NUCheck verifies valence geometry, torsion angles, intermolecular contacts, and the chiral centres of the sugars and phosphates for nucleic acids (Westbrook et al., 2003). The dictionaries used for checking the structures were developed by the NDB Project from analyses of high-resolution, small-molecule structures (Clowney et al., 1996; Gelbin et al., 1996) from the Cambridge Structural Database (CSD; Allen et al., 1979). The torsion-angle ranges for double-helical DNA forms were derived from an analysis of well resolved nucleic acid structures (Schneider et al., 1997). One important outgrowth of these validation projects was the creation of the force constants and restraints that are now in common use for crystallographic refinement of nucleic acid structures (Parkinson et al., 1996). In addition to geometry checks, the molecular model is checked against the experimental data using SFCheck (Vaguine et al., 1999).
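
As an illustration of the kind of quantity such geometry checks examine, the sketch below computes a torsion angle from four atomic positions using the common atan2 formulation. It is not NUCheck's implementation, which the chapter does not describe; the function name and test coordinates are assumptions made for this example.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_deg(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates: a cis (eclipsed) arrangement gives 0 degrees.
    print(torsion_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                      (0.0, 0.0, 1.0), (1.0, 0.0, 1.0)))
```

A validation step would then compare each computed angle with the expected range for the relevant conformational class.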

Once primary annotation and validation are complete, nucleic acid-specific structural and functional annotations are added. These vary from a broad characterization of nucleic acid conformation as double helical A, B or Z type, to a statement of the presence of a simple structural feature such as a bulge or a helical loop. Proteins from protein/nucleic acid complexes are annotated for their function in the complexes as structural proteins (including, e.g., ribosomal and histone proteins), regulatory proteins (including, e.g., different types of transcription and translation co-factors) or enzymes (polymerases, topoisomerases, endonucleases etc.).

24.2.3. The database

Information content of the NDB


Structures available in the NDB include RNA and DNA oligonucleotides with two or more bases either alone or complexed with ligands, peptide nucleic acids, natural nucleic acids such as tRNA, and protein–nucleic acid complexes. The archive stores both primary and derived information about the structures (see the table below). The primary data include the crystallographic coordinate data, structure factors, NMR constraints, and information about the experiments used to determine the structures, such as crystallization information and data collection and refinement statistics. Derived structural information, such as valence geometry, torsion angles and intermolecular contacts, is calculated and stored in the database. Structural and functional features annotated specifically for nucleic acid-containing structures are also loaded into the database.

Table. The information content of the NDB

(a) Identity of depositions, processing information
Unique codes – NDB, PDB codes
Processing information – deposition, release dates
(b) Primary experimental information stored in the NDB
Chemical description – the chemical composition of the deposition (content of all chemical constituents of the deposition), detailed description of chemical modifications of the nucleotides
Sequence description – sequences of polymers, nucleic acids and proteins
Structural description – structure type
Citation – authors, title, journal, volume, pages, year
(c) NMR structures
Coordinate information – atomic coordinates for all models submitted
Experimental data for the deposition – spectrometer, types of NMR experiments, solution content, derivatization
(d) X-ray structures
Coordinate information – atomic (orthogonal or fractional) coordinates, occupancies and temperature factors for the asymmetric unit
Experimental data for the deposition – cell dimensions; space group for the X-ray structures, spectrometer, solution content etc.
Data collection description – radiation source and wavelength; data collection device; temperature; resolution range; total and unique number of reflections
Crystallization description – crystallization method; temperature; pH value; solution composition
Refinement information – program used; number of reflections used for refinement; data cutoff; resolution range; different R factors; refinement of temperature factors and occupancies
(e) Derived information stored in the NDB
Structure summary – descriptor briefly characterizing the structure
Distances – chemical bond lengths; virtual bonds between phosphorus atoms
Angles – valence angles, virtual angles involving phosphorus atoms
Torsion angles – backbone and side-chain torsion angles for nucleic acids and proteins; pseudo-rotational parameters for the sugar rings
Base pairing – detailed description of base pairing, classification by Leontis–Westhof (Leontis & Westhof, 2001) and Saenger (1984) with full identification of all pairs
Base morphology – base-morphology parameters calculated by 3DNA (Lu & Olson, 2003) using the standard reference frame (Olson et al., 2001)
Special geometrical features – nonbonded contacts, crystal-related geometries (for X-ray structures), symmetry-related coordinates, coordinates for symmetry-related strands, root-mean-square deviations from small-molecule standards for valence geometries
Sequence pattern statistics
Manually annotated structural features – type of helix, classification of double helix, presence of loops, bulges, three/four way junctions
Program-assigned structural features – presence of A/B/Z type of helix, presence of triple, quadruple helices, loops, bulges, three/four way junctions
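
One of the derived quantities listed above, the pseudo-rotational parameters of the sugar ring, can be obtained from the five endocyclic torsion angles. The sketch below uses the widely used Altona–Sundaralingam formulation; the chapter does not state which formulation the NDB applies, so this choice and the function names are assumptions of the example.

```python
import math

_S = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation(nu0, nu1, nu2, nu3, nu4):
    """Return (phase angle P, amplitude nu_max) in degrees.

    nu0..nu4 are the endocyclic sugar torsions in degrees, with nu2 the
    torsion about the C2'-C3' bond. The amplitude is undefined when P is
    exactly 90 or 270 degrees (nu2 = 0); that case is not handled here.
    """
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = 2.0 * nu2 * _S
    # atan2 places P in the correct half of the cycle, including nu2 < 0.
    phase = math.degrees(math.atan2(numerator, denominator)) % 360.0
    amplitude = nu2 / math.cos(math.radians(phase))
    return phase, amplitude


def ideal_torsions(phase, amplitude):
    """Torsions of an idealized ring: nu_j = nu_max * cos(P + 144 * (j - 2))."""
    return [amplitude * math.cos(math.radians(phase + 144.0 * (j - 2)))
            for j in range(5)]


if __name__ == "__main__":
    # Round trip on an idealized ring with illustrative values P = 18, nu_max = 38.
    print(pseudorotation(*ideal_torsions(18.0, 38.0)))
```

The round trip through `ideal_torsions` only checks internal consistency; torsions from experimental structures deviate from the idealized cosine relationship.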

Some structural features of nucleic acids have historically been derived by different algorithms, and it can be difficult to provide the most reliable values. Whenever possible, the NDB has tried to promote standards that allow structure comparison. An outstanding example of this difficulty was that different programs produced different values for base-morphology parameters (Lavery & Sklenar, 1989; Bansal et al., 1995; Dickerson, 1998; Babcock et al., 1994; Lu et al., 1997). This meant that two structures could not be compared using the numbers in the published literature, and the values had to be recalculated for any analysis. To help resolve this problem, the NDB co-sponsored the Tsukuba Workshop on Nucleic Acid Structure and Interactions (Tsukuba, Japan, 1999). Key software developers in this field attended and resolved that a single reference frame would be used to calculate these values, and an agreement was reached about the definition of that reference frame (Olson et al., 2001). This work fully quantitated the proposal for base morphology made previously at a meeting in Cambridge (Dickerson et al., 1989). The NDB has recalculated these values for all the structures in the repository using the program 3DNA (Lu & Olson, 2003), and they are available as output from NDB searches and prepared reports. All the programs have been amended so that they produce very similar values for the base-morphology parameters.

User web access


The core of the NDB project is a relational database that can be used to generate queries and reports about the stored structures. This key content of the NDB is supplemented by other services. An integrated web interface provides wide and easy distribution of all NDB functionalities. The NDB home page links to the search and Atlas pages, coordinate files, standards for nucleic acid geometry, prepared reports about the database content, and pages with software tools for the calculation and analysis of nucleic acid structural properties.

Query capabilities


The NDB relational database has all the data organized into over 90 tables of summary, experimental and derived information, with each table containing five to 20 data items. For instance, the table citation contains all items relevant for related literature reference(s), entity contains critical items related to chemical entities observed in the structure and entity_poly contains the sequences of polymer(s) observed in the structure indexed to the entity table.
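
The way such tables combine in a query can be sketched with Python's built-in sqlite3 module. The table names citation, entity and entity_poly come from the text above. The column names, identifiers and rows are hypothetical, invented for this illustration, and do not reproduce the NDB schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entity (
            structure_id TEXT, entity_id INTEGER, description TEXT,
            PRIMARY KEY (structure_id, entity_id));
        CREATE TABLE entity_poly (
            structure_id TEXT, entity_id INTEGER, poly_type TEXT, sequence TEXT,
            FOREIGN KEY (structure_id, entity_id)
                REFERENCES entity (structure_id, entity_id));
        CREATE TABLE citation (structure_id TEXT, title TEXT, year INTEGER);
    """)
    # Invented rows; the identifiers are placeholders, not real NDB codes.
    conn.executemany("INSERT INTO entity VALUES (?, ?, ?)", [
        ("EX0001", 1, "DNA tetramer"),
        ("EX0002", 1, "DNA dodecamer"),
        ("EX0003", 1, "RNA hairpin"),
    ])
    conn.executemany("INSERT INTO entity_poly VALUES (?, ?, ?, ?)", [
        ("EX0001", 1, "DNA", "GGCC"),
        ("EX0002", 1, "DNA", "CGCGAATTCGCG"),
        ("EX0003", 1, "RNA", "GGACUUCGGUCC"),
    ])
    conn.executemany("INSERT INTO citation VALUES (?, ?, ?)", [
        ("EX0001", "Example title A", 1995),
        ("EX0002", "Example title B", 2003),
        ("EX0003", "Example title C", 2005),
    ])
    return conn


def sequences_since(conn, poly_type, year):
    """Sequences of one polymer type from structures cited in or after `year`."""
    return conn.execute("""
        SELECT p.structure_id, p.sequence
        FROM entity_poly AS p
        JOIN entity AS e
          ON e.structure_id = p.structure_id AND e.entity_id = p.entity_id
        JOIN citation AS c ON c.structure_id = p.structure_id
        WHERE p.poly_type = ? AND c.year >= ?
        ORDER BY p.structure_id
    """, (poly_type, year)).fetchall()


if __name__ == "__main__":
    print(sequences_since(build_demo_db(), "DNA", 2000))
```

Indexing entity_poly to entity, as the text describes, is what lets sequence-level criteria be combined with entity-level and citation-level criteria in a single query.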

A web interface makes the query capabilities of the NDB as widely accessible as possible with a simple NDB Search and a more complex Integrated Search. The simple NDB Search provides a web form to query single items, or combinations of items, such as NDB and PDB identification codes, author names and year of publication or release, experimental method, as well as chemical and structural characteristics such as the presence of chemical modification, the presence of non-Watson–Crick base pairing, the presence of different types of double helices (B/A/Z/right-handed), or the presence of a tetraplex or a three- or four-way junction. If no selection is made, the entire database is selected. After the query is executed, a list of structure IDs and descriptors that match the desired conditions is returned.

The Integrated Search provides the user with a wide range of options for structure selection and the use of Boolean operators to make complex queries. The interface offers pull-down menus to select key components of database entries: citation information, experiment type and details, molecule type (DNA, RNA, protein and ligand) present, presence of nucleic acid modifications, and nucleic acid structural conformation type (description, type and features).

Atlas pages (see below) and several different reports generated on the fly (see the table below) can be used to explore search results.

Table. Reports automatically generated for searches of the NDB

Report name – Contains
Descriptor – NDB ID, structure description
NDB and PDB identifiers – NDB and PDB IDs
NDB status – NDB and PDB IDs, descriptor, authors, status, deposition and release dates
Citation information – Authors, title, journal, volume, pages, year
NA sequence – NDB ID, nucleic acid sequence, descriptor
Cell dimensions – Crystallographic cell constants, space group
Refinement data – Resolution limits and reflections used in refinement, R factor
Nucleic acid backbone torsion – Backbone torsions with unique residue identifier
Base-pair parameters – Buckle, propeller, opening, shear, stretch, stagger with unique pair identifier
Base-step parameters – Twist, rise, helical rise, tilt, roll, inclination, shift, slide, unique step identifier

Atlas pages


A powerful reporting option is the NDB Atlas (an example entry is shown in the figure below). An Atlas report summarizes the most important characteristics of a structure, with information about the authors, the citation, and the sequence(s) of the nucleic acids and proteins reported. Atlas entries contain an outline of the experimental information, a molecular view of the biologically relevant assembly of the reported structure, and links to tables of derived quantities such as torsion angles, base morphology and base pairing. Structural coordinates can be downloaded in PDB and mmCIF formats, and a link is provided to the PDB Structure Explorer page for more details.


Figure. NDB Atlas entry for a structure of ribonuclease P RNA, UR0027 (Krasilnikov et al., 2003). Clockwise from the upper left: the summary Atlas page, schematic 2D drawing of base pairing as calculated and drawn by RNAView (Yang et al., 2003), hydrogen-bonding classification, and torsion angles.

Atlas pages are created for all entries directly from the NDB database and are organized by experimental method (X-ray and NMR), and by structure type into broad groups of structures (with possible overlaps) that allow effective browsing of the database. Atlas entries are also linked from database query results.

24.2.4. Distribution of information


In addition to querying and reporting capabilities, the NDB hosts coordinate and experimental data files, database reports, software programs, and other resources. They are available from the web as well as via the ftp server. An electronic help desk provides support.

Structural coordinates and experimental data


Structural coordinate files can be downloaded in mmCIF and PDB formats (Westbrook & Fitzgerald, 2009). Coordinates for the biological assembly of the molecule are provided for X-ray structures. When available, experimental data files are made accessible to users.

Standards and software tools


The Standards section contains dictionaries of standard ('ideal') geometries of nucleic acid components as well as their parameter files for X-PLOR (Brünger, 1992). It also provides a full explanation of the standard reference frame (Olson et al., 2001).

The NDB links to several software Tools. With the recent increase in the number of RNA structures available, there have been attempts to establish systems for systematic analysis. One result of these studies has been the proposal of a classification scheme for the hydrogen bonds in the base pairs (Westhof & Fritsch, 2000; Leontis et al., 2002). The syntax RNAML has also been proposed for representing RNA structural features. The program RNAView (Yang et al., 2003) uses the Leontis–Westhof classification of base pairs and RNAML syntax to generate two-dimensional (2D) graphical representations of base pairing. Base Pair Viewer (BPView) visualizes selected base pairs or triplets. In both cases, the base pairs are classified according to the Leontis–Westhof scheme (Leontis & Westhof, 2001) and generated on the fly.

Two programs, predictdnahth (McLaughlin & Berman, 2003) and HTHquery (Ferrer-Costa et al., 2005), analyse the protein DNA-binding helix-turn-helix (HTH) structural motif, and both can be used to determine whether a 3D protein target (from the archive or uploaded by the user) is a DNA-binding protein with the HTH motif.

Several software tools can be downloaded from the NDB site for local installation on the user's computer. RNAView (Yang et al., 2003) and S2S (originally RNAMLview; Jossinet & Westhof, 2005) analyse base pairing in a PDB-formatted file and draw its 2D representation. Three programs, 3DNA (Lu & Olson, 2003), Freehelix98 (Dickerson, 1998) and hel_param (Babcock et al., 1993), are provided to compute base-pair parameters.


The NDB Project was funded by the National Science Foundation and the Department of Energy.

This chapter is an updated version of Schneider et al. (2009). Reprinted with permission of John Wiley & Sons, Inc.


Alexandrov, N. & Shindyalov, I. (2003). PDP: protein domain parser. Bioinformatics, 19, 429–430.
Allen, F. H., Bellard, S., Brice, M. D., Cartwright, B. A., Doubleday, A., Higgs, H., Hummelink, T., Hummelink-Peters, B. G., Kennard, O., Motherwell, W. D. S., Rodgers, J. R. & Watson, D. G. (1979). The Cambridge Crystallographic Data Centre: computer-based search, retrieval, analysis and display of information. Acta Cryst. B35, 2331–2339.
Babcock, M. S., Pednault, E. P. & Olson, W. K. (1993). Nucleic acid structure analysis: a users guide to a collection of new analysis programs. J. Biomol. Struct. Dyn. 11, 597–628.
Babcock, M. S., Pednault, E. P. & Olson, W. K. (1994). Nucleic acid structure analysis. Mathematics for local Cartesian and helical structure parameters that are truly comparable between structures. J. Mol. Biol. 237, 125–156.
Bansal, M., Bhattacharyya, D. & Ravi, B. (1995). Nuparm and nucgen: software for analysis and generation of sequence dependent nucleic acid structures. CABIOS, 11, 281–287.
Berman, H., Henrick, K. & Nakamura, H. (2003). Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 10, 980.
Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S. H., Srinivasan, A. R. & Schneider, B. (1992). The Nucleic Acid Database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63, 751–759.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.
Bourne, P. E., Berman, H. M., McMahon, B., Watenpaugh, K. D., Westbrook, J. D. & Fitzgerald, P. M. (1997). Macromolecular Crystallographic Information File. Methods Enzymol. 277, 571–590.
Brünger, A. T. (1992). X-plor. A System for X-ray Crystallography and NMR. Version 3.1. Yale University, USA.
Clowney, L., Jain, S. C., Srinivasan, A. R., Westbrook, J., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: nitrogenous bases. J. Am. Chem. Soc. 118, 509–518.
Dickerson, R. E. (1998). DNA bending: the prevalence of kinkiness and the virtues of normality. Nucleic Acids Res. 26, 1906–1926.
Dickerson, R. E., Bansal, M., Calladine, C. R., Diekmann, S., Hunter, W. N., Kennard, O., von Kitzing, E., Lavery, R., Nelson, H. C. M., Olson, W., Saenger, W., Shakked, Z., Sklenar, H., Soumpasis, D. M., Tung, C.-S., Wang, A. H.-J. & Zhurkin, V. B. (1989). Definitions and nomenclature of nucleic acid structure parameters. EMBO J. 8, 1–4.
Ferrer-Costa, C., Shanahan, H. P., Jones, S. & Thornton, J. M. (2005). HTHquery: a method for detecting DNA-binding proteins with a helix-turn-helix structural motif. Bioinformatics, 21, 3679–3680.
Fitzgerald, P. M. D., Westbrook, J. D., Bourne, P. E., McMahon, B., Watenpaugh, K. D. & Berman, H. M. (2005a). International Tables for Crystallography, Vol. G, Definition and Exchange of Crystallographic Data, edited by S. R. Hall & B. McMahon, pp. 295–443. Dordrecht: Springer.
Fitzgerald, P. M. D., Westbrook, J. D., Bourne, P. E., McMahon, B., Watenpaugh, K. D. & Berman, H. M. (2005b). International Tables for Crystallography, Vol. G, Definition and Exchange of Crystallographic Data, edited by S. R. Hall & B. McMahon, pp. 144–198. Dordrecht: Springer.
Gelbin, A., Schneider, B., Clowney, L., Hsieh, S.-H., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: sugar and phosphate constituents. J. Am. Chem. Soc. 118, 519–528.
Jossinet, F. & Westhof, E. (2005). Sequence to Structure (S2S): display, manipulate and interconnect RNA data from sequence to structure. Bioinformatics, 21, 3320–3321.
Krasilnikov, A. S., Yang, X., Pan, T. & Mondragón, A. (2003). Crystal structure of the specificity domain of ribonuclease P. Nature (London), 421, 760–764.
Lavery, R. & Sklenar, H. (1989). Defining the structure of irregular nucleic acids: conventions and principles. J. Biomol. Struct. Dyn. 6, 655–667.
Leontis, N. B., Stombaugh, J. & Westhof, E. (2002). The non-Watson–Crick base pairs and their associated isostericity matrices. Nucleic Acids Res. 30, 3497–3531.
Leontis, N. B. & Westhof, E. (2001). Geometric nomenclature and classification of RNA base pairs. RNA, 7, 499–512.
Lu, X. J., El Hassan, M. A. & Hunter, C. A. (1997). Structure and conformation of helical nucleic acids: analysis program (SCHNAaP). J. Mol. Biol. 273, 668–680.
Lu, X. J. & Olson, W. K. (2003). 3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res. 31, 5108–5121.
McLaughlin, W. A. & Berman, H. M. (2003). Statistical models for discerning protein structures containing the DNA-binding helix-turn-helix motif. J. Mol. Biol. 330, 43–55.
Moore, P. B. (2001). The ribosome at atomic resolution. Biochemistry, 40, 3243–3250.
Olson, W. K., Bansal, M., Burley, S. K., Dickerson, R. E., Gerstein, M., Harvey, S. C., Heinemann, U., Lu, X.-J., Neidle, S., Shakked, Z., Sklenar, H., Suzuki, M., Tung, C.-S., Westhof, E., Wolberger, C. & Berman, H. M. (2001). A standard reference frame for the description of nucleic acid base-pair geometry. J. Mol. Biol. 313, 229–237.
Parkinson, G., Vojtechovsky, J., Clowney, L., Brünger, A. T. & Berman, H. M. (1996). New parameters for the refinement of nucleic acid-containing structures. Acta Cryst. D52, 57–64.
Saenger, W. (1984). Principles of Nucleic Acid Structure. New York: Springer-Verlag.
Schneider, B., de la Cruz, J., Feng, Z., Chen, L., Westbrook, J., Yang, H., Young, J., Zardecki, C. & Berman, H. M. (2009). Structural Bioinformatics, 2nd ed., edited by J. Gu & P. E. Bourne, pp. 293–303. Hoboken: John Wiley & Sons, Inc.
Schneider, B., Neidle, S. & Berman, H. M. (1997). Conformations of the sugar–phosphate backbone in helical DNA crystal structures. Biopolymers, 42, 113–124.
Vaguine, A. A., Richelle, J. & Wodak, S. J. (1999). SFCHECK: a unified set of procedures for evaluating the quality of macromolecular structure-factor data and their agreement with the atomic model. Acta Cryst. D55, 191–205.
Westbrook, J., Feng, Z., Burkhardt, K. & Berman, H. M. (2003). Validation of protein structures for Protein Data Bank. Methods Enzymol. 374, 370–385.
Westbrook, J., Yang, H., Feng, Z. & Berman, H. M. (2005). International Tables for Crystallography, Vol. G, Definition and Exchange of Crystallographic Data, edited by S. R. Hall & B. McMahon, pp. 539–543. Dordrecht: Springer.
Westbrook, J. D. & Bourne, P. E. (2000). STAR/mmCIF: an ontology for macromolecular structure. Bioinformatics, 16, 159–168.
Westbrook, J. D. & Fitzgerald, P. M. D. (2009). Structural Bioinformatics, 2nd ed., edited by P. E. Bourne & J. Gu, pp. 271–291. Hoboken: John Wiley & Sons, Inc.
Westhof, E. & Fritsch, V. (2000). RNA folding: beyond Watson–Crick pairs. Structure, 8, R55–R65.
Yang, H., Jossinet, F., Leontis, N., Chen, L., Westbrook, J., Berman, H. M. & Westhof, E. (2003). Tools for the automatic identification and classification of RNA base pairs. Nucleic Acids Res. 31, 3450–3460.
