International Tables for Crystallography
Volume G: Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 2.6, p. 61

Section 2.6.1. Introduction

J. D. Westbrook (a)*, H. M. Berman (a) and S. R. Hall (b)

(a) Protein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, NJ 08854-8087, USA, and (b) School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, Perth, WA 6009, Australia
* Correspondence e-mail: jwest@rcsb.rutgers.edu



The dictionary definition language version 2 (DDL2) presented here extends the DDL1 version (Hall & Cook, 1995) currently used by the IUCr to describe data items common to all crystallographic studies (i.e. core items). The DDL2 extensions were introduced primarily to address two issues that arose during the development of a CIF dictionary for the terminology of macromolecular crystallography: the need to describe accurately the hierarchical nature of macromolecular structure and its associated structural features, and the desire to encode dictionary definitions in a manner that would permit more detailed software-driven validation.

The decision not to use DDL1 for the macromolecular CIF dictionary (mmCIF) was not made lightly. The Working Group responsible for the development of the mmCIF dictionary spent three years building a version of the dictionary within a DDL1 framework. When this draft was presented at the first mmCIF workshop in York in 1993 (Chapter 1.1), it was criticized as lacking the rigour needed for it to be usefully interpreted by software. In particular, the draft dictionary lacked machine-interpretable relationships between the components of macromolecular structure, between the components of structure and structural properties, and between the components of structure and the experimental description. These relationships are essential to a complete description of macromolecular structural data, and need to be present in a dictionary in a form that permits software to navigate and validate them.

Following the York workshop, the Working Group set about redesigning the framework of the data model used to organize the dictionary definitions. Initially there was significant interest in adopting a more object-oriented data model, so as to match closely the object-oriented characteristics of macromolecular structure. However, an object-oriented model would have departed significantly from the organization of the core CIF dictionary based on DDL1, and interoperability between the two approaches would probably have been problematic.

The new DDL concepts were presented at the Brussels mmCIF workshop in 1994. This approach, which later became known as DDL2 (Westbrook & Hall, 1995), employs a largely relational data model that adheres more closely to the data model implicit in the core DDL1. DDL2 added new elements to expand the DDL1 concept of a data category. Categories in DDL2 are fully realized definitional elements that have their own set of attributes (e.g. definitions and examples). DDL2 also added data elements to define explicitly the parent–child relationships between data items within a hierarchy of categories. This conservative relational data model was shown to be capable of describing accurately the content of the mmCIF dictionary.
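The kind of software-driven check that explicit parent–child relationships make possible can be illustrated with a minimal Python sketch. It is not part of DDL2 or of any mmCIF software: the `LINKS` table, the helper names `split_item` and `find_orphans`, and the in-memory data layout are all hypothetical, and the single link shown (`_atom_site.label_asym_id` as a child of `_struct_asym.id`) is used only as an example of such a relationship.

```python
# Hypothetical stand-in for parent-child links declared in a dictionary.
# Each entry maps a child data item to the parent item it must reference.
LINKS = {
    "_atom_site.label_asym_id": "_struct_asym.id",
}


def split_item(name):
    """Split a name of the form '_category.attribute' into its two parts."""
    category, attribute = name.lstrip("_").split(".", 1)
    return category, attribute


def find_orphans(data, links):
    """Return (child_item, value) pairs whose value is absent from the parent item."""
    orphans = []
    for child, parent in links.items():
        c_cat, c_attr = split_item(child)
        p_cat, p_attr = split_item(parent)
        # Collect every value the parent item takes in its category.
        parent_values = {row[p_attr] for row in data.get(p_cat, [])}
        for row in data.get(c_cat, []):
            if row[c_attr] not in parent_values:
                orphans.append((child, row[c_attr]))
    return orphans


# Illustrative data (an assumption): each category is a list of rows.
data = {
    "struct_asym": [{"id": "A"}, {"id": "B"}],
    "atom_site": [
        {"id": "1", "label_asym_id": "A"},
        {"id": "2", "label_asym_id": "C"},  # 'C' has no parent row
    ],
}

orphans = find_orphans(data, LINKS)
print(orphans)
```

Because the relationship is stated as data rather than as free text, the same routine can be applied to any pair of linked items without code specific to either category.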

In view of the importance of maintaining continuity between small- and large-molecule crystallography, the extensions in DDL2 have been introduced in a manner that provides the greatest degree of backward compatibility with applications and dictionaries developed on the core DDL1. Like DDL1, DDL2 dictionaries and data files are fully compliant with the underlying syntax rules of the Self-defining Text Archive and Retrieval (STAR) File (Hall, 1991; Hall & Spadaccini, 1994). Although DDL2 uses a different convention to name data items, an alias feature in DDL2 is used to maintain a correspondence with the published data-item names in the core CIF dictionary (Hall et al., 1991; Chapter 4.1).
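As a rough illustration of how an application might use such aliases, the following Python sketch maps core-dictionary names onto their DDL2-style equivalents. The `ALIASES` table and the function `to_ddl2_name` are hypothetical; in practice the correspondence would be read from the alias records in the dictionary itself rather than written out by hand, and the two cell-length items are shown only as examples.

```python
# Hypothetical alias table: core (DDL1-style) name -> DDL2-style name.
# A real application would build this from the dictionary's alias records.
ALIASES = {
    "_cell_length_a": "_cell.length_a",
    "_cell_length_b": "_cell.length_b",
}


def to_ddl2_name(name, aliases=ALIASES):
    """Return the DDL2-style name for a core name, or the name unchanged."""
    return aliases.get(name, name)


print(to_ddl2_name("_cell_length_a"))
print(to_ddl2_name("_cell.length_a"))
```

Names that are not listed as aliases pass through unchanged, so the same lookup can be applied to files that already use the DDL2 naming convention.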

Since its introduction in 1994, DDL2 has been used to develop the mmCIF dictionary, which has been adopted as the data exchange standard of the Protein Data Bank, the international repository of three-dimensional macromolecular structure data (see Chapter 5.5).

References

Hall, S. R. & Spadaccini, N. (1994). The STAR File: detailed specifications. J. Chem. Inf. Comput. Sci. 34, 505–508.
Hall, S. R. (1991). The STAR File: a new format for electronic data transfer and archiving. J. Chem. Inf. Comput. Sci. 31, 326–333.
Hall, S. R., Allen, F. H. & Brown, I. D. (1991). The Crystallographic Information File (CIF): a new standard archive file for crystallography. Acta Cryst. A47, 655–685.
Hall, S. R. & Cook, A. P. F. (1995). STAR dictionary definition language: initial specification. J. Chem. Inf. Comput. Sci. 35, 819–825.
Westbrook, J. D. & Hall, S. R. (1995). A dictionary description language for macromolecular structure. http://ndbserver.rutgers.edu/mmcif/ddl/







































