International Tables for Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 3.2, pp. 114-116

Section 3.2.6. File metadata

S. R. Hall (a)*, P. M. D. Fitzgerald (b) and B. McMahon (c)

(a) School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, 6009, Australia, (b) Merck Research Laboratories, Rahway, New Jersey, USA, and (c) International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: syd@crystal.uwa.edu.au

3.2.6. File metadata


The categories describing the history of a data block and its relation to other blocks are as follows:

AUDIT group
 AUDIT
 AUDIT_AUTHOR
 AUDIT_CONFORM
 AUDIT_CONTACT_AUTHOR
 AUDIT_LINK

Information about the origin and purpose of a CIF is needed to make full use of its content. Information about the CIF itself (rather than about the experiment or structural model it describes) is known as metadata.

Because the scope of any data value is restricted to the data block in which it resides, each data block should contain its own set of _audit_* data items (a requirement that is often overlooked in the construction of a CIF with multiple data blocks). The data items in the AUDIT_LINK category may be used to record relationships between different data blocks within the same file.

Data items in these categories are as follows:

(a) AUDIT [Scheme scheme86]

(b) AUDIT_AUTHOR [Scheme scheme87]

(c) AUDIT_CONFORM [Scheme scheme88]

(d) AUDIT_CONTACT_AUTHOR [Scheme scheme89]

(e) AUDIT_LINK [Scheme scheme90]

The AUDIT category provides a small set of data names suitable for identifying a data block and recording its creation date and subsequent modifications. Each data block in a CIF is introduced by a string of the form data_xxxx, where the block code xxxx is an arbitrary string. CIF offers no guidelines for choosing a block code, and there are many cases where the same string has been chosen to label data blocks in different files. The _audit_block_code data item is meant to encourage authors to provide a unique label for a data block. Also, as a separate data item, _audit_block_code has the advantage that it can be interrogated using standard CIF query tools; this is not true of the block code.
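Because _audit_block_code is an ordinary data item, any reader that extracts data values can retrieve it, whereas the block code exists only in the data_ header syntax. The sketch below, with a hypothetical function name and illustrative block-code values, pairs each header code with the _audit_block_code value recorded in that block, so that blocks lacking one are easy to spot.

```python
def audit_block_codes(cif_text):
    """Return {data_ header code: value of _audit_block_code, or None}.

    Simplified for illustration: assumes _audit_block_code and its value
    appear together on one line, optionally in single or double quotes,
    and that no 'data_' header text occurs inside a text field.
    """
    result = {}
    current = None
    for line in cif_text.splitlines():
        tokens = line.split(None, 1)        # data name, then the rest of the line
        if not tokens:
            continue
        keyword = tokens[0].lower()
        if keyword.startswith("data_"):
            current = tokens[0][5:]         # the block code after 'data_'
            result[current] = None
        elif current is not None and keyword == "_audit_block_code" and len(tokens) == 2:
            value = tokens[1].strip()
            if len(value) >= 2 and value[0] in "'\"" and value[-1] == value[0]:
                value = value[1:-1]         # drop matching delimiting quotes
            result[current] = value
    return result
```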

The core dictionary does not specify a procedure for choosing a unique identifier for the data block, but other dictionaries do. The modulated structures dictionary recommends specific naming procedures (Section 3.4.4.4), and the powder dictionary supplies alternative data items designed to generate globally unique identifiers (Section 3.3.7.1).

Some applications modify the block code in the data_xxxx string. The value of _audit_block_code, by contrast, must not be changed arbitrarily to suit the convenience of external applications.

In Example 3.2.6.1, the _audit_block_code assigned is different from the data-block code; the creation date is expressed in the CIF date format convention of yyyy-mm-dd and the revision record is generated by adding material to the _audit_update_record field. Each addition has been prefixed with the date and initialled by the person who made the change. It is good practice to maintain a full record of any changes of substance to the contents of the data block.

Example 3.2.6.1. Items identifying a data block and recording its revision history.

[Scheme scheme91]
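The revision practice described above can be sketched in Python as follows. Since the content of Example 3.2.6.1 is not reproduced here, the entry layout (date in yyyy-mm-dd form, a note, then the initials in parentheses) is an assumption, as are the helper names.

```python
from datetime import date


def append_update(record, initials, note, when=None):
    """Append one dated, initialled entry to the text of _audit_update_record.

    Assumed layout for illustration: 'yyyy-mm-dd  note (initials)'.
    date.isoformat() yields the yyyy-mm-dd form used by CIF dates.
    """
    when = when or date.today()
    entry = f"{when.isoformat()}  {note} ({initials})"
    lines = [line for line in record.splitlines() if line.strip()]
    lines.append(entry)
    return "\n".join(lines)


def as_text_field(name, record):
    """Render the record as a semicolon-delimited CIF text field.

    Entries begin with a date, so no content line starts with ';'.
    """
    return f"{name}\n;\n{record}\n;"
```

Keeping each entry on its own dated line means the full history survives every edit, and the most recent change is always last.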

Data items in the AUDIT_AUTHOR category record details of the author or authors of the data block. Where there is more than one author, the names and addresses are looped. The use of these data items parallels that of the items in the PUBL_AUTHOR category; the difference is that the latter are used specifically to record details of authors of an article for publication. The AUDIT_AUTHOR category refers to the creators of a CIF data block regardless of its intended purpose.
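A looped AUDIT_AUTHOR list might be generated as in this sketch. The data names _audit_author_name and _audit_author_address are those of the AUDIT_AUTHOR category; the quoting helper applies simplified CIF 1.1 rules and, like the function names, is hypothetical.

```python
RESERVED = ("data_", "loop_", "save_", "global_", "stop_")


def cif_value(value):
    """Format a string as a CIF value using simplified quoting rules."""
    if "\n" in value or ("'" in value and '"' in value):
        return "\n;\n" + value + "\n;"      # semicolon-delimited text field
    needs_quotes = (value == ""
                    or any(c.isspace() for c in value)
                    or value[0] in "_#$'\"[];"
                    or value.lower().startswith(RESERVED))
    if not needs_quotes:
        return value
    quote = '"' if "'" in value else "'"
    return quote + value + quote


def audit_author_loop(authors):
    """Render (name, address) pairs as a looped AUDIT_AUTHOR list."""
    lines = ["loop_", "_audit_author_name", "_audit_author_address"]
    for name, address in authors:
        lines.append(cif_value(name) + " " + cif_value(address))
    return "\n".join(lines)
```

Multi-line addresses fall back to text fields, so each name stays paired with its full address in the loop.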

Data items in the AUDIT_CONFORM category describe the version of the dictionary or dictionaries that contain the definitions of the data names in the current data block. It is very helpful to provide this information, so that applications software can locate the original definitions and validate the contents of the current data block against them (Example 3.2.6.2). The dictionary identifier _audit_conform_dict_name is essential. The version is less important, because the dictionaries are revised so as to retain compatibility between versions where possible, but it may occasionally be useful if substantive changes have been introduced between versions. The location specified by _audit_conform_dict_location is useful only for local applications; in general the public register of CIF dictionaries should be used to locate dictionary files (see Section 3.1.8.3).

Example 3.2.6.2. The CIF dictionaries to which the data block conforms.

[Scheme scheme92]
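The priorities given above for the AUDIT_CONFORM items (name essential, version advisory, location only a local hint) could be encoded in a checker like the following. Each looped entry is assumed to have already been read into a dictionary keyed by data name; the function name and the dictionary values in the test are illustrative.

```python
def check_conform(entries):
    """Check AUDIT_CONFORM entries, each a dict keyed by data name.

    The dictionary name is treated as essential and reported when absent
    or unknown ('?' or '.'). A missing version is only noted. The location
    is not checked, since it is meaningful only for local applications.
    """
    problems = []
    for i, entry in enumerate(entries):
        name = entry.get("_audit_conform_dict_name")
        if not name or name in ("?", "."):
            problems.append(f"entry {i}: missing _audit_conform_dict_name (essential)")
        if "_audit_conform_dict_version" not in entry:
            problems.append(f"entry {i}: no version given (advisory only)")
    return problems
```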

Data items in the AUDIT_CONTACT_AUTHOR category record details of the name and address of the author to be contacted concerning the contents of the data block. The use of these data items parallels that of the items in the PUBL_CONTACT_AUTHOR category; the difference is that the latter are used specifically to record details of the contact author of an article for publication. The AUDIT_CONTACT_AUTHOR category refers to the creator of a CIF data block regardless of its intended purpose.

The original purpose of a CIF, to record the data relevant to a single-crystal structure determination, was quickly extended to articles reporting several crystal structures, to powder CIFs recording information about multiple phases, to modulated-structure CIFs describing superimposed structures and to macromolecular CIFs recording the results of multiple refinement cycles. A mechanism is therefore required to identify the purpose of an individual data block and its relationship to other data blocks in the same file; this is provided by the AUDIT_LINK category. Example 3.2.6.3 shows how a CIF of an article for publication might record the relationships between the data blocks in the file. Note that the link refers to the value of _audit_block_code in the referenced data block, not to the data-block header string itself (although in this example, and in Example 3.2.6.4, the two have the same value).

Example 3.2.6.3. List of linked data blocks in a CIF.

[Scheme scheme93]
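The point that links refer to _audit_block_code values rather than to data_ header strings can be made explicit when generating an AUDIT_LINK loop. In this sketch (hypothetical function names, simplified quoting), the caller supplies the mapping from header code to _audit_block_code, and a block without an _audit_block_code is rejected.

```python
def _quoted(text):
    """Quote a one-line description; use double quotes if it contains '."""
    quote = '"' if "'" in text else "'"
    return quote + text + quote


def audit_link_loop(block_codes, descriptions):
    """Build an AUDIT_LINK loop.

    block_codes:  {data_ header code: value of _audit_block_code}
    descriptions: {data_ header code: purpose of that block}
    The loop lists _audit_block_code values, not header codes.
    """
    lines = ["loop_", "_audit_link_block_code", "_audit_link_block_description"]
    for header, description in descriptions.items():
        code = block_codes.get(header)
        if not code:
            raise ValueError(f"block data_{header} has no _audit_block_code")
        lines.append(code + " " + _quoted(description))
    return "\n".join(lines)
```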

For many applications, it is enough to state the links between the data blocks in a CIF once only in the file, normally in the initial data block. However, for completeness and to permit consistency checking, it is best if the other data blocks in the file carry complementary declarations (Example 3.2.6.4).

Example 3.2.6.4. Complementary list of linked data blocks in a secondary block.

[Scheme scheme94]
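One possible consistency check over complementary declarations is sketched below. It assumes the AUDIT_LINK entries of each block have already been collected into a set keyed by that block's _audit_block_code, and it interprets "complementary" as follows: if a block lists another block and that block declares any links, it should list the first block in return. Blocks that declare no links are taken to rely on the initial block's list. This interpretation and the function name are assumptions.

```python
def check_links(links):
    """links: {audit_block_code: set of _audit_link_block_code values in that block}.

    Returns (dangling, one_sided):
      dangling  - (block, target) links whose target is not in the file
      one_sided - (block, target) links the target block declares links
                  but does not reciprocate
    A block may list itself; such self-links are ignored.
    """
    dangling, one_sided = [], []
    for code, targets in sorted(links.items()):
        for target in sorted(targets):
            if target == code:
                continue
            if target not in links:
                dangling.append((code, target))
            elif links[target] and code not in links[target]:
                one_sided.append((code, target))
    return dangling, one_sided
```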

Current practice as described in the core dictionary restricts this reporting of links between data blocks to the contents of a single file. In principle, if _audit_block_code were known to have globally unique values in each distinct data block, the mechanism could be extended to permit inter-file linkage.
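Such an extension would presuppose that no _audit_block_code value is reused anywhere. As a hedged illustration, the sketch below (hypothetical function and file names) detects values that occur more than once across a collection of files, the condition that would make inter-file links ambiguous.

```python
from collections import defaultdict


def duplicate_block_codes(files):
    """files: {file name: list of _audit_block_code values in that file}.

    Returns {code: sorted list of file names} for every code that occurs
    more than once, whether within one file or across several.
    """
    seen = defaultdict(list)
    for file_name, codes in files.items():
        for code in codes:
            seen[code].append(file_name)
    return {code: sorted(names) for code, names in seen.items() if len(names) > 1}
```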