International Tables for Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 2.2, pp. 33-34

Section Data-value semantics

S. R. Hall, N. Spadaccini, I. D. Brown, H. J. Bernstein, J. D. Westbrook and B. McMahon


(14) The STAR syntax permits retrieval of data by simply requesting a specific data name within a specific data block. Prior knowledge about data type (e.g. text or numbers), whether the item is looped or whether the item exists in the file at all is unnecessary. However, applications in general need to know data type, valid ranges of values and relationships between data items, and a program designer needs to know the purpose of the data item (i.e. what physical quantity or internal book-keeping function it represents). While such semantic information may be defined informally for local data items (ones not intended for exchange between different users or software applications), formal descriptions of the semantics associated with data values are catalogued in data dictionary files. Currently two formalisms (dictionary definition languages) for describing data-value attributes are supported; full specifications of these formalisms (known as DDL1 and DDL2) are provided in Chapters 2.5 and 2.6.

Data typing


(15) Four base data types are supported in CIF. These are:

(i) numb: a value interpretable as a decimal base number and supplied as an integer, a floating-point number or in scientific notation;

(ii) char: a value to be interpreted as character or text data (where the value contains white-space characters, it must be quoted);

(iii) uchar: a value to be interpreted as character or text data but in a case-insensitive manner (i.e. the values FOO and foo are to be taken as identical);

(iv) null: a special data type associated with items for which no definite value may be stored in computer memory. It is the type associated with the special character literal values ? (query mark) and . (full point), which may appear as values for any data item within a data file (see Section[link] below). It is also the type assigned to items defined in dictionary files that may not occur in data files.
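
The four base types can be illustrated with a minimal Python sketch. It is not part of the CIF specification: the names `base_type` and `same_value` are hypothetical, the number pattern is a simplification, and the input is assumed to be an unquoted value string. Where no type is declared, the sketch assumes that a string interpretable as a number is of type `numb' and that anything else is `char'; the type `uchar' can only come from an explicit declaration.

```python
import re

# The two special literals that carry the 'null' type in a data file.
NULL_LITERALS = {"?", "."}

# Integer, floating-point or scientific notation. This is a simplified,
# assumed pattern; it ignores any standard uncertainty in parentheses.
_NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def base_type(raw, declared=None):
    """Return 'null', 'numb', 'char' or 'uchar' for an unquoted value string.

    `declared` stands in for a type taken from a dictionary definition
    (hypothetical parameter); when given, it overrides the guess.
    """
    if raw in NULL_LITERALS:
        return "null"
    if declared is not None:
        return declared
    if _NUMBER.fullmatch(raw):
        return "numb"
    return "char"


def same_value(a, b, type_name):
    """Compare two values, ignoring case only for the 'uchar' type."""
    if type_name == "uchar":
        return a.lower() == b.lower()
    return a == b
```

With these definitions, `same_value("FOO", "foo", "uchar")` is true while the same comparison under `char' is false, and `base_type("12.5", declared="char")` returns the declared type rather than `numb'.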

(16) Comment: Many applications distinguish between multi-line text fields and character-string values that fit within a single line of text. While this is a convenient practical distinction for coding purposes, formally both manifestations should be regarded as having the same base type, which might be `char' or `uchar'. Applications are at liberty to choose whether to define specific multi-line text subtypes, and whether to permit casting between subtypes of a base type. The examples of character-string delimiters in Section[link](20) are predicated on an approach that handles all subtypes of character or text data equivalently.

(17) Where the attributes of a data value are not available in a dictionary listing, it may be assumed that a character string interpretable as a number should be taken to represent an item of type `numb'. However, an explicit dictionary declaration of type will override such an assumption.

Subtyping


(18) The base data types detailed in the previous section are very general and need to be refined for practical application. Refinement of types is to some extent application-dependent, and different subtypes are supported for data items defined by DDL1 and DDL2 dictionary files. The following notes indicate some considerations, but the relevant dictionary files and documentation should be consulted in each case.

(19) DDL1 dictionaries. Values of type `numb' may include a standard uncertainty in the final digit(s) of the number where the associated item definition includes the attribute _type_conditions esd (or _type_conditions su, a synonym introduced to DDL1 in 2005). For example, a value of 34.5(12) means 34.5 with a standard uncertainty of 1.2; it may also be expressed in scientific notation as 3.45E1(12).
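
The parenthesized notation can be unpacked as in the following Python sketch. The function name `parse_numb` and the regular expression are assumptions made for illustration, not a definition taken from DDL1. The digits in parentheses are scaled to the last decimal place of the mantissa, and any exponent is then applied to both the value and its uncertainty.

```python
import re
from decimal import Decimal

# Mantissa, optional exponent, optional standard uncertainty in parentheses.
# An assumed, simplified pattern for values such as 34.5(12) or 3.45E1(12).
_NUMB_SU = re.compile(
    r"(?P<mantissa>[+-]?(?:\d+\.?\d*|\.\d+))"
    r"(?:[eE](?P<exp>[+-]?\d+))?"
    r"(?:\((?P<su>\d+)\))?"
)


def parse_numb(text):
    """Return (value, su) as Decimals; su is None when none is given."""
    m = _NUMB_SU.fullmatch(text)
    if m is None:
        raise ValueError(f"not a numb value: {text!r}")
    mantissa = Decimal(m["mantissa"])
    scale = Decimal(10) ** int(m["exp"] or 0)
    value = mantissa * scale
    if m["su"] is None:
        return value, None
    # Number of digits after the decimal point in the mantissa.
    places = -mantissa.as_tuple().exponent
    # The su digits apply to the final digit(s) of the mantissa.
    su = Decimal(m["su"]) * Decimal(10) ** (-places) * scale
    return value, su
```

Both `parse_numb("34.5(12)")` and `parse_numb("3.45E1(12)")` yield a value equal to 34.5 and a standard uncertainty equal to 1.2, matching the example in the text. Decimal arithmetic is used so that the scaling does not introduce binary floating-point rounding.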

(20) DDL2 dictionaries. DDL2 provides a number of tags that may be used in a dictionary file to specify subtypes for data items defined by that dictionary alone. Examples of the subtypes specified for the macromolecular CIF dictionary are:

Subtype      Description
code         identifying code strings or single words
ucode        identifying code strings or single words (case-insensitive)
uchar1       single-character codes (case-insensitive)
uchar3       three-character codes (case-insensitive)
line         character strings forming a single line of text
uline        character strings forming a single line of text (case-insensitive)
text         multi-line text
int          integers
float        floating-point real numbers
yyyy-mm-dd   dates
symop        symmetry operations
any          any type permitted
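
As a rough illustration of how an application might act on such subtypes, the Python sketch below checks a value against a handful of them. The patterns are simplified assumptions written for this example and are not the constructs given in the macromolecular dictionary; the names `conforms` and `normalise` are likewise hypothetical. Only a few of the subtypes listed above are covered.

```python
import re
from datetime import date

# Illustrative patterns only (assumed, simplified); the dictionary's own
# definitions should be consulted for real validation.
SUBTYPE_PATTERNS = {
    "int": re.compile(r"[+-]?\d+"),
    "float": re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?"),
    "line": re.compile(r"[^\r\n]*"),
    "yyyy-mm-dd": re.compile(r"\d{4}-\d{2}-\d{2}"),
}

# Subtypes described in the list above as case-insensitive.
CASE_INSENSITIVE = {"ucode", "uchar1", "uchar3", "uline"}


def conforms(value, subtype):
    """Return True if the string `value` fits the sketched subtype."""
    if subtype == "any":
        return True
    pattern = SUBTYPE_PATTERNS.get(subtype)
    if pattern is None:
        raise ValueError(f"subtype not covered by this sketch: {subtype}")
    if not pattern.fullmatch(value):
        return False
    if subtype == "yyyy-mm-dd":
        # Reject well-formed but impossible dates such as 2006-02-30.
        try:
            date.fromisoformat(value)
        except ValueError:
            return False
    return True


def normalise(value, subtype):
    """Fold case for case-insensitive subtypes so values compare equal."""
    return value.lower() if subtype in CASE_INSENSITIVE else value
```

For instance, `conforms("4.2", "int")` is false while `conforms("4.2", "float")` is true, and `normalise("ALA", "uchar3")` gives the same result as `normalise("ala", "uchar3")`.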
