International
Tables for
Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 2.4, pp. 45-47

Section 2.4.4. MIF concepts and syntax

F. H. Allen (a)*, J. M. Barnard (b), A. P. F. Cook (b) and S. R. Hall (c)

(a) Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge, CB2 1EZ, England; (b) BCI Ltd, 46 Uppergate Road, Stannington, Sheffield S6 6BX, England; and (c) School of Biomedical and Chemical Sciences, University of Western Australia, Crawley, Perth, WA 6009, Australia
Correspondence e-mail: allen@ccdc.cam.ac.uk

2.4.4. MIF concepts and syntax

The syntax of the Molecular Information File (MIF) is based on that of the STAR File (Hall, 1991; Hall & Spadaccini, 1994). A MIF is an ASCII text file that can be read or amended using a standard text editor, and that can be processed computationally without conversion to another format. The organization and expression of MIF data are summarized in Table 2.4.4.1. Each file consists of a series of data blocks and each block consists of a series of individual data items. There may be any number of items within a block and any number of blocks within a file. A data block represents a logical grouping of data items and, in most MIF applications, a data block specifies a complete chemical entity, i.e. a fully defined molecule or a query substructure.

Table 2.4.4.1. Brief overview of the MIF syntax

A text string is a string of characters bounded by white space, single or double quotes, or semicolons in column 1.
A data name is a text string bounded by white space starting with an underline.
A data value is a text string not starting with underline, preceded by an identifying data name.
A list is a sequence of data names, preceded by 'loop_' and followed by a list of data values.
A save frame is a collection of data within a data block, preceded by 'save_framecode' and closed with 'save_'.
A data block is a collection of data, preceded by 'data_blockcode'.
A global block is a collection of data, preceded by 'global_', that is common to all subsequent data blocks.
A file may contain any number of data blocks or global blocks.
A data name must be unique within a data block.

The MIF syntax, unlike that of a CIF, places no restrictions on line lengths or nested loop levels. For a detailed understanding of the differences between a MIF and a CIF, the reader should compare this chapter with Chapter 2.2 or refer to the published details of the STAR syntax (Hall & Spadaccini, 1994), the specification of the CIF core data items (Hall et al., 1991) and the Dictionary Definition Language (Hall & Cook, 1995) used to define data items in the electronic version of a STAR dictionary.

CIF data, described by over a thousand items in the current dictionaries (see Part 4), encompass the fields of crystallographic structure and diffraction techniques, and these data items could readily be incorporated into a MIF. The reverse is not currently possible, however, because the CIF syntax does not support nested loops or save frames.

2.4.4.1. Data identification

The fundamental principle that underpins MIF is the same as for CIF: every data item is represented by a unique data tag followed by its associated data value. These combinations are referred to as tag–value pairs or tuples. Data names must start with an underscore (underline) character, and data values may be any type of string, ranging from a single character to many lines of text. Here are some simple examples of MIF data items: [Scheme scheme1]
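The tag–value principle can be illustrated with a minimal Python sketch that reads simple data items into a dictionary. It is not a full STAR parser: it assumes values are either bare strings or strings in single or double quotes, and it ignores semicolon-delimited text, loops, save frames and block headers. The data names `_example_name` and `_example_count` are hypothetical and are not taken from the MIF dictionary.

```python
import re

# One alternative per token kind: comment | 'quoted' | "quoted" | bare string.
TOKEN = re.compile(r"""#.*|'([^']*)'|"([^"]*)"|(\S+)""")

def tokenize(text):
    """Yield (string, is_bare) pairs, skipping '#' comments."""
    for match in TOKEN.finditer(text):
        if match.group(0).startswith("#"):
            continue  # comment runs to the end of the line
        single, double, bare = match.groups()
        if bare is not None:
            yield bare, True
        else:
            yield (single if single is not None else double), False

def parse_items(text):
    """Return {data name: data value} for a run of simple tag-value pairs."""
    items = {}
    tokens = iter(tokenize(text))
    for name, is_bare in tokens:
        # A data name is an unquoted string starting with an underscore.
        if not (is_bare and name.startswith("_")):
            raise ValueError(f"expected a data name, got {name!r}")
        try:
            value, value_is_bare = next(tokens)
        except StopIteration:
            raise ValueError(f"data name {name} has no value") from None
        if value_is_bare and value.startswith("_"):
            raise ValueError(f"data name {name} has no value")
        if name in items:
            raise ValueError(f"data name {name} is not unique")
        items[name] = value
    return items

# Hypothetical data items, for illustration only.
sample = "_example_name 'acetic acid'   # a comment\n_example_count 8\n"
print(parse_items(sample))
```

Quoted strings are flagged separately from bare ones so that a quoted value beginning with an underscore is not mistaken for a data name.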

The complete list of MIF core data items is given in Chapter 4.8.

2.4.4.2. Looped lists

Repetitive data are stored in a MIF as lists of values, as they are in a CIF. Each list is prefaced by a loop_ statement and a sequence of data names that identify the data values that follow in 'packets' of equal length. The values in each packet match the order and number of the data names. Any number of packets may appear in a looped list.

Atom and bond properties are typical of the information that appears in a looped list. The atoms and bonds of thiabutyrolactone in MIF format are shown in Fig. 2.4.4.1. The description of each data item in this example is given in Chapter 4.8, although the meanings are clear from the self-descriptive data names. The number of data values in each list is an exact multiple of the number of data names at the start of each loop structure. A looped list is terminated by the next list or by any other data name, data block or end of file. Comments may be included in a MIF and are preceded by a # character, as illustrated in Fig. 2.4.4.1.
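The packet rule for a simple (non-nested) looped list can be sketched in Python as follows. The function assumes that the data names and data values of one loop have already been tokenized into two lists. The names `_atom_id` and `_atom_type` and the values shown are illustrative only and have not been checked against the MIF dictionary or the figure.

```python
def split_packets(names, values):
    """Group the values of one looped list into packets keyed by data name.

    Raises ValueError when the number of values is not an exact multiple
    of the number of data names.
    """
    width = len(names)
    if width == 0:
        raise ValueError("a looped list needs at least one data name")
    if len(values) % width != 0:
        raise ValueError(
            f"{len(values)} values cannot be split into packets of {width}"
        )
    return [
        dict(zip(names, values[start:start + width]))
        for start in range(0, len(values), width)
    ]

# Illustrative loop content (assumed names and values).
names = ["_atom_id", "_atom_type"]
values = ["1", "S", "2", "C", "3", "O"]
for packet in split_packets(names, values):
    print(packet)
```

Nested loops need additional bookkeeping for each loop level and are not covered by this sketch.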

Figure 2.4.4.1. MIF coding of atom and bond properties for thiabutyrolactone.

Hierarchical data may require the use of nested loop structures (see the _display_* loop in Fig. 2.4.4.2). Note that the packet for _display_id of 7 has two sets of _display_conn_ values giving connections to atom sites 1 and 4 (the other connections to site 7 appear in the next two packets). Data items that appear in looped lists are identified in the MIF dictionary (see Chapter 4.8) as having the attribute _list set to either 'yes' or 'both'. Other relationships between looped data items are also specified in the dictionary.

Figure 2.4.4.2. MIF coding of atom properties (including 3D coordinates), bond properties and display information for (+)-3-bromocamphor.

2.4.4.3. Save frames

Save frames are employed in a MIF to encapsulate grouped data for efficient cross-referencing. If a set of data needs to appear repeatedly in a data application, it is efficient to place these data in an addressable save frame. Molecular fragments, such as amino-acid units, are a case in point. A save frame is opened by the statement save_framecode and closed by a save_ statement. It can be referenced within the parent data block using the value $framecode, where framecode matches the string in the save_framecode statement. Note that all data names must be unique within a save frame, but the same data names may appear in other save frames or in the parent data block. Save frames may not contain other save frames, but save-frame references ($framecode) may appear in other save frames.

Save frames can be used in a MIF for many purposes. A simple application, the storage of alternative 3D conformational representations describing cyclohexane, is illustrated in Fig. 2.4.4.3. Within the STAR syntax, save-frame references ($framecode) may occur before or after the save-frame definition within any data block. MIF preserves this basic STAR syntax. Save frames are particularly useful for defining commonly referenced structural templates, and examples of this facility are discussed and illustrated (Figs. 2.4.7.1 and 2.4.7.2) in Section 2.4.7.
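How a $framecode reference might be resolved by software can be sketched in Python as follows. The sketch assumes that a data block has already been parsed into two dictionaries, one holding its save frames keyed by framecode and one holding its ordinary data items. Because all frames are collected before any lookup, a reference may appear before or after the frame definition in the file. The frame codes and data names used here are hypothetical.

```python
def resolve(value, frames):
    """Return the referenced save frame if value is '$framecode', else value."""
    if isinstance(value, str) and value.startswith("$"):
        framecode = value[1:]
        if framecode not in frames:
            raise KeyError(f"no save frame named {framecode!r} in this block")
        return frames[framecode]
    return value

# Hypothetical parsed data block: two save frames and one item referring to one.
frames = {
    "chair": {"_example_label": "chair conformation"},
    "boat": {"_example_label": "boat conformation"},
}
items = {"_example_preferred": "$chair", "_example_count": "6"}

resolved = {name: resolve(value, frames) for name, value in items.items()}
print(resolved["_example_preferred"])
print(resolved["_example_count"])
```

Each frame has its own dictionary, which reflects the rule that data names need only be unique within a save frame.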

Figure 2.4.4.3. Atom and bond properties for cyclohexane, together with 3D coordinate representations of three alternative conformations: chair, boat and twisted boat.

2.4.4.4. Data blocks

A data block is a sequence of unique data items or save frames. It is opened with a data_blockcode statement and closed by another data-block statement, a global_ statement (see below) or the end of the file. The blockcode string identifies the block within the file. Examples of data blocks are shown in Figs. 2.4.4.2, 2.4.4.3 and 2.4.6.1. Each data block in a file must have a unique blockcode.
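The uniqueness requirement on blockcodes can be checked with a short Python sketch. It makes a simplifying assumption: every line that begins with data_ is a data-block statement, so any line of a multi-line text value that happens to begin with data_ would be miscounted. The blockcodes in the sample are hypothetical.

```python
import re
from collections import Counter

def duplicate_blockcodes(text):
    """Return the blockcodes that open more than one data block, sorted."""
    # Assumption: a line starting with 'data_' is always a block header.
    codes = re.findall(r"(?m)^data_(\S+)", text)
    counts = Counter(codes)
    return sorted(code for code, n in counts.items() if n > 1)

# Hypothetical file contents with one repeated blockcode.
sample = "data_mol1\n_example_item 1\ndata_mol2\n_example_item 2\ndata_mol1\n"
print(duplicate_blockcodes(sample))
```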

2.4.4.5. Global blocks

A global block is similar to a data block except that it is opened with a global_ statement and contains data that are common or 'default' to all subsequent data blocks in a file. Global data items remain active until re-specified in a subsequent data block or global block.
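The way global data act as defaults can be sketched in Python. The sketch assumes the file has already been parsed into an ordered list of sections, each a (kind, blockcode, items) tuple. It also adopts one interpretation that is an assumption, not something stated in the text: a value re-specified inside a data block overrides the global value for that block only, whereas a value re-specified in a later global block replaces the default for all blocks that follow. All names and values are hypothetical.

```python
def effective_blocks(sections):
    """Return {blockcode: items} with active global defaults applied.

    sections is an ordered list of (kind, blockcode, items) tuples, where
    kind is "global" or "data" and blockcode is None for global blocks.
    """
    active_globals = {}
    result = {}
    for kind, blockcode, items in sections:
        if kind == "global":
            # A later global block re-specifies (or adds to) the defaults.
            active_globals.update(items)
        elif kind == "data":
            # Assumption: block-level values override defaults locally only.
            result[blockcode] = {**active_globals, **items}
        else:
            raise ValueError(f"unknown section kind {kind!r}")
    return result

# Hypothetical file layout: global, data, data, global, data.
sections = [
    ("global", None, {"_example_units": "angstrom"}),
    ("data", "mol1", {"_example_name": "first"}),
    ("data", "mol2", {"_example_name": "second", "_example_units": "pm"}),
    ("global", None, {"_example_units": "nm"}),
    ("data", "mol3", {"_example_name": "third"}),
]
for code, items in effective_blocks(sections).items():
    print(code, items)
```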

In some applications it may be efficient to place data that are common to all data blocks within a global block. In particular, save frames may be defined within global blocks and then referenced in subsequent data blocks [this statement corrects an error in Hall & Spadaccini (1994)]. Examples of global data are shown in Figs. 2.4.7.1 and 2.4.7.2, in which a variety of frequently referenced structural units are encapsulated within save frames specified in global blocks.

References

Hall, S. R. (1991). The STAR File: a new format for electronic data transfer and archiving. J. Chem. Inf. Comput. Sci. 31, 326–333.
Hall, S. R., Allen, F. H. & Brown, I. D. (1991). The Crystallographic Information File (CIF): a new standard archive file for crystallography. Acta Cryst. A47, 655–685.
Hall, S. R. & Cook, A. P. F. (1995). STAR Dictionary Definition Language: initial specification. J. Chem. Inf. Comput. Sci. 35, 819–825.
Hall, S. R. & Spadaccini, N. (1994). The STAR File: detailed specifications. J. Chem. Inf. Comput. Sci. 34, 505–508.







































