International Tables for Crystallography
Volume G: Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 5.7, pp. 562–564

Section 5.7.3. CIF and other journals

P. R. Strickland (a), M. A. Hoyland (a) and B. McMahon (a)*

(a) International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
* Correspondence e-mail: bm@iucr.org

5.7.3. CIF and other journals


Not every journal will benefit to the same extent from handling CIFs. For many journals, structure reports are secondary to the main purpose of most articles, and CIF data will more usually be deposited as supplementary or supporting documents, while only a summary (if anything) of the structure is reported in the article body.

Nevertheless, the ability to extract data from CIFs automatically and the ability of much crystallographic software to read CIFs mean that even journals that do not specialize in crystallography can provide a production stream that includes careful checking of crystal structure data. The IUCr continues to develop checkcif as a service which can be used by other publishers to enhance their checking of crystal structures, and there is considerable interest in this approach.
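As an illustration of how little machinery is needed to extract data from a CIF and apply a simple consistency test, the following minimal sketch reads single-line data items and flags cell angles that contradict a declared orthorhombic cell. It is not the checkcif algorithm; the function names are hypothetical, and loops, multi-line text fields and save frames are deliberately ignored.

```python
def parse_cif_items(text):
    """Collect simple `_name value` pairs from CIF text.

    Only single-line items are handled; looped items, multi-line text
    fields and save frames are skipped in this sketch.
    """
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # a bare data name, e.g. inside a loop header
        name, value = parts[0], parts[1].strip()
        # Remove matching single or double quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        items[name] = value
    return items


def numeric(value):
    """Return the number in a value such as '10.234(2)', dropping the s.u."""
    return float(value.split("(")[0])


def check_orthorhombic_angles(items):
    """List the cell-angle data names that differ from 90 degrees
    when the crystal system is declared orthorhombic."""
    if items.get("_symmetry_cell_setting", "").lower() != "orthorhombic":
        return []
    problems = []
    for name in ("_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma"):
        if name in items and abs(numeric(items[name]) - 90.0) > 1e-6:
            problems.append(name)
    return problems
```

A publisher's production stream could call `check_orthorhombic_angles(parse_cif_items(text))` on each deposited file and query the author if the returned list is not empty.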

All journals publishing the results of crystal structure determinations may easily collect the supporting data in CIF format and transfer the files to public databases, improving the accuracy and efficiency of the database-building procedures.

5.7.3.1. Including CIF data in an article


For journals other than those specializing in full-scale structure reports, including CIF data in tables or reports of structures within general articles is rather more problematic. The translation of CIF data into XML seems a promising route to explore, as journals and reference volumes are increasingly typeset from XML files. Traditionally, publishing has emphasized content markup that leads to a particular typographic representation. Modern trends are towards markup that tags the content by purpose, with the representation directed by external 'style files'. Consider Fig. 5.7.3.1, which shows the typeset representation of a set of data items in a CIF for a structural paper.

Figure 5.7.3.1. Typesetting of structural data. The contents of the CIF (a) are transformed into a typeset representation (b) that omits, annotates or reorders the incoming data according to context and the style rules of the journal.

First, it can be seen that several CIF data items are omitted from the printed representation, such as the International Tables space-group number and the Hall symbol for the space group. For compactness, the printed data value does not have a legend or annotation if the meaning of an item is clear from the context; thus, the crystal system and Hermann–Mauguin space-group symbol are printed without any accompanying text. The journal may also omit information that is implicit given other data; thus the cell angles are not printed for an orthorhombic cell. On the other hand, units, which are implicit in the definition of a CIF data item, are printed. Related items are grouped together in a single expression, as in the case of the θ range or the crystal dimensions. In some cases, numerical values have been rounded to meet the journal's policy.

All of these transformations are matters of style, but it can be seen that they are not always trivial mappings to single data names. The style files determining the transformation from a detailed explicit data tabulation in the initial CIF may need to implement complex logical tests to suit the requirements of the journal.
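The kind of logic such a style file has to carry can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function name is hypothetical, the choice of items and the two-decimal rounding of crystal dimensions are invented for the example, and no journal's actual rules are reproduced. The data names are those of the core CIF dictionary.

```python
def typeset_structure_summary(items):
    """Return print-ready lines for a few crystal-data items.

    `items` maps CIF data names to their string values. Items not
    mentioned here (e.g. the space-group number or Hall symbol) are
    simply not printed.
    """
    lines = []
    setting = items.get("_symmetry_cell_setting", "")
    # Crystal system and H-M symbol are printed without a legend.
    if setting:
        lines.append(setting.capitalize())
    if "_symmetry_space_group_name_H-M" in items:
        lines.append(items["_symmetry_space_group_name_H-M"])
    # Units are implicit in the CIF definitions but explicit in print.
    for axis in "abc":
        value = items.get("_cell_length_" + axis)
        if value is not None:
            lines.append(f"{axis} = {value} Å")
    # Cell angles are implicit for an orthorhombic cell, so omit them.
    if setting.lower() != "orthorhombic":
        for name, symbol in (("alpha", "α"), ("beta", "β"), ("gamma", "γ")):
            value = items.get("_cell_angle_" + name)
            if value is not None:
                lines.append(f"{symbol} = {value}°")
    # Related items are merged into a single expression.
    tmin = items.get("_cell_measurement_theta_min")
    tmax = items.get("_cell_measurement_theta_max")
    if tmin is not None and tmax is not None:
        lines.append(f"θ = {tmin}–{tmax}°")
    sizes = [items.get("_exptl_crystal_size_" + k) for k in ("max", "mid", "min")]
    if all(s is not None for s in sizes):
        # Assumed house rule for this sketch: two decimal places.
        rounded = [format(float(s), ".2f") for s in sizes]
        lines.append(" × ".join(rounded) + " mm")
    return lines
```

Even in this small sketch, one printed line may depend on several data names (the θ range, the crystal dimensions) and one data name may control whether others are printed at all (the crystal system and the cell angles).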

Fig. 5.7.3.2 shows the same extract in TeX, the markup and typesetting language that was used for several years to produce Acta Cryst. C. It can be seen from this extract that the actual markup maps very closely to the initial CIF. All the cell parameters, including the cell angles, are present in the source file. The expansion of the macros (e.g. \cellalpha) executes the logic required to determine whether the value is to be printed and generates the additional text surrounding the value. Each data name is mapped to a distinct macro (even if the macros themselves have identical or near-identical internal structure), which preserves the semantic labelling of the original CIF. These macros are maintained in a separate file referenced and executed by every invocation of the typesetting program.

Figure 5.7.3.2. Part of a TeX file used to print the article shown in Fig. 5.7.3.1(b).

In contrast, Fig. 5.7.3.3 shows part of the SGML now used to typeset Acta Cryst. C and to generate HTML versions of the articles online. It is immediately seen that the markup emphasizes typographic style and positioning, and there is no explicit labelling by semantic element. The only additional labelling is in the document structure: the individual items are marked up as 'list items' (<li>), but the arrangement of this list into a tabular form is a feature of the typesetting engine, not the SGML.

Figure 5.7.3.3. Part of the SGML file used to print the article shown in Fig. 5.7.3.1(b).

It is clear that the TeX macros provide a representation of the contents of the CIF that could easily be converted back to the initial input CIF. At present, such bidirectional translation is not possible from the SGML file.

A mapping to SGML that preserved semantic markup would therefore be preferable. It is most likely that suitable bidirectional translations would be based on XML.
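The reason the macro representation is reversible is that each data name has its own macro, so the label survives in the typesetting source. A minimal sketch of such a round trip follows. Only \cellalpha is named in the text; the other macro names, the brace-delimited argument syntax and the function names are assumptions made for the illustration, not the macros actually used for Acta Cryst. C.

```python
import re

# One macro per data name. Only "cellalpha" comes from the text; the
# other macro names are hypothetical.
MACRO_FOR = {
    "_cell_length_a": "celllengtha",
    "_cell_length_b": "celllengthb",
    "_cell_length_c": "celllengthc",
    "_cell_angle_alpha": "cellalpha",
    "_cell_angle_beta": "cellbeta",
    "_cell_angle_gamma": "cellgamma",
}
NAME_FOR = {macro: name for name, macro in MACRO_FOR.items()}

# Matches \macroname{value} where the value contains no nested braces.
MACRO_CALL = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def cif_to_tex(items):
    """Write one macro call per known data item, e.g. \\cellalpha{90}."""
    return "\n".join(
        f"\\{MACRO_FOR[name]}{{{value}}}"
        for name, value in items.items()
        if name in MACRO_FOR
    )


def tex_to_cif(tex):
    """Recover the data items from macro calls written by cif_to_tex."""
    return {
        NAME_FOR[macro]: value
        for macro, value in MACRO_CALL.findall(tex)
        if macro in NAME_FOR
    }
```

Because the cell angles are written to the source even when the macro expansion chooses not to print them, nothing is lost on the way back. Markup that records only typographic style offers no such key from which the data names could be recovered.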

5.7.3.2. CIF and XML


XML is a restricted form of SGML that is well suited to generating content for online browsing. Mature style transformation mechanisms for XML exist and others are under active development.

Section 5.3.8.2.1 describes one transformation to XML in the biological structures field, designed primarily for database interchange rather than publication. This transformation preserves the underlying data model of an mmCIF very closely, and one might anticipate similar XML transformations for small-molecule CIF applications and for publications. It is even possible that the XML transformations referred to in Chapter 5.3 could be used for publishing articles if suitable style transformations are developed, but this has not been tested yet.

One difficulty with a simple CIF-to-XML transformation is that, although it could be easily adapted to the publication of structure reports in dedicated journals, it would not necessarily be compatible with other XML implementations developed by an unspecialized publishing house. This could be avoided by the registration of an XML namespace covering transformed CIF data and the production of portable stylesheet transformations that could be adopted and modified to meet the requirements of different publishing houses. As yet, we know of no initiatives in this direction.
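To make the role of a namespace concrete, the following minimal sketch wraps CIF data items in namespace-qualified XML elements using the Python standard library. The namespace URI, the element names `datablock` and `item`, and the function names are all hypothetical, since no such namespace has been registered; the data name is carried in an attribute because some CIF data names (for example `_refine_ls_shift/su_max`) are not legal XML element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used here for illustration only.
CIF_NS = "http://example.org/ns/cif-core"


def cif_to_xml(block_name, items):
    """Build a namespaced XML tree from a mapping of CIF data items."""
    ET.register_namespace("cif", CIF_NS)
    root = ET.Element(f"{{{CIF_NS}}}datablock", {"name": block_name})
    for name, value in items.items():
        # The data name goes in an attribute, the value in the text.
        item = ET.SubElement(root, f"{{{CIF_NS}}}item", {"name": name})
        item.text = value
    return root


def xml_to_cif(root):
    """Recover the data items from a tree built by cif_to_xml."""
    return {
        item.get("name"): item.text
        for item in root.iter(f"{{{CIF_NS}}}item")
    }
```

Elements qualified in this way can be embedded in a publisher's own article XML without clashing with its element names, and a stylesheet written against the namespace would remain usable wherever the data are embedded.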

XML namespaces have been registered to safeguard the development of subject-specific methods of representation as part of a project by the International Union of Pure and Applied Chemistry (Becker, 2001). One markup language that falls within the scope of this project is Chemical Markup Language (CML) (Murray-Rust & Rzepa, 1999, 2001).

Further discussions of the relationship between CIF and XML representations and a proposal for extensions to certain CIF data values to accommodate the wider range of data structures permitted in XML are given by Bernstein (2000).

References

Becker, E. D. (2001). Secretary General's Report. Chem. Int. 23, 135.
Bernstein, H. J. (2000). xmlCIF: a proposal for faithful representation of Extensible Markup Language (XML) documents within Crystallographic Information File (CIF) data sets. http://www.bernstein-plus-sons.com/software/xmlCIF/
Murray-Rust, P. & Rzepa, H. S. (1999). Chemical markup, XML and the Worldwide Web. 1. Basic principles. J. Chem. Inf. Comput. Sci. 39, 928–942.
Murray-Rust, P. & Rzepa, H. S. (2001). Chemical markup, XML and the Worldwide Web. 2. Information objects and the CMLDOM. J. Chem. Inf. Comput. Sci. 41, 1113–1123.
