International
Tables for
Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 5.3, p. 517

Section 5.3.6.2. PyCifRW: CIF reading and writing in Python

B. McMahona*

aInternational Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
Correspondence e-mail: bm@iucr.org

5.3.6.2. PyCifRW: CIF reading and writing in Python

| top | pdf |

PyCifRW (Hester, 2006[link]) is a simple CIF input/output utility written in Python. It does not validate content against dictionaries, but it does provide a robust parser that has been extensively tested against various test files containing subtle syntactic features and against the collection of over 18 000 macromolecular CIFs available from the Protein Data Bank. The parser was implemented using the Yapps2 parser generator (Patel, 2002[link]) and is based on the draft Backus–Naur form (BNF) developed during a community exercise to review the CIF specification of Chapter 2.2[link] .

As with the Perl library discussed above, PyCifRW presents an object-oriented set of classes and methods. Two classes are provided, CifFile and CifBlock.

A CifFile object provides an associative array of CifBlock objects, accessed by data-block name.

The methods available for the CifFile type are: ReadCif(filename), which initializes or reinitializes a file to contain the CIF contents; GetBlocks(), which returns a list of the data-block names in the file; NewBlock(blockname, [block contents]), which adds a new data block to the file object; and WriteOut(comment), which returns the contents of the current file as a CIF-conformant string, with an optional comment at the beginning. For the NewBlock method, the optional block contents must be a valid CifBlock object, as described below. The NewBlock method returns the name of the new block created, which will not be the requested blockname if a data block of the same name already exists (this conforms to the STAR and CIF requirement that data-block names must be unique within a file).

A CifBlock object represents the contents of a data block. The methods available to retrieve or manipulate the contents are: GetCifItem(itemname), which will return the value of the data item with data name given by itemname (and which can be a single value or an array of looped values); AddCifItem(data), which adds data to the current block, where data represents either a data name and an associated single value, or, for the case of looped data, a tuple containing an array of data names and an array of arrays of associated data values; RemoveCifItem(dataname), to remove the specified data item from the current block; and GetLoop(dataname), which returns a list of all data items occurring in the loop containing the data name provided. If dataname does not represent a looped data item, an error is returned.

The GetLoop method is important for the proper handling of looped data, and care is taken to handle loops robustly and efficiently. Items that are initially looped together are kept in the same loop structure.

Both CifFile and CifBlock objects act as Python mapping objects, which has the advantage that the value of a data item can be read or changed using an intuitive square-bracket notation. For example, a program can retrieve the value of a data item named _my_data_item_name in block `myblockname' of a previously opened file cf using the following syntax: [Scheme scheme9] The returned value is either a single item, or a Python array of values if the data item occurs in a loop. Values are set in an analogous way and other common operations with mapping objects are also implemented.

References

Hester, J. R. (2006). A validating CIF parser: PyCIFRW. J. Appl. Cryst. 39, 621–625.Google Scholar
Patel, A. J. (2002). Yapps: Yet Another Python Parser System. http://theory.stanford.edu/~amitp/yapps/ .Google Scholar








































to end of page
to top of page