International Tables for Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 3.2, pp. 98-102

Section 3.2.3. Analysis

S. R. Hall,a* P. M. D. Fitzgeraldb and B. McMahonc

aSchool of Biomedical and Chemical Sciences, University of Western Australia, Crawley, 6009, Australia, bMerck Research Laboratories, Rahway, New Jersey, USA, and cInternational Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England


The categories relevant to the structural analysis are as follows:

Refinement techniques and results (§[link])
REFINE group
The reflections used in the refinement (§[link])
REFLN group

In the small-molecule and inorganic studies for which the core dictionary was designed, phasing and structure solution are almost routine, and the dictionary provides few specific fields for recording the details of the structure solution process: _atom_sites_solution_primary, _atom_sites_solution_secondary and _atom_sites_solution_hydrogens (Section[link]); _computing_structure_solution (Section[link]); and _publ_section_exptl_solution (Section[link]). (In contrast, the macromolecular CIF includes extensive details of phasing.) Refinement, however, still allows for a wide range of techniques, practices and interpretation, and there are a large number of data names to allow a full account of the refinement strategy to be given. To complement this, several categories exist to provide a detailed listing and annotation of the structure factors and their treatment according to shells of resolution or other sorting criteria.

Structure refinement


The data items in these categories are as follows:

(a) REFINE [Scheme scheme26]

(b) REFINE_LS_CLASS [Scheme scheme27]

The bullet ([\bullet]) indicates a category key. The arrow ([\rightarrow]) is a reference to a parent data item. The dagger ([\dagger]) indicates a deprecated item, which should not be used in the creation of new CIFs.

Example[link] shows how the data names in the REFINE category are used. Most of the dictionary entries are detailed and fully explanatory, so only a few points that might require special care are mentioned here.

Example Summary of refinement results.

[Scheme scheme28]

Two groups of older data names have been superseded by new names that are functionally equivalent, but represent a more correct terminology. One group is of names that include the component `_obs' used to indicate `observed' reflections; this has been replaced by the component `_gt' indicating that the measured values are greater than a threshold recorded elsewhere (as the value of _reflns_threshold_expression). The other group replaces the component `_esd' (for estimated standard deviation) with `_su' (for standard uncertainty).
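
The renaming described above can be handled in reading software with a small lookup table. The following is a minimal sketch, not a definitive implementation: the table lists only a few illustrative pairs, the function name `modernize_names` is hypothetical, and a data block is assumed to be already parsed into a plain dictionary of data names and string values. The complete set of deprecated names should be taken from the dictionary itself.

```python
# Illustrative subset only: a few deprecated core-CIF names and their
# replacements. Consult the dictionary for the complete list.
DEPRECATED_TO_CURRENT = {
    "_refine_ls_R_factor_obs": "_refine_ls_R_factor_gt",
    "_reflns_number_observed": "_reflns_number_gt",
    "_reflns_observed_criterion": "_reflns_threshold_expression",
    "_refine_ls_shift/esd_max": "_refine_ls_shift/su_max",
}


def modernize_names(block):
    """Return a copy of a data block (dict of name -> value) using current names.

    A deprecated name is renamed only when the current name is absent,
    so a value given under the current name is never overwritten.
    """
    # First keep every item that is not a known deprecated name.
    result = {
        name: value
        for name, value in block.items()
        if name not in DEPRECATED_TO_CURRENT
    }
    # Then carry over deprecated items under their current names.
    for name, value in block.items():
        current = DEPRECATED_TO_CURRENT.get(name)
        if current is not None and current not in result:
            result[current] = value
    return result


old_block = {"_refine_ls_R_factor_obs": "0.041", "_reflns_number_observed": "1200"}
new_block = modernize_names(old_block)
```

The sketch compares names exactly as written; a real reader would also need to allow for the case-insensitivity of CIF data names.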

A number of data names describe the extinction coefficient and the method used to determine it. Note that a default value (Zachariasen) is given in the dictionary for the method (_refine_ls_extinction_method); this only makes sense if this data item is missing from the data block but a value of _refine_ls_extinction_coef is present. This can complicate the design of software to read CIFs, which might assign to any missing data name a default value given by the dictionary.
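
One way for reading software to respect this condition is to apply the dictionary default only when the extinction coefficient is actually present. The sketch below assumes a data block already parsed into a dictionary; the function name `extinction_method` and its `dictionary_default` parameter are hypothetical.

```python
def extinction_method(block, dictionary_default="Zachariasen"):
    """Return the extinction method for a data block (dict of name -> value).

    The dictionary default is applied only when the method is missing
    AND an extinction coefficient was reported; otherwise None is
    returned, meaning that no extinction correction is described.
    """
    method = block.get("_refine_ls_extinction_method")
    if method is not None:
        return method
    if "_refine_ls_extinction_coef" in block:
        return dictionary_default
    return None


with_coef = extinction_method({"_refine_ls_extinction_coef": "0.0021"})
without_coef = extinction_method({})
```

Returning `None` rather than the default for a block with no extinction items avoids implying that a correction was applied when none was reported.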

Care is also needed with _refine_ls_hydrogen_treatment, which describes the treatment of hydrogen atoms in the refinement. Clearly, the data item only has meaning if there were hydrogen atoms in the model (although, since in this case the default value is undef for `undefined', it could be argued that the default is appropriate even when hydrogen atoms were not included in the model).

The weighting scheme used in the refinement is described by the two data names _refine_ls_weighting_scheme and _refine_ls_weighting_details. The first of the two can take only one of the three values sigma (weights assigned based on measured standard uncertainties), unit (unit or no weights applied) or calc (calculated weights applied). The actual mathematical expression used in the weighting scheme should be stated in _refine_ls_weighting_details.

A wide variety of `residual structure-factor difference measures', referred to as R factors, are used in crystallography as indicators of refinement quality. The core CIF dictionary contains definitions for the three most commonly used R factors. The `conventional R factor' is defined as [R = {{ \textstyle\sum|F({\rm meas}.)-F({\rm calc}.)|} \over {\textstyle\sum|F({\rm meas}.)|}},] where [F({\rm meas}.)] and [F({\rm calc}.)] are the measured and calculated structure factors, respectively. In the data item _refine_ls_R_factor_all, the sum used in the calculation is taken over all the reflections collected, whereas in the data item _refine_ls_R_factor_gt, the sum is taken over reflections with a value greater than the limit specified by _reflns_threshold_expression. In both cases, the reflections included in the calculation may be limited to those between specified resolution limits.
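
As an illustration of the definition above, the conventional R factor can be computed directly from paired lists of measured and calculated values. This is a minimal sketch: the function name `conventional_r` is hypothetical, the inputs are assumed to be structure-factor amplitudes on a common scale, and the numbers are invented for the arithmetic only.

```python
def conventional_r(f_meas, f_calc):
    """Conventional R factor: sum|F(meas) - F(calc)| / sum|F(meas)|.

    Both arguments are equal-length sequences of structure-factor
    amplitudes (assumed here to be on the same scale).
    """
    if len(f_meas) != len(f_calc):
        raise ValueError("f_meas and f_calc must have the same length")
    denominator = sum(abs(fm) for fm in f_meas)
    if denominator == 0:
        raise ValueError("sum of |F(meas)| is zero")
    numerator = sum(abs(fm - fc) for fm, fc in zip(f_meas, f_calc))
    return numerator / denominator


# Invented values: differences of 1, 2 and 0 over a total of 60.
r_all = conventional_r([10.0, 20.0, 30.0], [9.0, 22.0, 30.0])
```

The same function gives the `_gt` variant if the lists are first restricted to the reflections that exceed the stated threshold.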

This R factor is calculated from the F values, regardless of whether the structure-factor coefficient [|F|], [|F|^2] or I was actually used in the refinement, and is often taken as a convenient indicator of the relative quality of a structure determination. As most structure refinements used to be performed on [|F|], it allows a structure determined today to be compared with an older study.

Many refinements are now carried out on [|F|^2], although some may still use the absolute value of the structure factor [|F|] or the net intensity I. The weighted residual factor wR and goodness of fit S for a refinement should be reported according to the coefficients actually used in the refinement. For example, the weighted residual over all reflections, _refine_ls_wR_factor_all, is defined as [wR=\Bigg({{\sum w[Y({\rm meas}.)-Y({\rm calc}.)]^2} \over {\sum wY({\rm meas}.)^2 }}\Bigg)^{1/2},] where w represents the weights and Y represents the structure-factor coefficient, either [|F|], [|F|^2] or I as specified by _refine_ls_structure_factor_coef.
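
The weighted residual can be sketched in the same way. In the example below the function name `weighted_r` is hypothetical, Y may be any of the three coefficients provided that measured and calculated values use the same one, and the weights are assumed to be supplied by the caller.

```python
import math


def weighted_r(y_meas, y_calc, weights):
    """Weighted residual wR = sqrt(sum w (Ymeas - Ycalc)^2 / sum w Ymeas^2).

    Y is whichever coefficient was refined against (|F|, |F|^2 or I);
    all three sequences must have the same length.
    """
    if not (len(y_meas) == len(y_calc) == len(weights)):
        raise ValueError("all input sequences must have the same length")
    numerator = sum(w * (ym - yc) ** 2 for ym, yc, w in zip(y_meas, y_calc, weights))
    denominator = sum(w * ym ** 2 for ym, w in zip(y_meas, weights))
    if denominator == 0:
        raise ValueError("sum of w * Y(meas)^2 is zero")
    return math.sqrt(numerator / denominator)


# Invented |F|^2 values with unit weights.
wr = weighted_r([100.0, 400.0], [90.0, 410.0], [1.0, 1.0])
```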

This distinction between the conventional R factor, which is invariably calculated using F values, and the wR and S factors also holds for similar expressions defined on subsets of the reflections, e.g. _reflns_class_wR_factor_all.

Note that data names are also provided for reporting unweighted residuals on [|F|^2] or I, but these are rarely used in practice, with the exception of R(I) in Rietveld refinements against powder data, where it is generally called the Bragg R factor, [R_{\rm Bragg}] or [R_B].

The data items in the REFINE_LS_CLASS category are similar to several in the general REFINE category, but correspond to values for separate reflection classes as described in the REFLNS_CLASS category. The data name _refine_ls_class_code identifies the individual classes through a direct match with a corresponding value of _reflns_class_code.

Reflection measurements


The categories describing the reflections used in the refinement are as follows:

REFLN group
Individual reflections (§[link])
Groups of reflections (§[link])

The main category in this group is REFLN, which stores the list of reflections used in the structure refinement process, their associated structure factors and information about how each reflection was handled. The distinction between the REFLN (singular) category and the REFLNS (plural) category parallels the distinction between the categories DIFFRN_REFLN and DIFFRN_REFLNS: data items in the REFLN category store information about individual reflections, while data items in the REFLNS category store information about the complete set of reflections, or about subsets of reflections selected by shells of resolution, scaling factors or other criteria.

Individual reflections


The data items in this category are as follows:

REFLN [Scheme scheme30]

The bullet ([\bullet]) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow ([\rightarrow]) is a reference to a parent data item. The dagger ([\dagger]) indicates a deprecated item, which should not be used in the creation of new CIFs.

Example[link] shows a typical structure-factor listing produced by a refinement program. This kind of structure-factor listing is suitable for deposition with a journal or a database. The Miller indices for each reflection are accompanied by the calculated and measured values of the quantity used in the refinement, and the standard uncertainty derived from the measurement. There is also an indication of whether each reflection was included in the refinement and in the calculation of R factors.

Example Structure-factor listing.

[Scheme scheme29]

In this example, the squared structure factors [|F|^2] are listed. When refinement is performed against the structure factors F or the intensities I, the data items _refln_F_calc or _refln_intensity_calc and the corresponding data names for the measured values and uncertainties should be used.

Individual calculated structure-factor components [A=|F|\cos\varphi] and [B=|F|\sin\varphi] may also be listed, along with the phase [\varphi], using the data names _refln_A_calc, _refln_B_calc and _refln_phase_calc. Corresponding measured values have equivalent *_meas names.
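
The relationship between the amplitude, the phase and the components A and B can be sketched as follows. The function names are hypothetical, and the phase is assumed to be expressed in degrees; the units given in the dictionary definition should be checked before relying on this.

```python
import math


def ab_components(f_amplitude, phase_degrees):
    """Return (A, B) with A = |F| cos(phi) and B = |F| sin(phi).

    The phase is assumed to be given in degrees.
    """
    phi = math.radians(phase_degrees)
    return f_amplitude * math.cos(phi), f_amplitude * math.sin(phi)


def amplitude_and_phase(a, b):
    """Inverse operation: return (|F|, phi in degrees, 0 <= phi < 360)."""
    return math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360.0


a_calc, b_calc = ab_components(2.0, 90.0)
f_calc, phase_calc = amplitude_and_phase(0.0, -3.0)
```

Storing A and B rather than the amplitude and phase avoids the discontinuity of the phase at 0 and 360 degrees, although both forms carry the same information.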

The _refln_include_status flag is used to indicate whether reflections were used in the refinement and in the calculation of R factors, and if they were not used, to give the reason for exclusion of the reflection from the refinement. The flag o, which indicates that a reflection was used in the refinement, was originally chosen to indicate that the value of the reflection was higher than the limit specified by _reflns_observed_criterion and that the reflection was thus `observed'. The data item _reflns_observed_criterion is now deprecated in favour of _reflns_threshold_expression, and the value o is now taken to indicate not only that the reflection has an intensity suitable for inclusion in the refinement, but also that the reflection satisfies all other criteria used to select reflections for inclusion in the refinement.

Various other flags indicate reflections that were not included in the refinement. Reflections outside the range of d spacings bounded by the values _refine_ls_d_res_high and _refine_ls_d_res_low are flagged with h or l, respectively. Reflections within the resolution limits but below the intensity threshold are flagged with <. Systematically absent reflections are flagged with -. Sometimes a value can be identified as having a systematic error; these reflections can be flagged with x. However, great care must be taken in excluding reflections with apparently `anomalous' structure factors (i.e. where the measured values are substantially different from the calculated ones), so as not to introduce bias into the refinement.
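
The flags discussed above can be tabulated and used to select the reflections that entered the refinement. The sketch below covers only the flags mentioned in this section, with descriptions paraphrased from the text rather than copied from the dictionary; the function names and the representation of a reflection as an `(h, k, l, flag)` tuple are assumptions made for illustration.

```python
from collections import Counter

# Flags mentioned in this section; descriptions are paraphrased.
INCLUDE_STATUS_MEANING = {
    "o": "included in the refinement and in the calculation of R factors",
    "<": "within the resolution limits but below the intensity threshold",
    "-": "systematically absent",
    "x": "excluded because of an identified systematic error",
    "h": "outside the limit given by _refine_ls_d_res_high",
    "l": "outside the limit given by _refine_ls_d_res_low",
}


def summarize_status(flags):
    """Count reflections per flag, rejecting flags not listed above."""
    flags = list(flags)
    unknown = set(flags) - set(INCLUDE_STATUS_MEANING)
    if unknown:
        raise ValueError(f"unrecognized include-status flags: {sorted(unknown)}")
    return Counter(flags)


def used_in_refinement(reflections):
    """Select the (h, k, l, flag) tuples whose flag is 'o'."""
    return [refl for refl in reflections if refl[3] == "o"]


listing = [(1, 0, 0, "o"), (0, 0, 1, "-"), (2, 1, 0, "<"), (1, 1, 1, "o")]
counts = summarize_status(flag for _, _, _, flag in listing)
included = used_in_refinement(listing)
```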

The flag _refln_refinement_status is used specifically to indicate whether a reflection was included in or excluded from the refinement. Use of _refln_include_status to provide more information about each reflection is greatly preferred.

Other data names in this category allow the recording of specific information about each reflection, such as the symmetry reinforcement factor [\varepsilon], the number of reflections symmetry-equivalent under the Laue symmetry, the d spacing, the mean path length through the crystal [\smash{\bar t}], the [(\sin\theta)/\lambda] value and, in the case of Laue experiments, the mean wavelength of the radiation. (For polychromatic radiation, the wavelength information might instead be given by _refln_wavelength_id, which is a code identifying a matching entry in the DIFFRN_RADIATION category.)

Other codes provide links to identifiers in other categories. The _refln_class_code identifies a set of reflections binned as described by entries in the REFLNS_CLASS category. _refln_scale_group_code identifies groups of reflections to which the same structure-factor scaling has been applied.

Note that the values of the Miller indices in this list must correspond to the cell defined by the lengths and angles recorded in the CELL category; they may, however, be different from the Miller indices in the DIFFRN_REFLN list if a transformation of the original cell has taken place. In this case, the transformation matrix is given using the _diffrn_reflns_transf_matrix_* items.
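
Applying such a transformation to a set of indices is a single matrix product. The sketch below assumes a row-vector convention, in which the original indices (h k l) multiply the matrix from the left; this convention, the function name `transform_indices` and the example matrix are all assumptions, and the dictionary definition of the matrix elements should be checked before use.

```python
def transform_indices(hkl, matrix):
    """Transform Miller indices as a row vector times a 3x3 matrix.

    hkl is a sequence (h, k, l); matrix is a 3x3 nested sequence whose
    element matrix[i][j] is row i+1, column j+1. The row-vector
    convention used here is an assumption.
    """
    return tuple(
        sum(hkl[i] * matrix[i][j] for i in range(3))
        for j in range(3)
    )


# Invented example: interchange of the first two axes with reversal of the third.
swap_ab = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, -1],
]
new_hkl = transform_indices((1, 2, 3), swap_ab)
```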

The usual use of a CIF as an archive of a completed structure determination implies that the values given in the REFLN list are derived from the final cycle of refinement, but this is not a formal requirement. Care should be taken when preparing a CIF for archiving that the structural model corresponds to the refinement cycle summarized in the accompanying REFLN table, especially if the file is constructed from fragments output from different programs.

Groups of reflections


The data items in these categories are as follows:

(a) REFLNS [Scheme scheme31]

(b) REFLNS_CLASS [Scheme scheme32]

(c) REFLNS_SCALE [Scheme scheme33]

(d) REFLNS_SHELL [Scheme scheme34]

The bullet ([\bullet]) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The dagger ([\dagger]) indicates a deprecated item, which should not be used in the creation of new CIFs.

The data items in the REFLNS category describe properties or attributes of the complete set of reflections used in the structure refinement. Several are derivative and may be obtained from the information in the reflections list, but it is convenient to present them separately so that they do not need to be calculated again. They can also be used to check the consistency of the reflections list.

The _reflns_limit_* data items define the upper and lower bounds on the Miller indices and on the interplanar d spacings.

The _reflns_threshold_expression is a text field describing the criterion applied to mark individual reflections as `significantly intense' (i.e. distinct from the background level). This is typically expressed as a multiple of the standard uncertainty on the quantity used in refinement, e.g. I>2u(I).

The number of reflections with values higher than the threshold is reported in _reflns_number_gt. The total number of reflections measured is given by _reflns_number_total. Although the use of these data names appears to be obvious, different practices have been used in the past to report total numbers (e.g. by neglecting symmetry-equivalent reflections) and the definitions in the dictionary should be consulted. Both numbers may contain Friedel-equivalent reflections (those which are symmetry-equivalent under the Laue symmetry but inequivalent under the crystal class).
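
For a threshold of the form I > n u(I), the two counts can be derived from a reflection list as sketched below. The function name `count_reflections`, the `(intensity, su)` tuple layout and the default n = 2 are assumptions; whether symmetry-equivalent reflections have already been merged depends on how the list was prepared, as cautioned above.

```python
def count_reflections(reflections, n_sigma=2.0):
    """Return (total, number_gt) for a list of (intensity, su) pairs.

    A reflection counts towards number_gt when I > n_sigma * u(I).
    No merging of symmetry equivalents is attempted here.
    """
    total = len(reflections)
    number_gt = sum(1 for intensity, su in reflections if intensity > n_sigma * su)
    return total, number_gt


# Invented intensities and standard uncertainties.
number_total, number_gt = count_reflections(
    [(100.0, 5.0), (8.0, 5.0), (10.0, 5.0), (-2.0, 3.0)]
)
```

A reflection whose intensity exactly equals the threshold is not counted, because the comparison is strict.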

The proportion of Friedel-related reflections present is reported separately by _reflns_Friedel_coverage, defined as [(N_C-N_L)/N_L], where [N_C] is the number of reflections obtained on averaging under the symmetry of the crystal class and [N_L] is the number obtained on averaging under the Laue class. The definition in the dictionary provides examples of how the value of this data name may be used as an indicator of the fraction of the available reciprocal space sampled in the diffraction experiment.
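
The expression above is simple to evaluate; the following sketch, with a hypothetical function name and invented counts, shows the arithmetic.

```python
def friedel_coverage(n_crystal_class, n_laue):
    """Friedel coverage (N_C - N_L) / N_L.

    n_crystal_class: reflections after averaging under the crystal class.
    n_laue: reflections after averaging under the Laue class.
    """
    if n_laue <= 0:
        raise ValueError("n_laue must be positive")
    return (n_crystal_class - n_laue) / n_laue


# Invented counts: 1900 reflections unique under the crystal class,
# 1000 unique under the Laue class.
coverage = friedel_coverage(1900, 1000)
```

The value is zero when averaging under the crystal class and under the Laue class gives the same number of reflections, that is, when no Friedel-related reflections are kept separate.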

The deprecated data names _reflns_observed_criterion and _reflns_number_observed reflect the old use of `observed' as a term describing significantly intense reflections. They should not be used in the creation of new CIFs, but are retained to ensure that the information can be extracted from old CIFs.

The free-text field _reflns_special_details can be used to discuss any aspects of the reflections list not covered by other data names. It is recommended that information about the averaging of symmetry-equivalent reflections (including Friedel pairs) should be given here.

The REFLNS_CLASS category is used to summarize the properties of subsets of the reflection list. The data names are analogous to several in the REFLNS and REFINE categories, but are applied to individual classes of reflections labelled by _reflns_class_code and described by _reflns_class_description (see Example[link]).

Example Description of subsets of the reflection list.

[Scheme scheme35]

Individual reflections in the structure-factor listing can be recognized through the matching value of _refln_class_code as belonging to a particular class labelled by _reflns_class_code.
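
This matching of codes can be sketched as a simple grouping operation. In the example below the function name `group_by_class`, the dictionary representation of each reflection and the class codes and descriptions are invented for illustration.

```python
def group_by_class(reflections, class_descriptions):
    """Group reflections by _refln_class_code.

    reflections: list of dicts, each with a '_refln_class_code' entry.
    class_descriptions: dict mapping each _reflns_class_code to its
    _reflns_class_description.
    Raises KeyError if a reflection refers to an undefined class.
    """
    groups = {code: [] for code in class_descriptions}
    for refl in reflections:
        code = refl["_refln_class_code"]
        if code not in groups:
            raise KeyError(f"no REFLNS_CLASS entry for class code {code!r}")
        groups[code].append(refl)
    return groups


# Invented class codes and descriptions.
classes = {"Main": "main reflections", "Sat1": "first-order satellites"}
listing = [
    {"hkl": (1, 0, 0), "_refln_class_code": "Main"},
    {"hkl": (1, 0, 1), "_refln_class_code": "Sat1"},
    {"hkl": (2, 0, 0), "_refln_class_code": "Main"},
]
grouped = group_by_class(listing, classes)
```

Raising an error for an unmatched code, rather than silently creating a new group, makes inconsistencies between the REFLN and REFLNS_CLASS lists visible.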

Although classes can be assigned according to arbitrary criteria, the specific case for which the REFLNS_CLASS category was designed was the partitioning of the reflection list into contributions from different components in incommensurately modulated structures. However, the formalism is general and other binning strategies can be described. Note, however, that the specific case of processing of reflections by shells of resolution (in macromolecular crystallography, for example) is handled explicitly by the REFLNS_SHELL category.

The category REFLNS_SCALE provides a listing of the scale factors applied to individual reflections sharing a common value of _refln_scale_group_code. Each value is indexed by the matching identifier _reflns_scale_group_code of this category.

The REFLNS_SHELL category describes the properties of separate resolution shells of reflections and is a special case of the binning of reflections into classes (compare REFLNS_CLASS above).

Each shell is defined by an upper and lower resolution limit (_reflns_shell_d_res_high and *_low), and for each shell there are data names for the number of reflections measured and exceeding a threshold of significance, for the percentage of geometrically possible reflections collected, and for the ratios of the mean intensities to their standard uncertainties.
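
Some of these per-shell quantities can be derived from a reflection list as sketched below. The function name `shell_summary`, the `(d, intensity, su)` tuple layout, the half-open interval used for shell membership and the 2u(I) threshold are all assumptions; the ratio is computed here as the mean intensity divided by the mean standard uncertainty, which should be checked against the dictionary definition.

```python
def shell_summary(reflections, d_res_high, d_res_low, n_sigma=2.0):
    """Summarize one resolution shell.

    reflections: iterable of (d, intensity, su) tuples.
    A reflection belongs to the shell when d_res_high <= d < d_res_low,
    where d_res_high is the smaller d spacing.
    """
    in_shell = [(i, u) for d, i, u in reflections if d_res_high <= d < d_res_low]
    number_measured = len(in_shell)
    number_gt = sum(1 for i, u in in_shell if i > n_sigma * u)
    if number_measured == 0:
        ratio = None
    else:
        mean_i = sum(i for i, _ in in_shell) / number_measured
        mean_u = sum(u for _, u in in_shell) / number_measured
        ratio = mean_i / mean_u if mean_u > 0 else None
    return {
        "number_measured": number_measured,
        "number_gt": number_gt,
        "mean_i_over_mean_u": ratio,
    }


# Invented data: two reflections fall in the 1.2-2.0 shell.
data = [(1.5, 100.0, 10.0), (1.8, 20.0, 10.0), (2.5, 50.0, 5.0)]
shell = shell_summary(data, d_res_high=1.2, d_res_low=2.0)
```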

[R_{\rm merge}] values are also defined for each shell of resolution (both for all measured reflections and for significantly intense ones).

This category also contains a number of deprecated data names reflecting older terminology and notation. Such data names should not be used in creating new CIFs, but will need to be recognized by CIF-reading software in order to process old CIFs.
