International Tables for Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 3.6, pp. 158-159

Section Overall description of the refinement

P. M. D. Fitzgerald,a* J. D. Westbrook,b P. E. Bourne,c B. McMahon,d K. D. Watenpaughe and H. M. Bermanf

aMerck Research Laboratories, Rahway, New Jersey, USA, bProtein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA, cResearch Collaboratory for Structural Bioinformatics, San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA, dInternational Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England, eretired; formerly Structural, Analytical and Medicinal Chemistry, Pharmacia Corporation, Kalamazoo, Michigan, USA, and fProtein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA


The data items in these categories are as follows:

(a) REFINE [Scheme scheme60]

(b) REFINE_FUNCT_MINIMIZED [Scheme scheme61]

The bullet (•) indicates a category key. The arrow (→) is a reference to a parent data item. Items in italics have aliases in the core CIF dictionary formed by changing the full stop (.) to an underscore (_) except where indicated by the ~ symbol. Data items marked with a plus (+) have companion data names for the standard uncertainty in the reported value, formed by appending the string _esd to the data name listed.

There is already an extensive set of data names in the REFINE category of the core dictionary, and Section[link] should be read with the present section. The only data items discussed in this section are entries in the mmCIF dictionary that do not have a counterpart in the core CIF dictionary. Analogues of a number of R factors in the core CIF dictionary have been added to the mmCIF dictionary to express these same R factors independently for the free and working sets of reflections. The remaining new data items have more specialized roles, which are discussed below.

The data item _refine.entry_id has been added to the REFINE category to provide the formal category key required by the DDL2 data model.

Many macromolecular structure refinements now use the statistical cross-validation technique of monitoring a `free' R factor (Brünger, 1997). Rfree is calculated in the same way as the conventional least-squares R factor, but using a small subset of reflections that are not used in the refinement of the structural model. Thus Rfree tests how well the model predicts experimental observations that are not themselves used to fit the model.
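The relationship between the free and working R factors can be illustrated with a minimal sketch. It assumes the conventional unweighted form R = Σ||Fo| − k|Fc|| / Σ|Fo|, with a single overall scale factor k; the function names, the `free_flags` list and the default scale of 1 are illustrative assumptions and are not taken from the dictionary.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Unweighted R factor: sum(| |Fo| - k|Fc| |) / sum(|Fo|)."""
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags, scale=1.0):
    """Apply the same R-factor formula separately to working and free reflections.

    free_flags[i] is True when reflection i was withheld from refinement
    (hypothetical flag convention chosen for this sketch).
    """
    work_obs, work_calc, free_obs, free_calc = [], [], [], []
    for fo, fc, is_free in zip(f_obs, f_calc, free_flags):
        if is_free:
            free_obs.append(fo)
            free_calc.append(fc)
        else:
            work_obs.append(fo)
            work_calc.append(fc)
    return (r_factor(work_obs, work_calc, scale),
            r_factor(free_obs, free_calc, scale))
```

The only difference between the two values is the subset of reflections over which the sums run; the formula itself is unchanged.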

The mmCIF dictionary provides data names for Rfree and for the complementary Rwork values for the `working' set of reflections, which are the reflections that are used in the refinement. Separate data items are provided for unweighted and weighted versions of each R factor. A fixed percentage of the total number of reflections is usually assigned to the free group, and this percentage can be specified. Further details about the method used for selecting the free reflections can be given using _reflns.R_free_details. The estimated error in the Rfree value may also be given, along with the method used for determining its value.
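As a rough illustration of assigning a fixed percentage of reflections to the free group, the sketch below flags reflections by uniform random selection. Random selection, the 5% default and the function name are assumptions made for this example only; the text above does not prescribe a selection scheme, which is why the method actually used should be described in _reflns.R_free_details.

```python
import random


def assign_free_flags(n_reflections, percent_free=5.0, seed=0):
    """Return a list of booleans, True for reflections placed in the free set.

    Uniform random selection with a fixed seed is an assumption of this
    sketch, chosen so that the assignment is reproducible.
    """
    n_free = round(n_reflections * percent_free / 100.0)
    rng = random.Random(seed)
    free_indices = set(rng.sample(range(n_reflections), n_free))
    return [i in free_indices for i in range(n_reflections)]


flags = assign_free_flags(1000, percent_free=5.0)
actual_percent_free = 100.0 * sum(flags) / len(flags)
```

The percentage actually achieved (here `actual_percent_free`) is the quantity that would be reported alongside the R factors.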

The purposes of having a set of reflections that are not used in the refinement are to monitor the progress of the refinement and to ensure that the R factor is not being artificially reduced by the introduction of too many parameters. However, as the refinement converges, the working and free R factors both approach stable values. It is common practice, particularly in structures at high resolution, to stop monitoring Rfree at this point and to include all the reflections in the final rounds of refinement. It is thus worth noting a distinction between _refine.ls_R_factor_obs and _refine.ls_R_factor_R_work: _refine.ls_R_factor_obs relates to a refinement in which all reflections more intense than a specified threshold were used, while _refine.ls_R_factor_R_work relates to a refinement in which a subset of the observed reflections was excluded from the refinement and was used to calculate the free R factor. The dictionary allows the use of both values if a free R factor was calculated for most of the refinement, but all of the observed reflections were used in the final rounds of refinement; the protocol for this may be explained in _refine.details. When a full history of the refinement is provided using data items in the REFINE_HIST category, it is preferable to specify the change in protocol using data items in that category.
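The case in which both values are reported can be sketched as a small formatter that writes REFINE items as name-value pairs. The data names are those discussed above plus _refine.ls_R_factor_R_free; the numerical values, the entry identifier and the helper name are hypothetical, and the quoting rule is a simplification that assumes values contain no quote characters or line breaks.

```python
def format_refine_items(items):
    """Format a mapping of data names to values as aligned name-value lines.

    Simplified quoting (an assumption of this sketch): values containing
    white space are wrapped in single quotes; multi-line text is not handled.
    """
    width = max(len(name) for name in items)
    lines = []
    for name, value in items.items():
        text = str(value)
        if any(ch.isspace() for ch in text):
            text = "'" + text + "'"
        lines.append(name.ljust(width) + "  " + text)
    return "\n".join(lines)


# Hypothetical values for illustration only.
example = {
    "_refine.entry_id": "EXAMPLE",
    "_refine.ls_R_factor_R_work": 0.181,
    "_refine.ls_R_factor_R_free": 0.224,
    "_refine.ls_R_factor_obs": 0.176,
    "_refine.details": "Rfree monitored until convergence; all data used in final rounds",
}
print(format_refine_items(example))
```

Recording the protocol in _refine.details keeps the two R factors interpretable, since they refer to different stages of the refinement.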

Other data items help to provide an assessment of the quality of the refinement. The scale-independent correlation coefficient between the observed and calculated structure factors may be recorded for the reflections included in the refinement using the data item _refine.correlation_coeff_Fo_to_Fc. There is a similar data item for the reflections that were not included in the refinement.
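A minimal sketch of such a correlation coefficient is given below, assuming the usual linear (Pearson) form computed on structure-factor amplitudes. The function name and the assumption that amplitudes are supplied as plain lists are illustrative. Because the means are subtracted and the result is normalized by both variances, the value is unchanged when either list is multiplied by a positive constant, which is the sense in which it is scale-independent.

```python
import math


def correlation_coefficient(f_obs, f_calc):
    """Pearson correlation between observed and calculated amplitudes."""
    n = len(f_obs)
    mean_obs = sum(f_obs) / n
    mean_calc = sum(f_calc) / n
    covariance = sum((fo - mean_obs) * (fc - mean_calc)
                     for fo, fc in zip(f_obs, f_calc))
    var_obs = sum((fo - mean_obs) ** 2 for fo in f_obs)
    var_calc = sum((fc - mean_calc) ** 2 for fc in f_calc)
    return covariance / math.sqrt(var_obs * var_calc)
```

The same function can be applied to the working reflections and, separately, to the reflections excluded from the refinement.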

Overall standard uncertainties for positional and displacement parameters can be recorded according to a number of conventions. A maximum-likelihood residual for the positional parameters can be given using _refine.overall_SU_ML and the corresponding value for the displacement parameters can be given using _refine.overall_SU_B. Diffraction-component precision indexes for the displacement parameters based on the crystallographic R factor (the Cruickshank DPI; Cruickshank, 1999) can be given using _refine.overall_SU_R_Cruickshank_DPI. The corresponding value for Rfree can be given using _refine.overall_SU_R_free.

The quality of a data set used for the refinement of a macromolecular structure is often given not only in terms of the scaling residuals, but also in terms of the data redundancy (the ratio of the number of reflections measured to the number of crystallographically unique reflections). Data items are provided to express the redundancy of all reflections, as well as those that have been marked as `observed' (i.e. exceeding the threshold for inclusion in the refinement). The percentage of the total number of reflections that are considered observed is another metric of the quality of the data set, and a data item is provided for this (_refine.ls_percent_reflns_obs).
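The two ratios described above amount to simple arithmetic, sketched here under the definitions given in the text. The function and argument names are hypothetical, and the counts in the usage line are invented for illustration.

```python
def data_set_summary(n_measured, n_unique, n_unique_observed):
    """Return (redundancy, percent_observed) for a data set.

    redundancy       = measured reflections / unique reflections
    percent_observed = 100 * unique observed reflections / unique reflections
    """
    redundancy = n_measured / n_unique
    percent_observed = 100.0 * n_unique_observed / n_unique
    return redundancy, percent_observed


# Illustrative counts only: 40000 measurements of 10000 unique reflections,
# 9000 of which exceed the observation threshold.
redundancy, percent_observed = data_set_summary(40000, 10000, 9000)
```

With these counts the redundancy is 4.0 and 90.0% of the unique reflections are observed.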

The limited resolution of many macromolecular data sets makes it inappropriate to refine anisotropic displacement factors for each atom. For these low- to medium-resolution studies, an overall anisotropic displacement model may be refined. The data items _refine.aniso_B* are provided for recording the unique elements of the matrix that describes the refined anisotropy.
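Because the overall anisotropy is described by a symmetric 3 × 3 matrix, six unique elements suffice. The sketch below rebuilds the full matrix from values keyed by the data names _refine.aniso_B[1][1] to _refine.aniso_B[3][3]; the function name and the dictionary-based input are assumptions made for illustration.

```python
def aniso_b_matrix(items):
    """Build the symmetric 3x3 matrix from the six unique aniso_B elements.

    `items` maps data names such as "_refine.aniso_B[1][2]" to numbers;
    only elements with i <= j are expected, the rest follow by symmetry.
    """
    matrix = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i, 3):
            value = float(items["_refine.aniso_B[%d][%d]" % (i + 1, j + 1)])
            matrix[i][j] = value
            matrix[j][i] = value  # symmetric counterpart
    return matrix
```

Storing only the upper triangle avoids the possibility of recording two inconsistent values for the same off-diagonal element.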

The two-parameter method for modelling the contribution of the bulk solvent to the scattering proposed by Tronrud is used in several refinement programs. The data items _refine.solvent_model_* can be used to record the scale and displacement factors of this model, and any special aspects of its application to the refinement.

The average phasing figure of merit can be given for the working and free reflections. Unusually high or low values of displacement factors or occupancies can be a sign of problems with the refinement, so data items are provided to record the high, low and mean values of each. Further indicators of the quality of the refinement are found in the REFINE_ANALYZE category (Section[link]).

The data items in the REFINE_FUNCT_MINIMIZED category allow a brief description of the function minimized during refinement to be given (Example[link]). It is not possible to reconstruct the function minimized during the refinement by automatic parsing of the values of these data items, but the details given in them may still be helpful to someone reading the mmCIF.

Example Results of the overall refinement of an HIV-1 protease structure (PDB 5HVP) described using data items in the REFINE and REFINE_FUNCT_MINIMIZED categories.

[Scheme scheme62]


Brünger, A. T. (1997). Free R value: cross-validation in crystallography. Methods Enzymol. 277, 366–396.
Cruickshank, D. W. J. (1999). Remarks about protein structure precision. Acta Cryst. D55, 583–601.
