International
Tables for
Crystallography
Volume G
Definition and exchange of crystallographic data
Edited by S. R. Hall and B. McMahon

International Tables for Crystallography (2006). Vol. G, ch. 3.6, pp. 152-158

Section 3.6.6.1. Phasing

P. M. D. Fitzgerald,a* J. D. Westbrook,b P. E. Bourne,c B. McMahon,d K. D. Watenpaughe and H. M. Bermanf

aMerck Research Laboratories, Rahway, New Jersey, USA, bProtein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA, cResearch Collaboratory for Structural Bioinformatics, San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA, dInternational Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England, eretired; formerly Structural, Analytical and Medicinal Chemistry, Pharmacia Corporation, Kalamazoo, Michigan, USA, and fProtein Data Bank, Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, 610 Taylor Road, Piscataway, New Jersey, USA
Correspondence e-mail: paula_fitzgerald@merck.com

3.6.6.1. Phasing

The categories describing phasing are as follows:

PHASING group
Overall description of phasing (§3.6.6.1.1)
 PHASING
Phasing via molecular averaging (§3.6.6.1.2)
 PHASING_AVERAGING
Phasing via isomorphous replacement (§3.6.6.1.3)
 PHASING_ISOMORPHOUS
Phasing via multiple-wavelength anomalous dispersion (§3.6.6.1.4)
 PHASING_MAD
 PHASING_MAD_CLUST
 PHASING_MAD_EXPT
 PHASING_MAD_RATIO
 PHASING_MAD_SET
Phasing via multiple isomorphous replacement (§3.6.6.1.5)
 PHASING_MIR
 PHASING_MIR_DER
 PHASING_MIR_DER_REFLN
 PHASING_MIR_DER_SHELL
 PHASING_MIR_DER_SITE
 PHASING_MIR_SHELL
Phasing data sets (§3.6.6.1.6)
 PHASING_SET
 PHASING_SET_REFLN

The data items in the PHASING category group can be used to record details about the phasing of the structure and cover the various methods used in the phasing process. Many data items are provided for multiple isomorphous replacement (MIR) and multiple-wavelength anomalous dispersion (MAD). More limited sets of data items are provided for phasing by molecular averaging and for phasing from a structure that is isomorphous to the present structure. The current version of the mmCIF dictionary does not provide specific data items for recording the details of phasing via molecular replacement.

3.6.6.1.1. Overall description of phasing

The single data item in this category is as follows:

PHASING [Scheme scheme38]

The bullet (•) indicates a category key.

Phasing of macromolecular structures often involves the application of more than one of the methods described in the PHASING section of the mmCIF dictionary, such as when phases generated from a multiple isomorphous replacement experiment are improved by molecular averaging. The PHASING category is used to list the methods that were used.

At present, the category contains a single data item, which specifies the method employed in the structure determination. It may take one or more of the values listed in the dictionary (Example 3.6.6.1).

Example 3.6.6.1. The methods used to generate the phases for a hypothetical structure described with the data item in the PHASING category.

[Scheme scheme40]
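
A structure phased by more than one method needs several values for the single data item, which in CIF means a loop. The following Python sketch formats such a loop. The data name `_phasing.method` and the values `mir` and `averaging` are assumptions chosen to match the scenario described above (MIR phases improved by averaging); the helper `format_single_item_loop` is hypothetical and not part of any CIF library. The mmCIF dictionary remains the authority on the permitted values.

```python
def format_single_item_loop(data_name, values):
    """Format one data name and its values as a CIF loop_ block.

    Values are written as given, so they must not contain whitespace.
    """
    if not values:
        raise ValueError("a loop needs at least one value")
    return "\n".join(["loop_", data_name, *values])


# Assumed data name and values, for illustration only.
text = format_single_item_loop("_phasing.method", ["mir", "averaging"])
print(text)
```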

3.6.6.1.2. Phasing via molecular averaging

The data items in this category are as follows:

PHASING_AVERAGING [Scheme scheme39]

The bullet (•) indicates a category key. The arrow (→) is a reference to a parent data item.

When more than one copy of a molecule is present in the asymmetric unit, phases can be improved by averaging an electron-density map over the multiple images of the molecule. In some special cases with very high noncrystallographic symmetry, de novo phases have been derived by iterative application of molecular averaging, but more often averaging is used to improve phases determined by another method.

The protocols used for phasing with averaging are many and varied, so it was not thought appropriate to specify data items for any one approach in the current version of the mmCIF dictionary. The data items that are provided allow a text-based description of the protocol to be given; a formalism for recording a fully parsable description of molecular averaging needs to be developed for future revisions of the dictionary.

Data items in the PHASING_AVERAGING category allow free-text descriptions of two things: the method used for structure determination or phase improvement by averaging over multiple observations of the molecule in the asymmetric unit, and any specific details of the application of the method to the current structure determination (Example 3.6.6.2). Note that the reference to the method is to be used to describe the method itself, and not as a reference to a software package; references to software packages would be made using data items in the SOFTWARE category.

Example 3.6.6.2. Phase improvement with molecular averaging for a hypothetical structure described with data items in the PHASING_AVERAGING category.

[Scheme scheme41]

3.6.6.1.3. Phasing via isomorphous replacement

The data items in this category are as follows:

PHASING_ISOMORPHOUS [Scheme scheme42]

The bullet (•) indicates a category key. The arrow (→) is a reference to a parent data item.

Phases for many macromolecular structures are obtained from a previous determination of the same structure in the same crystal lattice. Examples of this are the determination of the structure of a point mutant or the determination of a structure in which a ligand is bound to an active site that was empty in the previous structure determination. In these cases, the new structure is essentially isomorphous with the parent structure, hence this method of phasing is termed `isomorphous phasing' in the mmCIF dictionary. It is not to be confused with multiple isomorphous replacement (MIR), a phasing technique that involves the use of heavy-atom derivatives. MIR phasing is discussed in Section 3.6.6.1.5.

Little information is needed to characterize isomorphous phasing. The `parent' structure (the structure used to generate the initial phases for the present structure) is described in a free-text field and a second free-text field can be used to give details of the application of the method to the determination of the present structure (for instance, the removal of solvent or a bound ligand). In Example 3.6.6.3, the parent structure is the PDB entry 5HVP and the structure that is the subject of the present data block is identified as `HVP+CmpdA'. _phasing_isomorphous.method allows any formal techniques that were used in the application of the method to the present structure determination to be described, for example rigid-body refinement. Note that this data item is not to be used to reference a software package; this would be done using data items in the SOFTWARE category.

Example 3.6.6.3. Isomorphous replacement phasing of an HIV-1 protease structure described using data items in the PHASING_ISOMORPHOUS category.

[Scheme scheme43]

3.6.6.1.4. Phasing via multiple-wavelength anomalous dispersion

The data items in these categories are as follows:

(a) PHASING_MAD [Scheme scheme44]

(b) PHASING_MAD_CLUST [Scheme scheme45]

(c) PHASING_MAD_EXPT [Scheme scheme46]

(d) PHASING_MAD_RATIO [Scheme scheme47]

(e) PHASING_MAD_SET [Scheme scheme48]

The bullet (•) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow (→) is a reference to a parent data item.

PHASING_MAD and related categories are used to provide information about phasing using the multiple-wavelength anomalous dispersion (MAD) technique. The data model used for MAD phasing in the current version of the mmCIF dictionary is that of Hendrickson, as exemplified in the structure determination of N-cadherin (Shapiro et al., 1995; Example 3.6.6.4). In current practice, MAD phasing is often treated as a special case of MIR phasing and the PHASING_MIR categories would be more appropriate to describe the results.

Example 3.6.6.4. MAD phasing of the structure of N-cadherin (Shapiro et al., 1995) described using data items in the PHASING_MAD and related categories.

[Scheme scheme49]

Unlike the PHASING_MIR categories, there is no provision in the current mmCIF model of MAD phasing for analysis of the overall phasing statistics and the contribution to the phasing of each data set by bins of resolution, and no provision for giving a list of the phased reflections. This will need to be addressed in future versions of the mmCIF dictionary.

The relationships between categories describing MAD phasing are shown in Fig. 3.6.6.1.

[Figure 3.6.6.1]

Figure 3.6.6.1. The family of categories used to describe MAD phasing. Boxes surround categories of related data items. Data items that serve as category keys are preceded by a bullet (•). Lines show relationships between linked data items in different categories with arrows pointing at the parent data items.

Data items in the PHASING_MAD category can be used to give a brief overview of the method that was used and to note special aspects of the phasing strategy; they are analogous to the data items in the other overview categories describing phasing techniques.

In the data model for MAD phasing used in the present version of the mmCIF dictionary, a collection of data sets measured at different wavelengths can be used to construct more than one set of phases. These phase sets will produce electron-density maps with different local properties. The model of the structure is often constructed using information from a collection of these maps. The collections of multiple phase sets are referred to as `experiments' and the groups of data sets that contribute to each experiment are referred to as `clusters'. Data items in PHASING_MAD_EXPT identify each experiment and give the number of contributing clusters. Additional data items record the phase difference between the structure factors due to normal scattering from all atoms and from only the anomalous scatterers, the standard uncertainty of this quantity, the mean figure of merit, and a number of other indicators of the quality of the phasing.

Data items in the PHASING_MAD_CLUST category can be used to label the clusters of data sets and give the number of data sets allocated to each cluster. In Example 3.6.6.4 two experiments are described. The first experiment contains two clusters, one of which contains four data sets and the second of which contains five data sets. The second experiment contains a single cluster of five data sets. Note that the author has chosen informative labels to identify the clusters (`four wavelength', `five wavelength'). Carefully chosen labels can help someone reading the mmCIF to trace the complex relationships between the categories.
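
The relationship between experiments and clusters can be made concrete with a small consistency check. The Python sketch below assumes that the rows of PHASING_MAD_EXPT and PHASING_MAD_CLUST have already been read into dictionaries; the field names (`id`, `number_clust`, `expt_id`, `number_set`) are assumptions modelled on the descriptions above, the label of the cluster in the second experiment is likewise assumed, and `check_cluster_counts` is a hypothetical helper. The counts follow the arrangement described for the two experiments.

```python
from collections import defaultdict

# Rows mirroring the two experiments described in the text. The field names
# and the label of the second experiment's cluster are assumptions.
mad_expt = [
    {"id": "1", "number_clust": 2},
    {"id": "2", "number_clust": 1},
]
mad_clust = [
    {"expt_id": "1", "id": "four wavelength", "number_set": 4},
    {"expt_id": "1", "id": "five wavelength", "number_set": 5},
    {"expt_id": "2", "id": "five wavelength", "number_set": 5},
]


def check_cluster_counts(experiments, clusters):
    """Report experiments whose declared number of clusters disagrees
    with the number of cluster rows that point at them."""
    found = defaultdict(list)
    for row in clusters:
        found[row["expt_id"]].append(row["id"])
    problems = []
    for expt in experiments:
        n_found = len(found[expt["id"]])
        if n_found != expt["number_clust"]:
            problems.append(
                f"experiment {expt['id']}: declared {expt['number_clust']}, found {n_found}"
            )
    return problems


problems = check_cluster_counts(mad_expt, mad_clust)
print(problems or "cluster counts are consistent")
```

Because the same cluster label appears in both experiments, a cluster is only identified unambiguously by the experiment identifier and the cluster label taken together, which is why the check groups clusters by experiment first.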

Data items in the PHASING_MAD_RATIO category can be used to record the ratios of phasing statistics (Bijvoet differences) between pairs of data sets in a MAD phasing experiment, within shells of resolution characterized by _phasing_MAD_ratio.d_res_high and *.d_res_low.

The data sets used in the MAD phasing experiments are described using data items in the PHASING_MAD_SET category. Each data set is characterized by resolution shell and wavelength, and by the f′ and f″ components of the anomalous scattering factor at that wavelength. The actual observations in each data set and the experimental conditions under which they were made are recorded using data items in the PHASING_SET and PHASING_SET_REFLN categories.

3.6.6.1.5. Phasing via multiple isomorphous replacement

The data items in these categories are as follows:

(a) PHASING_MIR [Scheme scheme50]

(b) PHASING_MIR_SHELL [Scheme scheme51]

(c) PHASING_MIR_DER [Scheme scheme52]

(d) PHASING_MIR_DER_REFLN [Scheme scheme53]

(e) PHASING_MIR_DER_SHELL [Scheme scheme54]

(f) PHASING_MIR_DER_SITE [Scheme scheme55]

The bullet (•) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow (→) is a reference to a parent data item. Data items marked with a plus (+) have companion data names for the standard uncertainty in the reported value, formed by appending the string _esd to the data name listed.

PHASING_MIR and related categories provide information about phasing by methods involving multiple isomorphous replacement (MIR). These same categories may also be used to describe phasing by related techniques, such as single isomorphous replacement (SIR) and single or multiple isomorphous replacement plus anomalous scattering (SIRAS, MIRAS). The relationships between the categories describing MIR phasing are shown in Fig. 3.6.6.2.

[Figure 3.6.6.2]

Figure 3.6.6.2. The family of categories used to describe MIR phasing. Boxes surround categories of related data items. Data items that serve as category keys are preceded by a bullet (•). Lines show relationships between linked data items in different categories with arrows pointing at the parent data items.

As with the other overview categories described in this section, the PHASING_MIR category contains data items that can be used for text-based descriptions of the method used and any special aspects of its application. There are also items for describing the resolution limit of the reflections that were phased, the figures of merit for all reflections and for the acentric reflections phased in the native data set, and the total numbers of reflections and their inclusion threshold in the native data set. Statistics for the phasing can be given by shells of resolution using data items in the PHASING_MIR_SHELL category.

An MIR phasing experiment involves one or more derivatives. The remaining categories in this group are used to describe aspects of each derivative (Example 3.6.6.5). A derivative in this context does not necessarily correspond to a data set; for instance, the same data set could be used to one resolution limit as an isomorphous scatterer and to a different resolution (and with a different sigma cutoff) as an anomalous scatterer. These would be treated as two distinct derivatives, although both derivatives would point to the same data sets via _phasing_MIR_der.der_set_id and _phasing_MIR_der.native_set_id (see Fig. 3.6.6.2).
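
The distinction between a derivative and a data set can be sketched in Python as follows. Two derivative rows point at the same derivative and native data sets but differ in resolution limit. The labels (`Hg`, `native`, `Hg_iso`, `Hg_anom`), the resolution values and both helper functions are hypothetical and invented for illustration; only the pointer names `der_set_id` and `native_set_id` are taken from the text.

```python
# Hypothetical identifiers of the available phasing sets.
phasing_set_ids = {"native", "Hg"}

mir_der = [
    # The same data set used as an isomorphous scatterer ...
    {"id": "Hg_iso", "der_set_id": "Hg", "native_set_id": "native", "d_res_high": 3.0},
    # ... and, to a different resolution limit, as an anomalous scatterer.
    {"id": "Hg_anom", "der_set_id": "Hg", "native_set_id": "native", "d_res_high": 4.0},
]


def dangling_set_references(derivatives, set_ids):
    """List (derivative id, item name, value) for pointers with no parent set."""
    missing = []
    for der in derivatives:
        for item in ("der_set_id", "native_set_id"):
            if der[item] not in set_ids:
                missing.append((der["id"], item, der[item]))
    return missing


def derivatives_by_data_set(derivatives):
    """Map each derivative data-set label to the derivative ids that use it."""
    usage = {}
    for der in derivatives:
        usage.setdefault(der["der_set_id"], []).append(der["id"])
    return usage


print(dangling_set_references(mir_der, phasing_set_ids))
print(derivatives_by_data_set(mir_der))
```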

Example 3.6.6.5. Phasing of the structure of bovine plasma retinol-binding protein (Zanotti et al., 1993) described using data items in the PHASING_MIR and related categories.

[Scheme scheme56]

Data items in the PHASING_MIR_DER category can be used to identify and describe each derivative. The resolution limits for the individual derivatives need not match those of the overall phasing experiment, as the phasing power of each derivative as a function of resolution will vary. Many of the statistical descriptors of phasing given in the PHASING_MIR category are repeated in this category, as derivatives vary in quality and their contribution to the phasing must be assessed individually. These same statistical measures can be given for shells of resolution in the PHASING_MIR_DER_SHELL category.

Data items in the PHASING_MIR_DER_REFLN category can be used to provide details of each reflection used in an MIR phasing experiment. The pointer _phasing_MIR_der_refln.set_id links the reflection to a particular set of experimental data and _phasing_MIR_der_refln.der_id points to a particular derivative used in the phasing (as mentioned above, derivatives in this context do not equate to data sets). The phase assigned to each reflection and the measured and calculated values of its structure factor can be given. (It is not necessary to include the measured values of the structure factors in this list, since they are accessible in the PHASING_SET_REFLN category, but it may be convenient to present them here.) Data items are also provided for the A, B, C and D phasing coefficients of Hendrickson & Lattman (1970).
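
The A, B, C and D coefficients encode the phase probability distribution of a reflection in the form P(φ) ∝ exp(A cos φ + B sin φ + C cos 2φ + D sin 2φ). As an illustration of how stored coefficients might be used, the following Python sketch samples this distribution on a regular grid and returns the centroid phase and the figure of merit. The function name `hl_phase_distribution`, the grid spacing and the sample coefficients are assumptions made for this example; it is a minimal numerical sketch, not the procedure of any particular phasing program.

```python
import math


def hl_phase_distribution(a, b, c, d, step_deg=1.0):
    """Sample P(phi) proportional to
    exp(A cos phi + B sin phi + C cos 2phi + D sin 2phi)
    on a regular grid; return (centroid phase in degrees, figure of merit)."""
    n = int(round(360.0 / step_deg))
    phis = [math.radians(i * step_deg) for i in range(n)]
    exponents = [
        a * math.cos(p) + b * math.sin(p) + c * math.cos(2 * p) + d * math.sin(2 * p)
        for p in phis
    ]
    shift = max(exponents)  # subtract the maximum to avoid overflow in exp()
    weights = [math.exp(e - shift) for e in exponents]
    total = sum(weights)
    # Centroid of the distribution on the unit circle.
    cx = sum(w * math.cos(p) for w, p in zip(weights, phis)) / total
    cy = sum(w * math.sin(p) for w, p in zip(weights, phis)) / total
    fom = math.hypot(cx, cy)  # length of the centroid vector
    best = math.degrees(math.atan2(cy, cx)) % 360.0
    return best, fom


# Hypothetical coefficients: a unimodal distribution centred near 90 degrees.
best, fom = hl_phase_distribution(0.0, 5.0, 0.0, 0.0)
print(f"centroid phase {best:.1f} deg, figure of merit {fom:.3f}")
```

One convenience of this representation is that distributions from independent sources can be combined by adding their coefficients before the distribution is evaluated.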

The heavy atoms identified in each derivative can be listed using data items in the PHASING_MIR_DER_SITE category. Most of the data names are clear analogues of similar items in the ATOM_SITE category; an exception is _phasing_MIR_der_site.occupancy_anom, which specifies the relative anomalous occupancy of the atom type present at a heavy-atom site in a particular derivative.

3.6.6.1.6. Phasing data sets

The data items in these categories are as follows:

(a) PHASING_SET [Scheme scheme57]

(b) PHASING_SET_REFLN [Scheme scheme58]

The bullet (•) indicates a category key. Where multiple items within a category are marked with a bullet, they must be taken together to form a compound key. The arrow (→) is a reference to a parent data item.

Data items in the PHASING_SET family of categories are homologous to items with related names in the CELL and DIFFRN families of categories. The PHASING_SET categories were added to the mmCIF data model so that intensity and phase information for the data sets used in phasing could be stored in the same data block as the information for the refined structure. It is not necessary to store all the experimental information for each data set (e.g. the raw data sets or crystal growth conditions); it is assumed that the full experimental description of each phasing set would be recorded in a separate data block (see Example 3.6.6.6).

Example 3.6.6.6. The phasing sets used in the structure determination of bovine plasma retinol-binding protein (Zanotti et al., 1993) described with data items in the PHASING_SET and PHASING_SET_REFLN categories.

[Scheme scheme59]

Data items in the PHASING_SET category identify each set of diffraction data used in a phasing experiment and can be used to summarize relevant experimental conditions. Because a given data set may be used in a number of different ways (for example, as an isomorphous derivative and as a component of a multiple-wavelength calculation), it is appropriate to store the reflections in a category distinct from either the PHASING_MAD or PHASING_MIR family of categories, but accessible to both these families (and any similar categories that might be introduced later to describe new phasing methods). Figs. 3.6.6.1 and 3.6.6.2 show how reference is made to the relevant sets from within the PHASING_MAD and PHASING_MIR categories.

Each phasing set is given a unique value of _phasing_set.id. The other PHASING_SET data items record the cell dimensions and angles associated with each phasing set, the wavelength of the radiation used in the experiment, the source of the radiation, the detector type, and the ambient temperature.

Data items in the PHASING_SET_REFLN category are used to record the values of the measured structure factors and their uncertainties. Several distinct data sets may be present in this list, with reflections in each set identified by the appropriate value of _phasing_set_refln.set_id.
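
Since several data sets share one reflection list, a reader of the file typically has to separate the reflections by set identifier before using them. The Python sketch below shows one way this might be done. The field names (`set_id`, `hkl`, `F_meas`, `F_meas_sigma`), the set labels and all numerical values are hypothetical and invented for illustration, as are the helpers `group_by_set` and `common_reflections`.

```python
from collections import defaultdict

# Hypothetical reflection rows; labels and values are invented.
reflections = [
    {"set_id": "native", "hkl": (1, 0, 0), "F_meas": 120.5, "F_meas_sigma": 2.1},
    {"set_id": "native", "hkl": (1, 1, 0), "F_meas": 85.2, "F_meas_sigma": 1.8},
    {"set_id": "Hg", "hkl": (1, 0, 0), "F_meas": 131.0, "F_meas_sigma": 2.4},
    {"set_id": "Hg", "hkl": (2, 0, 0), "F_meas": 44.7, "F_meas_sigma": 1.5},
]


def group_by_set(rows):
    """Split a mixed reflection list into one {hkl: (F, sigma)} table per set."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row["set_id"]][row["hkl"]] = (row["F_meas"], row["F_meas_sigma"])
    return dict(grouped)


def common_reflections(grouped, set_a, set_b):
    """Return the sorted Miller indices measured in both named sets."""
    return sorted(set(grouped[set_a]) & set(grouped[set_b]))


grouped = group_by_set(reflections)
print(sorted(grouped))
print(common_reflections(grouped, "native", "Hg"))
```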

References

Hendrickson, W. A. & Lattman, E. E. (1970). Representation of phase probability distributions for simplified combination of independent phase information. Acta Cryst. B26, 136–143.
Shapiro, L., Fannon, A. M., Kwong, P. D., Thompson, A., Lehmann, M. S., Grubel, G., Legrand, J. F., Als-Nielsen, J., Colman, D. R. & Hendrickson, W. A. (1995). Structural basis of cell–cell adhesion by cadherins. Nature (London), 374, 327–337.
Zanotti, G., Berni, R. & Monaco, H. L. (1993). Crystal structure of liganded and unliganded forms of bovine plasma retinol-binding protein. J. Biol. Chem. 268, 10728–10738.