International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by M. G. Rossmann and E. Arnold

International Tables for Crystallography (2006). Vol. F, ch. 4.3, p. 102

Section 4.3.6. Avoiding protein heterogeneity

D. R. Daviesᵃ* and A. Burgess Hickmanᵃ

ᵃ Laboratory of Molecular Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0560, USA
Correspondence e-mail:

Protein heterogeneity can arise from many sources, including proteolysis, oxidation and post-translational modifications, and can have a severe effect on crystal quality or can prevent crystallization altogether. Limited proteolysis has frequently been used to modify proteins for crystallization, in order to avoid heterogeneity from proteolysis occurring during expression and to remove relatively unstructured regions that might hinder crystallization. Some examples are given below.

Windsor et al. (1996) crystallized a complex of interferon γ with the extracellular domain of the interferon γ cell surface receptor. To obtain satisfactory crystals, it was necessary to re-engineer the receptor with an eight-residue deletion at the N-terminus to avoid the heterogeneity caused by proteolysis, since 2–10% of the purified protein was cleaved during expression.

Crucial to the structure determination of the complex of transducin-α bound to GTPγS (Noel et al., 1993) was the systematic examination of proteolysis of the intact protein (Mazzoni et al., 1991). This work revealed a cluster of protease-sensitive sites near residues Lys17–Lys25. Homogeneous material consisting of residues 26–350 of activated rod transducin, Gtα, was obtained by proteolysis of the full-length protein with endoproteinase LysC; the truncated protein was subsequently used to solve the structure.
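As a rough illustration of how candidate LysC sites can be listed before such an experiment, the sketch below splits a one-letter protein sequence after every lysine, which is the cleavage specificity of endoproteinase LysC. The function name `lysc_fragments` and the toy sequence are hypothetical and are not taken from the transducin work. The listing is a complete in silico digest; in limited proteolysis only the accessible sites are actually cut, so the output should be read as a list of candidates rather than a prediction.

```python
def lysc_fragments(sequence):
    """Split a one-letter protein sequence after every Lys (K).

    Returns a list of (first_residue, last_residue, fragment) tuples,
    using 1-based residue numbering.
    """
    fragments = []
    start = 0  # 0-based index of the first residue of the current fragment
    for i, residue in enumerate(sequence):
        if residue == "K":
            # LysC cleaves on the C-terminal side of lysine,
            # so the lysine stays at the end of the fragment.
            fragments.append((start + 1, i + 1, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        # Remaining C-terminal fragment that does not end in Lys.
        fragments.append((start + 1, len(sequence), sequence[start:]))
    return fragments


# Hypothetical toy sequence, for illustration only.
print(lysc_fragments("MAKGGKLLR"))
# -> [(1, 3, 'MAK'), (4, 6, 'GGK'), (7, 9, 'LLR')]
```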

Hickman et al. (1997) identified a site near the C-terminus of HIV-1 integrase that was susceptible to proteolytic cleavage during protein expression, resulting in severe protein heterogeneity in which up to 30% of the purified protein was cleaved. The proteolysis site was identified by mass spectrometry, and several point mutations on either side of this site were made and evaluated for their effect on proteolysis. Substitution of either Gly or Lys for Arg284 eliminated the protease sensitivity, yielding homogeneous material.
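One way a measured mass can point to a cleavage site is to compare it with the calculated masses of all possible truncations. The sketch below is a minimal example of that idea, not the procedure used by Hickman et al.: it assumes the observed species retains the intact N-terminus and is unmodified, uses standard average residue masses, and the names `chain_mass`, `candidate_c_terminal_cleavages` and the toy sequence are hypothetical.

```python
# Standard average residue masses in Da; one water is added per chain.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153


def chain_mass(sequence):
    """Average mass of an unmodified polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[r] for r in sequence) + WATER


def candidate_c_terminal_cleavages(sequence, observed_mass, tolerance=1.0):
    """Find N-terminal fragments (residues 1..n) whose mass matches.

    Returns a list of (n, calculated_mass) for every n where the mass of
    residues 1..n lies within `tolerance` Da of `observed_mass`.
    Assumption: the fragment keeps the original N-terminus.
    """
    matches = []
    running = WATER
    for n, residue in enumerate(sequence, start=1):
        running += AVERAGE_RESIDUE_MASS[residue]
        if abs(running - observed_mass) <= tolerance:
            matches.append((n, round(running, 2)))
    return matches


# Hypothetical toy sequence: pretend the observed species is residues 1-4.
toy = "MGSRKLVDE"
observed = chain_mass(toy[:4])
print(candidate_c_terminal_cleavages(toy, observed))
```

A match at residue n suggests cleavage after residue n, which can then be tested by mutating residues on either side of the site, as described above.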

Some proteins have surface cysteines that are susceptible to oxidation and can be adventitiously cross-linked via a disulfide bridge that does not exist in the native protein. If there are relatively few cysteines, this problem may be circumvented by mutating the individual cysteines to determine which ones are responsible. Conversely, cysteines can be introduced into proteins to enhance the binding of interacting molecules (see also Section 4.3.8). An elegant example of the latter case is provided by the structure of HIV-1 reverse transcriptase (Huang et al., 1998), which was mutated to introduce a cysteine at a position near the known binding site of the double-stranded DNA substrate. Using an oligonucleotide with a modified base that contained a free thiol group, cross-links were specifically introduced between the protein and the DNA; this covalently linked complex was used to obtain crystals that contained the incoming nucleoside triphosphate, a crystallographic problem that had defied other solutions.

Post-translationally modified proteins, such as glycoproteins, present some of the most difficult problems in X-ray crystallography, since the carbohydrate side chains are usually flexible and often heterogeneous. In some cases, enzymes can be used to trim the carbohydrate and produce a protein suitable for crystallization. Alternatively, the protein sequence can be altered so that unwanted glycosylation does not occur. A combination of approaches was used by Kwong et al. (1998) to determine the structure of the HIV-1 envelope glycoprotein, gp120, a protein that is extensively modified in vivo. The N- and C-termini were truncated, 90% of the carbohydrate was removed by deglycosylation and two large, flexible loops of the protein were replaced by tripeptides. The resulting simplified version of the glycoprotein retained its ability to bind the CD4 receptor, and crystals were ultimately obtained of a ternary complex of the envelope glycoprotein, a two-domain fragment of CD4 and an antibody Fab.

Occasionally, an mRNA sequence will fortuitously cause a false initiation of translation, so that a truncated form co-purifies with the intended protein. In attempting to crystallize a trimethoprim-resistant form of dihydrofolate reductase, Dale et al. (1994) observed that a fragment of the protein was being expressed through false initiation of translation, beginning at Ala43. They also found most of the protein in inclusion bodies, and recovery was poor. They noticed a putative Shine–Dalgarno sequence ten nucleotides upstream of the AUG codon of Met42, which could result in the expression of a smaller protein. They replaced the middle base of the Shine–Dalgarno sequence, changing GGGAA to GGCAA, and removed unusual codons from the first 18 amino acids. These two changes resulted in a 20-fold increase in expression level, together with removal of the contaminating fragment. Similar heterogeneity problems owing to translation initiation at an internal Shine–Dalgarno sequence upstream of Met50 were observed during expression of full-length recombinant HIV-1 integrase and were also resolved by altering the DNA to eliminate the Shine–Dalgarno sequence without changing the sequence of encoded amino acids (Hizi & Hughes, 1988).
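A coding sequence can be screened for this kind of problem before expression. The following sketch scans an mRNA (or DNA) sequence for a Shine–Dalgarno-like motif followed at a short distance by an AUG codon. The function name `find_internal_start_sites`, the default motif list (only the GGGAA motif mentioned above), the spacing window of 4–12 nucleotides between motif and AUG, and the toy sequence are all assumptions chosen for illustration; they are not taken from Dale et al. (1994), and the sketch does not check the reading frame.

```python
def find_internal_start_sites(mrna, motifs=("GGGAA",), min_gap=4, max_gap=12):
    """Locate Shine-Dalgarno-like motifs followed by a nearby AUG.

    Returns a list of (motif, motif_index, aug_index) tuples with 0-based
    indices. `min_gap` and `max_gap` bound the number of nucleotides
    between the end of the motif and the AUG (an assumed window).
    """
    seq = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            motif_end = start + len(motif)
            for gap in range(min_gap, max_gap + 1):
                aug = motif_end + gap
                if seq[aug:aug + 3] == "AUG":
                    hits.append((motif, start, aug))
            start = seq.find(motif, start + 1)  # allow overlapping matches
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequence: a GGGAA motif eight nucleotides before an AUG.
toy = "AUGGCU" + "GGGAA" + "CUCUCUCU" + "AUGGCU"
print(find_internal_start_sites(toy))
# -> [('GGGAA', 6, 19)]

# Changing the middle base of the motif removes the hit.
print(find_internal_start_sites(toy.replace("GGGAA", "GGCAA")))
# -> []
```

In a real construct, any replacement base would also have to be checked against the codon table so that the encoded amino-acid sequence is unchanged where that is required.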


Dale, G. E., Broger, C., Langen, H., D'Arcy, A. & Stüber, D. (1994). Improving protein solubility through rationally designed amino acid replacements: solubilization of the trimethoprim-resistant type S1 dihydrofolate reductase. Protein Eng. 7, 933–939.
Hickman, A. B., Dyda, F. & Craigie, R. (1997). Heterogeneity in recombinant HIV-1 integrase corrected by site-directed mutagenesis: the identification and elimination of a protease cleavage site. Protein Eng. 10, 601–606.
Hizi, A. & Hughes, S. H. (1988). Expression of the Moloney murine leukemia virus and human immunodeficiency virus integration proteins in Escherichia coli. Virology, 167, 634–638.
Huang, H., Chopra, R., Verdine, G. L. & Harrison, S. C. (1998). Structure of a covalently trapped catalytic complex of HIV-1 reverse transcriptase: implications for drug resistance. Science, 282, 1669–1675.
Kwong, P. D., Wyatt, R., Robinson, J., Sweet, R. W., Sodroski, J. & Hendrickson, W. A. (1998). Structure of an HIV gp120 envelope glycoprotein in complex with the CD4 receptor and a neutralizing human antibody. Nature (London), 393, 648–659.
Mazzoni, M. R., Malinski, J. A. & Hamm, H. E. (1991). Structural analysis of rod GTP-binding protein, Gt. J. Biol. Chem. 266, 14072–14081.
Noel, J. P., Hamm, H. E. & Sigler, P. B. (1993). The 2.2 Å crystal structure of transducin-α complexed with GTPγS. Nature (London), 366, 654–663.
Windsor, W. T., Walter, L. J., Syto, R., Fossetta, J., Cook, W. J., Nagabhushan, T. L. & Walter, M. R. (1996). Purification and crystallization of a complex between human interferon γ receptor (extracellular domain) and human interferon γ. Proteins, 26, 108–114.