International Tables for Crystallography
Volume F
Crystallography of biological macromolecules
Edited by E. Arnold, D. M. Himmel and M. G. Rossmann

International Tables for Crystallography (2012). Vol. F, ch. 4.3, pp. 130–131

Section 4.3.3. Engineering proteins with enhanced solubility

Z. S. Derewenda

Department of Molecular Physiology and Biological Physics, University of Virginia School of Medicine, Charlottesville, VA 22908–0736, USA

Adequate solubility of a protein is an essential prerequisite for its crystallization. It should be noted that the expression 'low solubility' is often used indiscriminately to describe quite different phenomena: a propensity to aggregate and precipitate upon overexpression owing to misfolding, amyloid formation and, finally, genuinely low in vitro solubility, i.e. a low protein concentration in equilibrium with the solid phase, of otherwise fully folded and stable proteins (Trevino et al., 2008). The strategies and methods discussed here specifically address the last of these cases, i.e. precipitation of properly folded proteins at low concentrations.

It has been well established that even single-site mutations of surface residues can dramatically affect the solubility of a protein and its crystallizability (McElroy et al., 1992). Consequently, the intuitively obvious approach is to mutate solvent-exposed hydrophobic amino acids to hydrophilic residues. In this way, the low solubility of the catalytic domain of HIV-1 integrase was addressed by screening 29 mutants in which hydrophobic residues were systematically mutated to hydrophilic amino acids; of the variants tested, the single-site mutant F185K showed dramatically improved solubility and ultimately yielded X-ray-quality crystals (Dyda et al., 1994; Jenkins et al., 1995). In the case of leptin, the product of the obese gene, the solubility-enhancing W100E mutation proved to be critical for crystallization of the protein (Zhang et al., 1997). Recently, a screen of several variants of human apolipoprotein D identified a triple mutant (W99H, I118S, L120S) which was much more soluble than the wild-type protein and which was ultimately used to obtain well diffracting crystals (Nasreen et al., 2006; Eichinger et al., 2007).
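
The substitution labels used above (e.g. F185K: wild-type Phe at position 185 replaced by Lys) can be handled programmatically when a set of variants is planned. The following is a minimal sketch only; the function names, the `first_residue_number` parameter and the toy sequence are illustrative assumptions and are not taken from any of the cited studies.

```python
import re

# One-letter wild-type residue, residue number, one-letter replacement (e.g. "F185K").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(label):
    """Split a label such as 'F185K' into (wild_type, position, replacement)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, replacement = match.groups()
    return wild_type, int(position), replacement


def apply_mutations(sequence, labels, first_residue_number=1):
    """Return the variant sequence, checking each wild-type residue first.

    first_residue_number is a hypothetical parameter for constructs whose
    numbering does not start at 1 (e.g. an isolated domain).
    """
    residues = list(sequence)
    for label in labels:
        wild_type, position, replacement = parse_mutation(label)
        index = position - first_residue_number
        if not 0 <= index < len(residues):
            raise ValueError(f"{label}: position outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(
                f"{label}: sequence has {residues[index]} at position {position}"
            )
        residues[index] = replacement
    return "".join(residues)


if __name__ == "__main__":
    toy_sequence = "MKWLIAF"  # made-up sequence, for illustration only
    print(apply_mutations(toy_sequence, ["W3H", "I5S", "F7K"]))
```

Checking the wild-type residue before substituting guards against numbering offsets between a construct and the full-length protein, which are a common source of error when mutations are transcribed from the literature.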

While engineering enhanced solubility using site-directed mutagenesis is potentially a powerful approach, in the absence of structural information it is a challenge to predict which hydrophobic residues are solvent-exposed and might therefore constitute useful targets for mutagenesis. Moreover, even if structural information is available for a homologue or the target itself, it may not be clear what type of mutation actually works best, forcing the investigator to rely on extensive screening. This uncertainty arises from the fact that hydrophobicity scales for individual amino acids cannot be used directly to evaluate the increase or decrease of protein solubility as a consequence of a specific mutation. Furthermore, there have been few rigorous studies of the effects of specific mutations on protein solubility. A notable example is a study on ribonuclease Sa in which the solvent-exposed Thr76 was replaced by the 19 other amino acids and the solubility of all of the variants was carefully evaluated (Trevino et al., 2007). Those variants that contained Asp, Arg, Glu and Ser were the most soluble. Unexpectedly, even though a lysine might be expected to confer higher solubility than a serine or alanine, the T76S mutation actually led to a significantly higher solubility than T76K, while the T76A variant was only marginally less soluble than T76K (Trevino et al., 2007). The authors of the study concluded that mutating Asn and Gln to their respective acids may constitute the most robust strategy of enhancing solubility. Interestingly, one of the first examples of rational enhancement of solubility, i.e. the study of trimethoprim-resistant type S1 dihydrofolate reductase (Dale et al., 1994), used this very strategy: the amide-containing side chains were systematically substituted with carboxylic amino acids and one specific variant, a double mutant N48E, N130D, was found to exhibit markedly increased solubility and ultimately yielded crystals that were suitable for crystallographic analysis.
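
The amide-to-acid strategy lends itself to a simple enumeration of candidate substitutions. The sketch below lists every Asn and Gln in a sequence together with the corresponding acid (Asn to Asp, Gln to Glu) and pairs the single substitutions into double mutants. It is an illustration under stated assumptions: the function names and the toy sequence are hypothetical, the default mapping is only the simplest reading of 'their respective acids' (the double mutant cited above also contains an Asn-to-Glu substitution), and no attempt is made to judge solvent exposure, which in practice would restrict the list.

```python
from itertools import combinations

# Simplest mapping of amide side chains to "their respective acids".
# A different mapping (e.g. Asn to Glu) can be passed in by the caller.
AMIDE_TO_ACID = {"N": "D", "Q": "E"}


def amide_to_acid_candidates(sequence, mapping=None, first_residue_number=1):
    """List single substitutions such as 'N48D' for every Asn/Gln in the sequence."""
    mapping = AMIDE_TO_ACID if mapping is None else mapping
    return [
        f"{residue}{number}{mapping[residue]}"
        for number, residue in enumerate(sequence, start=first_residue_number)
        if residue in mapping
    ]


def double_mutants(single_substitutions):
    """Pair the single substitutions into all possible double mutants."""
    return list(combinations(single_substitutions, 2))


if __name__ == "__main__":
    toy_sequence = "MNAQKLN"  # made-up sequence, for illustration only
    singles = amide_to_acid_candidates(toy_sequence)
    print(singles)
    print(double_mutants(singles))
```

The number of combinations grows quickly with the number of Asn and Gln residues, so a list of this kind is a starting point for selecting a screen rather than a set of constructs to be made in full.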

Somewhat ironically, large charged residues such as glutamate that confer higher solubility on the target protein may at the same time impede crystallization, because they increase the total surface side-chain entropy and so make the surface less likely to engage in the interactions that mediate crystal contacts. Thus, variants engineered for increased solubility may simultaneously show a decreased propensity to crystallize.

Some of the above uncertainties can be overcome with an alternative approach of directed evolution and phenotypic selection methods, in which soluble mutants are directly selected from vast protein libraries (Farinas et al., 2001; Farinas, 2006; Pédelacq et al., 2002; Waldo, 2003; Cabantous et al., 2005). Several different variations of this method have been reported (Waldo, 2003). For example, the target protein may be fused to the N-terminus of a reporter protein such as the green fluorescent protein (GFP; Waldo et al., 1999) or direct detection methods can be employed to identify soluble variants (Peabody & Al-Bitar, 2001). While elegant and potentially very effective, directed evolution has not yet been widely adopted for the generation of crystallizable proteins.

Solubility problems are not always caused by excessively exposed hydrophobic surfaces. In some cases, the root of the problem is aggregation caused by exposed free cysteines. Reduced cysteines can be identified by alkylation with N-ethylmaleimide or iodoacetamide under anaerobic conditions and subsequent electrospray mass spectrometry (Niessing et al., 2004). Several examples illustrate how this approach is helpful in generating samples that are suitable for crystallization. In mitogen-activated protein (MAP) kinase p38α, a single-site mutation (C162S) prevented aggregation and yielded a crystallizable variant (Patel et al., 2004). Similarly, a double mutant (C95K, C142S) of foot-and-mouth disease virus 3C protease showed none of the aggregation problems that plagued the wild-type protein and was subsequently crystallized (Birtley & Curry, 2005). It is noteworthy that in this case an alternative strategy involving mutations of the exposed hydrophobic residues Met81, Leu82 and Val140 did not eliminate aggregation (Birtley & Curry, 2005). In a number of cases aggregation problems were traced to multiple free cysteines. In She2p, an RNA-binding protein, four cysteines (Cys14, Cys68, Cys106 and Cys180) were mutated to serines in order to overcome oxidation and aggregation (Niessing et al., 2004). In an extreme case, that of human maspin, which is a serpin with antitumour activities, all unpaired cysteines were mutated (C20S, C34A, C183S, C205S, C214S, C297S, C373S) in an effort to obtain a soluble crystallizable variant (Al-Ayyoubi et al., 2004).
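
The alkylation experiment mentioned above reduces to simple arithmetic: each modified cysteine increases the intact protein mass by a fixed increment, so the number of reactive cysteines can be estimated from the mass shift. The sketch below assumes average adduct masses of about 125.13 Da for N-ethylmaleimide (addition of C6H7NO2) and about 57.05 Da for iodoacetamide (carbamidomethylation, addition of C2H3NO). The function name, the tolerance and the example masses are hypothetical, and this is not the procedure described by Niessing et al. (2004).

```python
# Average mass added per alkylated cysteine, in Da (assumed values, see text above).
ADDUCT_MASS_DA = {
    "N-ethylmaleimide": 125.13,  # Michael addition of the whole reagent, C6H7NO2
    "iodoacetamide": 57.05,      # carbamidomethyl group, C2H3NO
}


def count_modified_cysteines(mass_unmodified, mass_alkylated, reagent, tolerance_da=1.0):
    """Estimate how many cysteines were alkylated from two intact masses (Da).

    Raises ValueError if the mass shift is negative or is not close to a whole
    number of adducts, which would suggest another modification or a wrong peak.
    """
    adduct = ADDUCT_MASS_DA[reagent]
    shift = mass_alkylated - mass_unmodified
    count = round(shift / adduct)
    if count < 0 or abs(shift - count * adduct) > tolerance_da:
        raise ValueError(
            f"mass shift of {shift:.2f} Da is not a multiple of {adduct} Da"
        )
    return count


if __name__ == "__main__":
    # Made-up masses: a 28 kDa protein that gains four N-ethylmaleimide adducts.
    print(count_modified_cysteines(28000.0, 28500.5, "N-ethylmaleimide"))
```

A count lower than the number of cysteines in the sequence would be consistent with some of them being buried or engaged in disulfide bonds, whereas the reactive ones are the natural candidates for mutation.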


Al-Ayyoubi, M., Gettins, P. G. & Volz, K. (2004). Crystal structure of human maspin, a serpin with antitumor properties: reactive center loop of maspin is exposed but constrained. J. Biol. Chem. 279, 55540–55544.
Birtley, J. R. & Curry, S. (2005). Crystallization of foot-and-mouth disease virus 3C protease: surface mutagenesis and a novel crystal-optimization strategy. Acta Cryst. D61, 646–650.
Cabantous, S., Pédelacq, J. D., Mark, B. L., Naranjo, C., Terwilliger, T. C. & Waldo, G. S. (2005). Recent advances in GFP folding reporter and split-GFP solubility reporter technologies. Application to improving the folding and solubility of recalcitrant proteins from Mycobacterium tuberculosis. J. Struct. Funct. Genomics, 6, 113–119.
Dale, G. E., Broger, C., Langen, H., D'Arcy, A. & Stuber, D. (1994). Improving protein solubility through rationally designed amino acid replacements: solubilization of the trimethoprim-resistant type S1 dihydrofolate reductase. Protein Eng. 7, 933–939.
Dyda, F., Hickman, A. B., Jenkins, T. M., Engelman, A., Craigie, R. & Davies, D. R. (1994). Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases. Science, 266, 1981–1986.
Eichinger, A., Nasreen, A., Kim, H. J. & Skerra, A. (2007). Structural insight into the dual ligand specificity and mode of high density lipoprotein association of apolipoprotein D. J. Biol. Chem. 282, 31068–31075.
Farinas, E. T. (2006). Directed evolution approaches for protein engineering. Comb. Chem. High Throughput Screen. 9, 235–236.
Farinas, E. T., Bulter, T. & Arnold, F. H. (2001). Directed enzyme evolution. Curr. Opin. Biotechnol. 12, 545–551.
Jenkins, T. M., Hickman, A. B., Dyda, F., Ghirlando, R., Davies, D. R. & Craigie, R. (1995). Catalytic domain of human immunodeficiency virus type 1 integrase: identification of a soluble mutant by systematic replacement of hydrophobic residues. Proc. Natl Acad. Sci. USA, 92, 6057–6061.
McElroy, H. H., Sisson, G. W., Schottlin, W. E., Aust, R. M. & Villafranca, J. E. (1992). Studies on engineering crystallizability by mutation of surface residues of human thymidylate synthase. J. Cryst. Growth, 122, 265–272.
Nasreen, A., Vogt, M., Kim, H. J., Eichinger, A. & Skerra, A. (2006). Solubility engineering and crystallization of human apolipoprotein D. Protein Sci. 15, 190–199.
Niessing, D., Huttelmaier, S., Zenklusen, D., Singer, R. H. & Burley, S. K. (2004). She2p is a novel RNA binding protein with a basic helical hairpin motif. Cell, 119, 491–502.
Patel, S. B., Cameron, P. M., Frantz-Wattley, B., O'Neill, E., Becker, J. W. & Scapin, G. (2004). Lattice stabilization and enhanced diffraction in human p38 alpha crystals by protein engineering. Biochim. Biophys. Acta, 1696, 67–73.
Peabody, D. S. & Al-Bitar, L. (2001). Isolation of viral coat protein mutants with altered assembly and aggregation properties. Nucleic Acids Res. 29, E113.
Pédelacq, J. D., Piltch, E., Liong, E. C., Berendzen, J., Kim, C.-Y., Rho, B.-S., Park, M. S., Terwilliger, T. C. & Waldo, G. S. (2002). Engineering soluble proteins for structural genomics. Nat. Biotechnol. 20, 927–932.
Trevino, S. R., Scholtz, J. M. & Pace, C. N. (2007). Amino acid contribution to protein solubility: Asp, Glu, and Ser contribute more favorably than the other hydrophilic amino acids in RNase Sa. J. Mol. Biol. 366, 449–460.
Trevino, S. R., Scholtz, J. M. & Pace, C. N. (2008). Measuring and increasing protein solubility. J. Pharm. Sci. 97, 4155–4166.
Waldo, G. S. (2003). Genetic screens and directed evolution for protein solubility. Curr. Opin. Chem. Biol. 7, 33–38.
Waldo, G. S., Standish, B. M., Berendzen, J. & Terwilliger, T. C. (1999). Rapid protein-folding assay using green fluorescent protein. Nature Biotechnol. 17, 691–695.
Zhang, F., Basinski, M. B., Beals, J. M., Briggs, S. L., Churgay, L. M., Clawson, D. K., DiMarchi, R. D., Furman, T. C., Hale, J. E., Hsiung, H. M., Schoner, B. E., Smith, D. P., Zhang, X. Y., Wery, J.-P. & Schevitz, R. W. (1997). Crystal structure of the obese protein leptin-E100. Nature (London), 387, 206–209.
