International Tables for Crystallography
Volume F: Crystallography of biological macromolecules
Edited by E. Arnold, D. M. Himmel and M. G. Rossmann

International Tables for Crystallography (2012). Vol. F, ch. 24.1, pp. 827-832
https://doi.org/10.1107/97809553602060000896

Chapter 24.1. The Worldwide Protein Data Bank

H. M. Berman (a)*, K. Henrick (b), G. Kleywegt (b), H. Nakamura (c) and J. Markley (d)

(a) RCSB Protein Data Bank, Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA; (b) Protein Data Bank in Europe (PDBe), EMBL–EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK; (c) PDBj, Institute for Protein Research, Osaka University, Japan; and (d) BioMagResBank, University of Wisconsin–Madison, Madison, Wisconsin 53706, USA
Correspondence e-mail: berman@rcsb.rutgers.edu

The Protein Data Bank (PDB) is the single, freely available, global archive of structural data for biological macromolecules. It is maintained by the wwPDB consortium consisting of the Research Collaboratory for Structural Bioinformatics (RCSB PDB), the Protein Data Bank in Europe (PDBe), the PDB Japan (PDBj) and the BioMagResBank (BMRB). This chapter describes the organization of the wwPDB, the systems in place for data deposition, annotation and distribution, and a summary of the services provided by the wwPDB member sites.

24.1.1. Introduction

The Protein Data Bank (PDB) was established at Brookhaven National Laboratory (BNL) (Bernstein et al., 1977) in 1971 as an archive for biological macromolecular crystal structures (see Chapter 24.1 in the first edition of this volume). It represents one of the earliest community-driven molecular biology data collections. In the beginning, the archive held seven structures, and with each passing year a handful more were deposited. In the 1980s, the number of deposited structures began to increase dramatically. This was due to technological improvements made in all aspects of the crystallographic process, the addition of structures determined by other methods and changes in community views about data sharing. By the early 1990s, many journals required a PDB ID for publication; now virtually all journals require the PDB deposition of not only coordinates, but also experimental data. As of 31 May 2011, the archive contained more than 73 000 structures.

The initial goal of the PDB was to archive structures determined by X-ray crystallography as submitted by the authors. Today, structures are deposited in the PDB that have been determined using X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and, more recently, cryo-electron microscopy (cryoEM). In addition to the structural biologists who deposit data, the international PDB community includes a diverse group of researchers in the biomedical sciences (biologists, chemists, physicists etc.), as well as educators and students at all levels. In order to ensure that the PDB remains a single uniform archive freely accessible via a collection of ftp tools, the worldwide PDB (wwPDB; http://www.wwpdb.org/) was formed in 2003 (Berman et al., 2003, 2009) with an agreement between three deposition centres: the Research Collaboratory for Structural Bioinformatics (RCSB PDB) in the United States; the Protein Data Bank in Europe (PDBe; previously known as the Macromolecular Structure Database, MSD) at the European Bioinformatics Institute (EBI); and the PDB Japan (PDBj). The BioMagResBank (BMRB) in the United States joined in 2006. In this chapter, we describe the collection, validation, annotation and distribution of data by the wwPDB and provide a brief description of the services provided by the member sites.

24.1.2. Data acquisition and processing

24.1.2.1. Content of the data collected by the wwPDB

The PDB archive consists of entries containing three-dimensional Cartesian coordinates and information specific to the method of structure determination. Related experimental data include structure factors for X-ray experiments and chemical shifts and constraints derived from NMR experiments. NMR depositions in the PDB contain links to the BMRB archive of web-accessible constraints, chemical shifts and other data relevant to NMR structures (Ulrich et al., 1989). In the near future, volumes for electron microscopy will become a part of the PDB archive. Table 24.1.2.1 lists the general information that the wwPDB collects for all structures as well as the information specific to X-ray, NMR and cryoEM experiments.

Table 24.1.2.1. Content of data in the PDB

(a) Content of all depositions
Source – specifications such as genus, species, strain or variant of gene (cloned or synthetic); expression vector and host, or description of method of chemical synthesis
Sequence – full sequence of all macromolecular components
Chemical structure of cofactors and prosthetic groups
Names of all components of the structure
Broad description of the function and/or the composition of the structure
Literature citations for the structure submitted
Three-dimensional coordinates
Keywords and experimental method
 
(b) Additional items for X-ray structure determinations
Temperature factors and occupancies assigned to each atom
Crystallization conditions, including pH, temperature, solvents, salts, methods
Crystal data, including the unit-cell dimensions and space group
Presence of noncrystallographic symmetry
Data-collection information describing the methods used to collect the diffraction data including instrument, wavelength, temperature and processing programs
Data-collection statistics including data coverage, redundancy, Rmerge, <I/σ(I)>, data above 1σ, 2σ, 3σ levels and resolution limits
Refinement information including (free) R factor, resolution limits, number of reflections, method of refinement, σ cutoff, geometry r.m.s.d., σ
Structure factors – h, k, l, Fobs, σ(Fobs), intensity, σ(intensity), flag for free R test
 
(c) Additional items for NMR structure determinations
For an ensemble, the model number for each coordinate set that is deposited and an indication whether one should be designated as a representative
Data-collection information describing the types of methods used, instrumentation, magnetic field strength, console, probe head, sample tube
Sample conditions, including solvent, macromolecule concentration ranges, concentration ranges of buffers, salts, antibacterial agents, other components, isotopic composition
Experimental conditions, including temperature, pH, pressure and oxidation state of structure determination, and estimates of uncertainties in these values
Non-covalent heterogeneity of sample, including self-aggregation, partial isotope exchange, conformational heterogeneity resulting in slow chemical exchange
Chemical heterogeneity of the sample (e.g., evidence for deamidation or minor covalent species)
A list of NMR experiments used to determine the structure including those used to determine resonance assignments, NOE/ROE data, dynamical data, scalar coupling constants, and those used to infer hydrogen bonds and bound ligands. The relationship of these experiments to the constraint files is given explicitly
Constraint files used to derive the structure as described in Task Force recommendations
Links (where available) to associated files at BMRB containing chemical shift, filtered constraint and other NMR data
 
(d) Additional data items for cryo-electron microscopy
Sample preparation, including aggregation state, concentration, buffer, pH, sample support and description of vitrification procedure
3D reconstruction procedure, including method, nominal and actual pixel size, resolution, contrast-transfer-function correction method, magnification calibration and text description of the reconstruction procedure
3D model fitting procedure, including fitting-procedure type and program used, PDB IDs of models used
3D model refinement, including whether refinement is done in real space or reciprocal space, type of protocol, refinement target criteria
If sample is a 2D or 3D array, symmetry and repeat parameters
If the entry contains a virus, details of the virus host including type, species and growth cell, as well as details of the virus including type, isolate and International Committee on Taxonomy of Viruses ID

The definitions of the data items collected are in the PDB Exchange Dictionary (PDBx) (Westbrook, Henrick et al., 2005), which is based on the mmCIF syntax. The mmCIF dictionary contains thousands of terms that define the macromolecular structure and the crystallographic experiment (Fitzgerald et al., 2005). PDBx contains additional terms for NMR and cryoEM. The BMRB-developed NMR-STAR dictionary specifies additional data items linked to NMR structures. These terms were developed in collaboration with the depositors who are experts in these methods. Terms needed for tracking and other information management purposes are also included in PDBx.
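To illustrate the name–value style on which PDBx data items are built, the following minimal Python sketch reads single-valued items of the form `_category.item value` into a nested dictionary. It is an illustration only, not a conforming mmCIF parser: loops, multi-line text fields and most quoting rules are ignored, the function name `parse_simple_items` is hypothetical, and the numerical values in the example are invented.

```python
import shlex


def parse_simple_items(text):
    """Collect one-line `_category.item value` pairs into {category: {item: value}}.

    Assumption: every item of interest sits on a single line. Loops,
    multi-line text fields and other real mmCIF features are skipped.
    """
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # comments, data_ block headers and blank lines
        tokens = shlex.split(line)  # honours simple '...' and "..." quoting
        if len(tokens) != 2:
            continue  # not a plain name/value pair
        name, value = tokens
        category, _, item = name[1:].partition(".")
        data.setdefault(category, {})[item] = value
    return data


# Invented example values; the item names follow the mmCIF naming pattern.
EXAMPLE = """\
data_EXAMPLE
#
_cell.length_a   61.5
_cell.length_b   61.5
_exptl.method    'X-RAY DIFFRACTION'
"""

items = parse_simple_items(EXAMPLE)
print(items["cell"]["length_a"])
print(items["exptl"]["method"])
```

Grouping by category mirrors the way each dictionary term belongs to a named category; production work would normally rely on an established mmCIF library rather than a sketch of this kind.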

24.1.2.2. Data deposition sites

Data are deposited to the PDB via one of the wwPDB member sites. Because it is critical that the final archive is kept uniform, the wwPDB works to make sure that the content and format of the final files, as well as the methods used to check them, are the same.

Each deposition to the PDB is represented by a PDB ID, a four-character code. The PDB ID is assigned arbitrarily and is an immutable reference to the entry. PDB IDs are never reused; they serve as the link between a structure and the literature reference that describes it. A unique and immutable integer tag identifies experimental NMR data that are processed and archived at the BMRB. Hyperlinks in the related PDB and BMRB files provide seamless access to all information.
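Software that accepts PDB IDs from users often checks their form before querying the archive. The Python sketch below shows one way this might be done. The text above specifies only that the ID has four characters; the further assumption that the first character is a digit and the remaining three are letters or digits is ours, as is the hypothetical function name `normalize_pdb_id`.

```python
import re

# Assumption (not stated in the text above): one digit followed by
# three letters or digits.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def normalize_pdb_id(text):
    """Return the ID in upper case, or raise ValueError if the form is wrong."""
    candidate = text.strip()
    if not _PDB_ID.fullmatch(candidate):
        raise ValueError(f"not a four-character PDB ID: {text!r}")
    return candidate.upper()


print(normalize_pdb_id(" 4hhb "))
```

A check of this kind confirms only that the string is well formed; whether the ID corresponds to an entry can be established only by consulting the archive.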

The RCSB PDB (http://deposit.pdb.org/) and PDBj (http://pdbdep.protein.osaka-u.ac.jp/) use the program ADIT for data deposition and validation. PDBe processes and annotates data that are submitted via AutoDep (http://pdbe.org/deposit) (Tagari et al., 2006). The wwPDB members are developing a single tool that will be used by all sites for deposition and annotation.

BMRB sites at Madison (http://deposit.bmrb.wisc.edu/bmrb-adit/ ) and Osaka-PDBj (http://nmradit.protein.osaka-u.ac.jp/bmrb-adit/ ) use ADIT-NMR to collect both coordinate and experimental data. NMR experimental data deposited at PDBe are forwarded to BMRB for further processing. NMR experimental data are processed by BMRB and the coordinate data by the other wwPDB sites.

Coordinates and experimental data files from all the sites are sent to the RCSB PDB for inclusion in the archive.

24.1.2.3. Validation and annotation

Validation refers to procedures for assessing the quality of the experimental data, the deposited atomic model and the fit of the model to the data. Annotation refers to the process of adding information to the entry (e.g., cross-references to other biological data resources). The wwPDB partners have convened three Validation Task Forces (VTFs) to advise them on how best to validate X-ray, NMR and cryoEM structures. Their recommendations will be implemented as part of the joint deposition and annotation system currently under development.

At present, validation and annotation involve the following checks:

Covalent bond distances and angles: Proteins are compared against standard geometry values from Engh & Huber (1991); nucleic acid bases are compared against standard values from Clowney et al. (1996); sugars and phosphates are compared against standard values from Gelbin et al. (1996).

Stereochemical validation: All chiral centres of proteins and nucleic acids are checked for correct stereochemistry.

Atom nomenclature: The nomenclature of all atoms is checked for compliance with IUPAC standards (IUPAC–IUB Joint Commission on Biochemical Nomenclature, 1983; Markley et al., 1998) and adjusted if necessary.

Close contacts: The distances between all atoms within the asymmetric unit of crystal structures and the unique molecule of NMR structures are calculated. For crystal structures, contacts between symmetry-related molecules are checked as well.

Ligand and atom nomenclature: Residue and atom nomenclature are compared against a standard dictionary (http://www.wwpdb.org/ccd.html ) for all ligands as well as standard residues and bases. Unrecognized ligand groups are flagged and any discrepancies in known ligands are listed as extra or missing atoms. New ligands are added to the dictionary as they are deposited.

Sequence comparison: The sequence provided by the depositor is compared against the sequence derived from the coordinate records. This information is displayed in a table where any differences or missing residues are annotated. During the annotation process the sequence database references provided by the author are checked for accuracy. If no reference is given, a sequence search against UniProtKB (The UniProt Consortium, 2011) is carried out to find the best match. Any conflict between the depositor's sequence and the sequence derived from the coordinate records is further resolved and annotated by comparison with other sequence databases as needed. Once an entry has been released, the cross-references and mappings to sequence and other databases are maintained and kept up to date in the separate SIFTS resource (http://pdbe.org/sifts), a joint project of PDBe and UniProt.

Distant waters: The distances between all water oxygen atoms and all polar atoms (oxygen and nitrogen) of the macromolecules, ligands and solvent in the asymmetric unit are calculated. Distant solvent atoms are repositioned using crystallographic symmetry such that they fall within the solvation sphere of the macromolecule.

Geometry: The torsion-angle distributions and peptide-bond deviations from cis and trans conformation are checked. Derived data, such as site records, helix and sheet records for secondary structure, are annotated.

Checks of NMR data by the BMRB: NMR constraints are checked for consistency with the three-dimensional structure and atom nomenclature, and chemical shifts are checked for possible referencing errors and outliers.

In almost all cases, serious errors detected by these checks are corrected through annotation and correspondence with the authors.
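The sequence comparison listed above can be sketched with the standard Python `difflib` module. This is only an illustration of the idea and not the procedure used by the wwPDB annotation software; the function name `sequence_differences` and the two short sequences are invented for the example.

```python
from difflib import SequenceMatcher


def sequence_differences(deposited, observed):
    """List the segments in which two one-letter sequences disagree.

    Each item is (tag, deposited_segment, observed_segment), where tag is
    'delete' (residues absent from the coordinates), 'insert' (residues
    absent from the deposited sequence) or 'replace' (a mismatch).
    """
    matcher = SequenceMatcher(None, deposited, observed, autojunk=False)
    return [
        (tag, deposited[i1:i2], observed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented example: the first two residues are not present in the model.
deposited = "MKTAYIAKQR"
observed = "TAYIAKQR"
for tag, dep, obs in sequence_differences(deposited, observed):
    print(tag, repr(dep), repr(obs))
```

Residues reported as missing from the coordinates are common for disordered regions, so a report of this kind is a starting point for annotation rather than an error list.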

The wwPDB continuously reviews its annotation methods and will continue to integrate new procedures as they become available and are accepted as community standards.
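As an illustration of the close-contact check described above, the following Python sketch computes all pairwise distances within one set of atoms and reports pairs from different residues that lie closer than a cutoff. The 2.2 Å cutoff, the `Atom` record and the coordinates are assumptions made for the example, and symmetry-related molecules are not considered.

```python
import math
from itertools import combinations
from typing import NamedTuple, Tuple


class Atom(NamedTuple):
    chain: str
    residue: int
    name: str
    xyz: Tuple[float, float, float]


def close_contacts(atoms, cutoff=2.2):
    """Return (atom_a, atom_b, distance) for inter-residue pairs closer than cutoff (in Å).

    Assumption: the cutoff is illustrative. Pairs within the same residue
    are skipped because most of them are covalently bonded.
    """
    hits = []
    for a, b in combinations(atoms, 2):
        if (a.chain, a.residue) == (b.chain, b.residue):
            continue
        distance = math.dist(a.xyz, b.xyz)
        if distance < cutoff:
            hits.append((a, b, distance))
    return hits


# Invented coordinates for demonstration.
atoms = [
    Atom("A", 1, "CA", (0.0, 0.0, 0.0)),
    Atom("A", 1, "CB", (1.5, 0.0, 0.0)),
    Atom("A", 7, "O", (0.0, 2.0, 0.0)),
    Atom("B", 3, "N", (10.0, 10.0, 10.0)),
]
for a, b, distance in close_contacts(atoms):
    print(a.chain, a.residue, a.name, b.chain, b.residue, b.name, round(distance, 2))
```

The all-pairs loop scales quadratically with the number of atoms; for full entries a spatial grid or tree would be the usual choice, and covalently linked neighbouring residues would need to be excluded explicitly.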

24.1.2.4. Data processing statistics

Fig. 24.1.2.1 shows the growth of PDB data since the archive began, indicating how the complexity of structures released into the archive has increased over time.

Figure 24.1.2.1. Growth chart of the PDB showing the total number of structures available in the PDB archive per year (as of 31 May 2011) and highlighting example structures from different time periods.

As of 31 May 2011, the PDB contains 73 503 publicly accessible structures; of these entries, 64 043 were determined by X-ray methods, 8902 by NMR and 370 by electron microscopy (Fig. 24.1.2.2).
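The proportions implied by these counts can be worked out directly. The short Python sketch below uses only the numbers quoted in this section (including the structure-type counts given in the caption of Fig. 24.1.2.2); the variable names are ours. The three method counts do not add up to the total, and the sketch makes that remainder explicit; it presumably corresponds to entries determined by other methods, which the text does not itemize.

```python
# Counts as of 31 May 2011, taken from the text of this section.
total = 73503
by_method = {"X-ray": 64043, "NMR": 8902, "electron microscopy": 370}
by_type = {
    "protein": 68048,
    "nucleic acid": 2257,
    "protein/nucleic acid complex": 3159,
    "other": 39,
}

# Share of the archive for each experimental method, in per cent.
shares = {method: round(100 * n / total, 1) for method, n in by_method.items()}

# Entries not covered by the three methods listed.
unlisted = total - sum(by_method.values())

for method, share in shares.items():
    print(f"{method}: {share}%")
print("entries not assigned to the three listed methods:", unlisted)
print("structure types sum to total:", sum(by_type.values()) == total)
```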

Figure 24.1.2.2. (a) Distribution of structures by experimental type (as of 31 May 2011). As of this date, the PDB has 73 503 publicly accessible structures; of these entries, 64 043 were determined by X-ray methods, 8902 by NMR and 370 by electron microscopy. (b) Distribution of structures by structure type (as of 31 May 2011). Of the currently available entries, 68 048 are proteins, 2257 are nucleic acids and 3159 are protein/nucleic acid complexes; 39 are other types of molecules.

24.1.2.5. Data uniformity

A key goal of the wwPDB is to make the archive as consistent and error-free as possible. Before files are released, all new depositions are reviewed by annotators. Errors that authors and PDB users find after release are addressed as rapidly as possible, and the entries are revised.

The wwPDB member sites collaborate to review all the data in the PDB archive (Henrick et al., 2008; Lawson et al., 2008). In particular, the monomer components and ligands were examined to construct a dictionary that includes the stereochemistry, nomenclature, ideal coordinates, model coordinates, and SMILES and InChI representations. The Chemical Component Dictionary (http://www.wwpdb.org/ccd.html) contains the definitions for all components found in the PDB archive. All instances of these components have been matched against this dictionary and, where necessary, corrections have been made to the PDB files. The sequences, citations and some experimental data items were also corrected. More recently, the archive has been reviewed and remediated with the objective of tackling complex problems, including the representation of biological assemblies, residual B factors, peptide inhibitors and antibiotics, and entries in nonstandard crystal frames. Documentation about ongoing remediation efforts is available from http://wwpdb.org.

24.1.3. Data access

24.1.3.1. ftp

The 'PDB archive' is the collection of flat files maintained in three formats (Westbrook & Fitzgerald, 2009) (http://www.wwpdb.org/docs.html): the legacy PDB file format (Bernstein et al., 1977); the PDB exchange format (PDBx) that follows the mmCIF syntax (Fitzgerald et al., 2005; Westbrook, Henrick et al., 2005) (http://mmcif.pdb.org); and the PDBML/XML format (Westbrook, Ito et al., 2005) that is a direct translation of the PDB exchange format. In addition to coordinate data, the archive contains experimental data (structure factors, NMR restraints and chemical shifts). Each wwPDB site distributes the same PDB archive via ftp. The archive is updated weekly.
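The legacy PDB file format is a fixed-column text format, so a coordinate record can be read by slicing each line at set positions. The Python sketch below illustrates this for a single ATOM record. The column positions follow the published layout of the legacy format as we understand it and should be checked against the format documentation before use; the function name `parse_atom_record` and the sample values are invented for the example.

```python
def parse_atom_record(line):
    """Slice one ATOM/HETATM line into fields using fixed columns.

    Assumption: columns follow the legacy PDB layout (Python slices are
    0-based, so file columns 31-38 become line[30:38], and so on).
    """
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not a coordinate record")
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


# Invented sample record, written field by field to make the widths visible:
# record name, serial, blank, atom name, alt. location, residue name, blank,
# chain, residue number, insertion code + 3 blanks, x, y, z, occupancy,
# temperature factor, 10 blanks, element symbol.
SAMPLE = (
    "ATOM  " "    1" " " " N  " " " "MET" " " "A" "   1" "    "
    "  27.340" "  24.430" "   2.614" "  1.00" "  9.67" "          " " N"
)

record = parse_atom_record(SAMPLE)
print(record["name"], record["res_name"], record["chain"], record["xyz"])
```

Because fields are located by position rather than by delimiters, values may run together without intervening spaces; splitting such lines on whitespace is therefore unreliable, which is one reason the exchange formats use explicitly named data items.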

Time-stamped snapshots of the PDB archive are added each year to ftp://snapshots.wwpdb.org. They provide a frozen copy of the archive as it appeared at that time for research and historical purposes. Scripts are available to download all, or part, of a snapshot automatically.

24.1.3.2. Websites

In addition to providing access to the PDB archive, each wwPDB site offers services and resources that provide different views and analyses of the structural data contained within the PDB archive.

24.1.3.2.1. RCSB PDB

The RCSB PDB (http://www.pdb.org) (Berman et al., 2000) provides a website with several functionalities. In addition to file downloads in all formats, the site offers a variety of search capabilities. These include searching by PDB ID, author or keyword; searching within a relational database that is populated by the data in the PDBx/PDBML files; or browsing characteristics that have been integrated from external resources (Deshpande et al., 2005). These characteristics include biological process, cellular component and molecular function as defined by gene ontology (The Gene Ontology Consortium, 2000); enzyme classification; medical subject headings (MeSH terms); source organism as defined by the NCBI taxonomy; gene location from Entrez (Wheeler et al., 2006); and folds as classified by SCOP (Murzin et al., 1995) and CATH (Orengo et al., 1997). The structure summary report for each structure is itself searchable. A variety of tabular reports can be made for groups of structures with information about the experiment, the chemistry and biology, and citations. Static pictures and interactive graphics are provided for each structure and protein–ligand interaction from the website. Some of these interactive views use the molecular biology toolkit (MBT, http://mbt.sdsc.edu; Moreland et al., 2005). A variety of summary statistics are available as histograms and in tabular form. Advanced searches can be performed on the structure data in combination with the data integrated from over 30 different external sources. Multimedia tutorials are available describing many other features.

A service called MyPDB notifies users via email when the PDB releases structures that match customized queries stored at the RCSB PDB site. For application developers, a variety of web services are available to directly access subsets of data and RCSB PDB resources.

Also published at the RCSB PDB site is the Molecule of the Month; authored and illustrated by David S. Goodsell, this feature describes a molecule in such a way that non-specialists can learn about the relationship between the structure and function of a particular molecular system.

24.1.3.2.2. PDBj

PDBj (http://www.pdbj.org) maintains a variety of services for querying, displaying and analysing PDB data (Standley et al., 2008). Structure Navigator can be used to locate PDB entries that are similar to a query structure or PDB ID. Structural similarity is defined by the newly developed score used in the stand-alone program ASH (Alignment of Structural Homologs), which can be run or downloaded from http://www.pdbj.org/ASH/. ASH shows high sensitivity and selectivity in recognizing CATH or SCOP domains, without requiring large amounts of CPU (Standley et al., 2004, 2005, 2007). Sequence Navigator is a BLAST interface to PDBj. The input to Sequence Navigator can be a PDB ID or an amino-acid sequence. GIRAF (Kinjo & Nakamura, 2009) is an efficient search method for the local atomic alignments of possible ligand binding sites and protein–protein interfaces (http://giraf.pdbj.org). It is not constrained by sequence homology, sequence order or protein fold, and hence can provide clues about the possible function of a protein even when no structural homologues can be found.

Most of the query services described above can be executed as web services using the Simple Object Access Protocol (SOAP), REST or through web browsers. jV version 3 (formerly known as PDBjViewer) is a molecular-graphics program to display proteins and nucleic acids (Kinoshita & Nakamura, 2004). eProtS, the Encyclopedia of Protein Structures, is an educational resource for learning about biological functions and structural characteristics of protein molecules. eProtS contains explanations for a few hundred protein molecules that are particularly interesting. eProtS is implemented as a Wiki so that motivated structural biologists can contribute articles for the proteins whose structures they have solved. Two derived databases are also available from PDBj. ProMode is a database of normal mode analyses (NMA) of proteins. eF-site (electrostatic surface of Functional site) is a database for molecular surfaces of protein functional sites, displaying the electrostatic potentials and hydrophobic properties together on the Connolly surfaces of the active sites, for analyses of the molecular-recognition mechanisms. eF-seek is a service to search for similar molecular surfaces among known active sites.

24.1.3.2.3. PDBe

PDBe (http://pdbe.org) (Velankar et al., 2010, 2011) aims to bring structure to biology, i.e., to become an integrated resource of structure-related data that provides relevant services, information and tools tailored to the needs of expert and non-expert users alike (Velankar & Kleywegt, 2011). Besides offering basic and advanced search and retrieval systems, PDBe focuses on advanced services, ligand-related resources, integration with other biological data resources, validation and enriched presentation of experimental data. Advanced PDBe services include PDBeMotif (http://pdbe.org/motif), a service for analysing detailed molecular interactions and correlating them with sequence or structure patterns (Golovin & Henrick, 2008); PDBeFold (http://pdbe.org/fold), a powerful interactive structure-alignment tool (Krissinel & Henrick, 2004); PDBePISA (http://pdbe.org/pisa), a quaternary-structure prediction and analysis service (Krissinel & Henrick, 2007); and PDBeXplore (http://pdbe.org/browse), a tool that allows browsing and analysis of the structural archive based on familiar chemical and biological classification systems such as EC, CATH and Pfam (Velankar & Kleywegt, 2011).

Together with UniProt, PDBe also maintains the SIFTS resource (http://pdbe.org/sifts), which provides entry and residue-level mapping for proteins that are found in PDB entries, as well as cross-reference information to resources such as GO, Pfam, InterPro, CATH and SCOP (Velankar et al., 2005). SIFTS data are used by many of the world's major structural bioinformatics resources (e.g., RCSB PDB, Pfam, UniProt, SCOP) as a source of linking data about structure, sequence and function for PDB entries.

The PDBe website is also a rich source of meta-information about the PDB (statistics, highlights), educational material and Quips (short interactive articles and tutorials about 'quite interesting PDB structures'). PDBe further provides tools that can be included in other websites ('widgets') and that provide up-to-date information. Examples of these are PDBprints (http://pdbe.org/prints) and PDBportfolio (http://pdbe.org/portfolio), which provide key information and images, respectively, about individual PDB entries, and UniPDB (http://pdbe.org/unipdb), which shows the coverage of UniProt sequences by PDB entries. Finally, PDBe provides specialized information and services for the NMR (http://pdbe.org/nmr) and cryoEM (http://pdbe.org/em) communities.

24.1.3.2.4. BMRB

The BMRB (http://www.bmrb.wisc.edu) (Ulrich et al., 1989) collects and disseminates experimental NMR data and joined the wwPDB in 2006 (Markley et al., 2008). The BMRB archive can be searched by database ID, keywords, author, molecule name, sequence, sample conditions or other characteristics. The website contains statistics and resources for the NMR community. BMRB archives data connected with an NMR structure as recommended by the wwPDB NMR Task Force for structural-biology depositions: structural constraints, assigned chemical shifts, and NOESY spectra and/or peak lists. BMRB also supports more comprehensive NMR data depositions by members of the structural-genomics community, which may include additional primary (time-domain) spectra and intermediate results.

The BMRB provides software and tools for validating NMR data and associated structures prior to deposition and tools for visualizing various types of NMR data with or without an associated 3D structure. The BMRB is responsible for processing constraint files and for providing the PDB archive with constraints that are consistent with structural models and converted to machine-readable NMR-STAR format. Filtered restraints, further processed for use by test calculation protocols and longitudinal analyses and validations, are available online (Doreleijers et al., 2005).

24.1.4. Future

Structural biology is a rapidly evolving field that constantly provides new challenges to the collection, curation and distribution of macromolecular structure data. Since 1998, the rate of depositions has tripled and the total holdings have increased by a factor of more than eight. The structures have increased in complexity, with many molecular machines now part of the archive. Data-harvesting tools have made it possible to collect and include more information about each structure. The challenge is to be able to continue to collect data rapidly while maintaining quality and uniformity across the archive.

The maintenance and further development of the PDB archive is a worldwide effort. The willingness of the global community to share ideas, software and data makes it a unique resource for biological research.

Acknowledgements

The RCSB PDB is supported by the National Science Foundation, the National Institute of General Medical Sciences, the Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes and Digestive and Kidney Diseases. Funding for PDBe is provided by EMBL–EBI, the Wellcome Trust, NIH, BBSRC and the European Union. Some earlier developments at PDBe were funded by the MRC and CCP4. PDBj is supported by the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency, and the Ministry of Education, Culture, Sports, Science and Technology. The BMRB is supported by the National Library of Medicine.

This chapter is an updated version of Berman et al. (2009). Reprinted with permission of John Wiley & Sons, Inc.

References

Berman, H., Henrick, K. & Nakamura, H. (2003). Announcing the worldwide Protein Data Bank. Nat. Struct. Biol. 10, 980.
Berman, H. M., Henrick, K., Nakamura, H. & Markley, J. L. (2009). Structural Bioinformatics, 2nd ed., edited by J. Gu & P. E. Bourne, pp. 293–303. Hoboken: John Wiley & Sons, Inc.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). The Protein Data Bank. Nucleic Acids Res. 28, 235–242.
Bernstein, F. C., Koetzle, T. F., Williams, G. J., Meyer, E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.
Clowney, L., Jain, S. C., Srinivasan, A. R., Westbrook, J., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: nitrogenous bases. J. Am. Chem. Soc. 118, 509–518.
Deshpande, N., Addess, K. J., Bluhm, W. F., Merino-Ott, J. C., Townsend-Merino, W., Zhang, Q., Knezevich, C., Xie, L., Chen, L., Feng, Z., Green, R. K., Flippen-Anderson, J. L., Westbrook, J., Berman, H. M. & Bourne, P. E. (2005). The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res. 33, D233–D237.
Doreleijers, J. F., Nederveen, A. J., Vranken, W., Lin, J., Bonvin, A. M., Kaptein, R., Markley, J. L. & Ulrich, E. L. (2005). BioMagResBank databases DOCR and FRED containing converted and filtered sets of experimental NMR restraints and coordinates from over 500 protein PDB structures. J. Biomol. NMR, 32, 1–12.
Engh, R. A. & Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Cryst. A47, 392–400.
Fitzgerald, P. M. D., Westbrook, J. D., Bourne, P. E., McMahon, B., Watenpaugh, K. D. & Berman, H. M. (2005). International Tables for Crystallography, Vol. G, Definition and Exchange of Crystallographic Data, edited by S. R. Hall & B. McMahon, pp. 295–443. Dordrecht: Springer.
Gelbin, A., Schneider, B., Clowney, L., Hsieh, S.-H., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: sugar and phosphate constituents. J. Am. Chem. Soc. 118, 519–528.
Golovin, A. & Henrick, K. (2008). MSDmotif: exploring protein sites and motifs. BMC Bioinformatics, 9, 312.
Henrick, K., Feng, Z., Bluhm, W. F., Dimitropoulos, D., Doreleijers, J. F., Dutta, S., Flippen-Anderson, J. L., Ionides, J., Kamada, C., Krissinel, E., Lawson, C. L., Markley, J. L., Nakamura, H., Newman, R., Shimizu, Y., Swaminathan, J., Velankar, S., Ory, J., Ulrich, E. L., Vranken, W., Westbrook, J., Yamashita, R., Yang, H., Young, J., Yousufuddin, M. & Berman, H. M. (2008). Remediation of the Protein Data Bank archive. Nucleic Acids Res. 36, D426–D433.
IUPAC–IUB Joint Commission on Biochemical Nomenclature (1983). Abbreviations and symbols for the description of conformations of polynucleotide chains. Eur. J. Biochem. 131, 9–15.
Kinjo, A. R. & Nakamura, H. (2009). Comprehensive structural classification of ligand-binding motifs in proteins. Structure, 17, 234–246.
Kinoshita, K. & Nakamura, H. (2004). eF-site and PDBjViewer: database and viewer for protein functional sites. Bioinformatics, 20, 1329–1330.
Krissinel, E. & Henrick, K. (2004). Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Cryst. D60, 2256–2268.
Krissinel, E. & Henrick, K. (2007). Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372, 774–797.
Lawson, C. L., Dutta, S., Westbrook, J. D., Henrick, K. & Berman, H. M. (2008). Representation of viruses in the remediated PDB archive. Acta Cryst. D64, 874–882.
Markley, J. L., Bax, A., Arata, Y., Hilbers, C. W., Kaptein, R., Sykes, B. D., Wright, P. E. & Wüthrich, K. (1998). Recommendations for the presentation of NMR structures of proteins and nucleic acids. IUPAC–IUBMB–IUPAB Inter-Union Task Group on the Standardization of Data Bases of Protein and Nucleic Acid Structures Determined by NMR Spectroscopy. J. Biomol. NMR, 12, 1–23.
Markley, J. L., Ulrich, E. L., Berman, H. M., Henrick, K., Nakamura, H. & Akutsu, H. (2008). BioMagResBank (BMRB) as a partner in the Worldwide Protein Data Bank (wwPDB): new policies affecting biomolecular NMR depositions. J. Biomol. NMR, 40, 153–155.
Moreland, J. L., Gramada, A., Buzko, O. V., Zhang, Q. & Bourne, P. E. (2005). The Molecular Biology Toolkit (MBT): a modular platform for developing molecular visualization applications. BMC Bioinformatics, 6, 21.
Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.
Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B. & Thornton, J. M. (1997). CATH – a hierarchic classification of protein domain structures. Structure, 5, 1093–1108.
Standley, D. M., Kinjo, A. R., Kinoshita, K. & Nakamura, H. (2008). Protein structure databases with new web services for structural biology and biomedical research. Brief. Bioinform. 9, 276–285.
Standley, D. M., Toh, H. & Nakamura, H. (2004). Detecting local structural similarity in proteins by maximizing number of equivalent residues. Proteins, 57, 381–391.
Standley, D. M., Toh, H. & Nakamura, H. (2005). GASH: an improved algorithm for maximizing the number of equivalent residues between two protein structures. BMC Bioinformatics, 6, 221.
Standley, D. M., Toh, H. & Nakamura, H. (2007). ASH structure alignment package: sensitivity and selectivity in domain classification. BMC Bioinformatics, 8, 116.
Tagari, M., Tate, J., Swaminathan, G. J., Newman, R., Naim, A., Vranken, W., Kapopoulou, A., Hussain, A., Fillon, J., Henrick, K. & Velankar, S. (2006). E-MSD: improving data deposition and structure quality. Nucleic Acids Res. 34, D287–D290.
The Gene Ontology Consortium (2000). Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29.
The UniProt Consortium (2011). Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res. 39, D214–D219.
Ulrich, E. L., Markley, J. L. & Kyogoku, Y. (1989). Creation of a nuclear magnetic resonance data repository and literature database. Protein Seq. Data Anal. 2, 23–37.
Velankar, S., Alhroub, Y., Alili, A., Best, C., Boutselakis, H. C., Caboche, S., Conroy, M. J., Dana, J. M., van Ginkel, G., Golovin, A., Gore, S. P., Gutmanas, A., Haslam, P., Hirshberg, M., John, M., Lagerstedt, I., Mir, S., Newman, L. E., Oldfield, T. J., Penkett, C. J., Pineda-Castillo, J., Rinaldi, L., Sahni, G., Sawka, G., Sen, S., Slowley, R., Sousa da Silva, A. W., Suarez-Uruena, A., Swaminathan, G. J., Symmons, M. F., Vranken, W. F., Wainwright, M. & Kleywegt, G. J. (2011). PDBe: Protein Data Bank in Europe. Nucleic Acids Res. 39, D402–D410.
Velankar, S., Best, C., Beuth, B., Boutselakis, C. H., Cobley, N., Sousa da Silva, A. W., Dimitropoulos, D., Golovin, A., Hirshberg, M., John, M., Krissinel, E. B., Newman, R., Oldfield, T., Pajon, A., Penkett, C. J., Pineda-Castillo, J., Sahni, G., Sen, S., Slowley, R., Suarez-Uruena, A., Swaminathan, J., van Ginkel, G., Vranken, W. F., Henrick, K. & Kleywegt, G. J. (2010). PDBe: Protein Data Bank in Europe. Nucleic Acids Res. 38, D308–D317.
Velankar, S. & Kleywegt, G. J. (2011). The Protein Data Bank in Europe (PDBe): bringing structure to biology. Acta Cryst. D67, 324–330.
Velankar, S., McNeil, P., Mittard-Runte, V., Suarez, A., Barrell, D., Apweiler, R. & Henrick, K. (2005). E-MSD: an integrated data resource for bioinformatics. Nucleic Acids Res. 33, D262–D265.
Westbrook, J., Henrick, K., Ulrich, E. L. & Berman, H. M. (2005). International Tables for Crystallography, Vol. G, Definition and Exchange of Crystallographic Data, edited by S. R. Hall & B. McMahon, pp. 195–198. Dordrecht: Springer.
Westbrook, J., Ito, N., Nakamura, H., Henrick, K. & Berman, H. M. (2005). PDBML: the representation of archival macromolecular structure data in XML. Bioinformatics, 21, 988–992.
Westbrook, J. D. & Fitzgerald, P. M. D. (2009). Structural Bioinformatics, 2nd ed., edited by P. E. Bourne & J. Gu, pp. 271–291. Hoboken: John Wiley & Sons, Inc.
Wheeler, D. L., Barrett, T., Benson, D. A., Bryant, S. H., Canese, K., Chetvernin, V., Church, D. M., DiCuccio, M., Edgar, R., Federhen, S., Geer, L. Y., Helmberg, W., Kapustin, Y., Kenton, D. L., Khovayko, O., Lipman, D. J., Madden, T. L., Maglott, D. R., Ostell, J., Pruitt, K. D., Schuler, G. D., Schriml, L. M., Sequeira, E., Sherry, S. T., Sirotkin, K., Souvorov, A., Starchenko, G., Suzek, T. O., Tatusov, R., Tatusova, T. A., Wagner, L. & Yaschenko, E. (2006). Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 34, D173–D180.







































