Technical Highlight - July 2014
Short description: Protein crystallographers must be careful to prevent avoidable errors in the PDB.
The Protein Data Bank (PDB) is an invaluable resource for all biologists, especially those conducting drug design or data-mining studies. The PDB currently contains over 85,000 crystallographic structures. Although the majority of these depositions are of high quality, preventable errors in a number of structures could affect the PDB's credibility as a source of reliable structural information.
Dauter and colleagues (PSI MCSG and NYSGRC) recently analyzed a number of typical errors that are prevalent in the PDB. These errors were divided into four general categories: inconsistent data presentation, non-parsimonious modeling, ignoring evidence, and ignoring prior knowledge.
The authors presented examples of each type of error and sought to remind crystallographers to invoke Bayesian reasoning when assessing the correctness of structural models. Namely, experimenters should take into account both the primary evidence available (i.e., crystallographic data) and their own prior knowledge (i.e., the laws of chemistry and physics) when building and refining structures. Furthermore, despite the availability of increasingly automated methods of structure determination, experimenters should rigorously and objectively evaluate the details of every structure before depositing it in the PDB.
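The Bayesian reasoning described above can be summarized in one line. This is only a sketch of the general idea, not a formula quoted from the paper; the symbols M (a candidate structural model) and D (the crystallographic data) are notation assumed here for illustration.

```latex
P(M \mid D) \;\propto\; P(D \mid M)\, P(M)
```

Here P(D | M) measures how well the model explains the primary evidence, and P(M) encodes prior knowledge such as the laws of chemistry and physics. A model that fits the data but is chemically implausible has a small P(M), so it should still be treated with suspicion.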
Additionally, Dauter and colleagues advocated that original authors and unaffiliated researchers continue to re-refine and redeposit models when improvement is possible. These corrected structures should take advantage of the PDB's little-used REMARK and CAVEAT codes to indicate the reasons for the model's update and to alert other users about questionable features, respectively. The authors concluded that the next generation of protein crystallographers must be adequately trained in order to prevent the majority of these errors, and ultimately to ensure the highest standards of structure determination and the continued health of the PDB.
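For readers who mine the PDB, one practical consequence is to check whether an entry carries any CAVEAT or REMARK lines before using it. The following is a minimal sketch, assuming the legacy PDB flat-file layout in which the record name occupies the first six columns. The function name `find_flagged_records` and the header fragment are hypothetical and invented for illustration; they are not taken from the paper or from any real entry.

```python
def find_flagged_records(pdb_text, record_names=("CAVEAT", "REMARK")):
    """Group lines of PDB-format text by record name.

    Assumes the legacy flat-file layout, where the record name
    occupies columns 1-6 of each line.
    """
    found = {name: [] for name in record_names}
    for line in pdb_text.splitlines():
        name = line[:6].strip()
        if name in found:
            found[name].append(line.rstrip())
    return found


# Hypothetical header fragment, invented for illustration only.
example = """\
HEADER    HYDROLASE                               01-JAN-14   XXXX
CAVEAT     XXXX    ILLUSTRATIVE WARNING TEXT
REMARK   2 RESOLUTION.    1.80 ANGSTROMS.
ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N
"""

records = find_flagged_records(example)
for name, lines in records.items():
    print(name, len(lines))
```

A data-mining script could use a check like this to set aside entries that carry a CAVEAT line for manual review, rather than silently including them in a statistical survey.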
Z. Dauter et al. Avoidable errors in deposited macromolecular structures: an impediment to efficient data mining.
IUCrJ. 1, 179-193 (2014). doi:10.1107/S2052252514005442