Structural biology research depends on high-quality structure models, often derived from X-ray crystallography. The Protein Data Bank (PDB) is the primary source of such models, but it has drawbacks for large-scale and high-throughput studies: the deposited models were created by different people, in different eras, using different methods, and many contain correctable flaws. For over a decade, the PDB-REDO databank has greatly reduced these drawbacks. It collects optimised versions of PDB entries that are re-created from their original experimental data using a fully automated protocol. This procedure makes the structure models more comparable while also removing many imperfections.
Until now, PDB-REDO had limited provenance tracking and used proprietary data formats for metadata. Moreover, because the PDB-REDO databank is a living resource, entries are replaced frequently, e.g. to incorporate algorithmic advances from our ongoing research. Unfortunately, this caused models used in structural biology research to ‘disappear’, hampering their re-use and undermining scientific reproducibility. These limitations of the PDB-REDO databank were addressed in this EOSC-Life WP1 project:
The updated data structure of the PDB-REDO databank also allowed us to create a cloud-ready API for generating research datasets. It lets users select structure models based on provenance data and/or on structural and model-validation parameters stored as metadata. A graphical interface to this API is also available (Fig. 1). Search results can be stored as a JSON structure and used as a dataset descriptor in scientific publications.
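To make the idea of selecting models by metadata and saving the result as a JSON dataset descriptor more concrete, here is a minimal local sketch in Python. It does not call the PDB-REDO API, and it is not how the API is implemented. The record fields (`entry_id`, `resolution`, `r_free`, `version`), the placeholder entry IDs and values, and the functions `select_entries` and `dataset_descriptor` are all hypothetical, chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical metadata record: field names and values are illustrative
# placeholders, NOT the actual PDB-REDO metadata schema.
@dataclass
class EntryMetadata:
    entry_id: str       # placeholder identifier, not a real PDB ID
    resolution: float   # diffraction resolution in ångström
    r_free: float       # free R-factor of the re-refined model
    version: str        # databank version the entry was taken from


def select_entries(records, max_resolution, max_r_free):
    """Keep entries whose validation parameters pass both thresholds."""
    return [
        r for r in records
        if r.resolution <= max_resolution and r.r_free <= max_r_free
    ]


def dataset_descriptor(query, selected):
    """Bundle the query and the selected entries into a JSON string."""
    return json.dumps(
        {"query": query, "entries": [asdict(r) for r in selected]},
        indent=2,
        sort_keys=True,
    )


# Hypothetical example data for illustration only.
records = [
    EntryMetadata("example-1", 1.8, 0.21, "v1"),
    EntryMetadata("example-2", 2.9, 0.27, "v1"),
    EntryMetadata("example-3", 2.1, 0.24, "v2"),
]
query = {"max_resolution": 2.5, "max_r_free": 0.25}
selected = select_entries(records, **query)
print(dataset_descriptor(query, selected))
```

Storing the query alongside versioned entry identifiers is one way such a descriptor could let readers of a publication trace exactly which models were used, even after newer versions replace them; whether the real API records versions in this form is an assumption of this sketch.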
The combined results of this project have made the PDB-REDO databank a stronger resource for structural biology research inside and outside of the European Open Science Cloud.
We have long-standing connections between PDBe and PDB-REDO. Now with the improved FAIRness and overall structure of the PDB-REDO databank we can make more extensive connections between our resources. This will create more added value to our combined userbases.
– Sameer Velankar, PDBe
The PDB-REDO project has always supported Open Science and focused on accessibility and availability. EOSC-Life has given us the resources and support to bring our databank to the next level of usability and FAIRness which allows us to better serve our existing users and make our databank more suited for new users in the field of structural biology and bioinformatics.
– Robbie P. Joosten, PDB-REDO