The Life Cycle of Structural Biology Data

Output Type: Other Output
Output Status: Not Endorsed
Group: Structural Biology IG

Abstract

Supporting Output Title: The Life Cycle of Structural Biology Data

Corresponding author: Chris Morris, STFC, Daresbury Laboratory, WA4 4AD

Contributors: Claudia Alen, Lucia Banci, Alexandre Bonvin, Pablo Conesa, Alfonso Duarte, John Helliwell, Yogesh Gupta, Rob Hooft, John Markley, Brian Matthews, Gaetano Montelione, Antonio Rosato, Sameer Velankar, Matthew Viljoen, Geerten Vuister, John Westbrook, Martyn Winn, and Christine Zardecki.

Research data is acquired, interpreted, published, reused, and sometimes eventually discarded. This document reports how structural biologists perform these tasks, and recommends improvements to the infrastructure available to them.

Executive Summary

Research data is acquired, interpreted, published, reused, and sometimes eventually discarded. Understanding this life cycle better will help the development of appropriate infrastructural services, ones which make it easier for researchers to preserve, share, and find data.

Structural biology is a discipline within the life sciences that investigates the molecular basis of life by discovering and interpreting the shapes of macromolecules. It has a strong tradition of data sharing, expressed by the founding of the Protein Data Bank (PDB) in 1971 (PDB, 1971). In the early years, data submissions to the archive were made by mailing decks of punched cards. The culture of structural biology is therefore already in line with the perspective of the European Commission that data from publicly funded research projects are public data (COM(2011) 882 final).

This report is based on the data life cycle as defined by the UK Data Archive, the most clearly defined workflow that the authors are aware of. It identifies six stages: creating data, processing data, analysing data, preserving data, giving access to data, and re-using data. Each will be discussed below. However, the data infrastructure for structural biology is not a perfect match for this workflow. For clarity, 'preserving data' and 'giving access to data' are discussed together. We also add a final stage to the life cycle, 'discarding data'.

Changes in research goals and methods have led to some changes in the requirements for IT infrastructure. A common data infrastructure is required, giving a simple user interface and simple programmatic access to scattered data. Progress on these tasks will support the development of workflows that facilitate the use of datasets from different facilities and techniques. The automatic acquisition of metadata can help. Large experimental centres already provide a highly professional data infrastructure. For smaller centres, providing such infrastructure is onerous; it is desirable that a standard package be provided, enabling them to use the European e-infrastructure resources in a way that integrates with other structural biology resources.

Primary Field or Expertise
Mathematics
Output

Life-Cycle-Report.pdf

Primary Domain: Natural Sciences
