Data Rescue Interest Group

26 Feb 2015

The natural sciences possess a rich heritage of data spanning the entire era of research, encompassing both modern electronic formats and analogue ones (paper, film, books, pro-formas, charts, maps, photographic plates) or primitive magnetic tapes. The information in older data is critical for quantifying changes and trends and for differentiating between natural and anthropogenically induced changes. Unfortunately, most historical data have not been converted into electronic datasets. Those data cannot be accessed by present-day research, to the serious detriment of models that predict future changes. Even those that have been 'digitized' (whether catalogues of measurements or the actual observations) are rarely in interoperable, or even easily readable, formats. Though essential to research for their unique time-stamp, heritage data are often in a deteriorating state, abandoned, or effectively lost (if not actually destroyed). Their profile is unacceptably low; even the RDA does not have an organized effort focussing on this crucial topic.

Since 2010, CODATA has hosted a Task Group for "Data At Risk" (DAR-TG). DAR-TG led an enthusiastic Plenary #3 BoF session, and we are now proposing a new RDA IG for "Data Rescue" with affiliation to DAR-TG. DAR-TG's own international membership will set the IG at once on an international footing, and its history will endow the young IG with ready-made experience and contacts. The joint organization will benefit considerably from the broader exposure which the RDA will offer, such that the combined effort will unquestionably be greater than the sum of separate entities working in parallel (or even in competition). Both DAR-TG and CODATA itself are very happy with this proposed development.

The objectives of the new IG will be (a) to ferret out and catalogue known data-rescue efforts as exemplars of what can be achieved, and thereby to raise the profile of Data Rescue in the world at large, (b) to establish an advisory system for 'digitizing' and associated tasks, and (c) to communicate with relevant RDA IGs (Education, Metadata, History & Ethnography, Long Tail of Data, and the domain-specific standard-setters). It will also consider extending its remit to 'adding value' to older digital data. It will provide a unique forum for sharing experience, expertise and ideas. Its overarching goal will be to convince scientists and policy makers of the immense value of implementing data rescue and, by enabling interoperability, access and (thence) wide application, to grow the activity into an essential and routine element of all research.

The Data Rescue IG will be particularly valuable for science researchers who require data from the past, and (by sharing best practice, hardware and software) for archivists and others with responsibilities to oversee the preservation of historical information in the humanities and social sciences.

Case studies tabulated by the DAR-TG to date range from modest individual or small-group attempts to extend a domain-specific research data-base backwards in time in order to study natural evolution, to transnational organizations that align and open access to broad categories such as rocks, fossils or the ocean's characteristics, for the benefit of all academic research; some (such as 'Old Weather', or the recovery of 'lost' tapes from early space missions) also appeal strongly to the general public. A small selection of those stories features in the introductory article to an issue of GeoResJ dedicated to "data rescue"[1]. Collecting, cataloguing and sharing information on all such projects, regardless of outcome, will thus be a key mission.

[1] "When are Old Data New Data?"
Griffin, R.E.M., and the CODATA Task Group for "Data At Risk",
2015, GeoResJ, in press.

Proposed Chairs of the IG:

R. Elizabeth Griffin, Dominion Astrophysical Observatory, Victoria, BC, Canada

David Gallagher, National Snow & Ice Data Center, Boulder, CO, USA

Lesley Wyborn, Australian National University, ACT, Australia

Supporting proposers include members from a selected group of judges of the biennial Elsevier/IEDA Award for Data Rescue, and the membership of the CODATA "Data At Risk" Task Group.


TAB Review:

  • Author: Jean Bernard Minster

    Date: 12 Mar, 2015

    The principle of data rescue is critical in many fields that depend on long time series. Formulating approaches that are sustainable and can be funded is an essential step.

  • Author: Elizabeth Griffin

    Date: 03 Apr, 2015

    < Formulating approaches that are sustainable and can be funded is an essential step >

    Without question the funding is essential, as for just about everything else! What is even more important, though, and which constitutes an essential prerequisite, is to convince the world (scientific communities but also others) of the overarching NEED to rescue those data in the first place. Even when funding is available, thoughtless comments like, "Who wants old data when we can get so many better ones nowadays?" are entirely misleading. As you know, historical data are not simply "old"; they are unique, and no amount of modern sophistication in data gathering can turn back the clock!

  • Author: Chris Muller

    Date: 27 Apr, 2015

    There are those who believe that not only are important data collections at risk, but in many cases the technical and organizational tools needed to perform the rescue process are at risk too. Perhaps it's a small lab where a retiring professor or a government project manager and staff once recovered old (and now ancient) datasets in an arcane format. The special equipment and software may just be discarded. It would be very helpful if there were a non-profit focused on creating an inventory of such capabilities; keeping in touch with those in charge of them to ensure they don't disappear; husbanding (wifing?) some of the capabilities in the event they were about to be lost; helping those in need of data rescue to locate the resources; and, if needed, assisting in or performing the work.

    A reasonable idea? An article on this theme will be printed in the Against the Grain magazine in September. Slides from an earlier presentation are attached.


    Attachment: DP2014earlydig.pdf (3.1 MB)

  • Author: Chris Muller

    Date: 13 May, 2016

    A few of us in the CODATA Data at Risk Task Group (a sub-group which I've taken the liberty of calling "the IPDRC committee") have put together a summary document. This is still a work in progress, and the most recent version is attached. Any comments and suggestions about this theme are very welcome.

    Attachment: IPDRCsummary.pdf (101.03 KB)
