28 Oct 2016

Open data without barriers

By Demetris Avraam, RDA EU Early Career Grant Winner – University of Bristol

Open access to data, and open science in general, is a major gain of the data revolution. In modern science, making data accessible and shareable is a necessity, but it is often challenging, especially in disciplines where the data include sensitive information and are protected by laws and regulations. The Research Data Alliance (RDA) is a community that successfully brings together experts from different backgrounds to discuss the challenges and opportunities and to share their ideas through its Working and Interest Groups, which then provide recommendations to enable data sharing without barriers. The RDA groups approach the common goal of data sharing from all possible perspectives: from the legal and ethical side to the development of infrastructures and technologies and other computational and statistical solutions. A number of RDA groups also focus on the barriers to data sharing in more specific domains such as health and medical sciences, life and social sciences, agriculture and many other disciplines. Last September, I had the opportunity to attend the 8th RDA Plenary in Denver, which was a great experience for me and an interesting meeting for all of us who work in the evolving area of data science.

One of the most interesting sessions I attended during my time in Denver was a discussion on ‘Open data as a public good and the responsibilities of scientists’. In that session, a panel of three keynote speakers discussed the responsibility of scientists to make their data and research publicly available for reuse by others, and the inequalities in data sharing at national and international levels. The first speaker, Victoria Stodden from the University of Illinois, discussed the importance of transparency and reproducibility in scientific research. She highlighted the need for transparency not only in the data but also in the methodologies and software implementations, and the importance of being able to replicate the entire data processing chain, from experiments and data collection to analysis and reporting. The second speaker, Myron Gutmann from the University of Colorado Boulder, discussed the balance needed between open data as a public good and the responsibility of scientists to find solutions that protect privacy, confidentiality and intellectual property and that recognise the risks of data integration. The third speaker, Takashi Onishi from Toyohashi University of Technology, presented the recommendations of the Science Council of Japan, which proposed the establishment of an infrastructure for the management and openness of interdisciplinary research data, the establishment of data strategies by research communities, and career design for data producers and data curators.

Following the talks, a remarkable discussion took place between the audience and the speakers, who stressed the importance of research reproducibility and the challenges of open science. I would like to highlight two points that came out of the discussion. The first is the common agreement that the reproducibility of research is important, which also relates to the increasing demand from publishers that data, software, protocols and experimental methods be shared alongside research and journal articles. The second is the need to find ways, as a community, to ensure the sustainability of the infrastructures and software that are developed to deal with the barriers to data sharing. These points are crucial, as grant providers fund applied projects on data analysis more easily than projects that focus on the development and maintenance of the technologies required for those analyses. This adds an extra barrier for the researchers who build the solutions intended to overcome all the other pre-existing barriers and to allow data sharing in protected frameworks.

During the RDA Plenary, I had the opportunity to attend several other presentations by people from different working and interest groups and to discuss those challenges and opportunities with them. I also presented a poster and discussed my work with other people who share my research interests. In my poster, I demonstrated DataSHIELD, a computational solution that allows the analysis of sensitive individual-level data, and the co-analysis of such data from several studies simultaneously, without physically pooling the data. It is a useful tool especially in fields where the barriers to data sharing are insurmountable, such as those related to ethico-legal issues, to the sheer size of the data in the case of Big Data, and to maintaining control over intellectual property.
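
To make the idea of co-analysis without physical pooling more concrete, here is a minimal Python sketch. It is not DataSHIELD's code or API; the function names, the `MIN_COUNT` disclosure threshold and the sample values are all assumptions made purely for illustration. Each study releases only aggregate statistics, and the coordinating analyst combines them into a pooled mean and variance without ever seeing an individual record.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical disclosure threshold: a study refuses to release
# summaries computed from fewer records than this.
MIN_COUNT = 5


@dataclass(frozen=True)
class Summary:
    """Non-disclosive aggregates released by one study."""
    n: int           # number of records
    total: float     # sum of the values
    total_sq: float  # sum of the squared values


def local_summary(values: Sequence[float]) -> Summary:
    """Runs inside a study; the individual-level values never leave it."""
    if len(values) < MIN_COUNT:
        raise ValueError("too few records to release a summary")
    return Summary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def pooled_mean_and_variance(summaries: List[Summary]) -> Tuple[float, float]:
    """Runs at the analyst's side; it sees only the per-study aggregates."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance recovered from the sums: (sum(x^2) - n * mean^2) / (n - 1)
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


# Made-up measurements held separately by two studies.
study_a = [5.1, 4.8, 6.0, 5.5, 5.2]
study_b = [6.3, 5.9, 6.1, 5.7, 6.4, 6.0]

mean, variance = pooled_mean_and_variance(
    [local_summary(study_a), local_summary(study_b)]
)
print(f"pooled mean = {mean:.3f}, pooled variance = {variance:.3f}")
```

The pooled results are the same as those that would be obtained from the combined data set, yet only three numbers per study cross the study boundary. A real system needs far more careful disclosure controls than a single count threshold.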

I also had the opportunity to be more active in two RDA groups whose interests overlap with my work, the RDA/NISO Privacy Implications of Research Data Sets Interest Group and the Data Security and Trust Working Group, and to see the recommendations, implementations and adoptions coming from other RDA groups.

The first session on recommendations and adoptions, on the second day of the RDA Plenary, focused on the recommendations of two RDA working groups (WGs) and on cases of implementation by two other WGs. Peter McQuilton, from the University of Oxford, presented the challenges and recommendations of the BioSharing Registry WG and demonstrated BioSharing, a web-based portal that monitors the development and evolution of standards and data policies. Stefano Nativi, from the National Research Council of Italy, presented the outputs of the Brokering Governance WG and described the challenges in accessing data from different sources, in identifying objects from published data, in combining different data processing models and in executing applications on distributed infrastructures. Thomas Zastrow, from the Max Planck Computing and Data Facility, talked about the Data Foundation and Terminology WG and demonstrated Fedora Commons, an open source repository system for the management and dissemination of digital content. Leslie McIntosh, from Washington University, talked about the adoption of the Data Citation WG recommendations in biomedical science infrastructure, about a grant that the group received, and about how they used it to implement the recommendations and to engage other adopters in the i2b2 community.

The entire International Data Week, and especially the RDA Plenary, was a great experience for me, as an early career researcher who is quite new to the field of data science. The Plenary offered early career researchers the chance to present their work and to gain advice and knowledge from experts from different fields (academia, industry and government) and from different countries across the world. I highly recommend that anyone with an interest in research data attend the next RDA Plenaries and get involved with any of the working and interest groups. I look forward to seeing all of you at the 9th RDA Plenary Meeting, which will take place from 5th to 7th of April 2017 at the Barcelo Sants Hotel in Barcelona, organised by the Barcelona Supercomputing Center-Centro Nacional de Supercomputación (BSC-CNS) with the support of RDA Europe.

