Big Data IG


Group details

IG Established

RDA Big Data Interest Group Charter


The ultimate goal of RDA Big Data Interest Group is to produce a set of recommendation documents to advise diverse research communities with respect to:

How to select an appropriate Big Data solution that delivers optimal value for a particular science application, and

What the best practices are for dealing with the various data and computing issues associated with such a solution.


To achieve our mission, we need to attain the following objectives while taking due account of related activities and results, such as those of the International Organization for Standardization (ISO), the Open Geospatial Consortium (OGC), and the US National Institute of Standards and Technology (NIST) Big Data Public Working Group (NBD-PWG), as well as other relevant organizations and undertakings.

  • Clarifying, and sometimes defining, terminologies related to Big Data.

    • Any Big Data solution for scientific research will involve many relevant disciplines, such as computing hardware/software infrastructure and architecture, data management/curation, and analyses and algorithms. Discussions will be more effective when there is no confusion in terminology.

    • The efforts of the RDA Terminologies WG, as well as other relevant efforts (e.g. from NBD-PWG), will be consulted, and feedback will be provided whenever necessary.

  • Characterizing leading Big Data technologies.

    • Investigations will be carried out 1) directly through spin-off Working Groups (WGs) and 2) in collaboration with other RDA groups, to characterize the technologies.

    • The characterization of a Big Data technology will include its strengths, weaknesses, and limitations. In other words, it will describe what the technology is good for, in what sort of environment, and for what kinds of analyses or algorithms.

    • Example evaluation criteria include, but are not limited to:

      • Performance, resource utilization, and scalability,

      • Usability,

      • Flexibility and extensibility, and

      • Propensity to support scientific collaborations.

  • Collaborating with external entities through IG member involvement.

    • These external organizations and enterprises include, but are not limited to, ISO, OGC, NIST, EarthCube, EarthServer, and even individual research projects.

    • These interactions with external entities will ensure BDIG is up to date with new developments and enable us to leverage others’ efforts.

    • Examples of such interactions include establishing a connection with NIST NBD-PWG activities, participating in its discussion of a Big Data reference architecture, and exercising the common Big Data use cases collected by NBD-PWG and BDIG, as well as similar activities with OGC on Big Geo Data and with the ISO SQL WG on flexible retrieval from Big Science Data.

  • Producing a set of recommendation documents based on the results obtained from activities in attaining the above objectives.

    • This set of documents will include:

      • A systematic classification of algorithms pertinent to the characterization of Big Data technologies,

      • Characterizations of Big Data technologies investigated, especially their value characteristics in each category of use cases,

      • Frequency of each class of algorithms and/or queries used by workflows in various use cases, delineated by science domains/subdomains, and

      • Feasible combinations of analysis algorithms, analytical tools, data and resource characteristics and scientific queries.

    • These recommendation documents are intended to serve as a best practice guide for scientific groups/communities interested in investing in Big Data technologies.


The Big Data Interest Group (BDIG) is open to all RDA members. The following participants are especially relevant:

  • Domain scientists wishing to utilize Big Data solutions for their research and/or applications,

  • Data specialists with experience in data production, curation, analysis, and management, especially involving large volumes and varieties of data,

  • Computational scientists or software engineers with special interests in data analysis techniques and algorithm analysis, especially pertaining to Big-Data-relevant technologies and tools,

  • Experts, or aspiring experts, of various Big Data technologies and tools,

  • Computational infrastructure and architecture experts in fields such as distributed computing, high-performance computing, and database systems,

  • Data scientists with a blended interest involving some subset of the activities mentioned above, in particular the sharing, use, and reuse of open scientific data collections, and

  • Managers involved in any combination of the activities mentioned above.

Interaction Mechanism

BDIG will utilize capabilities provided by the RDA platform to communicate and collaborate effectively to achieve its goals. These include:

  • Monthly teleconferences/WebEx sessions with a planned agenda to discuss specific issues

  • Asynchronous collaboration using Google Docs, a wiki, and email list servers

  • Semiannual RDA Plenary meetings, with sessions for face-to-face interaction among members and to inform other RDA members of BDIG's ongoing activities.


BDIG will be considered a success if the Interest Group:

  • Develops recommendation documents accepted within and beyond RDA

  • Demonstrates visibility and uptake of its results within RDA and beyond (for example at OGC and ISO, and through discussion and outreach at domain-specific events such as IGARSS, EGU, and AGU)

All BDIG documents will be publicly available on the RDA server.





Recent Activity

06 Jun 2019

Invitation to join BigDataStack Technologies for Shipping webinar - 26 June 14.00 CEST

Dear all,

The BigDataStack project is organising a webinar on Big Data Technologies for Shipping. The webinar will show the added value of Big Data for the shipping industry on the basis of the use case led by Danaos, a leading international maritime player with more than 60 container ships. We’ll address how BigDataStack algorithms optimize and help cut costs on maintenance and spare parts inventory planning and dynamic routing.

16 May 2019

Plenary 14: Call for Sessions, Collocated Events, Posters and Registration Now Open!

The 14th RDA Plenary will take place from 23-25 October 2019 in Dipoli, the nature-immersed building of Aalto University, Helsinki, in Finland, “one of the happiest countries in the world”, states Per Öster, CSC-IT Center For Science Director and Co-Chair of the P14 Programme Committee.


21 Mar 2019

Big Data Storage and Data Virtualization

The major objective of this paper is to present Big Data storage techniques and data virtualization. Data virtualization servers focus on making big data processing easy: they hide the complex, technical interfaces of big data storage technologies such as Hadoop and NoSQL, and present big data as if it were stored in traditional SQL systems. This allows developers to use their existing skills and to deploy traditional ETL, reporting, and analytical tools that all support SQL.

24 Jan 2019

Big Data - Definition, Importance, Examples & Tools

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.


19 Apr 2018

Re: [rda-bigdata-ig][array_database] ADA:WG report available for public comment

Dear Sandro,
your contributions are much appreciated. Over the 1.5 years - where you also
participated in some meetings - additions were simple, just by updating the wiki
yourself. Now, after submission for checking, changes should concentrate on
fixing issues popping up. That said, I will still try to accommodate changes as
much as feasible, just send me a list of the changes. I will collect all during
the comment period and do a bulk update.
best regards,
Dr. Peter Baumann

18 Apr 2018

Re: [rda-bigdata-ig][array_database] ADA:WG report available for public comment

Dear Peter,
as member of the Array Database WG and PI of the big data(cube) framework Ophidia, I would like to give my contribution to the report.
In particular, I plan to:
1) update the info about Ophidia and fix some errors in the text/tables related to it;
2) contribute to the general sections too;
3) provide an application perspective, with special regard to the climate domain.
Is there a doc version of the report, which I can edit in track change mode and send back to you?

17 Apr 2018

ADA:WG report available for public comment

Dear all,
let me announce the Array Database report, available for public comment at
Notably, some careful readers have already spotted errata; these are being
collected and will be fixed in one go afterwards. Kindly let us know if you spot typos
or other kinds of errors, in order to achieve the highest possible quality.
happy reading,