RDA Plenary 3: Big Data Analytics Interest Group
Chairs: Morris Riedel, Rahul Ramachandran, Peter Baumann
The Big Data Analytics IG session took place on the second day of the third RDA Plenary, Thursday 27 March 2014, at the Croke Park conference centre.
By the time the session started, every chair around the room’s big round table was taken, and people kept arriving, filling a second and third row of seats. Given the importance of Big Data analytics and how hot the topic is, it seems natural that so many people were interested in this session.
Morris Riedel, co-chair of the session, began by introducing the goals of this session:
1. checking whether any of the participants has a good use case for the IG
2. checking whether there is a possibility for collaboration and paper publication
The agenda of the session was then reviewed. Before going any further, all participants were asked to introduce themselves and say what their personal interest in analytics is. Some of the participants’ interests were as follows:
· using analytics in real cases
· multimedia analysis
· finding biases
· IT architecture
· dirty and corrupted data
· problems and performance of data analysis
· efficient big data infrastructure
Morris then summarised the discussions from previous plenary meetings and the material already provided in the wiki for group members to read. Some highlights from the previous talks and wiki materials are:
· There is a need to have smart analytics.
· Some terms have been around for a long time, so what is different today concerning big data?
· Differences between analytics and analysis
· Data analysis supports the search for ‘causality’
· Big data analytics is focused on ‘correlation’
· Big Data analytics (clustering, classification, …) is what scientific computing and big data have in common
Two use cases were chosen to show this difference:
· First use case: Event tracking analytics: data sets from satellites (events with changing geolocations)
· Second use case: Automatic outlier detection in big data (PANGAEA), open for one month in B2SHARE
Then the first speaker, Guiseppe, presented three use cases from solid earth analytics (seismic analytics) and pointed out the characteristics of the data sets and the difficulties of analyzing the data in each case. The cases ranged from near real-time analysis of continuous data streams to check for events such as earthquakes, to offline analysis of gathered data for pattern recognition, to “synthetic” data for event prediction, in which very large data sets are simulated.
The next speaker was Stephan Decker, who gave a talk on the Insight Centre for Data Analytics. He presented their experience of working with industry and some of the work being done in Ireland on Big Data Analytics.
The third speaker, Peter Baumann, presented some use cases involving multi-dimensional arrays and stated that different communities have data with different dimensions. As an example, in climate data modeling the data form cubes, since satellite data are dense. He also mentioned some database systems that have already implemented multidimensional arrays, such as SciDB, Mone, PostGIS Raster, Oracle, and an array model on top of Hadoop.
The last planned speaker, Wo Chang, talked about how to capture a workflow. He mentioned that they want to identify different use cases to study, generalize, and ease the way people learn from data sets to form an infrastructure.
Afterward, Phil Archer from W3C, a volunteer speaker, spoke briefly about his purpose in attending and what W3C can offer to help this IG.
At the end, the idea of holding an invitation-only hands-on workshop at the RDA US workshop was presented, and it was concluded that it might be better to hold this session in Amsterdam.
I think the session successfully engaged its target audience and met its needs. At the end of the session, Morris asked whether people found the session interesting and wanted to follow its activities. In return, he got a lot of positive feedback, and more than 10 new people wanted to subscribe to the group.