BoF WG Big Data Infrastructure (BDI)
RDA Plenary 3
Croke Park Conference Centre, Dublin, Ireland
Wednesday 26th March 2014, 15:30 – 17:00
By the time the session on Big Data Infrastructure (BDI) began at 15:30, every seat in the room was filled. As time went on, more and more delegates joined the session to hear Wo Chang and learn about the working group’s activities since the last plenary.
Wo explained that the working group started in the US and is now interested in hearing the European view on the subject. He therefore welcomed questions and suggestions throughout.
Firstly, Wo explained why the working group was created. Those implementing big data applications need a simple, effective and cheap infrastructure. The platform should enable breakthroughs and accommodate changes in technology. Users do not want to deal with the technical side of big data analysis; a user-friendly application lets them concentrate on the analytics. However, creating such a platform raises many problems.
One main issue raised was that there is no ‘one size fits all’ solution for big data infrastructure. Many federated platforms cater for the different needs of different disciplines, and this causes problems when scientists do not record which tools or operating system they used: with so many platforms in use, computations cannot always be verified and calculations cannot always be reproduced. The main dilemma for BoF WG BDI is therefore how to create a standardised, generic platform for data scientists.
A second concern was the cost of storing, moving and analysing data. Data is often acquired far faster than it can be processed, and a lot of time is spent cleaning it, so these costs can be very high. Everyone wants a platform that is cheap, fast, effective and trustworthy, which means that BoF WG BDI should create an infrastructure that keeps costs to a minimum.
Wo explained that the group had collected 51 use cases (which can be found at http://bigdatawg.nist.gov/usecases.php). It was argued, however, that the number of use cases collected makes little difference; it may be more useful to start with a single use case so that a specific problem can be identified and addressed.
It was suggested that BoF WG BDI should collaborate with IG Big Data Analytics (BDA) so that the two groups could share use cases and work together on solving these problems. On Day Three of the plenary, the two groups met for a joint discussion of how they might help each other. They decided that people from different disciplines with similar problems should be brought together because, from a big data infrastructure point of view, ‘a data set is a data set’, no matter what the data relates to. BoF WG BDI plans to provide platforms on which IG BDA can run its algorithms, while IG BDA will provide the analytics. The two groups will exchange technical details and reach a conclusion together. The end goal is to create five or six unique applications relating to different use cases.
The next step for BoF WG BDI is therefore to concentrate on a small number of use cases. Working with IG BDA, the group will try to solve problems in analytics, capture unique applications and identify any patterns or interactions between different domain-specific algorithms. How a big data infrastructure can fit everyone's needs remains an open question, but the working group will continue to work towards its goal of establishing best-practice implementation guidelines for deploying and managing big data applications.