BoF Text and Data Mining: Defining the Challenges and Actions

 
Date & time: Thursday 3rd March - 09:00 - 10:30 - Working Meeting Session 6
Group name: Text and Data Mining
Meeting title: Text and Data Mining: Defining the Challenges and Actions

Text and data mining (TDM) is the process of deriving new information from vast quantities of machine-readable material (facts, data and ideas). TDM technology can trawl existing data to find new patterns, correlations and insights in information that we already have but that humans cannot process in a comparable amount of time. It could help to solve some of society’s problems and has the potential for huge returns. However, there are many challenges to overcome before TDM can reach its full potential. Some of these challenges are already known, such as the lack of metadata standards and of a proper legal framework for TDM, but many are still unknown.

This group defines the technical, legal, policy and organisational challenges that TDM poses in the EU and worldwide, and will work towards borderless solutions. 

This group has been set up by the EU-funded project OpenMinTeD, which aims to build an eInfrastructure that makes text and data mining (TDM) of EU content possible. The project started in June 2015 and is in the process of mapping the current opportunities and challenges for TDM worldwide.

This will be the first get-together of this group. Any institution interested in making their data available for text and data mining is invited to become a part of it.

Project website: www.openminted.eu 

List of expert organisations involved: 
1. ATHENA RESEARCH AND INNOVATION CENTER IN INFORMATION COMMUNICATION & KNOWLEDGE TECHNOLOGIES (ARC)
2. THE UNIVERSITY OF MANCHESTER (UNIVERSITY OF MANCHESTER)
3. TECHNISCHE UNIVERSITAET DARMSTADT (UKP-TUDA)
4. INSTITUT NATIONAL DE LA RECHERCHE AGRONOMIQUE (INRA)
5. EUROPEAN MOLECULAR BIOLOGY LABORATORY (EMBL)
6. AGRO KNOW IKE
7. STICHTING LIBER 
8. UNIVERSITEIT VAN AMSTERDAM
9. THE OPEN UNIVERSITY
10. ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE (EPFL)
11. FUNDACION CENTRO NACIONAL DE INVESTIGACIONES ONCOLOGICAS CARLOS III (CNIO)
12. THE UNIVERSITY OF SHEFFIELD (USFD)
13. GESIS - LEIBNIZ INSTITUT FUR SOZIALWISSENSCHAFTEN 
14. GREEK RESEARCH AND TECHNOLOGY NETWORK S.A. (GRNET)
15. FRONTIERS MEDIA 
16. THE UNIVERSITY OF STIRLING (UoS)

Many data institutions (publishers of scientific publications, databases of scientific publications and/or cultural heritage, national libraries, research libraries, data centers) are interested in making their data available for TDM, but they see many challenges in actually doing so. Data institutions in the EU can learn a lot from countries such as China and Japan, where institutions are further ahead and have already found solutions to TDM problems that Europeans are still struggling with. This meeting aims to bring together data institutions with different levels of TDM expertise and to produce a list of the technical, legal, policy and organisational challenges that content providers worldwide face in making their data available for text and data mining. The organisers will work with the participants towards solutions, and will also encourage the participants to develop solutions among themselves.

After the meeting, the group will continue as an online discussion group for exchanging TDM best practices. The group will also meet at future OpenMinTeD workshops, scheduled for May (Slovenia), June (Helsinki) and September (location to be decided) 2016.

Agenda (moderation: Hege van Dijke)

1. Presentation: Introduction to Text and Data Mining (Hege van Dijke) 15 minutes
2. TDM best practice presentation (OpenMinTeD consortium member) 15 minutes
3. Introduction round: experiences with TDM so far (all participants) 30 minutes
4. Interactive session defining TDM challenges (all participants, in groups of 6) 15 minutes
5. Interactive session working towards TDM solutions to the defined challenges (all participants, in groups of 6) 40 minutes
6. Summary of the results (one member per group of 6) 30 minutes
7. Overall conclusion (Hege van Dijke) 5 minutes

The target audience is people working at organisations that provide data (publishers of scientific publications, databases of scientific publications and/or cultural heritage, national libraries, research libraries, data centers) and that are interested in making their data available for text and data mining. This meeting is especially relevant for data managers and data officers who have already looked into the opportunities that TDM offers their organisation.
Group chair serving as contact person
Hege van Dijke