Dark Data and FAIR

  • Creator
    Discussion
  • #69236

    Jack Casey
    Member

    Dear RDA members,
     
    I’m writing in my capacity as a postdoctoral researcher for Juan Manuel Durán’s Making Dark Data FAIR project (Delft University of Technology – The Netherlands).
     
    As part of our investigation, we are trying to estimate how much of the data held by academic (or other) institutions is dark (i.e., non-reusable). To help build an overall picture, we’d like to ask members of the interest group to contact their systems administrator and request some information about the amount of dark data on their servers.
     
    Essentially, we need the following statistics from your administrator:
     

    * De-registered users: users who have been deleted from the user administration database in the past five years
        * We need to know how many accounts are/were held by de-registered users
        * We also need to know the ratio of de-registered users to total users in your system

    * Inactive users: registered users who have not accessed their account or the system in the past five years
        * We need to know how many accounts belong to inactive users
        * We also need to know the ratio of inactive users to total users in your system
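For administrators who want a starting point, here is a minimal sketch of how the four figures above could be computed from an export of account records. Everything in it is an assumption made for illustration: the `Account` fields, the `dark_data_stats` function name, the approximation of five years as 5 × 365 days, and the choice to count every record in the export (registered and de-registered) as the "total users". Real user databases will differ, so treat it as a guide rather than a prescribed method.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Account:
    """Hypothetical account record; field names are illustrative only."""
    username: str
    last_access: Optional[datetime]       # None if the account was never accessed
    deregistered_on: Optional[datetime]   # None if the account is still registered


def dark_data_stats(accounts, now, window_years=5):
    """Return counts and ratios of de-registered and inactive accounts.

    Assumption: 'total users' is every record passed in.
    Assumption: the window is approximated as window_years * 365 days.
    """
    cutoff = now - timedelta(days=365 * window_years)
    total = len(accounts)

    # De-registered: deleted from the user database within the window.
    deregistered = sum(
        1 for a in accounts
        if a.deregistered_on is not None and a.deregistered_on >= cutoff
    )

    # Inactive: still registered, but no access within the window.
    inactive = sum(
        1 for a in accounts
        if a.deregistered_on is None
        and (a.last_access is None or a.last_access < cutoff)
    )

    def ratio(count):
        # Avoid division by zero for an empty export.
        return count / total if total else 0.0

    return {
        "total": total,
        "deregistered": deregistered,
        "deregistered_ratio": ratio(deregistered),
        "inactive": inactive,
        "inactive_ratio": ratio(inactive),
    }
```

Only the aggregate numbers returned by the function would need to be shared; no usernames or other individual details leave the institution.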

     
    From past experience, systems admins are happy to supply such information, firstly because there are no privacy concerns (none of this information concerns individual users), and secondly because it’s fairly simple technically to acquire the information.
     
    If possible, we’d ask that you contact your systems administrator yourselves (rather than putting us in contact with them), simply because we doubt they would be willing to supply such information to us directly (given that we don’t work at your institution). However, if your systems administrator has any questions about the statistics to be collected, please put him/her in contact with us.
     
    My email is j.j.casey@tudelft.nl. Please feel free to get in touch. Thank you in advance!
     
    Best wishes
    Jack

  • Author
    Replies
  • #90066

    Andy Turner
    Member

    Dear Jack (cc rda-bigdata-ig),
    I work at the University of Leeds – a reasonably large and complicated organisation.
    Are you after statistics about all users of the computer system irrespective of whether they are/were visitors, undergraduate or postgraduate students, or staff?
    I am not sure if there is a way to produce counts for different types of user, but I could ask at the same time.
    Some points for consideration:
    * Some users’ work/role will have almost nothing to do with research data.
    * Some users might simultaneously be students and staff or have changed user type in the last 5 years.
    I am a little concerned that my asking for the data will create work, and that the data that comes back will be of very little use. How are you planning to use the data that comes back? If this were a research project, there would normally be a participant information sheet and some ethical review process. Please will you supply some additional information to help me decide whether or not I should submit an information request to our (somewhat overloaded) IT request system?
    Best wishes,
    Andy
    In reply to: Jack J Casey via Big Data IG, “[rda-bigdata-ig] Dark Data and FAIR”, sent 07 January 2021 11:44

  • #90052

    Jack Casey
    Member

    Hi Andy,
    Thanks for your response.
    Having discussed this with the PI of the project, we agree with your assessment. The request was a bit vague, and we’d prefer statistics on postgraduate students and staff if possible, as we’d consider those users to be more likely to be engaged in research.
    Just a bit of background on where we’re coming from: our lead PI is the author of a paper that measures dark data in HPC facilities, and we’re asking for these anonymised statistics to continue that research (by building a broader picture of the state of data at academic institutions). This is part of a larger project in which we assess the FAIR data principles from a conceptual standpoint – having an idea of how big a problem dark data actually is helps motivate our position.
    This is indeed a research project. It’s funded by the EOSC secretariat. I’ve been told we don’t need to follow an ethical review process as we’re not asking for personalised information, just statistics regarding the state of aggregate data held by an institution. That said, if you’d like some form of confirmation that the statistics won’t be misused, the PI will happily sign a formal letter with a letterhead.
    Please let me know if there’s anything else I can clarify, and I hope to hear from you.
    Best wishes
    Jack
