
P22 Asynchronous Discussion: Highlighting CODECHECK


    Limor Peer
    Participant

    VP22 is over, but we are still posting summaries from groups, both within RDA and external to RDA, who do amazing work in the reproducibility space.

    You can join the Reproducibility IG Slack #rda-plenary-22 channel (all our main postings are pinned in the Slack channel), or you can participate through the RDA Reproducibility IG mailing list by replying to the messages sent out. Feel free to comment or ask questions!

    Today, I’m posting about CODECHECK.

    CODECHECK is an organization that conducts independent execution of the computations underlying research articles. Codecheckers perform independent, time-stamped runs and award a “certificate of executable computation”, increasing the availability, discoverability and reproducibility of crucial artefacts for the computational sciences.

    Stephen Eglen has provided additional information on CODECHECK’s vision for reproducibility and its challenges in the thread below. Please feel free to continue the conversation within the thread!

    Briefly, tell us about your work/organization and how it is related to computational reproducibility. What are you trying to address, and how?

    CODECHECK (codecheck.org.uk) is our system for checking whether the computations underlying a research article are reproducible. In particular, we re-run someone else’s code with their data to check that we can generate the same (for some definition of “same”) outputs that appear in the manuscript. We write a certificate summarising our findings and then ensure that all artifacts (code, data, paper) are freely available. The process therefore encourages reproducibility and openness by default.
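
    To make that kind of check concrete, here is a minimal, hypothetical sketch of re-running an author’s analysis script and comparing the regenerated outputs against the published ones. It is not CODECHECK’s actual tooling or workflow; the script name, output file names, directory layout, and the byte-for-byte hash comparison are all illustrative assumptions.

```python
# Illustrative sketch of a reproduction check: re-run the authors' analysis
# script, then report which regenerated output files match the published ones.
# Paths, file names, and the hash-based comparison are hypothetical examples,
# not CODECHECK's actual process.
import hashlib
import subprocess
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def reproduction_check(analysis_script: str, outputs: list[str],
                       published_dir: str, regenerated_dir: str) -> dict:
    """Re-run the analysis script and compare each listed output file
    against its published counterpart byte for byte."""
    # Step 1: independent re-execution of the authors' code.
    subprocess.run(["python", analysis_script], check=True)

    # Step 2: compare each regenerated artefact with the published one.
    report = {}
    for name in outputs:
        published = Path(published_dir) / name
        regenerated = Path(regenerated_dir) / name
        report[name] = (published.exists() and regenerated.exists()
                        and file_sha256(published) == file_sha256(regenerated))
    return report


if __name__ == "__main__":
    # Hypothetical example: one script that produces a figure and a table.
    result = reproduction_check("analysis.py",
                                ["figure1.png", "table2.csv"],
                                published_dir="published",
                                regenerated_dir="results")
    for artefact, matched in result.items():
        print(f"{artefact}: {'MATCH' if matched else 'MISMATCH'}")
```

    In practice a byte-for-byte match is a very strict notion of “same” (regenerated figures, for example, rarely match exactly), which is why the certificate records a human judgement of whether the outputs correspond to those in the manuscript.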

    What is your/your organization’s vision when it comes to computational reproducibility (e.g., all scholarship is computationally reproducible by default)?

    Certainly, by default we would like computational work to be reproducible. I think that in some domains this should be readily achievable, but in many areas it is not yet feasible (see point 3). A simpler vision for now is that more journals should reward authors for providing code/data (or reproducible papers). I’d like to see us create an independent organisation that runs something like CODECHECK as a service for, e.g., a large fraction of STEM. Journals or institutions could subscribe to this service to pay for full-time codecheckers.

    What are some of the challenges you see to achieving this vision?

    a) As compute needs grow, e.g. in machine learning, reproducing work may require significant compute resources that our codecheck reviewers may not have access to.

    b) In clinical domains, openly sharing data is problematic.

    There are workarounds for both (a) and (b), e.g. providing dummy/synthetic data or small examples.

    c) Asking reviewers or journals to do more work further increases the burden on already overworked researchers.

    What would you like to ask the members of our Interest Group?

    a) Would anyone like to join us in the search for funding to create the independent organisation that runs CODECHECK as a service?

    b) How do we meaningfully give credit to those who undertake such codecheck reviews if we continue with the same model as peer review (i.e. people do it as a service to their community)?
