P22 Asynchronous Discussion: Highlighting cascad


    Lauren Cadwallader
    Participant

    We hope you are enjoying P22 and have had a chance to attend a few of the sessions so far! We are diving into our asynchronous discussion, which you can join here on the mailing list or through our Slack in the #rda-plenary-22 channel. Our next featured group is cascad.

    cascad

    Christophe Hurlin and Christophe Pérignon founded cascad (http://www.cascad.tech) in 2019 with a twofold objective: (1) to help individual researchers (mainly in economics and other quantitative social sciences) signal the reproducible nature of their research by granting reproducibility certificates, and (2) to help other scientific actors (e.g., academic journals, universities, funding agencies, scientific consortia, data providers) verify the reproducibility of the research they publish, fund, or help produce.

    cascad is a nonprofit research laboratory funded by the French National Center for Scientific Research (CNRS) along with several universities and research institutions. While cascad is based in France, it collaborates with researchers and academic journals from all around the world. Its workforce comprises full-time reproducibility engineers, part-time graduate students, and a group of faculty members who oversee operations and promote the services offered.

    Dr. Christophe Pérignon has provided more information about cascad as well as references to key research papers within the thread below. If you would like to ask questions or discuss these responses, we invite you to do so within the thread.

    Each question and response is threaded below

    What is your/your organization’s vision when it comes to computational reproducibility (e.g., all scholarship is computationally reproducible by default)?

    The establishment of cascad was driven by two firm beliefs. First, we believe that for science to be taken seriously, there needs to be a serious commitment to reproducibility. To put it simply, if we want the chain of science to be strong and useful to society, reproducibility should not be its weakest link. Second, we hold the conviction that merely making code and data publicly accessible does not fully address the reproducibility challenge. This belief grew out of our experience launching and managing RunMyCode (www.runmycode.org), a repository for code and data used by various economics and management journals. In this role, we frequently observed researchers failing to share all the essential components (code, data, explanations) necessary to regenerate their results, often because of hurdles such as copyright issues, non-disclosure agreements (NDAs), or concerns related to data privacy. Moreover, even when all components were available, other researchers often struggled to execute them, and occasionally failed entirely.

    What are some of the challenges you see to achieving this vision?

    Launching and operating a third-party reproducibility verification service is costly. Colliard et al. (2023) decompose the total cost into fixed costs, corresponding to the IT infrastructure (including software), and variable costs, corresponding to labor, computing, and data access. In a calibration exercise based on the actual number of papers published by 12 leading economics journals, they show that exploiting economies of scale could lower the average verification cost per paper from $763 (separate verification teams) to $330 (a single verification team serving all 12 journals).
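
    The economies-of-scale logic can be sketched with a back-of-the-envelope calculation: each verification team bears a fixed infrastructure cost, so pooling journals under one team amortizes that cost over many more papers. The Python sketch below uses purely illustrative placeholder numbers, not the calibrated values from Colliard et al. (2023):

        # Back-of-the-envelope sketch of the economies-of-scale argument.
        # All numbers are illustrative placeholders, NOT the calibration
        # from Colliard et al. (2023).
        FIXED_COST_PER_TEAM = 50_000    # hypothetical annual IT/software cost per team
        VARIABLE_COST_PER_PAPER = 300   # hypothetical labor + computing + data-access cost
        PAPERS_PER_JOURNAL = 100        # hypothetical annual volume per journal
        N_JOURNALS = 12

        # Separate teams: each journal bears the full fixed cost alone.
        separate = FIXED_COST_PER_TEAM / PAPERS_PER_JOURNAL + VARIABLE_COST_PER_PAPER

        # One pooled team: the fixed cost is spread over all journals' papers.
        pooled = (FIXED_COST_PER_TEAM / (N_JOURNALS * PAPERS_PER_JOURNAL)
                  + VARIABLE_COST_PER_PAPER)

        print(f"Average cost per paper, separate teams: ${separate:,.0f}")
        print(f"Average cost per paper, pooled team:    ${pooled:,.0f}")

    However the parameters are set, the fixed-cost term shrinks by a factor equal to the number of pooled journals, which is the mechanism behind the $763-to-$330 reduction reported above.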

    Our experience at cascad suggests that, in addition to accessing restricted data, the most challenging and time-consuming task is reconstructing the computing environment used by the original authors. Another practical challenge is locating the reported results in the regenerated log files, because a surprisingly large fraction of code still does not automatically generate tables and figures (see Pérignon et al., 2024). These challenges suggest that verification costs can be reduced by increasing automation in the verification process, raising awareness among researchers, and strengthening their coding skills.
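
    As a concrete illustration of how authors can ease both pain points, the sketch below records the computing environment alongside the outputs and writes each exhibit to an explicitly named file instead of leaving it buried in an interactive log. This is a minimal example of the general practice, not cascad's actual tooling:

        # Minimal sketch: snapshot the computing environment and write each
        # result to a stably named file. Illustrates a general good practice;
        # it is not cascad's verification tooling.
        import platform
        import sys
        from importlib import metadata

        # Record the interpreter, OS, and installed package versions so a
        # verifier can reconstruct the environment later.
        with open("environment.txt", "w") as f:
            f.write(f"Python {sys.version}\n")
            f.write(f"OS: {platform.platform()}\n")
            for dist in sorted(metadata.distributions(),
                               key=lambda d: (d.metadata["Name"] or "").lower()):
                f.write(f"{dist.metadata['Name']}=={dist.version}\n")

        # Write each table or figure to its own file with an explicit name,
        # so the mapping from manuscript exhibits to outputs is unambiguous.
        # (Contents here are placeholders.)
        exhibits = {"table_1.txt": "placeholder for Table 1"}
        for filename, content in exhibits.items():
            with open(filename, "w") as f:
                f.write(content)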

    The question of who should pay for the extra cost associated with reproducibility checks is also key. In the case of voluntary pre-submission verifications, it seems natural that the researchers requesting certification should cover the associated costs. In the case of mandatory pre-publication checks, we propose that the cost be shared between journals and research funding agencies. A subsidy from research funding agencies is justified by the public-good and externality effects of producing reproducible research (see Colliard et al., 2023).

    What would you like to ask the members of our Interest Group?

    We would like to ask them to contact us if they want to partner with us. While we face growing demand, our financial resources and staff remain limited, and we are not yet in a position to deliver at scale.

    References:

    Pérignon, C. (2024). The Role of Third-Party Verification in Research Reproducibility. Harvard Data Science Review, forthcoming.

    Pérignon, C., et al. (2019). Certify Reproducibility with Confidential Data. Science.

    Colliard, J.-E., et al. (2023). The Economics of Computational Reproducibility. SSRN Working Paper.
