Science and engineering rest on the concept of reproducibility. An important question for any study is whether the results are reproducible: can the results be recreated independently by other researchers or professionals? Research results need to be independently reproduced and validated before they are accepted as fact or theory. Yet across numerous fields, including psychology, computer systems, and water resources, there are documented problems reproducing research results (Aarts et al. 2015; Collberg et al. 2014; Hutton et al. 2016; Stagge et al. 2019; Stodden et al. 2018). This editorial examines the challenges of reproducing research results and suggests community practices to overcome them. Coordination is needed among the authors, journals, funders, and institutions that produce, publish, and report research. Making research more reproducible will allow researchers, professionals, and students to more quickly understand and apply published work in follow-on efforts and advance the field.
Real and perceived challenges to reproduce research results include the following:
The skill and effort required for authors to prepare, organize, and share their data, models, code, and directions to reproduce article figures, tables, and other results.
Some authors fear that other researchers will scoop them on follow-up studies, that they cannot support their materials after publication, or that no one else will use their materials.
Authors cannot share proprietary or sensitive materials or materials containing protected intellectual property.
Some workflows rely on stochastic methods, high-performance computing, big data, or long run times that make them too large to share or to reproduce bit for bit.
It takes time and expertise to reproduce others’ results, and users may encounter unclear directions or missing materials.
Funders and universities value publication of novel, peer-reviewed journal articles rather than data sets, documentation, or reproduction of others’ efforts.
Promoting and rewarding reproducibility may unintentionally push researchers toward simpler, easier-to-reproduce methods rather than studies that are more complex and far-reaching but harder to reproduce.
Recent guidance from the National Academies of Sciences, Engineering, and Medicine (NAS 2019) and from the Institute of Education Sciences, US Department of Education, and US National Science Foundation (NSF and IES 2018) describes reproducibility as a continuum (Fig. 1). The goal is to push work up the continuum: first, make the data, models, code, directions, and other digital artifacts used in the research available for others to reuse (availability); next, use the shared artifacts to exactly reproduce published results (reproducibility, sometimes called bit or computational reproducibility); and finally, use the artifacts with existing and new data sets to replicate findings across sites or domains (replicability). For example, the Journal of Water Resources Planning and Management policy to specify the availability of data, models, and code (Rosenberg and Watkins 2018) primarily targets availability in the reproducibility continuum. This policy has subsequently been adopted by nearly all of the 30+ journals published by ASCE.
Fig. 1. Reproducibility is a continuum.
We must also emphasize the benefits of making materials available and organizing them so others can reproduce the results:
Increase impact by increasing the number of persons who can access, use, and extend work.
Improve trust in the outcomes and findings from work.
Benchmark and compare new proposed methods and models with existing methods.
Organize research materials in perpetuity for oneself and future users, including future students who will later extend prior research.
Reduce and streamline the effort authors spend responding to individual user requests for article materials.
Transform the experience of reading science articles into opportunities to learn by doing. Engaging with article data, models, code, and directions lets users more fully comprehend, experiment, extend, remember, and eventually cite materials.
Each benefit also provides professionals better access to published research and can help narrow the gap between research and practice.
We should also look for ideas in the very small number of journals—such as Biostatistics, ACM Transactions on Mathematical Software, Journal of the American Statistical Association, and American Journal of Political Science—that already have reproducibility policies. Many of these journals created a new role of associate editor of reproducibility (AER). The AER has the responsibility to either reproduce manuscript results themselves or invite an external person to reproduce results. The AER can describe the value of reproducing a study and recommend how authors can make their results more reproducible. New roles of AER and reproducibility reviewers provide opportunities to engage more people in the journal peer-review process and to improve the quality of results.
Researchers and professionals use varied experimental, modeling, open-source, proprietary, deterministic, stochastic, local, cloud-based, and computationally intensive methods whose results may not be reproducible with today’s tools. We therefore provide a checklist of recommended practices for authors, journals, funders, and institutions to push research up the reproducibility continuum. Fig. 2 lists our favorite practices.
Fig. 2. Our favorite practices to make results more reproducible.

Checklist of Practices to Improve Reproducibility

Build Reproducibility into the Project from the Start

Budget time, money, local or cloud storage, and other resources to make results reproducible.
Select tools such as version control (Git or GitHub) and containerization (e.g., Docker, Sciunit) to make reproducibility practices easier and less time-consuming (a provenance-recording sketch follows this list).
Where possible, choose open-source tools to reduce the financial and time costs for others to reproduce results.
Prepare data licensing agreements or institutional review board (IRB) protocols to allow for the future release of anonymized versions of proprietary or private data.
When possible, set up repository materials to run on a cloud-based system (e.g., using JupyterHub) so users can run code and reproduce results directly on the web rather than download, set up, and execute on a local machine.
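As one example of the version-control item above, a short script can record which code version and software environment produced each set of results. The Python sketch below is a minimal illustration; the results folder and output file name are assumptions, not part of a prescribed workflow.

"""Record the code version and environment used to generate results.

A minimal sketch: the results/ folder and file name are illustrative.
"""
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

def record_provenance(out_dir="results"):
    """Write the current Git commit, Python version, and installed
    packages to a plain-text file stored next to the results."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Git commit hash of the code that produced the results
    commit = subprocess.run(
        ["git", "rev-parse", "HEAD"], capture_output=True, text=True
    ).stdout.strip()

    # Installed packages and versions (same idea as `pip freeze`)
    packages = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"], capture_output=True, text=True
    ).stdout

    with open(out / "provenance.txt", "w") as f:
        f.write(f"Generated: {datetime.now(timezone.utc).isoformat()}\n")
        f.write(f"Python: {sys.version.split()[0]}\n")
        f.write(f"Git commit: {commit}\n\n")
        f.write("Packages:\n")
        f.write(packages)

if __name__ == "__main__":
    record_provenance()

Calling this script at the end of each model run keeps a plain-text record of the exact code and environment alongside the results it produced.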

Choose a Repository

When possible, choose a single open repository common for your field. Example repositories for water resources work include an institutional repository, HydroShare, Harvard Dataverse, Figshare, Dryad, and GitHub.
Choose a repository that is consistent with the content you are sharing. For example, GitHub is well suited for sharing source code, whereas HydroShare is better suited for data and models.
Choose a repository that meets funder requirements and applicable data-sharing laws.
Where possible, bundle all content—input data, models, code, results, directions, requirements, and other materials—in a single repository rather than spread across multiple repositories.

State the Level of Reproducibility Users Should Expect and List Requirements

State the level of reproducibility the user should expect.
List which portions of the workflow can be reproduced and which cannot.
List all required hardware needed to reproduce results.
List all required software and code library dependencies (packages and versions) needed to reproduce results, and specify any software that requires a paid license (a version-checking sketch follows this list).
List the skills and training needed by a person to reproduce work.
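Beyond listing requirements, authors can include a small script that lets users confirm their environment matches the pinned versions before attempting to reproduce results. The Python sketch below is one way to do so; it assumes a requirements.txt file with entries pinned as package==version.

"""Check that installed package versions match the pinned requirements.

A minimal sketch: assumes requirements.txt lines such as 'pandas==1.5.3'.
"""
from importlib.metadata import version, PackageNotFoundError

def check_requirements(path="requirements.txt"):
    """Compare installed package versions against the pinned list and
    report anything missing or different."""
    problems = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "==" not in line:
                continue  # skip comments and unpinned entries
            name, expected = line.split("==", 1)
            try:
                installed = version(name)
            except PackageNotFoundError:
                problems.append(f"{name}: not installed (need {expected})")
                continue
            if installed != expected:
                problems.append(f"{name}: have {installed}, need {expected}")
    if problems:
        print("Environment differs from the pinned requirements:")
        for p in problems:
            print("  " + p)
    else:
        print("All pinned requirements are satisfied.")

if __name__ == "__main__":
    check_requirements()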

Make All Materials Available in the Repository

Provide all data, model(s), and code(s) for the workflow needed to reproduce the results in a publicly accessible repository.
Provide directions to install and run code and use materials to generate study results. Use a human-readable format that will persist through time (e.g., plain, ASCII, or rich text).
Write directions assuming people who will use the content are technically proficient in the field but not familiar with the data, models, code, software, programming language, methods, or study materials.
Provide metadata and describe the structure of the content (e.g., folder and file structure) and supplemental materials.
Describe all tabular data (e.g., CSV, Excel, or text files). Describe columns and units of measurement.
Specify which code segments produce each figure, table, or other results in the article.
Remove (or indicate) code references to local paths that are machine (user) specific. Where possible, use relative paths rather than absolute local paths (see the sketch after this list).
Provide links to download all software, libraries, or platforms.
Provide the input data to and results from each step that uses proprietary (licensed) software or data. Provide directions for how the software was used to generate the results.
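As an illustration of the relative-path and figure-traceability items above, the Python sketch below resolves all paths relative to the repository and wraps one figure in its own clearly named function. The data file, column names, and figure name are hypothetical placeholders rather than materials from any particular study.

"""Generate one figure using paths relative to the repository root.

A minimal sketch: the data file, columns, and figure name are hypothetical.
"""
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

# Resolve all paths relative to this script so the code runs the same
# way on any machine, with no user-specific local paths.
REPO_ROOT = Path(__file__).resolve().parent
DATA_DIR = REPO_ROOT / "data"
FIGURE_DIR = REPO_ROOT / "figures"

def make_figure_timeseries():
    """Read the input data and save the time series figure of the article."""
    FIGURE_DIR.mkdir(exist_ok=True)
    df = pd.read_csv(DATA_DIR / "streamflow.csv")  # columns: date, flow_cms
    fig, ax = plt.subplots()
    ax.plot(pd.to_datetime(df["date"]), df["flow_cms"])
    ax.set_xlabel("Date")
    ax.set_ylabel("Streamflow (m$^3$/s)")
    fig.savefig(FIGURE_DIR / "figure_timeseries.png", dpi=300)

if __name__ == "__main__":
    make_figure_timeseries()

Naming one function or script per article figure makes it easy to state in the directions exactly which code produces each published result.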

Write Code to Automate Manual Computation Steps

Automate manual computation steps with code and scripts to increase the likelihood that others can reproduce results and to reduce the time needed to complete those steps (a pipeline sketch follows this list).
Comment each step of the code or script to make materials readable and easy to follow.
Follow coding standards and conventions for the language used (e.g., indentation, commenting, declarations, statements, and white space).
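For instance, a single driver script committed to the repository can run every step of the workflow in order so that one command regenerates all results. The Python sketch below assumes hypothetical step scripts (prepare_input_data.py, run_model.py, and so on); the actual script names would come from the study’s own workflow.

"""Run the full workflow from raw data to figures with one command.

A minimal sketch: the step scripts named here are hypothetical placeholders.
"""
import subprocess
import sys

# Each step is an ordinary script kept in the repository; running them in
# order replaces manual copy-paste or point-and-click steps.
STEPS = [
    "prepare_input_data.py",
    "run_model.py",
    "postprocess_results.py",
    "make_figures.py",
]

def run_all():
    for script in STEPS:
        print(f"Running {script} ...")
        result = subprocess.run([sys.executable, script])
        if result.returncode != 0:
            sys.exit(f"Step {script} failed; stopping the workflow.")
    print("Workflow complete: results and figures regenerated.")

if __name__ == "__main__":
    run_all()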

Make Proprietary, Private, Sensitive, Computationally Intensive, and Stochastic Data, Models, and Code More Reproducible

Where possible, make available the input and output data from every step of a workflow that uses proprietary software, is data intensive, requires a long run time, or cannot be rerun by other users. Provide inputs and outputs for each step so that a user who cannot access the underlying materials can still complete the workflow and reproduce overall results. For example, if an optimization study uses the proprietary General Algebraic Modeling System (GAMS) (Rosenthal 2014), make the input data (.csv), model text file (.gms), output (.gdx and .xlsx), and postprocessing (.r) files available (Fig. 3).
Note the underlying random seeds for steps that include randomness (stochasticity); a seed-setting sketch follows Fig. 3.
Make copies of proprietary data and results available in commonly used file formats (e.g., convert the .gdx output in Fig. 3 to .xlsx or CSV).
Anonymize proprietary, private, or sensitive data sets (e.g., data sets containing personally identifiable information or data protected by IRB requirements) so you can share.
Indicate where and how the user can obtain copies of proprietary or private data.
Where proprietary or big data sets cannot be shared, provide an additional example of the workflow with a nonproprietary or illustrative data set.
Fig. 3. Example of making a proprietary workflow more reproducible with a GAMS optimization model.
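The Python sketch below illustrates two of the items above: it fixes and records a random seed for a stochastic step and re-saves a workbook exported from proprietary software as plain CSV files. The file names and the demand-sampling step are hypothetical, and the .xlsx file is assumed to have already been exported from the proprietary format (e.g., a GAMS .gdx result dumped to Excel).

"""Set and record a random seed, and convert a proprietary-format export
to CSV so users without a license can still follow the workflow.

A minimal sketch: file names and the sampling step are illustrative.
"""
import numpy as np
import pandas as pd

SEED = 20200406  # record the seed so stochastic steps can be repeated

def sample_demands(n=1000):
    """Draw a reproducible sample of water demands (illustrative only)."""
    rng = np.random.default_rng(SEED)
    return rng.normal(loc=100.0, scale=15.0, size=n)

def export_results_to_csv(xlsx_path="results/model_output.xlsx"):
    """Re-save each sheet of the exported workbook as plain CSV."""
    sheets = pd.read_excel(xlsx_path, sheet_name=None)  # dict of DataFrames
    for name, df in sheets.items():
        df.to_csv(f"results/{name}.csv", index=False)

if __name__ == "__main__":
    demands = sample_demands()
    print(f"Seed {SEED}: mean sampled demand = {demands.mean():.1f}")
    export_results_to_csv()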

Make It Easy for People to Find the Content

Provide a digital object identifier (DOI) encoded as a URL that links directly to your repository. Some repository hosts, such as HydroShare or Dryad, provide a DOI when publishing materials. For other hosts such as GitHub, use a service like Zenodo to create a snapshot of the content and then generate a DOI.
In the manuscript, cite the repository directly in the text. Include full citation information, including the DOI, in the reference information.

Verify Your Results Are Reproducible

Ask a colleague, student, or other person not affiliated with the study to reproduce study results. This person could be a new student who needs to get up to speed on the methods or someone else interested in the study or results.
Ask this colleague, student, or other person to use a reproducibility survey tool (e.g., Stagge et al. 2019) to provide feedback on the repository, directions, and results that they reproduced.
If the person can reproduce results, acknowledge their effort in the manuscript’s data availability or results reproducibility section.
If the person had difficulty reproducing results, make changes to the repository to address their difficulties.
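An automated check can supplement this manual verification. The Python sketch below compares regenerated result files with the archived (published) versions within a small numeric tolerance; the folder names and the use of CSV outputs are assumptions for the example, not a prescribed repository layout.

"""Compare regenerated result files against the archived (published) ones.

A minimal sketch: folder names are hypothetical, and numeric CSVs are
compared within a small tolerance rather than bit for bit.
"""
from pathlib import Path
import pandas as pd

ARCHIVED = Path("results_published")   # results shared in the repository
REGENERATED = Path("results_rerun")    # results the verifier just produced

def compare_results(tolerance=1e-6):
    all_match = True
    for archived_file in sorted(ARCHIVED.glob("*.csv")):
        rerun_file = REGENERATED / archived_file.name
        if not rerun_file.exists():
            print(f"MISSING: {rerun_file}")
            all_match = False
            continue
        a = pd.read_csv(archived_file)
        b = pd.read_csv(rerun_file)
        try:
            pd.testing.assert_frame_equal(a, b, atol=tolerance, rtol=0.0)
            print(f"OK: {archived_file.name}")
        except AssertionError:
            print(f"DIFFERS: {archived_file.name}")
            all_match = False
    print("All results match." if all_match else "Some results differ.")

if __name__ == "__main__":
    compare_results()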

Follow Good Examples

1. Adopt the practices of the six articles that Stagge et al. (2019) awarded badges for fully or partially reproducible results. For example, these papers:
Provided all model and code in a GitHub (Buscombe 2017; Neuwirth 2017; Xu et al. 2017), institutional (Yu et al. 2017), or HydroShare (Horsburgh et al. 2017) repository.
Had an easy-to-find README file that explained the contents and gave directions to set up and run code (Buscombe 2017; Horsburgh et al. 2017; Neuwirth 2017; Xu et al. 2017).
2. Document workflows, make code reusable, and bundle code, data, and documentation (Hutton et al. 2016).

Encourage Journals to Promote Reproducible Results

Develop policies to verify reproducible results and make these policies clear to editors, authors, and reviewers.
Encourage authors to self-assess the reproducibility of their work prior to submission (see “Verify Your Results Are Reproducible”).
Assess reproducibility of submissions and provide feedback to authors.
Define what constitutes an article with reproducible results and recognize these articles (e.g., with badges or other incentives).
State the expected level of reproducibility of published articles. Track and report articles with reproducible results over time.
Hold competitions to compare reproducible results across articles and synthesize best practices.
Create journal awards, such as for outstanding effort to make complex results more reproducible or outstanding effort to reproduce results.
Publish reproducible papers as open access, free to the authors.

Encourage Funders to Promote Reproducible Results

Determine and state the expected level of reproducibility in funder repositories.
Require reproducibility in requests for proposals.
Encourage authors to self-assess reproducibility prior to submission (see “Verify Your Results Are Reproducible”).
Assess reproducibility of submissions and provide feedback to authors.
Verify work fulfills funder requirements for sharing data and results.
Support development of new tools to make it faster and easier for authors to make their research products more reproducible.

Encourage Universities, Agencies, and Institutions to Promote Reproducible Results

Train students and employees in reproducible practices (e.g., software and data carpentry workshops, hydroinformatics courses).
Determine and state the expected level of reproducibility in institutional repositories.
Require reproducible practices for theses, dissertations, and project reports.
Develop standards and initiatives for open data and open source code.
Recognize faculty, researchers, and students who reproduce and extend others’ work.
Reproducibility is a core principle of science and engineering. Making results more reproducible requires time and effort by authors, journals, funders, and institutions. The research community will benefit from tools that automate and speed up those steps. We must also provide authors financial incentives and recognition to encourage them to make their work more available and results more reproducible. We look forward to including these reproducibility incentives and practices in a future reproducible results policy for the journal.

Acknowledgments

Rosenberg wrote the first draft. The remaining authors provided feedback on subsequent drafts that Rosenberg merged into successive versions. Tanu Malik, William Farmer, Laura DeCicco, and 22 other journal associate editors provided feedback that improved intermediate drafts.

References

Aarts, A. A., J. E. Anderson, C. J. Anderson, P. R. Attridge, A. Attwood, and A. Fedor. 2015. “Estimating the reproducibility of psychological science.” Science 349 (6251): 1–8. https://doi.org/10.1126/science.aac4716.
Buscombe, D. 2017. “Shallow water benthic imaging and substrate characterization using recreational-grade sidescan-sonar.” Environ. Modell. Software 89 (Mar): 1–18. https://doi.org/10.1016/j.envsoft.2016.12.003.
Collberg, C., T. Proebsting, G. Moraila, A. Shankaran, Z. Shi, and A. M. Warren. 2014. Measuring reproducibility in computer systems research. Tucson, AZ: Univ. of Arizona.
Horsburgh, J. S., M. E. Leonardo, A. M. Abdallah, and D. E. Rosenberg. 2017. “Measuring water use, conservation, and differences by gender using an inexpensive, high frequency metering system.” Environ. Modell. Software 96 (Oct): 83–94. https://doi.org/10.1016/j.envsoft.2017.06.035.
Hutton, C., T. Wagener, J. Freer, D. Han, C. Duffy, and B. Arheimer. 2016. “Most computational hydrology is not reproducible, so is it really science?” Water Resour. Res. 52 (10): 7548–7555. https://doi.org/10.1002/2016WR019285.
NAS (National Academies of Sciences, Engineering, and Medicine). 2019. Reproducibility and replicability in science. Washington, DC: NAS.
Neuwirth, C. 2017. “System dynamics simulations for data-intensive applications.” Environ. Modell. Software 96 (Oct): 140–145. https://doi.org/10.1016/j.envsoft.2017.06.017.
NSF and IES (National Science Foundation and Institute of Education Sciences). 2018. “Companion guidelines on replication & reproducibility in education research: A supplement to the common guidelines for education research and development.” US Dept. of Education. Accessed March 3, 2020. https://www.nsf.gov/pubs/2019/nsf19022/nsf19022.pdf.
Rosenberg, D. E., and D. W. Watkins Jr. 2018. “New policy to specify availability of data, models, and code.” J. Water Resour. Plann. Manage. 144 (9): 01618001. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000998.
Rosenthal, R. E. 2014. GAMS: A user’s guide. Washington, DC: GAMS Development Corporation.
Stagge, J. H., D. E. Rosenberg, A. M. Abdallah, H. Akbar, N. A. Attallah, and R. James. 2019. “Assessing data availability and research reproducibility in hydrology and water resources.” Sci. Data 6: 190030. https://doi.org/10.1038/sdata.2019.30.
Stodden, V., J. Seiler, and Z. Ma. 2018. “An empirical analysis of journal policy effectiveness for computational reproducibility.” Proc. National Acad. Sci. 115 (11): 2584–2589. https://doi.org/10.1073/pnas.1708290115.
Xu, W., P. Collingsworth, B. Bailey, M. Carlson Mazur, J. Schaeffer, and B. Minsker. 2017. “Detecting spatial patterns of rivermouth processes using a geostatistical framework for near-real-time analysis.” Environ. Modell. Software 97 (Nov): 72–85. https://doi.org/10.1016/j.envsoft.2017.06.049.
Yu, C. W., F. Liu, and B. R. Hodges. 2017. “Consistent initial conditions for the Saint-Venant equations in river network modeling.” Hydrol. Earth Syst. Sci. 21 (9): 4959–4972. https://doi.org/10.5194/hess-21-4959-2017.

Published In

Journal of Water Resources Planning and Management, Volume 146, Issue 6, June 2020

History

Received: Dec 13, 2019
Accepted: Dec 30, 2019
Published online: Apr 9, 2020
Published in print: Jun 1, 2020
Discussion open until: Sep 9, 2020

Authors

Affiliations

David E. Rosenberg, A.M.ASCE
Associate Professor, Dept. of Civil and Environmental Engineering and Utah Water Research Laboratory, Utah State Univ., 8200 Old Main Hill, Logan, UT 84322-8200 (corresponding author). Email: [email protected]
Yves Filion, Ph.D., P.Eng., D.WRE, M.ASCE
Professor, Dept. of Civil Engineering, Queen’s Univ., 58 University Ave., Kingston, ON, Canada K7K 0B9. Email: [email protected]
Rebecca Teasley, A.M.ASCE
Associate Professor, Dept. of Civil Engineering, Univ. of Minnesota Duluth, 1405 University Dr., Duluth, MN 55812. Email: [email protected]
Samuel Sandoval-Solis, Ph.D., A.M.ASCE
Associate Professor, Dept. of Land, Air, and Water Resources, Univ. of California, Davis, 1 Shields Ave., Davis, CA 95616. Email: [email protected]
Jory S. Hecht
Hydrologist, US Geological Survey, 10 Bearfoot Rd., Northborough, MA 01532, USA. Email: [email protected]
Jakobus E. van Zyl, M.ASCE
Watercare Chair in Infrastructure, Dept. of Civil and Environmental Engineering, Univ. of Auckland, 20 Symonds St., Auckland 1010, New Zealand. Email: [email protected]
George F. McMahon, Ph.D., P.E., D.WRE, P.H., M.ASCE
Vice President, National Expert, Water Management, Arcadis, 2839 Paces Ferry Rd., Suite 900, Atlanta, GA 30339. Email: [email protected]
Jeffery S. Horsburgh, Ph.D.
Associate Professor, Dept. of Civil and Environmental Engineering and Utah Water Research Laboratory, Utah State Univ., 8200 Old Main Hill, Logan, UT 84322-8200. Email: [email protected]
Joseph R. Kasprzyk, Ph.D., A.M.ASCE
Assistant Professor, Dept. of Civil, Environmental, and Architectural Engineering, Univ. of Colorado Boulder, UCB 607, Boulder, CO 80309. Email: [email protected]
David G. Tarboton, Sc.D., P.E., M.ASCE
Professor and Director, Utah Water Research Laboratory, Dept. of Civil and Environmental Engineering, Utah State Univ., 8200 Old Main Hill, Logan, UT 84322-8200. Email: [email protected]
