Resources for Reviewing Code

Posted on February 27, 2025 by Jody Peters

Co-authors are Education and Theory Working Group Participants, Resource Developers, and Testers of the Review Materials:
Jody Peters¹, Abby Lewis², Alyssa Willson¹, Cazimir Kowalski¹, Cole Brookson³, Gerbrand Koren⁴, Hassan Moustahfid⁵, Hannah O’Grady¹, Jason McLachlan¹, John Zobitz⁶, Mary Lofton⁷, Ruby Krasnow⁸, Saeed Shafiei Sabet⁹

¹University of Notre Dame, ²Smithsonian Environmental Research Center, ³Yale University, ⁴Utrecht University, ⁵NOAA, ⁶Augsburg University, ⁷Virginia Tech, ⁸University of Maine, ⁹University of Guilan

The goal of this blog post is to share resources that individuals in the EFI community have developed and have found useful when reviewing code.

Specifically, this blog post provides

An Overview of why to review code or have your code reviewed
The Background for this blog post and the resources presented
Resources developed and tested by blog co-authors including a project overview template and code review checklist template
Pain points to be aware of and suggestions for how to manage them in the review process
Other resources from SORTEE (Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology)
Additional resources the EFI working groups have found useful
Best wishes for your code review

Why to review code or have your code reviewed

Just as text review, such as peer review or co-author review, improves published manuscripts, code review can be critical for reliability, reusability, reproducibility, and knowledge sharing. Code review can take many forms, including as an individual or team activity, in research or classroom settings. Ultimately, reviewing code provides an opportunity to:

learn from more experienced coders about how to code or code more efficiently, either as the code reviewer or as the one requesting code review
provide another set of eyes to reduce errors and the potential of reporting faulty results, which can slow down scientific progress and may lead to retractions for a publication
increase the reliability and reusability of the code to help with the repeatability of studies and the application of previously developed code in new contexts. This is increasingly recognized as an important characteristic of research software (Barker et al., 2022)
carefully check any code that has been drafted with AI tools (e.g., ChatGPT, Copilot, etc.). AI tools may be helpful to save time when first drafting code. However, any code created using an AI tool should not be blindly trusted to work. Ben Weinstein discusses this at time 13:49 in the January 2025 Statistical Methods Seminar Series presentation on the DeepForest package https://youtu.be/fhlC0W2kDMQ?si=KZYObPIlt2512T1Y

Open code reviews coordinated through a third party like rOpenSci also provide opportunities to network and meet colleagues and collaborators from other scientific domains. For R package developers, submitting your code to rOpenSci for peer review has many additional benefits, including assistance with package maintenance and social media promotion.

While there are many benefits to having your code reviewed, few resources and standards exist for code review in ecology, and the specific methods for code review will likely differ across career stages, manuscript development stages, etc.

Background

Over the past year, the EFI Theory and Education working groups have discussed and developed resources for reviewing code that we wanted to share with others who are thinking about or are in the process of having their code reviewed or reviewing code for others.

The working group discussions and subsequent resources were framed around the Ivimey-Cook et al 2023 paper “Implementing code review in the scientific workflow: Insights from ecology and evolutionary biology” (https://doi.org/10.1111/jeb.14230) and materials shared by the SORTEE community (Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology; https://www.sortee.org/).

The Ivimey-Cook et al 2023 paper provides commentary on

How to effectively review code
How to set up projects to enable this form of review
How to implement code review at several stages throughout the research process

In the paper, the authors highlight “The 4 Rs” that code review should evaluate:

Is the code as Reported?
1. Methods and code must match
Does the code Run?
1. Code must be executable
Is the code Reliable?
1. Code runs and completes as intended
Are the results Reproducible?
1. Results must be able to be reproduced

They describe a basic workflow of questions to answer when reviewing code, as summarized in the figure below.

(Image source: Ivimey-Cook et al., 2023)

Resources developed by working group members to share with the EFI community

Based upon the work by Ivimey-Cook et al., the EFI Education and Theory working groups put together two documents: a project overview document and a code review checklist. In creating these documents, we assumed that the code review would be done by an internal code reviewer (e.g., a lab mate) rather than an external code reviewer (e.g., a reviewer for a journal).

The project overview template is filled out by the person who wrote the code. This document helps clarify the purpose of the analysis and where feedback would be useful.

Conversely, the checklist template is filled out by the person who is reviewing the code. It identifies the key points to check during the review.

Project Overview to Prep for Code Review Template – This Project Overview template helps authors describe their project for individuals who will be reviewing their code.
Code Review Checklist Template (spreadsheet version, pdf version) – This Checklist template is based on the material in Ivimey-Cook et al. 2023. There are checklists related to project organization, project and input metadata, code readability, and output readability that both authors and reviewers can check and add notes about. This Checklist has been implemented by working group members based at the University of Notre Dame.

While the checklist is a good way for a reviewer to check off what has been reviewed, we recommend creating a separate document to note any issues the reviewer finds during the code review that need to be addressed by the code author. The code review document can be a Word or Google doc, or it can be an RMarkdown (.Rmd) file in the GitHub repo with the code. The benefit of the .Rmd file versus a pull request is that updates to the .Rmd file allow for versioning and transparency without requiring the code reviewer to make the actual code fixes, leaving that to the code author instead.

Pain points to be aware of & suggestions for how to manage them

Pain point 1: Code that takes a long time to run or creates a large amount of output

How to Manage: Code authors can provide aggregated output or a small example data set that can be run locally

Often in ecological forecasting we develop code workflows that take hours, days, or weeks to run. To avoid placing this computational burden on a code reviewer, authors can either provide the aggregated analysis output or a small example data set for review. Choosing between these two options likely depends on the goals of the code review. If the review is happening at a more mature stage of the project and the primary goal is to reproduce manuscript figures, providing the reviewer with aggregated output may suffice. The disadvantage of this approach is that the reviewer will likely not be running all the steps in the analysis, and therefore may miss errors that occur “upstream” of the creation of the aggregated output. On the other hand, if feedback is needed on the scientific merit and correctness of the analysis from start to finish, it may be better to provide a small example data set to allow the reviewer to run the entire workflow. The disadvantage of this approach is that the results obtained with the example data set will not match the results reported in the final manuscript.

If authors choose to provide an example subset of data for code review, tools such as RMarkdown, Quarto, or Jupyter Notebooks can be useful to walk reviewers through the analysis. These file types allow text interspersed with code and visualizations in an interactive format, which may help a reviewer navigate the steps of a complex coding workflow. The downside is that file paths and similar settings will need to be updated to point to the subset of data.
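As one illustration of the example-subset option, here is a minimal Python sketch (the same idea carries over to R). It is not taken from the templates above: the function name `write_review_subset`, the `REVIEW_DATA_DIR` environment variable, and the default of 100 rows are all hypothetical choices for illustration. The sketch draws a reproducible random sample of rows from a CSV file and keeps the data folder in a single variable, so the path only has to change in one place for the review.

```python
import csv
import os
import random
from pathlib import Path

# Hypothetical convention: one environment variable selects the data folder,
# so switching between the full data and the review subset needs no code edits.
DATA_DIR = Path(os.environ.get("REVIEW_DATA_DIR", "data"))


def write_review_subset(full_csv, subset_csv, n_rows=100, seed=42):
    """Copy the header plus a reproducible random sample of rows."""
    with open(full_csv, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        rows = list(reader)  # loads the whole file; fine for a sketch
    rng = random.Random(seed)  # fixed seed: author and reviewer get the same subset
    keep = sorted(rng.sample(range(len(rows)), min(n_rows, len(rows))))
    with open(subset_csv, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows[i] for i in keep)
    return len(keep)
```

Because the sample is sorted by original row position, time-ordered data stays in order. A file too large to read into memory would need a streaming approach instead.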

Each project will need to decide what specific approach works best for them.

Pain point 2: Large data used in analyses that are not yet publicly archived. It can be logistically challenging to share the data and it makes checking paths, folder structure, or data intermediates difficult.

How to Manage: One approach taken by some blog co-authors is to use the staging environment in the Environmental Data Initiative data portal to make data available online so it can be sourced as a script before it is assigned a DOI. The benefit of this approach is that the data is then ready to be archived and ready to add to the manuscript once the checks on the code are finalized.
Another approach used is to share the data with the code reviewer using an external hard-drive or Google drive with zipped folders. If this approach is taken, be sure to include notes for where file paths need to be changed for and after the review.

Pain point 3: Different versions of R (or other coding language) or data packages and their dependencies and compiled languages, e.g. C++

How to Manage: Use a Docker container environment.

If using a container, code authors should be sure to provide clear instructions to peer reviewers about how to set up and run a container on their machine, as well as how to delete/uninstall the container software afterward. Because using a container adds an extra step to the review process (particularly for those who have not previously used containers), it may be best to reserve this option for analyses with a high number of software and package dependencies, because installing a container becomes easier than installing all those dependencies separately.
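When a container feels like more than the analysis needs, a lighter complement is to write down the software versions the author used so the reviewer can compare them with their own. This is our suggestion rather than something prescribed by the templates. The Python sketch below uses only the standard library. The function name `environment_report` and the package names in the usage line are placeholders. In R, `sessionInfo()` reports similar information.

```python
import json
import platform
from importlib import metadata


def environment_report(packages):
    """Return the interpreter version and installed versions of the named packages."""
    report = {"python": platform.python_version(), "packages": {}}
    for name in packages:
        try:
            report["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report["packages"][name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    # Placeholder package names; list whatever the analysis actually imports.
    print(json.dumps(environment_report(["numpy", "pandas"]), indent=2))
```

Saving this output alongside the code gives the reviewer a quick way to spot a version mismatch before chasing a bug that is really a dependency difference.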

Pain point 4: Reviewing code can take a substantial amount of time, anywhere from a couple of hours to a couple of days, depending on the scope of the review. Blog co-authors have found that completing a code review for a co-author often takes longer than completing a peer review of the manuscript.

How to Manage: Be cognizant that reviewing code can take a substantial amount of time and plan accordingly to give the reviewer enough time before major milestones, such as submitting a manuscript. Alternatively, depending on the situation, it may be best to plan for continuous code review as part of the manuscript writing process.
No matter the approach, we highly recommend including code reviewers as co-authors for publications using output from code that was reviewed. This appropriately recognizes the effort and intellectual contribution involved in code review, and it is in line with increasing recognition for co-authorship and different roles in the development and testing process (Leem et al. 2023).

Pain point 5: Getting too much or too little input from a reviewer based on your publication needs.

How to Manage: Before you ask for a review, determine how in-depth you want the review to be. This may reflect what stage you are in the manuscript writing or analysis process. If you are early in the process and want help with making your code more efficient, the code review feedback may work best to come in as GitHub pull requests. If you feel your code is finalized and ready for one final review before publication, then it may work best to have a more in-depth review to confirm the output for the publication can be recreated with the code that will be shared for the publication. The “Project Overview” template (described above) is intended to help communicate these needs when asking for code review.

In practice, you may wish to seek code review at multiple stages in the analysis and writing process. For example, you might ask for a co-author to review key components of the analysis for code correctness (e.g., the “science” is correct; units are converted properly; statistical analyses are applied appropriately; and so on) as you explore your preliminary results. Later, while developing and preparing to submit your manuscript, you may ask a co-author to review more surficial aspects of the code base (e.g., files are organized in a logical way; all filepaths are relative; and so on). We re-emphasize that even senior scientists and experienced coders make mistakes! It is always better to find them before publication than afterward.
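Some of the more surficial checks can be partly automated. As a rough sketch (not part of the checklist template), the Python snippet below flags quoted strings that look like absolute file paths in R, RMarkdown, Quarto, or Python scripts. The function names, the file suffix list, and the regular expression are our own illustrative assumptions. The pattern is a heuristic, so it will miss some cases and flag some harmless strings.

```python
import re
from pathlib import Path

# Quoted strings that start like an absolute path: "/home/...", "~/...", "C:\..." or "C:/..."
ABSOLUTE_PATH = re.compile(r"""["'](?:/[^/"'\s]|~/|[A-Za-z]:[\\/])[^"']*["']""")


def find_absolute_paths(text):
    """Return (line_number, matched_literal) pairs for suspicious string literals."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in ABSOLUTE_PATH.finditer(line):
            hits.append((number, match.group(0)))
    return hits


def scan_project(root, suffixes=(".R", ".Rmd", ".py", ".qmd")):
    """Scan every script under root and map each file to its hits."""
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = find_absolute_paths(path.read_text(errors="ignore"))
            if hits:
                results[str(path)] = hits
    return results
```

Running `scan_project(".")` from the top of a repository returns a dictionary of files and line numbers to look at by hand. It is a starting point for the review, not a replacement for reading the code.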

Other resources from SORTEE

SORTEE is the Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology. The SORTEE community led the Ivimey-Cook et al 2023 paper and in addition to the paper have shared other resources.

SORTEE: https://www.sortee.org/
SORTEE Slack channel – here is the link to join https://join.slack.com/t/sortee/shared_invite/zt-2fnqytett-AND1mTuXBKQWYyWUXKn6YA
Library of code mistakes: https://docs.google.com/presentation/d/12QN3WUc5v1Df7OArEox2U7l_N_qnHHuwzjCYiI4idC8/edit#slide=id.p
1. Issues that people have found when their code has been reviewed can be anonymously added to this file. It is structured with the same headings used in the 4R paper on code review

Other papers or resources the EFI working groups found helpful

Ivimey-Cook et al. 2023. Implementing code review in the scientific workflow: Insights from ecology and evolutionary biology. Journal of Evolutionary Biology 36: 1347-1356. https://doi.org/10.1111/jeb.14230
Filazzola, A. and C.J. Lortie. 2022. A call for clean code to effectively communicate science. Methods in Ecology and Evolution 13, 2119–2128. https://doi.org/10.1111/2041-210X.13961
Hunter-Zinck, H. et al. 2021. Ten simple rules on writing clean and reliable open-source scientific software. PLOS Computational Biology 17(11): e1009481. https://doi.org/10.1371/journal.pcbi.1009481
Alston, J.M. and J.A. Rick. 2021. A Beginner’s Guide to Conducting Reproducible Research. Bulletin of the Ecological Society of America 102(2): 1-14. https://www.jstor.org/stable/27000718
Git + GitHub As A Platform For Reproducible Research. GitHub – gchure/reproducible_research: A template repository for how I structure my scientific research. This repository sets out the skeleton of an organizational structure used for scientific research.
Cooper, N. 2018. A Guide to Reproducible Code in Ecology and Evolution. https://nhm.openrepository.com/handle/10141/622618. A Guide to Reproducible Code covers all the basic tools and information you will need to start making your code more reproducible.
Lakens, D. 2022. Improving Your Statistical Inferences. Retrieved from https://lakens.github.io/statistical_inferences/. https://doi.org/10.5281/zenodo.6409077. An open educational resource containing information to improve statistical inferences, design better experiments, and report scientific research more transparently.
Boettiger, C. 2015. An introduction to Docker for reproducible research. ACM SIGOPS Operating Systems Review, 49, 71–79. https://doi.org/10.1145/2723872.2723882
Reproducibility Demo: https://github.com/rqthomas/reproducibility-demo
Barker, M. et al. 2022. Introducing the FAIR Principles for research software. Scientific Data 9: 622. https://doi.org/10.1038/s41597-022-01710-x
Leem, D. et al. 2023. SORTÆD: Software Role Taxonomy and Authorship Definition (0.1). Zenodo. doi:10.5281/zenodo.7896456; https://sdruskat.net/software-authorship/.

Best wishes for your code review

We wish you all the best as you create and have your code reviewed or review code for others.

This XKCD comic was shared in one of the EFI working group calls. Hopefully it brings you a smile and inspires you to avoid this in your own code, and hopefully you won’t see it in the code you are reviewing! (https://xkcd.com/1833/)

Participate in the EFI-USGS River Chlorophyll Forecasting Challenge!

Posted on April 29, 2024 by Jody Peters

Cross-posted from https://waterdata.usgs.gov/blog/habs-forecast-challenge-2024/

Authors: Jacob Zwart (he/him), Rebecca Gorney (she/her), Jennifer Murphy (she/her), Mark Marvin-DiPasquale (he/him)

We invite you to submit to the EFI-USGS River Chlorophyll Forecasting Challenge! Co-hosted by the Ecological Forecasting Initiative (EFI) and U.S. Geological Survey (USGS) Proxies Project, this challenge provides a unique opportunity to forecast data from the USGS. By participating, you’ll sharpen your forecasting skills and contribute to vital research aimed at addressing pressing environmental concerns, such as harmful algal blooms (HABs) and water quality.

Why river chlorophyll?

Algal blooms cost the U.S. economy $2.2-4.6 billion per year on average in water treatment and economic losses (Hudnell, 2010). Developing the capacity to predict when and where these blooms might occur could greatly reduce their impact. While much attention has been given to advancing predictive capabilities for algal blooms in lakes, river algal blooms can also cause substantial socio-ecological impacts, yet our understanding of their dynamics lags that of lakes. Fortunately, the number and diversity of observations that can be used to predict algal blooms are rapidly increasing (e.g., chlorophyll sensors), which enables powerful modeling techniques to extract patterns from these data and predict future HABs with sufficient lead time to initiate appropriate management interventions. Chlorophyll serves as a reliable proxy for algal biomass and can indicate when there might be an impending algal bloom. With this challenge, we hope to compare many different approaches for forecasting river chlorophyll to better understand the predictability of chlorophyll and potentially HABs in rivers across the United States.

How can I get involved?

Getting involved is easy! Simply visit our EFI-USGS River Chlorophyll Forecast Challenge website to register and gain access to all the necessary resources and instructions. Whether you’re a seasoned researcher, a budding data scientist, or participating in a classroom project, there’s a place for you in this challenge. We provide step-by-step instructions, target data, numerical weather forecasts, and tutorials to empower you throughout the process. Plus, all forecasts and scores are publicly available, fostering transparency and collaboration within the community.

Who is organizing?

The Ecological Forecasting Initiative (EFI) is a grassroots consortium dedicated to building and supporting an interdisciplinary community of practice around near-term ecological forecasts. EFI has been running a separate forecast challenge since 2021, welcoming participants to forecast ecological data at National Ecological Observatory Network sites (Thomas et al. 2023). Building forecast models, generating forecasts, and updating these forecasts with new information require a lot of data, and fortunately the USGS is the largest provider of in-situ water information in the world. The USGS Proxies Project teamed up with EFI to select monitoring sites that fulfill the data requirements for a forecast challenge while also being strategically chosen based on their scientific, management, or social significance. Our EFI-USGS team is committed to advancing research in ecological forecasting and environmental modeling, and your participation enhances this effort!

Are there any prizes or awards?

While there are no monetary rewards, the benefits of contributing are substantial. Participants can expect to advance their forecasting skills, find joy in tackling complex ecological problems, and potentially be involved in the creation of manuscripts based on their contributions. Our forecasting challenge serves as a platform for the ecological and data science communities to enhance their skills in forecasting ecological systems. By generating forecasts, participants contribute to a synthetic understanding of patterns of environmental predictability.

What if I have questions and will there be updates?

Have questions or need assistance? Feel free to reach out to Jacob Zwart at jzwart@usgs.gov for prompt support and guidance. Additionally, stay updated on the latest developments and announcements by visiting the EFI-USGS River Chlorophyll Forecast Challenge website. We’re here to ensure your experience in the challenge is smooth and rewarding, so don’t hesitate to reach out with any questions.

Reenvisioning EFI-RCN NEON Forecast Challenge Dashboard Visualization

Posted on August 21, 2023 by Jody Peters

August 22, 2023

Melissa Kenney¹, Michael Gerst², Toni Viskari³, Austin Delaney⁴, Freya Olsson⁴, Carl Boettiger⁵, Quinn Thomas⁴

¹University of Minnesota, ²University of Maryland, ³Finnish Meteorological Institute,⁴Virginia Tech, ⁵University of California, Berkeley

With the growth of the EFI NEON Ecological Forecasting Challenge, we have outgrown the current Challenge Dashboard, which was designed to accommodate a smaller set of forecasts and synthesis questions. Thus, we have reenvisioned the next stage of the EFI-RCN NEON Forecast Challenge Dashboard in order to facilitate the ability to answer a wider range of questions that forecast challenge users would be interested in exploring.

The main audiences for this dashboard are NEON forecasters, EFI, Forecast Synthesizers, and students in classes or teams participating in the Forecast Challenge. Given these audiences, we have identified 3 different dashboard elements that will be important to include:

forecast synthesis overview,
summary metrics about the Forecast challenge, and
self-diagnostic platform.

During the June 2023 Unconference in Boulder, our team focused on scoping all three dashboard elements and prototyping the forecast synthesis overview. The objective of the synthesis overview visual platform is to support community learning and emergent theory development. Thus, the synthesis visualizations are aimed at creating a low bar to entry for multi-model exploration to understand model performance, identify characteristics that lead some models to perform better than others, identify the locations or ecosystems that are more predictable, and assess temporal forecast validity.

You can view the prototype developed during the meeting HERE and in Figures 1 and 2.

Figure 1. Static image of an interactive map of aggregate forecast skill relative to climatology at each forecasted site, here showing the water temperature forecasts for the aquatics theme. Bubble colour represents the continuous ranked probability score (CRPS) skill relative to climatology, with positive values (blues) showing submitted models on average perform better than climatology and negative values (reds) showing submitted models perform worse. The size of the bubble represents the percentage of submitted models that outperformed the climatology null (i.e., larger bubbles have a higher percentage of skilled models). When hovered over, the bubbles show this percentage (perc_skilled), the site type (field_site_subtype), as well as the total number of models forecasting at that site (n_mod).

Figure 2. a) Percentage of submitted models that are classed as ‘skillful’ (outperform the null climatology forecast based on the continuous ranked probability score metric) at the river (n=27) and lake sites (n=6) for water temperature forecasts at each horizon from 1 to 30 days ahead. b) Percentage of submitted models that are classed as ‘skillful’ for water temperature forecasts at six of the lake sites (https://www.neonscience.org/field-sites/explore-field-sites).

Developing these graphics requires aggregation of skill scores. There are a multitude of metrics that can be used to calculate the skill score, which each have their own benefits and flaws. Thus, there should be multiple skill scores for different metrics with clear presentation of what metric is used at a given visualization. Additionally, in order to isolate what sites are more interesting from a model development perspective, there needs to be a comparison of how many of the models meet a baseline skill score at a given site at a chosen time frame. That allows isolating challenge areas and also easily informs which models really succeed at situations where others struggle. For better future analysis of how models perform at certain sites, we also envisage the visualization to include the skill scores for the relevant drivers (NOAA weather) for comparison. For example, if we see a drop in skill across models in water temperature projections after some time, there should be a direct method to assess if this reflects overall flawed model dynamics or if the weather forecast driving the water temperature loses its reliability. This also allows the user to approximate a maximum length in which the model performance analysis is at all useful.
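To make the aggregation concrete, here is a minimal Python sketch of one way a CRPS-based skill summary could be computed. It is not the dashboard's actual code. It assumes forecasts are ensembles, uses the common sample-based CRPS estimator, and defines skill as the null (climatology) mean CRPS minus the model mean CRPS, so that positive values mean the model beats the null. The function names and the input layout are hypothetical.

```python
from statistics import mean


def crps_ensemble(members, observation):
    """Sample-based CRPS for one ensemble forecast and one observation (lower is better)."""
    n = len(members)
    accuracy = sum(abs(x - observation) for x in members) / n
    spread = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    return accuracy - spread


def percent_skilled(model_crps, null_crps):
    """Share of models whose mean CRPS beats the null (climatology) mean CRPS.

    model_crps maps a model ID to its list of CRPS values at one site;
    null_crps is the matching list of CRPS values for the null forecast.
    """
    null_mean = mean(null_crps)
    # Assumed skill definition: null CRPS minus model CRPS, so positive means better.
    skill = {model: null_mean - mean(scores) for model, scores in model_crps.items()}
    skilled = sum(1 for value in skill.values() if value > 0)
    return 100.0 * skilled / len(skill), skill
```

Under these assumptions, `percent_skilled` returns both the percentage of skilled models at a site, in the spirit of the bubble size in Figure 1, and the per-model skill values. Swapping in a different metric only requires replacing `crps_ensemble`, which is one reason to label clearly which metric a visualization uses.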

In addition to the main synthesis overview, the goal of this platform is to support exploration of synthesis data. For all themes, there was general agreement that it would be useful to pull up at a glance, site characteristics, a photo, and basic summary statistics about the number of models and model performance.

During the meeting, we worked with the Aquatics and Beetles Challenge teams to identify some of the key data aggregation groupings that will be important to facilitate exploration. One important distinction arose during the conversations – the baseline model, time scale, and data latency. For Aquatics there is a long time series of data that create a climatology and data are provided relatively quickly via data loggers. For Beetles, there is a different null baseline model given the length of historic data that is different at each site and it takes a year to provide beetle abundance and richness assessment. There was also a desire to have specific types of synthesis visualizations including the species accumulation curve over years, 3-year running average, and indicating the lower and upper bounds of a particular variable (use in scale). Thus, for both Beetles and Aquatics there are similarities and differences in the types of groupings that would be most useful to support synthesis exploration.

Table 1. Different data groupings that would be useful to facilitate easy-to-develop synthesis visualizations of the EFI-NEON Forecast Challenge models to facilitate learning and community theory development.

Groupings	All Themes	Aquatics	Beetles
Team / Challenge	theme, site, model ID, customized classroom or team groupings	particular variables (e.g., DO) within a theme
Spatial / Ecosystems	sites, NEON domains, site type (river, stream, lake…), altitude (high vs lowlands)		sites by distance, dominant NLCD classification
Temporal Scale	average for past year, seasonal groupings,	1 day, 5 days, 7 days, 15 days, 30 days	14 days, growing season, multi-year (up to 5 year) forecasts
Models	best model at each site, model inputs, model structure, functional type, output uncertainty representation		model run time, model computational requirements
Skill Scoring	current skill forecast approaches, better than climatology/null baseline,		comparison of your model to the best forecast
Other Features	environmental variables and weather forecast observations	comparison with weather/climate forecast skill	disturbance events (e.g., wildfire), growing season dates at each site, site disturbance characteristics (e.g., mowing, fencing)

In addition to the synthesis overview, there were two complementary and linked platforms that will create the dashboard. First, the objective of the forecast challenge overview is to provide a basic summary of metrics related to the overall EFI NEON Ecological Forecasting Challenge. Specifically, the metrics that would be included are: number of forecasts submitted, number of unique teams, percentage (or median of all) of models that are better than climatology or a null model per theme, and total forecast and observation pairs.

Second, the objective of the self-diagnostic platform is to provide an overview of individual or team forecast contributions and performance. The types of summaries that will be provided for the forecasters are: confirmation of forecast submission, date of the most recent forecast submitted for a model, model performance relative to climatology or null model, model prediction versus observation, model performance vs other selected models, and model skill over a specific time horizon (to assess whether it performs better over time).

Overall, the goal of the re-envisioned visual dashboard is to create platforms that will allow us to track challenge engagement, individually or as a team diagnose any model submission problems and performance improvement opportunities, and support community theory development through a synthesis given the range of models submitted through the EFI NEON Ecological Forecasting Challenge. Long-term, if this platform structure is useful and robust, it could be applied to other systems where there are multi-model predictions and there is a desire to collaboratively learn together to improve our theoretical understanding and forecasts to support decision-making.

We are looking for input from the EFI community on the synthesis dashboard for other themes, to discuss with individuals what synthesis would be most relevant to phenology, terrestrial, and ticks forecasters. Reach out to info@ecoforecast.org to share your thoughts or let us know you would like to join future conversations about updating the dashboard.

NEON Biorepository Seeks Collaborative Opportunities in Ecological Monitoring & Forecasting Research

Posted on June 4, 2020 by Jody Peters

Post by: Kelsey Yule; Project Manager, NEON Biorepository and Nico Franz; Principal Investigator, NEON Biorepository

Background. The National Ecological Observatory Network (NEON; https://www.neonscience.org/) is known for producing and publishing 180 (and counting) data products that are openly available to both researchers and the greater public. These data products span scales: individual organisms to whole ecosystems, seconds to decades, and meters to across the continent. They are proving to be a central resource for addressing ecological forecasting challenges. Less well known, however, is that these data products are all either directly the result of or spatially and temporally linked to NEON sampling of physical biological (e.g. microbial, plant, animal) and environmental (e.g. soil, atmospheric deposition) samples at all 81 NEON sites.

The NEON Biorepository at Arizona State University (Tempe, AZ) curates and makes available for research the vast majority of these samples, which consist of over 60 types and number over 100,000 per year. Part of the ASU Biodiversity Knowledge Integration Center and located at the ASU Biocollections, the NEON Biorepository was initiated in late 2018 and has received nearly 200,000 samples to date (corresponding to some 850 identified taxa in our reference classification). The sampling strategies and preservation methods behind the catalog of NEON Biorepository samples have been designed to facilitate their use in large-scale studies of the ecological and evolutionary responses of organisms to change. While many of these samples, such as pinned insects and herbarium vouchers, are characteristic of biocollections, others are atypical and meant to serve researchers who may not have previously considered using natural history collections. These unconventional samples include: environmental samples (e.g. ground belowground biomass and litterfall, particulate mass filters); tissue, blood, hair and fecal samples; DNA extractions; and bulk, unidentified community-level samples (e.g. bycatch from sampling for focal taxa, aquatic and terrestrial microbes). Within the overarching NEON program, examination of these freely available NEON Biorepository samples is the path to forecasting some phenomena, such as the spread of disease and invasive species in non-focal taxonomic groups.

NEON Biorepository samples include: pinned, identified insects; dry soils; bulk, unidentified, ground-dwelling invertebrate community samples; frozen small mammal tissue samples

Sample Use. Critically, the NEON Biorepository can be contrasted with many other biocollections in the allowable and encouraged range of sample uses. For example, some sample types are collected for the express purpose of generating important datasets through analyses that necessitate consumption and even occasionally full destruction. Those of us at the NEON Biorepository are working to expedite sample uptake as early and often as possible. While we hope to maintain a decadal sample time series, we also recognize that the data potential inherent within these samples needs to be unlocked quickly to be maximally useful for ecological forecasting and, therefore, to decision making.

Data portal. In addition to providing access to NEON samples, the NEON Biorepository publishes biodiversity data in several forms on the NEON Biorepository data portal (https://biorepo.neonscience.org/portal/index.php). Users can interact with this portal in several ways: learn more about NEON sample types and collection and preservation methods; search and map available samples; download sample data in the form of Darwin Core records; find sample-associated data collected by other researchers; explore other natural history collections’ data collected from NEON sites; initiate sample loan requests; read sample and data use policies; and contribute and publish their own value-added sample-associated data. While more rapidly publishable NEON field data will likely be a first stop for forecasting needs, the NEON Biorepository data portal will be the only source for data products arising from additional analyses of samples collated across different research groups.

Map results for the spatial and taxonomic distribution of NEON mosquito (Culicidae) specimens currently available for use

Exploration of feasible forecasting collaborations. The NEON Biorepository faces both opportunities and challenges as it navigates its role in the ecological forecasting community. As unforeseen data needs arise, the NEON Biorepository will provide the only remaining physical records allowing us to measure relevant prior conditions. Yet, we are especially keen to collaboratively explore what kinds of forecasting challenges are possible to address now, particularly with regard to biodiversity and community-level forecasts. And for those that are not possible now, what is missing and how can we collaborate to fill gaps in raw data and analytical methods? Responses to future forecasting challenges will be strengthened by understanding these parameters as soon as possible. We at the NEON Biorepository actively solicit inquiries from researchers motivated to tackle these opportunities, and our special relationship to NEON Biorepository data can facilitate these efforts. Please contact us with questions, suggestions, and ideas at biorepo@asu.edu.

Going Virtual! What we learned from the EFI-RCN Virtual Workshop

Posted on May 21, 2020 by Jody Peters

Date: May 21, 2020

Post by: Jody Peters¹ and Quinn Thomas²

¹ University of Notre Dame, ²Virginia Tech

On May 12 and 13 our NSF-funded EFI Research Coordination Network (RCN) hosted a virtual workshop, “Ecological Forecasting Initiative 2020: Coordinating the NEON-enabled forecasting challenge”. This workshop replaced the three-day in-person workshop that was scheduled at the same time, but which was canceled due to COVID-19. Going virtual allowed us to increase our participation and diversity. We were originally space-limited to 65 in-person participants, but with our virtual meeting, we had a little over 200 people register to access the workshop materials, with 150 individuals consistently joining on Day 1 and 110 on Day 2. We also welcomed participants from around the globe with almost 10% of participants calling in from outside the U.S. And instead of being limited to 15 graduate student participants, we ended up with over 50 graduate students who participated in the meeting. While EFI has been using Zoom from the beginning and the EFI-RCN leadership committee members are constantly on Zoom for calls and online courses, this was a much larger gathering than any of us had organized previously. To help others whose workshops are moving to a virtual format, we reflected on the key elements that allowed the workshop logistics and technology to flow smoothly. We hope you find our tips useful! If you have any additional questions feel free to reach out to us at eco4cast.initiative@gmail.com.

Thanks to Dave Klinges (University of Florida) who captured these screenshots of 6 screens of Zoom boxes.

Prepping for the Meeting

Get input from many perspectives. There are a number of great suggestions online about hosting virtual workshops. To prepare for the virtual format, multiple leadership committee members took a free 1 hr class on running virtual scientific meetings. You can find the video and slides from the class here https://knowinnovation.com/2020/03/you-too-can-go-virtual/. Alycia Crall from NEON was hugely helpful with ideas like QUBES and Poll Everywhere. Lauren Swanson from Poll Everywhere provided a tutorial on how to use the different features of Poll Everywhere and helped us to test the polls before the workshop. Julie Vecchio, from the Navari Family Center for Digital Scholarship for the Hesburgh Libraries at the University of Notre Dame, shared an example slide deck and script for sharing virtual logistics at the beginning of a workshop. And Google was a great resource for finding additional input along the way.
Scale your goals to the format and your objectives. Our goal for the original in-person meeting was to finalize rules for the NEON Forecasting Challenge but we knew that this was not possible virtually. However, the virtual meeting allowed us to have more people and more perspectives for idea generation. Therefore, our goals shifted to brainstorming so that we could leverage perspectives from the diverse attendees. We now have a ton of work to do synthesizing the input but we have a better pulse of what the community is interested in. Recognizing the challenge of engaging attendees virtually over long periods of time, we reduced the original 3-day in person meeting to a 2-day meeting with a schedule that was conducive to participants from east to west coasts of the U.S.
Virtual meetings require as much or more prep than in-person meetings. Be prepared for a lot of planning before the meeting.

General Meeting Set-up

Don’t go all day. Our first day was 6 hours and the second day was only 4 hours and the hours were set to accommodate people from the U.S. east and west coast time zones. Unfortunately, there is no good time for all global participants, but we were thrilled to see so many participants who woke up early or stayed up late to join us from outside the U.S.
Incorporate plenty of breaks. Virtual meetings are more tiring than in-person meetings. We had two longer 30-minute breaks that corresponded to lunch-times on the U.S. east and west coasts as well as shorter 15-minute breaks spread throughout both days.
Have a production manager for the meeting. This person focuses on set up and running the technical logistics. For example, this person stays in the main room during breakouts to provide assistance and oversee the timing of activities. Having a production manager allows the meeting lead (i.e., project Principal Investigator) to be the M.C. of the meeting and do real time synthesis of the ideas without having to worry about meeting logistics.
Create a minute-by-minute script for the entire meeting. This includes both the public Agenda and the behind the scenes tasks. For example, we wrote out the messages that would be sent through Zoom Chat/Breakout messaging with the time that each message would be sent. You should be able to articulate in writing what is going to happen at every moment of the meeting before the meeting starts and assign who is going to do each task.
Pre-record talks and add edited closed captioning. This prevents issues that come with live talks like bad mics or bad connections. This also keeps the meeting on schedule and avoids the awkward need to cut someone off. We felt the talks were better because they were pre-recorded and, for the talks that presenters agreed to share, we now have an excellent resource for folks that missed the meeting. The pre-recorded talks may require editing, so find someone with resources and time to make edits prior to the meeting. We made playlists for each plenary session available as unlisted videos on YouTube for any workshop participant that had connection issues while the videos were being played.
Be prepared to pay for a closed captioning service so that the meeting is accessible. In the registration form for the meeting, ask if anyone needs CC and if they do, hire a service. We were able to find a service through our university (Virginia Tech) vendor system that worked well (www.ACSCaptions.com). The production manager moved the captioner to be in the same Breakout as those that requested the service. CC is also nice, because you get the full record of text right after the meeting, instead of waiting for the Zoom transcript to come through, plus the captioner’s transcription is better than the automatic Zoom transcript.
Use hardwired internet. Our production manager/meeting host used a computer that was connected to the internet via a wire – this will reduce the chance that the central person loses connection.
Plan for leadership team meetings during the workshop. The leadership committee met for 1 hour before and 30 minutes after the meeting each day to go over last minute logistics and any adjustments that were needed. Set up a separate Zoom meeting for these calls to avoid participants joining at times when you are not prepared for them.

Zoom worked great. While we know that there are other conferencing platforms, we used Zoom Meeting with a 300 person limit, hosted through the University of Notre Dame. We chose Zoom Meeting over Zoom Webinar, because we wanted the ability for workshop participants to interact during breakouts. Plus it was provided by the University, and did not require the additional set-up or payment that Zoom Webinar required. It worked very well. There were some individuals that could not access Zoom. Therefore, we also streamed the workshop from Zoom to YouTube and shared the YouTube live link with individuals in our group who had registered for the workshop materials.
But Zoom can break communication lines among the host and leadership committee. Have an off-Zoom and off-computer way for the leadership team to communicate throughout the meeting (like text messaging to phones). It is important to turn off notifications on the host's and co-hosts' computers because of screen sharing and sounds, but that can leave the production manager or leadership team flying blind unless there is an alternative way to communicate. Leadership committee members that are in Breakout Rooms are unable to message the host in Zoom.
Assign leadership committee members as co-hosts. Assign all leadership members as co-hosts and have them mute people who are not talking but have background sounds. Leadership members can also help with spotlighting the speakers and can also move from breakout room to breakout room if needed to check on how things are going.
Give a brief Zoom training at the beginning. At the beginning of the workshop, use a slide deck (and a written out script to go with it) to introduce all the features of Zoom you want people to use. While many of us use Zoom regularly, not everyone is on Zoom all the time, and it is important that these folks feel comfortable so they can fully participate.
Play videos directly from the production manager/meeting host’s computer. Make sure the videos are downloaded onto your computer hard drive and play them from there. We used a playlist that automatically advances to the next video. In Zoom’s screen share settings, make sure to click both the “Share computer sound” and “Optimize Screen Sharing for Video Clip” options. Do not play videos in Zoom from YouTube to avoid the video having to be played over multiple web services.

Zoom Breakout Rooms

Keep Breakout Rooms small. To make the meeting feel smaller we only had a max of 9 people per breakout room. Using random sorting, as we did on Day 1, was a great way to meet different people throughout the group. We built in time for introductions during the breakouts because one of our goals was community building.
Clear and easy to find Breakout instructions. Have specific and easy to find instructions for each Breakout session. If possible try to spread the leadership team among different Breakout Rooms. In practice, this is hard because the production manager has to find them in the list of random Breakout Rooms and reassign (but see point 4 below and have the leadership members rename themselves). In reality, specific instructions that aren’t too complex will allow Breakout Rooms to work fine without a member of the leadership team. Our instructions were located in an easy to find place on the meeting website.
Prepare for providing assistance getting into Breakout Rooms. Some people may not see their notification to join a Breakout Room pop up, so the production manager may need to walk them through that. Include a screenshot in the introductory Zoom instructions of what the Breakout Room assignment notification looks like so people know where to look.
Give extra time if using manually assigned (non-random) Breakout Rooms. Manually sorting Breakout Rooms takes longer to organize so be sure to include the sorting time in your plans. The assigning and sorting can be done at any time during the plenary, it does not need to happen right before the Breakout Rooms open. Use the renaming feature of Zoom to ease the sorting. If everyone changes their Zoom name (under the Participants tab) to start with their group name or number (e.g., A1 Jody) it is much much easier for the production manager to sort.
Create extra Breakout Rooms. When setting up the manually sorted Breakout Rooms, create additional rooms that may stay empty. Some groups may want to break out further, and additional rooms cannot be added after the rooms are opened.
Character limits for messages to Breakout Rooms. There is a character limit for the messages that can be sent to the Breakout Rooms, so keep them short.
Breakout Rooms are unable to communicate with the production manager. When individuals are in the Breakout Rooms, they cannot use the Zoom Chat to communicate with the production manager in the main room or anyone else in the workshop who is in another Breakout Room. This is important to mention in the introductory Zoom instructions.

Communication throughout the Workshop: Poll Everywhere and QUBES

Create a means for engagement. It is important for attendees to feel like they are involved so that the workshop isn’t a one-way delivery of information. We used an educational account of Poll Everywhere through the University of Notre Dame to help promote participation throughout the workshop in multiple ways, including brainstorming ideas with word clouds, submitting questions and voting on priority questions for panel members, and brainstorming priorities that could also be voted on.
Define use of communication tools. Use the Zoom Chat for logistics and supplemental information and Poll Everywhere for Q&A. That way questions on science do not get lost in questions about links or timing etc. We also were able to download all the questions for the panelists to get their feedback on any questions that we did not have time for during the Q&A sessions. We will share this feedback with the workshop participants in the next month.
Centralize meeting materials. We used QUBES as a platform to organize and share materials easily in a centralized location. The EFI-RCN QUBES site worked well because it was free (thanks NSF and Hewlett Foundation), easy to set up, and we were able to include links to videos, surveys, Zoom login, google documents, papers, etc. all in one place.

Introducing EFI Task Views!

Posted on May 4, 2020 by Jody Peters

Date: April 20, 2020, updated June 29, 2020 and December 16, 2022

For individuals new to the field of ecological forecasting it can feel like there are an overwhelming number of methods and tools to learn and implement. On the other hand, individuals who have been forecasting for some time may want to know if there are any additional tools that others have found useful. In a series of 4 blog posts, the Methods and Cyberinfrastructure EFI Working Groups will highlight common tasks in ecological forecasting and methods and tools to help with those tasks.

Today’s post will cover Reproducible Forecasting Workflows. Other Task Views focus on

Modeling & Statistical resources, including Uncertainty Quantification & Propagation
Data Ingest, Cleaning, and Management
Visualization, Decision Support, and User Interfaces.

The tasks and associated tools will be included in each of the four blog posts as well as kept on easily accessible and periodically updated web pages linked off the EFI Resources, Methods & Tools, and Cyberinfrastructure pages.

Resources and tools listed in the four categories of tasks are meant to be living documents. This list is not meant to be a comprehensive overview of all possible resources, as there are some tasks where there are hundreds of different tools available. Instead we focus on commonly used tools. However, if there are often used tools and resources we are missing, we welcome input from anyone — suggestions can be shared using this Google Form.

On the short-term scale, our goal is to provide the Task Views as resources to the ecological forecasting community. In the long-term, we want to supplement these resources with gap analyses to determine where there are unmet needs for generalizable tools (i.e. methods are known but tools don’t exist) versus where methods don’t exist and there’s a need for new research on statistical methods or cyberinfrastructure.

Reproducible Forecasting Workflows.

This material can also be found on the Reproducible Forecasting Workflows Task View Page.

Curators: Jacob Zwart¹, Alexey Shiklomanov², Kenton McHenry³, Daniel S. Katz³, Rob Kooper³, Carl Boettiger⁴, Bryce Mecum⁵, Michael Dietze⁶, Quinn Thomas⁷

¹USGS, ²NASA, ³National Center for Supercomputing Applications at the University of Illinois Urbana-Champaign, ⁴University of California, Berkeley, ⁵National Center for Ecological Analysis and Synthesis, ⁶Boston University, ⁷Virginia Tech

Overview
Scripted Analyses
Project Structures
Version Control
Literate Programming
Workflows and Dependency Management
Unit Testing
Continuous Integration and Automation
Containerization
Metadata
Data and Code Release

Overview

Reproducibility⁸ of scientific output is of utmost importance because it builds trust between stakeholders and scientists, increases scientific transparency, and often makes it easier to build upon methods and/or add new data to an analysis. Reproducibility is particularly critical when forecasting, because the same analyses need to be repeated again and again as new data become available, often in an automated workflow. Furthermore, reproducible workflows are essential to benchmark forecasting skill, improve ecological models, and analyze trends in ecological systems over time. All forecasting projects, from single location forecasts disseminated to only a few stakeholders to daily-updating continental-scale forecasts for public consumption, benefit from using tools and techniques that enable reproducible workflows. However, the project size often dictates which tools are most appropriate, and here we give a brief introductory overview of some of the tools available to help make your ecological forecast reproducible. There are many more tools available than what we describe here, and we primarily focus on open-source software to facilitate reproducibility independent of software access.

⁸Reproducibility is the degree of agreement among results generated by at least two independent groups using the same suite of methods. Many of the tools highlighted here facilitate repeatability, which is a measure of agreement among results from a single task that is performed by the same person or tool. Repeatability is necessary for reproducible forecasting workflows.

Scripted Analyses

Forecasts produced without using scripted analyses are often inefficient and prone to non-reproducible output. Therefore, it is best to perform data ingest and cleaning, modeling, data assimilation, and forecast visualization using a scripted computing language that performs tasks automatically once properly configured.

Interpreted languages allow the user to execute commands line-by-line, interactively, in real time. This makes debugging and exploratory analysis much easier, and significantly reduces programmer time for performing analyses. Analyses using interpreted languages are also usually easier to reproduce because of fewer installation/configuration steps (and more standardized, centralized installation mechanisms). This convenience generally comes at the expense of computational speed, but many times the tradeoff is worth it.
- R – Originally developed for statistical computing, and still primarily used for data science and other scientific computing tasks. Many important data science tools, including statistical distributions, plotting, and tabular data analysis, are included in the core language. Tens of thousands of add-on packages for just about any task imaginable exist.
- Python – General purpose programming language with a much more limited set of core features than R. Many data-science features are accessible through add-on packages and are curated through repositories such as PyPI, Anaconda, and Enthought.
- Julia – Very recent language. Claims to combine the ease-of-use of interpreted languages like R and Python with the performance of compiled languages like C and Fortran. Specifically relevant to forecasting and uncertainty propagation, Julia has extremely powerful probabilistic programming tools (e.g. Turing for Bayesian inference, Flux for machine learning).

Compiled languages generally perform computationally intensive tasks much faster (up to 80x or more) than interpreted languages. However, their syntax is generally stricter / less forgiving, and analyses have to be written as complete programs that are compiled in operating system-specific (and even machine-specific) ways. In general, these languages should be avoided in favor of easier-to-use interpreted languages unless you are addressing specific computational bottlenecks. Note that all of the interpreted languages above provide ways to call specific functions/subroutines written in these compiled languages, so you have the option to only use these routines for specific, computationally-limiting steps in your analysis. Commonly used compiled languages include:
- C
- C++
- Fortran

A good standard is to develop an analysis using an interpreted language first and assess if it is fast enough for your needs. If it is fast enough, then you are done! If not, do some basic profiling to identify performance bottlenecks. See if there are existing tools or techniques in the language you are using that can help address the bottlenecks. Only fall back on compiled languages if you’ve reasonably exhausted possibilities using your current language.
List of programming languages
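As a minimal sketch of the "profile first" step, the Python example below uses the standard-library `cProfile` and `pstats` modules to see where a toy forecast loop spends its time. The functions `slow_step`, `run_forecast`, and `profile_report` are hypothetical placeholders written for this illustration, not part of any forecasting package.

```python
import cProfile
import io
import pstats


def slow_step(n):
    """Hypothetical bottleneck: a pure-Python loop summing squares."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_forecast(n_steps=50, n=10_000):
    """Hypothetical forecast loop that calls the slow step repeatedly."""
    return [slow_step(n) for _ in range(n_steps)]


def profile_report(func, *args, top=5, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_report(run_forecast)
    print(report)
```

If the report shows that one function dominates the run time, that function is the first candidate for vectorizing, swapping in an existing optimized package, or, as a last resort, rewriting in a compiled language.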

Back To Contents

Project Structure

Organized project structures help the scientist and collaborators navigate the workflow of the ecological forecasting project from data input to dissemination of results. Subfolders should be used to break up the project into conceptually distinct steps of the forecast, and sequential numbering of scripts and subfolders helps with readability, for example, “10_data”, “20_clean”, “40_forecast”, “60_visualize”, “95_report” (see more detailed example below). The number prefixes should represent a conceptual workflow for each forecasting project, and subdirectories within each phase of the project should describe the inputs, outputs, and functions for each step. Generally, unnumbered directories should contain supporting files that apply to the overall project, for example a configuration file that is used in multiple phases of the forecasting project.

*Example of organized folder and file structure for a forecasting project.*
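One way to set up such a skeleton is sketched below in Python. The phase names come from the example in the text; the `in`, `out`, and `src` subfolder names and the `config.yml` file name are illustrative assumptions, not a prescribed layout.

```python
from pathlib import Path

# Numbered phases taken from the text; subfolder names are assumptions.
PHASES = ["10_data", "20_clean", "40_forecast", "60_visualize", "95_report"]
SUBFOLDERS = ["in", "out", "src"]


def create_project_skeleton(root):
    """Create numbered phase folders, each with in/out/src subfolders."""
    root = Path(root)
    for phase in PHASES:
        for sub in SUBFOLDERS:
            (root / phase / sub).mkdir(parents=True, exist_ok=True)
    # Unnumbered, project-wide supporting file (hypothetical name).
    (root / "config.yml").touch()
    # Return the phase folders in the order a reader would see them.
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(create_project_skeleton(tmp))
```

Leaving gaps in the numbering (10, 20, 40, ...) makes it possible to add a new phase later without renaming the existing folders.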

Tools for organized project structures

Project-oriented workflows are self-contained workflows enabling reproducibility and navigability when used in conjunction with organized project structures for ecological forecasting projects. Ideally, a collaborator should be able to run the entire project without changing any code or files (e.g. file paths should be workstation-independent). R and Python both have options for enabling self-contained workflows in their coding environments.
- R – RStudio projects – R projects allow for analyses to be contained in a single working directory that can be given to a collaborator and run without changing file directory paths.
- Python – Spyder projects – Python projects also allow for self-contained analyses and integration with Git version control (see Version Control below).
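Outside of an IDE project, the same workstation-independent behavior can be approximated in plain Python. The sketch below finds the project root by searching upward for a marker file and builds every path relative to it. The marker name `.project_root` and the helpers `find_project_root` and `project_path` are hypothetical names chosen for this example.

```python
from pathlib import Path


def find_project_root(start, marker=".project_root"):
    """Walk upward from start until a folder containing marker is found."""
    start = Path(start).resolve()
    for folder in [start, *start.parents]:
        if (folder / marker).exists():
            return folder
    raise FileNotFoundError(f"No {marker} found above {start}")


def project_path(start, *parts, marker=".project_root"):
    """Build a path relative to the project root, not the workstation."""
    return find_project_root(start, marker).joinpath(*parts)
```

A script anywhere in the project could then call something like `project_path(__file__, "10_data", "in", "obs.csv")` (the file name is made up) and get the right location on any collaborator's machine, with no hard-coded absolute paths.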

Back To Contents

Version Control

Version control is the process of managing changes in code and workflows. It improves transparency in the code development process and facilitates open science and reproducibility. Code versioning also enables experimentation and development of code on different “branches” while retaining canonical files that can be used in operations, for example. Modern version control systems make it easy to create and switch between branches within a code base, encouraging developers to experiment with potentially breaking changes without worrying about losing stable code. This is especially useful for forecasting projects that need to make forecasts on a regular schedule (e.g. daily), while researchers can also make alterations to the code base on experimental branches. Finally, version control facilitates collaboration by formalizing the process for introducing changes and keeping a record of who introduced which changes, when, and why. Additionally, version control allows contributions in an open way from even unknown contributors, with the opportunity for the main authors to control which contributions are accepted. Software development is a trillion dollar industry and it is well worth the time learning the basics of industry standard tools like version control, rather than relying on ad hoc and error prone approaches such as file naming (e.g. script.v2.R, python_script_final_FINAL.py), Dropbox/Google Drive, or emailing files to collaborators.

Tools for version control

The distributed model of version control is where developers of code work from local repositories which are linked to a central repository. This enables automatic branching and merging, improves the ability to work offline, and doesn’t rely on a single repository for backup.

Git is the most popular open-source version control system among ecologists and also professional software developers. The popularity enables contributions from many collaborators since potential contributors will likely be used to using Git and web interfaces like GitHub.
- You can practice using Git for version control with some simple tutorials.
- RStudio has integrated support for GitHub
- Spyder projects (Python) and Integrated Development Environments such as PyCharm and Visual Studio Code have integration with Git.
- GitLab also uses Git, and similar to GitHub allows for issue tracking and various other project management tools, and GitLab provides more options for collaborator authentication.

*Example of version control workflow using Git. Figure from* *here*.

List of other version control programs

Back To Contents

Literate Programming

Traditionally, scientific writing and coding are separate activities—for example, a researcher who wants to use code to generate a figure for her paper will have the code for generating that figure in one file and the document itself in another. This is a challenge for reproducibility and provenance tracking because both criteria have to be maintained for multiple files simultaneously. “Literate programming” provides an alternative approach, whereby code and text are interleaved within a single file; these files can be processed by special literate programming software to produce documents with the output of the code (e.g. figures, tables, and summary statistics) automatically interspersed with the document’s body text. This approach has several advantages. For one, the code output of a literate programming document is by definition guaranteed to be consistent with the code in the document’s source. At the same time, literate programming can make it easier to develop analyses by reducing the separation between writing and coding; for instance, interactive literate programming software can be used to keep “digital lab notebooks” where analyses are developed and described in the same file. In the context of ecological forecasting, literate programming techniques can be particularly useful for writing forecast software documentation, and can even be used for creating automatically-updating documents and reports describing forecast output.

Tools for literate programming

Two effective and common tools for literate programming are:

R Markdown — Allows code from multiple different languages including R, Python, SQL, C, and sh to be embedded within a common markup language (Markdown). Multiple different languages can be embedded within different blocks in the same document. Documents can be exported to a wide range of formats, including PDF, HTML, and DOCX. By default, R Markdown documents are static (i.e. the entire document is rendered all at once with a command); however, recent versions of RStudio allow them to be used interactively by rendering specific code blocks directly in the code editor window. R Markdown documents compiled to HTML format can easily embed interactive elements ranging from clickable plots and subsettable tables (e.g. htmlwidgets) to full applications with user-defined inputs (via RShiny); for more information, stay tuned for our follow up task view on Visualization.
Jupyter — Unlike R Markdown documents, Jupyter notebooks were designed from the start to be used interactively. Documents are stored in a format that makes them difficult to edit with a plain-text editor; rather, they are typically edited using a special browser-based editor that runs a language “kernel” in the background. The results of any particular code block are stored across sessions, so code blocks do not need to be re-evaluated when exporting to other formats. A document typically uses a single language; kernels exist for Julia, Python, R, and many other languages.
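To make the idea concrete, here is a toy Python sketch of what a literate programming processor does: it reads a document in which prose and code chunks are interleaved, runs each chunk, and weaves the printed output back into the text. The `@code` / `@end` chunk markers and the `weave` function are made up for this illustration; R Markdown and Jupyter use their own formats and are far more capable.

```python
import contextlib
import io


def weave(source):
    """Run code chunks between @code/@end lines and insert their output."""
    namespace = {}  # shared across chunks, like a single session
    woven, chunk, in_code = [], [], False
    for line in source.splitlines():
        if line.strip() == "@code":
            in_code, chunk = True, []
        elif line.strip() == "@end" and in_code:
            in_code = False
            buffer = io.StringIO()
            # Capture whatever the chunk prints and place it in the text.
            with contextlib.redirect_stdout(buffer):
                exec("\n".join(chunk), namespace)
            woven.append(buffer.getvalue().rstrip("\n"))
        elif in_code:
            chunk.append(line)
        else:
            woven.append(line)
    return "\n".join(woven)


DOCUMENT = """Mean forecast temperature:
@code
temps = [11.5, 12.0, 12.5]
print(sum(temps) / len(temps))
@end
The value above is recomputed every time the document is woven."""

if __name__ == "__main__":
    print(weave(DOCUMENT))
```

Because the number in the woven text is produced by running the code, the reported value cannot drift out of sync with the analysis, which is the consistency guarantee described above.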

Back To Contents

Workflows and Dependency Management

Workflows are typically high-level descriptions of sets of tasks to be performed as part of an overall scientific application, at least in the context of this blog. There are a wide variety of methods and formats for expressing such descriptions. Workflows must include information about the tasks themselves, as well as their inputs and outputs, which either implicitly define or explicitly state dependencies among the tasks. This information, including the dependencies, is used by a Workflow Management System (WMS) to execute the tasks, potentially 1) on a local computer or one or more remote computers, including clouds and HPC or HTC systems; 2) serially or in parallel; 3) from the start or from a previous partially complete state. These dependencies can be static (fully defined before the application is run) or dynamic (e.g. partially defined based on data, execution, or other resources).

Workflow management systems help efficiently reproduce portions of or entire scientific workflows. These tools analyze workflows, skip phases of the workflow that are up-to-date (if the exact inputs and tasks have been run previously, the previous outputs can be returned; this technique is sometimes called memoization), and execute tasks that are out-of-date, tasks downstream of out-of-date tasks, or tasks scheduled to run at set times (e.g., a daily-updating forecast). These tools are especially useful for large projects that bring multiple streams of data together in an analysis, since they relieve the analyst of keeping track of workflow order and of which tasks need to be rerun. For example, when new data about a model parameter are included in the forecasting workflow, only the portion of the workflow that depends on those new data will be executed.

*Example of a simple dependency graph and which tasks will be executed using a workflow management system (from Drake workflow example).*
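The skipping behaviour described above can be sketched in a few lines of Python. This is only an illustration of the memoization idea, not how Drake or any other tool is implemented: the task functions (`clean`, `fit`), the in-memory cache, and the `run_log` list are all hypothetical names chosen for the example.

```python
import hashlib
import json

# In-memory stand-in for the cache a workflow manager would keep on disk.
_cache = {}
run_log = []  # records which tasks actually executed


def fingerprint(task_name, inputs):
    """Hash the task name and its inputs so identical work maps to one key."""
    payload = json.dumps({"task": task_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_task(task_name, func, inputs):
    """Run func(**inputs) only if this exact task/input pair is not cached."""
    key = fingerprint(task_name, inputs)
    if key not in _cache:
        run_log.append(task_name)
        _cache[key] = func(**inputs)
    return _cache[key]


def clean(raw):
    """Drop missing observations."""
    return [x for x in raw if x is not None]


def fit(data, rate):
    """Toy model step that depends on the cleaned data and one parameter."""
    return sum(data) * rate


def workflow(raw, rate):
    data = run_task("clean", clean, {"raw": raw})
    return run_task("fit", fit, {"data": data, "rate": rate})


first = workflow([1, None, 2], rate=0.5)   # both tasks run
second = workflow([1, None, 2], rate=0.5)  # nothing reruns
third = workflow([1, None, 2], rate=2.0)   # only "fit" reruns
```

Changing only the `rate` parameter leaves the cleaning step untouched, so just the downstream task is executed again. Note that this sketch keys the cache on inputs alone; it does not detect edits to the task code itself.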

Below we list a few tools for workflows and dependency management. There are, however, many other workflow and dependency management tools; a larger list can be found here.

Drake is an R-based, ‘make’-like toolkit that tracks dependencies among phases of your workflow and executes work that is out-of-date. Drake builds upon previous R dependency managers such as remake, and can deal with high-performance or high-throughput computing (HPC / HTC) within the WMS framework. This includes automated detection of and retries for model failures, and launching Slurm (or other job scheduler) jobs directly from a drake plan.
- Video tutorial and other Drake resources
Snakemake is a Python-based workflow management tool that includes much of the same functionality as Drake for R, including compatibility with HPC / HTC or cloud computing environments. The rules defined in a Snakemake workflow can use shell or Python commands or run external Python or R scripts, and can use various remote storage environments such as Amazon S3, Dropbox, or Google Storage.
Parsl is a Python library that lets users define a workflow through a Python program, or parallelize a Python program. Users do this by ‘decorating’ the definitions of Python functions and calls to external applications to indicate that they are potentially parallelizable and asynchronous tasks. When such a task is called, Parsl intercepts it and adds it to an internal dynamic directed acyclic graph that captures the overall application dependencies. If both the inputs for the task and execution resources are available, the task is run; if not, it waits until these conditions are satisfied. In either case, Parsl immediately returns a ‘future’, a placeholder for the eventual return value, so that the overall application can proceed, which allows multiple tasks to run in parallel. Parsl is an open source project led by the University of Chicago and the University of Illinois, and it supports a wide variety of execution resources (e.g., local, CPUs, GPUs, HPC, HTC, cloud) and schedulers.
Pegasus is another scientific workflow system with a long history of development and use in the science world (e.g., it is the workflow system used by LIGO).
Argo is a more recent Kubernetes-based workflow system, convenient when much of the workflow already runs in Docker (see Containerization below).
Airflow is another workflow system, developed and used by Airbnb and others, mostly in industry. Airflow is now a project within the Apache Software Foundation. It allows a user to author workflows as Directed Acyclic Graphs (DAGs) of tasks. The Airflow scheduler executes the tasks on an array of workers while following the specified dependencies. It also has a user interface that allows the user to visualize pipelines running in production, monitor progress, and troubleshoot issues.
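Several of the tools above share two ideas: a directed acyclic graph of tasks, and 'futures' that stand in for results still being computed. The sketch below illustrates both using only Python's standard library (`graphlib` and `concurrent.futures`, Python 3.9 or later). It does not use the API of Parsl, Airflow, or any other tool listed here, and the task names and the `run` placeholder are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical forecast workflow: each task maps to the tasks it depends on.
dependencies = {
    "download": set(),
    "clean": {"download"},
    "weather": set(),
    "forecast": {"clean", "weather"},
    "plot": {"forecast"},
}


def run(task):
    """Placeholder for real work; returns a label so results can be inspected."""
    return f"{task} done"


def execute(graph, max_workers=2):
    """Run tasks in dependency order, submitting every ready task to the pool."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    order, results = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorter.get_ready()
            # submit() returns a future immediately; the work runs in the pool.
            futures = {task: pool.submit(run, task) for task in ready}
            for task, future in futures.items():
                results[task] = future.result()  # block until this task finishes
                order.append(task)
                sorter.done(task)  # unlocks tasks that were waiting on this one
    return order, results


order, results = execute(dependencies)
```

Independent tasks such as `download` and `weather` are submitted together and can run in parallel, while `forecast` is only released once both of its inputs are done. A production workflow system adds scheduling, retries, and remote execution on top of this basic pattern.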

Unit Testing

Ecological forecasting workflows can be complex and involve many steps, from data ingest and data cleaning to producing forecasts and visualizing output. Often, these forecasting workflows need to produce output on a regular schedule, and ensuring that each part of the workflow performs appropriately is crucial for making forecasts and identifying failure points, whether operational or not. Unit testing consists of automated tests on small units within a larger workflow to ensure that the different sections behave as intended (e.g. testing that individual functions return the expected outputs for valid inputs and the expected errors for invalid inputs). Unit tests are also frequently used for regression testing, where a test is created for a previously fixed bug or problem to prevent it from being reintroduced. In combination with continuous integration (see below), these tests ensure that a code base still runs as expected after modifications have been integrated.
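As a minimal sketch of these ideas, the example below tests one small function for a valid input, for two kinds of invalid input, and for a boundary case written as a regression test. It uses Python's built-in `unittest` module so it runs without extra packages; the function `celsius_to_kelvin` and the past bug mentioned in the comments are hypothetical.

```python
import unittest


def celsius_to_kelvin(temp_c):
    """Convert a temperature in degrees Celsius to kelvin."""
    if isinstance(temp_c, bool) or not isinstance(temp_c, (int, float)):
        raise TypeError("temp_c must be a number")
    if temp_c < -273.15:
        raise ValueError("temperature is below absolute zero")
    return temp_c + 273.15


class TestCelsiusToKelvin(unittest.TestCase):
    def test_valid_input(self):
        # Expected output for a valid input.
        self.assertAlmostEqual(celsius_to_kelvin(0), 273.15)

    def test_invalid_type_raises(self):
        # Expected error for an input of the wrong type.
        with self.assertRaises(TypeError):
            celsius_to_kelvin("warm")

    def test_below_absolute_zero_raises(self):
        # Expected error for a physically impossible value.
        with self.assertRaises(ValueError):
            celsius_to_kelvin(-300)

    def test_regression_absolute_zero_is_allowed(self):
        # Regression test for a hypothetical past bug in which the boundary
        # value itself was rejected.
        self.assertAlmostEqual(celsius_to_kelvin(-273.15), 0.0)


def run_tests():
    """Load and run the test case, returning the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestCelsiusToKelvin)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
```

If a later change reintroduced the boundary bug, the regression test would fail and flag the problem before the change reached a running forecast.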

In complex workflows or systems, unit tests only check that each component works as intended in isolation. In addition, an integration or system test needs to be performed at certain points to test all the components interacting with each other. For example, does each component still produce the outputs expected by the next steps in the workflow?
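An integration test can be sketched as a single test that passes data through every stage in sequence. The three stages below (`ingest`, `clean`, `forecast`) and the sample rows are hypothetical stand-ins for real workflow components.

```python
def ingest(raw_rows):
    """Parse 'date,value' strings into dictionaries."""
    records = []
    for row in raw_rows:
        date, value = row.split(",")
        records.append({"date": date, "value": float(value) if value else None})
    return records


def clean(records):
    """Drop records with missing values."""
    return [r for r in records if r["value"] is not None]


def forecast(records):
    """Toy forecast: carry the mean of the observations forward one step."""
    values = [r["value"] for r in records]
    return {"n_obs": len(values), "prediction": sum(values) / len(values)}


def test_pipeline_end_to_end():
    """Integration test: each stage must accept the previous stage's output."""
    raw = ["2025-01-01,2.0", "2025-01-02,", "2025-01-03,4.0"]
    result = forecast(clean(ingest(raw)))
    assert result["n_obs"] == 2
    assert result["prediction"] == 3.0
    return result


pipeline_result = test_pipeline_end_to_end()
```

Each stage could pass its own unit tests and the pipeline could still fail here, for instance if `ingest` began returning a different field name than `clean` expects.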

Tools for unit testing

Most programming languages have a testing framework that will help with unit tests. A list of tools can be found here; some commonly used testing frameworks for languages used in forecasting are:

testthat for R, including examples of how to implement unit testing in R.
pytest for Python

Continuous Integration and Automation

Both the models we use to make predictions and the forecasting workflows we build around them are, in some sense, always a work in progress. Any time we make changes to our models and workflows, whether it’s updating a library or adding a data source, there’s a chance that we’ll break our workflow. Tools for continuous integration enable researchers to update their forecasts and run tests on their code in an automated and robust manner (e.g. with system tests in place to catch accidental changes that would otherwise break a deployment). Continuous Integration (CI) tools automatically build and deploy software ecosystems, and test new versions of code to ensure that models under development will still work. This is especially important for iterative forecasts that need to be deployed at regular intervals (e.g. daily forecasts). As CI tools continue to become more powerful, flexible, and generous with their service offerings, they can expand from supporting development workflows to serving as the primary platforms for application workflows, such as iterative, real-time forecasting. Below we list a few of these tools; larger lists can be found here or here:

Travis CI, probably the most popular automated testing tool on GitHub, at least in the recent past. This service is designed to test builds and run unit tests (and other short-lived scripts) on a variety of different virtual platforms with different configurations. Travis CI runs for free on its own servers, but has time and CPU limits, at least for the free version (though a user can request that these limits be increased). Some features include the ability to run actions in parallel (configured via a YAML file) and access via an API.
GitHub Actions, similar to Travis CI, but hosted natively by GitHub and with more generous time, memory, and CPU allowances for open-source (public) projects on GitHub. GitHub Actions is quickly increasing in popularity.
GitLab CI, similar to Travis and GitHub Actions but hosted by GitLab.
Circle CI, similar to Travis and GitHub Actions.
Jenkins, a locally run alternative that you can deploy on your own servers.

Containerization

Complex scientific workflows often involve combining multiple different tools written in different programming languages and/or possessing different software dependencies. Even a simple R script may depend on multiple R packages, and may only work as expected if specific versions of those packages are used. Managing these different tools and their dependencies can be a complex task, especially when tools conflict with each other (e.g. one tool may only work with an older version of a library, while another tool may only work with a newer version of the same library). As the number of tools and their dependencies in a workflow grows, managing these dependencies becomes challenging, and reproducing this workflow on a different machine (potentially with a different operating system) is even more challenging. Containers resolve these issues by providing a way to create isolated packages for each software element and its dependencies. These containers can then run on any computing environment (as long as it has the requisite container software itself installed). Moreover, containerization software sometimes allows for the creation of container stacks (a.k.a. “orchestration”): collections of multiple containers that communicate with each other (including sharing data) and with the host system in precise, user-defined ways (see Workflows and Dependency Management above). In some cases, these container stacks can be deployed across multiple physical or virtual computers, which greatly facilitates the process of scaling computationally intensive analyses.

Tools for containerization

By far the most common tool for containerization — indeed, the emerging standard across the software development industry — is Docker. Docker containers are typically created from a definition file, basically just a starting container (e.g. a specific version of a Linux operating system) followed by a list of shell commands describing the installation and configuration of the specified software and its dependencies. Thousands of existing containers (any of which can be used as a starting point for a custom container) for a wide range of software are available on Docker Hub, a publicly available registry. Software stacks and workflows using multiple containers can be created via Docker Compose, which automatically configures and runs multiple interrelated Docker containers from a human-readable (YAML) specification file. Several tools for orchestration of Docker containers exist — Docker Swarm is distributed as part of Docker (i.e. no additional installation) and allows for rapid deployment with minimal configuration, while Kubernetes is a much more complex but feature-rich solution. Another quickly maturing tool leveraging Docker is The Binder Project, which is a relatively easy to use tool that turns a Git repository into a Docker image for deploying a reproducible computing environment in the cloud.

Unfortunately, Docker’s design poses problems on many of the shared machines used in the sciences. In particular, running Docker containers requires running a persistent background process with administrative (“root”) privileges on the host machine. This is not an issue on self-managed, isolated physical (e.g. your personal laptop) and virtual (e.g. Amazon Web Services) machines. However, it does pose a major security concern that precludes its use on high-performance computing clusters and other enterprise-managed machines often encountered in the sciences. Singularity is an alternative that was designed specifically to address these concerns. Unlike Docker, Singularity does not require a persistent background process to run; rather, its design involves creating containers that are fully self-contained executable files. These files can then be distributed just like any other files, and executed on any machine (as long as that machine has a compatible version of Singularity installed). The initial install of Singularity, as well as the creation of containers, does require root permissions, but unlike Docker, the containers themselves run as a single process with only user permissions. Besides the security implications, this design also makes Singularity containers more amenable to HPC queue submission systems (running the containers is effectively the same as running any other executable). Like Docker, Singularity containers can be created via a definition file, and can be stored on a free, publicly available registry (Singularity Hub). The major downside of Singularity is that it has a much smaller user base (largely limited to a small subset of the scientific community, compared to Docker’s widespread use in both science and industry), and is much less mature software. For example, while Singularity does provide a “Compose” interface, as of this writing this is still in early development and highly experimental. Singularity also works with Kubernetes.

Metadata

Metadata provide crucial information on ecological forecasting data, including model input, output, and parameters, among others. Metadata tell the user how to interpret model output and what conditions are needed to reproduce output. Metadata are also used to describe the size and dimensions of the dataset, quality of the data, author of the data, keywords of the project used to produce the data, and details on how the data were produced. Appropriately documenting ecological forecasting output helps other researchers find relevant datasets and reuse output for other applications, such as input to other models or cross-model comparison in a forecasting challenge.

Tools for metadata

Ecological Metadata Language (EML) is a community-maintained project for documenting research data with a readable XML markup syntax. EML serves the needs of the research community and is modularly designed to enable growth in the language as the needs of the earth and environmental sciences evolve. The Ecological Forecasting Initiative has developed additional forecast-specific standards using EML as the base metadata standard. The EML R package facilitates generating an EML document; however, these documents can also be created using a text editor or other scripting languages such as Python.
EFI is in the process of drafting an ecological forecasting metadata standard that extends EML. Current information is located in our forecast-standards repository.
Many other metadata standards can be found here.
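As a rough sketch of creating such a document from a scripting language, the example below builds a small EML-style XML tree with Python's standard library. It is deliberately simplified: XML namespaces are omitted, only a handful of elements are included, and the result is not validated against the EML schema. The function name `build_metadata`, the `packageId` value, and the dataset details are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, creator_surname, abstract):
    """Build a small EML-style XML tree describing one dataset."""
    # Placeholder identifiers; a real document would use a repository-issued ID.
    root = ET.Element("eml", attrib={"packageId": "example.1.1", "system": "local"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    individual = ET.SubElement(creator, "individualName")
    ET.SubElement(individual, "surName").text = creator_surname
    abstract_el = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract_el, "para").text = abstract
    return root


doc = build_metadata(
    title="Example daily forecast output",
    creator_surname="Doe",
    abstract="Illustrative forecast output used to show the document structure.",
)

# Serialize to text, as would be written to a metadata file.
xml_text = ET.tostring(doc, encoding="unicode")

# Read it back to confirm the structure survives a round trip.
parsed_title = ET.fromstring(xml_text).find("dataset/title").text
```

In practice a dedicated package, such as the EML R package mentioned above, handles namespaces and schema validation that this sketch leaves out.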

Data and Code Release

A core principle of creating reproducible scientific workflows is making the code and data used in the analyses available to the public through data and code publication or releases. Journals and institutions now often require publication of the data used in scientific publications and, to a lesser extent, the code used in the analyses. Many of the other reproducibility principles described above enable efficient data and code release and publication. For example, remote version control repositories, such as GitHub, display developmental and stable code bases and can tag versions of code to be released along with details on what the version was used for (e.g. “v1.2.1 used in analyses described by Dasari et al. 2019”). These code releases can also become citable with a digital object identifier (DOI) by connecting with other archiving tools. Data releases should also be relatively painless if the previous principles of reproducible workflows are followed. Key to data releases and publishing in repositories are descriptive metadata that describe important characteristics of the dataset to be published (see Metadata section above). Additionally, embedding data publishing tasks (e.g. metadata descriptions, pushing data to a remote repository) in a dependency management system (see above) can make updating data in a public repository as easy as executing one line of code.
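The "one line of code" idea can be sketched by wrapping the publishing steps in a single function. In this illustration a temporary local directory stands in for the remote repository, and the function name `release`, the file names, and the metadata fields are all hypothetical; a real release would push to an archive such as those listed below.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def release(rows, version, description, repo_dir):
    """Write a data file plus descriptive metadata into a versioned folder."""
    target = Path(repo_dir) / version
    target.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a release

    # Write the data as a small CSV file.
    data_path = target / "forecast.csv"
    lines = ["date,prediction"] + [f"{date},{pred}" for date, pred in rows]
    data_path.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Record what was released, including a checksum of the data file.
    metadata = {
        "version": version,
        "description": description,
        "n_rows": len(rows),
        "sha256": hashlib.sha256(data_path.read_bytes()).hexdigest(),
    }
    (target / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return metadata


# A temporary directory stands in for the remote repository.
repo = tempfile.mkdtemp()
info = release([("2025-01-01", 3.0)], "v1.2.1", "Example release", repo)
```

Once the steps live in one function, the release can be registered as a task in a workflow manager so that it reruns only when the underlying data change.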

Tools for data and code release

Zenodo – a general purpose open-access repository, often used with GitHub to publish software.
DataONE – a repository for environmental and ecological data.
Dryad – a general purpose repository for research data.
Environmental Data Initiative – a repository for environmental data.
ScienceBase – an open-access data repository maintained by the US Geological Survey.
Open Science Framework (OSF) – an open-access repository for research data.
Software Heritage Archive – a repository of open-access software.
Registry of Research Repositories – a collection of information about > 2000 research data repositories.
