EFI RCN NEON Ecological Forecast Challenge

The NSF-funded EFI Research Coordination Network (EFI-RCN) is hosting a NEON Ecological Forecast Challenge with the goal of creating a community of practice that builds capacity for ecological forecasting by leveraging NEON data products. The Challenge revolves around the five theme areas listed below, which span aquatic and terrestrial systems and population, community, and ecosystem processes across a broad range of ecoregions, all using data collected by NEON.

As a community, we are excited to learn more about the predictability of ecological processes by forecasting NEON data prior to its release. What modeling frameworks, mechanistic processes, and statistical approaches best capture community, population, and ecosystem dynamics? These questions can best be answered by a community generating a diverse array of forecasts. The Challenge is open to any individual or team that wants to submit forecasts and includes categories for different career stages. Individuals or team contacts can register to submit forecasts HERE.

The design of the Challenge is the result of contributions from over 200 participants in the May 2020 virtual EFI-RCN meeting, including partner organizations, and the hard work of the Design Teams that developed the protocols for each theme.

Computational resources are supported by the NSF-funded CyVerse, Jetstream, and XSEDE.

Year 1 Challenge Themes

Aquatic Ecosystems

WHAT: Freshwater temperature and dissolved oxygen

WHERE: 1 lake and 1 river NEON site

WHEN: Daily forecasts for 35 days starting at the beginning of each month, submitted monthly from May 31 to August 31, 2021; later submissions after the May 31 start are permissible

WHY: Temperature and oxygen are critical for life in aquatic environments and can represent the health of the system

WHO: Open to any individual or team that registers

HOW: REGISTER your team and submit forecast

OVERVIEW:
In streams and rivers, forecasting water temperature can help protect aquatic communities while maintaining socio-economic benefits (Ouellet-Proulx et al. 2017). In lentic systems, successfully forecasting surface water temperature can be important for fisheries and for water utilities that need to manage outflow temperatures (Zhu et al. 2020). Recently, water temperature forecasts in lakes have been used to predict seasonal turnover, when nutrients from the bottom can mix to the surface and impair water quality.

Dissolved oxygen concentration is a critically important variable in limnology. Forecasts of dissolved oxygen in freshwaters are a first step to understanding other freshwater ecosystem processes. For example, oxygen serves as the gatekeeper for other biogeochemical reactions that occur in rivers and lakes. Preemptive forecasts of dissolved oxygen concentrations can anticipate periods of high or low oxygen availability, thereby providing insight into how the ecosystem may change over relatively short timescales.

CHALLENGE:
This design challenge asks teams to produce forecasts of mean daily surface water temperature and/or dissolved oxygen in one NEON lake and/or one NEON river site in the southeastern U.S. for 35 days from the first of each month. The NEON lake site is Barco Lake (BARC) in Florida and the NEON river site is the Flint River (FLNT) in Georgia. Each forecast will start on the 1st of each month and must extend at least 35 days into the future. Forecasts may extend past the 35-day horizon, but those dates will not be evaluated.

Teams are asked to submit their forecasts of NEON surface water temperature and dissolved oxygen measurements for one month at a time (prior to that month’s data being released), along with uncertainty estimates and metadata. Any surface water temperature and dissolved oxygen data from before the month being forecast will be provided and may be used to build and improve the models used to generate forecasts. Other data can be used so long as they are not from the month being forecast, are publicly available, and teams provide access (minimum of a URL, but ideally a script) to all teams in the challenge.

Submissions of forecasts and metadata will be through https://data.ecoforecast.org/minio/submissions/ using prescribed file formats described in the challenge theme documentation (PENDING).

Forecasts will be scored and compared using the Continuous Ranked Probability Score, a metric that combines accuracy and uncertainty estimation (Gneiting & Raftery 2007).

DATA: TRAINING & EVALUATION:
The challenge uses the following NEON data products:
DP1.20264.001: Temperature at specific depth in surface water
DP1.20288.001: Water quality

A file with previously released NEON data that has been processed into “targets” will be provided. The same processing will be applied to new data that are used for forecast evaluation. Before the Aquatics challenge begins, a processing script will be available in the neon4cast-aquatics GitHub repository.
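
As a sketch of how a team might read the targets file into R once it is posted (the URL below is illustrative; the actual location will be announced in the neon4cast-aquatics repository):

# Illustrative only: check the neon4cast-aquatics repository for the real path.
library(readr)

aquatics_targets <- read_csv(
  "https://data.ecoforecast.org/targets/aquatics/aquatics-targets.csv.gz"  # hypothetical URL
)

# Expected layout (one row per site and day): siteID, time, temp, oxygen
head(aquatics_targets)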

TIMELINE:
The timeline is determined by the data latency of NEON releases. NEON data are released in month-long sets, two weeks after the month ends.

NEON data for a given month are scheduled to be released around the 15th of the following month. Once the data for the previous month are released, teams have from that release until the end of the month to forecast the current month.

Submissions will begin May 31, 2021 at 11:59 pm Eastern Standard Time (UTC−05:00) for forecasts that start May 1, and end August 31, 2021 at 11:59 pm Eastern Standard Time (UTC−05:00) for forecasts that start August 1.

As an example, if NEON water temperature data for March 1 - 31 are released on April 15, teams can then use these new March data to help generate forecasts for April 1 - May 5 (35 days). This April forecast is due by 11:59 pm EST on April 30. The forecast issue date for the April forecast is April 1, so no observational data from after that date can be used to constrain forecasts, and the forecast should use the weather forecast issued at midnight April 1 (i.e., the start of the day) as the driver (not the observed April meteorology or forecasts issued at later dates).

Evaluation will occur as new NEON data is released.

Here is a draft of the details of the targets, how they are calculated, descriptions of the target files, and examples of other environmental variables that could be used in the Challenge. The design team is currently working on this draft and will have a finalized version by December 9, 2020.

DESIGN TEAM:
James Guinnip, Kansas State University
Sarah Burnet, University of Idaho
Ryan McClure, Virginia Tech
Chris Brown, National Oceanic and Atmospheric Administration
Cayelan Carey, Virginia Tech
Whitney Woelmer, Virginia Tech
Jake Zwart, United States Geological Survey

PARTNERS:
The challenge is hosted by the Ecological Forecasting Initiative (EFI; https://ecoforecast.org/) and its U.S. National Science Foundation-sponsored Research Coordination Network (EFI-RCN; https://ecoforecast.org/rcn/).

Data used in the challenge are from the National Ecological Observatory Network (NEON): https://www.neonscience.org/.

Scientists from NOAA and USGS have been involved in the design of the challenge.

REFERENCES:
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378. https://doi.org/10.1198/016214506000001437

Ouellet-Proulx, S., St-Hilaire, A., & Boucher, M.-A. (2017). Water temperature ensemble forecasts: Implementation using the CEQUEAU model on two contrasted river systems. Water, 9(7), 457. https://doi.org/10.3390/w9070457

Zhu, S., Ptak, M., Yaseen, Z.M., Dai, J., & Sivakumar, B. (2020). Forecasting surface water temperature in lakes: A comparison of approaches. Journal of Hydrology, 585, 124809. https://doi.org/10.1016/j.jhydrol.2020.124809

Terrestrial Carbon and Water Fluxes

WHAT: Net ecosystem exchange of CO2, evapotranspiration, and soil moisture

WHERE: 4 NEON sites that span a water stress gradient in the U.S.

WHEN: Half-hourly and daily forecasts for 35 days into the future, submitted once per month from January 31 to December 31, 2021; later submissions after the January 31 start are permissible

WHY: Carbon and water cycling are foundational climate and water regulation services provided by ecosystems

WHO: Open to any individual or team that registers

HOW: REGISTER your team and submit forecast

OVERVIEW:
The exchange of water and carbon dioxide between the atmosphere and the land is akin to the breathing rate and lung capacity of Earth’s terrestrial ecosystems. The water available to plant roots plays a critical role in plant function and consequently represents a predominant source of uncertainty in predictions of how much carbon is entering or exiting an ecosystem. One of the best ways to monitor changes in the amount of carbon and water in an ecosystem is the eddy-covariance method. This method observes the net amount of carbon and water entering and exiting ecosystems at half-hourly timesteps, which is important because it can provide information on ecosystem processes such as photosynthesis, respiration, and transpiration; their sensitivities to ongoing climate and land-use change; and greenhouse gas budgets for carbon accounting and natural climate solutions. Forecasts of carbon uptake and release, water use, and soil moisture can provide insights into the future production of food, fiber, timber, and carbon credits, along with the influence water stress has on these processes.

CHALLENGE:
This design challenge asks teams to produce forecasts of net ecosystem exchange of carbon dioxide (NEE), latent heat flux of evapotranspiration (LE), and soil moisture across four NEON sites with differing climates. These target variables are important because they can be used to inform energy budgets and further reduce uncertainty in the CO2 sink or source behavior of the terrestrial biosphere.
This forecasting challenge asks teams to forecast NEE, LE, and soil moisture at either a 30-minute or daily time step over the next 35 days, using NOAA Global Ensemble Forecast System weather forecasts as drivers (if the forecasting model uses meteorological inputs). Monthly forecasts can be submitted for each month in 2021. The challenge will take place using the eddy covariance flux towers at four NEON sites: Bartlett Experimental Forest (BART), Konza Prairie Biological Station (KONZ), Ordway-Swisher Biological Station (OSBS), and Santa Rita Experimental Range (SRER).

Teams are asked to submit their forecasts of NEON-measured NEE, LE, and soil moisture for one month at a time (prior to that month’s data being released), along with uncertainty estimates and metadata. Any NEE, LE, and soil moisture data from before the month being forecast will be provided and may be used to build and improve the models used to generate forecasts. Other data can be used so long as they are not from the month being forecast, are publicly available, and teams provide access (minimum of a URL, but ideally a script) to all teams in the challenge.

Submissions of forecasts and metadata will be through https://data.ecoforecast.org/minio/submissions/ using prescribed file formats described in the challenge theme documentation (PENDING).

Forecasts will be scored and compared using the Continuous Ranked Probability Score, a metric that combines accuracy and uncertainty estimation (Gneiting & Raftery 2007).

DATA: TRAINING & EVALUATION:
The challenge uses the following NEON data products:
DP4.00200.001: Bundled data products - eddy covariance
DP1.00094.001: Soil water content and water salinity

A file with previously released NEON data that has been processed into “targets” will be provided. The same processing will be applied to new data that are used for forecast evaluation. Before the Terrestrial Carbon and Water Flux challenge begins, a processing script will be available in the neon4cast-terrestrial GitHub repository.

TIMELINE:
The timeline is determined by the data latency of NEON releases. NEON data are released in month-long sets, two weeks after the month ends.

The challenge will begin January 31, 2021 at 11:59 pm Eastern Standard Time (UTC−05:00) and run through December 31, 2021. Subsequent forecasts are due at 11:59 pm EST on the final day of each month.

NEON data for a given month are scheduled to be released around the 15th of the following month. Once the data for the previous month are released, teams have from that release until the end of the month to forecast the current month.

As an example, if NEON eddy-covariance data for March 1 - 31 are released on April 15, teams can then use these new March data to help generate forecasts for April 1 - May 5 (35 days). This April forecast is due by 11:59 pm EST on April 30. The forecast issue date for the April forecast is April 1, so no observational data from after that date can be used to constrain forecasts, and the forecast should use the weather forecast issued at midnight April 1 (i.e., the start of the day) as the driver (not the observed April meteorology or forecasts issued at later dates).

Evaluation will occur as new NEON data is released.



Here is a draft of the details of the targets, how they are calculated, descriptions of the target files, and examples of other environmental variables that could be used in the Challenge. The design team is currently working on this draft and will have a finalized version by December 9, 2020.

DESIGN TEAM:
Alex Young, SUNY - College of Environmental Science & Forestry
George Burba, LI-COR Biosciences
Jamie Cleverly, Terrestrial Ecosystem Research Network (TERN)
Ankur Desai, University of Wisconsin, Madison
Mike Dietze, Boston University
Andy Fox, Joint Center for Satellite Data Assimilation
William Hammond, Oklahoma State University
Danica Lombardozzi, National Center for Atmospheric Research
Quinn Thomas, Virginia Tech

PARTNERS:
The challenge is hosted by the Ecological Forecasting Initiative (EFI; https://ecoforecast.org/) and its U.S. National Science Foundation-sponsored Research Coordination Network (EFI-RCN; https://ecoforecast.org/rcn/).

Data used in the challenge are from the National Ecological Observatory Network (NEON): https://www.neonscience.org/.

AmeriFlux is an excellent database of eddy-covariance data, including historical data for some of the four challenge sites: https://ameriflux.lbl.gov/.

Terrestrial Ecosystem Research Network (TERN) has been involved in the design of the challenge: https://www.tern.org.au/.

REFERENCES:
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378. https://doi.org/10.1198/016214506000001437

Tick Populations

WHAT: Amblyomma americanum and Ixodes scapularis nymphal tick abundance

WHERE: 22 plots at 7 NEON sites

WHEN: Weekly forecasts for up to 34 weeks into the future, submitted monthly from March 31 to October 31, 2021, with training data available January 31, 2021. Later submissions after the March 31 start are permissible.

WHY: There is a correlation between tick population abundance and disease incidence, meaning forecasts for tick abundance have the potential to aid in our understanding of disease risk through time and space.

WHO: Open to any individual or team that registers

HOW: REGISTER your team and submit forecast

OVERVIEW:
Target species for the population forecasts are Amblyomma americanum and Ixodes scapularis nymphal ticks. A. americanum is a vector of ehrlichiosis, tularemia, and southern tick-associated rash illness, while I. scapularis is a vector for Lyme disease, the most prevalent tick-borne disease in North America. Both species are present in the eastern United States, and have been collected at numerous NEON sites. There is a correlation between tick population abundance and disease incidence, meaning forecasts for tick abundance have the potential to aid in our understanding of disease risk through time and space.

CHALLENGE:
The challenge is open to any individual, group, or institution that wants to participate. The goal of this challenge is to forecast the total number of Ixodes scapularis and Amblyomma americanum nymphs each epidemiological week (Sun-Sat) at a set of plots within NEON sites. Due to challenges in data collection in 2020, this round of the forecasting challenge will simulate a true forecasting challenge by focusing on data from the 2019 field season.

NOAA Global Ensemble Forecast System weather forecasts for each NEON site are provided for teams to use: https://data.ecoforecast.org/minio/drivers/noaa/

Teams must provide all other teams in the challenge with access (minimum of a URL, but ideally a script) to any additional data they wish to use. Teams of various career stages and disciplines are encouraged to participate.

Submissions of forecasts and metadata will be through https://data.ecoforecast.org/minio/submissions/ using prescribed file formats described in the challenge theme documentation (PENDING).

Forecasts will be scored and compared using the Continuous Ranked Probability Score, a metric that combines accuracy and uncertainty estimation (Gneiting & Raftery 2007).

DATA: TRAINING & EVALUATION:
The challenge uses the following NEON data products:
DP1.10093.001: Ticks sampled using drag cloths

Total Ixodes scapularis nymphs will be forecast for the following plots (siteID_plotID):
BLAN_012
BLAN_005
SCBI_013
SCBI_002
SERC_001
SERC_005
SERC_006
SERC_012
ORNL_007

Total Amblyomma americanum nymphs will be forecast for the following plots (siteID_plotID):
SCBI_013
SERC_001
SERC_005
SERC_006
SERC_002
SERC_012
KONZ_025
UKFS_001
UKFS_004
UKFS_003
ORNL_002
ORNL_040
ORNL_008
ORNL_007
ORNL_009
ORNL_003
TALL_001
TALL_008
TALL_002

A file with previously released NEON data that has been processed into “targets” will be provided. The same processing will be applied to new data that are used for forecast evaluation. Before the Tick challenge begins, a processing script will be available in the neon4cast-ticks GitHub repository.

TIMELINE:
The timeline for this challenge is monthly, which is how often new data will be released by the EFI-RCN.

The final data set containing the training data will be available no later than January 31, 2021. The challenge will begin (first forecast submission) on March 31, 2021 at 11:59 pm Eastern Standard Time and run through October 31, 2021 (last forecast submission).

2019 data will be released on the first of the month following a submission deadline, giving teams a month to assimilate new data. For example, forecasts submitted on March 31, 2021 will be for every epidemiological week from the beginning of March 2019 through the end of November 2019. Then, on April 1, 2021, tick counts from March 2019 will be released. The next forecast submission is April 30, 2021, which will be for every epidemiological week from the beginning of April 2019 through the end of November 2019. The table below shows which epidemiological weeks are to be forecast for each submission date.

2021 Forecast Submission Date    2019 Target Epidemiological Weeks
March 31                         10-44
April 30                         14-44
May 31                           19-44
June 30                          23-44
July 31                          28-44
August 31                        32-44
September 30                     36-44
October 31                       41-44

Evaluation will occur shortly after each forecast submission.

Here is a draft of the details of the targets, how they are calculated, descriptions of the target files, and examples of other environmental variables that could be used in the Challenge. The design team is currently working on this draft and will have a finalized version by December 9, 2020.

DESIGN TEAM:
John Foster, Boston University
Matt Bitters, University of Colorado, Boulder
Melissa Chen, University of Colorado, Boulder
Leah Johnson, Virginia Tech
Shannon LaDeau, Cary Institute of Ecosystem Studies
Cat Lippi, University of Florida
Brett Melbourne, University of Colorado, Boulder
Wynne Moss, University of Colorado, Boulder
Sadie Ryan, University of Florida

PARTNERS:
The challenge is hosted by the Ecological Forecasting Initiative (EFI; https://ecoforecast.org/) and its U.S. National Science Foundation-sponsored Research Coordination Network (EFI-RCN; https://ecoforecast.org/rcn/).

Data used in the challenge are collected by the National Ecological Observatory Network (NEON; https://www.neonscience.org/).

REFERENCES:
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378. https://doi.org/10.1198/016214506000001437

Phenology

WHAT: Terrestrial phenology defined by daily greenness of plants

WHERE: 8 deciduous broadleaf forest NEON sites in the continental U.S.

WHEN: Daily forecasts for 35 days into the future from February 1 to July 1, 2021, with a second round to be run in autumn. Forecast submissions are accepted daily, and later submissions after the February 1 start are permissible.

WHY: Phenology has been identified as one of the primary ecological fingerprints of global climate change.

WHO: Open to any individual or team that registers

HOW: REGISTER your team and submit forecast

OVERVIEW:
Phenology has been shown to be a robust integrator of the effects of year-to-year climate variability and longer-term climate change on natural systems (e.g., recent warming trends). Experimental studies have shown how other global change factors (e.g., elevated CO2 and N deposition) can also influence phenology. There is a need to better document biological responses to a changing world, and improved phenological monitoring at scales from individual organisms to ecosystems, regions, and continents will contribute to achieving this goal.

Phenology researchers often use digital cameras (such as those that are part of the PhenoCam Network) that take regular repeated images of plant canopies to monitor changes in greenness throughout the year. The PhenoCam Network is a cooperative continental-scale phenological observatory that uses digital repeat photography to track vegetation phenology in a diverse range of ecosystems across North America and around the world. Imagery and data are made publicly available in near-real-time through the PhenoCam webpage: http://phenocam.sr.unh.edu/.

CHALLENGE:
This is an open ecological forecasting challenge to forecast spring green-up, as measured by the green chromatic coordinate (GCC), a canopy greenness index computed from digital camera imagery at deciduous broadleaf NEON sites. Forecasts will be of daily GCC (specifically the 90th percentile, which has been shown to be more robust). The sites are Harvard Forest (HARV), Bartlett Experimental Forest (BART), Smithsonian Conservation Biology Institute (SCBI), Steigerwaldt Land Services (STEI), University of Kansas Field Station (UKFS), Great Smoky Mountains National Park (GRSM), Dead Lake (DELA), and Lyndon B. Johnson National Grassland (CLBJ).

NOAA Global Ensemble Forecast System weather forecasts for each NEON site are provided for teams to use: https://data.ecoforecast.org/minio/drivers/noaa/

Teams must provide all other teams in the challenge with access (minimum of a URL, but ideally a script) to any additional data they wish to use. Teams of various career stages and disciplines are encouraged to submit forecasts.

Submissions of forecasts and metadata will be through https://data.ecoforecast.org/minio/submissions/ using prescribed file formats described in the challenge theme documentation (PENDING).

Forecasts will be scored and compared using the Continuous Ranked Probability Score, a metric that combines accuracy and uncertainty estimation (Gneiting & Raftery 2007).

DATA: TRAINING & EVALUATION:
The challenge uses the following NEON data products:
DP1.00033.001: Phenology images

A file with previously released NEON data that has been processed into “targets” will be provided. The same processing will be applied to new data that are used for forecast evaluation. Before the Phenology challenge begins, a processing script will be available in the neon4cast-phenology GitHub repository.

TIMELINE:
Forecasts can be submitted daily by 6 pm ET from February 1 through July 1, 2021, beginning February 1. Each submission must forecast a minimum of 35 days into the future; for example, the first submitted forecast should cover at least February 1 - March 7, but it could cover the full spring. New forecasts can be submitted daily as new weather forecasts and observations (e.g., PhenoCam) become available; processed PhenoCam data will be available by 11:59 pm ET each day. Teams may start submitting forecasts after February 1, but only forecasts of days that are still in the future at the time of submission will be accepted. Late forecasts may be allowed under extenuating circumstances related to computer failure or processing delays on our end. Forecasts do not have to be submitted daily and can extend beyond 35 days.

Here is a draft of the details of the targets, how they are calculated, descriptions of the target files, and examples of other environmental variables that could be used in the Challenge. The design team is currently working on this draft and will have a finalized version by December 9, 2020.

DESIGN TEAM:
Kathryn Wheeler, Boston University
Michael Dietze, Boston University
Kathy Gerst, National Phenology Network
Chris Jones, NC State University
Andrew Richardson, Northern Arizona University
Bijan Seyednasrollah, Northern Arizona University, PhenoCam Network

PARTNERS:
The challenge is hosted by the Ecological Forecasting Initiative (EFI; https://ecoforecast.org/) and its U.S. National Science Foundation-sponsored Research Coordination Network (EFI-RCN; https://ecoforecast.org/rcn/).

Data used in the challenge are collected by the National Ecological Observatory Network (NEON; https://www.neonscience.org/) and hosted by the Phenocam Network (http://phenocam.sr.unh.edu/).

The forecasting challenge was developed in collaboration with the USA National Phenology Network: https://www.usanpn.org/usa-national-phenology-network.

REFERENCES:
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378. https://doi.org/10.1198/016214506000001437

Beetle Communities

WHAT: Beetle abundance and species richness from the National Ecological Observatory Network

WHERE: 47 terrestrial NEON sites that span the diverse ecosystems of the U.S.

WHEN: Weekly forecasts for up to 20 months into the future, submitted from June 30 to December 31, 2021; later submissions after the June 30 start are permissible

WHY: Improve understanding of habitat quality, conservation potential, land-use sustainability, and biodiversity change in response to global change and ecological disturbances

WHO: Open to any individual or team that registers

HOW: REGISTER your team and submit forecast

OVERVIEW:
Biodiversity monitoring is critical for understanding environmental quality, evaluating the sustainability of land-use practices, and forecasting future impacts of global change on ecosystems. Sentinel species give forewarning of environmental risk to humans and are particularly useful for such monitoring and forecasting efforts because they can serve as surrogates for other co-located components of biodiversity (Sauberer et al. 2004).

Ground beetles (Family: Carabidae) are appropriate candidates for biodiversity monitoring and ecological forecasting: they are well-studied sentinel species that are geographically widespread, and their community dynamics are particularly congruent with the diversity of other invertebrates (Holland 2002; Lundgren & McCravy 2011; Bousquet 2012; Hoekman et al. 2017). Therefore, monitoring carabid communities and forecasting changes in their species richness and abundance can be useful in studying edge effects and habitat quality (Magura 2002), conservation potential (Butterfield et al. 1995), land-use sustainability (Pearce & Venier 2006), and biodiversity change in response to global change and ecological disturbances (Koivula 2011). Most ecological forecasting models are limited in geographic scale and suffer from a scarcity of temporally extensive data. Further, most existing forecasting efforts focus on a single species (Humphries et al. 2018), with limited community-wide forecasts at the continental scale. Developing forecasts for community-scale metrics (i.e., species richness, abundance) and evaluating such models for accuracy and generalizability can help test our scientific knowledge of spatial (geographical turnover) and temporal (seasonal, inter-annual) carabid community dynamics (Dietze et al. 2018). Such forecasting models can inform regional or local habitat management, identify where biodiversity monitoring efforts should be prioritized, and shed light on what data or modeling techniques are needed to build the best forecasts of ecological dynamics (e.g., can we predict richness or abundance better, and why?) (Johansson et al. 2019).

With the long-term, community-wide, continental-scale data collection through the National Ecological Observatory Network (NEON), 181 data products are available for 81 sites in the US (47 terrestrial, at which carabids are sampled, and 34 aquatic). Fully initiated in 2019, this sampling will continue for 30 years (Schimel et al. 2007; 2011). NEON has effectively removed the previous barriers to community-scale forecasting across a broader geographical realm.

CHALLENGE:
This is an open ecological forecasting challenge to forecast carabid species richness, defined as the total number of species, and abundance, defined as the total number of carabid individuals (see the sketch below). Forecasts should be made weekly per site for all NEON terrestrial sites, with richness being absolute and abundance scaled by sampling effort. Contributing teams are required to submit a forecast for May-Dec 2021 and Jan-Dec 2022 on June 30, 2021. However, teams are encouraged to update their forecasts on the last day of each month, ending December 31, 2021, as NEON validation data are released. NEON releases carabid sampling data weekly and no sooner than 60 days after collection, so a model submitted on June 30 can include a forecast for the first week of May, and so forth. Teams may use any open data products as drivers of richness and abundance so long as they are not from the month being forecast and are made publicly available (minimum of a URL, but ideally a script). Potential driver data sources include NEON site data (soil and sediment data, terrestrial plant data, weather data), NOAA forecasts, and beyond.
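
For intuition, here is a minimal sketch of how the two community metrics could be computed from a tidied capture table; the column names are assumptions for illustration, not the official targets schema:

library(dplyr)

# Hypothetical capture table: one row per species per week per site, with a
# count of individuals and the sampling effort (trap-nights) for that week.
captures <- tibble::tribble(
  ~siteID, ~time,      ~taxonID,           ~count, ~trap_nights,
  "OSBS",  "2021-W23", "Pterostichus sp.",      4,           40,
  "OSBS",  "2021-W23", "Carabus sp.",           1,           40
)

targets <- captures %>%
  group_by(siteID, time) %>%
  summarise(
    richness = n_distinct(taxonID[count > 0]),     # total number of species
    abundance = sum(count) / unique(trap_nights),  # individuals scaled by effort
    .groups = "drop"
  )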

Submissions of forecasts and metadata will be through https://data.ecoforecast.org/minio/submissions/ using prescribed file formats described in the challenge theme documentation (PENDING).

Forecasts will be scored and compared using the Continuous Ranked Probability Score, a metric that combines accuracy and uncertainty estimation (Gneiting & Raftery 2007). Only weeks for which data are collected will be included in scoring.

DATA: TRAINING & EVALUATION:
The challenge uses the following NEON data product:
DP1.10022.001: Ground beetles sampled from pitfall traps

A file with previously released NEON data that has been processed into the aggregate “target” variables (richness and abundance) will be provided. The same processing will be applied to new data that are used for forecast evaluation. Further information about data structure, initial code for processing, and example null forecasts is provided in the neon4cast-beetles GitHub repository.

TIMELINE:

The timeline is determined by the data latency of NEON releases. NEON carabid pitfall trap data are released weekly with a latency of at least 60 days after collection. Weekly forecasts were chosen to best match NEON’s weekly release of carabid data.

The challenge will begin June 30, 2021 at 11:59 pm Eastern Standard Time (UTC−05:00) and run through December 31, 2021. Subsequent forecasts are due at 11:59 pm EST on the final day of each month.

As an example, carabid pitfall trap data released in the last week of June would include data as recent as the last week of April. Thus, a model submitted in the last week of June could include forecasts from May onwards. The forecast update submitted at the end of July can then use new May data to help refine June forecasts, or forecasts following June, depending on what drivers are used. The July forecast update is due by 11:59 pm EST on July 31. Forecast updates will not be considered for weeks where NEON validation data have already been released (i.e., no May forecast updates may be submitted on July 31) or for weeks when no NEON carabid data are available.

Here is a draft of the details of the targets, how they are calculated, descriptions of the target files, and examples of other environmental variables that could be used in the Challenge. The design team is currently working on this draft and will have a finalized version by December 9, 2020.




Pitfall traps are collected and reset every 2 weeks throughout the growing season. Due to weather and conditions beyond the field team’s control, this collection schedule may not always be followed; thus, the exact collection dates cannot be known until they have happened. Field data are made publicly available on the data portal as the bet_fielddata dataframe no sooner than 14 days after collection, and teams can use bet_fielddata to track the actual collection schedule. Data with parataxonomist identifications are made publicly available on the data portal as the bet_sorting dataframe no sooner than 60 days after collection. Data are released on the NEON data portal weekly, but may be released on any day of the week. The forecast schedule follows the ISO week standard, with weeks starting on Mondays.
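
Because the beetle schedule uses ISO weeks (Monday start) while the tick theme uses epidemiological weeks (Sunday start), it is worth checking which convention a date falls under; a quick sketch with lubridate:

library(lubridate)

d <- as.Date("2019-03-03")                   # a Sunday
epiweek(d)                                   # epidemiological week (starts Sunday)
isoweek(d)                                   # ISO 8601 week (starts Monday); the two can differ
sprintf("%d-W%02d", isoyear(d), isoweek(d))  # a YYYY-WW label in the ISO convention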

DESIGN TEAM:
Anna Spiers, University of Colorado, Boulder
Carl Boettiger, University of California, Berkeley
Tad Dallas, Louisiana State University
Nico Franz, NEON Biorepository at Arizona State University
Kari Norman, University of California, Berkeley
Thilina Surasinghe, Bridgewater State University
Brett Melbourne, University of Colorado, Boulder
Eric Sokol, NEON
Kelsey Yule, NEON Biorepository at Arizona State University

PARTNERS:
The challenge is hosted by the Ecological Forecasting Initiative (EFI; https://ecoforecast.org/) and its U.S. National Science Foundation-sponsored Research Coordination Network (EFI-RCN; https://ecoforecast.org/rcn/).

Data used in the challenge are from the National Ecological Observatory Network (NEON): https://www.neonscience.org/.

REFERENCES:
Bousquet, Y. (2012) Catalogue of Geadephaga (Coleoptera: Adephaga) of America, north of Mexico. ZooKeys 245: 1-1722. https://doi.org/10.3897/zookeys.245.3416

Butterfield, J., Luff, M., Baines, M., Eyre, M. (1995) Carabid beetle communities as indicators of conservation potential in upland forests. Forest Ecology and Management 79, 63-77.
https://doi.org/10.1016/0378-1127(95)03620-2

Dietze, M.C., Fox, A., Beck-Johnson, L.M., Betancourt, J.L., Hooten, M.B., Jarnevich, C.S., Keitt, T.H., Kenney, M.A., Laney, C.M., Larsen, L.G. (2018) Iterative near-term ecological forecasting: Needs, opportunities, and challenges. Proceedings of the National Academy of Sciences 115, 1424-1432. https://doi.org/10.1073/pnas.1710231115

Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378. https://doi.org/10.1198/016214506000001437

Hoekman, D., LeVan, K.E., Gibson, C., Ball, G.E., Browne, R.A., Davidson, R.L., Erwin, T.L., Knisley, C.B., LaBonte, J.R., Lundgren, J., Maddison, D.R., Moore, W., Niemela, J., Ober, K.A., Pearson, D.L., Spence, J.R., Will, K., Work, T. (2017) Design for ground beetle abundance and diversity sampling within the National Ecological Observatory Network. Ecosphere, 8(4), e01744. https://doi.org/10.1002/ecs2.1744

Holland, J.M. (2002) The agroecology of carabid beetles. Intercept Limited, Andover.

Humphries, G.R., Che-Castaldo, C., Bull, P., Lipstein, G., Ravia, A., Carrión, B., Bolton, T., Ganguly, A., Lynch, H.J. (2018) Predicting the future is hard and other lessons from a population time series data science competition. Ecological Informatics 48, 1-11. https://doi.org/10.1016/j.ecoinf.2018.07.004

Johansson, M.A., Apfeldorf, K.M., Dobson, S., Devita, J., Buczak, A.L., Baugher, B., Moniz, L.J., Bagley, T., Babin, S.M., Guven, E. (2019) An open challenge to advance probabilistic forecasting for dengue epidemics. Proceedings of the National Academy of Sciences 116, 24268-24274. https://doi.org/10.1073/pnas.1909865116

Koivula, M.J. (2011) Useful model organisms, indicators, or both? Ground beetles (Coleoptera, Carabidae) reflecting environmental conditions. ZooKeys 100, 287-317. https://doi.org/10.3897/zookeys.100.1533

Lundgren, J., McCravy, K. (2011) Carabid beetles (Coleoptera: Carabidae) of the Midwestern United States: A review and synthesis of recent research. Terrestrial Arthropod Reviews 4, 63-94. https://doi.org/10.1163/187498311X565606

Magura, T. (2002) Carabids and forest edge: spatial pattern and edge effect. Forest Ecology and Management 157, 23-37. https://doi.org/10.1016/S0378-1127(00)00654-X

Pearce, J.L., Venier, L.A. (2006) The use of ground beetles (Coleoptera: Carabidae) and spiders (Araneae) as bioindicators of sustainable forest management: A review. Ecological Indicators 6, 780-793. https://doi.org/10.1016/j.ecolind.2005.03.005

Sauberer, N., Zulka, K.P., Abensperg-Traun, M., Berg, H.-M., Bieringer, G., Milasowszky, N., Moser, D., Plutzar, C., Pollheimer, M., Storch, C. (2004) Surrogate taxa for biodiversity in agricultural landscapes of eastern Austria. Biological Conservation 117, 181-190. https://doi.org/10.1016/S0006-3207(03)00291-X

Schimel, D., Hargrove, W., Hoffman, F., MacMahon, J. (2007) NEON: a hierarchically designed national ecological network. Frontiers in Ecology and the Environment 5, 59-59. https://doi.org/10.1890/1540-9295(2007)5[59:NAHDNE]2.0.CO;2

Schimel, D., Keller, M., Berukoff, S., Hufft, R., Loescher, H., Powell, H., Kampe, T., Moore, D., Gram, W. (2011) NEON Science Strategy: Enabling Continental-Scale Ecological Forecasting. https://www.neonscience.org/sites/default/files/basic-page-files/NEON_Strategy_2011u2_0.pdf

Participation

Participation guidance

HOW TO PARTICIPATE:
Participation requires that teams:
1) Complete a REGISTRATION for each forecast theme you are participating in and each model you are contributing within a theme
2) Agree to the participation agreement below
3) Submit forecast netCDF or csv file(s)
4) Provide the metadata xml file documenting the forecast

One contact person should register on behalf of their team. That contact person will be asked to provide the group members' names, emails, and affiliations so that everyone in the group can receive an invitation to join the Challenge theme Slack channel and access group resources. Teams are allowed and encouraged to join the challenge after the start date of each Challenge theme because there are multiple deadlines to submit forecasts. However, only forecasts submitted by each submission deadline will be officially scored.


TEAMS:
Teams can be individuals or groups. They can represent institutions or organizations. You will have 25 characters for a team name (e.g., “EFI Null Model”) and 10 characters for the team name ID (no spaces allowed; e.g., “EFI_Null”).

The registration includes team categories (e.g., undergraduate only, graduate only, multi-institution, etc). Please check all that apply.

If your team wants to submit multiple forecasts, please register a team for each model as only one forecast model per cycle per team is allowed.

SLACK AND GITHUB COMMUNICATION
We strongly encourage participants to use the Challenge theme Slack channels to ask questions, discuss ideas and challenges, and share resources. Overall, we strongly encourage a collegial approach to the Challenge -- this is a friendly competition to move the field forward and bring more people into the community, not a cutthroat competition to win by denying other teams useful information.

GitHub repositories for each Challenge theme will be available with helper code and an example workflow (null models). We encourage teams to contribute code to these repositories (via pull request) if they develop additional helper code. This is especially important if an individual or group is going to add additional data constraints to their forecast. Remember, the use of data external to NEON is allowed and encouraged so long as it is publicly available and other teams are notified about it. Also, while almost anything can be used to calibrate parameters and constrain initial conditions, only other forecasts (e.g., weather) can be used as drivers/covariates during the actual forecast period.

Links to GitHub Repositories:


Electronic Submission

FORECAST FORMAT:
Teams will submit their forecasts as a single netCDF or csv file with the following naming convention:

[theme_name]-[year]-[month]-[day]-[team_name_ID].csv
or
[theme_name]-[year]-[month]-[day]-[team_name_ID].nc

where [theme_name] is one of: aquatics, terrestrial, ticks, beetles, or phenocam,

and [year], [month], and [day] are the year, month, and day of the submitted forecast, and [team_name_ID] is the team-name code specified at registration (a 10-character name with no spaces).

Forecast netCDF or csv files should include the following columns (csv) or corresponding variables (netCDF):
- siteID: NEON code for site
- ensemble*: integer value for forecast replicate within the year and month (i.e. ensemble member or MCMC sample)
- forecast: set as 1 for each row (1 = variables were forecasted; a 0 would designate a hindcast which does not apply to submissions to the challenge)
- data_assimilation: set as 0 for each row (0 = no data assimilation occurred because it is a forecast)

*Teams that are not using ensemble-based forecast methods should replace the ensemble column with a statistic column. Multiple statistics can be reported using a long format in a csv or by adding a statistic dimension in netCDF. The valid options for this column are mean, sd, Conf_interv_02.5, Conf_interv_97.5, Pred_interval_02.5, and Pred_interval_97.5; in the last four options the numbers indicate equal-tail quantiles for a 95% interval estimate, with conf = confidence and pred = predictive. If statistics are reported, we will make a Gaussian assumption when calculating error scores. Because the Continuous Ranked Probability Score is based on the predictive distribution, any reported sd (standard deviation) should be for the predictive distribution, and if intervals are reported we will score using the predictive interval. If both are reported, we will use the sd rather than the interval estimate.

The following are required columns that differ by theme:

Terrestrial
- time: YYYY-MM-DD HH:MM UTC of the start of the 30-minute value or YYYY-MM-DD for daily forecasts
- nee: net ecosystem exchange (umol CO2 m-2 s-1)
- le: latent heat (W m-2)
- vswc: volumetric soil water content (%)

Beetles
- time: YYYY-WW of forecast (year-week)
- abund: abundance of beetles
- n: species richness of beetles

Aquatics
- time: YYYY-MM-DD of forecast
- oxygen: dissolved oxygen (mg/L)
- temp: water temperature (C)

Phenology
- time: YYYY-MM-DD
- gcc: green chromatic coordinate

Ticks
- time: YYYY-WW of forecast (year-week)
- plotID: NEON plotID
- Amblyomma: Number of Amblyomma americanum nymphs per plot per week
- Ixodes: Number of Ixodes scapularis nymphs per plot per week

For those using netCDF, the order of dimensions on forecast variables should be: time, site, plot [ticks only], ensemble or statistic.

Additional detail about file formats can be found in the EFI Forecast Standard Documentation.
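
To make the csv format concrete, here is a minimal sketch of assembling an ensemble-style aquatics forecast and writing it with the required file name; all values are placeholders:

library(dplyr)

team_name_ID <- "EFI_Null"  # the 10-character ID chosen at registration
dates <- seq(as.Date("2021-05-01"), by = "1 day", length.out = 35)

forecast <- expand.grid(time = dates,
                        siteID = c("BARC", "FLNT"),
                        ensemble = 1:200,
                        stringsAsFactors = FALSE) %>%
  mutate(oxygen = rnorm(n(), mean = 8, sd = 1),  # placeholder draws (mg/L)
         temp = rnorm(n(), mean = 24, sd = 2),   # placeholder draws (C)
         forecast = 1,                           # rows are true forecasts
         data_assimilation = 0)                  # no data assimilation

readr::write_csv(forecast, sprintf("aquatics-2021-05-01-%s.csv", team_name_ID))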

METADATA FORMAT:
Each submission requires an accompanying metadata file, which must have the same name as the submitted forecast but with the .xml extension. Model descriptions can be uploaded to https://data.ecoforecast.org/minio/submissions/

[theme_name]-[year]-[month]-[day]-[team_name_ID].xml

The metadata standard has been designed by the Ecological Forecasting Initiative and is built on the widely used Ecological Metadata Language (EML).

The following components are required:
- Title
- pubDate: date that forecast is generated
- License: See below for details
- Creator(s)
- Spatial and temporal coverage
- File descriptions
- Model timestep
- forecast_horizon (length)
- forecast_issue_time
- forecast_iteration_id
- forecast_project_id: team_name_ID (e.g., "EFI_NULL")
- model_description which includes:
1) forecast_model_id
2) Name
3) Type: statistical, process-based, machine-learning, etc.
4) Repository: url or DOI
- Info on model structure and uncertainty (standards section 2.1.2)

The license for the forecast output must be one of the following Creative Commons licenses: CC BY, CC BY-SA, CC BY-NC, or CC BY-NC-SA. While we recommend a CC BY license, teams may use less permissive CC licenses if more appropriate. The license entry can be the CC option (e.g., CC BY) and a web link to the full CC license (e.g., https://creativecommons.org/licenses/by/4.0/).

We recommend teams read the full metadata standard description for definitions and more information, and in particular that they look at the example vignettes, which demonstrate the standard in use. Note that these Standards are a work in progress; if you find issues as you are applying them, let us know at eco4cast.initiative@gmail.com.

The Ecological Forecasting Initiative has provided R scripts to assist in generating the metadata XML file. The scripts, along with the EML validator, can be found in the GitHub repository for the standard: https://github.com/eco4cast/EFIstandards. Teams are encouraged to check the validity of their metadata before submission.
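
As a generic sketch of writing and checking an EML file with the EML package (the EFIstandards helpers build the full forecast-specific metadata; the fields shown here are placeholders, not the complete EFI standard):

library(EML)

meta <- list(
  packageId = "aquatics-2021-05-01-EFI_Null",  # matches the forecast file name
  system = "uuid",
  dataset = list(
    title = "Aquatics forecast from the EFI_Null model",
    pubDate = "2021-05-01",
    creator = list(individualName = list(givenName = "Jane", surName = "Doe"))
  )
)

write_eml(meta, "aquatics-2021-05-01-EFI_Null.xml")  # serialize to EML XML
eml_validate("aquatics-2021-05-01-EFI_Null.xml")     # reports any schema problems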

SUBMISSION PROCESS:
Individual files (csv, netCDF, xml) can be uploaded any time before the specific deadlines as defined by each theme. Only the most recent files will be scored.

Teams will submit their forecast netCDF or csv files through the challenge website. You can submit manually at https://data.ecoforecast.org/minio/submissions/ using the red plus button on the bottom left, or from an R script using the following:

Sys.setenv("AWS_DEFAULT_REGION" = "data",
"AWS_S3_ENDPOINT" = "ecoforecast.org")

aws.s3::put_object(object = “[theme_name]-forecast-[year]-[month]-[day]-[team_name].csv”, bucket = “submissions”)

Submissions need to adhere to the forecast format described above, including the file naming convention: our cyberinfrastructure automatically evaluates forecasts and relies on the expected formatting. Contact eco4cast.initiative@gmail.com if you experience technical issues with submitting.

ARCHIVING MODELS
Teams are highly encouraged to publicly archive the code they are using for their forecast models and workflows. Information about where models are archived would be included in your metadata XML.

Teams are also encouraged to use Docker or Singularity to containerize their models & workflows. EFI conventions for containerizing forecasts are still being developed, but our aim (particularly in later years of the forecast challenge) is to be able to provide shared cyberinfrastructure that makes it easier for teams to automate containerized forecasts. Containers will also facilitate Challenge themes interested in performing post-hoc analyses, such as uncertainty quantification and sensitivity analysis.

COMPUTATIONAL RESOURCES
We are currently working with CyVerse for access to computational resources for teams that require resources not available through home institutions. We will update with more details as they become available.

Participation Agreement

All participants agree to have their forecasts posted in real time on the NEON Ecological Forecast Challenge Output RShiny app (in development) and potentially published in a scientific journal. Manuscripts describing the accuracy of forecasts across teams will be coordinated by the Ecological Forecasting Initiative Research Coordination Network and will extend authorship to members of each team with an opt-in policy.

If a publication is generated by a forecast team, we ask that the manuscript acknowledge the Ecological Forecasting Initiative Research Coordination Network and its support from the National Science Foundation (DEB-1926388).

NEON Data Use

NEON data products, software, and derivatives thereof are freely available for use when accompanied by appropriate disclaimers, acknowledgments, and data citations, defined in the NEON data use policy.

Additional Data Options

Individuals and groups may create forecasts that use other publicly available data in addition to the NEON data, so long as other teams participating in the challenge are notified about the existence of the data via the Challenge theme’s Slack channel. Teams are encouraged to make available the code they use to access, download, and process any additional data constraints, ideally via a pull request to each Challenge GitHub repo.

When considering the use of data in forecasts, it is important to distinguish data that are being used as drivers/covariates during each forecast from data being used to constrain model structure, parameters, initial conditions, and error distributions. While the latency of NEON data requires that some of our forecasts will be (fully or partly) hindcasts, all forecasts should be run as if they are true forecasts -- you cannot use any observed data as a driver/covariate or constraint during the forecast period itself, as that information would not have been available at the forecast start date. For example, if you find that a particular variable is a useful covariate during the model development and calibration period (e.g., soil temperature), then you would need to find or make a forecast of that variable to use it as a covariate. Teams using meteorological covariates should use the shared meteorological driver data provided by EFI (see Shared Forecast Drivers).

As an example of potentially useful external data, each NEON site has subsets of various remote sensing products that are hosted on the ORNL DAAC (ORNL DAAC subsets). These include:
  • MODIS collection 6: LAI, FPAR, burned area, surface reflectance, land surface temperature, vegetation indices (NDVI, EVI), modeled ET, GPP, NPP

  • VIIRS collection 1: surface reflectance, vegetation indices, LAI, FPAR, land surface temperature

  • SMAP: modeled NEE, GPP, Rh, SOC

  • Daymet: daily surface weather data


Evaluation

Forecasts will be evaluated at each site and forecast horizon (i.e., time-step into the future), and a summary score will be assigned evaluating the overall performance of all forecast submissions across sites. Forecasts will also be compared to a null model.

Forecast evaluation results will be presented for all submitted models together and separately for each team category: undergraduate student only team, graduate student only team, post-doc only team, single institution team, multi-institution team, international team (team with individuals from at least two countries).

Results

Preliminary results will be distributed using the NEON Ecological Forecast Challenge Output RShiny app (in development) and at https://data.ecoforecast.org/minio/scores/. We intend to write a joint manuscript synthesizing forecasts. Teams are welcome to publish results from their models at any time. If a publication is generated, we encourage the manuscript to acknowledge the Ecological Forecasting Initiative Research Coordination Network and its support from the National Science Foundation (DEB-1926388).

Continuous Ranked Probability Score

Forecasts will be scored using the continuous ranked probability score (CRPS), a proper scoring rule for evaluating forecasts presented as distributions or ensembles (Gneiting & Raftery 2007). The CRPS compares the forecast probability distribution to the validation observation and assigns a score based on both the accuracy and precision of the forecast. We will use the 'crps_sample' function from the 'scoringRules' package in R to calculate the CRPS for each forecast.
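
As a small illustration with made-up numbers (scoringRules also provides closed-form variants such as crps_norm, which matches the Gaussian assumption applied to statistic-based submissions):

library(scoringRules)

obs <- 21.3                                # a hypothetical observed temperature (C)
ens <- rnorm(200, mean = 20.5, sd = 1.2)   # a hypothetical 200-member ensemble

crps_sample(y = obs, dat = ens)            # CRPS of an ensemble forecast (lower is better)
crps_norm(y = obs, mean = 20.5, sd = 1.2)  # CRPS of a Gaussian predictive distribution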

Here is the scoring GitHub repository, which includes an Rmarkdown document that explores the metric and an example of the code used for evaluating the beetle forecasts.

We will generate a combined score for all locations and forecast horizons. Forecasts will also be evaluated using the CRPS at each time-step in the forecast horizon and each location included in the forecasts.

    Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378. https://doi.org/10.1198/016214506000001437
    Null forecast

All forecasts will be compared to a null forecast produced by a simple historical-means calculation. The GitHub repository for each theme will include the code for the null model.
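
A minimal sketch of one such historical-means null, assuming a targets data frame with siteID, time, and temp columns (names are illustrative):

library(dplyr)

# Illustrative historical observations.
targets <- tibble::tibble(siteID = "BARC",
                          time = seq(as.Date("2021-03-01"), by = "1 day", length.out = 61),
                          temp = rnorm(61, mean = 22, sd = 2))

# Each site's long-term mean and sd, repeated for every forecasted date;
# reshape to the long statistic format described above before submitting.
null_forecast <- targets %>%
  group_by(siteID) %>%
  summarise(mean = mean(temp, na.rm = TRUE),
            sd = sd(temp, na.rm = TRUE),
            .groups = "drop") %>%
  tidyr::crossing(time = seq(as.Date("2021-05-01"), by = "1 day", length.out = 35)) %>%
  mutate(forecast = 1, data_assimilation = 0)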

Forecast Submission Visualization and Leaderboard

This webpage is in development.

Shared Forecast Drivers

We are downloading, subsetting, and processing forecasted meteorology drivers for each NEON site.

Meteorology: NOAA Global Ensemble Forecast System
Meteorology: NEON Observed (in development)
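
A hedged sketch of retrieving a driver file with the aws.s3 package, using the same endpoint configuration as for submissions (object keys are discovered at run time; browse https://data.ecoforecast.org/minio/drivers/noaa/ for the actual layout):

library(aws.s3)

Sys.setenv("AWS_DEFAULT_REGION" = "data",
           "AWS_S3_ENDPOINT" = "ecoforecast.org")

# List the available NOAA GEFS files, then download the first one.
drivers <- get_bucket_df(bucket = "drivers", prefix = "noaa/")
save_object(object = drivers$Key[1],
            bucket = "drivers",
            file = basename(drivers$Key[1]))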

Video Resources

These videos a) provide an overview of why we need forecasts and why we are using NEON data, b) describe Forecasting Challenges in general, and c) provide an overview of the draft Ecological Forecast Standards that the EFI Cyberinfrastructure, Methods & Tools, and Theory Working Groups have developed. The Standards are still in beta testing, but you can find details in this GitHub repo, which summarizes the proposed standards.

Overview of the EFI-RCN, Why Forecasting is Important, and Why Use NEON Products
General Description of Forecasting Challenges
Draft of the Forecasting Standards
Playlist of Videos Describing the NEON Data Products