August 22, 2023
Melissa Kenney1, Michael Gerst2, Toni Viskari3, Austin Delaney4, Freya Olsson4, Carl Boettiger5, Quinn Thomas4
1University of Minnesota, 2University of Maryland, 3Finnish Meteorological Institute,4Virginia Tech, 5University of California, Berkeley
With the growth of the EFI NEON Ecological Forecasting Challenge, we have outgrown the current Challenge Dashboard, which was designed to accommodate a smaller set of forecasts and synthesis questions. Thus, we have reenvisioned the next stage of the EFI-RCN NEON Forecast Challenge Dashboard in order to facilitate the ability to answer a wider range of questions that forecast challenge users would be interested in exploring.
The main audience for this dashboard are NEON forecasters, EFI, Forecast Synthesizers, and students in classes or teams participating in the Forecast Challenge. Given this audience, we have identified 3 different dashboard elements that will be important to include:
- forecast synthesis overview,
- summary metrics about the Forecast challenge, and
- self diagnostic platform.
During the June 2023 Unconference in Boulder, our team focused on scoping all three dashboard elements and prototyping the forecast synthesis overview. The objective of the synthesis overview visual platform is to support community learning and emergent theory development. Thus, the synthesis visualizations are aimed at creating a low bar entry for multi-model exploration to understand model performance, identify characteristics that lead to stronger performance than others, the spatial or ecosystems that are more predictable, and temporal forecast validity.
You can view the prototype developed during the meeting HERE and in Figures 1 and 2.
Figure 1. Static image of an interactive map of aggregate forecast skill relative to climatology at each forecasted sites, here showing the water temperature forecasts for the aquatics theme. Bubble colour represents the continuous rank probability score (CRPS) skill relative to climatology with positive values (blues) showing submitted models on average perform better than climatology and negative values showing submitted models perform worse (reds). The size of the bubble represents the percentage of submitted models that outperformed the climatology null (i.e., larger sized bubbles have a higher percentage of skilled models). When hovered over, the bubbles show this percentage (perc_skilled), the site type (field_site_subtype), as well as the total number of models forecasting at that site (n_mod).
Figure 2. a) Percentage of submitted models that are classed as ‘skillful’ (outperform the null climatology forecast based on the continuous rank probability score metric) at the river (n=27) and lake sites (n=6) for water temperature forecasts at each horizon from 1 to 30 days ahead. b) Percentage of submitted models that are classed as ‘skillful’ for water temperature forecasts at six of the lake sites (https://www.neonscience.org/field-sites/explore-field-sites).
Developing these graphics requires aggregation of skill scores. There are a multitude of metrics that can be used to calculate the skill score, which each have their own benefits and flaws. Thus, there should be multiple skill scores for different metrics with clear presentation of what metric is used at a given visualization. Additionally, in order to isolate what sites are more interesting from a model development perspective, there needs to be a comparison of how many of the models meet a baseline skill score at a given site at a chosen time frame. That allows isolating challenge areas and also easily informs which models really succeed at situations where others struggle. For better future analysis of how models perform at certain sites, we also envisage the visualization to include the skill scores for the relevant drivers (NOAA weather) for comparison. For example, if we see a drop in skill across models in water temperature projections after some time, there should be a direct method to assess if this reflects overall flawed model dynamics or if the weather forecast driving the water temperature loses its reliability. This also allows the user to approximate a maximum length in which the model performance analysis is at all useful.
In addition to the main synthesis overview, the goal of this platform is to support exploration of synthesis data. For all themes, there was general agreement that it would be useful to pull up at a glance, site characteristics, a photo, and basic summary statistics about the number of models and model performance.
During the meeting, we worked with the Aquatics and Beetles Challenge teams to identify some of the key data aggregation groupings that will be important to facilitate exploration. One important distinction arose during the conversations – the baseline model, time scale, and data latency. For Aquatics there is a long time series of data that create a climatology and data are provided relatively quickly via data loggers. For Beetles, there is a different null baseline model given the length of historic data that is different at each site and it takes a year to provide beetle abundance and richness assessment. There was also a desire to have specific types of synthesis visualizations including the species accumulation curve over years, 3-year running average, and indicating the lower and upper bounds of a particular variable (use in scale). Thus, for both Beetles and Aquatics there are similarities and differences in the types of groupings that would be most useful to support synthesis exploration.
Table 1. Different data groupings that would be useful to facilitate easy-to-develop synthesis visualizations of the EFI-NEON Forecast Challenge models to facilitate learning and community theory development.
|Team / Challenge
|theme, site, model ID, customized classroom or team groupings
|particular variables (e.g., DO) within a theme
|Spatial / Ecosystems
|sites, NEON domains, site type (river, stream, lake…), altitude (high vs lowlands)
|sites by distance, dominant NLCD classification
|average for past year, seasonal groupings,
|1 day, 5 days, 7 days, 15 days, 30 days
|14 days, growing season, multi-year (up to 5 year) forecasts
|best model at each site, model inputs, model structure, functional type, output uncertainty representation
|model run time, model computational requirements
|current skill forecast approaches, better than climatology/null baseline,
|comparison of your model to the best forecast
|environmental variables and weather forecast observations
|comparison with weather/climate forecast skill
|disturbance events (e.g., widlfire), growing season dates at each sites, site disturbance characteristics (e.g., mowing, fencing)
In addition to the synthesis overview, there were two complementary and linked platforms that will create the dashboard. First, the objective of the forecast challenge overview is to provide a basic summary of metrics related to the overall EFI NEON Ecological Forecasting Challenge. Specifically, the metrics that would be included are: number of forecasts submitted, number of unique teams, percentage (or median of all) of models that are better than climatology or a null model per theme, and total forecast and observation pairs.
Second, the objective of the self-diagnositic platform is to provide an overview for individuals or team forecast contributions and performance. The types of summaries that will be provided for the forecasters are: confirmation of forecast submission, date of the most recent forecast submitted for a model, model performance relative to climatology or null model, model prediction versus observation, model performance vs other selected models, and model skill over a specific time horizon (to assess whether it performs better over time).
Overall, the goal of the re-envisioned visual dashboard is to create platforms that will allow us to track challenge engagement, individually or as a team diagnose any model submission problems and performance improvement opportunities, and support community theory development through a synthesis given the range of models submitted through the EFI NEON Ecological Forecasting Challenge. Long-term, if this platform structure is useful and robust, it could be applied to other systems where there are multi-model predictions and there is a desire to collaboratively learn together to improve our theoretical understanding and forecasts to support decision-making.
We are looking for input from the EFI community on the synthesis dashboard for other themes, to discuss with individuals what synthesis would be most relevant to phenology, terrestrial, and ticks forecasters. Reach out to email@example.com to share your thoughts or let us know you would like to join future conversations about updating the dashboard.