Statistical Methods Seminar Series

EFI and the ESA Statistical Ecology Section are hosting this virtual seminar series that demonstrates a variety of quantitative methods applied within Ecology and Environmental Science in the R programming language. Attendees will gain valuable insight into methods that they may or may not be familiar with from experts on a given topic.

Target audience: Quantitative environmental scientists and ecologists either in-training (graduate students and postdocs) or working professionals in academia, government agencies, or non-governmental organizations. Attendees are expected to be proficient in R.

Webinar structure: Each seminar is 1 to 1.5 hours long and is led by a different invited speaker with expertise on a given topic or statistical method. Speakers spend the first part of the webinar presenting a project where they used the method, then share the R code or packages used for the statistical method. Presenters walk through the code, taking time to describe common pitfalls or stumbling blocks when performing the method and visualizing results. R code is available on this GitHub repository. Recordings from the webinars are available in the EFI YouTube Statistical Methods Webinar Series Playlist.

Dates/Times: We will have monthly webinars, typically on the first Monday of each month at noon US Eastern. The first round of webinars has concluded. We expect to start the second round in September 2022.

Recordings and R Resources from Previous Seminars:

Hidden Markov Models in Ecology. May 2, 2022

  • A recording of the presentation, Vianey’s walkthrough of the R code, and the Q&A are here:
    • Vianey starts going through the R code at time 36:13 and the Q&A starts at time 55:45.
  • Vianey’s presentation and R code and data are available on GitHub HERE
  • Additional papers shared during the webinar include:
    • Pohle, J., Langrock, R., van Beest, F.M. et al. Selecting the Number of States in Hidden Markov Models: Pragmatic Solutions Illustrated Using Animal Movement. JABES 22, 270–293 (2017).
    • Valle, D.; Jameel, Y.; Betancourt, B.; Azeria, E.; Attias, N.; Cullen, J. 2022. Automatic selection of number of clusters using Bayesian clustering and sparsity inducing priors. Ecological Applications, 32:e2524.
    • Cullen, J. A., Poli, C. L., Fletcher, R. J., & Valle, D. (2022). Identifying latent behavioural states in animal movement with M4, a nonparametric Bayesian method. Methods in Ecology and Evolution, 13, 432–446.

Hidden Markov models (HMMs) are a widely applied framework for modeling data with serial dependence in ecology. An HMM is a time series model involving two layers, an observable state-dependent process and an unobservable state process, where the unobservable state process can be thought of as a proxy for biological processes of interest. For instance, when HMMs are applied to animal movement, the states can serve as a proxy for animal behavior. It is also straightforward to incorporate environmental variables in the state and/or observation process, account for missing data, and account for individual variation through the use of random effects. Dr. Leos Barajas will demonstrate how to fit HMMs in a Bayesian framework with the R packages ‘rstan’ and ‘cmdstanr’, both of which use the programming language Stan, as well as how to interpret the results and avoid common pitfalls of an HMM analysis.
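As a rough illustration of the two-layer structure described above, the likelihood of an HMM can be evaluated with the forward algorithm. The base R sketch below is not the Stan code from the webinar. The function name `hmm_loglik`, the Gaussian state-dependent distributions, and all parameter values are assumptions made for illustration. Missing observations are handled by skipping the state-dependent density at those time steps.

```r
# Log-likelihood of a Gaussian HMM via the scaled forward algorithm.
# delta: initial state distribution; Gamma: transition probability matrix;
# mu, sigma: state-dependent means and standard deviations (all hypothetical).
hmm_loglik <- function(obs, delta, Gamma, mu, sigma) {
  phi <- delta
  loglik <- 0
  for (t in seq_along(obs)) {
    # Propagate the state probabilities one step through the Markov chain
    if (t > 1) phi <- as.vector(phi %*% Gamma)
    # Weight by the state-dependent densities; a missing value contributes 1
    if (!is.na(obs[t])) phi <- phi * dnorm(obs[t], mean = mu, sd = sigma)
    # Rescale to avoid numerical underflow and accumulate the log-likelihood
    scale_t <- sum(phi)
    loglik <- loglik + log(scale_t)
    phi <- phi / scale_t
  }
  loglik
}

# Simulate a 2-state series with illustrative parameter values
set.seed(1)
Gamma <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)
delta <- c(0.5, 0.5)
mu <- c(0, 3)
sigma <- c(1, 1)
n <- 200
state <- integer(n)
state[1] <- sample(1:2, 1, prob = delta)
for (t in 2:n) state[t] <- sample(1:2, 1, prob = Gamma[state[t - 1], ])
obs <- rnorm(n, mean = mu[state], sd = sigma[state])
obs[c(10, 50)] <- NA  # two missing observations

# Compare the generating parameters with a model that ignores the two states
ll_true <- hmm_loglik(obs, delta = delta, Gamma = Gamma, mu = mu, sigma = sigma)
ll_wrong <- hmm_loglik(obs, delta = delta, Gamma = Gamma, mu = c(0, 0), sigma = sigma)
print(c(ll_true = ll_true, ll_wrong = ll_wrong))
```

In a Bayesian fit, a likelihood of this form is combined with priors on the transition probabilities and state-dependent parameters; the sketch only evaluates the likelihood at fixed values.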

Vianey Leos Barajas is an Assistant Professor in the Department of Statistical Sciences and the School of the Environment at the University of Toronto and leads the Bayesian Ecological and Environmental Statistics (B.E.E.S.) research group. B.E.E.S. is dedicated to the development of statistical methodology to answer pressing ecological and environmental questions. Dr. Leos Barajas’ work focuses on the analysis of sensor data collected from animals and the environment over time and space but also includes collaborations in health and other areas.

NIMBLE. April 18, 2022

NIMBLE, short for Numerical Inference for statistical Models using Bayesian and Likelihood Estimation, is a system for building and sharing analysis methods in R for statistical models. The NIMBLE system provides a flexible language for declaring a wide range of hierarchical models, a framework for defining algorithms that operate on this representation of models, and a compiler for generating equivalent C++.

Lauren Ponisio is an assistant professor at the University of Oregon, where she uses modeling, synthesis, and field-based work to study pollinators and understand the mechanisms by which species interactions maintain species diversity. Dr. Ponisio is working with NIMBLE to build common hierarchical models used in ecology, mainly occupancy models, and is the lead author of this study looking at NIMBLE’s MCMC performance and customizations for a variety of ecological models.

Multi-Species (Species Interactions) Occupancy Modeling. April 4, 2022

  • A recording of the presentation and Q&A are here:
    • Chris started with the R code at time 26:45. The Q&A starts at time 1:08:21.
  • Chris’ presentation and R code and data are available on GitHub HERE
  • Here is a quick link to the presentation which has links to the papers Chris references
  • During the presentation, Chris recommended the book: Applied Hierarchical Modeling in Ecology by Marc Kéry & Andy Royle

Multi-species occupancy models incorporate both environmental variables and interspecific correlations when estimating factors that influence occupancy, all while accounting for imperfect detection. Further, multi-species occupancy models can be used to explore whether interspecific correlations vary across environmental gradients. Given the detail with which multi-species occupancy models are able to investigate interspecific correlations, they are best suited for relatively small species groups. Dr. Rota will demonstrate how to use the ‘unmarked’ R package to fit, interpret, and solve common problems associated with multi-species occupancy models.
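To make "accounting for imperfect detection" concrete, here is a deliberately simplified base R sketch of a single-species occupancy model with constant occupancy and detection probabilities. It is not the multi-species model from the webinar and does not use ‘unmarked’. The function name `negloglik` and all simulation settings are assumptions chosen for illustration.

```r
# Simulate detection histories: 200 sites, 4 visits each (hypothetical values)
set.seed(2022)
n_sites <- 200
n_visits <- 4
psi_true <- 0.6   # probability a site is occupied
p_true <- 0.4     # probability of detecting the species on a visit, if present
z <- rbinom(n_sites, 1, psi_true)  # latent occupancy state per site
y <- matrix(rbinom(n_sites * n_visits, 1, rep(z, n_visits) * p_true),
            nrow = n_sites)        # rows = sites, columns = visits

# Negative log-likelihood; parameters are on the logit scale
negloglik <- function(par, y) {
  psi <- plogis(par[1])
  p <- plogis(par[2])
  J <- ncol(y)
  d <- rowSums(y)  # number of detections at each site
  # Occupied with d detections, or (only when d == 0) unoccupied
  lik <- psi * p^d * (1 - p)^(J - d) + (1 - psi) * (d == 0)
  -sum(log(lik))
}

fit <- optim(c(0, 0), negloglik, y = y)
est <- plogis(fit$par)            # back-transform to probabilities
naive <- mean(rowSums(y) > 0)     # proportion of sites with >= 1 detection
print(c(psi_hat = est[1], p_hat = est[2], naive_occupancy = naive))
```

The naive proportion of sites with at least one detection underestimates occupancy because occupied sites can produce all-zero histories; the likelihood corrects for this by modeling detection explicitly.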

Christopher Rota is an Assistant Professor of Wildlife & Fisheries Resources at West Virginia University. Dr. Rota’s research addresses diverse questions in applied vertebrate ecology working with birds, mammals, reptiles, and amphibians. He is interested in understanding factors that shape the spatial distribution of species, and the dynamic interplay between space use and demography. A common link throughout his research is the application and development of modern statistical techniques that capture many of the myriad processes giving rise to ecological data sets.

Integrated Step-Selection Analysis. March 7, 2022

  • A recording of the presentation and Q&A are here:
    • Brian walked through the R code at time 27:58, followed by Tal going over FAQs at time 1:01:58 and the Q&A starts at time 1:08:13.
  • R code and presentation slides are available on GitHub HERE.
  • Answers to additional questions that were not covered during the live session are available on GitHub HERE.
  • Citations shared during the presentation:
    • Avgar, et al. 2016. Integrated step selection analysis: bridging the gap between resource selection and animal movement. Methods Ecol Evol, 7: 619-630.
    • Fieberg, et al. 2021. A ‘How to’ guide for interpreting parameters in habitat-selection analyses. J Anim Ecol. 90: 1027–1043.
    • Fieberg et al. 2017. Used-habitat calibration plots: A new procedure for validating species distribution, resource selection, and step-selection models. Ecography. 41. 10.1111/ecog.03123.
    • Prokopenko, C.M., Boyce, M.S. and Avgar, T. (2017), Characterizing wildlife behavioural responses to roads using integrated step selection analysis. J Appl Ecol, 54: 470-479.
    • Avgar et al. 2017. Relative Selection Strength: Quantifying effect size in habitat- and step-selection inference. Ecol Evol. 7: 5322–5330.
    • Signer et al. 2017. Estimating utilization distributions from fitted step-selection functions. Ecosphere 8(4):e01771. 10.1002/ecs2.1771
    • Additional references are available in the FAQ section of the pdf in the GitHub repository

A habitat selection function is a model of the relative probability that an available spatial unit will be used by an animal given its habitat value, but how do we appropriately define availability? In an integrated Step-Selection Analysis (iSSA), availability is defined by the animal’s ‘selection-free movement kernel’, which is fitted in conjunction with a conditional habitat-selection function. Parameter estimates are obtained using a conditional-logistic regression by contrasting each ‘used step’ (a straight line connecting two consecutive observed positions of the animal) against a set of ‘available steps’ (randomly sampled from one of several possible theoretical distributions). iSSA thus relaxes the implicit assumption that movement is independent of habitat selection and instead allows simultaneous inference on both processes, resulting in an empirically parametrized mechanistic space-use model.

In this webinar, we will highlight the R package ‘amt’ for implementing iSSA, from raw data through simulations from the mechanistic space-use model.
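The used-versus-available contrast described above can be sketched with a conditional logistic regression. The example below uses `clogit()` from the ‘survival’ package (which ships with R) on simulated data rather than the ‘amt’ workflow shown in the webinar. The data-generating process, the helper `sim_one`, and all parameter values are assumptions for illustration only.

```r
library(survival)  # provides clogit() and strata()

set.seed(42)
n_steps <- 300      # number of observed (used) steps
n_avail <- 10       # available steps paired with each used step
beta_habitat <- 1   # hypothetical true selection coefficient

# For one stratum: draw candidate steps, then pick the used one with
# probability proportional to exp(beta * habitat)
sim_one <- function(i) {
  n_cand <- n_avail + 1
  habitat <- rnorm(n_cand)                    # habitat value at each end point
  sl <- rgamma(n_cand, shape = 2, scale = 1)  # step lengths, arbitrary units
  used <- sample.int(n_cand, 1, prob = exp(beta_habitat * habitat))
  data.frame(step_id = i,
             case = as.integer(seq_len(n_cand) == used),
             habitat = habitat,
             sl = sl)
}
steps <- do.call(rbind, lapply(seq_len(n_steps), sim_one))

# Each used step is contrasted against its own set of available steps.
# Step length and its log enter as movement covariates alongside habitat.
fit <- clogit(case ~ habitat + sl + log(sl) + strata(step_id), data = steps)
print(coef(fit))
```

Because selection in this simulation depends only on habitat, the movement coefficients should be close to zero; with real tracking data they would adjust the tentative step-length distribution.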

Brian Smith is a PhD student, co-advised by Tal Avgar and Dan MacNulty, studying the space-use ecology of northern Yellowstone elk and the feedbacks between space-use and demography. Brian is particularly interested in how density-dependent habitat selection interacts with predation risk and how animals balance this tradeoff between “many mouths to feed” and “safety in numbers”. His goal is to find insights from individual behavior that scale up to population- and community-level patterns.

Tal Avgar is an Assistant Professor of Movement Ecology in the Department of Wildland Resources and Ecology Center at Utah State University. Dr. Avgar’s research focuses on the ecological and evolutionary causes and consequences of animal movement behaviour. The premise behind Dr. Avgar’s research is that quantitative understanding of the processes underlying animal movement behaviours is essential, not only as a means of identifying ecological needs and interactions at the individual level, but as a mechanistic key to emerging population and community patterns.

Movement Ecology. February 7, 2022

Recent developments in tracking technology have made it possible to collect high volumes of data on animal movement and behaviour, e.g., animal trajectories using GPS tags, or detailed activity profiles with accelerometers. Increasingly sophisticated statistical methods are required to obtain ecological inferences from these complex data (which often include autocorrelation, and can reach millions of observations). This webinar will provide a very brief overview of existing frameworks, and will then focus on one main theme: using location (long-lat) data to learn about animals’ behaviour. In particular, we will discuss how hidden Markov models (HMMs) can be used to draw inferences about the behavioural state process underlying observed movement patterns. The outcomes of an HMM analysis include movement parameters (such as mean step length) for each behavioural state, as well as an estimated state for each time of observation. It is also possible to estimate the effect of covariates (e.g., temperature, bathymetry) on the behavioural dynamics of the animal, which is often of great ecological interest. We will illustrate the application of this method with the R package momentuHMM, and discuss common practical challenges with model fitting. A secondary theme of this webinar will be the filtering and regularisation of animal tracking data. HMMs assume that animal locations are observed at regular time intervals and with no error. When this assumption is not satisfied, a two-stage approach is typically applied, and we will demonstrate this using the R packages foieGras and crawl.

Théo Michelot is a postdoctoral researcher in statistics at the Centre for Research into Ecological and Environmental Modelling (CREEM) at the University of St. Andrews. Dr. Michelot is developing flexible stochastic differential equation models, and using them as continuous-time models of animal movement and behaviour. Additional research interests include hidden Markov models and applications in ecology and statistical software development.

Generalized Joint Attribute Modeling (GJAM). January 24, 2022

  • A recording of the presentation and Q&A are here:
    • Tong walked through the code at time 17:09 and the Q&A starts at time 52:08.
  • R code and Tong’s presentation slides are available on GitHub HERE. See slide 19 for additional resources and references.
  • gjam Vignettes
  • Example of model and prediction on multiple species group:

The Generalized Joint Attribute Model (GJAM) is a probabilistic framework that allows combinations of presence-absence, ordinal, continuous, discrete, composition, zero-inflated, and censored data. The gjam R package provides inference on sensitivity to input variables, correlations between responses, model selection, prediction of responses, inverse prediction of predictors, and community classification by response to predictors. This model is useful for creating probabilistic forecasts of species distribution and abundance that incorporate a wide range of ecological data and can accommodate large numbers of zeros by relying on censoring.

Tong Qiu is a Postdoctoral Associate at Duke University. Dr. Qiu’s research aims to understand how the function and structure of the terrestrial ecosystem respond to global environmental changes at regional to global scales. He uses a data-model synthesis approach that integrates satellite and airborne remote sensing, monitoring networks, and forest inventory with Bayesian hierarchical models. Dr. Qiu uses GJAM to model responses of 1) forest trees and 2) ground beetles to climate-habitat interactions.

Generalized Additive Models (GAMs). January 3, 2022

  • A recording of the presentation and Q&A are here:
    • Gavin’s walk-through of the R code starts at time 33:49 and the Q&A starts at time 1:19:10.
  • R code, slides, resources, and answers to the questions we didn’t get to in the Q&A and the R code shared by Skip are available on GitHub HERE. A quick link to the presentation slides is HERE.

Generalized Additive Models were introduced as an extension to linear and generalized linear models, where the relationships between the response and covariates are not specified up-front by the analyst but are learned from the data themselves. This learning is achieved by viewing the effect of a covariate on the response as a smooth function, rather than following a fixed form (linear, quadratic, etc.). The smooth functions are represented in the GAM using penalized splines, in which a penalty against fitting overly complex functions is employed. GAMs are most useful when the relationships between covariates and response are nonlinear, and GAMs have found particular use for modelling, inter alia, spatiotemporal data.

The presentation will briefly explain what a GAM is and how penalized splines work before focusing on the practical aspects of fitting GAMs to data using the mgcv R package, and will be most useful to ecologists who already have some familiarity with linear and generalized linear models.
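A minimal sketch of the idea, assuming simulated data with a sinusoidal signal: the smooth term `s(x)` lets `mgcv::gam()` learn the shape of the relationship, while the penalty (estimated here by REML) controls its wiggliness. The data, object names, and settings below are illustrative and are not taken from the webinar materials.

```r
library(mgcv)  # ships with standard R installations

# Simulate a nonlinear relationship (illustrative values)
set.seed(123)
n <- 300
x <- runif(n, 0, 1)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# GAM with a penalized spline smooth of x; smoothness selected by REML
fit_gam <- gam(y ~ s(x), data = dat, method = "REML")

# Straight-line fit for comparison
fit_lm <- lm(y ~ x, data = dat)

# Effective degrees of freedom of the smooth: values well above 1 indicate
# that the data support a nonlinear effect
edf_smooth <- summary(fit_gam)$edf
print(edf_smooth)
print(AIC(fit_gam, fit_lm))

# Predictions on a grid, e.g. for plotting the estimated smooth
grid <- data.frame(x = seq(0, 1, length.out = 50))
grid$fit <- as.numeric(predict(fit_gam, newdata = grid))
```

If the true relationship were linear, the penalty would shrink the smooth toward a straight line and the effective degrees of freedom would approach 1.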

Gavin Simpson is an Assistant Professor in the Department of Animal Science at Aarhus University. Dr. Simpson’s research models large regional to global spatio-temporal data sets using generalized additive models (GAMs) and functional statistical methods to examine broad ecosystem responses to environmental change. He is an active member of the R and Data Science communities, was a lead developer on the vegan package for multivariate data analysis, and wrote the permute package for restricted permutation tests that allow multi-species data analyses from complex experimental designs. Dr. Simpson is currently developing a package, gratia, to work with GAMs fitted in R.

Species Archetype Models and Regions of Common Profile Models. December 6, 2021

  • A recording of the presentation and Q&A are here:
    • Skip’s walk-through of the R code for SAMs starts at time 21:54 and the R code for RCPs starts at time 58:26.
  • Resources and R code shared by Skip are available on GitHub HERE.

Dr. Woolley will present two types of finite mixture models that extend GLMs by allowing for multiple components. Specifically, he will present on Species Archetype Models (SAM; Dunstan et al. 2011) and the Region of Common Profile models (RCP; Foster et al. 2013, 2017). Together, these approaches cover inferential situations where understanding joint responses of species is of primary importance (SAMs) or where managing groups of sites is of primary importance (RCPs). Species Archetype Models (SAMs) are a ‘Mixture-of-Regressions’ and describe how a homogeneous group of species varies with the environment. The environmental gradients are represented by covariates in the model. Regions of Common Profile (RCP) models are a type of ‘Mixture-of-Experts’ model and try to describe how groups of sites vary with the environment. The sites are grouped based on the profile of biological content at the sites, with sites that have relatively similar observed assemblages grouped together. The RCPs are defined by estimating how these groups vary with the environment.

Skip Woolley is a research fellow at the University of Melbourne working on Integrated Environmental Assessment Modelling and he is a visiting scientist at CSIRO. His research focuses on the development, implementation and interpretation of statistical modelling for integrated environmental risk assessment. Dr. Woolley’s research also focuses on understanding how biodiversity interacts with economic, social and environmental drivers of human activities and pressures, to better protect and reduce the risk of biodiversity loss into the future.

Mixed Models. November 1, 2021

“Mixed models” refers to a broad class of statistical models that extend linear and generalized linear models to handle data where observations are measured within discrete groups such as field sites; years or other temporal blocks; individuals that are observed multiple times; genotypes; species; etc. They can be thought of (equivalently) as (1) accounting for the correlation among observations from the same group, (2) estimating the variability among groups, or (3) parsimoniously estimating the effects of groups. They are most useful when the experimental or observational design includes a large number of groups with varying numbers of observations per group.

This presentation will be most useful to ecologists who already have some familiarity with linear and generalized linear models.
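As a small sketch of the grouped-data setting described above, the example below fits a random-intercept model to simulated data from 30 sites with unequal numbers of observations. It uses `lme()` from the ‘nlme’ package because that package ships with R; the webinar may have used other packages. All variable names and values are assumptions for illustration.

```r
library(nlme)  # ships with standard R installations

# Simulate unbalanced grouped data (illustrative values)
set.seed(7)
n_sites <- 30
n_per_site <- sample(3:15, n_sites, replace = TRUE)  # varying group sizes
site <- factor(rep(seq_len(n_sites), times = n_per_site))
site_effect <- rnorm(n_sites, sd = 2)                # among-site variability
x <- rnorm(length(site))
y <- 1 + 0.5 * x + site_effect[as.integer(site)] + rnorm(length(site), sd = 1)
dat <- data.frame(y = y, x = x, site = site)

# Random intercept for each site: observations from the same site share a
# site-level deviation, which induces within-site correlation
fit <- lme(y ~ x, random = ~ 1 | site, data = dat)

print(fixef(fit))    # fixed-effect estimates (intercept and slope)
print(VarCorr(fit))  # among-site and residual variance components
```

The variance components summarize how much sites differ from each other relative to the residual noise, which corresponds to interpretation (2) in the paragraph above.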

Ben Bolker is the Director of the School for Computational Science and Engineering and Acting Associate Chair for Mathematics at McMaster University. His interests include spatial, theoretical, mathematical, computational and statistical ecology, evolution and epidemiology, plant community, ecosystem, and epidemic dynamics. He has two books, including Ecological Models and Data in R, and is the co-author of a Very Short Introduction to Infectious Disease with Marta Wayne. Dr. Bolker maintains a popular GLMM FAQ, and keeps miscellaneous mixed models resources here.