Surveying the World's Biodiversity with DNA, Audio, Image, Machine Learning and Statistics

Otso Ovaskainen
Tuesday, April 29, 2025
10:00 am - 12:00 pm
Otso Ovaskainen, Professor of Mathematical and Statistical Ecology, University of Jyvaskyla

Traditional expert-based biodiversity surveys are increasingly being replaced by semi-automated sensors that produce entirely new types and volumes of data. As one example, our ongoing ERC Synergy project LIFEPLAN (PIs: Otso Ovaskainen, David Dunson, Tomas Roslin & Brian Fisher) maps global biodiversity with DNA-, image-, and audio-based sampling technologies. The project has generated >100 years of audio, >10M camera-trap images, and >10 billion DNA metabarcoding sequences. These new types of data pose new challenges for data processing and interpretation, and solving them has motivated close collaboration between statisticians and ecologists in LIFEPLAN. To convert the information in raw DNA, audio, and image samples into data on species-level occurrences and/or abundances, we have improved automated methods for probabilistic species classification. One particular challenge concerning the species-rich groups of fungi and arthropods is that most species are still unknown to science and thus missing from all reference databases. To address this challenge, we have developed classifiers that robustly assign DNA sequences to previously known species or to novel species, or flag them as uncertain cases that may represent either. To generate ecological insights and to predict how species communities respond to ongoing global change, we have developed joint species distribution models that integrate the classified biodiversity data with spatiotemporal predictors. One particular challenge in this line of research is the high dimensionality of the multivariate response (the LIFEPLAN data contain detections of ca. 1 million species), combined with the extreme sparseness of the data (most species have been observed in only one sample).
To address this challenge, we have solved earlier computational bottlenecks by combining high-performance computing with common-to-rare species transfer learning, enabling models to be fitted to the full biodiversity data. The methodological advances generated by the LIFEPLAN project have already contributed more broadly to society, including, for example, a new smartphone-based approach to citizen science.

Type: LECTURE/TALK
Contact: Lori Rauch