TY - JOUR
T1 - The effect of soundscape composition on bird vocalization classification in a citizen science biodiversity monitoring project
AU - Clark, Matthew L.
AU - Salas, Leonardo
AU - Baligar, Shrishail
AU - Quinn, Colin A.
AU - Snyder, Rose L.
AU - Leland, David
AU - Schackwitz, Wendy
AU - Goetz, Scott J.
AU - Newsam, Shawn
N1 - Funding Information:
The Soundscapes to Landscapes project was funded by NASA's Citizen Science for Earth Systems Program (CSESP) 16-CSESP 2016-0009 under cooperative agreement 80NSSC18M0107. SB and SN were supported in part by a Global Wildlife Conservation and Microsoft AI for Earth Innovation grant (AI-201909-1413). We are grateful to the hundreds of citizen scientists who participated in Soundscapes to Landscapes from 2017 to 2021. The following citizen scientists are recognized by name for their expert contributions of over 100 volunteer hours each: Taylour Stephens (279 h), Jade Spector (208 h), Tiffany Erickson (190 h), Teresa Tuffli (140 h), Miles Tuffli (129 h), Katie Clas (121 h), and Bob Hasenick (119 h). We thank Pepperwood Preserve, Sonoma County Agricultural Preservation and Open Space District, Sonoma Land Trust, Audubon Canyon Ranch, Sonoma County Regional Parks, California State Parks, and other property owners for site access. We also thank Sieve Analytics for implementing the Arbimon citizen science interface used in this project. Point Blue Conservation Science's Informatics team helped design and construct the cloud-based prediction databases. Computational analyses were run on Northern Arizona University's Monsoon computing cluster, funded by Arizona's Technology and Research Initiative Fund. We also thank xeno-canto for the open-access use of their recordings. Finally, we thank three anonymous reviewers who helped us improve this paper with their insightful comments.
Publisher Copyright:
© 2023
PY - 2023/7
Y1 - 2023/7
AB - There is a need for monitoring biodiversity at multiple spatial and temporal scales to aid conservation efforts. Autonomous recording units (ARUs) can provide cost-effective, long-term and systematic species monitoring data for sound-producing wildlife, including birds, amphibians, insects and mammals, over large areas. Modern deep learning can efficiently automate the detection of species occurrences in these sound data with high accuracy. Further, citizen science can be leveraged to scale up the deployment of ARUs and to collect the reference vocalizations needed for training and validating deep learning models. In this study, we develop a convolutional neural network (CNN) acoustic classification pipeline for detecting 54 bird species in Sonoma County, California, USA, with sound and reference vocalization data collected by citizen scientists within the Soundscapes to Landscapes project (www.soundscapes2landscapes.org). We trained three ImageNet-based CNN architectures (MobileNetv2, ResNet50v2, ResNet100v2), which function as a Mixture of Experts (MoE), to evaluate the usefulness of several methods to enhance model accuracy. Specifically, we: 1) quantify accuracy with fully-labeled 1-min soundscapes for an assessment of real-world conditions; 2) assess the effect on precision and recall of additional pre-training with an external sound archive (xeno-canto) prior to fine-tuning with vocalization data from our study domain; and 3) assess how detections and errors are influenced by the presence of coincident biotic and non-biotic sounds (i.e., soundscape components). In evaluating accuracy with soundscape data (n = 37 species) across CNN probability thresholds and models, we found that acoustic pre-training followed by fine-tuning improved average precision by 10.3% relative to no pre-training, although there was a small average reduction in recall of 0.8%. In selecting an optimal CNN architecture for each species based on maximum F(β = 0.5), we found our MoE approach had a total precision of 84.5% and an average species precision of 85.1%. Our data exhibit multiple issues arising from applying citizen science and acoustic monitoring at the county scale, including the deployment of ARUs with relatively low fidelity and recordings with background noise and overlapping vocalizations. In particular, human noise was significantly associated with more incorrect species detections (false positives, decreased precision), while physical interference (e.g., the recorder being hit by a branch) and geophony (e.g., wind) were associated with the classifier missing detections (false negatives, decreased recall). Our process surmounted these obstacles, and our final predictions demonstrate how deep learning applied to acoustic data from low-cost ARUs, paired with citizen science, can provide valuable bird diversity data for monitoring and conservation efforts.
KW - ARU
KW - Automated recording units
KW - Avian diversity
KW - Bird species classification
KW - BirdNET
KW - CNN
KW - Citizen science
KW - Convolutional neural networks
KW - Ecoacoustics
KW - Mixture of experts (MoE)
KW - Soundscape components
KW - Soundscapes to Landscapes
UR - http://www.scopus.com/inward/record.url?scp=85150805810&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85150805810&partnerID=8YFLogxK
U2 - 10.1016/j.ecoinf.2023.102065
DO - 10.1016/j.ecoinf.2023.102065
M3 - Article
AN - SCOPUS:85150805810
SN - 1574-9541
VL - 75
JO - Ecological Informatics
JF - Ecological Informatics
M1 - 102065
ER -