Abstract
Recent advances in Machine Learning, including Deep Neural Networks (DNNs), have resulted in Automated Speech Recognition (ASR) systems with highly accurate transcription abilities and relevance to the second language (L2) speech field. However, such systems are rarely implemented in mainstream applied linguistics research, as the relationship between human perception of L2 intelligibility and DNN-based measurement of L2 speech is underexplored. This chapter introduces recent DNN-based ASR technology and examines its alignment with human perception by comparing leading models with human listeners. Analyses of speech samples produced by 63 English as a Second Language (ESL) learners indicate that current ASR models align only moderately with human perception (0.17 < r² < 0.46), and that the most advanced models transcribe with greater accuracy than humans after one listening. Upon further investigation, most features of L2 pronunciation that are known to inhibit intelligibility result in similar perceptual difficulties for humans and DNN-based ASR, except for vowel insertion, which reduces explanatory ability by about 8%. Implications and tools for DNN-based measurement of L2 production are provided for applied linguistics research involving L2 speech, transcription, and intelligibility measurement.
Field | Value |
---|---|
Original language | English (US) |
Title of host publication | Routledge Handbook of Technological Advances in Researching Language Learning |
Publisher | Taylor and Francis |
Pages | 465-478 |
Number of pages | 14 |
ISBN (Electronic) | 9781040165409 |
ISBN (Print) | 9781032604312 |
State | Published - Jan 1 2024 |
ASJC Scopus subject areas
- General Arts and Humanities
- General Social Sciences