Abstract
The performance of machine learning classifiers in automatically scoring the English proficiency of unconstrained speech has been explored. Suprasegmental measures were computed by software that identifies the basic elements of Brazil's prosody model in human discourse. This paper explores machine learning training with multiple corpora to improve two of the software's algorithms: prominent syllable detection and tone choice classification. The results show that machine learning training with the Boston University Radio News Corpus can raise the Pearson's correlation of automatic English proficiency scoring of unconstrained speech from 0.677 to 0.718. This correlation is higher than that of any other existing computer program for automatically scoring the proficiency of unconstrained speech, and it approaches the inter-rater reliability of human raters.
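Pearson's correlation is the agreement metric behind the 0.677 and 0.718 figures. A minimal sketch of how such a coefficient is computed between two lists of proficiency scores follows. The function name `pearson_r` and the score values are hypothetical illustrations, not data or code from the study. The same calculation could also be applied to two human raters' scores to obtain an inter-rater correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two variance terms (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: machine-predicted scores vs. human proficiency ratings.
machine = [2.1, 3.4, 2.8, 4.0, 1.9]
human = [2.0, 3.0, 3.0, 4.0, 2.0]
print(round(pearson_r(machine, human), 3))
```

A value of 1.0 would mean the machine scores are a perfect linear function of the human ratings. Values near the human inter-rater correlation indicate that the machine agrees with raters about as well as raters agree with each other.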
Original language | English (US) |
---|---|
Pages (from-to) | 755-768 |
Number of pages | 14 |
Journal | International Journal of Speech Technology |
Volume | 19 |
Issue number | 4 |
DOIs | |
State | Published - Dec 1 2016 |
Keywords
- Automated proficiency scoring
- Boston University Radio News Corpus
- Brazil’s prosody model
- Genetic algorithm feature selection
- Multiple corpora training
- Suprasegmental measures
ASJC Scopus subject areas
- Software
- Language and Linguistics
- Human-Computer Interaction
- Linguistics and Language
- Computer Vision and Pattern Recognition