Improved automatic English proficiency rating of unconstrained speech with multiple corpora

David O. Johnson, Okim Kang, Romy Ghanem

Research output: Contribution to journal › Article › peer-review

13 Scopus citations


The performance of machine learning classifiers in automatically scoring the English proficiency of unconstrained speech has been explored. Suprasegmental measures were computed by software that identifies the basic elements of Brazil’s model in human discourse. This paper explores machine learning training with multiple corpora to improve two of the algorithms in that software: prominent syllable detection and tone choice classification. The results show that machine learning training with the Boston University Radio News Corpus can improve automatic English proficiency scoring of unconstrained speech from a Pearson’s correlation of 0.677 to 0.718. This correlation is higher than that of any other existing computer program for automatically scoring the proficiency of unconstrained speech and is approaching that of human raters in terms of inter-rater reliability.
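The abstract reports its result as a Pearson's correlation between automatic and human proficiency scores. As a minimal sketch of how such a figure is computed, the Python below implements the standard Pearson formula. The function name `pearson_r` and the score lists are hypothetical illustrations, not the authors' code or data.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for six speakers (illustrative only, not from the paper).
human_scores = [1, 2, 2, 3, 4, 4]
machine_scores = [1, 2, 3, 3, 3, 4]

r = pearson_r(human_scores, machine_scores)
print(f"Pearson's r between human and machine scores: {r:.3f}")
```

A value near 1 would indicate that the automatic scores rank and space the speakers much as the human raters do, which is the sense in which the abstract compares 0.677 with 0.718.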

Original language: English (US)
Pages (from-to): 755-768
Number of pages: 14
Journal: International Journal of Speech Technology
Issue number: 4
State: Published - Dec 1 2016


Keywords

  • Automated proficiency scoring
  • Boston University Radio News Corpus
  • Brazil’s prosody model
  • Genetic algorithm feature selection
  • Multiple corpora training
  • Suprasegmental measures

ASJC Scopus subject areas

  • Software
  • Language and Linguistics
  • Human-Computer Interaction
  • Linguistics and Language
  • Computer Vision and Pattern Recognition

