TY - JOUR
T1 - Comparison of algorithms to divide noisy phone sequences into syllables for automatic unconstrained English speaking proficiency scoring
AU - Johnson, David O.
AU - Kang, Okim
N1 - Funding Information:
The authors would like to thank Michael Albanese, Tory Bottiglieri, Trent Cooper, Drew McDaniel, and Adam Thomas for developing the initial prototypes of the HMM, k-means, and genetic algorithm methods of syllabification as part of their senior Computer Science Capstone project at Northern Arizona University.
Publisher Copyright:
© 2017, Springer Science+Business Media B.V., part of Springer Nature.
PY - 2019/10/1
Y1 - 2019/10/1
N2 - Four algorithms for syllabifying phones are compared for use in automatically scoring English oral proficiency. The first algorithm groups each consonant with the vowel nearest to it in time, taking into account the maximal onset principle. The second uses a Hidden Markov Model (HMM) to predict syllable boundaries from the sonority values of the phones. The third employs three HMMs, each tuned to a specific category of utterances. The fourth uses a genetic algorithm to identify a set of rules for syllabifying the phones. The algorithms were evaluated by (1) how well they syllabified utterances from the Boston University Radio News Corpus (BURNC) and (2) how well they worked as part of a process to automatically score English speaking proficiency. Syllabification quality was judged with a measure of the temporal alignment of the syllables. Suitability for proficiency scoring was assessed with the Pearson correlation between the computer’s predicted proficiency scores and the scores assigned by human examiners. Syllabification by genetic algorithm performed best on the BURNC, but syllabification by grouping (i.e., forming syllables by grouping each non-syllabic consonant phone with the vowel or syllabic consonant phone nearest to it in time) performed best in the English oral proficiency rating application.
AB - Four algorithms for syllabifying phones are compared for use in automatically scoring English oral proficiency. The first algorithm groups each consonant with the vowel nearest to it in time, taking into account the maximal onset principle. The second uses a Hidden Markov Model (HMM) to predict syllable boundaries from the sonority values of the phones. The third employs three HMMs, each tuned to a specific category of utterances. The fourth uses a genetic algorithm to identify a set of rules for syllabifying the phones. The algorithms were evaluated by (1) how well they syllabified utterances from the Boston University Radio News Corpus (BURNC) and (2) how well they worked as part of a process to automatically score English speaking proficiency. Syllabification quality was judged with a measure of the temporal alignment of the syllables. Suitability for proficiency scoring was assessed with the Pearson correlation between the computer’s predicted proficiency scores and the scores assigned by human examiners. Syllabification by genetic algorithm performed best on the BURNC, but syllabification by grouping (i.e., forming syllables by grouping each non-syllabic consonant phone with the vowel or syllabic consonant phone nearest to it in time) performed best in the English oral proficiency rating application.
KW - ASR phone recognition
KW - Automatic speaking proficiency scoring
KW - Automatic syllabification
KW - Maximal onset principle
KW - Sonority sequencing principle
UR - http://www.scopus.com/inward/record.url?scp=85034664551&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85034664551&partnerID=8YFLogxK
U2 - 10.1007/s10462-017-9594-y
DO - 10.1007/s10462-017-9594-y
M3 - Article
AN - SCOPUS:85034664551
SN - 0269-2821
VL - 52
SP - 1781
EP - 1804
JO - Artificial Intelligence Review
JF - Artificial Intelligence Review
IS - 3
ER -