Abstract
Forced alignment is an automatic speech recognition procedure frequently employed in the speech sciences. A typical forced aligner, such as the Penn Forced Aligner, takes two inputs: the audio file containing the speech and a text file with its transcription. The aligner produces a Praat TextGrid file as output, which contains the alignment, typically in two tiers: the word level and the phone level. Within the alignment script, the Penn Aligner calls the HTK tool HVite. The original output of HVite is an MLF file, which contains, among other things, the probability score that the Viterbi algorithm computes while aligning the file. To test the hypothesis that the log likelihood scores obtained from forced aligners can inform us about phonetic distances, these scores need to be compared with an existing measure of distance. These probability measures seem to show a relationship with some acoustic characteristics of the segments. The scores clearly behave differently under different training, and this relationship may be exploitable to assess the typicality of sounds within a given corpus context.
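
To make the role of the MLF file concrete, the following is a minimal sketch of how the per-segment scores could be read from HVite output. It assumes the standard HTK label layout (start time, end time, label, score, optional word, with times in units of 100 ns); the sample text, the `Segment` class, and the `parse_mlf` function are hypothetical illustrations, not part of the Penn Forced Aligner or of the study's own scripts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

HTK_TIME_UNIT = 1e-7  # HTK label times are expressed in units of 100 ns


@dataclass
class Segment:
    start: float          # segment start, in seconds
    end: float            # segment end, in seconds
    label: str            # phone (model) label
    score: float          # log likelihood reported for the segment
    word: Optional[str]   # word label, present only on word-initial phones


def parse_mlf(text: str) -> Dict[str, List[Segment]]:
    """Parse MLF text into {label-file pattern: [Segment, ...]}."""
    entries: Dict[str, List[Segment]] = {}
    current: Optional[List[Segment]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # skip blank lines and the MLF header
        if line.startswith('"') and line.endswith('"'):
            # A quoted line opens the entry for one utterance.
            current = entries.setdefault(line.strip('"'), [])
            continue
        if line == ".":
            current = None  # a lone period closes the entry
            continue
        fields = line.split()
        if current is None or len(fields) < 4:
            continue  # ignore lines that carry no timing and score
        start, end, label, score = fields[:4]
        word = fields[4] if len(fields) > 4 else None
        current.append(
            Segment(
                int(start) * HTK_TIME_UNIT,
                int(end) * HTK_TIME_UNIT,
                label,
                float(score),
                word,
            )
        )
    return entries


# Invented sample in the assumed layout; the scores are made up.
SAMPLE_MLF = """#!MLF!#
"*/sample.rec"
0 300000 sp -183.5 sp
300000 1000000 DH -420.1 THE
1000000 1600000 AH0 -350.2
.
"""

entries = parse_mlf(SAMPLE_MLF)
segments = entries["*/sample.rec"]
```

Each `Segment` then pairs a phone label with its score, so scores can be grouped by phone and set against an independent distance measure, in the spirit of the comparison described above.
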
Original language | English (US) |
---|---|
Pages (from-to) | 232-233 |
Number of pages | 2 |
Journal | Canadian Acoustics - Acoustique Canadienne |
Volume | 44 |
Issue number | 3 |
State | Published - Sep 2016 |
Externally published | Yes |
ASJC Scopus subject areas
- Acoustics and Ultrasonics