TY - JOUR
T1 - The development of synthetic child speech in three South African languages
AU - Terblanche, Camryn
AU - Schnoor, Tyler T.
AU - Harty, Michal
AU - Tucker, Benjamin V.
N1 - Publisher Copyright: © 2024 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
PY - 2024
Y1 - 2024
N2 - It is well-known that children with expressive communication difficulties have the right to communicate, but they should also have the right to do so in whichever language they choose, with a voice that closely matches their age, gender, and dialect. This study aimed to develop naturalistic synthetic child speech, matching the vocal identity of three children with expressive communication difficulties, using Tacotron 2, for three under-resourced South African languages, namely South African English (SAE), Afrikaans, and isiXhosa. Due to the scarcity of child speech corpora, 2 hours of child speech data per child was collected from three 11- to 12-year-old children. Two adult models were used to “warm start” the child speech synthesis. To determine the naturalness of the synthetic voices, 124 listeners participated in a mean opinion score survey (Likert Score) and optionally gave qualitative feedback. Despite limited training data used in this study, we successfully developed a synthesized child voice of adequate quality in each language. This study highlights that with recent technological advancements, it is possible to develop synthetic child speech that matches the vocal identity of a child with expressive communication difficulties in different under-resourced languages.
KW - Augmentative and alternative communication (AAC)
KW - children
KW - expressive communication difficulties
KW - speech synthesis
KW - Tacotron 2
KW - under-resourced languages
UR - http://www.scopus.com/inward/record.url?scp=85198095551&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85198095551&partnerID=8YFLogxK
U2 - 10.1080/07434618.2024.2374312
DO - 10.1080/07434618.2024.2374312
M3 - Article
AN - SCOPUS:85198095551
SN - 0743-4618
JO - AAC: Augmentative and Alternative Communication
JF - AAC: Augmentative and Alternative Communication
ER -