The development of synthetic child speech in three South African languages

Camryn Terblanche, Tyler T. Schnoor, Michal Harty, Benjamin V. Tucker

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

It is well known that children with expressive communication difficulties have the right to communicate, but they should also have the right to do so in whichever language they choose, with a voice that closely matches their age, gender, and dialect. This study aimed to develop naturalistic synthetic child speech, matching the vocal identity of three children with expressive communication difficulties, using Tacotron 2, for three under-resourced South African languages, namely South African English (SAE), Afrikaans, and isiXhosa. Due to the scarcity of child speech corpora, 2 hours of speech data was collected from each of three 11- to 12-year-old children. Two adult models were used to “warm start” the child speech synthesis. To determine the naturalness of the synthetic voices, 124 listeners participated in a mean opinion score survey (Likert scale) and optionally gave qualitative feedback. Despite the limited training data used in this study, we successfully developed a synthesized child voice of adequate quality in each language. This study highlights that, with recent technological advancements, it is possible to develop synthetic child speech that matches the vocal identity of a child with expressive communication difficulties in different under-resourced languages.

Original language: English (US)
Journal: AAC: Augmentative and Alternative Communication
State: Accepted/In press - 2024

Keywords

  • Augmentative and alternative communication (AAC)
  • children
  • expressive communication difficulties
  • speech synthesis
  • Tacotron 2
  • under-resourced languages

ASJC Scopus subject areas

  • Rehabilitation
  • Speech and Hearing
