Title: Prosody Generation in Malay Language Speech Synthesizer
Abstract: This paper describes the development of a Text-to-Speech (TTS)
system for the Malay language using intonation generation. The system was
developed based on two methods: concatenation of diphone waveforms and
selection of a prosodic model to control fundamental frequency and duration.
Prosody generation is
part of the speech control module, which serves as the interface
between the output of the linguistic text-processing block
and the input of the speech signal generation module. As a result, each voice
segment (syllable) in a word being synthesized is assigned a set of
pitch target values. Signal generation is implemented according to the prosody
phrasing stream, which describes the phrase as a sequence of diphone phoneme
codes with assigned duration and fundamental frequency values. To transform
the base diphones to the required prosodic values, the procedures used are close
to those of the Multi-Band Resynthesis Overlap and Add (MBROLA) interface. The
key steps in prosody generation based on MBROLA technology for a concatenative
TTS system are also described. Attention is paid to ways of increasing the
naturalness of synthesized speech.
Authors: Siew Hock Ow, Roziati Zainuddin and David Wai Keong Loo
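
Illustration: the abstract describes a prosody stream in which each phoneme code carries a duration and fundamental frequency values, handled through an MBROLA-like interface. The following is a minimal sketch of how such a stream could be written out in the MBROLA .pho text format (one line per phoneme: symbol, duration in ms, then optional pairs of position-in-percent and pitch-in-Hz). The Segment class, the to_pho function, the phoneme symbols and all numeric values are hypothetical and are not taken from the paper; the symbols a real voice accepts depend on its diphone database.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Segment:
    """One entry of the prosody stream (hypothetical structure)."""
    phoneme: str                     # phoneme code; "_" denotes silence in MBROLA
    duration_ms: int                 # assigned duration in milliseconds
    # Pitch targets as (position within the phoneme in %, F0 in Hz).
    pitch_targets: List[Tuple[int, int]] = field(default_factory=list)


def to_pho(segments: List[Segment]) -> str:
    """Render a prosody stream as MBROLA .pho text."""
    lines = []
    for seg in segments:
        parts = [seg.phoneme, str(seg.duration_ms)]
        for position, hertz in seg.pitch_targets:
            parts += [str(position), str(hertz)]
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only: the word "saya" with two pitch targets
    # on each vowel and none on the consonants.
    stream = [
        Segment("_", 100),
        Segment("s", 90),
        Segment("a", 120, [(0, 120), (100, 130)]),
        Segment("j", 70),
        Segment("a", 150, [(0, 130), (100, 110)]),
        Segment("_", 100),
    ]
    print(to_pho(stream), end="")
```

In this layout, consonants without pitch targets simply inherit the interpolated contour between neighbouring targets, which is why only the vowels carry explicit F0 values in the example.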