A server in a network gathers
textual information, such as news items, E-mail and the like. From that information, the
server develops or identifies messages for use by individual subscribers. The same
server that accumulates the text messages, or another server in the network, converts the
textual information in each message to a sequence of speech synthesizer instructions. The converted messages, containing the sequences of speech synthesizer instructions, are transmitted to each identified subscriber's terminal device. A synthesizer in the terminal generates an audio waveform
signal, representing the speech information, in response to the instructions. In the preferred embodiment, the terminals utilize concatenative-type speech synthesizers, each of which has an associated vocabulary of stored fundamental sound samples. The instructions identify the sound samples, in order. The instructions also provide parameters for controlling characteristics of the
signal generated during waveform synthesis for each sound sample in each sequence. For example, the instructions may specify the
pitch, duration, amplitude,
attack envelope and decay envelope for each sample. The division of the
text-to-speech synthesis processing between the server and the terminals places the cost of the front-end
processing in the server, which is a
shared resource. As a result, the hardware and
software of the terminal may be relatively simple and inexpensive. Also, it is possible to
upgrade the quality of the synthesis by upgrading the server
software, without modifying the terminals.
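
The division of labor described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the record layout (`SynthInstruction`), the tiny lexicon, the 8 kHz sample rate, the sine-cycle "vocabulary" and the linear attack and decay ramps are all assumptions chosen for illustration. The server-side function turns text into a sequence of instructions; the terminal-side functions look up each identified sound sample and shape it using the pitch, duration, amplitude, attack and decay parameters.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

SAMPLE_RATE = 8000  # assumed terminal output rate in Hz (hypothetical)


@dataclass(frozen=True)
class SynthInstruction:
    """One speech synthesizer instruction (hypothetical layout)."""
    sample_id: str    # identifies a stored sound sample in the terminal
    pitch: float      # playback-rate multiplier, 1.0 = as stored
    duration_ms: int  # length of the rendered segment
    amplitude: float  # peak gain, 0.0 to 1.0
    attack_ms: int    # linear fade-in time
    decay_ms: int     # linear fade-out time


# --- Server side: front-end processing (toy stand-in) ---
LEXICON = {"ah": "AH", "ee": "EE"}  # assumed word-to-sample mapping


def text_to_instructions(text: str) -> List[SynthInstruction]:
    """Convert text into an ordered sequence of instructions."""
    return [
        SynthInstruction(LEXICON[word], 1.0, 50, 0.8, 10, 10)
        for word in text.lower().split()
        if word in LEXICON
    ]


# --- Terminal side: concatenative synthesis (toy stand-in) ---
def make_toy_vocabulary() -> Dict[str, List[float]]:
    """Stand-in for stored sound samples: one sine cycle per unit."""
    def cycle(freq_hz: int) -> List[float]:
        n = SAMPLE_RATE // freq_hz
        return [math.sin(2 * math.pi * i / n) for i in range(n)]
    return {"AH": cycle(200), "EE": cycle(400)}


def render(instr: SynthInstruction, vocab: Dict[str, List[float]]) -> List[float]:
    """Generate the waveform segment for a single instruction."""
    source = vocab[instr.sample_id]
    total = SAMPLE_RATE * instr.duration_ms // 1000
    attack = SAMPLE_RATE * instr.attack_ms // 1000
    decay = SAMPLE_RATE * instr.decay_ms // 1000
    out = []
    for i in range(total):
        # Loop the stored sample, stepping faster or slower for pitch.
        value = source[int(i * instr.pitch) % len(source)]
        gain = instr.amplitude
        if attack and i < attack:
            gain *= i / attack            # attack envelope
        remaining = total - 1 - i
        if decay and remaining < decay:
            gain *= remaining / decay     # decay envelope
        out.append(value * gain)
    return out


def synthesize(instructions: List[SynthInstruction],
               vocab: Dict[str, List[float]]) -> List[float]:
    """Concatenate the rendered segments in instruction order."""
    waveform: List[float] = []
    for instr in instructions:
        waveform.extend(render(instr, vocab))
    return waveform
```

In this sketch, improving `text_to_instructions` on the server (for example, choosing better pitch or duration values) changes the audio the terminal produces without any change to `render` or `synthesize`, which mirrors the upgrade path described above.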