The present invention provides, among other things, automatic identification of near-redundant units in a large TTS voice table, determining which units are distinctive enough to keep and which are sufficiently redundant to discard. According to an aspect of the invention, pruning is treated as a clustering problem in a suitable feature space. All instances of a given unit (e.g., a word or characters, expressed as Unicode strings) are mapped onto the feature space and clustered there using a suitable similarity measure. Because all units in a given cluster are, by construction, closely related under the measure used, they are sufficiently redundant to be replaced by a single instance. The disclosed method detects near-redundancy among TTS units in a completely unsupervised manner, based on an original feature extraction and clustering strategy. Each unit can be processed in parallel, which makes the algorithm fully scalable, and the user can set the pruning factor through the near-redundancy criterion. In an exemplary implementation, a matrix-style modal analysis via Singular Value Decomposition (SVD) is performed on the matrix of observed instances of the given word unit, so that each row of the matrix is associated with a feature vector; these feature vectors can then be clustered using an appropriate closeness measure.
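The SVD step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each observed instance of a unit has already been converted into a fixed-length numeric vector (the text does not say how), so that the instances form the rows of a matrix W. The thin SVD W = U S V^T is computed with NumPy, and instance i is represented by the i-th row of U_R S_R, i.e. its coordinates along the top R singular directions. The function names `instance_features` and `cosine_closeness`, the `rank` parameter, and the example data are hypothetical, and cosine similarity is only one plausible choice of closeness measure.

```python
import numpy as np


def instance_features(instances, rank):
    """Map each row of W (one observed instance of a unit) to an R-dim feature vector.

    Computes the thin SVD W = U S V^T and returns U_R S_R, whose i-th row
    gives the coordinates of instance i along the top `rank` singular directions.
    """
    W = np.asarray(instances, dtype=float)
    U, s, _vt = np.linalg.svd(W, full_matrices=False)
    r = min(rank, s.size)
    # Broadcasting scales column k of U_R by the singular value s[k].
    return U[:, :r] * s[:r]


def cosine_closeness(features):
    """Pairwise cosine similarity between feature vectors (one per row)."""
    F = np.asarray(features, dtype=float)
    norms = np.linalg.norm(F, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = F / np.where(norms == 0.0, 1.0, norms)
    return unit @ unit.T


# Hypothetical data: four instances of one unit, each already a fixed-length vector.
W = np.array([[1.0, 2.0, 0.0, 1.0],
              [1.1, 2.0, 0.1, 1.0],
              [0.0, 0.5, 3.0, 2.0],
              [0.0, 0.4, 3.1, 2.1]])
F = instance_features(W, rank=2)
C = cosine_closeness(F)
print(np.round(C, 3))
```

Because the rows of U_R S_R equal W V_R, near-identical instances map to near-identical feature vectors, which is what makes clustering in this space meaningful.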
Pruning is then achieved by mapping each instance to the centroid of its cluster.
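The clustering and pruning steps can likewise be sketched under explicit assumptions. The text does not name a clustering algorithm, so this sketch substitutes a simple single-pass greedy "leader" clustering: each feature vector joins the most similar existing cluster whose centroid meets a cosine-similarity `threshold`, and otherwise starts a new cluster. Here `threshold` stands in for the near-redundancy criterion, so raising it yields more clusters and less pruning. Every instance is then mapped to the centroid of its cluster. The names `prune_by_clustering`, `cosine`, `mean`, and `threshold`, and the example vectors, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def mean(vectors: List[Sequence[float]]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def prune_by_clustering(features: Sequence[Sequence[float]],
                        threshold: float) -> Tuple[List[int], List[Vector]]:
    """Single-pass leader clustering (a hypothetical stand-in).

    Returns (labels, centroids): labels[i] is the cluster of instance i and
    centroids[k] is the mean of the members assigned to cluster k.
    """
    members: List[List[Sequence[float]]] = []
    centroids: List[Vector] = []
    labels: List[int] = []
    for f in features:
        # Find the most similar centroid that meets the near-redundancy threshold.
        best, best_sim = -1, threshold
        for k, c in enumerate(centroids):
            sim = cosine(f, c)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best < 0:
            # No sufficiently close cluster: treat this instance as distinctive.
            members.append([f])
            centroids.append(tuple(f))
            best = len(centroids) - 1
        else:
            # Near-redundant: join the cluster and update its centroid.
            members[best].append(f)
            centroids[best] = mean(members[best])
        labels.append(best)
    return labels, centroids


# Hypothetical 2-D feature vectors for four instances of one unit.
feats = [(1.0, 0.1), (0.98, 0.12), (0.1, 1.0), (0.12, 0.95)]
labels, cents = prune_by_clustering(feats, threshold=0.95)
# Pruning: each instance is replaced by the centroid of its cluster.
pruned = [cents[k] for k in labels]
print(labels, len(cents), "units kept out of", len(feats))
```

A single pass is order-dependent and does not revisit earlier assignments after a centroid moves; a batch method such as agglomerative clustering could be substituted without changing the role of the threshold as a user-set pruning control.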