The invention discloses a many-to-many speaker conversion method based on STARGAN and the x-vector, comprising a training stage and a conversion stage. Combining STARGAN with the x-vector yields a speech conversion system that can greatly improve both the speaker similarity and the quality of the converted speech. For short utterances in particular, the x-vector characterizes the speaker better, so better conversion quality can be achieved. The method also overcomes the over-smoothing problem of C-VAE, giving a high-quality speech conversion method. In addition, the method can perform speech conversion with non-parallel text, and its training process requires no alignment step, which improves the generality and practicality of the speech conversion system. The method also integrates the conversion systems for multiple source-target speaker pairs into a single conversion model, that is, it achieves multi-speaker-to-multi-speaker conversion. The system has good application prospects in fields such as cross-language speech conversion, film dubbing, and speech translation.
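
As a rough illustration of the many-to-many idea only (not the disclosed implementation), the sketch below contrasts the number of models a pairwise system would need with a single model conditioned on a target-speaker embedding. The speaker names, the two-dimensional embeddings standing in for x-vectors, and the helper `build_generator_input` are all hypothetical; conditioning by concatenation is an assumption chosen for simplicity.

```python
from itertools import permutations

# Hypothetical toy embeddings standing in for per-speaker x-vectors.
SPEAKER_EMBEDDINGS = {
    "spk_a": [0.1, 0.9],
    "spk_b": [0.7, 0.2],
    "spk_c": [0.4, 0.4],
}


def pairwise_model_count(speakers):
    """Number of models needed if each ordered source-target pair has its own model."""
    return len(list(permutations(speakers, 2)))


def build_generator_input(frame, target_embedding):
    """Form the input of one shared conditional generator.

    Assumption: the target-speaker embedding is simply concatenated to the
    spectral feature frame; a real system may condition differently.
    """
    return list(frame) + list(target_embedding)


speakers = list(SPEAKER_EMBEDDINGS)
n_pairwise = pairwise_model_count(speakers)  # one model per ordered pair
n_shared = 1                                 # one conditional model for all pairs

# The same source frame can be routed to any target by swapping the embedding.
frame = [1.0, 2.0, 3.0]
inputs = {
    target: build_generator_input(frame, emb)
    for target, emb in SPEAKER_EMBEDDINGS.items()
}
```

With three speakers, a pairwise design needs one model for each of the six ordered pairs, while the conditional design reuses one model and changes only the target embedding.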