The invention discloses a speech synthesis method and apparatus, an electronic device and a storage medium, and relates to the technical field of artificial intelligence, such as
deep learning and speech technology. The method comprises: during speech synthesis of a to-be-synthesized text, obtaining the timbre features corresponding to the user identifier carried in a speech synthesis request, and obtaining at least one group of candidate prosody features of the to-be-synthesized text based on the user identifier; selecting one group from the at least one group of candidate prosody features as the prosody features of the to-be-synthesized text; and performing speech synthesis according to the timbre features, the to-be-synthesized text and the selected prosody features to obtain synthesized audio corresponding to the to-be-synthesized text. Because the synthesized audio is generated from the timbre features corresponding to the user identifier together with the to-be-synthesized text and the prosody features, it carries the voice characteristics of the user corresponding to that identifier, which makes it sound more real and natural and improves the speech synthesis effect.
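The steps of the method can be sketched as a single function. This is a minimal sketch under stated assumptions, not the patented implementation: the abstract does not say how timbre features are stored, how the candidate rhythm (prosody) features are generated, which criterion selects one group, or which acoustic model performs synthesis. The names `SynthesisRequest`, `synthesize`, `timbre_store`, `get_prosody_candidates`, `select` and `acoustic_model` are therefore hypothetical, features are plain float lists standing in for learned embeddings, and the default selector simply takes the first candidate group.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical feature types: plain float vectors stand in for learned embeddings.
Timbre = List[float]
Prosody = List[float]


@dataclass
class SynthesisRequest:
    text: str      # the to-be-synthesized text
    user_id: str   # the user identifier carried in the request


def synthesize(
    request: SynthesisRequest,
    timbre_store: Dict[str, Timbre],
    get_prosody_candidates: Callable[[str, str], Sequence[Prosody]],
    acoustic_model: Callable[[str, Timbre, Prosody], List[float]],
    select: Callable[[Sequence[Prosody]], Prosody] = lambda groups: groups[0],
) -> List[float]:
    # Step 1: look up the timbre features bound to the user identifier.
    timbre = timbre_store[request.user_id]
    # Step 2: obtain at least one group of candidate prosody features,
    # conditioned on both the text and the user identifier.
    candidates = get_prosody_candidates(request.text, request.user_id)
    if not candidates:
        raise ValueError("expected at least one group of candidate prosody features")
    # Step 3: select one group as the prosody of this text.
    # Hypothetical criterion: the default just takes the first group.
    prosody = select(candidates)
    # Step 4: synthesize audio from the text, the timbre and the selected prosody.
    return acoustic_model(request.text, timbre, prosody)


# Toy stand-ins (hypothetical) so the sketch runs end to end;
# a real system would use trained prosody and acoustic models.
def toy_prosody_candidates(text: str, user_id: str) -> List[Prosody]:
    # Two candidate groups: one per-character contour at full and at half scale.
    return [[1.0] * len(text), [0.5] * len(text)]


def toy_acoustic_model(text: str, timbre: Timbre, prosody: Prosody) -> List[float]:
    # Scale each prosody value by a gain derived from the timbre vector.
    gain = sum(timbre)
    return [gain * p for p in prosody]


if __name__ == "__main__":
    store = {"user-42": [0.2, 0.3]}
    req = SynthesisRequest(text="hello", user_id="user-42")
    audio = synthesize(req, store, toy_prosody_candidates, toy_acoustic_model)
    print(len(audio), audio)
```

Passing the candidate generator, the selector and the acoustic model in as callables keeps the four steps of the method visible while leaving each unspecified component swappable; for example, `select` could be replaced by a function that scores each candidate group against the text.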