The application provides a speech
separation method, a speech separation device,
electronic equipment, and a storage medium. The speech
separation method includes: obtaining original audio, and extracting a
spectrogram feature sequence from the original audio using a sliding time window; inputting the spectrogram feature sequence into a pre-trained
speech segmentation model, and obtaining an embedding feature sequence through the
speech segmentation model; inputting the embedding feature sequence into a pre-trained speech clustering model, and obtaining, through the speech clustering model, a predicted
label sequence corresponding to the embedding feature sequence; and performing single-speaker speech restoration based on the predicted
label sequence to generate separated speech. According to the speech
separation method, speech separation device,
electronic equipment, and storage medium of the present application, the problem of unsatisfactory speech separation can be solved: the speech segments belonging to a single speaker can be separated from a short speech audio file in which multiple people speak alternately, and the number of speakers can be accurately estimated by using
contextual information.
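
The first and last steps of the method (sliding-window spectrogram extraction and single-speaker restoration from a predicted label sequence) can be illustrated with a minimal sketch. This is not the implementation of the present application. The function names `spectrogram_windows` and `restore_speakers` are hypothetical, as are the frame length, hop sizes, and window sizes, and so is the rule that the earlier window keeps any samples shared by two overlapping windows. The pre-trained speech segmentation and speech clustering models are not reproduced; the label sequence is assumed to be supplied by them.

```python
import numpy as np


def spectrogram_windows(audio, frame_len=400, hop=160, win_frames=50, win_hop=25):
    """Return (features, spans): one magnitude spectrogram per sliding window,
    plus the [first, last) sample range that each window covers.
    All default sizes are illustrative assumptions."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    if len(audio) < frame_len or n_frames < win_frames:
        raise ValueError("audio is too short for a single window")

    # Short-time Fourier transform magnitude, one row per frame.
    taper = np.hanning(frame_len)
    frames = np.stack([audio[i * hop:i * hop + frame_len] * taper
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)

    features, spans = [], []
    for start in range(0, n_frames - win_frames + 1, win_hop):
        features.append(spec[start:start + win_frames])
        first = start * hop
        last = (start + win_frames - 1) * hop + frame_len
        spans.append((first, last))
    return np.stack(features), spans


def restore_speakers(audio, spans, labels):
    """Concatenate, for each predicted label, the samples of its windows.
    Assumption: where windows overlap, the earlier window keeps the samples."""
    owner = np.full(len(audio), -1, dtype=int)  # -1 marks unassigned samples
    for (first, last), label in zip(spans, labels):
        segment = owner[first:last]        # view into owner
        segment[segment == -1] = label     # only claim samples not yet assigned
    return {int(k): audio[owner == k] for k in np.unique(owner) if k >= 0}
```

In a full pipeline under these assumptions, the array returned by `spectrogram_windows` would be passed to the speech segmentation model, its embeddings to the speech clustering model, and the resulting integer labels (one per window) to `restore_speakers`, which returns one waveform per predicted speaker.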