The application provides a speech separation method, a speech separation device, electronic equipment, and a storage medium. The speech separation method includes: obtaining the original audio, and extracting the spectrogram feature sequence from the original audio in a time window sliding window; The graph feature sequence is input into the pre-trained speech segmentation model, and the embedded feature sequence is obtained through the speech segmentation model; the embedded feature sequence is input into the pre-trained speech clustering model, and the corresponding embedding feature sequence is obtained through the speech clustering model The predicted label sequence; perform single-speaker voice restoration based on the predicted label sequence to generate separated speech. According to the speech separation method, speech separation device, electronic equipment and storage medium of the present application, the problem of unsatisfactory speech separation effect can be solved, and the speech segment belonging to a single speaker can be separated from the short-term speech audio file in which multiple people speak alternately, And it can accurately estimate the number of speakers in conjunction with contextual information.