The invention proposes an artificial-intelligence-based voice processing method and device. The method comprises the following steps: segmenting voice to form a plurality of voice segments; recognizing each voice segment to obtain its recognition text segment; following the order of the recognition text segments, determining, from the original text corresponding to the current recognition text segment, the original text segment that matches it; splicing the original text segments, together with the voice segments corresponding to them, to obtain a sentence text and the sentence voice corresponding to that sentence text; generating the pinyin of the sentence text and forming a phone sequence from the pinyin; aligning the phone sequence with the sentence voice to obtain the phone boundaries; and forming, from the sentence text, the sentence voice, the pinyin and the phone boundaries, target data for training a voice synthesis model. The method thereby segments and labels voice automatically and produces higher-accuracy labeled data for training the voice synthesis model.
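The matching, splicing and pinyin-to-phone steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The abstract does not say how a recognition text segment is matched to the original text, so the sketch substitutes a plain `difflib.SequenceMatcher` similarity search over a bounded window after a running cursor. The punctuation sets, the `slack` parameter, the `Sentence` type and all helper names are hypothetical. Voice segments are modeled as lists of samples that are spliced by concatenation, and phones are assumed to be a pinyin initial followed by a tone-marked final. Pinyin generation and forced alignment are left out.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# Assumed punctuation sets; recognition output is assumed to carry no punctuation.
PUNCT = set("，。！？；：、,.!?;:")
SENTENCE_END = set("。！？!?")

# Assumed phone convention: initial + tone-marked final; two-letter initials first.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


@dataclass
class Sentence:  # hypothetical container for one sentence text and its voice
    text: str = ""
    audio: list = field(default_factory=list)  # concatenated samples


def _strip(s: str) -> str:
    # Drop punctuation so the original span is comparable to recognized text.
    return "".join(ch for ch in s if ch not in PUNCT)


def locate_span(original: str, recognized: str, cursor: int, slack: int = 2) -> int:
    """Return the end index of the original-text span starting at `cursor`
    that best matches `recognized`; `slack` bounds the length mismatch."""
    n = len(recognized)
    best_end, best_ratio = cursor, -1.0
    lo = cursor + max(1, n - slack)
    hi = min(len(original), cursor + n + slack)
    for end in range(lo, hi + 1):
        ratio = SequenceMatcher(None, _strip(original[cursor:end]), recognized).ratio()
        if ratio > best_ratio:  # strict '>' keeps the shortest best-matching span
            best_ratio, best_end = ratio, end
    # Attach any punctuation that directly follows the matched span.
    while best_end < len(original) and original[best_end] in PUNCT:
        best_end += 1
    return best_end


def build_sentences(original: str, segments: list) -> list:
    """`segments` is an ordered list of (recognized_text, samples) pairs.
    Original text segments and their voice segments are spliced until a
    sentence-ending punctuation mark closes the sentence."""
    sentences, current, cursor = [], Sentence(), 0
    for recognized, samples in segments:
        end = locate_span(original, recognized, cursor)
        current.text += original[cursor:end]
        current.audio += samples
        cursor = end
        if current.text and current.text[-1] in SENTENCE_END:
            sentences.append(current)
            current = Sentence()
    if current.text or current.audio:  # flush an unterminated final sentence
        sentences.append(current)
    return sentences


def pinyin_to_phones(syllables: list) -> list:
    """Split tone-numbered pinyin syllables into initial/final phones."""
    phones = []
    for syl in syllables:
        initial = next((i for i in INITIALS if syl.startswith(i)), "")
        if initial:
            phones.append(initial)
        phones.append(syl[len(initial):])
    return phones


if __name__ == "__main__":
    original = "今天天气很好。我们去公园吧！"
    segments = [("今天天气", [0.1] * 4), ("很好", [0.2] * 2), ("我们去公园吧", [0.3] * 6)]
    for s in build_sentences(original, segments):
        print(s.text, len(s.audio))
    # 今天天气很好。 6
    # 我们去公园吧！ 6
    print(pinyin_to_phones(["jin1", "tian1", "zhong1", "ai4"]))
    # ['j', 'in1', 't', 'ian1', 'zh', 'ong1', 'ai4']
```

Advancing a cursor through the original text keeps each match local and in order, which mirrors the abstract's "according to the sequence of recognition text segments"; a production system would likely need a more robust matcher to tolerate recognition errors.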