The invention relates to a dual-modal emotion identification method that fuses voice and facial expression. The method comprises: S1, obtaining audio data and video data of an object to be identified; S2, extracting a facial expression image from the video data and segmenting it into an eye region, a nose region, and a mouth region; S3, extracting a facial expression feature from the image of each of the three regions; S4, applying PCA to the voice emotion features and the facial expression features to reduce their dimensionality; and S5, performing naive Bayes emotion classification on the samples of each of the two modalities and applying decision fusion to the resulting conditional probabilities to obtain a final emotion identification result. According to the invention, the voice emotion features and the facial expression features are fused by a decision fusion method, which provides accurate data for the subsequent conditional probability calculation; the emotional state of the detected object can thereby be determined precisely, improving the accuracy and reliability of emotion identification.
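Step S5 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not state the likelihood model or the fusion rule, so Gaussian likelihoods and product-rule fusion of the per-modality posteriors are assumed here. The names `GaussianNaiveBayes` and `fuse_decisions`, the two emotion labels, and all numeric values are hypothetical, and the feature vectors are taken to be already reduced by the PCA of step S4.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Small Gaussian naive Bayes classifier (assumed model, for illustration)."""

    def fit(self, samples, labels):
        grouped = defaultdict(list)
        for x, y in zip(samples, labels):
            grouped[y].append(x)
        self.stats = {}
        for label, rows in grouped.items():
            prior = len(rows) / len(samples)
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # A small variance floor (assumption) avoids division by zero.
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                for col, m in zip(columns, means)
            ]
            self.stats[label] = (prior, means, variances)
        return self

    def posterior(self, x):
        """Return P(label | x) for every label, normalized to sum to 1."""
        log_scores = {}
        for label, (prior, means, variances) in self.stats.items():
            log_p = math.log(prior)
            for v, m, var in zip(x, means, variances):
                log_p += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            log_scores[label] = log_p
        top = max(log_scores.values())  # subtract the max for numerical stability
        unnorm = {k: math.exp(s - top) for k, s in log_scores.items()}
        total = sum(unnorm.values())
        return {k: u / total for k, u in unnorm.items()}


def fuse_decisions(voice_post, face_post, floor=1e-12):
    """Product-rule decision fusion (assumed rule): multiply the per-class
    posteriors of the two modalities, renormalize, and pick the best class."""
    fused = {k: max(voice_post[k], floor) * max(face_post[k], floor) for k in voice_post}
    total = sum(fused.values())
    fused = {k: p / total for k, p in fused.items()}
    return max(fused, key=fused.get), fused


# Toy, already-reduced features for each modality (hypothetical values).
labels = ["happy", "happy", "sad", "sad"]
voice_train = [[1.0, 2.0], [1.2, 1.8], [3.0, 0.5], [3.2, 0.7]]
face_train = [[0.2, 0.9], [0.3, 1.1], [0.9, 0.1], [1.0, 0.2]]

voice_model = GaussianNaiveBayes().fit(voice_train, labels)
face_model = GaussianNaiveBayes().fit(face_train, labels)

label, fused = fuse_decisions(
    voice_model.posterior([1.1, 1.9]),
    face_model.posterior([0.25, 1.0]),
)
print(label, fused)
```

Each modality is classified independently and only the class probabilities are combined, which is what distinguishes decision-level fusion from concatenating the voice and facial features before classification. Other fusion rules, such as a weighted sum of the posteriors, would fit the same interface.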