The invention discloses an adaptive voice completion system based on a multi-modal knowledge graph. The system comprises a data receiver, a data analyzer and a data inference device. The data receiver preprocesses the received audio and video data and outputs it to the data analyzer. The data analyzer analyzes the voice and the images to extract waveform time-series features and lip-trajectory features, and obtains a phoneme sequence through a multi-modal joint representation. The data inference device performs domain session modeling and candidate text prediction from the historical text, performs text inference in combination with the phoneme sequence to obtain semantically meaningful sentences, and synthesizes the completed voice according to the waveform features. By means of a phoneme inference model, the invention performs phoneme recognition when the voice modality is missing. It performs domain session modeling on the historical text generated from the existing voice, according to the semantic relationships between entities in the multi-modal knowledge graph, and thereby infers semantically meaningful text. The voice is then synthesized in combination with the waveform characteristics of the user's voice to form the completed audio.
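The text-inference step described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the toy knowledge graph, the pronunciation lexicon, the ARPAbet-style phoneme symbols, and the function names `predict_candidates`, `phoneme_similarity` and `complete_text` are all hypothetical. A generic sequence-similarity score from Python's standard `difflib` module stands in for the phoneme inference model. The sketch only shows the idea of predicting candidate words from entities related to the historical text and then selecting the candidate that best agrees with the phoneme sequence recovered from the lip track.

```python
from difflib import SequenceMatcher

# Hypothetical toy knowledge graph: entity -> semantically related entities.
KNOWLEDGE_GRAPH = {
    "coffee": ["milk", "sugar", "cup"],
    "tea": ["lemon", "sugar", "cup"],
}

# Hypothetical pronunciation lexicon: word -> phoneme sequence (ARPAbet-style).
LEXICON = {
    "milk": ["M", "IH", "L", "K"],
    "sugar": ["SH", "UH", "G", "ER"],
    "cup": ["K", "AH", "P"],
    "lemon": ["L", "EH", "M", "AH", "N"],
}


def predict_candidates(history):
    """Collect entities related to any entity mentioned in the historical text."""
    candidates = []
    for word in history.lower().split():
        for related in KNOWLEDGE_GRAPH.get(word, []):
            if related not in candidates:
                candidates.append(related)
    return candidates


def phoneme_similarity(expected, observed):
    """Similarity in [0, 1] between two phoneme sequences (assumed stand-in metric)."""
    return SequenceMatcher(None, expected, observed).ratio()


def complete_text(history, observed_phonemes):
    """Pick the candidate whose phonemes best match the observed phoneme sequence."""
    candidates = predict_candidates(history)
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda w: phoneme_similarity(LEXICON.get(w, []), observed_phonemes),
    )


if __name__ == "__main__":
    # The audio of the last word is lost; the lip track yields a partial phoneme sequence.
    print(complete_text("I would like coffee with", ["M", "IH", "L"]))
```

In this toy setting the historical text mentions "coffee", so the candidates are "milk", "sugar" and "cup", and the partial phoneme sequence selects "milk". A real system would replace the dictionary lookups with the domain session model and the learned multi-modal representation, and would pass the selected text on to speech synthesis.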