An embodiment of the invention discloses a method and a device for intercepting the voice of a target person in a video. The method comprises the following steps: using a lip-based voice activity detection model, assigning a first mark to each video frame of an audio and video file in which the target person exhibits voice activity, and a second mark to each video frame in which the target person does not, thereby obtaining a first mark sequence; determining first start-stop time points of runs of a preset number of consecutive video frames carrying the first mark in the first mark sequence; determining, from the first start-stop time points, second start-stop time points of the corresponding voice frames in the audio and video file; and directly intercepting the corresponding voice segments in the audio and video file according to the second start-stop time points. The method and the device thus obtain the voice segment file of the target person and achieve human-voice separation, solving the technical problems that current human-voice separation algorithms place high demands on audio clarity, require the audio to be noise-reduced before human-voice separation, and are strongly affected by noise in noisy environments, which makes voice interception difficult and inefficient.
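
The steps described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the 1/0 encoding of the first and second marks, and the parameters `fps`, `min_run`, and `sample_rate` are all hypothetical. The sketch assumes the per-frame marks have already been produced by the lip-based voice activity detection model (not shown), that the video has a constant frame rate, and that audio and video share a timeline starting at zero.

```python
from typing import List, Sequence, Tuple


def find_speech_intervals(marks: Sequence[int], fps: float,
                          min_run: int) -> List[Tuple[float, float]]:
    """Return (start, stop) times in seconds for every run of at least
    `min_run` consecutive frames carrying the first mark (assumed to be 1)."""
    intervals = []
    run_start = None
    # A trailing 0 sentinel closes a run that reaches the last frame.
    for i, mark in enumerate(list(marks) + [0]):
        if mark == 1 and run_start is None:
            run_start = i  # first frame of a new run
        elif mark != 1 and run_start is not None:
            if i - run_start >= min_run:
                # Stop time is the end of the last marked frame.
                intervals.append((run_start / fps, i / fps))
            run_start = None
    return intervals


def to_sample_ranges(intervals: Sequence[Tuple[float, float]],
                     sample_rate: int) -> List[Tuple[int, int]]:
    """Map video-side start-stop times to audio sample index ranges."""
    return [(round(start * sample_rate), round(stop * sample_rate))
            for start, stop in intervals]


def cut_segments(samples: Sequence, intervals: Sequence[Tuple[float, float]],
                 sample_rate: int) -> list:
    """Slice the audio track directly at the computed sample ranges."""
    return [samples[a:b] for a, b in to_sample_ranges(intervals, sample_rate)]
```

For example, with marks `[0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1]` at 25 frames per second and `min_run=3`, the isolated marked frame at index 5 is discarded and two intervals remain. Because the intervals are derived from the video frames alone, the audio is only sliced, never analyzed, which is why noise in the audio track does not affect where the cuts fall.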