Method and device for intercepting voice of a target person in a video

A technology for intercepting a target person's voice in a video, applied in instruments, character and pattern recognition, electrical components, etc. It addresses the problems of high audio-clarity requirements, difficult speech interception, and low speech-interception efficiency.

Active Publication Date: 2019-06-18

SPEAKIN TECH CO LTD



Problems solved by technology

[0003] The embodiments of the present application provide a method and device for intercepting the voice of a target person in a video. They address the technical problems that current voice separation algorithms place high demands on audio clarity and require noise reduction before voice separation, and that in noisy environments the noise has a large impact, which makes voice interception difficult and inefficient.


Embodiment Construction

[0039] To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of this application.

[0040] This application designs a method and device for intercepting the voice of a target person in a video. It addresses the technical problems that current voice separation algorithms place high demands on audio clarity, that noise has a large impact in noisy environments, that voice interception is difficult, and that the efficiency of voice in...


Abstract

The embodiments of the invention disclose a method and a device for intercepting the voice of a target person in a video. The method comprises the following steps: using a lip-shape voice activity detection model, assigning a first mark to each video frame of the audio-video file in which the target person shows voice activity and a second mark to each video frame in which the target person shows no voice activity, thereby obtaining a first mark sequence; determining the first start-stop time points of a preset number of consecutive video frames carrying the first mark in the first mark sequence; and determining the second start-stop time points of the corresponding voice frames in the audio-video file, so that the corresponding voice segment in the audio-video file is intercepted directly according to the second start-stop time points. In this way the voice segment file of the target person is obtained and human-voice separation is achieved. This solves the technical problems that current human-voice separation algorithms place high demands on audio clarity, that the audio must be denoised before human-voice separation, that noise has a large impact in noisy environments, and that voice interception is difficult and inefficient.
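The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the lip-shape voice activity detection model is not reproduced, and the per-frame marks are assumed to be given as a list of 1 (first mark) and 0 (second mark). The names `speech_intervals`, `to_sample_ranges`, `fps`, `min_frames` and `sample_rate` are hypothetical, and the sketch assumes that the audio and video tracks share one timeline, so the second start-stop time points equal the first.

```python
from typing import List, Sequence, Tuple


def speech_intervals(
    marks: Sequence[int], fps: float, min_frames: int
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for every run of at least `min_frames`
    consecutive frames marked 1 (assumed meaning: target person is speaking)."""
    intervals = []
    run_start = None
    # A trailing 0 acts as a sentinel so a run reaching the last frame is closed.
    for i, mark in enumerate(list(marks) + [0]):
        if mark == 1 and run_start is None:
            run_start = i  # a new run of first marks begins here
        elif mark != 1 and run_start is not None:
            if i - run_start >= min_frames:
                # Frame i is the first frame after the run, so i / fps is its end time.
                intervals.append((run_start / fps, i / fps))
            run_start = None
    return intervals


def to_sample_ranges(
    intervals: Sequence[Tuple[float, float]], sample_rate: int
) -> List[Tuple[int, int]]:
    """Map start-stop times in seconds to audio sample index ranges
    (assumes audio and video are synchronized on the same timeline)."""
    return [(round(s * sample_rate), round(e * sample_rate)) for s, e in intervals]


# Hypothetical mark sequence: 25 fps video, runs shorter than 3 frames are ignored.
marks = [0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1]
intervals = speech_intervals(marks, fps=25, min_frames=3)
ranges = to_sample_ranges(intervals, sample_rate=16000)
```

Slicing the decoded audio with each sample range would yield the target person's voice segments directly, with no separate voice separation or noise reduction pass, which is the efficiency argument the abstract makes.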

Description

Technical field

[0001] The present application relates to the technical field of speech processing, in particular to a method and device for intercepting the speech of a target person in a video.

Background

[0002] When public security authorities perform voiceprint identification, they need to compare against the voiceprint of a suspect's voice. Some collected audio files were recorded in noisy environments with many speakers, so the human voices in the audio must be separated to obtain the target person's voice before the voiceprint is extracted. Dedicated voice separation algorithms exist, but they place high demands on audio clarity: the audio must first undergo noise reduction before voice separation. In a noisy environment the noise has a large impact, which makes voice interception difficult and inefficient.

Contents of the invention

[0003] The embod...


Application Information


IPC(8): H04N21/439; H04N21/845; G06K9/62; G06K9/00

Inventor: 郑棉洲, 吕莉丽

Owner SPEAKIN TECH CO LTD
