Audio and video wake-up method, system and device, and storage medium

An audio and video wake-up method, system, device, and storage medium in the field of voice wake-up technology, which can solve problems such as a low wake-up rate and degraded system performance.

Pending Publication Date: 2021-09-14
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0012] In summary, with the existing HMM-GMM-based and deep learning-based voice wake-up schemes, the performance of the voice wake-up system drops sharply in real complex environments, especially in noisy and far-field environments, and the wake-up rate remains relatively low.



Examples


Embodiment Construction

[0036] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0037] The embodiment of the present invention provides an audio and video wake-up method based on teacher-student cross-modal learning. Compared with a single-modal voice wake-up model, this method achieves better system performance in complex environments such as high noise. At the same time, compared with the performance of an audio and video wake-up model obtained by random-initialization training using only multi-modal aud...


Abstract

The invention discloses an audio and video wake-up method, system, device, and storage medium. It introduces a video modality to improve the performance of the wake-up system, so that it can adapt to wake-up tasks in real complex scenes, improve the wake-up rate, and improve the interaction experience. Moreover, because the volume of audio and video multi-modal wake-up data is relatively small, the invention uses a cross-modal teacher-student model to transfer effective information obtained by training on abundant, large-volume single-modal acoustic data. This mitigates the system performance loss caused by the relatively small volume of multi-modal audio and video wake-up training data and improves the wake-up rate.
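The teacher-student transfer described in the abstract can be illustrated with a generic knowledge-distillation loss. This is only a minimal sketch under stated assumptions: the listing does not specify the loss function, so the temperature-softened cross-entropy below is the common distillation recipe, not the patent's own formulation. The names `softmax`, `cross_entropy`, `distillation_loss`, `alpha`, and `temperature` are hypothetical, and the two-class logits (non-wake, wake) are made-up illustrative values.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy of the predicted distribution against a target distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Blend a soft teacher-matching term with a hard-label term.

    Assumption: the teacher stands for a model trained on plentiful
    single-modal acoustic data, and the student for the audio-video model.
    alpha weights the teacher term; (1 - alpha) weights the label term.
    """
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_term = cross_entropy(teacher_soft, student_soft)

    student_probs = softmax(student_logits)
    hard_term = -math.log(student_probs[label] + 1e-12)

    return alpha * soft_term + (1.0 - alpha) * hard_term


if __name__ == "__main__":
    teacher = [0.2, 2.5]  # hypothetical teacher logits: (non-wake, wake)
    agreeing_student = [0.1, 2.4]
    disagreeing_student = [2.0, -1.0]
    print(distillation_loss(agreeing_student, teacher, label=1))
    print(distillation_loss(disagreeing_student, teacher, label=1))
```

In this sketch a student whose outputs follow the teacher receives a smaller loss than one that contradicts it, which is the sense in which information learned from the larger single-modal corpus could guide training on the smaller audio-video set.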

Description

Technical field

[0001] The present invention relates to the technical fields of voice signal processing and video signal processing, in particular to an audio and video wake-up method, system, device and storage medium.

Background technique

[0002] Voice wake-up, also known as wake-up word recognition, is a special speech recognition technology that aims to detect specific speech segments from speakers in real time in continuous speech streams. It has been widely used in scenarios such as smart vehicles, service robots, and smart homes. Audio-video wake-up aims to further improve the performance of the wake-up model by using a video signal synchronized with speech as an auxiliary input. Voice wake-up technology based on deep learning is currently a hotspot and mainstream method in academic and industrial research. Research on voice wake-up can be divided into two categories: voice wake-up based on HMM-GMM and voice wake-up based on deep learning.

[0003] 1. Voice wake-up based ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/06, G10L15/16, G10L15/22, G10L15/26, G06K9/62, G06N3/04, G06N3/08
CPC: G10L15/063, G10L15/16, G10L15/22, G10L15/26, G06N3/08, G10L2015/223, G06N3/047, G06N3/045, G06F18/2415, G06F18/241
Inventor: 周恒顺, 杜俊
Owner: UNIV OF SCI & TECH OF CHINA