Monosyllabic language lip-reading recognition system based on visual features

A visual-feature-based recognition technology, applied to lip-reading recognition systems, that achieves diversified samples, rich content and strong practicability.

Inactive
Publication Date: 2008-01-09
HUAZHONG UNIV OF SCI & TECH
0 Cites, 97 Cited by

AI Technical Summary

Problems solved by technology

[0009] The present invention provides a monosyllabic language lip-reading recognition system based on visual features. Its purpose is to solve the problem of lip-reading recognition in monosyllabic languages such as Chinese by using only video information.

Embodiment Construction

[0050] As shown in FIG. 1, the present invention includes a video decoding module 10, a lip positioning module 20, a lip movement segmentation module 30, a feature extraction module 40, a corpus 50, a model building module 60 and a lip-reading recognition module 70.
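
The data flow between these modules can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `run_pipeline`, the callable parameters, and the representation of a segment as a `(start, end)` pair of frame indices are all assumptions introduced here. The corpus 50 and the model building module 60 are assumed to have been used beforehand to construct the `recognize` callable.

```python
from typing import Any, Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # assumed form: (start, end) frame indices, end exclusive


def run_pipeline(
    source: Any,
    decode: Callable[[Any], Sequence[Any]],
    locate_lips: Callable[[Any], Any],
    segment: Callable[[Sequence[Any], Sequence[Any]], List[Segment]],
    extract_features: Callable[[Sequence[Any], Sequence[Any]], Any],
    recognize: Callable[[Any], str],
) -> List[str]:
    """Wire the modules together in the order the description lists them."""
    frames = decode(source)                        # video decoding module 10
    positions = [locate_lips(f) for f in frames]   # lip positioning module 20
    segments = segment(frames, positions)          # lip movement segmentation module 30
    results = []
    for start, end in segments:
        # feature extraction module 40 receives both frames and lip positions
        features = extract_features(frames[start:end], positions[start:end])
        # lip-reading recognition module 70 (models assumed built from corpus 50)
        results.append(recognize(features))
    return results
```

Passing each stage in as a callable keeps the sketch independent of any particular decoder or classifier; every stage here is a placeholder for the corresponding module.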

[0051] The video decoding module 10 accepts a video file or device given by the user, decodes it, and obtains a sequence of image frames that can be used for processing in the present invention.

[0052] The lip positioning module 20 analyzes the image frames produced by the video decoding module 10 and locates the speaker's lips in them. This position information is provided to the lip movement segmentation module 30 and the feature extraction module 40. The lip positioning module 20 first obtains a lip position vector, which contains 4 components; each component is a coordinate in two-dimensional space, representing the left lip corner, right lip corner, upper lip verte...
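
A lip position vector of this kind could be represented as below. This is a sketch under stated assumptions: the class name `LipPosition`, the field names, and the `mouth_width` helper are hypothetical, and because the source text is cut off before naming the fourth component, that field is given the placeholder name `fourth_point` rather than a guessed meaning.

```python
from dataclasses import dataclass
from math import hypot
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image pixel coordinates (assumed convention)


@dataclass(frozen=True)
class LipPosition:
    """Four 2-D points describing the lips in one frame."""
    left_corner: Point
    right_corner: Point
    upper_vertex: Point
    fourth_point: Point  # placeholder: the source is truncated before naming this component

    def as_vector(self) -> Tuple[Point, Point, Point, Point]:
        """Return the four components in a fixed order."""
        return (self.left_corner, self.right_corner,
                self.upper_vertex, self.fourth_point)

    def mouth_width(self) -> float:
        """Illustrative derived value: distance between the two lip corners."""
        (x1, y1), (x2, y2) = self.left_corner, self.right_corner
        return hypot(x2 - x1, y2 - y1)


pos = LipPosition((10.0, 50.0), (70.0, 50.0), (40.0, 40.0), (40.0, 62.0))
print(len(pos.as_vector()), pos.mouth_width())
```

The `mouth_width` value is only an example of something downstream modules might derive from the vector; the source does not state which quantities the segmentation or feature extraction modules actually compute.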

Abstract

The system recognizes speech content by reading the lip movements of a speaker in a video. Its aim is to perform lip-reading recognition for monosyllabic languages, such as Chinese, using only video information. The invention includes a video decoding module, a lip positioning module, a lip movement segmentation module, a feature extraction module, a corpus, a model building module and a lip-reading recognition module. The corpus is rich in content and easy to expand. The invention processes only video images and requires no audio data. It can process video files such as avi, wmv, rmvb and mpg, meeting the need to recognize speech content when no sound is available. The lip movement segmentation is designed for monosyllables and divides the lip movement intelligently. Compared with fixed-length time segmentation or manual segmentation, this method is more practical and greatly improves recognition accuracy.

Description

Technical field

[0001] The invention belongs to computer intelligent recognition technology, and in particular relates to a lip-reading recognition system for monosyllabic languages based on visual features, which recognizes speech content from the lip movements of a person speaking in a video.

Background

[0002] Since its birth in 1946, the computer has passed through the keyboard and mouse modes of operation and entered the stage of natural human-computer interaction. In this context, speech recognition technology has developed rapidly in recent years, and human-computer interaction through speech is undoubtedly the most effective and fastest way of interaction. "Speech recognition in noisy environments: a survey" (Y. Cong. Speech recognition in noisy environments: a survey [J]. Speech Communication, 1995, 16: 261-291) analyzed the ViaVoice speech recognition system proposed by IBM, pointing out that these systems, which perfo...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/24, G06K9/00, G10L15/25
Inventor: 王天江, 刘芳, 周慧华, 龚立宇, 陈刚
Owner: HUAZHONG UNIV OF SCI & TECH