Multi-mode-based conference spokesman identity non-inductive confirmation method

Multimodal speaker identification technology, applied in neural learning methods, character and pattern recognition, and biological neural network models, to achieve high accuracy and improve efficiency

Pending Publication Date: 2020-02-18

南京星耀智能科技有限公司

Cites: 3; Cited by: 1


Problems solved by technology

[0004] In conventional meetings, microphones are allocated by position, and changes in speaker distance mean that microphones at different positions must be switched off and on many times to distinguish different speakers, which is cumbersome. To solve this problem, the present invention provides a multimodal method for non-inductive confirmation of a conference speaker's identity, that is, confirmation that requires no action from the speaker. Specifically, the method automatically identifies and distinguishes speakers in three aspects: expression, voice, and speech style. It comprises an expression recognition method based on a deep learning model, a voice recognition method based on an artificial intelligence algorithm, and a speech content recognition method based on a text clustering algorithm.


Examples

Embodiment 1

[0035] About 1,000 face photos of speakers were collected at the meeting site and manually classified into speaking and non-speaking categories. The photos were first augmented with basic operations such as random perturbation, deformation, and rotation, and a GAN was then used to generate further samples, giving a training set about 10 times larger than the original data set. A Faster R-CNN model was then trained on the sample data, and the final model accuracy reached 85%.
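The basic augmentation step described in [0035] can be illustrated with a minimal sketch. It assumes grayscale images stored as nested lists of 0-255 pixel values, uses 90-degree rotations and a horizontal flip as simple stand-ins for the rotation and deformation operations, and omits the GAN and Faster R-CNN stages entirely. The function names and the `factor` and `amplitude` parameters are hypothetical and do not come from the patent.

```python
import random


def rotate90(img):
    """Rotate a row-major image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img):
    """Mirror the image left to right (a simple stand-in for deformation)."""
    return [row[::-1] for row in img]


def add_noise(img, rng, amplitude=10):
    """Add uniform random noise to each pixel, clamped to the 0-255 range."""
    return [
        [min(255, max(0, p + rng.randint(-amplitude, amplitude))) for p in row]
        for row in img
    ]


def augment(dataset, factor=10, seed=0):
    """Return each (image, label) pair plus factor - 1 randomly transformed copies."""
    rng = random.Random(seed)
    out = []
    for img, label in dataset:
        out.append((img, label))
        for _ in range(factor - 1):
            new = img
            # Apply zero to three quarter-turns, an optional flip, then noise.
            for _ in range(rng.randint(0, 3)):
                new = rotate90(new)
            if rng.random() < 0.5:
                new = flip_horizontal(new)
            out.append((add_noise(new, rng), label))
    return out
```

With `factor=10`, a set of about 1,000 labeled photos would grow to about 10,000 samples, matching the tenfold increase mentioned above, although in the patent part of that growth comes from GAN-generated images rather than from these transforms alone.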

[0036] For speaker voice recognition, a specific embodiment of the present invention is as follows: 1) data collection: voice data is collected in real time at the meeting site and segmented every 4-8 seconds, preferably every 5 seconds, with each segment used as a processing unit; 2) data processing: because speeches at the meeting site are relatively standardized, mostly in Mandarin, and the venue is relatively quiet with little noise, essentially no data processing is needed; 3) model con...
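The data collection step in [0036] can be sketched as follows, assuming the audio is already available as a flat sequence of samples at a known sample rate. The function name `segment_audio` and its parameters are hypothetical. The sketch keeps a shorter final segment rather than dropping it, which is a design choice the source does not specify.

```python
def segment_audio(samples, sample_rate, segment_seconds=5.0):
    """Split a sample sequence into fixed-length processing units.

    segment_seconds is restricted to the 4-8 second range given in the
    text, with 5 seconds as the default. The last segment may be shorter.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if not 4.0 <= segment_seconds <= 8.0:
        raise ValueError("segment_seconds must be between 4 and 8")
    size = int(sample_rate * segment_seconds)
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

For example, 12 seconds of audio at a sample rate of 10 samples per second would be split into two 5-second segments and one 2-second remainder.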

Abstract

The invention provides a multimodal method for non-inductive confirmation of a conference speaker's identity. Using the image, voice, and text modalities of a conference, the identity of a speaker is confirmed by recognizing the speaker's expression, voice, and speaking style. The method specifically comprises an expression recognition method based on a deep learning model, a voice recognition method based on an artificial intelligence algorithm, and a method for recognizing speaking content using a text clustering algorithm. The whole process is automatic: the artificial intelligence models confirm the speaker's identity non-inductively, without manual intervention, which greatly improves meeting and office efficiency while maintaining high accuracy.
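The abstract does not say which text clustering algorithm is used. As a purely illustrative sketch, the following groups transcribed utterances with a simple single-pass (leader) clustering over bag-of-words vectors and cosine similarity. The whitespace tokenizer, the `threshold` value, and all function names are assumptions. Mandarin transcripts would need a proper word segmenter in place of `split()`.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn an utterance into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_utterances(utterances, threshold=0.3):
    """Assign each utterance to the most similar cluster, or start a new one."""
    centroids = []  # summed term counts per cluster
    labels = []
    for text in utterances:
        vec = vectorize(text)
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
        else:
            centroids[best].update(vec)
            labels.append(best)
    return labels
```

Each resulting cluster label can be read as a candidate speaking-style group. How such groups would be mapped to named speakers or combined with the expression and voice results is not covered by this sketch.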

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to a multimodal method for non-inductive confirmation of the identity of a conference speaker.

Background

[0002] With economic development, efficient office work increasingly depends on conference systems. At present, many conference systems need to record what each speaker says to make summarizing and reporting easier, so an intelligent and fast method for distinguishing speakers is needed.

[0003] Current conference systems mostly use microphones to record the speaker's voice and thereby the content of the speech. To distinguish different speakers, a microphone must be assigned to each speaker. However, assigning multiple microphones may cause crosstalk. Because the distance is too close, multiple microphones will recognize a person ...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06K9/00, G06K9/62, G06F16/35, G06F16/55, G06F40/30, G06F40/216, G06N3/04, G06N3/08, G10L17/04, G10L17/08

CPC: G06F16/355, G06F16/55, G06N3/08, G10L17/04, G10L17/08, G06V40/174, G06N3/045, G06F18/241, G06F18/214

Inventor: 杨理想王云甘周亚孙振平

Owner: 南京星耀智能科技有限公司
