Method, system, medium and device for multi-emotion recognition combining voice and text

A multi-emotion recognition technology that combines speech emotion recognition and text emotion recognition, applied in the field of human-computer interaction. It addresses the inability of existing methods, which handle only bipolar emotions such as afraid and not afraid, to handle multi-emotion recognition tasks.

Active Publication Date: 2021-11-23
GUANGDONG LVAN IND & COMMERCE CO LTD
Cites: 11; Cited by: 0

AI Technical Summary

Problems solved by technology

This combination method is better suited to bipolar emotions, such as happy versus unhappy or afraid versus not afraid, and is difficult to apply to multi-emotion recognition across categories such as happy, sad, angry, and surprised.
[0005] Existing technology can only solve individual bipolar emotion recognition tasks and cannot handle multi-emotion recognition tasks; no effective solution has been proposed so far.


Examples

Embodiment 1

[0049] As shown in Figure 1, the multi-emotion recognition method combining speech and text in Embodiment 1 comprises the following steps:

[0050] Step S201, acquiring target audio;

[0051] The target audio may be acquired actively by the terminal, acquired passively in response to a user operation instruction, received from another source, or taken from a collected audio corpus. The target audio is acquired so that the emotional information in it can be identified and the text information in it can be obtained for text emotion recognition. The text information includes, but is not limited to, a sentence, a paragraph, or a chapter.

[0052] Emotional information refers to the personal emotions that the speaker wants to convey when speaking, such as joy, anger, sorrow, and happiness.

[0053] Step S202, extracting the speech features of the target audio;

[0054] The purpose of acquiring the speech features is to generate the input of the first neu...

Embodiment 2

[0088] As shown in Figure 3, this embodiment provides a multi-emotion recognition system combining voice and text. The system includes a target audio acquisition module, a first conversion module, a first speech feature acquisition module, a first text feature acquisition module, and a target emotion determination module. The specific functions of each module are as follows:

[0089] The target audio acquisition module is used to acquire target audio; the target audio is composed of a plurality of audio segments and includes a first speech feature;

[0090] The first conversion module is configured to convert the target audio into first text information; the first text information includes a first text feature;

[0091] The first speech feature acquisition module is used to obtain first speech emotion recognition information based on the first speech feature;

[0092] The first text feature obtaining module is used to obtain first text e...

Embodiment 3

[0097] This embodiment provides a storage medium that stores one or more programs. When the programs are executed by a processor, the multi-emotion recognition method combining voice and text of Embodiment 1 is implemented, as follows:

[0098] Obtaining target audio, where the target audio is composed of a plurality of audio segments and contains a first speech feature;

[0099] Converting the target audio into first text information, where the first text information includes a first text feature;

[0100] Obtaining first speech emotion recognition information based on the first speech feature;

[0101] Obtaining first text emotion recognition information based on the first text feature;

[0102] Determining the target emotion of the target audio based on the first speech emotion recognition information and the first text emotion recognition information.
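
The five steps above can be sketched as a small pipeline. This is only an illustrative outline, not the patent's implementation: the `AudioSegment` type, the `toy_*` functions, the keyword table, and the energy threshold are all hypothetical stand-ins, and the four labels are simply the examples named in the summary. In particular, `toy_combine` adds the two vectors and takes the largest entry, which is a placeholder for the trained combination model that the abstract describes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Example labels from the summary; the full label set is not given in this excerpt.
EMOTIONS = ("happy", "sad", "angry", "surprised")
EmotionVector = Dict[str, float]


@dataclass
class AudioSegment:
    samples: Sequence[float]  # hypothetical waveform representation
    transcript: str           # stands in for the output of a real speech recognizer


def recognize_target_emotions(
    segments: Sequence[AudioSegment],
    transcribe: Callable[[AudioSegment], str],
    speech_model: Callable[[AudioSegment], EmotionVector],
    text_model: Callable[[str], EmotionVector],
    combine: Callable[[EmotionVector, EmotionVector], str],
) -> List[str]:
    """Run the steps of [0098]-[0102] once per audio segment."""
    targets = []
    for segment in segments:
        text = transcribe(segment)           # audio -> text information
        speech_vec = speech_model(segment)   # speech emotion recognition information
        text_vec = text_model(text)          # text emotion recognition information
        targets.append(combine(speech_vec, text_vec))  # target emotion
    return targets


def uniform() -> EmotionVector:
    return {e: 1.0 / len(EMOTIONS) for e in EMOTIONS}


def toy_transcribe(segment: AudioSegment) -> str:
    return segment.transcript


def toy_speech_model(segment: AudioSegment) -> EmotionVector:
    # Placeholder rule: loud segments lean "angry", quiet ones lean "sad".
    energy = sum(abs(s) for s in segment.samples) / max(len(segment.samples), 1)
    vec = uniform()
    vec["angry" if energy > 0.5 else "sad"] += 0.5
    return vec


def toy_text_model(text: str) -> EmotionVector:
    # Placeholder rule: a tiny made-up keyword table.
    keywords = {"great": "happy", "terrible": "angry", "wow": "surprised", "miss": "sad"}
    vec = uniform()
    for word in text.lower().split():
        if word in keywords:
            vec[keywords[word]] += 1.0
    return vec


def toy_combine(speech_vec: EmotionVector, text_vec: EmotionVector) -> str:
    # Stand-in for the trained combination model: add the vectors, pick the maximum.
    return max(EMOTIONS, key=lambda e: speech_vec[e] + text_vec[e])


segments = [
    AudioSegment(samples=[0.9, -0.8, 0.7], transcript="this is terrible"),
    AudioSegment(samples=[0.1, -0.1], transcript="wow"),
]
print(recognize_target_emotions(
    segments, toy_transcribe, toy_speech_model, toy_text_model, toy_combine
))  # ['angry', 'surprised']
```

Passing the recognizers in as callables keeps the speech branch and the text branch independent of each other, so either one could be swapped for a real model without touching the rest of the loop.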

[0103] The storage medium described in this embodiment may be ROM, RAM, magnet...

Abstract

The invention discloses a method, system, medium, and equipment for multi-emotion recognition combining voice and text. The method includes acquiring target audio composed of multiple audio segments and using speech recognition technology to convert it into the corresponding pieces of text information; obtaining speech emotion recognition information based on the speech features of the audio, and text emotion recognition information based on the text features of the text information; and using a new combination method to combine the two recognized emotions to obtain the target emotion information of the corresponding audio segment. The new combination method starts from the emotion vector produced by speech recognition and the emotion vector produced by text recognition, forms different combinations of the emotion information in these two vectors, and uses these combinations to train an emotion combination model. Because the invention uses separate speech and text emotion vectors, the speech emotion analysis and text emotion analysis parts remain independent of each other, so the method handles bipolar emotion analysis and also applies to multi-emotion analysis scenarios.
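
One way to read the combination step is sketched below. The abstract does not say how the "different combinations" are formed or what kind of model is trained, so two assumptions are made here and should be treated as such: each speech score is paired with each text score (a flattened outer product), and the combination model is a plain softmax regression trained by gradient descent. The `peaked` helper and the five training samples are invented purely for illustration, including the labeling of the one sample where speech and text disagree.

```python
import math
from typing import List, Sequence, Tuple

# Example labels from the summary; the full label set is not given in this excerpt.
EMOTIONS = ("happy", "sad", "angry", "surprised")
Sample = Tuple[Sequence[float], Sequence[float], int]  # (speech vec, text vec, label index)


def combination_features(speech_vec: Sequence[float], text_vec: Sequence[float]) -> List[float]:
    """Assumed form of 'combination': pair every speech score with every text score."""
    return [s * t for s in speech_vec for t in text_vec]


def softmax(scores: Sequence[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(weights: List[List[float]], x: Sequence[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_combination_model(samples: Sequence[Sample], epochs: int = 200, lr: float = 0.5) -> List[List[float]]:
    """Softmax regression on the combination features (an assumed model type)."""
    n_classes = len(EMOTIONS)
    weights = [[0.0] * (n_classes * n_classes) for _ in range(n_classes)]
    for _ in range(epochs):
        for speech_vec, text_vec, label in samples:
            x = combination_features(speech_vec, text_vec)
            probs = softmax(class_scores(weights, x))
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == label else 0.0)  # cross-entropy gradient
                for j, xj in enumerate(x):
                    weights[k][j] -= lr * grad * xj
    return weights


def predict(weights: List[List[float]], speech_vec: Sequence[float], text_vec: Sequence[float]) -> str:
    scores = class_scores(weights, combination_features(speech_vec, text_vec))
    return EMOTIONS[scores.index(max(scores))]


def peaked(index: int, peak: float = 0.7) -> List[float]:
    """Hypothetical emotion vector with most of its mass on one label."""
    rest = (1.0 - peak) / (len(EMOTIONS) - 1)
    return [peak if i == index else rest for i in range(len(EMOTIONS))]


# Made-up training data: four samples where speech and text agree, plus one where
# the voice sounds angry while the words look happy, labeled "angry" for illustration.
ANGRY, HAPPY = EMOTIONS.index("angry"), EMOTIONS.index("happy")
training = [(peaked(k), peaked(k), k) for k in range(len(EMOTIONS))]
training.append((peaked(ANGRY), peaked(HAPPY), ANGRY))

model = train_combination_model(training)
print(predict(model, peaked(ANGRY), peaked(HAPPY)))
print(predict(model, peaked(HAPPY), peaked(HAPPY)))
```

With pairwise features, the model can learn a separate weight for "angry voice with happy words" rather than relying on a single polarity score, which is one plausible reason a combination of this kind extends beyond bipolar emotions. Simply adding the two vectors would leave that disagreeing sample tied between "angry" and "happy".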

Description

Technical field

[0001] The invention relates to the field of human-computer interaction, in particular to a multi-emotion recognition method, system, medium and equipment combining speech and text.

Background

[0002] With the further popularization of the Internet and the continuous development of information technology, people are increasingly aware of the importance of information. Continued in-depth research on artificial intelligence technology makes it possible to obtain more types of information. With the development of the Internet, social media is no longer just a platform for transmitting information; it now allows users to create their own accounts and has also become a platform for collecting information. More and more platforms find that users' emotional information is very valuable, because it can express a user's likes and dislikes about a certain thing. For example, products that provide users with co...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/02, G10L15/06, G10L15/22, G10L15/26, G10L25/30, G10L25/63, G06F16/35, G06F40/30, G06F40/289, G06N3/04, G06N3/08
CPC: G10L15/02, G10L15/063, G10L15/22, G10L25/30, G10L25/63, G06F16/35, G06N3/084, G10L2015/225, G10L15/26, G06N3/045
Inventor: 林伟伟, 吴铨辉
Owner: GUANGDONG LVAN IND & COMMERCE CO LTD