Complex sound recognition method based on one-dimensional convolutional neural network

The method applies convolutional neural network and sound recognition technology in the fields of neural learning methods, biological neural network models, and speech recognition. It addresses problems such as the large difference between audio duration and target length, inconvenient application, and difficulty in changing recognition results. Its effects are to improve classification performance, simplify the attention mechanism, and facilitate model classification.

Pending Publication Date: 2021-12-28
OCEAN UNIV OF CHINA

AI Technical Summary

Problems solved by technology

Existing approaches fall into three types. The first trains the network directly on the original signal. Its advantage is that it requires no manual feature extraction, which greatly simplifies the workflow, and the model is simple and easy to popularize. The second processes the original data and manually extracts features of the sound signal, such as the spectrogram and Mel-frequency cepstral coefficients (MFCCs). It achieves high accuracy on some data sets, but its generalization ability is poor, and it is difficult for the model to change the subsequent recognition results.
The third is a multi-input complex network, which takes both the original sound signal and manually extracted features as network input. This type of model, however, is more complicated, places high demands on the platform hardware, and is inconvenient to apply.
Because it is difficult for a deep learning model to extract features effectively from the original signal, and the models proposed in the prior art are relatively complex, further optimization is required.
Solving complex sound problems from the original audio signal is therefore a major challenge.
Many audio clips in a data set have durations very different from the target length; for example, the actual duration may be 1 second while the target length is 4 seconds. Cubic spline interpolation is no longer suitable in this situation, and zero padding is too simplistic and loses a lot of information: the more zeros are filled in, the more effective information may be concealed.
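
To make the 1-second versus 4-second example concrete, the following minimal Python sketch zero-pads a clip and measures how much of the result is padding. The 8000 Hz sample rate, the constant-valued clip, and the helper name `zero_pad` are illustrative assumptions and do not come from the patent.

```python
def zero_pad(signal, target_len):
    """Right-pad with zeros (or truncate) so the result has target_len samples."""
    padded = list(signal[:target_len])
    padded.extend([0.0] * (target_len - len(padded)))
    return padded


sample_rate = 8000                      # hypothetical sample rate in Hz
clip = [0.5] * (1 * sample_rate)        # stand-in for a 1-second recording
padded = zero_pad(clip, 4 * sample_rate)

# Fraction of the 4-second network input that is pure padding.
zero_fraction = padded.count(0.0) / len(padded)
print(zero_fraction)  # 0.75
```

In this setup three quarters of the network input carries no signal, which is the situation the passage above describes.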

Embodiment Construction

[0038] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0039] This embodiment provides a complex sound recognition method based on a one-dimensional convolutional neural network, which comprises two aspects. First, a random filling algorithm processes the complex sounds, filling the original data to a uniform length for input to the one-dimensional convolutional neural network. Second, the network model structure is optimized: a pre-emphasis module and a simplified attention mechanism module are embedded in the basic framework of the one-dimensional convolutional neural network. The pre-emphasis module is placed at the input of the network, where it pre-emphasizes the input data and participates in network model tuning; the simplified attention mechanism module is placed in the deep layer of the one-dimensional convolutional neural ...
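
The visible text does not spell out the random filling algorithm or the internals of the pre-emphasis module, so the following Python sketch rests on two stated assumptions. First, `random_fill` assumes that "random filling" means extending a short clip with randomly chosen slices of the clip itself instead of zeros. Second, `pre_emphasis` uses the standard first-order filter y[n] = x[n] - alpha * x[n-1]; in a network where the module takes part in tuning, `alpha` would presumably be a trainable parameter, but here it is a fixed hypothetical default of 0.97. All function and parameter names are illustrative.

```python
import random


def random_fill(signal, target_len, rng):
    """Extend signal to target_len with randomly chosen slices of itself.

    Assumed interpretation of "random filling"; the patent's exact
    algorithm is not given in the visible text.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    out = list(signal[:target_len])
    while len(out) < target_len:
        seg_len = min(target_len - len(out), len(signal))
        start = rng.randrange(len(signal) - seg_len + 1)
        out.extend(signal[start:start + seg_len])
    return out


def pre_emphasis(signal, alpha=0.97):
    """First-order pre-emphasis: y[0] = x[0], y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    rest = [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]
    return [signal[0]] + rest


rng = random.Random(0)                   # seeded for reproducibility
clip = [0.1 * i for i in range(10)]      # stand-in for a short recording
filled = random_fill(clip, 40, rng)      # every sample comes from the clip
emphasized = pre_emphasis(filled)        # same length, high frequencies boosted
```

Unlike zero padding, every sample of `filled` is real signal content, at the cost of repeating parts of the clip.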

Abstract

The invention discloses a complex sound recognition method based on a one-dimensional convolutional neural network. The method comprises two parts. First, complex sounds are processed with a random filling algorithm, which fills the original data to a uniform length for input to the one-dimensional convolutional neural network. Second, a pre-emphasis module and a simplified attention mechanism module are embedded in the basic framework of the network: the pre-emphasis module is placed at the input of the network, pre-emphasizes the input data, and participates in network model optimization; the simplified attention mechanism module is placed in a deep layer of the network and obtains global features with attention using global average pooling and a sigmoid function. The method optimizes the network model and achieves a good recognition effect.
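
The abstract names only the two ingredients of the simplified attention module: global average pooling and a sigmoid function. The Python sketch below shows one plausible way to combine them, assuming that each channel's pooled value is gated by its own sigmoid and that no learned layers sit in between. That wiring, and the names `sigmoid` and `simplified_attention`, are assumptions and not details confirmed by the source.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def simplified_attention(feature_map):
    """Return one attention-weighted global feature per channel.

    feature_map: list of channels, each a non-empty list of activations
    over time (the output of a deep 1-D convolutional layer).
    """
    # Global average pooling: collapse the time axis of each channel.
    pooled = [sum(channel) / len(channel) for channel in feature_map]
    # Sigmoid turns each pooled value into a weight between 0 and 1.
    weights = [sigmoid(p) for p in pooled]
    # Assumed combination: gate each pooled feature by its own weight.
    return [p * w for p, w in zip(pooled, weights)]


features = [[0.0, 0.0], [1.0, 3.0]]      # 2 channels, 2 time steps
attended = simplified_attention(features)
```

Because this variant has no trainable weights of its own, it adds no parameters to the network, which is one way to read "simplified" here.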

Description

Technical field

[0001] The invention belongs to the technical field of audio processing and relates to complex sound recognition technology, in particular to a complex sound recognition method based on a one-dimensional convolutional neural network.

Background

[0002] Complex sounds are non-linguistic sounds in the environment. Their sources are complex and diverse, and the signal itself is non-stationary and often accompanied by strongly interfering background noise, so the sound features of different sound scenes are either not distinct enough or highly similar. Complex sound recognition automatically identifies specific types of complex sounds in the environment, such as children playing, car horns, and street music. Other areas of sound classification, such as speech classification and music classification, have achieved very high accuracy, but in the field of complex sound recognition due to the non-stationary signal itself, the sp...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/06, G10L19/04, G10L25/30, G06N3/04, G06N3/08
CPC: G10L15/063, G10L19/04, G10L25/30, G06N3/08, G10L2015/0631, G06N3/045
Inventor: 殷波, 杜泽华, 魏志强, 董西峰
Owner: OCEAN UNIV OF CHINA