
Robust acoustic scene recognition method based on local learning

An acoustic scene recognition technology based on local learning, applied in the field of acoustic scene recognition. It addresses problems such as an unbalanced number of samples across different channels, mismatched audio channels, and low acoustic scene recognition accuracy, and achieves the effects of handling an unbalanced number of device categories, fast computation, and easy implementation.

Active Publication Date: 2019-08-27
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a robust acoustic scene recognition method based on local learning, to solve the problem of low acoustic scene recognition accuracy when audio channels are mismatched and the number of samples from different channels is unbalanced.



Examples


Specific Embodiment 1

[0017] Specific Embodiment 1: This embodiment provides a robust acoustic scene recognition method based on local learning, which comprises the following steps:

[0018] Step 1: Collect sound signals from different acoustic scenes at a sampling frequency of 44.1 kHz and perform frequency-domain feature extraction: divide the collected audio into frame sequences with a frame length of 40 ms, extract a 40-dimensional FBank (filter bank) feature from each frame of data, and build a training sample set.
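The following is a minimal sketch of this feature-extraction step in Python, assuming the librosa library; the 20 ms hop length and the function name are assumptions, since the text only specifies the 44.1 kHz sampling rate, the 40 ms frame length, and the 40 FBank dimensions.

```python
import librosa

def extract_fbank(wav_path, sr=44100, frame_ms=40, hop_ms=20, n_mels=40):
    """Extract 40-dimensional log mel filter bank (FBank) features per frame.

    The 44.1 kHz rate, 40 ms frames, and 40 dimensions follow Step 1;
    the 20 ms hop is an assumption (the text does not give a frame shift).
    """
    y, _ = librosa.load(wav_path, sr=sr)              # resample to 44.1 kHz
    n_fft = int(sr * frame_ms / 1000)                 # 40 ms -> 1764 samples
    hop_length = int(sr * hop_ms / 1000)              # 20 ms -> 882 samples
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    return librosa.power_to_db(mel).T                 # shape: (frames, 40)
```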

[0019] Step 2: Preprocess the feature data extracted in Step 1:

[0020] For the features extracted in Step 1, compute the mean and standard deviation in each dimension: as shown in Figure 1, compute the mean μ over all samples along the time axis, and compute the standard deviation σ in the same way; then normalize all features using the obtained mean and standard deviation.

[0021] Step 3: Channel adaptation and data...

Specific Embodiment 2

[0025] Specific Embodiment 2: This embodiment differs from Embodiment 1 in that normalizing all features using the mean and standard deviation obtained in Step 2 is specifically:

[0026] Normalize the feature data using the obtained mean and standard deviation according to the following formula:

[0027] $x_{\mathrm{norm}} = \dfrac{x - \mu}{\sigma}$

[0028] where $x_{\mathrm{norm}}$ denotes the normalized data, μ is the mean, σ is the standard deviation, and x is the feature data.

[0029] The other steps and parameters are the same as in the first embodiment.
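A minimal sketch of this per-dimension z-score normalization in Python; the function name and the (samples, frames, dims) array layout are assumptions, and the small epsilon guarding against division by zero is an implementation detail not stated in the text.

```python
import numpy as np

def normalize(features):
    """Per-dimension z-score normalization: x_norm = (x - mu) / sigma.

    features: array of shape (samples, frames, dims). The mean and the
    standard deviation are computed per feature dimension over samples
    and time, following paragraph [0020].
    """
    mu = features.mean(axis=(0, 1), keepdims=True)      # per-dimension mean
    sigma = features.std(axis=(0, 1), keepdims=True)    # per-dimension std
    return (features - mu) / (sigma + 1e-8), mu, sigma  # 1e-8 avoids /0
```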

Specific Embodiment 3

[0030] Specific Embodiment 3: This embodiment differs from Embodiment 2 in that the mean shift in Step 3 is specifically:

[0031] Add the difference ε to the normalized data with probability p:

[0032] $\varepsilon = \mu_{\mathrm{most}} - \dfrac{1}{N}\sum_{i=1}^{N} \mu_i$

[0033] where $\mu_{\mathrm{most}}$ denotes the data mean vector of the device with the largest number of samples; N denotes the number of devices other than the device with the largest number of samples; and $\mu_i$ denotes the data mean vector of the i-th of those devices, i = 1, …, N. To increase the robustness of the system, the difference is not added to all data, but only with probability p, where p ∈ [0, 1].

[0034] The other steps and parameters are the same as in the second embodiment.
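A sketch of this probabilistic mean shift in Python, under the same reading of ε as the reconstructed formula in [0032]; the p = 0.5 default and the helper names are assumptions.

```python
import numpy as np

def mean_shift(x_norm, mu_most, mu_others, p=0.5, rng=None):
    """Apply the mean shift of Embodiment 3 with probability p.

    x_norm    : normalized features of one sample, shape (frames, dims)
    mu_most   : mean vector of the device with the most samples
    mu_others : (N, dims) mean vectors of the other N devices
    eps = mu_most - mean(mu_others) follows the reconstruction in [0032];
    p = 0.5 is an assumed default, the text only requires p in [0, 1].
    """
    rng = rng or np.random.default_rng()
    if rng.random() < p:                          # shift only with prob. p
        eps = mu_most - mu_others.mean(axis=0)    # the difference epsilon
        return x_norm + eps
    return x_norm
```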



Abstract

The invention provides a robust acoustic scene recognition method based on local learning, belonging to the technical field of sound signal processing. The method comprises the following steps: first, sound signals of different acoustic scenes are collected and frequency-domain features are extracted; the extracted feature data are pre-processed; the normalized data are then subjected to a mean shift, and data augmentation is conducted through the mixup method; a convolutional neural network model is then built according to the local learning idea, and the augmented training sample set is input into the model for training to obtain the trained model; finally, a sample to be recognized is sequentially subjected to frequency-domain feature extraction and data pre-processing, and input into the trained model for recognition to obtain the acoustic scene recognition result. The method solves the problem of low acoustic scene recognition accuracy under audio channel mismatch and an unbalanced number of samples across channels, and is suitable for acoustic scene recognition with various channels and an unbalanced number of samples from different channels.
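The abstract names the mixup method for data augmentation; below is a minimal sketch of standard mixup (a convex combination of two samples and of their labels) in Python, where the Beta parameter alpha = 0.2 is an assumption, as the text gives no hyperparameters.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Standard mixup: blend two training samples and their one-hot labels.

    alpha = 0.2 is an assumed Beta-distribution parameter; the abstract
    only names the mixup method.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    x = lam * x1 + (1 - lam) * x2         # convex combination of inputs
    y = lam * y1 + (1 - lam) * y2         # matching combination of labels
    return x, y
```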

Description

Technical field

[0001] The invention relates to an acoustic scene recognition method, and belongs to the technical field of sound signal processing.

Background technique

[0002] Acoustic scene recognition can be widely used in fields such as robotics and unmanned vehicles that need to effectively perceive the surrounding sound environment. However, there is often more than one sound collection device in the real world, and different collection devices have different channel characteristics, so the collected signals are usually not exactly the same. How to automatically and accurately classify the scenes of sounds input from different channels and realize robust acoustic scene recognition has become an urgent and challenging research topic.

[0003] In order to achieve robust acoustic scene recognition, it is necessary to make full use of the prior knowledge of the data. At present, most methods perform acoustic scene recognition on clean speech or within the same channel; s...


Application Information

IPC(8): G10L25/51; G10L25/30; G10L25/18
CPC: G10L25/18; G10L25/30; G10L25/51
Inventor: 韩纪庆, 杨皓, 郑贵滨, 郑铁然
Owner: HARBIN INST OF TECH