A voiceprint recognition method based on self-attention and transfer learning

A voiceprint recognition technology based on transfer learning, applicable to speech analysis, instruments, and related fields. It addresses the low accuracy of existing voiceprint recognition and its lack of generalization ability in real-world applications, with the aim of broadening generalization ability and achieving high recognition accuracy.

Active Publication Date: 2022-04-12
CHINA SCI INTELLICLOUD TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] At present, voiceprint recognition based on traditional methods has low accuracy, while voiceprint recognition based on deep learning relies too heavily on massive, high-dimensional, high-quality voice data. Both approaches are vulnerable to environmental noise, reverberation, and audio channel effects, and therefore lack the generalization ability needed for real-world applications.


Examples

Embodiment 1

[0029] A voiceprint recognition method based on self-attention and transfer learning: obtain open-source English speech data and construct a first-level basic data set; obtain open-source Chinese speech data and construct a second-level basic data set; collect voice data from the application scenario and construct an application scenario data set. As shown in Figure 6, the first-level basic model is trained on the first-level basic data set using the attention model; the first-level basic model is then transferred to the second-level basic data set and fine-tuned to obtain the second-level basic model; finally, the second-level basic model is transferred to the application scenario data and fine-tuned to obtain the final model suited to the specific application scenario. Cascade fine-tuning not only learns robustness to noise, reverberation, and channel variation, but also learns the pronunciation characteristics of Chinese and the recognition ability t...
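The cascade described in [0029] can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: a single scalar weight stands in for the attention-based network, the three toy data sets stand in for the English, Chinese, and application scenario data, and the names `Stage`, `train`, `cascade_finetune`, and `make_data`, as well as the learning rates and epoch counts, are hypothetical. The only point it shows is that each stage starts from the previous stage's parameters instead of from scratch.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (input, target) pair for a toy model y = w * x


@dataclass
class Stage:
    name: str
    data: List[Sample]
    lr: float      # assumed value, typically smaller in later stages
    epochs: int


def train(weight: float, stage: Stage) -> float:
    """Plain SGD on squared error, starting from the given weight."""
    for _ in range(stage.epochs):
        for x, y in stage.data:
            grad = 2.0 * (weight * x - y) * x
            weight -= stage.lr * grad
    return weight


def cascade_finetune(stages: List[Stage], init_weight: float = 0.0) -> Dict[str, float]:
    """Train stage by stage; each stage is initialised from the previous result."""
    weight = init_weight
    checkpoints: Dict[str, float] = {}
    for stage in stages:
        weight = train(weight, stage)
        checkpoints[stage.name] = weight
    return checkpoints


def make_data(true_w: float, n: int, seed: int) -> List[Sample]:
    """Toy stand-in for a speech corpus: noiseless samples of y = true_w * x."""
    rng = random.Random(seed)
    return [(x, true_w * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


stages = [
    Stage("level1_english", make_data(1.0, 200, seed=0), lr=0.1, epochs=5),
    Stage("level2_chinese", make_data(1.2, 100, seed=1), lr=0.05, epochs=5),
    Stage("application", make_data(1.3, 20, seed=2), lr=0.02, epochs=5),
]
checkpoints = cascade_finetune(stages)
```

In this toy setting the small application data set only nudges the weight from the second-level result toward its own target, which mirrors the idea of adapting a well-trained base model with a small amount of scenario data.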

Embodiment 2

[0032] This embodiment builds on Embodiment 1. A large amount of open-source English voice data (SITW, VoxCeleb1, VoxCeleb2, etc.) is obtained to build the first-level voiceprint basic data set. This data was collected under unconstrained conditions, so it provides good robustness to noise, reverberation, and channel variation.

[0033] A large amount of open-source Chinese speech data (AISHELL, Primewords, ST-CMDS, THCHS-30, etc.) is obtained to build the second-level voiceprint basic data set. Because it is a Chinese data set, it allows the model to adapt better to the pronunciation characteristics of Chinese.

[0034] A small amount of voice data is collected in the application scenario to build the application scenario voiceprint data set. Because this data is recorded in the real application scenario, it matches actual use more closely.

[0035] Other parts of this embodiment are the same as those of Embodiment 1, so details are not repeated here.

Embodiment 3

[0037] This embodiment builds on Embodiment 1 or 2. As shown in Figures 1 and 2, time-domain and frequency-domain data enhancement is applied to the first-level basic data set, the second-level basic data set, and the application scenario data set. As shown in Figure 1, the audio data is enhanced in the time domain: rhythm and pitch are controlled, the audio speed is adjusted, and random noise is added. As shown in Figure 2, the audio data is enhanced in the frequency domain: Vocal Tract Length Perturbation is used to apply a random distortion factor to the spectral characteristics of each audio clip.
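The time-domain steps in [0037] (speed adjustment and added random noise) might look like the following sketch. The function names, the 0.9 to 1.1 speed range, and the 10 to 30 dB signal-to-noise range are assumptions for illustration and are not taken from the patent. Note that plain resampling, as used here, changes tempo and pitch together; controlling them independently would need a different technique.

```python
import math
import random
from typing import List


def change_speed(samples: List[float], factor: float) -> List[float]:
    """Resample by linear interpolation; factor > 1 gives a shorter, faster clip."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    n_out = max(1, int(len(samples) / factor))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * factor
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def add_noise(samples: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise scaled to a target signal-to-noise ratio in dB."""
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def augment_time_domain(samples: List[float], rng: random.Random) -> List[float]:
    """Apply a random speed change followed by random-level noise."""
    factor = rng.uniform(0.9, 1.1)    # assumed perturbation range
    snr_db = rng.uniform(10.0, 30.0)  # assumed noise-level range
    return add_noise(change_speed(samples, factor), snr_db, rng)


SAMPLE_RATE = 16000  # assumed sampling rate in Hz
# One second of a 440 Hz tone as a stand-in for a speech clip.
tone = [math.sin(2.0 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
augmented = augment_time_domain(tone, random.Random(0))
```

Calling `augment_time_domain` several times with different random states on the same clip yields several distinct training examples, which is how augmentation enlarges a small application scenario data set.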

[0038] The invention obtains public English and Chinese data sets, collects a small application scenario data set, and enhances them along two dimensions: the time domain and the frequency domain. Data enhancement in both domains is applied to all data sets, which great...
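For the frequency-domain side, the Vocal Tract Length Perturbation mentioned in [0037] is commonly implemented as a piecewise-linear warp of the frequency axis by a random factor. The sketch below follows that widely used formulation; the patent text does not give its exact formula, so the warp shape, the boundary frequency `f_hi`, the 0.9 to 1.1 factor range, and the filterbank layout are all assumptions.

```python
import random
from typing import List


def vtlp_warp(freq_hz: float, alpha: float,
              sample_rate: int = 16000, f_hi: float = 4800.0) -> float:
    """Piecewise-linear frequency warp that keeps 0 Hz and Nyquist fixed.

    Below the boundary, frequencies are scaled by alpha; above it, a second
    linear segment maps the remaining range so the warp ends at Nyquist.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    nyquist = sample_rate / 2.0
    scale = min(alpha, 1.0)
    boundary = f_hi * scale / alpha
    if freq_hz <= boundary:
        return freq_hz * alpha
    slope = (nyquist - f_hi * scale) / (nyquist - boundary)
    return nyquist - slope * (nyquist - freq_hz)


def warp_centers(centers_hz: List[float], alpha: float,
                 sample_rate: int = 16000) -> List[float]:
    """Warp filterbank centre frequencies; features are then computed as usual."""
    return [vtlp_warp(f, alpha, sample_rate) for f in centers_hz]


rng = random.Random(0)
alpha = rng.uniform(0.9, 1.1)                # one random factor per utterance (assumed range)
centers = [i * 250.0 for i in range(33)]     # hypothetical centres from 0 to 8000 Hz
warped = warp_centers(centers, alpha)
```

Drawing a fresh `alpha` for each utterance changes the apparent vocal tract length of the speaker slightly, so the model sees more spectral variety than the raw recordings contain.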

Abstract

The invention discloses a voiceprint recognition method based on self-attention and transfer learning. The method acquires open-source English voice data and constructs a first-level basic data set; acquires open-source Chinese voice data and constructs a second-level basic data set; and collects application scenario voice data to build the application scenario data set. The first-level basic model is trained on the first-level basic data set using the attention model; the first-level basic model is then transferred to the second-level basic data set and fine-tuned to obtain the second-level basic model; finally, the second-level basic model is transferred to the specific application scenario data and fine-tuned to obtain the final model suited to that scenario. The invention learns robustness to noise, reverberation, and channel variation, and also learns the pronunciation characteristics of Chinese and a recognition ability better matched to real application scenarios, which makes it well suited to real-world use.

Description

Technical field

[0001] The invention belongs to the technical field of voiceprint recognition, and in particular relates to a voiceprint recognition method based on self-attention and transfer learning.

Background technique

[0002] Biometric technology is an identification technology that relies on human body characteristics for identity verification. Because such characteristics cannot be lost or forgotten, are unique and stable, resist counterfeiting, and are convenient to use, the technology is widely used in access control, time and attendance, finance, public safety, and terminal electronic equipment.

[0003] Voiceprint recognition (Voice Print Recognition), a form of biometric identification, is a service that identifies a speaker based on the speaker's vocal characteristics. Recognition is independent of accent and language, is contactless, and can be carried out in a natural way. It has received extensive attention and application in recen...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L17/00, G10L17/20
CPC: G10L17/00, G10L17/20
Inventor: 高登科
Owner: CHINA SCI INTELLICLOUD TECH CO LTD