
A speaker clustering method that simultaneously optimizes deep representation learning and speaker category estimation

A clustering method in speaker technology, applied in the field of speaker clustering and voiceprint recognition, addressing problems such as extracted features that are unfriendly to the clustering algorithm and the resulting inability to obtain good clustering results

Active Publication Date: 2020-05-15
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to overcome the following shortcomings of existing speaker clustering methods: the feature extraction step and the speaker clustering step are carried out independently, so the extracted features are not friendly to the clustering algorithm and better clustering results cannot be obtained. Taking advantage of the ability of deep convolutional autoencoder networks to extract features, the invention provides a speaker clustering method that simultaneously optimizes deep representation learning and speaker category estimation.


Examples


Embodiment

[0061] As shown in Figure 1, this embodiment discloses a speaker clustering method that simultaneously optimizes deep representation learning and speaker category estimation, comprising the following steps:

[0062] Step 1: preprocess the speech samples and extract I-vector features, as follows:

[0063] Read in the speech samples to be clustered and pre-emphasize them with a first-order high-pass filter whose coefficient a is 0.98; the transfer function of the filter is:

[0064] H(z) = 1 - a·z⁻¹
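As a rough illustration of this preprocessing step (not code from the patent), here is a minimal NumPy sketch of the pre-emphasis filter H(z) = 1 - a·z⁻¹ with a = 0.98; the function name and the handling of the first sample are assumptions:

```python
import numpy as np

def pre_emphasize(signal: np.ndarray, a: float = 0.98) -> np.ndarray:
    """First-order high-pass filtering: y[n] = x[n] - a * x[n-1]."""
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.append(signal[0], signal[1:] - a * signal[:-1])
```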

[0065] Frame the pre-emphasized signal with a Hamming window; each frame is 25 ms long and the frame shift is 10 ms;
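A minimal sketch of the framing step, continuing the example above; the 16 kHz sampling rate mentioned in the comments is an assumption, while the 25 ms / 10 ms values follow the text:

```python
import numpy as np

def frame_signal(signal: np.ndarray, sample_rate: int,
                 frame_ms: float = 25.0, shift_ms: float = 10.0) -> np.ndarray:
    """Split the signal into overlapping frames and apply a Hamming window."""
    frame_len = int(sample_rate * frame_ms / 1000)    # 400 samples at 16 kHz
    frame_shift = int(sample_rate * shift_ms / 1000)  # 160 samples at 16 kHz
    num_frames = 1 + max(0, (len(signal) - frame_len) // frame_shift)
    window = np.hamming(frame_len)
    return np.stack([
        signal[t * frame_shift : t * frame_shift + frame_len] * window
        for t in range(num_frames)
    ])  # shape: (num_frames, frame_len); row t is the windowed frame x_t(n)
```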

[0066] Perform a Fourier transform on each framed signal x_t(n) to obtain the frequency-domain signal:

[0067] X_t(k) = Σ_{n=0}^{N-1} x_t(n)·e^(-j2πnk/N), 0 ≤ k ≤ N-1, where N is the number of transform points
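Continuing the sketch, the frame matrix from above can be transformed to the frequency domain; the value N = 512 is an assumption, not a figure given in the patent:

```python
import numpy as np

def frames_to_spectra(frames: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """N-point DFT of each windowed frame x_t(n); the input is real,
    so only the non-negative frequency bins k = 0 .. n_fft//2 are kept."""
    return np.fft.rfft(frames, n=n_fft, axis=1)  # shape: (num_frames, n_fft//2 + 1)
```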

[0068] Apply Mel filtering to the frequency-domain signal, where the Mel filter bank contains M triangular filters, the center frequency of the mth filter is denoted f(m), and the frequency response of the mth triangular filter...
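The frequency response of the triangular filters is cut off in the text above; the sketch below uses the standard construction of a Mel filter bank (linear rise to the centre frequency f(m), linear fall to f(m+1)), which may differ in detail from the patent's exact formula:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters: int, n_fft: int, sample_rate: int) -> np.ndarray:
    """M triangular filters with centre frequencies f(m) equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

# Mel-filtered energies of each frame: project the power spectrum onto the filters.
# mel_energies = np.abs(spectra) ** 2 @ mel_filterbank(M, n_fft, sample_rate).T
```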



Abstract

The invention discloses a speaker clustering method that simultaneously optimizes deep representation learning and speaker category estimation. The method comprises the following steps: preprocessing the voice samples to be clustered and extracting I-vector features; training a convolutional autoencoder network and extracting deep representation features; constructing initial classes from the deep representation features to obtain the number of classes and initial class labels; adding a fully connected layer and a Softmax layer to the encoder output of the convolutional autoencoder network to form a joint optimization framework, with the Softmax layer estimating the speaker category; and taking the sum of the reconstruction error of the convolutional autoencoder network and the cross-entropy error of the Softmax layer's speaker category estimate as the objective function, iteratively updating the parameters of the joint optimization framework until a convergence condition is met, thereby obtaining the voice samples belonging to each speaker. The method obtains the optimized deep representation features and the speaker clustering result at the same time, and achieves a better speaker clustering effect than traditional methods.
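To make the joint objective concrete, here is a minimal PyTorch sketch of the idea described above: a convolutional autoencoder whose encoder output also feeds a fully connected classification head, trained on the sum of the reconstruction error and the cross-entropy of the speaker-category estimate. All layer sizes, shapes and names are illustrative assumptions, not the patent's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointClusteringNet(nn.Module):
    """Convolutional autoencoder plus a fully connected + Softmax head on the encoder output."""
    def __init__(self, feat_dim: int = 400, embed_dim: int = 64, num_classes: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=5, padding=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(feat_dim, embed_dim),            # deep representation feature
        )
        self.decoder = nn.Linear(embed_dim, feat_dim)  # reconstructs the input feature
        self.classifier = nn.Linear(embed_dim, num_classes)  # Softmax applied inside cross_entropy

    def forward(self, x):                    # x: (batch, 1, feat_dim), e.g. one feature vector per sample
        z = self.encoder(x)
        recon = self.decoder(z).unsqueeze(1)
        logits = self.classifier(z)
        return recon, logits

def joint_loss(recon, x, logits, labels):
    """Objective: reconstruction error + speaker-category cross-entropy."""
    return F.mse_loss(recon, x) + F.cross_entropy(logits, labels)
```

In this sketch the labels passed to the cross-entropy term would be the initial class labels from the clustering step, and the framework parameters are updated iteratively until the convergence condition is met.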

Description

Technical field

[0001] The invention relates to the technical field of speaker clustering and voiceprint recognition, and in particular to a speaker clustering method that simultaneously optimizes deep representation learning and speaker category estimation.

Background technique

[0002] In recent years, with the development of deep learning technology, voiceprint recognition has made great progress. From traditional I-vector features to d-vector and x-vector features based on deep feature transformation, voiceprint recognition has moved from theoretical research into practical applications such as online banking identity authentication, criminal investigation, robot voiceprint wake-up, device voiceprint unlocking and more.

[0003] However, training a large-scale voiceprint recognition model not only requires sufficient training data, but also requires knowing which speaker each sample corresponds to. In practical applications, the training data may come from telephone recordin...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L17/02, G10L17/18, G10L17/04, G06N3/04, G06N3/08
CPC: G10L17/02, G10L17/18, G10L17/04, G06N3/08, G06N3/045, Y02T10/40
Inventor: 李艳雄, 王武城, 刘名乐, 江钟杰, 陈昊
Owner: SOUTH CHINA UNIV OF TECH