Speaker clustering method that simultaneously optimizes deep representation learning and speaker category estimation

A speaker clustering method, applied in the field of speaker clustering and voiceprint recognition, that addresses problems such as extracted features being poorly suited to the clustering algorithm and the inability to obtain good clustering results.

Active Publication Date: 2020-05-15

SOUTH CHINA UNIV OF TECH

Cites: 6 | Cited by: 9


AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to overcome the following shortcomings of existing speaker clustering methods: the feature extraction step and the speaker clustering step are carried out independently, so the extracted features are not well suited to the clustering algorithm and better clustering results cannot be obtained. Taking advantage of the strength of deep convolutional autoencoder networks in feature extraction, the invention provides a speaker clustering method that simultaneously optimizes deep representation learning and speaker category estimation.


Examples


Embodiment

[0061] As shown in Figure 1, this embodiment discloses a speaker clustering method that simultaneously optimizes deep representation learning and speaker category estimation, comprising the following steps:

[0062] Step 1: preprocessing and I-vector feature extraction, as follows:

[0063] Read in the speech samples to be clustered and pre-emphasize them with a first-order high-pass filter whose coefficient a is 0.98. The transfer function of the filter is:

[0064] $H(z) = 1 - a z^{-1}$
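In the time domain, $H(z) = 1 - a z^{-1}$ is the difference equation y(n) = x(n) - a·x(n-1). The sketch below applies it with NumPy; the function name `pre_emphasis` and passing the first sample through unchanged (it has no predecessor) are assumptions for illustration, not details from the patent.

```python
import numpy as np

def pre_emphasis(x, a=0.98):
    """First-order high-pass filter H(z) = 1 - a z^-1, i.e. y[n] = x[n] - a * x[n-1].

    Assumption: the first sample is kept as-is because x[-1] does not exist.
    """
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return x.copy()
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - a * x[:-1]
    return y

# A constant (DC) input is strongly attenuated: every output after the first is 1 - 0.98 = 0.02
print(pre_emphasis(np.ones(4)))
```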

[0065] Frame the signal using a Hamming window, with a frame length of 25 ms and a frame shift of 10 ms;
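A minimal framing sketch, assuming a 16 kHz sample rate (the text fixes only the 25 ms frame length and 10 ms shift, which become 400 and 160 samples at that rate). The helper name `frame_signal` and dropping trailing samples that do not fill a whole frame are illustrative assumptions.

```python
import numpy as np

def frame_signal(x, sample_rate=16000, frame_ms=25.0, shift_ms=10.0):
    """Split a 1-D signal into overlapping Hamming-windowed frames.

    Returns shape (num_frames, frame_len). sample_rate=16000 is an assumption.
    Trailing samples that do not fill a complete frame are dropped.
    """
    x = np.asarray(x, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000))  # 400 samples at 16 kHz
    hop = int(round(sample_rate * shift_ms / 1000))        # 160 samples at 16 kHz
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    num_frames = 1 + (len(x) - frame_len) // hop
    # Row t of idx holds the sample indices covered by frame t
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

frames = frame_signal(np.random.randn(16000))  # 1 s of noise at 16 kHz
print(frames.shape)  # (98, 400)
```

Building an index matrix avoids a Python loop over frames; each row is one windowed frame $x_t(n)$.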

[0066] Apply the Fourier transform to each framed signal $x_t(n)$ to obtain the frequency-domain signal:

[0067]
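The formula in [0067] did not survive extraction. A common choice at this step, and only an assumption here, is the N-point DFT $X_t(k) = \sum_{n=0}^{N-1} x_t(n) e^{-j 2\pi nk/N}$. The sketch computes it with `np.fft.rfft`, assuming N = 512 (each 400-sample frame is zero-padded), and also returns the power spectrum $|X_t(k)|^2$ that a Mel filtering step typically consumes.

```python
import numpy as np

def frames_to_spectrum(frames, n_fft=512):
    """N-point DFT of each frame: X_t(k) = sum_n x_t(n) * exp(-j 2 pi n k / N).

    np.fft.rfft zero-pads (or crops) each frame to n_fft points and returns only
    the n_fft // 2 + 1 non-negative-frequency bins; the remaining bins of a real
    signal are complex conjugates of these. n_fft = 512 is an assumption.
    """
    spectrum = np.fft.rfft(frames, n=n_fft, axis=-1)  # complex, shape (T, n_fft // 2 + 1)
    power = np.abs(spectrum) ** 2                      # |X_t(k)|^2
    return spectrum, power
```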

[0068] Apply Mel filtering to the frequency-domain signal, where the Mel filter bank contains M triangular filters, the center frequency of each filter is denoted $f(m)$, and the frequency response of the m-th triangular filter...
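The definition of the filter response is truncated above. The sketch below uses the common textbook construction, which may differ from the patent's exact formula: filter m rises linearly from $f(m-1)$ to its center $f(m)$ and falls to zero at $f(m+1)$, with centers spaced evenly on the Mel scale $\mathrm{Mel}(f) = 2595 \log_{10}(1 + f/700)$. M = 26, a 16 kHz sample rate, and a 512-point FFT are assumptions, and the helper names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(num_filters=26, n_fft=512, sample_rate=16000):
    """Triangular Mel filter bank of shape (M, n_fft // 2 + 1).

    Centers f(1..M) and edges f(0), f(M+1) are equally spaced on the Mel scale
    between 0 Hz and the Nyquist frequency, then mapped to fractional FFT bins.
    """
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), num_filters + 2)
    f = mel_to_hz(mel_points) * n_fft / sample_rate  # f(0), ..., f(M+1) in bin units
    k = np.arange(n_bins)
    fbank = np.zeros((num_filters, n_bins))
    for m in range(1, num_filters + 1):
        left, center, right = f[m - 1], f[m], f[m + 1]
        rising = (k - left) / (center - left)     # ramp up on [f(m-1), f(m)]
        falling = (right - k) / (right - center)  # ramp down on [f(m), f(m+1)]
        fbank[m - 1] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def log_mel_energies(power, fbank):
    """Log Mel filter-bank energies per frame; the small floor avoids log(0)."""
    return np.log(power @ fbank.T + 1e-10)
```

Keeping the filter edges as fractional bins (rather than rounding them) prevents the narrow low-frequency filters from collapsing to zero width.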


Abstract

The invention discloses a speaker clustering method that simultaneously optimizes deep representation learning and speaker category estimation. The method comprises the following steps: preprocessing the speech samples to be clustered and extracting I-vector features; training a convolutional autoencoder network and extracting deep representation features; constructing initial classes from the deep representation features to obtain the number of classes and the initial class labels; adding a fully connected layer and a Softmax layer to the encoder output layer of the convolutional autoencoder network to form a joint optimization framework, with the Softmax layer used to estimate speaker categories; and taking the sum of the reconstruction error of the convolutional autoencoder network and the cross-entropy error of the Softmax layer's speaker category estimation as the objective function, then iteratively updating the parameters of the joint optimization framework until a convergence condition is met, yielding the speech samples of each speaker. The method obtains the optimized deep representation features and the speaker clustering result at the same time, and achieves better speaker clustering than traditional methods.
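The objective described in the abstract, reconstruction error plus speaker-category cross-entropy, can be sketched in NumPy as below. This is not the patent's implementation: the abstract does not specify the form of the reconstruction error, so mean squared error is an assumption, and `joint_objective` and `assign_speakers` are hypothetical helper names. The network itself and the parameter-update loop are omitted.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def joint_objective(x, x_recon, logits, labels):
    """Sum of the autoencoder reconstruction error and the Softmax cross-entropy.

    x, x_recon : autoencoder input and its reconstruction (same shape)
    logits     : (batch, K) fully connected layer outputs fed to the Softmax layer
    labels     : (batch,) current integer speaker labels (initially the initial class labels)
    Assumption: the reconstruction error is the mean squared error.
    """
    recon = np.mean((x - x_recon) ** 2)
    probs = softmax(logits)
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    return recon + ce, recon, ce

def assign_speakers(logits):
    """Speaker category of each sample = the most probable Softmax output."""
    return softmax(logits).argmax(axis=1)

# Perfect reconstruction and uniform logits over 4 classes: loss = log(4)
x = np.ones((2, 5))
total, recon, ce = joint_objective(x, x.copy(), np.zeros((2, 4)), np.array([0, 3]))
```

Minimizing both terms together is what couples the learned representation to the clustering, which is the shortcoming the invention targets.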

Description

Technical Field

[0001] The invention relates to the technical field of speaker clustering and voiceprint recognition, and in particular to a speaker clustering method that simultaneously optimizes deep representation learning and speaker category estimation.

Background

[0002] In recent years, with the development of deep learning, voiceprint recognition technology has made great progress. From traditional I-vector features to d-vector and x-vector features based on deep feature transformation, voiceprint recognition has moved from theoretical research into practical applications such as online banking identity authentication, criminal investigation, robot voiceprint wake-up, and device voiceprint unlocking.

[0003] However, training a large-scale voiceprint recognition model requires not only sufficient training data but also knowledge of which speaker each sample belongs to. In practical applications, the training data may come from telephone recordin...


Application Information


Patent Type & Authority: Applications (China)

IPC (8): G10L17/02, G10L17/18, G10L17/04, G06N3/04, G06N3/08

CPC: G10L17/02, G10L17/18, G10L17/04, G06N3/08, G06N3/045, Y02T10/40

Inventors: 李艳雄, 王武城, 刘名乐, 江钟杰, 陈昊

Owner: SOUTH CHINA UNIV OF TECH
