Speaker separation method and related equipment thereof

A speaker separation technology, applied in fields such as speech analysis and instruments, that addresses the problem of low speaker separation accuracy.

Pending Publication Date: 2021-09-07
IFLYTEK CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, because of defects in related speaker separation technology, it cannot separate speakers in some complex speech data (such as voice data in which multiple speakers speak at the same time), so its speaker separation accuracy is low.



Examples


Embodiment 1

[0027] See Figure 1, which is a flow chart of a speaker separation method provided in an embodiment of the present application.

[0028] The speaker separation method provided in this embodiment includes steps S1-S3:

[0029] S1: Obtain the voice data to be separated.

[0030] The voice data to be separated is voice data that requires speaker separation processing, and it includes voice information of at least one speaker. For example, the voice data to be separated may include voice information of N speakers, where N is a positive integer.

[0031] In addition, this embodiment of the present application does not limit the voice data to be separated. For example, it may include at least one piece of overlapping audio data, where "overlapping audio data" refers to audio data generated by multiple speakers speaking at the same time.
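The notion of overlapping audio data can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the per-frame 0/1 activity lists, the speaker names, and the function name `overlapping_frames` are hypothetical and do not come from the application.

```python
def overlapping_frames(activity):
    """Return indices of frames in which two or more speakers are active.

    `activity` maps a (hypothetical) speaker name to a list of 0/1 flags,
    one flag per frame. All lists are assumed to have the same length;
    zip() silently stops at the shortest list otherwise.
    """
    return [
        index
        for index, flags in enumerate(zip(*activity.values()))
        if sum(flags) >= 2
    ]


# Hypothetical example: speakers A and B both talk during frame 2.
activity = {
    "A": [1, 1, 1, 0, 0],
    "B": [0, 0, 1, 1, 0],
}
overlap = overlapping_frames(activity)
```

Frames returned by this function are the ones that a frame-level, one-speaker-per-frame labeling cannot represent correctly.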

[0032] In addition, in order to ...

Example 1

[0086] In Example 1, step 2311 may specifically include steps 41-42:

[0087] Step 41: Permute the K pieces of predicted speech separation data corresponding to the g-th sample speech to obtain T permutation sequences of the K pieces of predicted speech separation data.

[0088] Step 42: According to the t-th permutation sequence of the K pieces of predicted speech separation data, establish a correspondence between the K pieces of predicted speech separation data and the K pieces of actual speech separation data corresponding to the g-th sample speech, with the latter arranged in a first order, to obtain the t-th candidate data correspondence for the g-th sample speech (as shown in formula (1)). The "first order" can be preset; t is a positive integer, t≤T, and T is a positive integer.

[0089]

[0090] In the formula, Represents the corresponding rela...
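Steps 41 and 42 can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: the function name `candidate_correspondences` and the string placeholders standing in for speech separation data are hypothetical, and the sketch assumes that every ordering is enumerated (so T = K!) and that the "first order" is simply the order in which the actual data is passed in.

```python
from itertools import permutations


def candidate_correspondences(predicted, actual):
    """Pair each permutation of `predicted` with `actual` kept in a fixed order.

    Returns a list of T candidate correspondences; each candidate is a list
    of (predicted_item, actual_item) pairs.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must both contain K items")
    return [list(zip(perm, actual)) for perm in permutations(predicted)]


# Hypothetical placeholders for K = 3 pieces of separation data.
predicted = ["pred_1", "pred_2", "pred_3"]
actual = ["act_1", "act_2", "act_3"]

candidates = candidate_correspondences(predicted, actual)
num_candidates = len(candidates)  # T = 3! = 6 under the stated assumption
```

Each element of `candidates` plays the role of one "t-th candidate data correspondence"; what is done with the candidates afterwards is outside this sketch.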

Example 2

[0092] In Example 2, step 2311 may specifically include steps 51-52:

[0093] Step 51: Permute the K pieces of actual speech separation data corresponding to the g-th sample speech to obtain T permutation sequences of the K pieces of actual speech separation data.

[0094] Step 52: According to the t-th permutation sequence of the K pieces of actual speech separation data, establish a correspondence between the K pieces of actual speech separation data and the K pieces of predicted speech separation data corresponding to the g-th sample speech, with the latter arranged in a second order, to obtain the t-th candidate data correspondence for the g-th sample speech (as shown in formula (2)). The "second order" can be preset; t is a positive integer, t≤T, and T is a positive integer.

[0095]

[0096] In the formula, Represents the corresponding relationship of the tth candidate data corresponding to the gth sampl...
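The relationship between the two examples can be checked with a short Python sketch. It rests on assumptions that are not stated in the text: all K! orderings are enumerated, the preset orders are the input orders, and the helper names and string placeholders are hypothetical. Under those assumptions, permuting the actual data against fixed predictions (steps 51-52) yields the same set of pairings as permuting the predictions against fixed actual data (steps 41-42), only listed in a different sequence.

```python
from itertools import permutations


def pairings_permuting_predicted(predicted, actual):
    """Steps 41-42 style: permute predictions, keep actual data fixed."""
    return [list(zip(perm, actual)) for perm in permutations(predicted)]


def pairings_permuting_actual(predicted, actual):
    """Steps 51-52 style: permute actual data, keep predictions fixed."""
    return [list(zip(predicted, perm)) for perm in permutations(actual)]


def as_unordered(candidates):
    """Ignore the order of candidates and of pairs inside each candidate."""
    return {frozenset(candidate) for candidate in candidates}


# Hypothetical placeholders for K = 3 pieces of separation data.
predicted = ["pred_1", "pred_2", "pred_3"]
actual = ["act_1", "act_2", "act_3"]

by_predicted = pairings_permuting_predicted(predicted, actual)
by_actual = pairings_permuting_actual(predicted, actual)
same_pairings = as_unordered(by_predicted) == as_unordered(by_actual)
```

The index t therefore refers to different pairings in the two examples, even though the overall collection of candidates is the same.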


Abstract

The invention discloses a speaker separation method and related equipment. In the method, after to-be-separated voice data comprising the voice information of at least one speaker is obtained, the to-be-separated voice data is input into a pre-constructed voice separation model to obtain at least one piece of voice separation data output by the model, where each piece of voice separation data carries the voice information of a different speaker (that is, different pieces of voice separation data record the voice information of different speakers). A speaker separation result of the to-be-separated voice data is then determined from the at least one piece of voice separation data, so that the result accurately represents the voice segment corresponding to each speaker in the to-be-separated voice data. In this way, the adverse effect of failing to accurately recognize the multiple speakers corresponding to overlapping audio data can be effectively avoided, and speaker separation accuracy can therefore be effectively improved.

Description

Technical field

[0001] The present application relates to the technical field of speech processing, in particular to a speaker separation method and related equipment.

Background

[0002] Speaker separation technology classifies each frame of audio data in speech data by speaker and combines the frames belonging to the same speaker into one speech segment. This yields at least one speech segment, each from a different speaker, so that speaker information can later be marked on each speech segment.

[0003] At present, with the development of speaker separation technology, it is used in more and more application scenarios, such as conference content organization and speech transcription.

[0004] However, due to defects in the rela...
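The frame grouping described in paragraph [0002] can be sketched in Python as follows. This is a hedged illustration only: the per-frame speaker labels, the function name `frames_to_segments`, and the choice to treat each contiguous run of frames as one segment are assumptions, not details taken from the application.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Merge contiguous frames with the same speaker label into segments.

    Returns a dict mapping each speaker label to a list of
    (start_frame, end_frame) pairs, with end_frame exclusive.
    """
    segments = {}
    start = 0
    for speaker, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.setdefault(speaker, []).append((start, start + length))
        start += length
    return segments


# Hypothetical labels for six frames and two speakers.
labels = ["A", "A", "B", "B", "B", "A"]
segments = frames_to_segments(labels)
```

Because each frame carries exactly one label here, this representation has no way to express a frame in which several speakers talk at once.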


Application Information

IPC(8): G10L17/02, G10L17/04, G10L17/18, G10L19/02, G10L19/04, G10L21/0272
CPC: G10L17/02, G10L17/04, G10L17/18, G10L19/02, G10L19/04, G10L21/0272
Inventor: 孙磊, 方昕, 吴明辉, 李永超, 刘俊华
Owner: IFLYTEK CO LTD