Voice dereplication method, device thereof, server and storage medium

A voice deduplication technology, applied in the field of Internet technology applications, that addresses two problems of existing methods (ignoring the deep information of voice content, and evaluating similar voices only roughly) and achieves fast and effective deduplication processing.

Active Publication Date: 2018-11-20
WUHAN DOUYU NETWORK TECH CO LTD
7 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

Voice deduplication methods based on hash values ignore the deep information of the voice content and can only roughly evaluate two voices with similar content.

Examples

Embodiment 1

[0027] Figure 1 is a flowchart of a voice deduplication method provided by Embodiment 1 of the present invention. This embodiment is applicable to deduplicating voice, within a large volume of voice data, based on the deep information of the voice content. The method can be performed by a voice deduplication device, which may be implemented in software and/or hardware. As shown in Figure 1, the method of this embodiment specifically includes:

[0028] S110. Obtain the MFCC feature matrix of the target short voice using the Mel-frequency cepstral coefficients (MFCC) algorithm, and convert the MFCC feature matrix into a target image.

[0029] The Mel frequency is defined based on the auditory characteristics of the human ear and has a nonlinear relationship with frequency in Hz. The relevant auditory characteristic is that the human ear has different perception capabilities to speech signals of differe...
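The nonlinear relationship between Mel frequency and Hz can be illustrated with one widely used conversion formula. The text above does not say which formula the embodiment uses, so the constants 2595 and 700 are an assumption (a common convention), and the function names `hz_to_mel` and `mel_to_hz` are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to Mel (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel under the same assumed formula."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


if __name__ == "__main__":
    # Equal steps in Hz shrink on the Mel scale as the frequency rises.
    for f in (0, 1000, 2000, 4000, 8000):
        print(f, round(hz_to_mel(f), 1))
```

Under this formula, a 1000 Hz step near 0 Hz spans far more Mel units than the same step near 8000 Hz, which mirrors the ear's finer resolution at low frequencies.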

Embodiment 2

[0084] Figure 2 is a flowchart of a voice deduplication method provided by Embodiment 2 of the present invention. In this embodiment, on the basis of the above embodiments, converting the MFCC feature matrix into a target image optionally includes: adjusting the row-column ratio of the MFCC feature matrix according to a first preset rule, so that the row-column ratio is the same as the preset aspect ratio of the target image; converting the adjusted MFCC feature matrix into a grayscale image, wherein each element of the adjusted MFCC feature matrix corresponds to one grayscale value in the grayscale image; and converting the grayscale image into an RGB three-primary-color image, which is used as the target image. Further, before adjusting the row-column ratio of the MFCC feature matrix according to the first preset rule, the method optionally further includes: perfor...
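The matrix-to-image conversion described in this embodiment can be sketched in plain Python. This is a minimal illustration, not the patented procedure: the "first preset rule" is not given in the text, so `fit_columns` assumes a simple truncate-or-zero-pad rule, the grayscale mapping assumes min-max scaling to 0..255, and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def fit_columns(matrix: Matrix, n_cols: int, pad: float = 0.0) -> Matrix:
    """Assumed stand-in for the 'first preset rule': truncate or pad each row to n_cols."""
    return [row[:n_cols] + [pad] * max(0, n_cols - len(row)) for row in matrix]


def to_grayscale(matrix: Matrix) -> List[List[int]]:
    """Map every matrix element to one grayscale value in 0..255 (assumed min-max scaling)."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant matrix carries no contrast; render it as black.
        return [[0 for _ in row] for row in matrix]
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]


def to_rgb(gray: List[List[int]]) -> List[List[Tuple[int, int, int]]]:
    """Replicate each grayscale value across the R, G and B channels."""
    return [[(g, g, g) for g in row] for row in gray]


if __name__ == "__main__":
    mfcc = [[-2.0, 0.0, 2.0], [1.0, -1.0, 0.5]]  # toy values, not real MFCCs
    print(to_rgb(to_grayscale(fit_columns(mfcc, 2))))
```

Replicating the grayscale value into three channels keeps the information unchanged while matching the three-channel input that image models commonly expect.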

Embodiment 3

[0106] Figure 3 is a flowchart of a voice deduplication method provided by Embodiment 3 of the present invention. In this embodiment, on the basis of the above embodiments, extracting the target image features of the target image based on a deep learning model and a feature dimensionality reduction algorithm optionally includes: inputting the target image into the deep learning model, performing feature dimensionality reduction through the last fully connected layer, and outputting target image features of a preset dimension, wherein the fully connected layer is configured using the feature dimensionality reduction algorithm. Further, determining the target index of the target image features optionally includes: normalizing the elements in each dimension of the target image features; and, using a second preset rule, performing binary quantization on the normalized elements in each dimension to obtai...
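The normalization and binary quantization step can be sketched as follows. The text is truncated before it states the "second preset rule", so this sketch assumes L2 normalization followed by a fixed threshold (1 if the element exceeds the threshold, otherwise 0); the function names and the default threshold of 0.0 are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(feature: Sequence[float]) -> List[float]:
    """Scale the feature vector to unit length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in feature))
    if norm == 0:
        return [0.0] * len(feature)
    return [x / norm for x in feature]


def binary_index(feature: Sequence[float], threshold: float = 0.0) -> str:
    """Quantize each normalized element to one bit and join the bits into an index string.

    Assumed rule: a bit is '1' when the normalized element is above the threshold.
    """
    normalized = l2_normalize(feature)
    return "".join("1" if x > threshold else "0" for x in normalized)


if __name__ == "__main__":
    # Two feature vectors with the same sign pattern share one index.
    print(binary_index([0.5, -1.2, 3.0, 0.0]))
    print(binary_index([1.0, -4.0, 6.0, 0.0]))
```

Because the index keeps only one bit per dimension, features that differ slightly in magnitude fall into the same bucket, which narrows the set of historical features that need a full comparison.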

Abstract

An embodiment of the invention discloses a voice deduplication method, a device thereof, a server and a storage medium. The voice deduplication method comprises: acquiring an MFCC feature matrix of a target short voice by means of a Mel-frequency cepstral coefficients (MFCC) algorithm, and converting the MFCC feature matrix into a target image; extracting the target image feature of the target image based on a deep learning model and a feature dimensionality reduction algorithm, and determining a target index of the target image feature; and determining, according to the target index, the historical image features corresponding to the historical short voices, and determining whether the target short voice is a repeated voice from the repetition degree between each historical image feature and the target image feature. The method, device, server and storage medium overcome the defects of existing voice deduplication methods, which ignore the deep information of the voice content and only roughly evaluate two voices with similar content, and achieve fast and effective deduplication of voice data based on the voice content.
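The final lookup-and-compare step from the abstract can be sketched as a bucketed store keyed by index. This is an illustration under stated assumptions: the abstract does not define how the repetition degree is computed, so cosine similarity and the 0.9 threshold are assumptions, and `HistoryStore` and its methods are hypothetical names.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed measure of the repetition degree between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class HistoryStore:
    """Groups historical image features by index so only one bucket is compared."""

    def __init__(self) -> None:
        self._buckets: DefaultDict[str, List[Tuple[str, List[float]]]] = defaultdict(list)

    def add(self, index: str, voice_id: str, feature: Sequence[float]) -> None:
        self._buckets[index].append((voice_id, list(feature)))

    def is_duplicate(self, index: str, feature: Sequence[float], threshold: float = 0.9) -> bool:
        """Return True if any historical feature under the same index reaches the threshold."""
        candidates = self._buckets.get(index, [])
        return any(cosine_similarity(feature, hist) >= threshold for _, hist in candidates)


if __name__ == "__main__":
    store = HistoryStore()
    store.add("1010", "voice-1", [0.5, -1.2, 3.0, -0.1])
    print(store.is_duplicate("1010", [0.6, -1.0, 2.9, -0.2]))
```

Restricting the comparison to features that share the target index avoids scanning the full history for every new short voice.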

Description

Technical field

[0001] Embodiments of the present invention relate to the field of Internet technology applications, and in particular to a voice deduplication method, device, server and storage medium.

Background art

[0002] With the rapid development of the Internet industry and the growth of voice information, quickly and accurately deduplicating voice data within massive amounts of information while saving computing resources is currently a difficult problem.

[0003] Existing voice deduplication methods usually calculate the MFCC features of each frame of the voice data, concatenate the per-frame MFCC features into an overall feature of the short voice, calculate a hash index of that feature, and then compare the similarity of the hash values. However, voice deduplication methods based on hash values ignore the deep information of the voice content and can only roughly evaluate two voices with similar content.

Summary of the invention

[0004] The pre...

Application Information

IPC(8): G10L25/24, G10L25/27, G10L25/48, G06K9/62
CPC: G10L25/24, G10L25/27, G10L25/48, G06F18/213
Inventors: 杨小龙, 张文明, 陈少杰
Owner: WUHAN DOUYU NETWORK TECH CO LTD