Corpus noise reduction method and device, electronic equipment and storage medium

Corpus noise reduction technology, applied in speech analysis, speech recognition, instruments, etc., addresses problems such as labelers understanding the corpus differently, noisy data being easily introduced, and labeling accuracy decreasing. Its effects are a lower proportion of noisy corpus and ambiguous information, a lower training frequency, and support for high-quality voice services.

Pending Publication Date: 2021-11-02
BEIJING XIAOMI MOBILE SOFTWARE CO LTD +1

AI Technical Summary

Problems solved by technology

[0003] In practical applications, the corpus of a vertical-domain service model comes from a variety of sources, and expressions differ considerably across application scenarios, so noisy data is easily introduced. When a vertical domain is semantically rich, its semantic boundaries become blurred and ambiguous information appears, which decreases labeling accuracy. Labelers also understand the corpus differently, which introduces noise during labeling.



Examples


Embodiment Construction

[0080] Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The exemplary embodiments described below do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of devices consistent with aspects of the present disclosure as recited in the appended claims.

[0081] In order to solve the above technical problems, embodiments of the present disclosure provide a corpus noise reduction method and device, electronic equipment, and a storage medium. The method can be applied to electronic equipment, which may include but is not limited to a personal computer (PC), a smartphone, a server, or a server cluster. Figure 1 is a flow chart of a corpus noise redu...



Abstract

The invention relates to a corpus noise reduction method and device, electronic equipment, and a storage medium. The method comprises the following steps: acquiring an estimated label distribution of an initial corpus set; obtaining a confidence matrix from the estimated label distribution, where the confidence matrix describes the class-conditional label noise distribution; identifying the noise corpus in the initial corpus set based on the confidence matrix; and processing the noise corpus in the initial corpus set to obtain a target corpus set. In this embodiment, the confidence matrix can be built from the predicted label probabilities and the annotated labels, and the noise corpus in the initial corpus set is identified through the confidence matrix. After the noise corpus is processed, the proportion of noise corpus and ambiguous information in the target corpus is reduced and the boundaries of the target corpus become clearer. The vertical-domain model then needs to be trained less often, which reduces the computing resources and time required for training and improves training efficiency.
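The steps in the abstract can be illustrated with a minimal sketch. The abstract does not state how the confidence matrix is computed, so the per-class mean-probability threshold used here (borrowed from the general confident-learning approach) is an assumption, as are the function name `confidence_matrix`, the toy data, and the choice to process noisy samples by dropping them.

```python
import numpy as np

def confidence_matrix(pred_probs, labels):
    """Count samples by (annotated label, confidently predicted label).

    Assumption: a class counts as "confident" for a sample when its predicted
    probability reaches that class's threshold, taken here as the mean
    predicted probability of the class over samples annotated with it.
    Every class is assumed to have at least one annotated sample.
    """
    pred_probs = np.asarray(pred_probs, dtype=float)
    labels = np.asarray(labels)
    n_classes = pred_probs.shape[1]

    thresholds = np.array(
        [pred_probs[labels == j, j].mean() for j in range(n_classes)]
    )

    # Hide probabilities below their class threshold, then take the best
    # remaining class for each sample (if any remains).
    masked = np.where(pred_probs >= thresholds, pred_probs, -np.inf)
    confident_class = masked.argmax(axis=1)
    has_confident = np.isfinite(masked.max(axis=1))

    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for given, guess, ok in zip(labels, confident_class, has_confident):
        if ok:
            matrix[given, guess] += 1

    # Off-diagonal entries correspond to candidate noisy samples.
    noisy = has_confident & (confident_class != labels)
    return matrix, noisy

# Hypothetical two-class corpus; the last utterance looks mislabeled.
corpus = ["play jazz", "play rock", "rain today?", "sunny tomorrow?", "will it snow"]
labels = [0, 0, 1, 1, 0]
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.05, 0.95]]

matrix, noisy = confidence_matrix(pred_probs, labels)
# One possible way to "process" the noise corpus: drop the flagged samples.
target_corpus = [text for text, bad in zip(corpus, noisy) if not bad]
```

Dropping flagged samples is only one way to process them; relabeling or sending them for manual review would fit the same structure.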

Description

Technical field

[0001] The present disclosure relates to the technical field of corpus noise reduction, and in particular to a corpus noise reduction method and device, electronic equipment, and a storage medium.

Background

[0002] As the semantic understanding ability of intelligent voice assistants has improved, they have become an important application of intelligent human-computer interaction. Existing intelligent voice assistants usually provide services through multi-vertical-domain competition: the assistant sends a request to multiple preset vertical-domain service models; each vertical-domain service model analyzes the request, returns its service to the assistant, and reports the confidence level of that service; the assistant then returns the service with the highest confidence level to the user. Therefore, the quality of service provided by eac...
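The multi-vertical-domain competition described in the background can be sketched as follows. This is a toy illustration only: the `DomainResult` type, the `route` function, and the two keyword-based models are hypothetical stand-ins, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class DomainResult:
    domain: str        # name of the vertical domain
    service: str       # service the domain model offers for the request
    confidence: float  # confidence the model reports for that service

# Hypothetical vertical-domain models: each one scores the request itself.
def music_model(request: str) -> DomainResult:
    score = 0.9 if "play" in request else 0.1
    return DomainResult("music", f"music service for '{request}'", score)

def weather_model(request: str) -> DomainResult:
    score = 0.9 if "weather" in request else 0.2
    return DomainResult("weather", f"forecast for '{request}'", score)

def route(request: str,
          models: List[Callable[[str], DomainResult]]) -> DomainResult:
    """Send the request to every model and keep the most confident answer."""
    results = [model(request) for model in models]
    return max(results, key=lambda result: result.confidence)

best = route("play some jazz", [music_model, weather_model])
```

Because the assistant only compares reported confidences, a vertical-domain model trained on a noisy corpus can win or lose the competition for the wrong reasons, which is why corpus quality matters here.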


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/0208, G10L21/0264, G10L15/20
CPC: G10L21/0208, G10L21/0264, G10L15/20
Inventor: 牛海波
Owner: BEIJING XIAOMI MOBILE SOFTWARE CO LTD