Sample expansion method and device, electronic equipment and storage medium

A sample expansion technique, applied in the field of sample expansion, that addresses problems such as low efficiency, the lack of a standardized intent recognition level, and errors in recognizing customer intent.

Active Publication Date: 2019-08-27
度小满科技(北京)有限公司

AI Technical Summary

Problems solved by technology

However, because individuals differ in their level of understanding and in their professional expertise for a specific business scenario, manually expanding speech samples is often inefficient and subject to chance. The manual approach therefore lacks a standardized level of intent recognition, which may cause the final voice robot to misrecognize customer intent and reduce service quality.

Examples

Embodiment 1

[0100] Embodiment 1 of the present application provides a sample expansion method for expanding the speech samples used to train voice robots to recognize customer intentions. The method is described in detail below with reference to the accompanying drawings.

[0101] See Figure 1, which is a flowchart of the sample expansion method provided in Embodiment 1 of the present application.

[0102] The method can generate new samples from samples with known labels, and the labels of the new samples obtained in this way are also known. It can also determine labels for samples with unknown labels based on the samples with known labels. It can be understood that the samples actually used to train the recognition model are samples with known labels.

[0103] The method described in the embodiment of the present application includes the following steps:

[0104] S101: Determine an original sample from N known labeled samples, and perform word segmentation processing on the ...
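
As a rough illustration of S101 and of the synonym-replacement step summarized in the Abstract, the sketch below builds candidate samples from one original sample. It is a minimal sketch under stated assumptions, not the method claimed in the application: `segment` is a whitespace stand-in for a real word segmenter, and the `SYNONYMS` table, the function names, and the English example sentence are all hypothetical.

```python
from itertools import product

# Hypothetical synonym table (assumption). A real system would derive
# synonyms from a domain-specific vector model or thesaurus.
SYNONYMS = {
    "now": ["currently", "today"],
    "money": ["cash", "funds"],
}


def segment(text):
    """Stand-in for word segmentation (assumption): split on whitespace."""
    return text.split()


def expand_by_synonyms(text, synonyms):
    """Return candidate samples formed by replacing words with synonyms."""
    words = segment(text)
    # For each word, the options are the word itself plus its synonyms.
    options = [[w] + synonyms.get(w, []) for w in words]
    candidates = [" ".join(combo) for combo in product(*options)]
    # Drop the unmodified original; the rest are the samples to be verified.
    return [c for c in candidates if c != text]


original = "no money now"
candidates = expand_by_synonyms(original, SYNONYMS)
```

Each candidate would carry the label of the original sample, which is why the labels of expanded samples are known. The candidates still need to be screened before they are accepted as new samples.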

Embodiment 2

[0130] The method for screening the samples to be verified will be described in detail below in conjunction with the accompanying drawings.

[0131] See Figure 2, which is a flowchart of the method for screening samples to be verified provided in Embodiment 2 of the present application.

[0132] When screening the samples to be verified, the method described in this embodiment of the present application includes the following steps:

[0133] S201: Obtain the word vectors of the original sample and of the i-th sample to be verified, where i = 1, ..., K.

[0134] In the embodiment of the present application, the screening of samples to be verified includes similarity screening and perplexity screening. When performing similarity screening, it is first necessary to obtain word vectors of samples to be verified.

[0135] The text collection of the vertical domain corpus is used for word segmentation training in advance to generate a vector model and a language model. Among them, the vec...
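
The two screening passes named in [0134] can be sketched as follows. This is only an illustration under stated assumptions: the excerpt does not say how sentence vectors are formed, which similarity measure or language model is used, or how thresholds are set, so the averaged word vectors, cosine similarity, add-one-smoothed unigram model, and the `sim_min` and `ppl_max` parameters are all hypothetical choices, as are the toy vectors and corpus.

```python
import math
from collections import Counter


def sentence_vector(words, word_vectors, dim):
    """Average the word vectors of a sentence (unknown words are skipped)."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def unigram_perplexity(words, counts, total, vocab_size):
    """Perplexity under an add-one-smoothed unigram model (assumption)."""
    log_prob = sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in words
    )
    return math.exp(-log_prob / len(words))


def screen(original, candidates, word_vectors, dim, counts, sim_min, ppl_max):
    """Keep candidates that are similar enough and not too perplexing."""
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves mass for unseen words
    ref = sentence_vector(original.split(), word_vectors, dim)
    kept = []
    for cand in candidates:
        words = cand.split()
        sim = cosine(ref, sentence_vector(words, word_vectors, dim))
        ppl = unigram_perplexity(words, counts, total, vocab_size)
        if sim >= sim_min and ppl <= ppl_max:
            kept.append(cand)
    return kept


# Toy vector model and corpus counts (hypothetical data).
word_vectors = {
    "no": [1.0, 0.0],
    "money": [0.0, 1.0],
    "cash": [0.1, 0.9],
    "banana": [-1.0, -1.0],
}
counts = Counter("no money no cash no money now".split())

kept = screen("no money", ["no cash", "no banana"],
              word_vectors, 2, counts, sim_min=0.8, ppl_max=10.0)
```

In this toy run, "no banana" is rejected by the similarity check, while "no cash" passes both checks. In practice the vector model and language model would be trained on the vertical-domain corpus mentioned above.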

Embodiment 3

[0189] The method for screening samples whose labels can be determined from M samples with unknown labels based on N known label samples will be described in detail below with reference to the accompanying drawings.

[0190] See Figure 8, which is a flowchart of another sample expansion method provided in Embodiment 3 of the present application.

[0191] The method described in the embodiment of the present application includes the following steps:

[0192] S301: Obtain the similarity between the j-th unknown-label sample and each of the N known-label samples, where j = 1, ..., M.

[0193] In a possible implementation, for the j-th sample with an unknown label, its similarity to each of the N samples with known labels is obtained, giving N similarities. The M samples with unknown labels therefore require M×N similarities in total.

[0194] Firstly, the word vectors of the M unknown label samples and the N known label samples are obtained.
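
The M×N similarity computation in S301 can be sketched as below. This is a minimal sketch, assuming each sample has already been reduced to a single vector and that cosine similarity is the measure; the excerpt confirms neither, and the function names and toy vectors are hypothetical. How the similarities are then used to decide labels is not shown in this excerpt, so the sketch stops at the matrix.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(unknown_vecs, known_vecs):
    """Return an M x N nested list.

    Entry [j][n] compares unknown-label sample j with known-label sample n.
    """
    return [[cosine(u, k) for k in known_vecs] for u in unknown_vecs]


# Toy sample vectors (hypothetical data).
known = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N = 3 known-label samples
unknown = [[2.0, 0.0], [0.0, 3.0]]            # M = 2 unknown-label samples

sims = similarity_matrix(unknown, known)  # 2 rows, 3 similarities per row
```

Row j of `sims` holds the N similarities for the j-th unknown-label sample, so the whole structure holds M×N values.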

[0195] Th...

Abstract

The invention provides a sample expansion method and device, electronic equipment and a storage medium. The method is applied to the expansion of speech samples and comprises the following steps: determining an original sample from N known-label samples, and performing word segmentation on the original sample to obtain a word segmentation vector, wherein the label represents a customer intention; constructing a synonym vector for each segmented word in the word segmentation vector, and obtaining K samples to be verified by synonym replacement; screening the K samples to be verified to obtain newly added samples; and, based on the N known-label samples, screening out from M unknown-label samples those whose labels can be determined, and adding them to the newly added samples. With the sample expansion method provided by the invention, samples can be expanded from the known-label samples, and labels can be added to unknown-label samples according to the known-label samples, so that the efficiency of sample expansion is improved.

Description

Technical field

[0001] The present application relates to the technical field of speech recognition, and in particular to a sample expansion method, device, electronic equipment and storage medium.

Background

[0002] A customer's words can represent the customer's intention, and different words may correspond to the same intention. For example, "I don't have it now" and "I don't have it here" both correspond to the customer intention "temporarily unable to repay the loan". How to accurately identify the customer's intention from complicated and varied wording has therefore attracted increasing attention.

[0003] Customer intentions have traditionally been identified manually. However, because voice robots are convenient to deploy, offer standardized service, cost little, and cover all working hours, they have gradually attracted the attention of various industries. People hope that voice robots can replace traditional ones. A way to ma...

Application Information

IPC(8): G06F17/27, G06K9/62, G10L15/06, G10L15/10, G10L15/26
CPC: G10L15/063, G10L15/10, G10L2015/0635, G10L15/26, G06F40/284, G06F18/22, G06F18/214
Inventor: 张洪亮, 许庶, 孙振, 周建龙
Owner 度小满科技(北京)有限公司