
Data increment method and device, computer equipment and storage medium

A data increment technology, applied to data increment methods, devices, computer equipment, and storage media, that addresses the imbalance of training text data and the resulting inability to guarantee the accuracy of model training.

Active Publication Date: 2019-08-23
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0003] Embodiments of the present invention provide a data increment method, device, computer equipment, and storage medium to solve the problem that the training text data currently used for text classification model training is unbalanced, so the accuracy of model training cannot be guaranteed.



Examples


Embodiment Construction

[0034] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are some of the embodiments of the present invention, but not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0035] The data increment method provided by the embodiments of the present invention can be applied in a data increment tool that automatically augments the sample classes that are under-represented in a text classification data set, so that the classes become evenly distributed and the accuracy of subsequent text classification improves. Furthermore, this method can also achieve the purpose of increasing the training set, ensure...



Abstract

The invention discloses a data increment method and device, computer equipment, and a storage medium. The method comprises: obtaining scene classification samples corresponding to a specific scene, together with a specified sample proportion; preprocessing the text of the scene classification samples with regular expressions to obtain a to-be-trained text; incrementally training an original word vector model on the to-be-trained text to obtain a target word vector model; determining the actual sample proportion of each classification label from the actual number of samples with that label and the total number of scene classification samples; if the actual sample proportion is smaller than the specified sample proportion, taking the scene classification samples with that label as samples to be incremented; and inputting each sample to be incremented into the target word vector model to obtain its candidate phrases, randomly selecting one target synonym from each candidate phrase to substitute into the sample, and thereby obtaining a first newly added sample. The method can effectively guarantee data balance.
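The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`preprocess`, `labels_to_increment`, `augment`) are hypothetical, the regular expression is an assumed cleaning rule, and a plain dictionary stands in for the nearest-neighbour lookup that the target word vector model would provide.

```python
import random
import re
from collections import Counter


def preprocess(text):
    """Clean a sample with a regular expression and split it into tokens.

    Assumption: punctuation is replaced by spaces and text is lowercased;
    the source does not specify the actual expression.
    """
    return re.sub(r"[^\w\s]", " ", text).lower().split()


def labels_to_increment(samples, specified_ratio):
    """Return the labels whose actual sample proportion is below the
    specified sample proportion.

    samples: list of (text, label) pairs.
    """
    counts = Counter(label for _, label in samples)
    total = len(samples)
    return {label for label, n in counts.items() if n / total < specified_ratio}


def augment(tokens, candidates, rng):
    """Build one new sample by replacing each token that has candidate
    synonyms with one synonym chosen at random.

    candidates: dict mapping a token to its list of synonyms. This is a
    stand-in (assumption) for querying the word vector model.
    """
    return [rng.choice(candidates[t]) if t in candidates else t for t in tokens]


if __name__ == "__main__":
    samples = [
        ("The answer was good.", "good"),
        ("A good, clear reply!", "good"),
        ("Fairly good response", "good"),
        ("The answer was poor.", "poor"),
    ]
    candidates = {"answer": ["reply", "response"], "poor": ["weak", "bad"]}
    rng = random.Random(0)
    minority = labels_to_increment(samples, specified_ratio=0.4)
    for text, label in samples:
        if label in minority:
            print(label, augment(preprocess(text), candidates, rng))
```

Each call to `augment` yields one additional sample for an under-represented label; calling it repeatedly until the label's proportion reaches the specified value is one plausible way to reach balance, though the source excerpt does not describe the stopping rule.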

Description

Technical field

[0001] The present invention relates to the technical field of data increment, in particular to a data increment method, device, computer equipment and storage medium.

Background technique

[0002] In text classification scenarios, data imbalance is a very common problem. Take intelligent interview scenarios: most candidates give moderate or good answers to present themselves well, and rarely give poor answers. Therefore, when candidates' answers are scored automatically in an intelligent interview, there are usually many moderate and good answer samples but very few poor ones. The samples are extremely unbalanced, which lowers the accuracy of models trained on them.

Contents of the invention

[0003] Embodiments of the present invention provide a data increment method, device, computer equipment, and storage medium to solve the problem that th...


Application Information

IPC(8): G06F16/35, G06F17/27
CPC: G06F16/35, G06F40/247
Inventor: 郑立颖, 徐亮, 阮晓雯
Owner: PING AN TECH (SHENZHEN) CO LTD