Annotation data quality evaluation method and device, computer device and storage medium

A technology for evaluating the quality of labeled data, applied in the field of data processing. It addresses omissions in manual review and the difficulty of judging labeling quality. Its stated effects are lower costs, higher evaluation accuracy, and better efficiency.

Active Publication Date: 2020-02-21
DATAGRAND TECH INC
Cites: 3 | Cited by: 7

AI Technical Summary

Problems solved by technology

Omissions often occur in manual review. When a labeled sample contains a large amount of data, manual inspection has particular difficulty judging the labeling quality of each labeled text in the sample quickly and accurately.

Method used

Acquire at least one labeled sample to be processed, perform labeling accuracy analysis and labeling consistency analysis on it, and generate a labeling quality evaluation result from the accuracy and consistency analysis results.

Examples

Embodiment 1

[0026] Figure 1 is a flow chart of a method for evaluating the quality of labeled data in Embodiment 1 of the present invention. This embodiment applies to evaluating the labeling quality of the labeled text in a labeled sample. The method can be performed by the labeled data quality evaluation device provided by the embodiments of the present invention. The device can be implemented in software and/or hardware and can generally be integrated into a computer device, such as a terminal device or a server. As shown in Figure 1, the method of this embodiment specifically includes:

[0027] S110. Acquire at least one labeled sample to be processed.

[0028] Specifically, the labeled sample serves as the carrier of the labeled text. It may be plain text, a document, text recognized from an image, text recognized from audio, and the like.

[0029] Usually, a specific field is labeled in a piece of text, and the text labeled with ...

Embodiment 2

[0069] Figure 2a is a flow chart of a method for evaluating the quality of labeled data in Embodiment 2 of the present invention. This embodiment builds on the embodiments above and specifies how the labeling accuracy of the at least one labeled sample is analyzed. First, obtain the original text that matches the labeled sample; the original text contains no labeled data. Next, label the original text with a pre-trained model to obtain predicted labeled data. Finally, compare the labeled data to be evaluated in the labeled sample with the predicted labeled data to obtain an accuracy analysis result for the labeled sample. The embodiment likewise specifies how the labeling consistency of the at least one labeled sample is analyzed. The labeled data to be evaluated in the at least one labeled sample is classified into at least one class, each class including at least one initial labeled text, and consistency analysis is performed on the initial labeled text of ea...
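The accuracy analysis above compares the labeled data under evaluation with a pre-trained model's predictions, but the source does not say which comparison metric it uses. The following is a minimal sketch built on three assumptions. Each labeled field is a character span with a label. The model's predictions serve as the reference. Only exact span-and-label matches count. `Span` and `accuracy_analysis` are hypothetical names.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A labeled field: half-open character offsets [start, end) plus a label (hypothetical format)."""
    start: int
    end: int
    label: str


def accuracy_analysis(evaluated: list[Span], predicted: list[Span]) -> dict[str, float]:
    """Compare labels under evaluation against pre-trained model predictions.

    Assumption: predictions are the reference, and a labeled span counts as
    correct only if its start, end and label all match a predicted span.
    """
    eval_set, pred_set = set(evaluated), set(predicted)
    matched = len(eval_set & pred_set)
    # An empty side means there is nothing to get wrong on that side.
    precision = matched / len(eval_set) if eval_set else 1.0
    recall = matched / len(pred_set) if pred_set else 1.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    evaluated = [Span(0, 4, "NAME"), Span(10, 18, "DATE")]
    predicted = [Span(0, 4, "NAME"), Span(10, 18, "DATE"), Span(25, 30, "ORG")]
    print(accuracy_analysis(evaluated, predicted))
```

Here every evaluated span matches a prediction, so precision is 1.0. The sample misses the `ORG` span that the model predicted, so recall is 2/3 and F1 is about 0.8. Exact matching is the strictest choice. A token-overlap criterion would give partial credit to near misses.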

Embodiment 3

[0122] Figure 3 is a schematic diagram of a labeled data quality evaluation device in Embodiment 3 of the present invention. Embodiment 3 is the device corresponding to the method for evaluating the quality of labeled data provided by the above embodiments of the present invention. The device can be implemented in software and/or hardware and can generally be integrated into a computer device.

[0123] Correspondingly, the device of this embodiment may include:

[0124] An annotated sample acquisition module 310, configured to acquire at least one annotated sample to be processed;

[0125] An annotation accuracy analysis module 320, configured to perform an annotation accuracy analysis on the at least one annotated sample;

[0126] An annotation consistency analysis module 330, configured to perform an annotation consistency analysis on the at least one annotated sample;

[0127] An annotation quality evaluation result determining module 340, config...
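Module 330 performs the consistency analysis that Embodiment 2 describes: the labeled data is grouped into classes, and the initial labeled texts of each class are analyzed. The rest of that description is truncated. The following is a minimal sketch under one plausible reading, which is an assumption. Each class collects every occurrence of the same labeled text, and a class is consistent when all of its occurrences carry the same label. `consistency_analysis` and `overall_consistency` are hypothetical names.

```python
from collections import Counter, defaultdict


def consistency_analysis(labeled: list[tuple[str, str]]) -> dict[str, float]:
    """Score how consistently each distinct labeled text is annotated.

    `labeled` holds (text, label) pairs gathered across all samples (hypothetical format).
    Occurrences are grouped by text; a group's score is the share of its
    occurrences that carry the group's most common label.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for text, label in labeled:
        groups[text].append(label)
    scores = {}
    for text, labels in groups.items():
        _, majority_count = Counter(labels).most_common(1)[0]
        scores[text] = majority_count / len(labels)
    return scores


def overall_consistency(labeled: list[tuple[str, str]]) -> float:
    """Occurrence-weighted average of the per-text scores (1.0 when nothing is labeled)."""
    if not labeled:
        return 1.0
    counts = Counter(text for text, _ in labeled)
    scores = consistency_analysis(labeled)
    return sum(scores[text] * n for text, n in counts.items()) / len(labeled)


if __name__ == "__main__":
    pairs = [("Beijing", "LOC"), ("Beijing", "LOC"), ("Beijing", "ORG"), ("2020-02-21", "DATE")]
    scores = consistency_analysis(pairs)
    print(scores)
    print([text for text, score in scores.items() if score < 1.0])  # texts to re-check
    print(overall_consistency(pairs))
```

In this example, "Beijing" is labeled `LOC` twice and `ORG` once. It scores 2/3 and is flagged for review. The date scores 1.0, and the occurrence-weighted overall score is 0.75.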

Abstract

The embodiments of the invention disclose an annotation data quality evaluation method and device, a computer device, and a storage medium. The method comprises the following steps: acquiring at least one annotation sample to be processed; performing annotation accuracy analysis on each of the at least one annotation sample; performing annotation consistency analysis on the at least one annotation sample; and generating an annotation quality evaluation result for the annotation sample according to the accuracy analysis result and the consistency analysis result. The embodiments of the invention can accurately evaluate the quality of annotation data, reduce labor cost, and improve evaluation efficiency.
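The final step above combines the accuracy and consistency results into one evaluation result, but the source does not say how. The following is a minimal sketch. It assumes that both results are scores in [0, 1] and that they are combined by a weighted average, which is then checked against a pass threshold. `evaluate_quality`, `QualityResult`, `accuracy_weight`, and `threshold` are hypothetical names. The default values are illustrative and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class QualityResult:
    accuracy: float
    consistency: float
    score: float
    passed: bool


def evaluate_quality(accuracy: float, consistency: float,
                     accuracy_weight: float = 0.5, threshold: float = 0.9) -> QualityResult:
    """Combine the two analysis results into one labeling quality evaluation result.

    Assumption: a weighted average of the two scores, compared against a pass threshold.
    """
    if not 0.0 <= accuracy_weight <= 1.0:
        raise ValueError("accuracy_weight must be in [0, 1]")
    score = accuracy_weight * accuracy + (1.0 - accuracy_weight) * consistency
    return QualityResult(accuracy, consistency, score, score >= threshold)


if __name__ == "__main__":
    print(evaluate_quality(0.8, 0.75))
    print(evaluate_quality(1.0, 0.9))
```

A weighted average keeps the two analyses independent and lets a team give more weight to whichever error type matters more. A stricter policy could instead require each score to clear its own threshold.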

Description

Technical field

[0001] The embodiments of the present invention relate to the field of data processing, and in particular to a method, device, computer device, and storage medium for evaluating the quality of labeled data.

Background

[0002] At present, pre-trained models are usually used to recognize text in the field of text recognition. When such a model is trained, the labeling quality of the text data is critical, because only high-quality labeled data can produce a high-quality model.

[0003] Existing training samples can be labeled manually or automatically, but the labels may be wrong. For example, data that should not be labeled may be labeled. Data that should be labeled may be omitted. Labeled data may overlap the standard data only partially.

[0004] Typically, labeled samples are checked by manual review. Omissions often occur in manual review, especially when the annotated samples contain a large amount of data, because manual inspection has difficulty judging the labeling quality quickly and...
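The three error types listed in [0003] can be distinguished automatically once the labeled data and the standard data are both expressed as character spans. The following is a minimal sketch. It assumes half-open `[start, end)` offsets, and any overlap counts as a candidate match. `Span`, `overlaps`, and `classify_errors` are hypothetical names. The `standard` list could equally hold the pre-trained model's predictions that Embodiment 2 compares against.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open character ranges share at least one character."""
    return a.start < b.end and b.start < a.end


def classify_errors(labeled: list[Span], standard: list[Span]) -> dict[str, list[Span]]:
    """Sort labeling errors into the three kinds named in [0003].

    - "spurious": a labeled span that touches no standard span
    - "partial": a labeled span that overlaps a standard span without matching it exactly
    - "omitted": a standard span that no labeled span touches
    Exact matches are not errors and are left out.
    """
    std = set(standard)
    errors: dict[str, list[Span]] = {"spurious": [], "partial": [], "omitted": []}
    for span in labeled:
        if span in std:
            continue
        if any(overlaps(span, s) for s in standard):
            errors["partial"].append(span)
        else:
            errors["spurious"].append(span)
    for s in standard:
        if not any(overlaps(s, span) for span in labeled):
            errors["omitted"].append(s)
    return errors


if __name__ == "__main__":
    standard = [Span(0, 4, "NAME"), Span(10, 18, "DATE"), Span(25, 30, "ORG")]
    labeled = [Span(0, 4, "NAME"), Span(10, 14, "DATE"), Span(40, 45, "ORG")]
    print(classify_errors(labeled, standard))
```

In this sketch, a span with the right offsets but the wrong label lands in "partial". A stricter version could report label mismatches as a fourth category.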

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/00; G06K9/62
CPC: G06V30/40; G06F18/214; Y02P90/30
Inventors: 章逸骋, 陈运文, 高翔, 王江, 陈宇, 纪达麒
Owner: DATAGRAND TECH INC