Artificial intelligence data labeling method and device

An artificial intelligence data labeling technology, applied in the field of data processing, addresses three problems: the labeling standard is difficult to keep consistent, annotators and reviewers exert a large subjective influence, and accuracy is low. Its effect is to reduce labeling errors.

Active Publication Date: 2019-09-20
CHINA ACADEMY OF INFORMATION & COMM

AI Technical Summary

Problems solved by technology

[0004] High labor costs for data labeling: AI algorithm training requires a large number of labeled samples, and today's massive data labeling tasks are carried out manually. As the saying goes, "the amount of intelligence equals the amount of labor", so building data sets is expensive;
[0005] The quality of data labeling is difficult to guarantee: labeling tasks are subject to the subjective judgment of labelers and reviewers, which introduces labeling errors and makes data consistency difficult to guarantee;
[0006] The threshold for labeling professional datasets is high: datasets in specialized fields such as medical care, education, and telecommunications networks must be labeled by professionals in those fields, and a consistent labeling standard is also difficult to maintain;
[0007] It can be seen that labeling artificial intelligence data is costly and its accuracy is low.

Examples

Embodiment 1

[0066] See Figure 2, a schematic diagram of the artificial intelligence data labeling process in an embodiment of this application. The specific steps are:

[0067] Step 201, acquiring a data set to be labeled.

[0068] Step 202, based on the established AI model, obtain the AI label with the highest probability score for each piece of data to be labeled, and the corresponding probability score.

[0069] In a specific implementation, one or more established AI models can be used to obtain the AI label with the highest probability score for each piece of data to be labeled and the corresponding probability score.
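As an illustration of step 202 for a single model, the following minimal sketch picks the label with the highest probability score. The function name `top_label` and the dictionary of per-label scores are assumptions made for this example; the text does not specify how the model exposes its scores.

```python
from typing import Dict, Tuple


def top_label(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the label with the highest probability score, and that score.

    `scores` maps each candidate AI label to the probability score the
    model assigned to it (an assumed input format, not from the source).
    """
    if not scores:
        raise ValueError("scores must contain at least one label")
    # max() over the keys, ranked by their score, gives the top-1 label.
    label = max(scores, key=scores.get)
    return label, scores[label]


if __name__ == "__main__":
    example_scores = {"cat": 0.7, "dog": 0.2, "bird": 0.1}
    print(top_label(example_scores))
```

The pair returned here corresponds to the "AI label" and "probability score" that the later steps compare against the first preset threshold.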

[0070] Taking M AI models as an example, obtaining the AI label with the highest probability score for each piece of data to be labeled, together with that probability score, based on the established AI models includes:

[0071] Obtain the probability score corresponding to each AI label corresponding to the model based on the established M A...

Embodiment 2

[0089] See Figure 3, a schematic flowchart of using data labeled by the AI model as data samples for training the AI model in an embodiment of this application. The specific steps are:

[0090] Step 301, acquiring a data set to be labeled.

[0091] Step 302, based on the established AI model, obtain the AI label with the highest probability score for each piece of data to be labeled, and the corresponding probability score.

[0092] Step 303, for any data to be labeled, determine whether the probability score is greater than a first preset threshold.

[0093] Step 304, when it is determined that the probability score is greater than the first preset threshold and that the data to be labeled is to be spot-checked, label the data with a manual label.

[0094] Step 305, determine whether the manual label is consistent with the acquired AI label; if yes, perform step 309; otherwise, perform step 308.

[0095] Step 306, whe...

Embodiment 3

[0103] See Figure 4, a schematic flowchart of determining whether to update the first preset threshold according to the accuracy rate in an embodiment of this application. The specific steps are:

[0104] Step 401, acquiring a data set to be labeled.

[0105] Step 402, based on the established AI model, obtain the AI label with the highest probability score for each piece of data to be labeled, and the corresponding probability score.

[0106] Step 403, for any data to be labeled, determine whether the probability score is greater than a first preset threshold.

[0107] Step 404, when it is determined that the probability score is greater than the first preset threshold and that the data to be labeled is to be spot-checked, label the data with a manual label, record whether the manual label is consistent with the acquired AI label, and perform step 406.

[0108] Step 405, when it is determined that the probability s...
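Step 404 records, for each spot-checked item, whether the manual label matches the AI label. A minimal sketch of turning those records into an accuracy rate is shown here. The function name `spot_check_accuracy` and the list-of-pairs record format are assumptions for illustration, and the rule for updating the first preset threshold is deliberately left out because the text available here does not state it.

```python
from typing import Iterable, Tuple


def spot_check_accuracy(records: Iterable[Tuple[str, str]]) -> float:
    """Fraction of spot-checked items whose manual label equals the AI label.

    Each record is a (manual_label, ai_label) pair; this record format is
    an assumption for the example, not something the source specifies.
    """
    records = list(records)
    if not records:
        raise ValueError("at least one spot-checked record is required")
    agreements = sum(1 for manual, ai in records if manual == ai)
    return agreements / len(records)


if __name__ == "__main__":
    checked = [("cat", "cat"), ("dog", "cat"), ("bird", "bird"), ("dog", "dog")]
    print(spot_check_accuracy(checked))
```

Comparing this rate against a target value would be one way to decide whether the threshold needs updating, but that comparison is an assumption and not taken from the source.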

Abstract

The invention provides an artificial intelligence data labeling method and device. The method comprises: acquiring a data set to be labeled; obtaining, based on the established AI model, the AI label with the highest probability score for each piece of data to be labeled, together with that probability score; for any piece of data to be labeled, determining whether the probability score is greater than a first preset threshold; when the probability score is greater than the first preset threshold and the data is selected for spot-checking, or when the probability score is not greater than the first preset threshold, labeling the data with a manual label; and when the probability score is greater than the first preset threshold and the data is not selected for spot-checking, labeling the data with the acquired AI label that has the highest probability score. The method saves manual labeling cost and implementation time, and reduces labeling errors caused by human subjectivity and by differences in the technical backgrounds of labeling personnel.
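The three branches described in the abstract can be sketched as a single routing function. This is a minimal sketch under stated assumptions: the names `route_label`, `sample_rate`, and `ask_human` are hypothetical, and random sampling at a fixed rate is only one possible way to decide which high-score items are spot-checked, since the abstract does not say how that decision is made.

```python
import random
from typing import Callable, Optional, Tuple


def route_label(
    ai_label: str,
    score: float,
    threshold: float,
    sample_rate: float,
    ask_human: Callable[[], str],
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Return (final_label, source), where source is "ai" or "manual".

    threshold   -- the first preset threshold from the abstract
    sample_rate -- assumed probability of spot-checking a high-score item
    ask_human   -- assumed callback that returns the manual label
    """
    rng = rng or random.Random()
    if score > threshold:
        # High score: spot-check a sample, accept the AI label otherwise.
        if rng.random() < sample_rate:
            return ask_human(), "manual"
        return ai_label, "ai"
    # Score not greater than the threshold: always label manually.
    return ask_human(), "manual"


if __name__ == "__main__":
    print(route_label("cat", 0.95, 0.9, 0.0, lambda: "dog"))
    print(route_label("cat", 0.50, 0.9, 0.0, lambda: "dog"))
```

With `sample_rate` set to 0.0 no high-score item is spot-checked, and with 1.0 every one is, which makes the branches easy to exercise in isolation.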

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to an artificial intelligence data labeling method and device.

Background

[0002] With the rapid development of technologies such as the Internet, machine learning, big data, and cloud computing, all kinds of information data continue to grow at an exponential rate. In the big data era, artificial intelligence, relying on massive data, has already empowered multiple industries and given rise to a variety of industry applications.

[0003] At present, most of the machine learning and deep learning algorithms that artificial intelligence relies on are data-dependent, requiring a large amount of data to train algorithms in a supervised or semi-supervised manner for customized deployment. Because of the huge volume of big data in China, the complex data types and high data dimensions of various industries pose a huge challenge to the data labeling task. In ge...

Application Information

IPC(8): G06N20/00
CPC: G06N20/00
Inventor: 吕博
Owner: CHINA ACADEMY OF INFORMATION & COMM