Active learning big data labeling method and system

An active learning and big data technology, applied in the field of large-scale machine learning, that addresses the low accuracy of big data anchor point labeling.

Active Publication Date: 2016-11-30
广州图普网络科技有限公司

AI Technical Summary

Problems solved by technology

[0004] In view of the low accuracy of big data anchor point labeling in the prior art, it is necessary to provide an active learning big data labeling method and system.



Examples


Embodiment approach

[0055] As a specific implementation, the active learning big data labeling method also includes the following steps:

[0056] Use the kernel matrix K to apply a nonlinear mapping to the data points, and obtain the distances between data points after the nonlinear mapping.
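The distance after a kernel-induced nonlinear mapping can be computed from the kernel matrix alone, using the standard identity ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij. The following is a minimal sketch of that step. The Gaussian (RBF) kernel, the parameter gamma, and the function names are assumptions chosen for illustration; the source does not state which kernel is used.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix. The kernel choice is an assumption."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def kernel_distances(K):
    """Pairwise distances in the feature space induced by kernel matrix K.

    Uses ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij.
    """
    diag = np.diag(K)
    sq = diag[:, None] + diag[None, :] - 2.0 * K
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(sq, 0.0))

# Three hypothetical data points in the plane.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
D = kernel_distances(rbf_kernel(X, gamma=0.5))

# With a linear kernel K = X X^T the same identity gives Euclidean distances.
D_linear = kernel_distances(X @ X.T)
```

With the linear kernel the result reduces to ordinary Euclidean distance, which is a convenient sanity check on the identity.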

[0057] Using a greedy sequential method, the anchor data set used for active learning is determined according to the following formula:

[0058] z_t ∈ X and [formula missing in source]

[0059] Here Z_{t-1} = {z_1, …, z_{t-1}} is the set of t-1 anchor points assumed to have been determined already, z_i = x_{p(i)}, where p denotes the subscript correspondence, and the formula determines the t-th anchor point.

[0060] [formula missing in source]

[0061] Initialize Z = ∅ and compute the coefficients sequentially for t = 1, …, m. In each step, part of the variables is held fixed while the rest are computed and then updated by the proximal point method, and the z_t that attains the minimum is selected. Here Tr(·) denotes the trace of a matrix, and one of the symbols denotes the p(i)-th row of the kernel matrix K; the symbols themselves are missing in the source.
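The greedy sequential structure described above (start from an empty set, then add one anchor at a time, each chosen to minimize an objective) can be sketched as follows. The patent's actual objective and its proximal point update are not recoverable from this text, so this sketch substitutes a different, well-known trace objective: the Nyström-style kernel reconstruction error Tr(K - K[:, Z] K[Z, Z]^+ K[Z, :]). The function names, the RBF kernel, and the toy data are all assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix. The kernel choice is an assumption."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def reconstruction_error(K, idx):
    """Stand-in objective: Tr(K - K[:, idx] K[idx, idx]^+ K[idx, :])."""
    K_nz = K[:, idx]
    K_zz = K[np.ix_(idx, idx)]
    approx = K_nz @ np.linalg.pinv(K_zz) @ K_nz.T
    return float(np.trace(K - approx))

def greedy_anchor_selection(K, m):
    """Pick m anchor indices one at a time, each minimizing the objective."""
    n = K.shape[0]
    selected = []  # Z starts empty
    for _ in range(m):
        best_idx, best_val = None, np.inf
        for j in range(n):
            if j in selected:
                continue
            val = reconstruction_error(K, selected + [j])
            if val < best_val:
                best_idx, best_val = j, val
        selected.append(best_idx)
    return selected

# Hypothetical 1-D data forming two well-separated clusters of three points.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
K = rbf_kernel(X)
anchors = greedy_anchor_selection(K, m=2)
```

On this toy data the two selected anchors fall in different clusters, since a second anchor from the same cluster would reduce the reconstruction error very little. The exhaustive inner loop is kept for readability and is not intended for large data sets.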

[006...


Abstract

The present invention relates to an active learning big data labeling method and system. The method comprises: linearly reconstructing each data point from the anchor data set to be labeled within the data set to be labeled; calculating the distances between data points; using the distances as weights on the reconstruction parameters to construct a regularization term, wherein the distances are inversely proportional to the reconstruction parameters; constructing a data labeling model and applying corresponding processing and correction to it; and solving the optimization problem to determine the anchor data for active learning. Because the distances are inversely proportional to the reconstruction parameters, the data labeling model is sensitive to the distances between data points. During optimization, the magnitude of the infinity norm makes it easier to determine whether a given data point is representative, so the anchor data set for active learning can be screened accurately from the data set to be labeled, which improves the accuracy of big data anchor labeling.
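The final screening step, judging representativeness by the size of the infinity norm, can be illustrated with a short sketch. It assumes a coefficient matrix A has already been obtained from the optimization, with A[j, i] holding the coefficient of candidate j in the reconstruction of data point i; that layout, the function name, and the hand-written values below are assumptions, not taken from the source.

```python
import numpy as np

def rank_by_row_infinity_norm(A, m):
    """Return the indices of the m rows of A with the largest infinity norm.

    A row whose largest absolute coefficient is near zero contributes to
    no reconstruction, so its candidate is treated as unrepresentative.
    """
    row_norms = np.max(np.abs(A), axis=1)
    order = np.argsort(-row_norms, kind="stable")
    return order[:m].tolist(), row_norms

# Hypothetical coefficients: candidates 0 and 2 carry most of the weight.
A = np.array([
    [0.9, 0.8, 0.0, 0.1],
    [0.0, 0.05, 0.02, 0.0],
    [0.1, 0.0, 1.0, 0.7],
    [0.0, 0.1, 0.0, 0.03],
])
anchors, norms = rank_by_row_infinity_norm(A, m=2)
```

Here the row norms are 0.9, 0.05, 1.0 and 0.1, so candidates 2 and 0 are kept as anchors.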

Description

Technical field

[0001] The invention relates to the technical field of large-scale machine learning, in particular to an active learning big data labeling method and system.

Background

[0002] With the advent of the era of big data, and especially the development of Internet technology, machine learning applications face an ever-increasing amount of data. Traditional supervised learning methods give better results than semi-supervised learning methods, but they often require a large amount of labeled data to do so. Although the big data era makes it easy for machine learning tasks to obtain large amounts of data, obtaining accurately labeled data still requires a lot of manpower and material resources. Active learning technology in the field of large-scale machine learning can select the most valuable data from massive unlabeled samples for labeling, which can greatly...


Application Information

IPC(8): G06N5/02, G06N99/00
CPC: G06N5/025, G06N20/00
Inventor: 李明强
Owner: 广州图普网络科技有限公司