A learning method for multi-label learning based on hashing method

A multi-label learning method, applied in special data processing applications, instruments, unstructured text data retrieval, and similar fields. It addresses the high-dimensional, sparse label space, reduces time and space complexity, improves accuracy, and increases scalability.

Active Publication Date: 2018-09-11
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0011] The purpose of the present invention is to solve the problems encountered when multi-label learning is applied to large-scale data, and to propose a hash-based learning method for multi-label learning. The method combines a hash algorithm with a multi-label learning algorithm based on Bayesian statistics and uses the correlation between labels to improve the prediction performance of the multi-label learning model. The MinHash algorithm is used to address the label space being high-dimensional and sparse in multi-label learning on large-scale data. Locality-sensitive hashing (LSH) is used for neighbor search so that learning remains feasible on large-scale data.
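As an illustration of how MinHash can compress a sparse, high-dimensional label set into a short signature, here is a minimal sketch. It is not taken from the patent: the function names, the use of explicit random permutations of the label indices, and the parameter `num_hashes` are all assumptions made for the example. Production implementations usually replace the explicit permutations with cheap hash functions.

```python
import random


def make_permutations(num_labels, num_hashes, seed=0):
    """Return num_hashes random permutations of the label indices 0..num_labels-1.

    Explicit permutations follow the textbook definition of MinHash; they are
    an illustrative choice here, not something the source specifies.
    """
    rng = random.Random(seed)
    return [rng.sample(range(num_labels), num_labels) for _ in range(num_hashes)]


def minhash_signature(label_set, permutations):
    """Signature entry h is the smallest permuted index among the set's labels.

    label_set must be a non-empty set of integer label indices.
    """
    return [min(perm[y] for y in label_set) for perm in permutations]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature entries estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    perms = make_permutations(num_labels=50, num_hashes=256, seed=1)
    sig_a = minhash_signature({1, 2, 3, 4}, perms)
    sig_b = minhash_signature({3, 4, 5, 6}, perms)
    # The true Jaccard similarity of the two sets is 2/6.
    print(estimate_jaccard(sig_a, sig_b))
```

The signature length depends only on `num_hashes`, not on the size of the label space, which is why this kind of sketch is attractive when the label space is large and each sample carries only a few labels.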

Examples

Embodiment Construction

[0031] The invention will be further described in detail below with reference to the accompanying drawings.

[0032] As shown in Figure 2, the present invention provides a learning method for multi-label learning based on a hash method. The specific implementation steps of the method are as follows:

[0033] (1) Label correlation extension

[0034] In multi-label learning algorithms based on Bayesian statistics, an important step is calculating the posterior probability. Given a multi-label training set $D=\{(x_i, Y_i) \mid 1 \le i \le m\}$ and a test sample $x$, where $Y_i$ is the label set vector of the corresponding sample $x_i$, the posterior probability for the $j$-th class $y_j$ ($1 \le j \le q$) is calculated from Bayes' theorem as follows:

[0035]

[0036] Here, $H_j$ denotes the event that $x$ has class label $y_j$, and $P(H_j \mid C_j)$ denotes the posterior probability that $H_j$ holds when $C_j$ samples in $N(x)$ have class label $y_j$...
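To make the role of $P(H_j \mid C_j)$ concrete, the following sketch estimates it in the style of the well-known ML-kNN algorithm, using $P(H_j \mid C_j) = P(H_j)P(C_j \mid H_j) / [P(H_j)P(C_j \mid H_j) + P(\neg H_j)P(C_j \mid \neg H_j)]$. It rests on several assumptions that the text above does not confirm: $N(x)$ is taken to be the $k$ nearest neighbors under Euclidean distance, probabilities are estimated by frequency counting with a smoothing constant `s`, and all function names are hypothetical. It does not include the label correlation extension described in this step.

```python
import numpy as np


def knn_indices(X, x, k, exclude=None):
    """Indices of the k training samples closest to x (brute force, Euclidean)."""
    dist = np.linalg.norm(X - x, axis=1)
    if exclude is not None:
        dist[exclude] = np.inf  # leave a training sample out of its own neighborhood
    return np.argsort(dist)[:k]


def fit_tables(X, Y, k, s=1.0):
    """Estimate P(H_j) and the likelihoods P(C_j = c | H_j), P(C_j = c | not H_j)."""
    m, q = Y.shape
    prior = (s + Y.sum(axis=0)) / (2 * s + m)
    kappa_pos = np.zeros((q, k + 1))  # samples WITH label j, by neighbor count c
    kappa_neg = np.zeros((q, k + 1))  # samples WITHOUT label j, by neighbor count c
    for i in range(m):
        counts = Y[knn_indices(X, X[i], k, exclude=i)].sum(axis=0)
        for j in range(q):
            if Y[i, j] == 1:
                kappa_pos[j, counts[j]] += 1
            else:
                kappa_neg[j, counts[j]] += 1
    like_pos = (s + kappa_pos) / (s * (k + 1) + kappa_pos.sum(axis=1, keepdims=True))
    like_neg = (s + kappa_neg) / (s * (k + 1) + kappa_neg.sum(axis=1, keepdims=True))
    return prior, like_pos, like_neg


def posterior(X, Y, x, k, tables):
    """Return P(H_j | C_j) for every label j, given the neighbors of x."""
    prior, like_pos, like_neg = tables
    counts = Y[knn_indices(X, x, k)].sum(axis=0)  # C_j for the test sample
    j = np.arange(Y.shape[1])
    num = prior * like_pos[j, counts]
    den = num + (1.0 - prior) * like_neg[j, counts]
    return num / den


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                  [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
    Y = np.array([[1, 0]] * 4 + [[0, 1]] * 4, dtype=int)
    tables = fit_tables(X, Y, k=2)
    print(posterior(X, Y, np.array([0.5, 0.5]), k=2, tables=tables))
```

A label would typically be predicted for $x$ when its posterior exceeds 0.5. The brute-force neighbor search in `knn_indices` is the part that becomes expensive on large data.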

Abstract

The invention discloses a multi-label learning method based on hashing. By combining a hashing algorithm with a multi-label learning algorithm based on Bayesian statistics, the correlation between labels is used to improve the prediction performance of the multi-label learning model. Using the characteristics of nearest neighbors, a label and the neighbors of that label are introduced into the computation of the posterior probability, so that the correlation between labels is fully considered and the accuracy of the algorithm is improved. The MinHash algorithm addresses the problem that the label space in multi-label learning on large-scale data is high-dimensional and sparse. Locality-sensitive hashing (LSH) is used to find neighbors quickly and efficiently, which makes learning on large-scale data feasible and improves the scalability of the multi-label learning algorithm.
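The neighbor search mentioned in the abstract can be illustrated with a small LSH index. The abstract does not say which LSH family is used, so this sketch assumes random-hyperplane hashing over real-valued feature vectors; the class name `HyperplaneLSH` and the parameters `num_bits` and `num_tables` are hypothetical. Candidates that share a bucket with the query are re-ranked by exact Euclidean distance.

```python
from collections import defaultdict

import numpy as np


class HyperplaneLSH:
    """Approximate neighbor search with random-hyperplane hash tables (illustrative)."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        # One set of num_bits random hyperplanes per hash table.
        self.planes = rng.standard_normal((num_tables, num_bits, dim))
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.data = None

    def _keys(self, x):
        # Each bit records on which side of a hyperplane the vector falls.
        bits = (self.planes @ x) > 0  # shape: (num_tables, num_bits)
        return [row.tobytes() for row in bits]

    def index(self, X):
        """Store the data and place every sample into one bucket per table."""
        self.data = np.asarray(X, dtype=float)
        for i, x in enumerate(self.data):
            for table, key in zip(self.tables, self._keys(x)):
                table[key].append(i)

    def query(self, x, k):
        """Return up to k indices of stored samples near x, closest first."""
        x = np.asarray(x, dtype=float)
        candidates = set()
        for table, key in zip(self.tables, self._keys(x)):
            candidates.update(table.get(key, []))
        if not candidates:
            return []
        cand = np.fromiter(candidates, dtype=int)
        dist = np.linalg.norm(self.data[cand] - x, axis=1)
        return cand[np.argsort(dist)[:k]].tolist()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((200, 16))
    lsh = HyperplaneLSH(dim=16)
    lsh.index(X)
    print(lsh.query(X[5], k=3))
```

Only the samples in matching buckets are compared exactly, so query cost depends on bucket sizes rather than on the full training set. The trade-off is that a true nearest neighbor can be missed if it never shares a bucket with the query; more tables raise recall at the cost of memory.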

Description

Technical field

[0001] The present invention relates to the technical field of machine learning, and in particular to a learning method for multi-label learning based on a hash method.

Background art

[0002] In the traditional supervised learning framework, samples generally have a single, clear semantic label; that is, each sample belongs to only one category. Within this framework, a variety of algorithms have been proposed and have achieved good results. However, in many real-world applications the semantic labels of the objects under study are not unique, and a sample can often be assigned a set of multiple labels. For example, in text classification, a news report may cover multiple aspects of an event and should therefore be assigned to multiple topics (e.g., politics and economics); in bioinformatics, a gene or protein often has multiple functions; in image annotation, an image can often be annotated with multipl...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 吴建盛, 孙永, 胡海峰
Owner: NANJING UNIV OF POSTS & TELECOMM