Tag clustering method and system

Clustering and tagging technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that addresses problems such as the inaccurate calculation of tag similarity.

Status: Inactive. Publication date: 2011-07-20

UNIV OF SCI & TECH OF CHINA

Cites: 0; Cited by: 40


AI Technical Summary

Problems solved by technology

[0004] To solve the above technical problems, the main purpose of the present invention is to propose a tag clustering method and system that overcomes the inaccurate calculation of tag similarity in existing collaborative tagging systems, alleviates the problems of disordered tag organization and ambiguous tag semantics, and effectively improves the accuracy of tag clustering.



Examples


Embodiment 1

[0072] To improve the accuracy of tag clustering results, an embodiment of the present invention provides a tag clustering method. As shown in the schematic flow chart of Figure 1, the method may include the following steps:

[0073] Step S101: Establish a feature vector of each tag to be clustered.

[0074] In this step, each label to be clustered is modeled and represented by a multi-dimensional feature vector.

[0075] This embodiment specifically provides the following three methods for establishing feature vectors for the tags to be clustered:

[0076] Method 1: Resource-based feature vector representation (item-based-vector, IBV).

[0077] A resource is usually annotated with several tags, each of which is related to the resource in some way. It follows from this relationship that a tag can, conversely, be represented by the multiple resources related to it.

[0078] Based on the above idea, the present invention can use the feature vector composed of t...
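The resource-based (IBV) representation described in [0077] can be sketched as follows. This is a minimal illustration only: because [0078] is truncated, the weighting is an assumption (the weight of resource r in a tag's vector is taken to be the number of times that tag was applied to r), and the function name `build_ibv` and the `(user, tag, resource)` input format are hypothetical.

```python
from collections import defaultdict


def build_ibv(annotations):
    """Build a resource-based feature vector for each tag.

    annotations: iterable of (user, tag, resource) triples (hypothetical format).
    Returns (resources, vectors): resources is a sorted list of resource ids, and
    vectors[tag] is a list of weights aligned with that list.
    Assumption: the weight is how often the tag was applied to the resource.
    """
    counts = defaultdict(lambda: defaultdict(int))
    resources = sorted({resource for _, _, resource in annotations})
    for _user, tag, resource in annotations:
        counts[tag][resource] += 1
    # One dimension per resource; resources never tagged with `tag` get weight 0.
    vectors = {tag: [counts[tag][r] for r in resources] for tag in counts}
    return resources, vectors


# Illustrative data, not taken from the patent.
example = [
    ("u1", "python", "r1"),
    ("u2", "python", "r1"),
    ("u1", "python", "r2"),
    ("u3", "java", "r2"),
    ("u3", "java", "r3"),
]
resources, vectors = build_ibv(example)
print(resources)  # ['r1', 'r2', 'r3']
print(vectors)    # {'python': [2, 1, 0], 'java': [0, 1, 1]}
```

Each tag becomes a point in a space with one axis per resource, so two tags applied to largely the same resources end up pointing in similar directions.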

Embodiment 2

[0122] Corresponding to the tag clustering method provided in Embodiment 1, this embodiment provides a tag clustering system for improving the accuracy of tag clustering. Figure 7 is a schematic structural diagram of the system, which includes:

[0123] The feature vector establishment module 701 is used to establish the feature vector of each tag to be clustered;

[0124] The similarity calculation module 702 is used to calculate the cosine of the angle between two feature vectors in Euclidean space to obtain the similarity between the tags to be clustered;

[0125] The clustering module 703 is configured to use the K-Means algorithm to cluster the tags to be clustered according to the similarity between the tags to be clustered.
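A minimal sketch of what modules 702 and 703 compute, assuming each tag already has a numeric feature vector (a dict from tag to a list of numbers, such as the resource-based vectors of Embodiment 1). The function names `cosine_similarity` and `kmeans_tags` are hypothetical, and so is the exact clustering variant: each tag is assigned to the centroid with the highest cosine similarity, each centroid is the mean of its member vectors, and the first k tags seed the centroids. The source does not specify seeding or stopping rules; practical K-Means implementations usually seed randomly or with k-means++ and run several restarts.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def kmeans_tags(vectors, k, max_iter=100):
    """K-Means over tag vectors, using cosine similarity to pick the closest centroid.

    vectors: dict mapping tag -> list of numbers. Returns dict tag -> cluster index.
    Assumption: the first k tags (in dict order) seed the centroids.
    """
    tags = list(vectors)
    if not 1 <= k <= len(tags):
        raise ValueError("k must be between 1 and the number of tags")
    centroids = [list(vectors[t]) for t in tags[:k]]
    assignment = {}
    for _ in range(max_iter):
        # Assignment step: each tag joins the centroid it is most similar to.
        new_assignment = {
            t: max(range(k), key=lambda c: cosine_similarity(vectors[t], centroids[c]))
            for t in tags
        }
        if new_assignment == assignment:
            break  # converged: no tag changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its member vectors.
        for c in range(k):
            members = [vectors[t] for t in tags if assignment[t] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Illustrative vectors, not taken from the patent.
tag_vectors = {
    "python": [2, 1, 0],
    "java": [0, 1, 1],
    "py": [3, 1, 0],
    "jvm": [0, 1, 2],
}
print(kmeans_tags(tag_vectors, 2))  # {'python': 0, 'java': 1, 'py': 0, 'jvm': 1}
```

Measuring closeness by angle rather than raw Euclidean distance means a heavily used tag and a rarely used synonym can still land in the same cluster, since only the direction of their vectors is compared.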

[0126] Based on the three methods for establishing feature vectors for the tags to be clustered provided in the first embodiment, the feature vector establishing module 701 may include any one or more of the follo...


Abstract

The embodiment of the invention discloses a tag clustering method and a tag clustering system. The method comprises: establishing a feature vector for each tag to be clustered; calculating the cosine of the angle between two feature vectors in Euclidean space to obtain the similarity between every two tags to be clustered; and clustering the tags with the K-Means algorithm according to the similarity between them. The tag clustering system comprises: a feature vector establishing module, used for establishing the feature vector of each tag to be clustered; a similarity calculating module, used for calculating the cosine of the angle between two feature vectors in Euclidean space to obtain the similarity between every two tags to be clustered; and a clustering module, used for clustering the tags with the K-Means algorithm according to the similarity between them. The technical scheme overcomes the inaccurate calculation of tag similarity in current collaborative tagging systems, addresses the problems of disordered tag organization and ambiguous tag semantics, and effectively improves the accuracy of tag clustering.
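The similarity in the abstract is the cosine of the angle between two tag feature vectors. Writing v_i and v_j for the n-dimensional feature vectors of tags t_i and t_j (notation assumed here, not taken from the source), the standard formula and a worked example with illustrative vectors are:

```latex
\operatorname{sim}(t_i, t_j) = \cos\theta_{ij}
  = \frac{\mathbf{v}_i \cdot \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \mathbf{v}_j \rVert}
  = \frac{\sum_{k=1}^{n} v_{ik} v_{jk}}
         {\sqrt{\sum_{k=1}^{n} v_{ik}^{2}} \, \sqrt{\sum_{k=1}^{n} v_{jk}^{2}}}

% Worked example with illustrative vectors v_i = (2, 1, 0), v_j = (3, 1, 0):
\cos\theta_{ij}
  = \frac{2 \cdot 3 + 1 \cdot 1 + 0 \cdot 0}
         {\sqrt{2^2 + 1^2 + 0^2} \, \sqrt{3^2 + 1^2 + 0^2}}
  = \frac{7}{\sqrt{5} \, \sqrt{10}}
  = \frac{7}{5\sqrt{2}} \approx 0.990
```

With nonnegative vectors such as resource counts, the value lies between 0 (the tags share no resources) and 1 (the tags are applied to the same resources in the same proportions).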

Description

Technical field

[0001] The present invention relates to the technical field of data mining, in particular to collaborative tagging, and more specifically to a tag clustering method and system for large data sets.

Background

[0002] Web 2.0, a highly networked and open form of the Internet built around users, content, and applications, has attracted a large number of Internet users and given rise to applications such as blogs, podcasts, online communities, web digests, and Wikipedia. Social tagging systems are a typical and very popular Web 2.0 application with broad prospects; websites such as Flickr, del.icio.us, and Douban.com all use collaborative tagging. One of their main characteristics is that they are open, uncontrolled systems: users annotate resources with different tags according to their social and cultural background, expertise, and world view, and use these user tags to complete the classification, organizati...


Application Information


IPC(8): G06F17/30

Inventors: Chen Chao (陈超), Zhou Jin (周津), Yu Nenghai (俞能海)

Owner UNIV OF SCI & TECH OF CHINA
