User interest modeling method based on conceptual clustering

A user interest modeling technique based on conceptual clustering, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses the problem that existing approaches are unsuitable for expressing the interests of a single user, and it achieves high accuracy and an accurate representation of text content.

Status: Inactive
Publication Date: 2009-11-04
BEIHANG UNIV
Cites: 0; Cited by: 9

AI Technical Summary

Problems solved by technology

Methods based on domain ontologies focus on mining concepts of common interest to user groups; such concepts are not well suited to expressing the interests of individual users.

Examples

Embodiment 1

[0023] First, document preprocessing is performed. This step mainly uses three text preprocessing methods: stemming, stop-word filtering, and text segmentation. UIM2C2 uses the documents selected by the user in each query session as potential feedback content.
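The three preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the stop-word list, the `naive_stem` suffix stripper (a crude stand-in for a real stemmer such as Porter's), and the punctuation-based segmentation rule are all assumptions introduced here.

```python
import re

# Tiny illustrative stop-word list (an assumption; the source does not name one).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in"}

def naive_stem(word):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Segment text into sentences, then into stemmed, stop-word-free tokens."""
    sentences = [s for s in re.split(r"[.!?;]+", text.lower()) if s.strip()]
    result = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z0-9]+", sentence)
        kept = [naive_stem(t) for t in tokens if t not in STOP_WORDS]
        if kept:
            result.append(kept)
    return result
```

For example, `preprocess("The cats are eating cheese. Mice jumped!")` returns `[["cat", "eat", "cheese"], ["mice", "jump"]]`. Each inner list is one segment whose word sequence can then be fed into the suffix tree.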

[0024] Second, a suffix tree is built and the base clusters are identified. The suffix tree is built from the content of the documents. Using three example documents, "cat ate cheese", "mouse ate cheese too", and "cat ate mouse too", a suffix tree is built (see attached Figure 1), and the base cluster information is obtained from it (see attached Figure 2).
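As a rough illustration of base cluster identification, the sketch below finds the phrases that would label the shared internal nodes of a word-level generalized suffix tree. It is an assumption-laden simplification: instead of building a compact suffix tree, it enumerates every prefix of every suffix by brute force (quadratic in document length) and keeps a phrase only if it occurs in at least two documents and is followed by at least two distinct symbols, counting a per-document end marker as a symbol. The function name `base_clusters` is hypothetical.

```python
from collections import defaultdict

def base_clusters(docs):
    """Return {phrase: set of doc ids} for phrases shared by >= 2 documents."""
    phrase_docs = defaultdict(set)   # phrase -> ids of documents containing it
    followers = defaultdict(set)     # phrase -> distinct symbols seen right after it
    for doc_id, text in enumerate(docs, start=1):
        words = text.split()
        for start in range(len(words)):                   # every suffix
            for end in range(start + 1, len(words) + 1):  # every prefix of it
                phrase = tuple(words[start:end])
                phrase_docs[phrase].add(doc_id)
                # A per-document end marker plays the role of the unique terminator.
                nxt = words[end] if end < len(words) else ("$", doc_id)
                followers[phrase].add(nxt)
    # Keep phrases that are shared and that branch (would be internal tree nodes).
    return {
        " ".join(p): ids
        for p, ids in phrase_docs.items()
        if len(ids) >= 2 and len(followers[p]) >= 2
    }
```

Applied to the three example documents, this yields six shared phrases: "cat ate" (documents 1, 3), "ate" (1, 2, 3), "cheese" (1, 2), "mouse" (2, 3), "too" (2, 3), and "ate cheese" (1, 2). The phrase "cat" is dropped because it is always followed by "ate" and so would not be a separate node in a compact tree.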

[0025] Then, an improved base cluster graph is built. From the base cluster information, the base cluster graph is generated according to the STC algorithm. The base cluster graph represents the similarity relationships between base clusters. The measurement of the similarity relations...
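For orientation, the classic STC (Suffix Tree Clustering) criterion links two base clusters when their shared documents make up more than a given fraction of each cluster, and merged clusters are the connected components of the resulting graph. The sketch below implements that classic rule with a configurable threshold; the improved similarity measure of this method is not reproduced here, and the names `similar` and `merge_base_clusters` are hypothetical.

```python
from itertools import combinations

def similar(docs_a, docs_b, threshold=0.5):
    """Classic STC test: the overlap must exceed `threshold` of both clusters."""
    overlap = len(docs_a & docs_b)
    return overlap / len(docs_a) > threshold and overlap / len(docs_b) > threshold

def merge_base_clusters(clusters, threshold=0.5):
    """Link similar base clusters and return the connected components."""
    names = list(clusters)
    parent = {n: n for n in names}         # union-find forest over cluster names

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if similar(clusters[a], clusters[b], threshold):
            parent[find(a)] = find(b)      # union the two components

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())
```

With the six base clusters from the three example documents, a threshold of 0.5 links everything through "ate" into a single component, while a threshold of 0.7 leaves four components: ["ate"], ["ate cheese", "cheese"], ["cat ate"], and ["mouse", "too"].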

Embodiment 2

[0030] Following the steps of Embodiment 1, the 20 Newsgroups data set is selected. The data set contains about 20,000 documents distributed across 20 Usenet discussion groups. The present invention randomly selects 10 groups, from which a total of 2823 documents form a data subset (see attached Figure 7), to conduct experiments and compare the results with those of the WebDCC method.

Abstract

The invention discloses a new user interest modeling method based on conceptual clustering, UIMC, which addresses the shortcomings of traditional user interest modeling methods in accuracy and incremental processing capability. The method first constructs a suffix tree by analyzing the history documents accessed by a user, then selects different similarity thresholds and merges base clusters at different granularities. The user's interest hierarchy is generated from the inclusion relations among the base clusters merged under the different threshold conditions. UIMC is an incremental, unsupervised concept learning method for documents, so the user profile can be easily obtained and updated. Finally, the effectiveness of UIMC for interest prediction is verified by experiments on the 20 Newsgroups data set.
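The idea of deriving a hierarchy from merges at several thresholds can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: similarity is the classic STC overlap test, merged clusters are connected components, and the function names `components` and `interest_hierarchy` are hypothetical. Because raising the threshold only removes links, every cluster at a finer level is contained in exactly one cluster at the next coarser level, which gives the parent relation.

```python
from itertools import combinations

def components(clusters, threshold):
    """Connected components of the base cluster graph at one threshold."""
    names = sorted(clusters)
    adj = {n: set() for n in names}
    for a, b in combinations(names, 2):
        overlap = len(clusters[a] & clusters[b])
        if (overlap / len(clusters[a]) > threshold
                and overlap / len(clusters[b]) > threshold):
            adj[a].add(b)
            adj[b].add(a)
    seen, result = set(), []
    for n in names:
        if n in seen:
            continue
        stack, group = [n], set()
        while stack:                      # depth-first walk of one component
            cur = stack.pop()
            if cur not in group:
                group.add(cur)
                stack.extend(adj[cur] - group)
        seen |= group
        result.append(frozenset(group))
    return result

def interest_hierarchy(clusters, thresholds):
    """Return levels (coarse to fine) and (depth, child, parent) links."""
    levels = [components(clusters, t) for t in sorted(thresholds)]
    links = []
    for depth, (coarse, fine) in enumerate(zip(levels, levels[1:]), start=1):
        for child in fine:
            # Inclusion relation: the coarser cluster that contains the child.
            par = next(p for p in coarse if child <= p)
            links.append((depth, child, par))
    return levels, links
```

For instance, with base clusters `{"cat ate": {1, 3}, "ate": {1, 2, 3}, "cheese": {1, 2}, "mouse": {2, 3}, "too": {2, 3}, "ate cheese": {1, 2}}` and thresholds `[0.5, 0.7]`, the coarse level holds one cluster containing all six phrases and the fine level holds four clusters, each linked to that single parent.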

Description

Technical field

[0001] The invention relates to a user interest modeling method based on conceptual clustering, which can be applied to Web search.

Background

[0002] With the development of the Internet, online resources are growing at an exponential rate. Web search engines have become the primary tool for users to obtain network resources. However, relative to users' limited information needs, the huge volume of information still easily causes problems such as "information overload" and "information confusion". User interest is a relatively stable, long-term information need, so an effective user interest model is needed to provide users with personalized information services. A user interest model provides a structured description of user interests. In terms of the structure adopted, common user interest representation methods include representations based on linear models, representations based on ...

Application Information

IPC(8): G06F17/30
Inventor: 刘永利 (Liu Yongli), 欧阳元新 (Ouyang Yuanxin), 张平安 (Zhang Ping'an), 熊璋 (Xiong Zhang)
Owner: BEIHANG UNIV