
Detection and filter method of network community garbage information based on topic consensus coverage rate

A spam-detection technology for network communities, applied to detecting and filtering network community spam based on topic consensus coverage rate. It addresses the problems that existing methods do not take into account the convergence of posting content and have low detection accuracy and recall.

Inactive Publication Date: 2013-05-08
WUHAN UNIV
Cites: 5; Cited by: 27

AI Technical Summary

Problems solved by technology

Existing detection methods do not take into account the similarity of posts under the same topic. As a result, for spam that resembles normal posts and is relatively well hidden, their detection accuracy and recall are low.



Examples


Embodiment Construction

[0078] The present invention is further described below with reference to the accompanying drawings.

[0079] Figure 2 shows the principle of the embodiment of the present invention, which proceeds as follows. First, the content of the website is sampled: some main posts and their replies are extracted to form a training set, and the rest of the website content forms the set to be detected. The replies in the training set are then classified: some are manually identified as spam, as shown in (1), and the rest are marked as normal content. For each group of main posts and replies in the training set and the set to be detected, the word frequency vectors and text feature values of their text content are calculated, and these word frequency vectors are aggregated to obtain the topic consensus corresponding to each group of main pos...
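The word frequency and aggregation steps in [0079] can be illustrated with a minimal Python sketch. The text shown here is truncated before it defines the coverage rate, so the formula in `coverage_rate` (the share of a reply's word occurrences that also appear in the topic consensus) is an assumption made for illustration, not the patent's definition. The function names, the regex tokenizer, and the sample posts are likewise hypothetical.

```python
from collections import Counter
import re

def term_frequencies(text):
    """Return a word frequency vector (word -> count) for one piece of text."""
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def topic_consensus(main_post, normal_replies):
    """Aggregate the word frequency vectors of a main post and its normal replies."""
    consensus = term_frequencies(main_post)
    for reply in normal_replies:
        consensus.update(term_frequencies(reply))  # adds the counts together
    return consensus

def coverage_rate(reply, consensus):
    """ASSUMED definition: fraction of the reply's word occurrences found in the consensus."""
    tf = term_frequencies(reply)
    total = sum(tf.values())
    if total == 0:
        return 0.0
    covered = sum(count for word, count in tf.items() if word in consensus)
    return covered / total

# Hypothetical sample thread, invented for this sketch.
main_post = "Which lens is best for night sky photography"
normal_replies = [
    "A fast wide lens works well for the night sky",
    "Try a tripod and a wide lens for sky photography",
]
consensus = topic_consensus(main_post, normal_replies)
on_topic = coverage_rate("I use a wide lens for night photography", consensus)
off_topic = coverage_rate("Cheap watches on sale click here now", consensus)
print(on_topic, off_topic)
```

In this toy thread the on-topic reply scores well above the advertisement. Common function words such as "a" and "for" inflate the score, so a practical version would likely filter stop words or weight terms, and Chinese text would need a word segmenter in place of the regex.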



Abstract

The invention provides a method for detecting and filtering network community garbage information (spam) based on a topic consensus coverage rate. It belongs to the research field of data quality and relates to the technical fields of user behavior feature research, network information quality evaluation, feature value extraction from text content, and the building and optimization of text classification models. Mainly to address the lack of an effective automatic detection and filtering mechanism for network community spam, a spam detection model is built: a topic convergence constraint relationship is constructed from the main topic content and the normal reply content, and a topic consensus coverage rate feature value is proposed and applied to a text classifier, thereby achieving automatic detection and filtering of network community spam. The method can be widely applied to content screening in network community quality management, can automatically identify and remove irrelevant advertisements, invalid content, and even malicious opinions, and improves network community information quality to a certain degree.
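To make the idea of applying the coverage rate feature to a text classifier concrete, here is a minimal Python sketch. The abstract does not name the classifier, so a one-feature threshold rule (a decision stump) is swapped in purely for illustration. The names `fit_threshold` and `predict_spam` and all the numbers are hypothetical.

```python
def fit_threshold(samples):
    """Pick the coverage threshold that maximizes accuracy on labeled samples.

    samples: list of (coverage_rate, label) pairs, where label is True for spam.
    A reply is predicted to be spam when its coverage rate is below the threshold.
    """
    values = sorted({cov for cov, _ in samples})
    # Candidate thresholds: midpoints between adjacent distinct coverage values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_correct = 0.0, -1
    for t in candidates:
        correct = sum((cov < t) == label for cov, label in samples)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold

def predict_spam(coverage, threshold):
    """Flag a reply as spam when it covers too little of the topic consensus."""
    return coverage < threshold

# Hypothetical manually labeled training set: (coverage rate, is spam).
training = [(0.05, True), (0.10, True), (0.30, True),
            (0.55, False), (0.70, False), (0.90, False)]
threshold = fit_threshold(training)

# Hypothetical coverage rates for replies in the set to be detected.
to_detect = [0.08, 0.62, 0.20, 0.85]
kept = [cov for cov in to_detect if not predict_spam(cov, threshold)]
print(threshold, kept)
```

A real system would presumably combine the coverage rate with the other text feature values mentioned in the description and use a stronger classifier. The stump only shows how a single feature can separate labeled spam from normal replies.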

Description

Technical field

[0001] The invention belongs to the research field of data quality, and in particular relates to a method for detecting and filtering spam in network communities based on topic consensus coverage rate.

Background

[0002] Topic consensus: According to an important conclusion in research on user behavior characteristics, the content posted by ordinary users who communicate on the same topic in an online community is related to that topic, whereas users with bad intentions publish content irrelevant to the topic. J. M. Reagle pointed out in his monograph "Good Faith Collaboration: The Culture of Wikipedia" that users who post differing opinions in an online community can communicate successfully on the same topic because they share a basic consensus on the topic, and believe that the other party will also post replies according to this basic consensus [1]. And Jim Giles, t...


Application Information

IPC(8): G06F17/30, G06F17/27
Inventor: 李石君, 汤小月, 余伟, 杨莎, 刘晶, 丁永刚, 胡亚慧, 王凯
Owner: WUHAN UNIV