Method and system for classifying short texts based on probability topic

A technology involving short texts and topics, applied in transmission systems, digital transmission systems, and arrangements for providing special services to substations. It addresses the problem that the true relationships between words depend on the application background, and achieves accurate information distribution, authoritative responses, and strong logical correspondence.

Inactive Publication Date: 2010-01-06
北京百问百答网络技术有限公司 (Beijing Baiwen Baida Network Technology Co., Ltd.)
2 Cites, 39 Cited by

AI Technical Summary

Problems solved by technology

Generally speaking, this method can discover correlations between words more accurately, but in many cases the real relationship between words depends on the specific application background.

Embodiment Construction

[0065] The invention discloses a method and system for classifying short texts based on probability topics. The method finds the real relationships between words according to the probability topics and then calculates the similarity between short texts, so that the target data can be located efficiently.
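The text here does not spell out how word relationships are derived from probability topics. As one hedged illustration, assume a topic model (for example LDA) has already produced topic-word probabilities P(w|z) and topic priors P(z). The posterior P(z|w) then follows from Bayes' rule, and two words can be scored as related when their topic posteriors overlap. The names `phi`, `prior`, `topic_posterior`, and `word_relatedness`, and all numbers, are hypothetical and not taken from the patent.

```python
def topic_posterior(word, phi, prior):
    """Bayes' rule: P(z | w) is proportional to P(w | z) * P(z)."""
    joint = {z: phi[z].get(word, 0.0) * prior[z] for z in prior}
    total = sum(joint.values())
    if total == 0.0:
        return {}  # word unseen by the (hypothetical) topic model
    return {z: p / total for z, p in joint.items()}


def word_relatedness(w1, w2, phi, prior):
    """Topic-overlap score: sum over topics z of P(z | w1) * P(z | w2)."""
    p1 = topic_posterior(w1, phi, prior)
    p2 = topic_posterior(w2, phi, prior)
    return sum(p * p2.get(z, 0.0) for z, p in p1.items())


# Hypothetical topic model output: P(w | z) for each topic, and P(z).
phi = {
    "computers": {"laptop": 0.5, "notebook": 0.3, "battery": 0.2},
    "stationery": {"notebook": 0.4, "pen": 0.6},
}
prior = {"computers": 0.5, "stationery": 0.5}

print(word_relatedness("laptop", "notebook", phi, prior))  # about 0.43
print(word_relatedness("laptop", "pen", phi, prior))       # 0.0
```

With these toy numbers, "laptop" and "notebook" are scored as related through the shared "computers" topic even though no co-occurrence data is used, which is the kind of context-dependent relationship the summary says plain word correlation can miss.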

[0066] A system for classifying short texts based on probability topics is deployed in a mail server, a search engine server, a firewall of a mobile communication system, or a server-side data processing device of a question answering system.

[0067] Take the question answering system as an example. A question answering system is an online interactive system: a computer processing system that carries out interactive questions and answers with users. For details, refer to Chinese patent application No. 200510130778.5.

[0068] The technical problem solved by the present invention in the question answering system is to ...

Abstract

The invention discloses a method and a system for classifying short texts based on probability topics. The method is used in a data processing device of a question answering system to classify short texts according to their similarity. It comprises the following steps: acquiring initialized text vectors for an input target short text and for a short text acquired from the database of the question answering system; scanning the two short texts to acquire the differentiating words of each; when the relevance degrees between the differentiating words of the two short texts and a probability topic are higher than a threshold, modifying the text vectors of the two short texts according to those relevance degrees; calculating the similarity of the two short texts from the modified text vectors; acquiring another short text from the database of the question answering system and executing the scanning step again, until all short texts in the database have been traversed; and classifying the target short text according to the similarities.
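The steps in the abstract can be sketched as a small pipeline. This is a minimal illustration under several assumptions that the abstract does not confirm: initial vectors are raw term frequencies; "differentiating words" are words present in one text but not the other; the vector modification adds the product of the two relevance degrees as weight on the other text's word; similarity is cosine; and classification takes the label of the most similar stored text. The `RELEVANCE` table, its values, and every function name are hypothetical.

```python
import math
from collections import Counter

# Hypothetical relevance of each word to each probability topic, e.g.
# P(topic | word) from an offline topic model. Values are invented.
RELEVANCE = {
    "laptop": {"computers": 0.9},
    "notebook": {"computers": 0.8, "stationery": 0.2},
    "battery": {"computers": 0.7},
    "pen": {"stationery": 0.9},
}


def init_vector(text):
    """Initial text vector: raw term frequencies (an assumed weighting)."""
    return Counter(text.lower().split())


def differentiating_words(va, vb):
    """Words that occur in one short text but not in the other."""
    return set(va) - set(vb), set(vb) - set(va)


def adjust(va, vb, threshold=0.5):
    """When a differentiating word of each text is relevant to the same topic
    above the threshold, give each vector weight on the other text's word."""
    only_a, only_b = differentiating_words(va, vb)
    va, vb = Counter(va), Counter(vb)  # work on copies
    for wa in only_a:
        for wb in only_b:
            for topic, ra in RELEVANCE.get(wa, {}).items():
                rb = RELEVANCE.get(wb, {}).get(topic, 0.0)
                if ra > threshold and rb > threshold:
                    va[wb] += ra * rb
                    vb[wa] += ra * rb
    return va, vb


def cosine(va, vb):
    """Cosine similarity of two sparse vectors (missing words count as 0)."""
    dot = sum(x * vb[w] for w, x in va.items())
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(target, database, threshold=0.5):
    """Traverse every (text, label) pair in the database and return the label
    of the most similar text together with that similarity."""
    vt = init_vector(target)
    best_label, best_sim = None, -1.0
    for text, label in database:
        a, b = adjust(vt, init_vector(text), threshold)
        sim = cosine(a, b)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim


database = [
    ("my laptop battery dies fast", "hardware"),
    ("which pen should I buy", "stationery"),
]
print(classify("notebook battery dies fast", database))
```

In this toy run, "notebook" and "laptop" are differentiating words that are both strongly relevant to the "computers" topic, so the adjusted similarity to the first stored text rises to roughly 0.89, compared with about 0.67 for the unadjusted term-frequency vectors, and the target receives the "hardware" label.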

Description

Technical field

[0001] The invention relates to the field of information technology for text processing using data mining, and in particular to a method and system for classifying short texts based on probability topics, a short text retrieval method, and a spam identification method.

Background

[0002] With the rapid development of information technology, users can obtain large amounts of information through many channels, for example by browsing the web, searching with search engines, receiving text messages and emails, and using online question answering systems. The usual problem, however, is that there is a huge amount of data but little useful information. [0003] For example, a mailbox receives a large number of emails, including normal work or personal emails as well as spam. Received text messages include many useless advertisements. In the network question answering system, when a user asks a ...
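Paragraph [0001] also names a short text retrieval method and a spam identification method. The sketch below shows, under stated assumptions, how both could be built on a pairwise short-text similarity. To stay short and self-contained it uses plain bag-of-words cosine as a stand-in for the topic-adjusted similarity; the function names, the "spam"/"ham" labels, and the 0.5 threshold are hypothetical and not taken from the patent.

```python
import math
from collections import Counter


def similarity(a, b):
    """Plain bag-of-words cosine; a stand-in for topic-adjusted similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(x * vb[w] for w, x in va.items())
    norm = math.sqrt(sum(x * x for x in va.values())) * math.sqrt(
        sum(x * x for x in vb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, corpus, k=3):
    """Short text retrieval: the k stored texts most similar to the query."""
    return sorted(corpus, key=lambda t: similarity(query, t), reverse=True)[:k]


def is_spam(message, labelled, threshold=0.5):
    """Spam identification: flag the message when its most similar labelled
    example is spam and the similarity clears the (hypothetical) threshold."""
    best_text, best_label = max(labelled, key=lambda p: similarity(message, p[0]))
    return best_label == "spam" and similarity(message, best_text) >= threshold


labelled = [("win a free prize now", "spam"), ("meeting moved to friday", "ham")]
print(is_spam("claim your free prize now", labelled))
print(retrieve("free prize", [t for t, _ in labelled], k=1))
```

Keeping retrieval and spam identification as thin wrappers over one similarity function means a better similarity measure, such as the topic-adjusted one in the abstract, can be swapped in without changing either caller.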

Application Information

IPC (8): H04L12/18; H04L12/58
Inventors: 刘文印 (Liu Wenyin), 权小军 (Quan Xiaojun), 张加龙 (Zhang Jialong)
Owner: 北京百问百答网络技术有限公司