Text classification technology-based information processing method

An information processing method based on text classification technology, applied in text database clustering/classification, electronic digital data processing, special data processing applications, and similar fields. It addresses the difficulty of retrieving information resources, increases the feature information content of text vectors, is highly practical, and improves classification results.

Inactive Publication Date: 2016-12-07
HEFEI MINZHONGYIXING SOFTWARE DEV CO LTD
0 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

While this volume of information has brought great convenience to people's lives, it has also created increasingly prominent problems. First, the sheer number of information resources makes retrieval difficult, and keyword-based retrieval returns many irrelevant documents. Second, there is the problem of network security. How to quickly and effectively discover useful, potentially valuable knowledge in these massive, heterogeneous information resources, how to classify and accurately locate the required information, and how to handle the large amount of useless or irrelevant content at the same time have become the bottleneck of knowledge acquisition and information filtering, and a hot topic in network security technology.


Examples

Embodiment Construction

[0023] An information processing method based on text classification technology, comprising the steps of:

[0024] (1) Text preprocessing

[0025] Filter out irrelevant text information and erroneous text information. Irrelevant text is handled mainly by deleting the tags, scripts, and non-text objects that describe the web page. For erroneous text, the method focuses mainly on errors in which Chinese characters are split into their radicals. The text is preprocessed with an HTML tag weighting scheme: before an HTML document is scanned, its HTML tags must first be correctly identified and processed, and the text in different parts of the web page is then weighted according to those tags.
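The tag weighting described in step (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `WeightedTextExtractor`, the helper `extract_weighted_text`, and every numeric weight are assumptions, since the source gives no values. It uses Python's standard `html.parser` module to drop script and style content and to attach a weight to each text fragment according to its enclosing tag.

```python
from html.parser import HTMLParser

# Hypothetical weights; the source does not specify numeric values.
TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "a": 2.0}
META_WEIGHT = 4.0            # for <meta name="keywords"/"description">
DEFAULT_WEIGHT = 1.0
SKIPPED_TAGS = {"script", "style"}


class WeightedTextExtractor(HTMLParser):
    """Collects text fragments, each paired with a tag-derived weight."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open tags
        self.fragments = []   # list of (text, weight) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            # Keywords and page descriptions live in attributes, not in text.
            attr = dict(attrs)
            if attr.get("name") in ("keywords", "description"):
                self._add(attr.get("content") or "", META_WEIGHT)
            return
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.open_tags):
            return  # discard scripts and style sheets
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        self._add(data, weight)

    def _add(self, text, weight):
        text = text.strip()
        if text:
            self.fragments.append((text, weight))


def extract_weighted_text(html):
    """Return the (text, weight) fragments found in an HTML string."""
    parser = WeightedTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.fragments
```

Each fragment keeps its weight so that a later stage, such as term counting after word segmentation, can multiply a term's contribution by the weight of the fragment it came from.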

[0026] (2) Word segmentation

[0027] The preprocessed text is segmented into words by a Chinese automatic word segmentation system; the Chinese automatic word segmentation system is Chinese lexical a...
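The name of the segmentation system is cut off in the text above, so the sketch below does not attempt to reproduce it. It substitutes forward maximum matching, a simple dictionary-based technique, purely to illustrate what the segmentation step produces. The function name `forward_max_match`, the `max_len` parameter, and the tiny lexicon are all assumptions made for this example.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Segment text greedily, preferring the longest lexicon entry.

    This is forward maximum matching, a stand-in for the segmentation
    system named (but truncated) in the source.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical lexicon for illustration only.
LEXICON = {"信息", "处理", "方法", "信息处理", "文本", "分类"}
```

With this lexicon, `forward_max_match("文本分类", LEXICON)` yields the two words `文本` and `分类`. A production segmenter would also handle ambiguity and unknown words, which greedy matching does not.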


Abstract

The invention discloses an information processing method based on text classification technology. Text is preprocessed with an HTML tag weighting scheme: before an HTML document is scanned, its HTML tags must first be correctly identified and processed, and the text in different parts of the web page is weighted according to those tags. Descriptive information such as titles, page descriptions, keywords, and hyperlinks can thus be retained, which improves the classification results. A symbol dictionary is established to filter out non-Chinese characters, which reduces the dimension of the initial text vector and increases the feature information content of the text vector. Stop words are removed, which improves both the accuracy and the speed of subsequent text filtering. The method is simple to operate, highly practical, and improves the accuracy and efficiency of subsequent information filtering.
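The symbol dictionary and stop-word steps mentioned in the abstract can be sketched as below. The contents of `SYMBOL_DICTIONARY` and `STOP_WORDS`, and the function name `filter_tokens`, are assumptions for illustration; the source does not list the entries of either dictionary.

```python
import string

# Hypothetical symbol dictionary: ASCII letters, digits, and punctuation,
# plus a few full-width punctuation marks. The source gives no entries.
SYMBOL_DICTIONARY = set(
    string.ascii_letters + string.digits + string.punctuation + "，。！？、；：（）"
)

# Hypothetical stop-word list.
STOP_WORDS = {"的", "了", "是", "和"}


def filter_tokens(tokens):
    """Drop tokens made only of dictionary symbols, then drop stop words."""
    kept = []
    for token in tokens:
        # all() is True for an empty token, so empty strings are dropped too.
        if all(ch in SYMBOL_DICTIONARY for ch in token):
            continue
        if token in STOP_WORDS:
            continue
        kept.append(token)
    return kept
```

For example, `filter_tokens(["文本", "，", "的", "分类", "2016"])` keeps only `文本` and `分类`. Removing these tokens before vectorization is what shrinks the initial text vector.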

Description

Technical field

[0001] The invention belongs to the field of network methods and, more specifically, relates to an information processing method based on text classification technology.

Background

[0002] Network information today is enormous in quantity, broad in content, and varied in form. In China, relevant survey reports show that by the end of 2005 the total number of web pages nationwide was 187 million, covering science and technology, news, education, business, entertainment, and other content. While this volume of information has brought great convenience to people's lives, it has also created increasingly prominent problems. First, the sheer number of information resources makes retrieval difficult, and keyword-based retrieval returns many irrelevant documents. Second, there is the problem of network security; how to quickly and effectively d...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/35, G06F40/279
Inventor: 董雄飞
Owner: HEFEI MINZHONGYIXING SOFTWARE DEV CO LTD