
Industry classification method and system for published text

A classification method and system in the Internet field that addresses the low accuracy and recall of existing text industry classification and improves both.

Active Publication Date: 2013-09-25
TENCENT TECH (SHENZHEN) CO LTD
Cites: 9; Cited by: 46

AI Technical Summary

Problems solved by technology

[0005] In view of this, the main purpose of the present invention is to provide an industry classification method and system for published texts, solving the problem of low accuracy and recall in the text industry classification methods of existing information retrieval systems.




Detailed Description of Embodiments

[0041] The technical solution of the present invention will be further described in detail below in conjunction with the drawings and specific embodiments.

[0042] The invention organizes the complex set of industry categories into a two-level hierarchical category system and uses different methods to automatically classify first-level and second-level industries, so that the accuracy and recall of each industry category are optimized.
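The two-stage routing described in [0042] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the category names in `CATEGORY_TREE`, the function `classify_two_level`, and the choice to pass each level's classifier as a plain callable are all hypothetical. The point it shows is that a separate method handles each level, and the second-level result is constrained to the children of the chosen first-level category.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical two-level category system: first-level industry -> second-level industries.
CATEGORY_TREE: Dict[str, List[str]] = {
    "automotive": ["new cars", "used cars", "auto parts"],
    "education": ["k12", "language training", "vocational training"],
}

# A classifier takes text and returns a label, or None if it cannot decide.
Classifier = Callable[[str], Optional[str]]


def classify_two_level(
    text: str,
    first_level: Classifier,
    second_level: Dict[str, Classifier],
) -> Tuple[Optional[str], Optional[str]]:
    """Assign a first-level industry, then a second-level industry under it."""
    top = first_level(text)
    if top is None or top not in CATEGORY_TREE:
        return None, None
    # Each first-level industry may have its own (hypothetical) second-level classifier.
    sub_clf = second_level.get(top)
    sub = sub_clf(text) if sub_clf else None
    # Keep the result consistent with the hierarchy: reject a second-level
    # label that does not belong to the chosen first-level category.
    if sub not in CATEGORY_TREE[top]:
        sub = None
    return top, sub


if __name__ == "__main__":
    # Stub classifiers, for illustration only.
    first = lambda t: "automotive" if "car" in t else None
    second = {"automotive": lambda t: "used cars" if "used" in t else "new cars"}
    print(classify_two_level("cheap used car for sale", first, second))
```

Keeping the levels decoupled like this lets each level use whatever method suits it best, which matches the stated motivation of tuning accuracy and recall per level.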

[0043] First-level industry classification mainly proceeds as follows: starting from an initial, manually labeled set of first-level industry category feature words (the set contains only a small number of such words), full-text matching is used to classify each of hundreds of millions of web pages; full-text word segmentation is then performed on the web pages that received a category attribute, category feature words are extracted, and t...
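One round of the seed-word matching and feature-word extraction in [0043] might look like the sketch below. The paragraph is truncated, so several details are assumptions: `segment` is a hypothetical whitespace placeholder for real Chinese word segmentation, pages are labeled with the category whose seed words occur most often, and a word is kept as a new feature word when it appears in at least `min_df` labeled pages and at least `min_ratio` of its occurrences fall in one category. All function names and thresholds are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Set


def segment(text: str) -> List[str]:
    # Placeholder segmenter (assumption): lowercases and splits on whitespace.
    # A real system for Chinese text would use a proper word segmenter.
    return text.lower().split()


def match_category(tokens: List[str], seeds: Dict[str, Set[str]]) -> Optional[str]:
    """Full-text match: pick the category whose seed words occur most often."""
    hits = {cat: sum(1 for t in tokens if t in words) for cat, words in seeds.items()}
    best = max(hits, key=hits.get, default=None)
    return best if best is not None and hits[best] > 0 else None


def mine_feature_words(
    pages: Iterable[str],
    seeds: Dict[str, Set[str]],
    min_df: int = 2,
    min_ratio: float = 0.8,
) -> Dict[str, Set[str]]:
    """Label pages by seed matching, then add words concentrated in one category."""
    df_by_cat: Dict[str, Counter] = defaultdict(Counter)  # per-category document frequency
    df_total: Counter = Counter()  # document frequency across all labeled pages
    for page in pages:
        tokens = segment(page)
        cat = match_category(tokens, seeds)
        if cat is None:
            continue  # only pages that received a category attribute are used
        for word in set(tokens):
            df_by_cat[cat][word] += 1
            df_total[word] += 1
    expanded = {cat: set(words) for cat, words in seeds.items()}  # copy; seeds untouched
    for cat, counts in df_by_cat.items():
        for word, df in counts.items():
            if df >= min_df and df / df_total[word] >= min_ratio:
                expanded[cat].add(word)
    return expanded


if __name__ == "__main__":
    seeds = {"auto": {"car"}, "edu": {"school"}}
    pages = [
        "used car dealer engine",
        "car engine repair",
        "school tuition fees",
        "school exam tuition",
        "weather report",
    ]
    print(mine_feature_words(pages, seeds))
```

Feeding the expanded sets back in as the next round's seeds gives a simple bootstrapping loop; in practice a stop-word filter would also be needed so that common function words are not promoted to feature words.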



Abstract

The invention discloses an industry classification method and system for published text. The method comprises the following steps: mining a first-level industry category feature word set and training a second-level industry category model, thereby establishing a two-level hierarchical category system composed of the first-level industry category feature word set and the second-level industry category feature word set; and classifying published text into first-level and second-level industries according to this two-level hierarchical category system. The method and system provided by the invention can improve the accuracy and recall of industry classification for published text.
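The abstract does not say which model is trained for the second level. As an illustration only, the sketch below swaps in multinomial Naive Bayes with Laplace smoothing (one of the prior-art methods named in the background) and trains one model per first-level industry, so that a second-level label is only chosen among that industry's children. The class `NaiveBayes`, the function `train_second_level`, and the assumption that inputs arrive already segmented into tokens are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (stand-in second-level model)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.class_docs: Counter = Counter()  # number of training texts per label
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)  # token counts per label
        self.vocab: Set[str] = set()

    def fit(self, samples: List[Tuple[List[str], str]]) -> "NaiveBayes":
        for tokens, label in samples:
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens: List[str]) -> Optional[str]:
        total_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * v
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_second_level(
    data: Dict[str, List[Tuple[List[str], str]]]
) -> Dict[str, NaiveBayes]:
    """Train one second-level model per first-level industry (assumption)."""
    return {top: NaiveBayes().fit(samples) for top, samples in data.items()}


if __name__ == "__main__":
    data = {
        "auto": [
            (["used", "cheap", "mileage"], "used cars"),
            (["showroom", "new", "launch"], "new cars"),
            (["used", "mileage"], "used cars"),
        ]
    }
    models = train_second_level(data)
    print(models["auto"].predict(["new", "launch"]))
```

Training per first-level industry keeps each model's label space small, which is one plausible way to realize separate methods for the two levels; the patent's actual second-level procedure may differ.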

Description

Technical field

[0001] The present invention relates to the field of Internet technology, and in particular to an industry classification method and system for published text.

Background

[0002] At present, industry classification of the search query strings (queries), search terms, and published texts that users submit to an information retrieval system is essentially a short-text classification task. Given the application scenarios of information retrieval systems, the classification system usually has to be labeled manually. Common text classification methods in the prior art include statistical classification algorithms such as Naive Bayes (Bayes), neural networks (Nnet), support vector machines (SVM), and k-nearest neighbors (kNN), as well as classification methods based on manually inferred rules.

[0003] Limited by the application scenarios of the information retrieval system, and the characteristics of the short and s...


Application Information

IPC(8): G06F17/30
Inventors: 叶莎妮 (Ye Shani), 姚伶伶 (Yao Lingling), 朱鉴 (Zhu Jian), 王迪 (Wang Di)
Owner TENCENT TECH (SHENZHEN) CO LTD