
A company industry classification calculation method based on natural language processing

A calculation method based on natural language processing, applied in the Internet field. It addresses three problems: industry classification rules are hard to formulate because company characteristics vary widely, new companies emerge continually, and classification accuracy degrades. Its effects are savings in manpower and material resources, less repetitive labor, and better real-time performance.

Status: Inactive
Publication Date: 2019-05-03
厦门笨鸟电子商务有限公司

AI Technical Summary

Problems solved by technology

[0003] (1) Manual method: Knowledge barriers exist between industries, so a large number of industry experts are needed to complete the labeling effectively, which consumes considerable manpower and material resources;
[0004] (2) Rule method: The number of companies is huge, so it is difficult to formulate industry classification rules that account for the characteristics of every company; new companies emerge continually, so the rules are difficult to update in time; and formulating the rules requires many participants, which makes the method difficult to implement;
[0005] (3) Traditional classification methods: Feature extraction is required, and the information lost from documents during that processing can easily reduce classification accuracy.


Examples


Detailed Description of the Embodiments

[0023] In order to further explain the technical solution of the present invention, the present invention will be described in detail below through specific examples.

[0024] The products a company produces or the services it provides reflect the company's own characteristics, so the company's industry classification can be analyzed by calculating the similarity of those characteristics.
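The idea of comparing companies by the similarity of their product or service descriptions can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation and cosine similarity, which is a simplified stand-in and not the measure the patent specifies; the function names `bag_of_words` and `cosine_similarity` and the sample descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphanumeric tokens (an assumed, simplified representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors; 0.0 if either is empty."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical company descriptions, used only for illustration.
cloud_a = bag_of_words("cloud hosting and cloud storage services")
cloud_b = bag_of_words("cloud storage services for business")
grocer = bag_of_words("organic fruit and vegetable delivery")

# Descriptions from the same line of business should score higher.
print(cosine_similarity(cloud_a, cloud_b) > cosine_similarity(cloud_a, grocer))
```

Under this sketch, two descriptions that share many terms score close to 1, and descriptions with no terms in common score 0.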

[0025] The present invention is a company industry classification calculation method based on natural language processing, comprising the following steps:

[0026] Step 1. Data Acquisition

[0027] Text data containing descriptions of the products or services of the pre-classification companies is acquired by crawling web page data.

[0028] The above web page data comes from the pre-classification company's official website homepage, first-level pages, social network homepage, enterprise yellow pages, and other platforms where the pre-classification company will pu...
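Once a page has been crawled, the descriptive text has to be separated from markup. The following is a minimal sketch of that extraction step using only Python's standard `html.parser` module; it assumes the HTML has already been downloaded as a string, and the names `VisibleTextExtractor` and `extract_description` are hypothetical rather than taken from the patent.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping script/style/noscript contents."""

    SKIPPED_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_description(html: str) -> str:
    """Return the visible text of an already-downloaded HTML page."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical page content, used only for illustration.
page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Acme</h1><p>We sell cloud storage.</p></body></html>"
)
print(extract_description(page))
```

A real crawler would also need fetching, politeness limits, and encoding handling, none of which are shown here.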


Abstract

The invention discloses a company industry classification calculation method based on natural language processing. Text data of a pre-classification company is obtained through a crawler; feature extraction, noise reduction, and word vector training are performed on the text data; and, after a language model and a classification model pre-trained through transfer learning are adopted, hierarchical classification is performed on the text data to classify the target company. The method has a simple process and high efficiency, and saves manpower and material resources. Its hierarchical classification system yields about 30 primary classifications and about 300 secondary classifications, so classification accuracy is greatly improved. The model accepts text input of different lengths and forms without any adjustment, giving it a wider application range and higher practicability.
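The two-level structure of the hierarchical classification can be sketched as follows: pick a primary class first, then pick a secondary class only among that primary class's children. This sketch assumes a keyword-overlap score as a stand-in for the pre-trained model described above, and the taxonomy `TOY_TAXONOMY`, its labels, and the function names are all hypothetical.

```python
import re
from typing import Dict, Set, Tuple

# primary class -> secondary class -> indicative keywords (all hypothetical)
Taxonomy = Dict[str, Dict[str, Set[str]]]

TOY_TAXONOMY: Taxonomy = {
    "Information Technology": {
        "Cloud Services": {"cloud", "hosting", "storage"},
        "Software Development": {"software", "app", "development"},
    },
    "Food and Beverage": {
        "Restaurants": {"restaurant", "dining", "menu"},
        "Food Delivery": {"delivery", "meal", "grocery"},
    },
}


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def classify_hierarchically(text: str, taxonomy: Taxonomy) -> Tuple[str, str]:
    """Choose a primary class, then a secondary class within it."""
    tokens = tokenize(text)

    def primary_score(primary: str) -> int:
        # Score a primary class by overlap with all of its children's keywords.
        keywords = set().union(*taxonomy[primary].values())
        return len(tokens & keywords)

    primary = max(taxonomy, key=primary_score)
    children = taxonomy[primary]
    # The secondary decision is restricted to the chosen primary's children.
    secondary = max(children, key=lambda name: len(tokens & children[name]))
    return primary, secondary


print(classify_hierarchically("We provide cloud hosting and storage", TOY_TAXONOMY))
```

Restricting the second decision to the children of the chosen primary class means each decision is made over a small label set, rather than over all secondary classes at once.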

Description

Technical field

[0001] The present invention relates to the field of Internet technology, in particular to a company industry classification calculation method based on natural language processing.

Background

[0002] In data search, accurate industry positioning helps users quickly judge whether a target company meets their needs. Existing industry classification mainly relies on manually labeling a company's industry category, on formulating industry classification rules to judge the company's industry, or on traditional classification methods (such as support vector machines or decision trees). These approaches have the following problems:

[0003] (1) Manual method: Knowledge barriers exist between industries, so a large number of industry experts are needed to complete the labeling effectively, which consumes considerable manpower and material resources;

[0004] (2) Rule method: the number of companies is huge, it is difficult to formulate industry cla...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35
Inventors: 王凯锋, 吴承霖, 金立达
Owner: 厦门笨鸟电子商务有限公司