Business name extraction method of a company name

A technology for extracting trade names from company names, which addresses the problems of heavy manual labeling and high labeling cost.

Status: Pending | Publication date: 2019-03-26
INSPUR SOFTWARE CO LTD
Cites: 3 | Cited by: 7

AI Technical Summary

Problems solved by technology

However, existing machine-learning approaches (including deep learning) require a large amount of manual labeling, and manual labeling is relatively costly.

Examples

Embodiment 1

[0045] The specific implementation method is as follows:

[0046] First, the data are preprocessed. The data extracted from the enterprise directory database include each enterprise's unique identification number and its name. The data contain many stray punctuation marks, spaces, and blank lines, as follows:

[0047]

[0048] These characters were introduced by input errors, so this redundant information must be removed first; regular expressions remove it effectively. Cases in which the business name is an empty string are not considered.
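A minimal sketch of this cleaning step, assuming each record is an `(enterprise_id, name)` pair. The source does not list the exact characters removed, so the noise class below (whitespace plus common ASCII and full-width Chinese punctuation) is an assumption to be tuned to the real data.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical noise class: \s covers spaces, tabs, newlines (blank lines) and
# the full-width space U+3000; the rest are common ASCII and full-width
# Chinese punctuation marks. Tune this set to the real directory data.
NOISE = re.compile(r"[\s,.;:!?'\"()\[\]{}<>/\\|~`@#$%^&*_+=\-，。、；：！？“”‘’（）《》【】·…]+")


def clean_name(raw: str) -> str:
    """Delete every run of noise characters from a raw enterprise name."""
    return NOISE.sub("", raw)


def clean_records(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (enterprise_id, cleaned_name), skipping names that end up empty."""
    for enterprise_id, raw_name in records:
        name = clean_name(raw_name)
        if name:  # empty names are not considered
            yield enterprise_id, name
```

Parentheses are removed while their content is kept, so 浪潮软件（济南） becomes 浪潮软件济南; whether that suits the later steps depends on how divisions written in parentheses are treated.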

[0049] Next, the administrative-division dictionary is constructed. The public administrative-division website provides complete division codes and names at every level: country, provinces (municipalities), cities, counties (districts), townships (sub-district offices), and villages (communities). We sort out the dictionaries that meet the ...
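The required dictionary format is truncated in the source, and the segmenter is not named. Assuming a jieba-style user dictionary (one entry per line: word, frequency, tag, separated by spaces), a division dictionary could be generated as sketched below. The short-form heuristic (also emitting 北京 for 北京市), the frequency value, and the `adm` tag are hypothetical choices, not the patent's.

```python
from typing import Iterable, List

# Hypothetical suffixes stripped to derive short forms such as 北京市 -> 北京.
# Longer suffixes come first so that 自治区 is tried before 区.
ADMIN_SUFFIXES = ("自治区", "省", "市", "区", "县")


def short_form(name: str) -> str:
    """Strip one administrative suffix if at least two characters remain."""
    for suffix in ADMIN_SUFFIXES:
        if name.endswith(suffix) and len(name) - len(suffix) >= 2:
            return name[: -len(suffix)]
    return name


def build_user_dict(names: Iterable[str], tag: str, freq: int = 100000) -> List[str]:
    """Return sorted, de-duplicated 'word freq tag' lines for a user dictionary.

    A large (assumed) frequency nudges a frequency-based segmenter to keep
    these words whole.
    """
    words = set()
    for name in names:
        name = name.strip()
        if not name:
            continue
        words.add(name)
        words.add(short_form(name))
    return [f"{word} {freq} {tag}" for word in sorted(words)]
```

The same builder can produce the organizational-form and industry dictionaries by passing a different tag; the short-form step would then need its own rules or be skipped.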

Abstract

The invention discloses a method for extracting the trade name from a company name. The method comprises the following steps. First, the text is preprocessed, and word segmentation yields the smallest units of word meaning. Second, administrative-division, organizational-form, and industry dictionaries are constructed according to the required format and loaded into the word segmenter as user-defined dictionaries, so that words are segmented accurately. The positions of the administrative division and the industry within the string are then obtained, and the position of the trade name is calculated from them. Finally, the trade-name substring is extracted according to that position. Compared with the prior art, the method reduces tedious manual labeling and lowers labor and time costs.
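The pipeline in the abstract can be sketched as follows. The patent loads its dictionaries into an unnamed word segmenter; here a plain forward-maximum-matching segmenter stands in for it. The tags `adm`, `ind`, and `org`, the helper names, and the boundary rule (the trade name is the text between the leading division words and the first industry word, falling back to the organizational form) are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, Optional[str], int, int]  # (word, tag, start, end)


def fmm_segment(text: str, lexicon: Dict[str, str], max_len: int = 8) -> List[Token]:
    """Forward maximum matching: at each position take the longest lexicon word.

    Characters not covered by any entry become single-character tokens tagged None.
    """
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i : i + size]
            if word in lexicon:
                tokens.append((word, lexicon[word], i, i + size))
                i += size
                break
        else:  # no dictionary word starts here
            tokens.append((text[i], None, i, i + 1))
            i += 1
    return tokens


def extract_trade_name(name: str, lexicon: Dict[str, str]) -> Optional[str]:
    """Return the text between the leading division words and the first
    industry word (or organizational form if no industry word), or None."""
    tokens = fmm_segment(name, lexicon)
    # Left boundary: end of the run of division tokens at the start of the name.
    left, k = 0, 0
    while k < len(tokens) and tokens[k][1] == "adm":
        left = tokens[k][3]
        k += 1
    # Right boundary: start of the first industry token, else the first
    # organizational-form token, else the end of the string.
    right = len(name)
    for wanted in ("ind", "org"):
        hit = next((t for t in tokens[k:] if t[1] == wanted), None)
        if hit is not None:
            right = hit[2]
            break
    return name[left:right] or None
```

For example, with a toy lexicon mapping 山东 to `adm`, 软件 to `ind`, and 股份有限公司 to `org`, `extract_trade_name("山东浪潮软件股份有限公司", lexicon)` returns `"浪潮"`.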

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, and in particular to a method for extracting the trade name from a company name.

Background

[0002] Trade-name extraction from company names has applications in many fields, such as autocompletion in search-engine input boxes and matching algorithms for entity linking of company names. At present, a company name mainly consists of four parts: the name of the administrative division where the company is located, the company's trade name, its industry, and its organizational form. Because of the particularities of company naming, tokenizers in natural language processing generally cannot separate out the trade name. Although current machine learning (including deep learning) has certain advantages in accuracy, such as: an existing system and method for extracting company name components based on deep learning (application number...
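The four-part structure described above can be illustrated with a simple affix-peeling sketch: strip the longest known division prefix, then the longest organizational-form suffix, then the longest industry suffix, and treat the remainder as the trade name. This is a swapped-in simplification for illustration, not the patent's segmentation-based method; the function names and toy dictionaries are hypothetical, and names that deviate from this order (for example, with the division in parentheses after the industry) are not handled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class CompanyName:
    division: str
    trade_name: str
    industry: str
    org_form: str


def _longest(candidates: Iterable[str], test: Callable[[str], bool]) -> str:
    """Longest candidate satisfying the predicate, or '' if none does."""
    return max((c for c in candidates if test(c)), key=len, default="")


def decompose(name: str, divisions: Iterable[str], industries: Iterable[str],
              org_forms: Iterable[str]) -> Optional[CompanyName]:
    """Split name into division + trade name + industry + organizational form."""
    division = _longest(divisions, name.startswith)
    rest = name[len(division):]
    org_form = _longest(org_forms, rest.endswith)
    rest = rest[: len(rest) - len(org_form)]
    industry = _longest(industries, rest.endswith)
    trade_name = rest[: len(rest) - len(industry)]
    if not trade_name:  # nothing left between the known affixes
        return None
    return CompanyName(division, trade_name, industry, org_form)
```

Taking the longest match at each step matters: 股份有限公司 and 有限公司 both end 浪潮软件股份有限公司, and only the longer one leaves 浪潮软件 for the industry step.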

Application Information

IPC(8): G06F16/36; G06F16/33
Inventors: 王本强 (Wang Benqiang), 谢超 (Xie Chao), 周庆勇 (Zhou Qingyong)
Owner: INSPUR SOFTWARE CO LTD