An impression label extraction method and device

An impression label technology, applied in the field of data processing, that addresses the large training-data requirements, low recall rate, and low precision rate of existing extraction methods.

Active Publication Date: 2019-04-09
BEIJING GRIDSUM TECH CO LTD

AI Technical Summary

Problems solved by technology

Existing rule-based extraction methods mainly use dictionaries to match words that appear in sentences. If three elements (target product type, product attribute, and evaluation impression) appear together in one short sentence of the text, they are extracted as an impression label. However, a target product, attribute, and evaluation impression that co-occur in a short sentence may have no subordination or modification relationship with one another. As a result, existing rule-based methods achieve a high recall rate but a low precision rate.
Existing algorithm-based extraction methods, by contrast, mainly use training data to train extraction models such as CRF or LSTM models. Training such a model requires a large amount of training data, and the recall rate of this approach is low.
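The precision problem of the rule-based approach can be illustrated with a minimal sketch. The dictionaries, the clause-splitting rule, and the function name `cooccurrence_labels` are assumptions made for illustration; the source does not describe a concrete implementation of the prior-art method.

```python
import re

# Hypothetical dictionaries; the source does not list actual entries.
PRODUCTS = {"Ford Maverick", "Volkswagen Polo"}
ATTRIBUTES = {"looks", "space"}
OPINIONS = {"stylish", "cramped"}


def cooccurrence_labels(text):
    """Emit every <product, attribute, opinion> combination whose three
    elements appear in the same short sentence (clause)."""
    labels = []
    # Assumed clause boundary: common punctuation marks.
    for clause in re.split(r"[,.;!?]", text):
        products = [p for p in sorted(PRODUCTS) if p in clause]
        attributes = [a for a in sorted(ATTRIBUTES) if a in clause]
        opinions = [o for o in sorted(OPINIONS) if o in clause]
        # No check that the elements actually modify one another.
        for p in products:
            for a in attributes:
                for o in opinions:
                    labels.append((p, a, o))
    return labels


text = "Ford Maverick looks stylish but the space of my old car was cramped"
labels = cooccurrence_labels(text)
```

Because the sketch only checks co-occurrence, it pairs every attribute with every opinion in the clause, so unrelated combinations such as ("Ford Maverick", "space", "stylish") are emitted alongside the intended one. This is the high-recall, low-precision behavior described above.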


Detailed Description of the Embodiments

[0060] Specific embodiments of the present application will be described in detail below in conjunction with the accompanying drawings.

[0061] Figure 1 is a schematic flow chart of the impression label extraction method provided in an embodiment of the present application. As shown in Figure 1, the method includes:

[0062] S11: Obtain the document from which impression labels are to be extracted.

[0063] It should be noted that, in the embodiments of the present application, the document from which impression labels are to be extracted may be a document crawled from the Internet by a web crawler.

[0064] S12: Extract preset product attributes and their corresponding evaluation impressions from the document, record the position information of the preset product attributes in the document, and combine each preset product attribute with its corresponding evaluation impression to form a first dyad.
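Step S12 can be sketched roughly as follows. The word lists, the `WINDOW` distance, the rule that the "corresponding" evaluation impression is the nearest one following the attribute, and the names `FirstDyad`, `find_all`, and `extract_first_dyads` are all assumptions for illustration; the source does not state how the correspondence is determined.

```python
from typing import NamedTuple


class FirstDyad(NamedTuple):
    attribute: str
    opinion: str
    position: int  # character offset of the attribute in the document


ATTRIBUTES = ["exterior", "space"]  # hypothetical preset product attributes
OPINIONS = ["stylish", "cramped"]   # hypothetical evaluation impressions
WINDOW = 30  # assumed maximum distance (characters) from attribute to opinion


def find_all(text, word):
    """Return every character offset at which word occurs in text."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        start = text.find(word, start + 1)
    return positions


def extract_first_dyads(document):
    """Pair each matched attribute with the nearest opinion that follows it."""
    dyads = []
    for attr in ATTRIBUTES:
        for pos in find_all(document, attr):
            tail_start = pos + len(attr)
            tail = document[tail_start:tail_start + WINDOW]
            hits = [(tail.find(op), op) for op in OPINIONS if op in tail]
            if hits:
                _, opinion = min(hits)  # closest opinion wins
                dyads.append(FirstDyad(attr, opinion, pos))
    return sorted(dyads, key=lambda d: d.position)


doc = "The Ford Maverick exterior is stylish. The Volkswagen Polo space is cramped."
first_dyads = extract_first_dyads(doc)
```

Each dyad keeps the attribute's offset so that a later step can use the position to decide which product the attribute belongs to.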

[0065] The specific implementation of this step will be descri...

Abstract

The invention discloses an impression label extraction method and device. The method extracts preset product attributes, evaluation impression language, and preset product types from a document through character string matching, which allows it to meet a certain recall rate. Because related product types, product attributes, and the evaluation impressions of those attributes are associated with one another in the document, the first binary group and the second binary group obtained by combination are merged into impression labels, using the position information of the product attributes in the document as a bridge, so that the extracted impression labels can meet a certain accuracy rate. The method can therefore extract the impression labels present in a document without training data. The invention further discloses a storage medium and a processor.
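The "position as a bridge" idea can be sketched as a join on the attribute and its offset. This is a minimal illustration under stated assumptions: the second binary group is taken to be a (product type, attribute, position) tuple, which the visible text does not spell out, and the function name `combine_dyads` and the sample offsets are hypothetical.

```python
def combine_dyads(first_dyads, second_dyads):
    """Merge dyads that refer to the same attribute occurrence.

    first_dyads:  iterable of (attribute, opinion, position)
    second_dyads: iterable of (product, attribute, position)  # assumed shape
    """
    # Index opinions by the attribute occurrence they describe.
    by_occurrence = {(attr, pos): opinion for attr, opinion, pos in first_dyads}
    labels = []
    for product, attr, pos in second_dyads:
        opinion = by_occurrence.get((attr, pos))
        if opinion is not None:
            labels.append((product, attr, opinion))
    return labels


first = [("exterior", "stylish", 18), ("space", "cramped", 59)]
second = [
    ("Ford Maverick", "exterior", 18),
    ("Volkswagen Polo", "space", 59),
    ("Volkswagen Polo", "engine", 80),  # no matching first dyad, so it is dropped
]
impression_labels = combine_dyads(first, second)
```

Joining on the attribute occurrence, rather than on the attribute word alone, keeps an opinion about one product's attribute from being attached to another product that mentions the same attribute elsewhere in the document.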

Description

Technical field

[0001] The present application relates to the technical field of data processing, and in particular to a method and device for extracting impression labels.

Background

[0002] An impression label expresses an impression of a certain attribute of a certain product. Generally speaking, an impression label is a triplet composed of three elements: <target product type (target), product attribute (aspect), evaluation impression language (opinion)>. In text analysis, impression label extraction means extracting impressions of a certain attribute of a product from a large amount of user evaluation data. For example, the triplet to be extracted from the text "Ford Maverick looks stylish and wild" is <Ford Maverick, shape, stylish and wild>, and the triplet to be extracted from the text "Volkswagen Polo is narrow" is <VW Polo, space, cramped>.

[0003] Existing image tag e...

Application Information

IPC(8): G06F17/27
CPC: G06F40/205, G06F40/284
Inventor: 马庆丽
Owner: BEIJING GRIDSUM TECH CO LTD