
Method for recognizing text segments by using sequence annotation

Sequence labeling and text technology, applied in text database clustering/classification, unstructured text data retrieval, semantic analysis, and related fields. It addresses problems such as lack of context information, high computational complexity, and high application cost.

Active Publication Date: 2020-05-22
零氪科技(天津)有限公司
2 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

The existing multi-classification method lacks context information and copes poorly with scattered clause types.
[0004] Deep learning question answering models can achieve end-to-end recognition, but they place relatively high demands on data volume and data quality, and their computational complexity and application cost are also relatively high.




Embodiment Construction

[0022] Referring to Figure 1, the method for recognizing text segments using sequence annotation in the present invention is described in detail below. It comprises a modeling and training phase consisting of steps S100 to S400, a text segment recognition phase in step S500, and an application phase in step S600. The detailed description is as follows:

[0023] S100: Divide the text into several clauses, and obtain a set of semantic feature vectors of each clause.

[0024] Suppose a corpus sample set S1 includes multiple text fields, each expressed as P_i, where i is a natural number greater than or equal to 1. Each text field P_i is segmented by punctuation into clauses S_ij, so that P_i = S_i1, S_i2, ..., S_ij. The clauses S_ij form a set S2 of tokens describing the different text fields. For example, in a paragraph, some sentences are inspection descriptions (corresponding to the beginning of the paragraph)...
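The punctuation-based segmentation in S100 can be illustrated with a minimal Python sketch. The delimiter set, the function names `split_into_clauses` and `build_clause_set`, and the sample sentence are assumptions made for illustration; the patent text shown here does not specify which punctuation marks are used or how the semantic feature vectors are computed, so that part is left out.

```python
import re
from typing import List

# Assumed delimiter set: common Western and Chinese clause-ending punctuation.
# The source only says "segment by punctuation" and does not list the marks.
CLAUSE_DELIMITERS = r"[,.;!?，。；！？]"


def split_into_clauses(text_field: str) -> List[str]:
    """Split one text field P_i into its clauses S_i1, S_i2, ..., S_ij."""
    parts = re.split(CLAUSE_DELIMITERS, text_field)
    # Drop empty fragments left by trailing or repeated punctuation.
    return [part.strip() for part in parts if part.strip()]


def build_clause_set(sample_set: List[str]) -> List[List[str]]:
    """Apply the split to every text field in the sample set S1."""
    return [split_into_clauses(text_field) for text_field in sample_set]


if __name__ == "__main__":
    sample = ["CT scan of chest. No nodules seen, lungs clear"]
    for clauses in build_clause_set(sample):
        print(clauses)
```

A plain regular-expression split like this would also break decimal numbers such as "3.5" at the period, so a real implementation would likely need a more careful tokenizer.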



Abstract

The invention provides a method for recognizing text segments by using sequence annotation, comprising the following steps: A, segmenting the different text segments of a sample set into clause sets, and annotating the clause sets with semantic feature vectors to form a semantic feature vector set; B, performing clustering training on the semantic feature vector set to obtain a clustering model, and assigning a cluster number to each object of the clustering model to form a sequence model; C, establishing a mapping between the sequence model and the different text fields, and training a sequence labeling model on the mapped cluster sequence; and D, applying the sequence model and the sequence labeling model in turn to segment the text to be segmented. The method performs standardized modeling by taking the sample set as a database template. During subsequent text segment recognition, the sentence patterns in the text to be segmented are standardized, and the standardized sentences are mapped to sentence features according to the model, so that different expressions of the same semantics are represented uniformly and text segment recognition can be completed.
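Steps B and C can be pictured with a small Python sketch that turns clause feature vectors into a sequence of cluster numbers and pairs that sequence with text-field labels as training data for a sequence labeling model. This is only an illustration under stated assumptions: the abstract does not name the clustering algorithm or the sequence labeling model, so the sketch assumes cluster centroids are already available from some trained clustering model and uses nearest-centroid assignment. The names `nearest_cluster`, `to_cluster_sequence`, and `build_training_pairs`, and the label strings, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_cluster(vector: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to the vector (Euclidean)."""
    distances = [math.dist(vector, centroid) for centroid in centroids]
    return distances.index(min(distances))


def to_cluster_sequence(
    clause_vectors: Sequence[Vector], centroids: Sequence[Vector]
) -> List[int]:
    """Map the clauses of one text, in order, to their cluster numbers."""
    return [nearest_cluster(vector, centroids) for vector in clause_vectors]


def build_training_pairs(
    cluster_sequence: Sequence[int], field_labels: Sequence[str]
) -> List[Tuple[int, str]]:
    """Pair each cluster number with the text-field label of its clause.

    The pairs are the kind of (observation, tag) data a sequence labeling
    model could be trained on; the model itself is not shown here.
    """
    if len(cluster_sequence) != len(field_labels):
        raise ValueError("one label is required per clause")
    return list(zip(cluster_sequence, field_labels))


if __name__ == "__main__":
    # Hypothetical 2-D centroids and clause vectors, for illustration only.
    centroids = [(0.0, 0.0), (10.0, 10.0)]
    clause_vectors = [(1.0, 1.0), (9.0, 9.0), (0.0, 2.0)]
    sequence = to_cluster_sequence(clause_vectors, centroids)
    print(build_training_pairs(sequence, ["inspection", "diagnosis", "inspection"]))
```

Replacing each clause with a cluster number is what lets different wordings of the same meaning share one symbol before the sequence labeling step.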

Description

Technical field

[0001] The invention relates to the technical field of text processing, in particular to a method for recognizing text segments using sequence annotation.

Background

[0002] With the advent of the era of artificial intelligence, the demands on machines' ability to understand long passages of text keep rising. Because long texts carry complex information and are written in many different ways, it is difficult for machines to understand them directly. Long texts can therefore be decomposed first into their key information fragments, and the fragment information can then be extracted and analyzed piece by piece. This divide-and-conquer approach is mainstream practice in the industry.

[0003] To divide long texts into target fragments, a multi-classification method is currently used, that is, a multi-classification model is established for the clauses, and then the clause...


Application Information

IPC(8): G06F40/30, G06F40/289, G06F16/35, G16H10/60
CPC: G16H10/60
Inventor: 罗立刚, 刘辉, 张正宽, 张天泽, 常涛, 王玲
Owner: 零氪科技(天津)有限公司