
Text processing method and system based on parallel zero-redundancy long short-term memory network

A long short-term memory and text processing technology, applied in unstructured text data retrieval, neural learning methods, electrical digital data processing, etc.; it addresses problems such as an unfixed number of recurrence steps, one-way propagation, and lack of support for parallel computing, achieving high abstraction, improved accuracy, and improved efficiency.

Pending Publication Date: 2021-11-12
Applicant: NANKAI UNIV


Problems solved by technology

[0005] The structural characteristics of the LSTM (long short-term memory network) provide the ability to discover associations between words, but the following problems remain: 1) Serial sequence characteristic: LSTM is a one-way model that can only read words in order, so it may miss the local semantic environment of natural language (such as flashbacks or emphasis); although a bidirectional LSTM adds a reverse pass, each pass still propagates in one direction only. Moreover, the design of LSTM means the model does not support parallel computing: the next word can only be processed after the current word has been processed, which slows down text processing. 2) Unfixed number of recurrence steps: when the text is long, the hidden state after many iterations may lose long-span word associations, which makes semantic analysis difficult. Shallower models usually capture only a small amount of contextual information, which hinders the extraction of semantic information.

[0006] In summary, because of the existing LSTM structure, text processing is slow, little semantic information is extracted, and semantic analysis is difficult, which ultimately reduces text processing efficiency.

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more


Examples


Embodiment 1

[0075] As shown in Figure 1, this embodiment provides a text processing method based on a parallelized zero-redundancy long short-term memory network, which includes the following steps:

[0076] Step 1: Obtain the text data to be processed and convert it into word embedding vector form.
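A minimal sketch of Step 1 in Python, assuming a toy whitespace tokenizer and a randomly initialized embedding table; the patent does not specify the tokenization or embedding model, so `vocab`, `to_word_embeddings`, and the dimensions below are illustrative:

```python
import numpy as np

# Assumed toy vocabulary and random embedding table; the patent does not
# specify how the word embedding vectors are produced.
rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}
embed_dim = 8
embedding_table = rng.normal(size=(len(vocab), embed_dim))

def to_word_embeddings(text: str) -> np.ndarray:
    """Whitespace-tokenize, then look up one embedding row per word."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in text.lower().split()]
    return embedding_table[ids]          # shape: (n_words, embed_dim)

X = to_word_embeddings("the cat sat")    # (3, 8) matrix, one row per word
```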

[0077] Step 2: Adaptively calculate the context window coverage of each word in the text data to be processed according to the number of words contained in the text data to be processed.

[0078] In this embodiment, the context window coverage is calculated as follows:

[0079] The context window coverage of each word in the text data to be processed is obtained from the number of words the text contains and the number of layers of the parallelized zero-redundancy long short-term memory network, rounded up.
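Read literally, paragraph [0079] computes the coverage as the word count divided by the layer count, rounded up to the next integer. A sketch under that reading (the function name is illustrative, not from the patent):

```python
import math

def context_window_coverage(n_words: int, n_layers: int) -> int:
    # One plausible reading of [0079]: ceil(words in text / network layers).
    return math.ceil(n_words / n_layers)

print(context_window_coverage(100, 4))   # a 100-word text, 4 layers -> 25
```

Under this reading, longer texts automatically get wider windows for a fixed network depth, which matches the "adaptive" calculation described in Step 2.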

[0080] Wherein, the parallelized zero-redundancy long short-term memory network is pre-trained, and the number ...

Embodiment 2

[0136] This embodiment provides a text processing system based on a parallelized zero-redundancy long short-term memory network, which includes the following modules:

[0137] (1) A word embedding vector conversion module, which is used to obtain text data to be processed and convert it into a word embedding vector form.

[0138] (2) A context window determination module, which is used to adaptively calculate the context window coverage of each word in the text data to be processed according to the number of words contained in the text data to be processed.

[0139] Wherein, the size of the context window coverage determines how many semantic features each word embedding vector corresponds to.

[0140] In the context window determination module, the calculation process of the context window coverage is:

[0141] As in paragraph [0079], the context window coverage of each word is obtained from the number of words contained in the text data to be processed and the number of layers of the parallelized zero-redundancy long short-term memory network, rounded up.
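For orientation, a sketch of how this embodiment's modules might be wired together; the class, method, and argument names are hypothetical, and the callables are stand-ins for the patent's modules:

```python
import math

class TextProcessingSystem:
    """Illustrative composition of Embodiment 2's modules (names assumed)."""
    def __init__(self, embedder, n_layers, context_module, classifier):
        self.embedder = embedder              # (1) word embedding conversion
        self.n_layers = n_layers
        self.context_module = context_module  # parallel zero-redundancy LSTM
        self.classifier = classifier          # classification network model

    def process(self, text):
        X = self.embedder(text)               # (1) text -> embedding matrix
        # (2) adaptive window: word count / layer count, rounded up ([0141])
        window = math.ceil(len(X) / self.n_layers)
        C = self.context_module(X, window)    # local context vectors
        return self.classifier(C)             # classification or labeling result
```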

Embodiment 3

[0146] This embodiment provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, it implements the steps of the text processing method based on the parallelized zero-redundancy long short-term memory network described in Embodiment 1.



Abstract

The invention belongs to the field of text information processing and provides a text processing method and system based on a parallel zero-redundancy long short-term memory network. The method comprises: obtaining to-be-processed text data and converting it into word embedding vector form; adaptively calculating a context window coverage for each word according to the number of words contained in the to-be-processed text data; in the parallel zero-redundancy long short-term memory network, compressing all word embedding vectors within the coverage of the context window into a local attention vector matrix, and computing in parallel, through Hadamard-product matrix multiplication, the local context vectors corresponding to all word embedding vectors; and processing the local context vectors of the to-be-processed text data through a classification network model to obtain a text classification or labeling result.
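The abstract's middle steps (window gathering, Hadamard-product combination, pooling to a local context vector) can be sketched as below. This is an interpretation, not the patent's exact formulation: the weight matrix `W`, the zero padding at sequence ends, and the sum-pooling are assumptions, and the gather is written as one vectorized stack to emphasize that, unlike a serial LSTM scan, every word's context vector can be computed at once:

```python
import numpy as np

def local_context_vectors(X: np.ndarray, window: int, W: np.ndarray) -> np.ndarray:
    """Sketch: gather each word's context window into a local matrix,
    weight it element-wise (Hadamard product) with W, and pool to one
    context vector per word. Padding and pooling are assumptions."""
    n, d = X.shape
    half = window // 2
    Xp = np.pad(X, ((half, half), (0, 0)))    # zero-pad the sequence ends
    # Stack all windows at once: shape (n, window, d). Every word is handled
    # in the same pass, with no word-to-word serial dependency.
    windows = np.stack([Xp[i:i + window] for i in range(n)])
    return (windows * W).sum(axis=1)           # Hadamard product, then pool

X = np.random.default_rng(1).normal(size=(5, 8))  # 5 words, dim 8 (toy data)
W = np.ones((3, 8)) / 3.0                         # assumed learned weights
C = local_context_vectors(X, window=3, W=W)       # (5, 8), one vector per word
```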

Description

Technical field

[0001] The invention belongs to the field of text information processing, and in particular relates to a text processing method and system based on a parallelized zero-redundancy long short-term memory network.

Background technique

[0002] The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

[0003] With the rapid development and maturing of the new generation of artificial intelligence (AI) technology, the ever-increasing computing power of cloud computing, cluster computing, and small servers, and the widespread popularity of smart devices, various Internet applications have become an indispensable part of users' daily lives. At the same time, natural language applications such as news topic tracking, social computing, and public opinion analysis have been included in the development plans for e-government and smart government. The...


Application Information

IPC(8): G06F16/33; G06F16/35; G06F40/289; G06F40/30; G06K9/62; G06N3/04; G06N3/08
CPC: G06F16/3344; G06F16/35; G06F40/289; G06F40/30; G06N3/08; G06N3/048; G06N3/044; G06F18/241; Y02D10/00
Inventors: 卫金茂, 朴乘锴, 王宇辰, 朱亚朋
Owner: NANKAI UNIV