Chinese short text correlation measurement method based on CNN convolutional layer and BiLSTM

A correlation and similarity measurement technique, applied to neural learning methods, text database querying, and unstructured text data retrieval. It addresses the loss of some features and the resulting drop in accuracy, with the aim of achieving good accuracy, avoiding feature loss, and computing efficiently.

Active Publication Date: 2021-11-19
小鱼亲测科技(广州)有限公司

AI Technical Summary

Problems solved by technology

[0003] The present invention aims to solve the problem that existing text correlation measurement methods based on convolutional neural networks lose some features, which reduces their accuracy. It provides a Chinese short text correlation measurement method based on a CNN convolutional layer and BiLSTM.

Embodiment Construction

[0029] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in combination with specific examples and with reference to the accompanying drawings.

[0030] 1. Siamese network structure

[0031] A Siamese network is a neural network framework used for nonlinear metric learning of similarity information. The word "Siamese" originally referred to Siam (present-day Thailand) and later came to mean "twin" or "conjoined" in English. The architecture was first proposed to verify whether the signature on a check matches the signature kept on file at the bank, and it has since been applied in many fields. A Siamese network generally has two inputs, each of which enters one of two identical neural network structures (CNN, LSTM, etc.). The two sub-networks can share weights, and each maps its input to a new space, forming a vector representation in the new space, and then the...
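The weight sharing described in [0031] can be illustrated with a minimal sketch. It is not the patent's network: the single tanh layer, the Euclidean distance, and the names `make_encoder`, `encode`, and `siamese_distance` are all assumptions chosen for illustration. The point is only that both inputs pass through the same weights before their vectors are compared.

```python
import math
import random

def make_encoder(in_dim, out_dim, seed=0):
    """Create one weight matrix (hypothetical toy encoder); both branches reuse it."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]

def encode(weights, x):
    """Map an input vector into the new space as tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def euclidean(u, v):
    """Euclidean distance between two vectors (an assumed choice of metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def siamese_distance(weights, x1, x2):
    """Run both inputs through the same encoder and compare the outputs."""
    return euclidean(encode(weights, x1), encode(weights, x2))

# One shared set of weights serves both branches of the network.
shared = make_encoder(in_dim=4, out_dim=3)
a = [0.2, -0.1, 0.7, 0.0]
b = [0.2, -0.1, 0.7, 0.0]   # identical to a
c = [-0.9, 0.8, -0.3, 0.5]  # different from a

d_same = siamese_distance(shared, a, b)
d_diff = siamese_distance(shared, a, c)
print(d_same, d_diff)
```

Because the two branches use the same weights, identical inputs always land on the same point in the new space, so their distance is zero regardless of how the weights were initialized.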

Abstract

The present invention discloses a Chinese short text correlation measurement method based on a CNN convolutional layer and BiLSTM. A Siamese (twin) neural network serves as the framework, and Chinese word vectors trained with Word2vec serve as the input. First, a CNN without a pooling layer extracts the n-gram information of the text, which simulates the word segmentation process for Chinese text. The result is then fed into a BiLSTM network, which continues to extract text features at different granularities and encodes the text semantics more accurately. Finally, each text is represented as a vector, and the correlation is expressed by the distance between the two vectors. Because the n-gram information is extracted by a CNN with the pooling layer removed, the method avoids the feature loss caused by pooling, achieves better accuracy in measuring the correlation of Chinese short texts, computes faster, and does not require high-end hardware.
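The pooling-free convolution step in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the patented implementation: the ReLU activation, the filter values, and the helper names `conv1d_no_pool` and `max_pool_over_time` are hypothetical, and the BiLSTM stage is omitted. The sketch contrasts keeping one feature vector per n-gram position with the max-over-time pooling that the method removes.

```python
def conv1d_no_pool(embeddings, filters, n):
    """Slide width-n filters over a token sequence and keep every position.

    embeddings: list of L token vectors, each of dimension d
    filters:    list of k filters, each a flat list of n * d weights
    returns:    list of (L - n + 1) feature vectors, each of dimension k
    """
    features = []
    for start in range(len(embeddings) - n + 1):
        # Flatten the n consecutive token vectors into one window (an n-gram).
        window = [x for vec in embeddings[start:start + n] for x in vec]
        # ReLU activation is an assumption; the source does not name one.
        features.append(
            [max(0.0, sum(w * x for w, x in zip(f, window))) for f in filters]
        )
    return features

def max_pool_over_time(features):
    """Conventional pooling, shown only for contrast: one value per filter."""
    return [max(column) for column in zip(*features)]

# Five toy "word vectors" of dimension 2 (made-up values, not Word2vec output).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 0.2]]
# Two bigram filters (n = 2), each with n * d = 4 weights.
filters = [[1.0, 0.0, 0.0, 1.0], [0.5, -0.5, 0.5, -0.5]]

ngram_features = conv1d_no_pool(tokens, filters, n=2)
pooled = max_pool_over_time(ngram_features)

print(len(ngram_features), "positions kept without pooling")
print(len(pooled), "values left after max pooling")
```

Without pooling, the output is still a sequence (one vector per bigram position), which is the form a recurrent layer such as a BiLSTM consumes. With pooling, the positional information collapses to a single vector.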

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a method for measuring the relevance of short Chinese texts based on a CNN (Convolutional Neural Network) convolutional layer and a BiLSTM (Bidirectional Long Short-Term Memory network).

Background

[0002] Text data accounts for a large part of Internet data. Real-time news, article titles, chat records, search questions, product reviews, and so on can all be called text. Research on measuring the relevance of these texts plays a key role in natural language processing tasks such as question answering systems and information retrieval. Most current methods for measuring text semantic relevance are based on convolutional neural networks (CNNs). However, the convolutional layer of a convolutional neural network model mainly extracts local features of the text rather than global features, and the pooling layer often loses s...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F40/289, G06F40/30, G06F16/33, G06F16/35, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/30, G06F40/289, G06F16/3344, G06F16/35, G06N3/049, G06N3/08, G06N3/045, G06F18/22, G06F18/241
Inventor: 朱新华, 吴晗, 张兰芳, 陈宏朝, 郭青松
Owner: 小鱼亲测科技(广州)有限公司