Deep learning-based text similarity detection method for financial industry

A deep learning-based text similarity detection technology that addresses problems in the comparison process such as biased results, overly complex syntax trees, and partial tree structures.

Active Publication Date: 2019-09-03
SOUTH CHINA UNIV OF TECH +1

AI Technical Summary

Problems solved by technology

[0004] However, considering only word overlap is biased in some fields. At the same time, for the method of constructing the syntax tree, if the ...

Examples

Embodiment

[0087] A text similarity detection method based on deep learning for the financial industry, comprising the following steps:

[0088] S1: Perform sentence segmentation and word segmentation on the Chinese text. Because Chinese words are not delimited by spaces, a proper-noun lexicon is established first, the text is then segmented into words using a conditional random field, and finally stop words, numbers, and letters are removed.
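
The preprocessing in S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: forward maximum matching against a lexicon stands in for the conditional random field segmenter, and the lexicon, stop-word list, sample text, and all function names are hypothetical.

```python
import re

# Sentence-final punctuation (Chinese and ASCII); an assumed delimiter set.
SENTENCE_END = re.compile(r"[。！？!?；;]+")
DIGITS_OR_LETTERS = re.compile(r"[0-9A-Za-z]+")


def split_sentences(text):
    """Split text into sentences on sentence-final punctuation."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def segment(sentence, lexicon, max_len=5):
    """Forward maximum matching: a simple stand-in for the CRF segmenter."""
    words, i = [], 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def clean(words, stop_words):
    """Drop stop words, digits, Latin letters, and pure punctuation."""
    kept = []
    for word in words:
        word = DIGITS_OR_LETTERS.sub("", word)
        if word and word not in stop_words and not re.fullmatch(r"\W+", word):
            kept.append(word)
    return kept


# Hypothetical proper-noun lexicon and stop-word list for illustration.
lexicon = {"银行间市场", "利率", "上升", "基点", "投资者", "风险偏好", "下降"}
stop_words = {"的", "个"}
text = "银行间市场利率上升3个基点。投资者的风险偏好下降！"

tokens = [clean(segment(s, lexicon), stop_words) for s in split_sentences(text)]
```

A trained CRF would resolve ambiguous boundaries that dictionary matching cannot; the lexicon lookup here only shows where the proper-noun lexicon enters the pipeline.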

[0089] S2: Use the Bi-LSTM-RNN model to read each word of the sentence in order, extract its information, and embed it into a semantic vector, yielding the semantic representation of the sentence.
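
To make S2 concrete, the sketch below runs a bidirectional LSTM over a sequence of word vectors and concatenates the two final hidden states into one sentence vector. It is an assumption-laden illustration: the weights are random and untrained, the dimensions are arbitrary, and the function names (`init_lstm`, `lstm_run`, `encode_sentence`) are hypothetical. The source does not specify the network's size, training procedure, or how the two directions are combined.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _affine(W, b, v):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def init_lstm(input_dim, hidden_dim, rng):
    """Random (untrained) weights for the input, forget, output and cell gates."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {gate: (matrix(), [0.0] * hidden_dim) for gate in ("i", "f", "o", "g")}


def lstm_run(params, inputs, hidden_dim):
    """Run one LSTM direction over the inputs and return the final hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in inputs:
        z = x + h  # concatenate current input with previous hidden state
        i = [_sigmoid(a) for a in _affine(*params["i"], z)]
        f = [_sigmoid(a) for a in _affine(*params["f"], z)]
        o = [_sigmoid(a) for a in _affine(*params["o"], z)]
        g = [math.tanh(a) for a in _affine(*params["g"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def encode_sentence(word_vectors, fwd, bwd, hidden_dim):
    """Concatenate the final forward state and the final backward state."""
    forward = lstm_run(fwd, word_vectors, hidden_dim)
    backward = lstm_run(bwd, word_vectors[::-1], hidden_dim)
    return forward + backward


rng = random.Random(0)
embed_dim, hidden_dim = 4, 3  # illustrative sizes only
fwd = init_lstm(embed_dim, hidden_dim, rng)
bwd = init_lstm(embed_dim, hidden_dim, rng)
# Five hypothetical word embeddings standing in for a segmented sentence.
sentence = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(5)]
sentence_vector = encode_sentence(sentence, fwd, bwd, hidden_dim)
```

In practice a framework implementation with trained weights would replace this hand-written loop; the sketch only shows how reading the words in both directions produces a fixed-length sentence representation.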

[0090] S3: Using the semantic information extracted by the neural network, analyze the logical structure of the sentence with a non-deterministic automaton, organize the sentence into a tree structure, and finally express each paragraph as a vector tree. See Figure 3.
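
One possible container for the vector tree of S3 is sketched below. The excerpt does not describe how the automaton decides the tree shape, so this example simply assumes a flat two-level tree (a paragraph root over sentence leaves) whose root vector is the mean of its children. The class `VectorNode` and the helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VectorNode:
    """A tree node carrying a semantic vector (hypothetical structure)."""
    label: str
    vector: List[float]
    children: List["VectorNode"] = field(default_factory=list)


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def build_paragraph_tree(sentence_vectors):
    """Assumed layout: one root per paragraph, one leaf per sentence."""
    leaves = [VectorNode(f"sentence-{n}", vec)
              for n, vec in enumerate(sentence_vectors, start=1)]
    return VectorNode("paragraph", mean_vector(sentence_vectors), leaves)


def walk(node, depth=0):
    """Yield (depth, node) pairs in pre-order."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


tree = build_paragraph_tree([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
depths = [depth for depth, _ in walk(tree)]
```

A structure produced by the automaton described in the patent would likely be deeper than this, with intermediate nodes for clauses or logical relations.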

[0091] S4, match the semantic tree extracted from ...

Abstract

The invention provides a deep learning-based text similarity detection method for the financial industry, comprising the following steps: S1, building a proper-noun lexicon, obtaining a conditional probability model based on a conditional random field, and carrying out the probability calculation through the conditional probability model; S2, using a Bi-LSTM-RNN model to read each word of the sentence in order, extracting the information of the word, and embedding the information into a semantic vector, thereby obtaining the semantic representation of the sentence; S3, analyzing the logical structure of the sentence according to the semantic information extracted by the neural network, organizing the sentence into a tree structure, and finally expressing the paragraph as a vector tree; and S4, matching the vector tree extracted from the text against historical documents in a database and comparing similarity from two angles, one being the similarity between the vector trees and the other being the similarity between every two nodes, so as to obtain the final result.
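
The two-angle comparison in S4 can be illustrated with a small sketch. The abstract does not give the similarity formulas, so everything here is an assumption: trees are nested `(vector, children)` tuples, tree-level similarity is the cosine of the root vectors, node-level similarity averages each node's best cosine match in the other tree, and the mixing weight `alpha` is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def flatten(tree):
    """Collect every node vector of a (vector, children) tree in pre-order."""
    vector, children = tree
    nodes = [vector]
    for child in children:
        nodes.extend(flatten(child))
    return nodes


def tree_similarity(a, b):
    """Angle 1 (assumed): compare the trees as a whole via their root vectors."""
    return cosine(a[0], b[0])


def node_similarity(a, b):
    """Angle 2 (assumed): average best pairwise match of a's nodes in b."""
    nodes_a, nodes_b = flatten(a), flatten(b)
    best = [max(cosine(u, v) for v in nodes_b) for u in nodes_a]
    return sum(best) / len(best)


def combined_score(a, b, alpha=0.5):
    """Weighted mix of both angles; alpha is a hypothetical parameter."""
    return alpha * tree_similarity(a, b) + (1 - alpha) * node_similarity(a, b)


query = ([1.0, 1.0], [([1.0, 0.0], []), ([0.0, 1.0], [])])
other = ([1.0, -1.0], [([1.0, 0.0], []), ([0.0, -1.0], [])])

score_same = combined_score(query, query)
score_other = combined_score(query, other)
```

Ranking the historical documents by such a score would return the closest matches first. Note that `node_similarity` as written is asymmetric, so a symmetric variant would average both directions.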

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and specifically relates to a text similarity detection method based on deep learning for the financial industry.

Background technique

[0002] With the development of information technology and artificial intelligence, more and more data is being organized, and making good use of this large volume of historical data has become very important. For many technology companies, historical data generated by users is an extremely valuable asset. At this stage, much numerical and structured data is already well utilized, and data mining methods for it emerge one after another, but natural language data has not yet been well structured and the technology for mining it is not yet mature. At present, documents written in natural language exist in many fields, and these historical documents can also provide guidance on new problems. Due to the characteristics of the language in the Chinese field, natural language proce...

Application Information

IPC(8): G06F16/35, G06F17/27, G06N3/04, G06N3/08
CPC: G06N3/08, G06F40/289, G06F40/30, G06N3/044, G06N3/045, Y02D10/00
Inventor: 杜广龙陈震星李方梁殷浩罗静邓勇达
Owner: SOUTH CHINA UNIV OF TECH