
Multi-model fusion Chinese vocabulary paraphrase extraction method

A multi-model Chinese word technology, applied in the fields of neural learning methods, biological neural network models, character and pattern recognition, etc. It addresses the inability to effectively filter out incorrect paraphrase words and the resulting poor quality of the extracted paraphrase words.

Pending Publication Date: 2021-03-09
HANGZHOU NORMAL UNIVERSITY
5 citations; cited by 1

AI Technical Summary

Problems solved by technology

At present, paraphrase vocabulary extraction mainly relies on a single model to screen and identify synonymous words. A single model cannot effectively filter out incorrect paraphrase words, so the extracted paraphrase words are of poor quality.

Method used

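Text preprocessing organizes the original corpus into a Chinese vocabulary set, candidate Chinese paraphrase words are obtained with a pivot method, and Word2vec training yields a word vector for each word. With a negative sampling mechanism introduced, a cosine similarity model, a feedforward neural network model and a convolutional neural network model each score a pair of word vectors; the scores are combined by a weighted sum, and a pair whose final score exceeds a specified threshold is judged to be a reasonable paraphrase pair.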

Embodiment Construction

[0036] The present invention is further described below with reference to the accompanying drawings and specific embodiments. It should be understood that these examples only illustrate the present invention and are not intended to limit its scope. Operating methods not specified in the following examples generally follow conventional conditions or the conditions suggested by the manufacturer.

[0037] As shown in Figure 1, the multi-model fusion Chinese vocabulary paraphrase extraction method of this embodiment specifically includes the following steps:

[0038] (A) First, extract the text content from the original corpus and split the text into sentences. Filter the sentences by length, removing those that are too long or too short so that each sentence's length falls in the interval [3, 100] Chinese characters, and then use the LTP platform of Harbin Institute of Technology p...
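Step (A)'s sentence splitting and length filter can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the sentence delimiters (。！？) and the rule of counting only CJK Unified Ideographs (U+4E00 to U+9FFF) as "Chinese characters" are assumptions, and the subsequent LTP processing is not reproduced.

```python
import re

# Assumption: a sentence ends at a Chinese full stop, exclamation mark or
# question mark; the embodiment does not state its splitting rule.
_SENTENCE = re.compile(r"[^。！？]+[。！？]?")


def split_sentences(text: str) -> list[str]:
    """Split raw text into sentences, dropping empty fragments."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def count_chinese_chars(sentence: str) -> int:
    """Count CJK Unified Ideographs (U+4E00..U+9FFF).

    Assumed reading of "length in Chinese characters": punctuation,
    digits and Latin letters are not counted.
    """
    return sum(1 for ch in sentence if "\u4e00" <= ch <= "\u9fff")


def filter_by_length(sentences: list[str], min_len: int = 3, max_len: int = 100) -> list[str]:
    """Keep sentences whose Chinese-character count lies in [min_len, max_len]."""
    return [s for s in sentences if min_len <= count_chinese_chars(s) <= max_len]


if __name__ == "__main__":
    raw = "你好。这是一个测试句子！短"
    print(filter_by_length(split_sentences(raw)))  # ['这是一个测试句子！']
```

Both bounds are inclusive, matching the closed interval [3, 100]; a sentence with two Chinese characters such as "你好。" is dropped as too short.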

Abstract

The invention discloses a multi-model fusion Chinese vocabulary paraphrase extraction method comprising the following steps: organizing an original corpus into a Chinese vocabulary set through text preprocessing, obtaining the corresponding candidate Chinese paraphrase words based on a pivot method, and training Word2vec to obtain a word vector for each word. Finally, with a negative sampling mechanism introduced, a cosine similarity model, a feedforward neural network model and a convolutional neural network model each compute a model score for the two word vectors; these scores are weighted by given parameters and summed to obtain the final score of each paraphrase pair among the candidate Chinese paraphrase words. If the final score is greater than a specified threshold, the paraphrase pair is judged reasonable, and an optimal paraphrase vocabulary set is thereby extracted.
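The scoring stage described in the abstract (per-model scores, weighted sum, threshold test) can be sketched as below. Only the cosine similarity model is implemented; the feedforward and convolutional network scores are treated as inputs from separately trained classifiers, which are not reproduced. The weights `(0.4, 0.3, 0.3)` and threshold `0.5` are hypothetical placeholders, since the abstract does not give the actual parameters.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fuse_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-model scores (one weight per model)."""
    if len(scores) != len(weights):
        raise ValueError("one weight per model score is required")
    return sum(w * s for w, s in zip(weights, scores))


def judge_pair(
    u: Sequence[float],
    v: Sequence[float],
    ffnn_score: float,
    cnn_score: float,
    weights: Sequence[float] = (0.4, 0.3, 0.3),  # hypothetical weights
    threshold: float = 0.5,  # hypothetical threshold
) -> tuple[bool, float]:
    """Return (is_reasonable, final_score) for one candidate paraphrase pair.

    ffnn_score and cnn_score are assumed to come from a feedforward and a
    convolutional classifier trained elsewhere (e.g. with negative samples).
    """
    final = fuse_scores((cosine_similarity(u, v), ffnn_score, cnn_score), weights)
    return final > threshold, final


if __name__ == "__main__":
    # Identical toy vectors plus fairly confident network scores pass the threshold.
    print(judge_pair([1.0, 0.0], [1.0, 0.0], ffnn_score=0.8, cnn_score=0.6))
```

Keeping the fusion as a plain weighted sum makes each model's contribution explicit, so the weights can be tuned on held-out pairs without retraining the underlying models.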

Description

Technical Field

[0001] The invention relates to the technical field of paraphrase vocabulary extraction, and in particular to a multi-model fusion Chinese vocabulary paraphrase extraction method.

Background Art

[0002] Paraphrase refers to presenting the same semantics in different forms of expression. It can be used to rewrite words or sentences input by users into multiple words and sentences with the same semantics but different forms of expression, and can therefore be used to generate synonymous corpora and expand the size of a corpus. Paraphrase-related research mainly covers the extraction of paraphrase vocabulary and the generation of paraphrase sentences.

[0003] Among these, the main methods for extracting paraphrase vocabulary are thesaurus-based extraction, extraction based on monolingual parallel corpora, and extraction based on the "pivot method", which...
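The idea behind pivot-based candidate generation can be sketched as follows: two Chinese words that translate to the same word in a pivot language become a candidate paraphrase pair. This is a generic, minimal illustration, not the patent's exact procedure; the bilingual dictionary format and the toy entries are hypothetical, and a fuller pivot approach would also score pairs with translation probabilities rather than just listing them.

```python
from collections import defaultdict
from itertools import combinations


def pivot_candidates(zh_to_pivot: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Pair Chinese words that share at least one pivot-language translation."""
    # Invert the dictionary: pivot word -> Chinese words that translate to it.
    pivot_to_zh: dict[str, set[str]] = defaultdict(set)
    for zh_word, pivots in zh_to_pivot.items():
        for pivot in pivots:
            pivot_to_zh[pivot].add(zh_word)

    candidates: set[tuple[str, str]] = set()
    for zh_words in pivot_to_zh.values():
        # Sorting stores each unordered pair once, even if it shares several pivots.
        for a, b in combinations(sorted(zh_words), 2):
            candidates.add((a, b))
    return candidates


if __name__ == "__main__":
    # Hypothetical toy Chinese-to-English dictionary.
    toy = {
        "美丽": {"beautiful"},
        "漂亮": {"beautiful", "pretty"},
        "好看": {"pretty"},
        "汽车": {"car"},
    }
    print(pivot_candidates(toy))
```

Pivot pairing is deliberately over-inclusive (a polysemous pivot word can link unrelated senses), which is why candidates are then screened by the scoring models rather than accepted directly.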

Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F40/289, G06F40/242, G06F40/30, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/289, G06F40/30, G06F40/242, G06N3/08, G06N3/048, G06N3/045, G06F18/22
Inventors: 黄剑平 (Huang Jianping), 丰仕琦 (Feng Shiqi)
Owner: HANGZHOU NORMAL UNIVERSITY