
Key phrase generation method and device based on pre-training model and storage medium

A technology for key phrases and pre-trained models, applied to semantic analysis, character and pattern recognition, and natural language data processing. The effect of extraction...

Pending Publication Date: 2022-01-14
达而观数据(成都)有限公司

AI Technical Summary

Problems solved by technology

[0002] Traditional keyword extraction algorithms fall into two categories: unsupervised methods and supervised methods. Supervised keyword classification methods, such as those disclosed as a method, device, equipment, and storage medium, presuppose a large text corpus with corresponding keyword labels, and they need extensive neural network training before they yield an effective model. The acquisition cost is therefore high and the process cumbersome, and such methods are not suitable for scenarios that lack large amounts of labeled data and computing resources.
[0003] Among unsupervised methods, keyword extraction based on TF-IDF is commonly used. The keywords it extracts tend to be weak, scattered, or far from the topic, so they cannot express the overall semantics of the text. Existing examples include the invention patent with application number CN201710369600.9, which discloses an improved TF-IDF keyword extraction algorithm, and the invention patent with application number CN201410056332.1, which discloses a method for automatically extracting key phrases from patent documents.
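
To make the TF-IDF baseline discussed in [0003] concrete, the following is a minimal sketch of word-level TF-IDF ranking. It is not taken from either cited patent: the function name `tfidf_keywords`, the three-document toy corpus, and the plain `tf * log(N / df)` weighting are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of one tokenized document by TF-IDF within a corpus.

    docs is a list of token lists; this is an illustrative baseline only.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical toy corpus, already split into words.
corpus = [
    "fpga waveform generator uses fpga frequency synthesis".split(),
    "keyboard and display module for the generator".split(),
    "frequency divider and display driver".split(),
]
print(tfidf_keywords(corpus, 0))
```

Every result is a single word scored only by frequency statistics, so a word with little topical meaning can rank as highly as a technical term. That is the weakness [0003] attributes to this family of methods.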



Examples


Embodiment Construction

[0034] The technical solution of the present invention is further described below in conjunction with specific embodiments:

[0035] A method for generating key phrases based on a pre-trained model, the method comprising the following steps:

[0036] S1. Obtain text data to be processed;

[0037] For example, the input text content is: "Waveform generator based on a single-chip microcomputer and FPGA; the core technology is direct digital frequency synthesis. The FPGA integrates modules such as a fixed frequency divider, a single-chip microcomputer communication module, a waveform synthesizer, and waveform selection, and the output 8-bit data is converted by D/A and power-amplified to obtain the desired waveform. The single-chip microcomputer controls the keyboard and display module, providing a good human-machine interface. After design and circuit testing, the system can generate sine waves, triangle waves, square waves, and other waveforms, with flexible control and good output waveform performanc...


Abstract

The invention relates to a key phrase generation method based on a pre-trained model. The method comprises the following steps: S1, obtaining the text data to be processed; S2, performing word segmentation and part-of-speech tagging on the acquired text data; S3, establishing a stop-word lexicon and removing words that appear in it, then filtering out words that are neither verbs nor nouns; S4, performing N-gram combination to obtain candidate word combinations; S5, converting the text data and the candidate word combinations into text vectors using a BERT-based pre-trained model; S6, computing the cosine similarity between the document-level vector representation and the vector representation of each candidate, and sorting the candidates by semantic similarity; and S7, according to a set value, selecting the words or phrases ranked highest by semantic similarity in step S6 as the keywords. The method uses the open-source pre-trained model BERT to vectorize the text, so that semantic-level information of the text is captured more fully, which facilitates keyword extraction. Because the keywords are obtained from N-gram combinations, they are phrase-level and carry more complete meaning than single words.
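
Steps S3 to S7 can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the patented implementation: the stop-word set, the part-of-speech tag names, and every function name are hypothetical, the input is assumed to be already segmented and tagged (S2), and `toy_embed` is a bag-of-words stand-in for the BERT encoder of S5 so that the example runs without any model download.

```python
import math
from collections import Counter

# Hypothetical stop-word lexicon and tag set; a real system would take these
# from its own lexicon and from the word segmenter / POS tagger of step S2.
STOP_WORDS = {"the", "a", "of", "is", "and"}
KEEP_POS = {"NOUN", "VERB"}


def candidates(tagged_tokens, max_n=2):
    """S3 + S4: drop stop words and non-noun/verb tokens, then build N-grams."""
    kept = [word for word, pos in tagged_tokens
            if word not in STOP_WORDS and pos in KEEP_POS]
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(kept) - n + 1):
            gram = " ".join(kept[i:i + n])
            if gram not in grams:  # keep first occurrence only
                grams.append(gram)
    return grams


def toy_embed(text):
    """Stand-in for S5: a word-count vector instead of a BERT embedding."""
    return Counter(text.split())


def cosine(u, v):
    """S6: cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def key_phrases(text, tagged_tokens, embed=toy_embed, top_k=3, max_n=2):
    """S5 to S7: embed document and candidates, rank by similarity, keep top_k."""
    doc_vec = embed(text)
    scored = [(cosine(doc_vec, embed(c)), c)
              for c in candidates(tagged_tokens, max_n)]
    # Highest similarity first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [phrase for _, phrase in scored[:top_k]]


text = "fpga integrates the waveform synthesizer"
tagged = [("fpga", "NOUN"), ("integrates", "VERB"), ("the", "DET"),
          ("waveform", "NOUN"), ("synthesizer", "NOUN")]
print(key_phrases(text, tagged, top_k=2))
```

The `embed` parameter is the only place the toy vector enters, so a function that returns sentence embeddings from a BERT model could be substituted there, together with a `cosine` that accepts dense vectors. With the count-based stand-in, longer N-grams score higher simply because they share more words with the document, which is an artifact of the toy embedding rather than a property of the method.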

Description

Technical field

[0001] The invention relates to the field of natural language processing of text, in particular to a key phrase generation method, device, and storage medium based on a pre-trained model, which are used to identify and extract key phrase content in documents, filter redundant content, and quickly obtain valuable information.

Background technique

[0002] Traditional keyword extraction algorithms fall into two categories: unsupervised methods and supervised methods. Supervised keyword classification methods, such as those disclosed as a method, device, equipment, and storage medium, presuppose a large text corpus with corresponding keyword labels, and they need extensive neural network training before they yield an effective model. The acquisition cost is therefore high and the process cumbersome, and such methods are not suitable for scenarios that lack large amounts of labeled data and computing resource...


Application Information

IPC(8): G06F16/335, G06F40/194, G06F40/289, G06F40/30, G06K9/62
CPC: G06F16/335, G06F40/194, G06F40/289, G06F40/30, G06F18/214
Inventor: 文敏, 陈运文, 纪达麒, 侯聪, 吴万杰
Owner: 达而观数据(成都)有限公司