A generative summarization method based on image-text fusion

A generative summarization technique based on image-text fusion. It addresses the problem of key entities missing from generated summaries and alleviates the problem of unregistered (out-of-vocabulary) words.

Active Publication Date: 2022-05-31
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0005] This application addresses the problem of missing key entities in existing generative summaries, thereby improving the quality and readability of the generated summaries.




Embodiment Construction

...images retrieved through a search engine. Finally, the (X, Y, I) triple data set is obtained after manual screening. The training set contains 66,000 samples, and the validation set and test set contain 2,000 samples each.

[0050] Step 1, preprocessing the data set.

[0051] Step 1.1, in the given original data set, each text corresponds one-to-one with a summary and an image, forming a triple (X, Y, I).

[0052] Step 1.2, special characters, emoticons, and full-width characters, such as "¥" and "300", are removed from both the text and the summary.
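The character filtering in step 1.2 could be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `strip_noise` is hypothetical, and the choice of what counts as a "special character" (Unicode currency symbols, other symbols such as emoji, and format or private-use characters) is an assumption. Note that filtering on East Asian width "F" also drops full-width punctuation such as "，"; a real pipeline might prefer to normalize such characters instead of deleting them.

```python
import unicodedata

# Assumption: these Unicode general categories approximate "special characters
# and emoticons": Sc = currency symbols, So = other symbols (most emoji),
# Sk = modifier symbols, Cf/Cs/Co = format, surrogate, private-use characters.
NOISE_CATEGORIES = {"Sc", "So", "Sk", "Cf", "Cs", "Co"}


def strip_noise(text):
    """Remove full-width characters, symbols, and emoji, then tidy whitespace."""
    kept = []
    for ch in text:
        # East Asian width "F" marks full-width forms such as "３" or "￥".
        if unicodedata.east_asian_width(ch) == "F":
            continue
        if unicodedata.category(ch) in NOISE_CATEGORIES:
            continue
        kept.append(ch)
    # Collapse the gaps left behind by removed characters.
    return " ".join("".join(kept).split())


cleaned = strip_noise("price ¥３００ ok 😀")  # "price ok"
```

The same function would be applied to both the text X and the summary Y so that the two stay consistent.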

Step 1.3, in the data set obtained from step 1.2, "TAGURL" replaces all hyperlink URLs, "TAGDATA" replaces all dates, "TAGNUM" replaces all numbers, and "TAGPUN" replaces all punctuation.
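A minimal sketch of the placeholder substitution in step 1.3 is shown below. The four tag names come from the text; everything else is an assumption made for illustration: the function name `replace_with_tags`, the regular expressions (the source does not define what counts as a URL, date, or number), the order of substitution, and the padding of tags with spaces.

```python
import re

# Hypothetical patterns; the source does not specify the exact formats.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
DATE_RE = re.compile(r"\d{4}[-/.]\d{1,2}[-/.]\d{1,2}|\d{4}年\d{1,2}月\d{1,2}日")
NUM_RE = re.compile(r"\d+(?:\.\d+)?")
PUN_RE = re.compile(r"[^\w\s]")  # anything that is not a word character or space


def replace_with_tags(text):
    """Replace URLs, dates, numbers, and punctuation with placeholder tags."""
    # Order matters: URLs and dates contain digits and punctuation, so they
    # must be replaced before the number and punctuation passes run.
    text = URL_RE.sub(" TAGURL ", text)
    text = DATE_RE.sub(" TAGDATA ", text)
    text = NUM_RE.sub(" TAGNUM ", text)
    text = PUN_RE.sub(" TAGPUN ", text)
    return " ".join(text.split())


tagged = replace_with_tags("See http://a.b/c on 2021-03-04, cost 12.5!")
# tagged == "See TAGURL on TAGDATA TAGPUN cost TAGNUM TAGPUN"
```

Because the tags themselves contain only letters, the later passes leave the tags inserted by earlier passes untouched.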

[0054] Step 1.4, since MMSS consists of sentence-level summaries and the texts are short, stop words are not filtered from the data set.

Step 1.5, the preprocessed text, summary, and image triples (X, Y, I) are shuffled in one-to-one c...
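Shuffling the triples jointly, so that each text stays paired with its own summary and image, could look like the sketch below. The function name `shuffle_and_split`, the fixed seed, and the decision to split right after shuffling are assumptions. The source gives the set sizes (66,000 training, 2,000 validation, 2,000 test) but does not say at which point the split is made.

```python
import random


def shuffle_and_split(triples, n_val, n_test, seed=0):
    """Shuffle (X, Y, I) triples as units, then cut train/validation/test sets."""
    items = list(triples)
    n_train = len(items) - n_val - n_test
    if n_train < 0:
        raise ValueError("validation and test sizes exceed the data set size")
    # Shuffling whole tuples keeps each text aligned with its summary and image.
    random.Random(seed).shuffle(items)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Toy data standing in for the real corpus; 70 triples instead of 70,000.
toy = [(f"text{i}", f"summary{i}", f"image{i}.jpg") for i in range(70)]
train, val, test = shuffle_and_split(toy, n_val=2, n_test=2)
```

With the sizes reported in the source, the call would be `shuffle_and_split(data, n_val=2000, n_test=2000)` on 70,000 triples.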



Abstract

The invention discloses a generative summary generation method based on image-text fusion. Its steps include: 1) dividing a given text data set into a training set, a validation set, and a test set, where each sample in the text data set is a triple (X, I, Y), X is the text, I is the image corresponding to the text X, and Y is the summary of the text X; 2) extracting entity features from the images of the text data set and representing them as image feature vectors of the same dimension as the text; 3) training the generative summarization model with the training set and its corresponding image feature vectors; 4) given a piece of text and its corresponding image, generating the image feature vector of the image, then feeding the text and its image feature vector into the trained generative summarization model to obtain the summary of the text. The summaries generated by the invention can effectively adjust the weight of entities in the text and alleviate the problem of unregistered words to a certain extent.
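Step 2 requires the image entity features to have the same dimension as the text representation. The abstract does not say how this is achieved; one common option, assumed here purely for illustration, is a learned linear projection. The function name `project_image_features` and the toy weights are hypothetical.

```python
def project_image_features(features, weights, bias):
    """Map an image feature vector to the text dimension: W @ features + bias.

    features: list of d_img floats (entity features extracted from the image)
    weights:  d_text rows, each with d_img floats (would be learned in training)
    bias:     list of d_text floats
    """
    if any(len(row) != len(features) for row in weights):
        raise ValueError("each weight row must match the image feature size")
    if len(weights) != len(bias):
        raise ValueError("bias must have one entry per output dimension")
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


# Toy example: 4-dimensional image features mapped to a 3-dimensional text space.
image_vec = project_image_features(
    [1.0, 2.0, 3.0, 4.0],
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1]],
    [0.0, 0.0, 0.5],
)
# image_vec == [1.0, 2.0, 7.5]
```

Once the image vector lives in the same space as the text vectors, it can be passed to the summarization model alongside the text, as steps 3 and 4 describe.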

Description

A generative summary generation method based on image-text fusion

Technical field

[0001] The invention belongs to the field of artificial intelligence technology and relates to a generative summary generation method based on image-text fusion.

Background technique

[0002] Existing generative summarization methods are mainly implemented with the deep learning Seq2Seq framework and the attention mechanism. The Seq2Seq framework consists mainly of an encoder and a decoder, both implemented by neural networks, which can be recurrent neural networks (RNN) or convolutional neural networks (CNN). The process is as follows: the encoder encodes the input original text into a vector (the context), which is a representation of the original text. The decoder is then responsible for extracting important information from this vector and generating the text summary. Attention mechanism in order t...
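The encoder-decoder flow described in the background can be sketched with a tiny recurrent network in plain Python. This is only an illustration of the data flow (text in, context vector, tokens out): the weights are arbitrary and untrained, all names are hypothetical, the greedy decoding strategy is an assumption, and the attention mechanism is omitted.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_rec, x, h):
    """One Elman RNN step: h' = tanh(W_in x + W_rec h)."""
    return [math.tanh(p + q) for p, q in zip(matvec(w_in, x), matvec(w_rec, h))]


def encode(token_ids, embeddings, w_in, w_rec, hidden_size):
    """Run the encoder over the input; the final state is the context vector."""
    h = [0.0] * hidden_size
    for token in token_ids:
        h = rnn_step(w_in, w_rec, embeddings[token], h)
    return h


def decode(context, embeddings, w_in, w_rec, w_out, bos_id, eos_id, max_len):
    """Greedy decoding: start from the context and emit the best token each step."""
    h, token, output = list(context), bos_id, []
    for _ in range(max_len):
        h = rnn_step(w_in, w_rec, embeddings[token], h)
        scores = matvec(w_out, h)  # one score per vocabulary entry
        token = max(range(len(scores)), key=scores.__getitem__)
        if token == eos_id:
            break
        output.append(token)
    return output


# Toy setup: vocabulary of 3 token ids, embedding size 2, hidden size 2.
EMB = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_IN = [[0.5, -0.3], [0.2, 0.8]]
W_REC = [[0.1, 0.4], [-0.6, 0.3]]
W_OUT = [[0.7, -0.2], [-0.5, 0.9], [0.3, 0.3]]

context = encode([0, 1, 1, 0], EMB, W_IN, W_REC, hidden_size=2)
summary_ids = decode(context, EMB, W_IN, W_REC, W_OUT, bos_id=0, eos_id=2, max_len=4)
```

In a trained model the decoder would share this structure but use learned weights, and the output ids would be mapped back to words to form the summary.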


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/34, G06F16/35, G06F16/36, G06T11/60
CPC: G06F16/345, G06F16/35, G06F16/367, G06T11/60
Inventor: 曹亚男, 徐灏, 尚燕敏, 刘燕兵, 谭建龙, 郭莉
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI