Multi-modal data expansion method and system, medium, computer equipment and terminal

A multi-modal data expansion technology, applied in computer parts, computing, image data processing, etc. It addresses the problems that manual data collection and labeling is time-consuming, labor-intensive and inefficient, and that semantic changes are difficult to correct accurately in the text description. It achieves a good data expansion effect, high data expansion efficiency, and an improved training effect.

Pending Publication Date: 2022-04-26
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

[0005] (1) The traditional method of manually collecting and labeling data is time-consuming, laborious and inefficient.
[0006] (2) In the context of multimodal machine learning, traditional methods struggle to meet the needs of data augmentation.
[0007] (3) Existing data augmentation methods may cause image semantics to be lost, and these semantic changes are currently difficult to correct automatically and accurately in the text description.
[0008] The difficulty of solving the above problems and defects is as follows: problem (1) consumes a large amount of labor, which makes it costly to solve. Problems (2) and (3) currently have no unified solution and are very difficult to solve.


Examples

Embodiment 1

[0072] In view of the problems existing in the prior art, the present invention provides a multi-modal data expansion method, which can automatically perform data expansion without changing the semantic information of any modal data.

[0073] 1. Program Description

[0074] Assume a multimodal data set $D=\{(I_1,T_1),(I_2,T_2),\ldots,(I_n,T_n)\}$, where $I_i$ is a picture and $T_i$ is a piece of text corresponding to that picture. Each $(I_i,T_i)$ forms a sample pair, and the data set contains $n$ such pairs. For such data, the general process is to extract the features of $I_i$ and the features of $T_i$ separately, and then to model the relationship between the two with a multimodal machine learning model, so it is the two sets of features that actually constitute a pair of training samples. In particular, extracting the image features is divided into two steps. In the first step, a convolutional neural network target detection model detects all the target objects in picture $I_i$, a...

Embodiment 2

[0093] This example describes the implementation of a single stitching operation. Taking the picture set I of the "COCO Caption train2014" data set as an example, with k=2 and m=10, two pictures are stitched together and the features of 10 detection targets are taken from each picture.

[0094] 1. Image stitching

[0095] Take the picture $I_{190141}$, labeled 000000190141 in I, as an example. A picture set $\{I_{190141}, I_{202099}\}$ is drawn at random. For k=2, this embodiment splices the two pictures left and right into a single picture. Stitching does not change the aspect ratio of either picture. Before splicing, the resolution of $I_{190141}$ is 640*423 and the resolution of $I_{202099}$ is 640*480. Because the heights of $I_{190141}$ and $I_{202099}$ differ, the unaligned part is filled with 0 values during splicing, and the resolution after splicing is 1280*480. Figure 3 shows the images before and after stitching.
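The left-right splicing with zero filling described in [0095] can be sketched as follows. This is a minimal illustration with NumPy, not the patent's implementation: the function name `stitch_left_right` is hypothetical, the stand-in arrays only reproduce the quoted resolutions, and placing both pictures at the top of the canvas (so the padding falls at the bottom of the shorter one) is an assumption, since the text does not say where the zero-filled region goes.

```python
import numpy as np


def stitch_left_right(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Place two H x W x C images side by side without resizing them.

    The canvas is as tall as the taller image; the area not covered by
    the shorter image stays at 0. Top alignment is an assumption.
    """
    if left.shape[2] != right.shape[2]:
        raise ValueError("images must have the same number of channels")
    height = max(left.shape[0], right.shape[0])
    width = left.shape[1] + right.shape[1]
    canvas = np.zeros((height, width, left.shape[2]), dtype=left.dtype)
    canvas[: left.shape[0], : left.shape[1]] = left
    canvas[: right.shape[0], left.shape[1] :] = right
    return canvas


# Stand-in arrays with the resolutions quoted in [0095] (width*height).
img_a = np.full((423, 640, 3), 255, dtype=np.uint8)  # 640*423
img_b = np.full((480, 640, 3), 255, dtype=np.uint8)  # 640*480
stitched = stitch_left_right(img_a, img_b)
print(stitched.shape)  # (480, 1280, 3), i.e. 1280*480
```

Because neither picture is resized, bounding boxes detected in the right-hand picture would only need a horizontal offset equal to the width of the left-hand picture to be expressed in the stitched coordinate system.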

[0096] 2. Get the collection of detection frames

[0097] In this embodiment, the Faste...


Abstract

The invention belongs to the technical field of computer data processing, and discloses a multi-modal data expansion method and system, a medium, computer equipment and a terminal. Without changing the semantic information of any modal data, image features are perturbed by adjusting the image content within the receptive field of a target detection model. Data expansion is thus performed automatically, which reduces labor cost and improves data expansion efficiency, and the abundant data provided by the method can improve the performance of downstream tasks. The multi-modal data expansion method provided by the invention expands the data by expanding the image features, and can do so automatically without changing the semantic information of any modal data. The semantic information of every modality in the multi-modal training data is therefore preserved, and the data expansion effect is good. At the same time, because data expansion is automatic, the labor cost is low and the data expansion efficiency is high.

Description

Technical field

[0001] The invention belongs to the technical field of computer data processing, and in particular relates to a multimodal data expansion method, system, medium, computer equipment and terminal.

Background technique

[0002] At present, with the development of multimedia and Internet technology, it has become common to use multimodal information to describe events and things, for example, combining image and text modalities for news reporting, or combining video and audio modalities for short video production. Generally speaking, there is a correlation between data of different modalities that appear at the same time, and analyzing this correlation is of great significance for data mining and data protection. At present, in the field of multimodal machine learning, related research directions include Image Captioning, Cross-modal Retrieval, Visual Question Answering, etc., which provide open source data sets, and these data sets provide support for technica...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06V10/44, G06V10/764, G06V10/82, G06K9/62, G06N3/04, G06N3/08, G06T3/40
CPC: G06N3/08, G06T3/4038, G06T2200/32, G06N3/045, G06F18/241
Inventor: 李晖, 张剑, 吴杰, 彭莹
Owner: XIDIAN UNIV