Multi-modal data expansion method and system, medium, computer equipment and terminal

A multi-modal data technology, applied in computer parts, computing, image data processing, etc. It addresses the problems that manual data collection is time-consuming, labor-intensive, and inefficient, and that semantic changes are difficult to correct accurately in the text description. It achieves a good data expansion effect, high data expansion efficiency, and an improved training effect.

Pending Publication Date: 2022-04-26

XIDIAN UNIV

0 Cites, 2 Cited by

Problems solved by technology

[0005] (1) The traditional approach of manually collecting and labeling data is time-consuming, laborious, and inefficient.

[0006] (2) In multimodal machine learning, traditional methods struggle to meet the needs of data augmentation.

[0007] (3) Existing data augmentation methods may cause the loss of image semantics, and such semantic changes are currently difficult to correct automatically and accurately in the text description.

[0008] The difficulty of solving the above problems and defects is as follows: problem (1) requires substantial labor costs, which makes it difficult to solve.

Problems (2) and (3) currently have no unified solution and are very difficult to solve.

Examples

Embodiment 1

[0072] In view of the problems existing in the prior art, the present invention provides a multi-modal data expansion method, which can automatically perform data expansion without changing the semantic information of any modal data.

[0073] 1. Scheme description

[0074] Assume a multimodal dataset D = {(I_1, T_1), (I_2, T_2), ..., (I_n, T_n)}, where I_i is a picture, T_i is a piece of text corresponding to the picture, (I_i, T_i) forms a sample pair, and the dataset contains n sample pairs. For such data, the general process is to extract the features of I_i and the features of T_i separately, and then use a multimodal machine learning model to model the relationship between the two, so that the image features and text features in fact constitute a pair of training samples. In particular, image feature extraction is divided into two steps. In the first step, a convolutional neural network target detection model detects all the target objects in picture I_i, a...
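The pairing of image features and text features described in [0074] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `Sample`, `build_training_pairs`, and the two toy encoders are hypothetical names, the picture is represented only by an identifier string, and the encoders are trivial stand-ins for a real detection-based image feature extractor and a real text feature extractor.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Features = List[float]


@dataclass(frozen=True)
class Sample:
    """One (I_i, T_i) pair from the dataset D (hypothetical container)."""
    image: str  # stand-in for picture I_i; here only an identifier
    text: str   # text T_i corresponding to the picture


def build_training_pairs(
    dataset: Sequence[Sample],
    image_encoder: Callable[[str], Features],
    text_encoder: Callable[[str], Features],
) -> List[Tuple[Features, Features]]:
    """Extract image and text features separately and pair them per sample."""
    return [(image_encoder(s.image), text_encoder(s.text)) for s in dataset]


def toy_image_encoder(image_id: str) -> Features:
    # Assumption: a placeholder feature, not a real visual descriptor.
    return [float(len(image_id))]


def toy_text_encoder(text: str) -> Features:
    # Assumption: a placeholder feature (word count), not a real text embedding.
    return [float(len(text.split()))]


dataset = [Sample("img_1", "a dog on grass"), Sample("img_2", "two cats")]
pairs = build_training_pairs(dataset, toy_image_encoder, toy_text_encoder)
```

Passing the encoders in as callables keeps the pairing logic independent of whichever feature extractors are actually used.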

Embodiment 2

[0093] This example describes the implementation of a single stitching operation. Taking the picture set I in the "COCO Caption train2014" dataset as an example, let k=2 and m=10; that is, two pictures are stitched together, and the features of 10 detected targets are taken from each picture.

[0094] 1. Image stitching

[0095] Take the picture I_190141 in I, labeled 000000190141, as an example, and randomly draw the picture collection {I_190141, I_202099}. For k=2, this embodiment splices the pictures left and right into a single stitched picture. Stitching does not change the aspect ratio of either picture. Before splicing, the resolution of I_190141 is 640*423 and the resolution of I_202099 is 640*480. Because I_190141 and I_202099 differ in height, the unaligned part is filled with 0 values during splicing, and the resolution after splicing is 1280*480. Figure 3 shows the images before and after stitching.
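The left-right splicing in [0095] can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the patent's implementation: pictures are represented as plain nested lists of grayscale values to keep the example dependency-free, the helper names `stitch_left_right` and `blank` are hypothetical, and the shorter picture is assumed to be aligned to the top of the canvas (the source only states that the unaligned part is filled with 0).

```python
from typing import List

# Assumption: a grayscale picture stored as rows of integer pixel values.
Image = List[List[int]]


def stitch_left_right(images: List[Image], fill: int = 0) -> Image:
    """Place pictures side by side without rescaling.

    Pictures are top-aligned (an assumption); rows missing from a shorter
    picture are padded with `fill` so every canvas row has the same width.
    """
    if not images:
        raise ValueError("need at least one image")
    canvas_height = max(len(img) for img in images)
    canvas: Image = [[] for _ in range(canvas_height)]
    for img in images:
        width = len(img[0]) if img else 0
        for row_index in range(canvas_height):
            if row_index < len(img):
                canvas[row_index].extend(img[row_index])
            else:
                canvas[row_index].extend([fill] * width)
    return canvas


def blank(width: int, height: int, value: int) -> Image:
    """Build a constant-valued test picture of the given size."""
    return [[value] * width for _ in range(height)]


# Same sizes as the example: 640*423 on the left, 640*480 on the right.
left = blank(640, 423, 255)
right = blank(640, 480, 128)
stitched = stitch_left_right([left, right])
stitched_size = (len(stitched[0]), len(stitched))  # (width, height) = (1280, 480)
```

Because neither picture is resized, any detection box found in the right-hand picture would keep its size in the stitched picture and only shift horizontally by the width of the left-hand picture.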

[0096] 2. Obtain the set of detection boxes

[0097] In this embodiment, the Faste...

Abstract

The invention belongs to the technical field of computer data processing and discloses a multi-modal data expansion method and system, a medium, computer equipment, and a terminal. Without changing the semantic information of any modality's data, the method perturbs image features by adjusting the image content within the receptive field of a target detection model. Data expansion is thereby performed automatically, which reduces labor cost and improves data expansion efficiency, and the richer data the method provides can improve the performance of downstream tasks. Because the method expands data by expanding the image features, the semantic information of every modality in the multi-modal training data is left unchanged and the data expansion effect is good. Because the expansion is automatic, the labor cost is low and the data expansion efficiency is high.

Description

Technical field

[0001] The invention belongs to the technical field of computer data processing, and in particular relates to a multimodal data expansion method, system, medium, computer equipment, and terminal.

Background

[0002] With the development of multimedia and Internet technology, it has become common to use multimodal information to describe events and things, for example combining image and text modalities for news reporting, or combining video and audio modalities for short-video production. Generally speaking, data of different modalities that appear together are correlated, and analyzing this correlation is of great significance for data mining and data protection. In the field of multimodal machine learning, related research directions currently include Image Captioning, Cross-modal Retrieval, and Visual Question Answering, among others, which provide open-source datasets; these datasets provide support for technica...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06V10/44, G06V10/764, G06V10/82, G06K9/62, G06N3/04, G06N3/08, G06T3/40

CPC: G06N3/08, G06T3/4038, G06T2200/32, G06N3/045, G06F18/241

Inventor: 李晖, 张剑, 吴杰, 彭莹

Owner XIDIAN UNIV
