Cross-modal image generation method and device based on audio-tactile signal fusion

A technology for image generation and signal fusion, applied to neural learning methods, character and pattern recognition, biological neural network models, and related fields. It addresses problems such as complex representation, blurred autoencoder images, and unstable training, with the effect of improving image quality, eliminating discrepancies, and improving accuracy and completeness.

Active Publication Date: 2021-11-09

NANJING UNIV OF POSTS & TELECOMM



Problems solved by technology

[0006] In addition, existing generative models are mainly represented by the generative adversarial network (GAN) and the variational autoencoder (VAE). However, GANs suffer from vanishing gradients and unstable training, while the images generated by autoencoders are relatively blurred.
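The vanishing-gradient defect can be illustrated numerically. The sketch below assumes the standard GAN formulation, which the source does not spell out: the discriminator outputs D = sigmoid(logit) and the generator minimises log(1 - D). The function names are hypothetical and chosen only for this illustration.

```python
import math


def sigmoid(logit: float) -> float:
    """Discriminator probability that a sample is real."""
    return 1.0 / (1.0 + math.exp(-logit))


def saturating_generator_loss(logit: float) -> float:
    """Generator loss log(1 - D) in the assumed standard GAN formulation."""
    return math.log(1.0 - sigmoid(logit))


def saturating_generator_grad(logit: float) -> float:
    """Analytic derivative of log(1 - sigmoid(logit)) w.r.t. the logit: -sigmoid(logit)."""
    return -sigmoid(logit)


# Discriminator undecided (D = 0.5): the generator still receives a useful signal.
grad_undecided = saturating_generator_grad(0.0)

# Discriminator confidently rejects the fake (D close to 0): the signal almost vanishes.
grad_confident = saturating_generator_grad(-10.0)
```

When the discriminator becomes confident that generated samples are fake, the gradient reaching the generator shrinks towards zero, which is one common reading of the "vanishing gradient" problem mentioned above.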

At the same time, cross-modal image generation research is mainly text-based, and these models can only handle cross-modal generation from a single modality. Even if they could be extended, tactile and audio signals are usually represented as time-domain sequences, which are more complex than the word-level features of text, so text-based cross-modal models are not suitable for multi-modal scenarios.

Embodiment Construction

[0092] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

[0093] An efficient and accurate cross-modal image generation method is needed, one capable of fusing different modalities and achieving high-quality reconstruction of image data. In recent years, generative adversarial models have achieved considerable success in the field of image generation, and knowledge distillation also provides a simple and efficient way to improve fine-grained image generation. Therefore, the present invention proposes a cross-modal image generation method based on audio-tactile signal fusion. The fusion method based on deep semantics can improve the accuracy of model reconstruction; latent space learning maps the semantic features of cross-modal data to a "latent learning space" to measure the similarity of dif...
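The idea of mapping features from different modalities into one latent space and comparing them there can be sketched as follows. This is a minimal illustration, not the patent's network: the linear projections, the weight values, and the use of cosine similarity are all assumptions, since the source text is truncated before it names a similarity measure.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, features: Vector) -> Vector:
    """Map modality-specific features into the shared latent space (y = W x)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two latent vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical projection weights; in a real model these would be learned.
W_FUSED = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 3-d fused audio-tactile features -> 2-d latent
W_IMAGE = [[0.0, 1.0], [1.0, 0.0]]            # 2-d image features -> 2-d latent

z_fused = project(W_FUSED, [3.0, 4.0, 9.0])
z_image = project(W_IMAGE, [4.0, 3.0])
similarity = cosine_similarity(z_fused, z_image)
```

Because both modalities end up in the same 2-d space, a single similarity score can compare features that originally had different dimensions.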


Abstract

The invention discloses a cross-modal image generation method based on audio-tactile signal fusion. The method comprises the following steps: 1) selecting a multi-modal data set containing audio data, image data and tactile signals, and dividing it into a training set and a test set; 2) designing a cross-modal image generation model with audio-tactile signal fusion, comprising a deep semantic fusion module, a latent space learning module and a cross-modal image generation module; 3) training the model on the training set to obtain the optimal parameters; and 4) using the trained model to generate the corresponding images from the tactile signals and audio data in the test set. The invention also discloses a cross-modal image generation device based on audio-tactile signal fusion, which introduces a strong generative adversarial mechanism and utilizes label information, effectively improving the accuracy and robustness of image generation.
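The data flow of the four steps can be sketched in a few lines. This is a structural skeleton under stated assumptions, not the patented model: the `Sample` fields, the `split_dataset` helper, the `CrossModalGenerator` class and the placeholder modules are all hypothetical, and step 3 (training) is omitted because the source gives no loss or optimiser details here.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class Sample:
    """One multi-modal record; the field names are illustrative assumptions."""
    audio: Vector
    tactile: Vector
    image: Vector
    label: int


def split_dataset(samples: Sequence[Sample], test_ratio: float = 0.2,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Step 1: shuffle a copy once, then cut it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


@dataclass
class CrossModalGenerator:
    """Step 2: the three modules named in the abstract, modelled as plain callables."""
    fuse: Callable[[Vector, Vector], Vector]  # deep semantic fusion module
    to_latent: Callable[[Vector], Vector]     # latent space learning module
    decode: Callable[[Vector], Vector]        # cross-modal image generation module

    def generate(self, audio: Vector, tactile: Vector) -> Vector:
        """Step 4: audio and tactile signals in, image representation out."""
        return self.decode(self.to_latent(self.fuse(audio, tactile)))


# Placeholder modules (not the patent's networks): concatenation, identity, truncation.
model = CrossModalGenerator(
    fuse=lambda a, t: a + t,
    to_latent=lambda f: f,
    decode=lambda z: z[:2],
)

dataset = [Sample([float(i)], [float(i), 0.0], [0.0, 0.0], i % 2) for i in range(10)]
train_set, test_set = split_dataset(dataset, test_ratio=0.2, seed=0)
generated = [model.generate(s.audio, s.tactile) for s in test_set]
```

Keeping the three modules as separate callables mirrors the modular structure described in the abstract and lets each placeholder be swapped for a trained network without changing the pipeline.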

Description

Technical field

[0001] The invention relates to the technical field of image generation, and in particular to a cross-modal image generation method and device based on audio-tactile signal fusion.

Background

[0002] With the rapid development of wireless communication and multimedia technology, people have begun to pursue a more realistic immersive experience. Touch, as a new sensory dimension, has gradually been integrated into traditional audio-visual services and enhances them, giving rise to cross-modal services. Cross-modal communication, in which audio, visual and tactile signals are transmitted collaboratively, is considered a reasonable and efficient way to support such services. However, because transmission is unreliable and communication quality differs between modalities, visual signals often suffer severe loss, so restoration and reconstruction are urgently needed.

[0003] Existing image generation work mainly utilizes the inherent inf...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06K9/62, G06K9/46, G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/045, G06F18/241, G06F18/253

Inventors: 姚玉媛, 魏昕, 高赟, 周亮

Owner: NANJING UNIV OF POSTS & TELECOMM
