
A Cross-Modal Retrieval Method Based on Multi-layer Semantic Alignment

A cross-modal semantic technology, applied in the field of cross-modal retrieval. It addresses the problems that existing methods ignore the relationships between fine-grained image regions and text words and that image features depend heavily on detection accuracy, so as to compensate for inaccurate detection, improve retrieval accuracy, and achieve better cross-modal association.

Active Publication Date: 2022-05-20
BEIFANG UNIV OF NATIONALITIES


Problems solved by technology

[0004] However, the methods proposed above mainly establish associations from the global features of images and texts, ignoring the relationships between fine-grained image regions and text words. Moreover, the image features depend heavily on the detection accuracy of the image, and different regions influence one another.



Examples


Embodiment

[0120] 1. Experimental method

[0121] This experiment runs on an NVIDIA 1080Ti GPU workstation. Experiments are carried out on two public datasets, Flickr30k and MSCOCO, in which each image corresponds to five associated sentences; the dataset details are shown in Table 1. Since the datasets contain only two modalities, image and text, this method is verified on mutual retrieval between text and images. In the experiments, 36 regions with 2048-dimensional features are extracted from each image, and the features are projected into a 1024-dimensional common space through a fully connected layer. For each sentence, the word embedding size is set to 300, sentences of insufficient length are padded with zeros, the sentence words are encoded with a Bi-GRU, and the hidden-unit dimension is 1024.
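The following is a minimal sketch, in PyTorch, of the encoder configuration described in paragraph [0121]. The class names (RegionProjector, TextEncoder) and the vocabulary size are hypothetical, and averaging the two GRU directions to keep the 1024-dimensional hidden size is an assumption; the patent only specifies the dimensions themselves.

```python
import torch
import torch.nn as nn

class RegionProjector(nn.Module):
    """Projects the 36 detected regions (2048-d each) into the 1024-d common space."""
    def __init__(self, region_dim=2048, common_dim=1024):
        super().__init__()
        self.fc = nn.Linear(region_dim, common_dim)

    def forward(self, regions):              # regions: (batch, 36, 2048)
        return self.fc(regions)              # -> (batch, 36, 1024)

class TextEncoder(nn.Module):
    """Embeds words (300-d) and encodes the sentence with a Bi-GRU (hidden size 1024)."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)  # index 0 = zero padding
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, tokens):               # tokens: (batch, seq_len), zero-padded
        x = self.embed(tokens)
        out, _ = self.gru(x)                 # (batch, seq_len, 2 * hidden_dim)
        fwd, bwd = out.chunk(2, dim=-1)
        return (fwd + bwd) / 2               # assumption: average directions -> 1024-d

# Toy usage with the dimensions from the experiment
regions = torch.randn(8, 36, 2048)
img_feats = RegionProjector()(regions)             # (8, 36, 1024)
tokens = torch.randint(1, 5000, (8, 20))
txt_feats = TextEncoder(vocab_size=5000)(tokens)   # (8, 20, 1024)
```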

[0122] Table 1 Details of Flickr30k and MSCOCO datasets

[0123]

[0124] In this paper, R@K is used to evaluate the method. R@K denotes the percentage of queries for which a correct result appears among the top K retrieved items.
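As a minimal sketch, R@K can be computed from a query-candidate similarity matrix as below; the function name and the assumption of one ground-truth candidate index per query are illustrative, not taken from the patent.

```python
import numpy as np

def recall_at_k(sim, gt_index, k):
    """sim: (num_queries, num_candidates) similarity matrix.
    gt_index: ground-truth candidate index for each query.
    Returns the fraction of queries whose ground truth ranks in the top k."""
    ranking = np.argsort(-sim, axis=1)        # candidates sorted by descending similarity
    hits = (ranking[:, :k] == gt_index[:, None]).any(axis=1)
    return hits.mean()

# Toy example: 3 queries over 5 candidates
sim = np.random.rand(3, 5)
gt = np.array([0, 2, 4])
for k in (1, 5):
    print(f"R@{k} = {recall_at_k(sim, gt, k):.2f}")
```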



Abstract

The invention discloses a cross-modal retrieval method based on multi-layer semantic alignment. It uses a self-attention mechanism to obtain salient fine-grained regions, promoting the alignment of entities and relationships between modal data, and proposes an image-text matching strategy based on semantic consistency: semantic labels are extracted from the given text dataset, and global semantic constraints are applied through multi-label prediction to obtain more accurate cross-modal associations. The semantic gap problem of cross-modal data is thereby addressed.
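As a hedged illustration of the self-attention step over fine-grained regions, a standard scaled dot-product formulation is sketched below; the class name and dimensions are assumptions, since the abstract names the mechanism but not its exact form.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionSelfAttention(nn.Module):
    """Scaled dot-product self-attention over region features: each region
    attends to all others, re-weighting them so salient regions stand out."""
    def __init__(self, dim=1024):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                             # x: (batch, num_regions, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v                               # attention-weighted region features

feats = torch.randn(8, 36, 1024)
salient = RegionSelfAttention()(feats)                # (8, 36, 1024)
```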

Description

Technical field

[0001] The invention relates to the technical field of cross-modal retrieval, and in particular to a cross-modal retrieval method based on multi-layer semantic alignment.

Background technique

[0002] With the wide application of artificial intelligence in various fields, data presentation forms are becoming increasingly diverse, and multi-modal data such as text, images, and videos are growing rapidly. Data of different modalities present heterogeneous low-level features but correlated high-level semantics. For example, text on a webpage is represented by dictionary vectors while its images are represented by visual features; the two lie in completely different feature spaces yet express the same semantics. However, current retrieval methods are usually single-medium, that is, the query and the retrieval results belong to the same modality type, so the retrievable content is limited and constrained by the query conditions. Since the content of multi-modal ...


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06F16/43
CPC: G06F16/43; Y02D10/00
Inventors: Wang Hairong, Du Jinfeng
Owner: BEIFANG UNIV OF NATIONALITIES