
Cross-modal semantic clustering method based on bidirectional CNN

A clustering method and cross-modal technology, applied in the field of computer vision. It can solve the problem that retrieval efficiency and accuracy vary with the choice of loss function, and achieves the effects of strengthening the recognition ability between categories, enhancing the correlation between modalities, and improving retrieval accuracy and efficiency.

Pending Publication Date: 2021-10-22
HANGZHOU DIANZI UNIV
Cites: 0 · Cited by: 4

AI Technical Summary

Problems solved by technology

[0004] Although cross-modal retrieval is based on mapping different modalities into the same subspace, the efficiency and accuracy achieved depend on the selection and setting of the loss function.




Embodiment Construction

[0047] The objects and effects of the present invention will become more apparent from the following detailed description of the invention taken with reference to the accompanying drawings.

[0048] Step 1: Data preprocessing; pre-train the text samples of the training set.

[0049] Take an existing data set, divide it into a training set and a test set according to a set ratio, and pre-train the text samples of the training set.
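Step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 80/20 split ratio and whitespace tokenizer are assumptions, and the Word2Vec pre-training of word vectors is represented here only by building the vocabulary such a model would be trained on.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle and split paired (image, caption, label) samples into train/test."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def build_vocab(captions):
    """Stand-in for Word2Vec pre-training: map each word to an index.

    The patent pre-trains word vectors with Word2Vec; here we only build
    the word list a Word2Vec model would consume.
    """
    vocab = {"<pad>": 0}
    for caption in captions:
        for word in caption.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

# Hypothetical toy data set: 100 image/caption pairs over 5 classes.
samples = [(f"img_{i}.jpg", f"a photo of class {i % 5}", i % 5) for i in range(100)]
train_set, test_set = split_dataset(samples)
vocab = build_vocab(cap for _, cap, _ in train_set)
print(len(train_set), len(test_set), len(vocab))
```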

[0050] Step 2: Build a cross-modal retrieval network.

[0051] As shown in Figure 1, the cross-modal retrieval network adopts a dual-CNN structure, comprising a ResNet-50 network and a text CNN network (TextCNN); the two CNNs are used in parallel. The feature vector of each image sample is extracted by the ResNet-50 network. For text samples, word vectors are first pre-trained with Word2Vec, and text feature vectors are then extracted with TextCNN.
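The text branch can be sketched as a NumPy forward pass. This is a toy illustration, not the patent's trained network: the filter widths (2/3/4), filter counts, and random weights are assumptions, and the image branch is omitted since it would come from a pretrained ResNet-50.

```python
import numpy as np

def textcnn_features(embeddings, filter_banks):
    """TextCNN-style feature extraction: 1-D convolutions of several widths
    over the word-embedding sequence, ReLU, then max-over-time pooling,
    concatenated into a single text feature vector.

    embeddings  : (seq_len, emb_dim) array of pre-trained word vectors
    filter_banks: list of weight arrays, each of shape (width, emb_dim, n_filters)
    """
    pooled = []
    for W in filter_banks:
        width, _, n_filters = W.shape
        n_windows = embeddings.shape[0] - width + 1
        conv = np.empty((n_windows, n_filters))
        for i in range(n_windows):
            # Convolve one window of `width` consecutive word vectors with all filters.
            conv[i] = np.tensordot(embeddings[i:i + width], W, axes=([0, 1], [0, 1]))
        conv = np.maximum(conv, 0.0)      # ReLU
        pooled.append(conv.max(axis=0))   # max-over-time pooling per filter
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
sentence = rng.normal(size=(12, 50))                      # 12 words, 50-dim word vectors
banks = [rng.normal(size=(w, 50, 4)) for w in (2, 3, 4)]  # widths 2/3/4, 4 filters each
text_feature = textcnn_features(sentence, banks)
print(text_feature.shape)   # (12,): 3 filter widths x 4 filters each
```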

[0052] The main idea of ResNet-50 is to add a direct connection channel in the network, a...
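The "direct connection channel" of ResNet is the residual shortcut: a block's output is its transform plus its unchanged input, which lets gradients flow through very deep networks. A minimal sketch with an assumed two-layer transform F:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = ReLU(F(x) + x): the input x is added back through a direct
    (identity) shortcut around the two-layer transform F."""
    h = relu(x @ W1)    # first layer of F
    f = h @ W2          # second layer of F (pre-activation)
    return relu(f + x)  # shortcut: add the input, then activate

x = np.array([1.0, -2.0, 3.0])
d = x.shape[0]
# With zero weights F(x) = 0, so the block reduces to ReLU(x):
print(residual_block(x, np.zeros((d, d)), np.zeros((d, d))))   # [1. 0. 3.]
```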



Abstract

The invention discloses a cross-modal semantic clustering method based on a bidirectional CNN. The method comprises the steps of: first preprocessing the data and pre-training the text samples of the training set; then constructing a cross-modal retrieval network, training it on the training set, and computing the network's loss function; back-propagating and optimizing the connection weights with a selected optimizer and corresponding parameters; training for multiple rounds to obtain the final network model; and finally testing the trained model on the test set and computing the evaluation indexes. The method improves the accuracy and efficiency of cross-modal retrieval by clustering semantic information. To aid semantic clustering, it designs a loss between samples and their cluster centers in the target space, a distribution-difference loss between categories across modalities, and a discrimination loss, thereby enhancing the recognition capability between different categories and strengthening the correlation between different modalities.
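The three losses described in the abstract can be sketched as follows. The exact formulas are not given in this excerpt, so these are assumed common forms: squared distance to the class's cluster center, squared gap between per-class mean features of the two modalities, and softmax cross-entropy for discrimination.

```python
import numpy as np

def center_loss(features, labels, centers):
    """Pull each sample toward its class's cluster center in the target space."""
    return np.mean(np.sum((features - centers[labels]) ** 2, axis=1))

def modality_gap_loss(img_feats, txt_feats, labels, n_classes):
    """Penalize the distance between per-class mean features of the two modalities."""
    gap = 0.0
    for c in range(n_classes):
        img_mean = img_feats[labels == c].mean(axis=0)
        txt_mean = txt_feats[labels == c].mean(axis=0)
        gap += np.sum((img_mean - txt_mean) ** 2)
    return gap / n_classes

def discrimination_loss(logits, labels):
    """Softmax cross-entropy: keeps different categories separable."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
feats = centers[labels]                            # samples exactly at their centers
print(center_loss(feats, labels, centers))         # 0.0
print(modality_gap_loss(feats, feats, labels, 2))  # 0.0
```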

Description

technical field

[0001] The invention relates to the field of computer vision, and in particular to a cross-modal retrieval method based on deep learning.

Background technique

[0002] In the era of the explosion of new-media information, every new-media user publishes multimedia information of different modalities anytime and anywhere, such as pictures, music, video, or text. With the rapid development of multimedia information and the growth in its amount and variety, it becomes difficult for users to accurately obtain the information they want: retrieved information is always accompanied by other information of varying relevance. These data are not only huge in number but mostly unlabeled, and there is a "heterogeneous gap" between data of different modalities. The main technical problem of cross-modal retrieval is therefore to span the "gap" between these different modal data, extraction precision a...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06K 9/62; G06N 3/04; G06N 3/08
CPC: G06N 3/08; G06N 3/045; G06F 18/23213; G06F 18/214
Inventors: 颜成钢, 王超怡, 孙垚棋, 张继勇, 李宗鹏
Owner HANGZHOU DIANZI UNIV