
Multi-modal scene recognition method based on deep learning

A scene recognition and deep learning technology, applied in the fields of pattern recognition and artificial intelligence, that solves problems such as overly complex implementations and achieves the effects of improved accuracy and a more convenient scene recognition method.

Active Publication Date: 2019-07-23
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

The results obtained by feature-fusion methods are more objective, but their actual implementation is too complicated.




Embodiment Construction

[0035] To address the inaccurate results and high complexity of existing scene recognition methods, the present invention provides a new multi-modal scene recognition method based on deep learning. It extracts feature information from the image and text modalities and fuses the multi-modal feature information to improve the accuracy of scene recognition.

[0036] Further, the deep learning-based multi-modal scene recognition method of the present invention includes the following steps.

[0037] S1. Use the Jieba word segmentation tool to perform word segmentation on the short texts (a minimal sketch appears after this step list).

[0038] S2. Input a group of pictures, the segmented short texts, and the corresponding labels into their respective convolutional neural networks for training.

[0039] S3. Train a short text classification model. Specifically, this includes the following steps:

[0040] S31. During the training process of the short text classification model, quantify the word segmentation results of the input short text and inp...
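For step S1, a minimal sketch of the segmentation step, assuming the open-source jieba library (the "Jieba" tool named in the patent) and an invented example sentence; this is an illustration, not text from the patent:

    # Step S1 (sketch): segment a short text with jieba.
    import jieba

    short_text = "公园里的人们在湖边散步"  # illustrative scene description
    tokens = jieba.lcut(short_text)        # lcut returns the segments as a list
    print(tokens)

For steps S2 and S31, a hedged sketch of the short text branch, assuming PyTorch: the segmented words are quantified as embedding indices and fed into a small convolutional network. The vocabulary size, embedding dimension, filter widths, and number of scene classes are assumptions, not values given in the patent:

    # Steps S2/S31 (sketch): a text CNN over embedded word indices.
    import torch
    import torch.nn as nn

    class TextCNN(nn.Module):
        def __init__(self, vocab_size=5000, embed_dim=128, num_classes=10):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # parallel convolutions over 3-, 4- and 5-word windows
            self.convs = nn.ModuleList(
                [nn.Conv1d(embed_dim, 64, k) for k in (3, 4, 5)])
            self.fc = nn.Linear(64 * 3, num_classes)   # fully connected output

        def forward(self, token_ids):                  # (batch, seq_len)
            x = self.embed(token_ids).transpose(1, 2)  # (batch, embed, seq)
            pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
            return self.fc(torch.cat(pooled, dim=1))   # raw class scores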



Abstract

The invention discloses a multi-modal scene recognition method based on deep learning. The multi-modal scene recognition method comprises the following steps: S1, carrying out word segmentation processing on a short text; S2, inputting a group of pictures, the segmented short texts and the corresponding labels into respective convolutional neural networks for training; S3, training a short text classification model; S4, training a picture classification model; S5, respectively calculating the cross entropies of the fully connected layer outputs in S3 and S4 against the standard classification result, calculating an average Euclidean distance which serves as the loss value, feeding the loss value back to the respective convolutional neural networks, and finally obtaining a complete multi-modal scene recognition model; S6, adding the text and image prediction result vectors to obtain the final classification result; and S7, respectively inputting the short text and the image to be recognized into the trained multi-modal scene recognition model and performing scene recognition. The invention provides a multi-modal scene searching mode, and more accurate and convenient scene recognition is provided for users.
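One plausible reading of steps S5 and S6, sketched in PyTorch; the way the two cross entropies combine with the average Euclidean distance term is an interpretation of the abstract's wording, and text_logits / image_logits simply stand for the fully connected outputs of the two branches:

    # Steps S5-S6 (sketch): joint loss and late fusion of the two branches.
    import torch
    import torch.nn.functional as F

    def fusion_loss(text_logits, image_logits, labels):
        # cross entropy of each fully connected output vs. the standard result
        ce_text = F.cross_entropy(text_logits, labels)
        ce_image = F.cross_entropy(image_logits, labels)
        # average Euclidean distance between the two branch outputs (S5)
        euclid = torch.norm(text_logits - image_logits, p=2, dim=1).mean()
        return ce_text + ce_image + euclid

    def predict_scene(text_logits, image_logits):
        # S6: add the two prediction vectors, then pick the best class
        return (text_logits.softmax(dim=1)
                + image_logits.softmax(dim=1)).argmax(dim=1)

In training, the returned loss would be backpropagated into both convolutional networks, matching the abstract's "feeding the loss value back to the respective convolutional neural networks".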

Description

Technical field

[0001] The invention relates to a multi-modal scene recognition method, in particular to a deep learning-based multi-modal scene recognition method, and belongs to the fields of artificial intelligence and pattern recognition.

Background technique

[0002] Deep learning is a brand-new field of machine learning whose purpose is to bring machine learning closer to human intelligence. The convolutional neural network is a representative deep learning algorithm; it has a simple structure, strong adaptability, few training parameters, and many connections, and has therefore been widely used in image processing and pattern recognition for many years.

[0003] Specifically, the convolutional neural network is a hierarchical model whose input is the raw data. Through a series of operations such as convolutions, pooling, and nonlinear activation functions, the high-level semantic information is layer...
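As a concrete illustration of the hierarchical model described in [0003], a minimal image classification sketch in PyTorch; the layer sizes, input resolution, and number of classes are assumptions, not values from the patent:

    # [0003] illustration (sketch): convolution, nonlinear activation and
    # pooling layers distill raw pixels into high-level features.
    import torch.nn as nn

    class ImageCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
            self.fc = nn.Linear(32 * 56 * 56, num_classes)  # for 224x224 input

        def forward(self, images):            # (batch, 3, 224, 224)
            x = self.features(images)
            return self.fc(x.flatten(1))      # raw class scores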


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06K 9/62, G06N 3/04
CPC: G06N 3/045, G06F 18/214, G06F 18/24
Inventors: 吴家皋, 刘源, 孙璨, 郑剑刚
Owner: NANJING UNIV OF POSTS & TELECOMM