
Monocular depth estimation method fusing multi-modal information

A depth estimation method using multi-modal technology, applied in cross-disciplinary fields, that achieves high depth estimation accuracy

Pending Publication Date: 2022-06-07
BEIJING UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0004] The present invention addresses the problem of predicting a depth map from a single RGB image under a supervised learning framework.




Detailed Description of the Embodiments

[0034] The present invention will be further described below with reference to the accompanying drawings and specific embodiments.

[0035] The process of the present invention is shown in Figure 1 and includes the following steps:

[0036] Step 1, the backbone network extracts the basic feature map.

[0037] After reading the image, feature extraction is performed on the input RGB image. Available deep convolutional neural networks include ResNet (deep residual network) and HRNet (deep high-resolution representation learning network), both pre-trained on the MIT ADE20K dataset.
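As an illustration of Step 1 only (not code from the patent), the sketch below extracts a basic feature map with a torchvision ResNet-50 trunk. The patent pre-trains its backbones on the MIT ADE20K dataset, whereas torchvision ships ImageNet weights, so the initialization, input size, and file name here are stand-in assumptions.

```python
# Minimal sketch of Step 1: extracting a basic feature map from an RGB image
# with a pre-trained ResNet backbone. NOTE: torchvision provides ImageNet
# weights; the patent's ADE20K pre-training is not reproduced here.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# Drop the global-average-pool and classifier head so the output remains a
# spatial feature map rather than a classification vector.
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])
feature_extractor.eval()

preprocess = T.Compose([
    T.Resize((480, 640)),  # example input resolution (assumption)
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("scene.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feat = feature_extractor(img)  # e.g. [1, 2048, 15, 20] for ResNet-50
print(feat.shape)
```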

[0038] Step 2, cross-region context aggregation (CSC).

[0039] Existing methods consider context aggregation at multiple scales. For example, the 2018 CVPR paper "Deep Ordinal Regression Network for Monocular Depth Estimation" (DORN) employs Atrous Spatial Pyramid Pooling (ASPP) to capture spatial context at multiple local scales and employs a full-image encoder (global ave...
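Combining the operators this document names (parallel global pooling along the horizontal and vertical directions, cross-channel 1×1 convolution, and multi-scale dilated convolution in the spirit of ASPP), the following is a minimal sketch of what such a cross-region context aggregation module could look like. The module structure, channel counts, and dilation rates are illustrative assumptions, not values from the patent.

```python
# Hedged sketch of a cross-region context aggregation module: strip pooling
# over each spatial axis, 1x1 cross-channel mixing, and ASPP-style
# multi-scale dilated convolutions, fused into one output feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossRegionContext(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, dilations=(1, 6, 12, 18)):
        super().__init__()
        # Strip pooling: collapse one spatial axis at a time, then mix channels.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height
        self.conv_h = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.conv_w = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # Multi-scale dilated (atrous) convolution branches.
        self.aspp = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * (len(dilations) + 2), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        # Broadcast the strip-pooled features back to full resolution.
        sh = self.conv_h(self.pool_h(x)).expand(-1, -1, h, w)
        sw = self.conv_w(self.pool_w(x)).expand(-1, -1, h, w)
        branches = [conv(x) for conv in self.aspp] + [sh, sw]
        return F.relu(self.fuse(torch.cat(branches, dim=1)))

# Example: aggregate context on the backbone feature map from Step 1.
ctx = CrossRegionContext(in_ch=2048, out_ch=256)
print(ctx(torch.randn(1, 2048, 15, 20)).shape)  # -> [1, 256, 15, 20]
```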



Abstract

The invention relates to a monocular depth estimation method fusing multi-modal information. First, a single RGB image is input and features are extracted by a conventional backbone network (such as ResNet). The features then pass through parallel global pooling operators in the horizontal and vertical directions, cross-channel 1×1 convolution, multi-scale dilated (atrous) convolution, and a semantic segmentation prediction module, yielding two feature maps of different modalities: D (depth) and S (semantics). The probability distribution vector of each pixel in S over the semantic categories is multiplied by the depth representation vectors in D to obtain a multi-modal fusion similarity matrix, which is finally combined with the feature map D through a residual connection to produce the final output feature map F. In addition, the loss function is redesigned to train the deep neural network model. Compared with existing methods, this method better preserves the contours of different object categories in a scene and achieves higher depth estimation accuracy.
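The fusion step is described only at a high level in this extract, so the sketch below is one plausible reading (an OCR-style class attention), not the patented formulation: the per-pixel class probabilities in S pool the depth features in D into per-class depth prototypes, the prototypes are redistributed to pixels through the same probabilities to form the fusion term, and a residual connection with D yields F. All tensor shapes and the 21-class example are assumptions.

```python
# Hedged sketch of the depth/semantics fusion described in the abstract,
# under an assumed OCR-style class-attention reading.
import torch

def fuse_depth_semantics(D: torch.Tensor, S: torch.Tensor) -> torch.Tensor:
    """D: [N, C, H, W] depth features; S: [N, K, H, W] class probabilities."""
    n, c, h, w = D.shape
    d = D.flatten(2)                                   # [N, C, H*W]
    s = S.flatten(2)                                   # [N, K, H*W]
    s = s / (s.sum(dim=2, keepdim=True) + 1e-6)        # normalize per class
    prototypes = torch.bmm(s, d.transpose(1, 2))       # [N, K, C] class-wise depth
    # Redistribute prototypes to pixels through the raw class probabilities.
    fused = torch.bmm(prototypes.transpose(1, 2), S.flatten(2))  # [N, C, H*W]
    return D + fused.view(n, c, h, w)                  # residual connection with D

F_out = fuse_depth_semantics(torch.randn(2, 256, 60, 80),
                             torch.softmax(torch.randn(2, 21, 60, 80), dim=1))
print(F_out.shape)  # -> [2, 256, 60, 80]
```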

Description

Technical Field

[0001] The invention belongs to the intersecting fields of computer vision, robot visual perception, and deep learning, and relates to a monocular depth estimation method integrating multi-modal information.

Background Art

[0002] During imaging, ordinary cameras record only the color information of a scene; they cannot record the distance between actual objects and the camera. In other words, when three-dimensional space is projected onto a two-dimensional plane, depth information is lost. Compared with measuring the depth of a limited number of surface points with hardware such as laser rangefinders, depth estimation from a single image has a wider range of applications because it requires no expensive equipment, instruments, or trained professionals. Methods that introduce semantic cues to improve depth estimation performance can be roughly divided i...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06V10/774; G06V10/82; G06F17/16; G06K9/62; G06N3/04; G06N3/08
CPC: G06F17/16; G06N3/08; G06N3/045; G06F18/214
Inventors: 马伟 (Ma Wei), 严武斌 (Yan Wubin)
Owner: BEIJING UNIV OF TECH