Multi-target detection method and device based on multi-modal information fusion

The detection method applies to neural learning methods, character and pattern recognition, and biological neural network models. It addresses three problems of existing approaches: the inability to make full use of the correlation between multimodal data, complex network structures, and sensitivity to data alignment. The aim is high algorithm efficiency and lower computing cost.

Pending Publication Date: 2022-05-10

TIANJIN UNIV

Cites: 0. Cited by: 4.

Problems solved by technology

[0004] Current multi-modal fusion target detection methods face two main challenges. Early fusion and late fusion cannot make full use of the correlation between multi-modal data. Deep fusion tends to be sensitive to data alignment and to require a complex network structure.

Examples

Embodiment 1

[0045] A multi-target detection method based on multi-modal information fusion (see Figure 1) includes the following steps:

[0046] 101: Process LiDAR point cloud data, and extract LiDAR point cloud features, that is, three-dimensional feature maps;

[0047] Because LiDAR data is sparse, the embodiment of the present invention resamples the point cloud and adds sampling points. This increases the data density to some extent, which improves the quality of the three-dimensional feature map and the effectiveness of detection.
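The excerpt does not say how the sampling points are added. As a minimal sketch of the general idea only, the following assumes one simple rule: insert the midpoint between each point and its nearest neighbour. The function name `densify_by_midpoints` and the rule itself are illustrative assumptions, not the patent's resampling scheme.

```python
import math


def densify_by_midpoints(points):
    """Return the original points plus midpoints to nearest neighbours.

    Assumption: this midpoint rule is only an illustration of adding
    sampling points to a sparse cloud; the source does not specify one.
    """
    if len(points) < 2:
        return list(points)
    added = set()
    for i, p in enumerate(points):
        # Brute-force nearest neighbour, adequate for a small demo cloud.
        j = min(
            (k for k in range(len(points)) if k != i),
            key=lambda k: math.dist(p, points[k]),
        )
        q = points[j]
        mid = tuple((a + b) / 2.0 for a, b in zip(p, q))
        # A set keeps one midpoint when two points are mutual neighbours.
        added.add(mid)
    return list(points) + sorted(added)


cloud = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
dense = densify_by_midpoints(cloud)
```

For real point clouds a spatial index (for example a k-d tree) would replace the quadratic neighbour search.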

[0048] 102: Perform two-dimensional image data processing on the RGB image, and output RGB image features through a feature extraction network, that is, a two-dimensional feature map;

[0049] Since two-dimensional images naturally lack three-dimensional information, in the detection stage after feature extraction, it is necessary to correlate with three-dimensional information based on spatial position and pixel info...

Embodiment 2

[0055] The scheme of Embodiment 1 is described further below with specific examples and calculation formulas:

[0056] 201: Process LiDAR point cloud data, and output LiDAR point cloud features, that is, three-dimensional feature maps;

[0057] Specifically, the point cloud is grouped evenly into voxels, which converts the sparse and uneven point cloud into a dense tensor structure. A list of voxel features is obtained by stacking voxel feature encoding layers. The voxel features are then aggregated over an enlarged receptive field, and the LiDAR point cloud features, that is, the three-dimensional feature map, are output.
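The grouping step can be sketched as follows. This is a minimal illustration under stated assumptions: the names `voxelize` and `voxel_centroids`, the voxel size, and the per-voxel point cap are hypothetical, and the per-voxel centroid is only a hand-written stand-in for the learned voxel feature encoding layers, which the excerpt does not detail.

```python
from collections import defaultdict


def voxelize(points, voxel_size, max_points_per_voxel=35):
    """Group 3D points into voxels keyed by integer grid index.

    Assumption: the cap on points per voxel is a hypothetical parameter;
    points beyond the cap are simply dropped in arrival order.
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (
            int(x // voxel_size[0]),
            int(y // voxel_size[1]),
            int(z // voxel_size[2]),
        )
        if len(voxels[key]) < max_points_per_voxel:
            voxels[key].append((x, y, z))
    return dict(voxels)


def voxel_centroids(voxels):
    """Compute one centroid per voxel as a placeholder voxel feature."""
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in voxels.items()
    }


points = [(0.1, 0.1, 0.1), (0.3, 0.2, 0.1), (1.2, 0.1, 0.1)]
voxels = voxelize(points, voxel_size=(1.0, 1.0, 1.0))
features = voxel_centroids(voxels)
```

Here the first two points fall into voxel `(0, 0, 0)` and the third into voxel `(1, 0, 0)`. Scattering the per-voxel features back onto the grid is what gives the dense tensor that later convolution layers consume.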

[0058] 202: Perform two-dimensional image data processing on the RGB image, and output RGB image features through a feature extraction network, that is, a two-dimensional feature map;

[0059] Specifically, a uniform grouping operation is performed on the two-dimensional RGB image, and the w...

Embodiment 3

[0093] The feasibility of the schemes in Embodiments 1 and 2 is verified below with a specific example:

[0094] The KITTI dataset is used to evaluate the performance of the algorithm. The KITTI dataset is currently the largest algorithm evaluation dataset in the world for autonomous driving scenarios. It includes 7481 point clouds and images for training and 7518 point clouds and images for testing, covering three categories: cars, pedestrians, and cyclists. For each category, the detection results are evaluated at three difficulty levels: easy, medium, and difficult, which are determined by target size, occlusion state, and truncation level. To evaluate the algorithm comprehensively, the training data is subdivided into a training set and a validation set, resulting in 3712 data samples for training and 3769 data samples for validation. After splitting...
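The three difficulty levels can be illustrated with a small classifier. The thresholds below are not stated in the patent text; they follow the public KITTI object benchmark definition (minimum 2D box height in pixels, maximum occlusion level, maximum truncation fraction), where the levels are named Easy, Moderate, and Hard. The function name `kitti_difficulty` is a hypothetical helper.

```python
# (level name used in this document, min box height in px,
#  max occlusion level, max truncation fraction)
# Assumption: thresholds taken from the public KITTI benchmark definition.
DIFFICULTY_LEVELS = [
    ("easy", 40, 0, 0.15),
    ("medium", 25, 1, 0.30),
    ("difficult", 25, 2, 0.50),
]

TRAIN_SAMPLES = 3712
VAL_SAMPLES = 3769


def kitti_difficulty(bbox_height_px, occlusion, truncation):
    """Return the easiest level a labelled object qualifies for.

    occlusion: 0 = fully visible, 1 = partly occluded, 2 = largely occluded.
    truncation: fraction of the object outside the image, from 0.0 to 1.0.
    Returns None when the object meets none of the three levels.
    """
    for name, min_height, max_occlusion, max_truncation in DIFFICULTY_LEVELS:
        if (
            bbox_height_px >= min_height
            and occlusion <= max_occlusion
            and truncation <= max_truncation
        ):
            return name
    return None
```

For instance, a fully visible object with a 50-pixel-high box and 10% truncation qualifies as easy, while a 20-pixel-high box fails all three levels. The two split sizes quoted above sum to the 7481 training samples.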

Abstract

The invention discloses a multi-target detection method and device based on multi-modal information fusion. The method comprises the following steps. The three-dimensional and two-dimensional feature maps are propagated forward through a parameter-sharing convolutional neural network, which is trained with a cross-entropy loss, to fuse the three-dimensional and two-dimensional features. The fused three-dimensional feature map is input into a three-dimensional region generation network, which maps it through two branches and outputs a three-dimensional target detection frame position map and a probability score map. The fused two-dimensional feature map is input into a two-dimensional region generation network, which outputs a two-dimensional target detection frame position map and a probability score map. At this point, the target positions and detection probability score maps are available for both the point cloud data and the RGB two-dimensional image. The target detection frames of the two modalities are then fused with a late-fusion strategy to obtain the final target detection result. The device comprises a processor and a memory. The method overcomes the limited information and poor robustness of traditional single-modality methods.
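The final late-fusion step merges detection frames from the two modalities. The excerpt does not give the fusion rule, so the following is only a sketch under explicit assumptions: both detectors' boxes are already expressed in the same 2D image plane as `(x1, y1, x2, y2)`, pairs are matched greedily by intersection over union (IoU), and matched scores are combined with a noisy-OR rule. The names `iou` and `late_fuse` and the 0.5 threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def late_fuse(dets_a, dets_b, iou_thresh=0.5):
    """Fuse two lists of (box, score) detections.

    Assumption: greedy IoU matching plus noisy-OR score combination is an
    illustrative rule, not the rule claimed in the patent.
    """
    fused = []
    used_b = set()
    for box_a, score_a in dets_a:
        best_j, best_iou = None, iou_thresh
        for j, (box_b, _) in enumerate(dets_b):
            if j in used_b:
                continue
            overlap = iou(box_a, box_b)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            # Seen by one modality only: keep it unchanged.
            fused.append((box_a, score_a))
        else:
            used_b.add(best_j)
            score_b = dets_b[best_j][1]
            # Agreement between modalities raises the confidence.
            fused.append((box_a, 1.0 - (1.0 - score_a) * (1.0 - score_b)))
    # Keep detections from the second modality that found no partner.
    fused.extend(d for j, d in enumerate(dets_b) if j not in used_b)
    return fused


lidar_dets = [((0.0, 0.0, 10.0, 10.0), 0.6)]
rgb_dets = [((1.0, 1.0, 10.0, 10.0), 0.5), ((50.0, 50.0, 60.0, 60.0), 0.9)]
result = late_fuse(lidar_dets, rgb_dets)
```

In this example the first pair overlaps strongly and is merged into one detection with a higher score, while the unmatched RGB detection is passed through unchanged.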

Description

Technical field

[0001] The invention relates to the fields of three-dimensional target detection and two-dimensional target detection, and in particular to a multi-target detection method and device based on multi-modal information fusion.

Background

[0002] In recent years, with the development of Light Detection and Ranging (LiDAR) technology, the acquisition speed and accuracy of point cloud data have been greatly improved. How to realize efficient and accurate point cloud target detection is an important issue in the fields of intelligent driving, remote sensing, augmented reality, virtual reality and so on. Compared with traditional 2D object detection, 3D object detection requires more output parameters to determine the bounding box of the object. Due to the data characteristics of LiDAR point cloud, in the target detection task, it often faces problems such as low resolution of input data, missing texture and color information, and high computational overh...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06V20/64, G06V20/00, G06V10/22, G06V10/40, G06V10/56, G06V10/764, G06V10/80, G06V10/82, G06K9/62, G06N3/04, G06N3/08

CPC: G06N3/08, G06N3/047, G06N3/045, G06F18/2415, G06F18/253

Inventor: 聂为之, 高思洁, 马瑞鑫, 刘通

Owner: TIANJIN UNIV
