Pedestrian multi-target tracking method trained end-to-end with a joint attention mechanism

A multi-target tracking technique with an attention mechanism, applied in the field of computer vision. It addresses the complex calculation process and the increased computation and memory overhead of existing methods, improving computational efficiency, reducing computational overhead, and improving discrimination ability.

Pending Publication Date: 2021-03-26
成都东方天呈智能科技有限公司

AI Technical Summary

Problems solved by technology

[0005] Most current deep-learning-based pedestrian multi-target tracking methods split the algorithm into a tracking part and a data association part that are trained and computed separately, which complicates the overall calculation process and adds redundant computation and memory overhead.

Examples

Embodiment 1

[0053] A pedestrian multi-target tracking method trained end-to-end with a joint attention mechanism, as shown in Figure 1, includes the following steps: Step S100: Collect a pedestrian data set of labeled video sequences. For each video, use the ground-truth bounding box of the first frame in the labels to initialize the tracking box, which serves as the template sample. Then crop the positive search region sample from the second frame around the center of the tracking box, and crop the negative search region sample from a region that does not belong to the same target. The template sample, the positive search region sample, and the negative search region sample form a triplet, which is output as a training sample;
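As a rough illustration of Step S100, the sketch below builds one triplet from two frames. It is a minimal sketch under stated assumptions, not the patent's implementation: frames are plain 2D lists, boxes are (x, y, width, height) tuples, and the names crop, box_center, and make_triplet, the crop sizes, and the choice of centering the negative crop on a different pedestrian's box are all hypothetical.

```python
def crop(frame, cx, cy, size):
    """Crop a size x size window centered at (cx, cy); pixels outside the frame become 0."""
    h, w = len(frame), len(frame[0])
    half = size // 2
    out = []
    for y in range(cy - half, cy - half + size):
        row = []
        for x in range(cx - half, cx - half + size):
            inside = 0 <= y < h and 0 <= x < w
            row.append(frame[y][x] if inside else 0)
        out.append(row)
    return out


def box_center(box):
    """Integer center of an (x, y, width, height) box."""
    x, y, w, h = box
    return x + w // 2, y + h // 2


def make_triplet(frame1, frame2, target_box, other_box,
                 template_size=4, search_size=8):
    """Return (template, positive, negative) crops.

    Assumption: the negative region is taken around a different target's box.
    """
    cx, cy = box_center(target_box)
    template = crop(frame1, cx, cy, template_size)   # first-frame target
    positive = crop(frame2, cx, cy, search_size)     # same center, second frame
    ox, oy = box_center(other_box)
    negative = crop(frame2, ox, oy, search_size)     # not the same target
    return template, positive, negative
```

The search crops are larger than the template here so that the target can move between frames and still fall inside the positive region; the actual sizes used by the method are not stated in this text.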

[0054] Step S200: Construct a deep neural network model. The convolutional neural network part extracts feature information from the samples, the attention mechanism module then guides the network model to focus on important feature information, and finally calcu...

Embodiment 2

[0059] This embodiment refines Embodiment 1. As shown in Figures 1 and 2, step S200 includes the following steps:

[0060] Step S201: Construct three network branches, namely a template sample branch, a positive search region sample branch, and a negative search region sample branch, which process the template samples, the positive search region samples, and the negative search region samples respectively. The backbone network structures of the three branches are identical and share weight parameters;
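Weight sharing across the three branches in Step S201 can be pictured as one backbone object applied to all three inputs. The following is only a toy sketch: TinyBackbone (a single valid convolution followed by ReLU) and TripletNet are hypothetical stand-ins, and the real backbone architecture is not described in this text.

```python
class TinyBackbone:
    """Stand-in backbone: one 'valid' 2D convolution followed by ReLU."""

    def __init__(self, kernel):
        self.kernel = kernel  # square list-of-lists of weights

    def __call__(self, image):
        k = len(self.kernel)
        h, w = len(image), len(image[0])
        out = []
        for y in range(h - k + 1):
            row = []
            for x in range(w - k + 1):
                s = sum(self.kernel[i][j] * image[y + i][x + j]
                        for i in range(k) for j in range(k))
                row.append(max(0.0, s))  # ReLU
            out.append(row)
        return out


class TripletNet:
    """Three branches that reuse one backbone, so the weights are shared."""

    def __init__(self, backbone):
        self.backbone = backbone

    def forward(self, template, positive, negative):
        return (self.backbone(template),
                self.backbone(positive),
                self.backbone(negative))
```

Because every branch calls the same object, any update to its kernel changes all three feature maps at once, which is what keeps the template and search features comparable.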

[0061] Step S202: Both the positive search region sample branch and the negative search region sample branch use a region-of-interest alignment layer to downsample feature point information. In both branches, the backbone network connects to the region-of-interest alignment layer, and an attention mechanism module is set betwee...
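For readers unfamiliar with region-of-interest alignment, the sketch below shows the general idea of downsampling a region of a feature map to a fixed grid with bilinear interpolation. It is a simplified illustration, not the layer used in the patent: it takes one sample per output bin (common implementations average several), assumes feature values sit at integer coordinates, and the names bilinear and roi_align are hypothetical. The attention mechanism module is not sketched because its internals are not given in this text.

```python
import math


def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2D feature map at a fractional (y, x)."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the map
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(fmap, roi, out_size):
    """Resample roi = (x_min, y_min, x_max, y_max) to an out_size x out_size grid.

    Simplification: one sample at the center of each bin.
    """
    x_min, y_min, x_max, y_max = roi
    bin_h = (y_max - y_min) / out_size
    bin_w = (x_max - x_min) / out_size
    return [[bilinear(fmap,
                      y_min + (i + 0.5) * bin_h,
                      x_min + (j + 0.5) * bin_w)
             for j in range(out_size)]
            for i in range(out_size)]
```

Unlike plain cropping and pooling, the region bounds are never rounded to integer cells, so the sampled features stay aligned with the box even when it falls between feature points.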

Embodiment 3

[0071] This embodiment refines Embodiment 1 or 2. The verification loss function in step S300 is a softmax loss function, calculated as follows:

[0072]

[0073] where z_i, x_i, and x_j denote the template sample, the positive search region sample, and the negative search region sample, respectively;

[0074] respectively represent the predicted probability value of the template sample, the predicted probability value of the positive search region sample, and the predicted probability value of the negative search region sample;

[0075] Minimizing the verification loss function improves the classification ability of the model;
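The formula in [0072] is not reproduced in this text, so the following is only a generic softmax cross-entropy, shown to make the idea of minimizing a loss over predicted probabilities concrete. It should not be read as the patent's exact expression; the function names and the two-score example are assumptions.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])


# Hypothetical scores for [same target, different target].
confident_and_right = cross_entropy([4.0, 0.0], 0)
confident_and_wrong = cross_entropy([0.0, 4.0], 0)
```

The loss is small when the network puts high probability on the correct class and large otherwise, so driving it down pushes the features of matching and non-matching samples apart.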

[0076] The single-target tracking loss function is calculated on the heat map obtained by convolving the feature map output by the backbone network part. The calculation formula is as follows:

[0077]

[0078] where p is a feature point on the heat map,

[0079] P is the feature map,

[0080] v_p...

Abstract

The invention discloses a pedestrian multi-target tracking method trained end-to-end with a joint attention mechanism. The method comprises: collecting a pedestrian data set of labeled video sequences; using the ground-truth bounding box of the first frame of each video in the labels as a template sample; cropping a positive search region sample from the second frame around the center of the template sample; cropping a negative search region sample from a region that does not belong to the same target; forming triplet input data; extracting feature information from the samples with a convolutional neural network; using an attention mechanism module to guide the network model toward important feature information; and finally calculating similarity and performing data association. The invention integrates Siamese-network-based single-target tracking and the association network into a unified network structure, and uses the attention mechanism to bias the network toward learning meaningful feature information, which improves the feature expression capability of the network model, improves computational efficiency, and simplifies the training process.

Description

Technical field

[0001] The invention belongs to the technical field of computer vision, and in particular relates to a pedestrian multi-target tracking method trained end-to-end with a joint attention mechanism.

Background

[0002] With the rapid development of deep learning and computing power, computer vision has become an important research branch of computer science. Many research methods have been put into practice, and the resulting products have accelerated the adoption of intelligent technology in society. In real life, pedestrian multi-target tracking is widely used within computer vision, for example in intelligent video surveillance, human-computer interaction, and care robots.

[0003] Pedestrian multi-target tracking is a visual task that obtains the position information and motion trajectories of multiple pedestrians in the image by processing and analyzing the images of a video sequence, and...

Application Information

IPC(8): G06K9/00, G06K9/62, G06N3/04, G06T7/11
CPC: G06T7/11, G06V20/46, G06N3/045, G06F18/213, G06F18/214, Y02T10/40
Inventors: 闫超, 黄俊洁, 韩强
Owner 成都东方天呈智能科技有限公司