Fight recognition method and device based on dual-channel cross-attention mechanism

A recognition method and attention technology, applied in character and pattern recognition, neural architectures, computer components, etc. It addresses the problems that optical flow extraction is time-consuming and resource-intensive and that sparse frame sampling degrades spatial feature extraction, so as to reduce the cost of manual screening, improve recognition accuracy, and improve operating efficiency.

Active Publication Date: 2022-04-22
ZHEJIANG LAB (+1 applicant)

AI Technical Summary

Problems solved by technology

[0006] The disadvantage of networks based on optical flow input is that optical flow extraction is time-consuming and resource-intensive, and the network cannot be trained end-to-end.
Networks that take RGB images as input do not set different numbers of input frames according to the different characteristics of the video in the temporal and spatial domains. If the number of input frames is large, resource consumption is high and training and testing take longer; if image frames are sampled sparsely, temporal features cannot be extracted well, because actions vary greatly between frames, and sparse sampling also degrades spatial feature extraction.
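The frame-rate trade-off described above can be sketched in code. The sampler below and its parameters (64 source frames, 4 sparse vs. 16 dense samples) are illustrative assumptions, not values taken from the patent; they show how a slow (sparse, spatially oriented) and a fast (dense, temporally oriented) sequence would be drawn from the same video.

```python
def sample_indices(num_video_frames: int, num_samples: int) -> list[int]:
    """Uniformly sample `num_samples` frame indices from a video clip."""
    if num_samples >= num_video_frames:
        return list(range(num_video_frames))
    step = num_video_frames / num_samples
    return [int(i * step) for i in range(num_samples)]

# Hypothetical rates: sparse sampling for the slow (spatial) channel,
# dense sampling for the fast (temporal) channel.
slow_idx = sample_indices(64, 4)   # [0, 16, 32, 48]
fast_idx = sample_indices(64, 16)
print(slow_idx)
print(fast_idx)
```

Feeding both sequences to separate channels lets the model keep dense temporal coverage without paying the full cost of a single high-frame-rate input.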
[0007] Convolutional neural networks capture short-range spatiotemporal information through the convolution kernel and cannot model dependencies beyond the range of the receptive field.
Although deepening the network enlarges the receptive field and alleviates this problem to a certain extent, the problem still exists. Some approaches integrate local information into the attention mechanism, but the fusion of local and global information is relatively simple, so the fused features are neither comprehensive nor discriminative enough.




Embodiment Construction

[0060] In order to make the object, technical solution and technical effect of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments.

[0061] As shown in Figure 1, in the two-stream Transformer fight recognition method based on the cross-attention mechanism, the training set and the test set are obtained by screening the CCTV-Fights data set or by labeling independently collected video data. Image sequences with different numbers of frames are obtained from the training and test sets at two frame rates; the training image sequences are preprocessed and sent to the fast- and slow-channel Transformer encoders and the cross-attention module. Finally, the network prediction is obtained through the multi-layer perceptron head, the loss is calculated from the prediction and the ground truth, and the loss function trains the entire netwo...
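The pipeline in this embodiment can be summarized as a forward-pass skeleton. The sketch below uses NumPy with stand-in components: the encoder is an identity placeholder for the divided space-time Transformer encoder, the fusion step averages the two CLS tokens instead of running cross-attention, and all dimensions (embedding size 8, 16 patches per frame, 4 vs. 16 frames) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for a divided space-time Transformer encoder; a real
    encoder would apply temporal then spatial self-attention per block."""
    return tokens

def mlp_head(feature: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Linear layer + softmax over the two classes (fight / no fight)."""
    logits = feature @ w + b
    e = np.exp(logits - logits.max())
    return e / e.sum()

D = 8
slow = rng.normal(size=(1 + 4 * 16, D))    # CLS + 4 frames x 16 patches
fast = rng.normal(size=(1 + 16 * 16, D))   # CLS + 16 frames x 16 patches

slow, fast = encoder(slow), encoder(fast)
# The cross-attention module would exchange CLS/patch information here;
# averaging the two CLS tokens is only a placeholder for that fusion.
fused = 0.5 * (slow[0] + fast[0])

w, b = rng.normal(size=(D, 2)), np.zeros(2)
probs = mlp_head(fused, w, b)
print(probs)
```

During training, a cross-entropy loss on `probs` against the ground-truth label would drive end-to-end optimization of both channels and the fusion module.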



Abstract

The invention discloses a fight recognition method and device based on a dual-channel cross-attention mechanism. The method first collects and generates video data sets and sets two frame rates, fast and slow, to obtain image sequences with different numbers of frames, which are preprocessed and sent to the fast and slow channels respectively. Each channel adopts a Transformer encoder based on a divided spatiotemporal self-attention mechanism to extract the spatiotemporal encoding features of its image sequence. Then, through the cross-attention module, the CLS token of one channel is fused with the patch token information of the other channel, realizing the fusion of the dual-channel spatiotemporal encoding features. Finally, the fused spatiotemporal features are passed through the multi-layer perceptron head for fighting behavior recognition. The invention can effectively extract the spatiotemporal features of the video through the dual-channel Transformer model and the cross-attention module, improve the accuracy of fighting behavior recognition, and is suitable for indoor and outdoor monitoring systems.
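The fusion step described above — the CLS token of one channel attending over the patch tokens of the other — can be sketched as a single-head cross-attention update. The version below omits the query/key/value projections and multi-head structure of a full implementation for clarity; the dimensions and random inputs are hypothetical.

```python
import numpy as np

def cross_attention(cls_a: np.ndarray, patches_b: np.ndarray) -> np.ndarray:
    """CLS token of channel A (query) attends over the patch tokens of
    channel B (keys/values); single head, no learned projections."""
    d = cls_a.shape[-1]
    scores = patches_b @ cls_a / np.sqrt(d)   # one score per patch token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over patches
    return cls_a + weights @ patches_b        # residual update of the CLS

rng = np.random.default_rng(1)
D = 8
cls_slow = rng.normal(size=D)                 # slow-channel CLS token
patches_fast = rng.normal(size=(16, D))       # fast-channel patch tokens
fused_cls = cross_attention(cls_slow, patches_fast)
print(fused_cls.shape)  # (8,)
```

The symmetric direction — the fast-channel CLS attending over the slow-channel patches — uses the same function with the arguments swapped, so each channel's summary token absorbs information from the other channel's sequence.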

Description

Technical Field

[0001] The invention relates to the field of intelligent video monitoring and deep learning, in particular to a Transformer fight recognition method and device based on a dual-channel cross-attention mechanism.

Background Technique

[0002] Nowadays, surveillance video is widely used in public places and plays an extremely important role in maintaining public safety. Effective identification of abnormal behaviors and events in surveillance video allows it to better fulfil this role. Fighting is a common behavior in video that disturbs public order; in severe cases it amounts to gathering crowds to fight or picking quarrels and provoking trouble, which affects social stability. Detecting fighting behavior in massive volumes of video in a timely manner through intelligent means is very important for maintaining social stability.

[0003] Most existing video-based fighting behavior recognition methods are implemented by con...


Application Information

Patent Type & Authority: Patent (China)
IPC (8): G06V40/20; G06V20/40; G06V20/52; G06V10/774; G06V10/82; G06K9/62; G06N3/04
CPC: G06N3/045; G06F18/214
Inventors: 李玲, 徐晓刚, 王军, 祝敏航, 曹卫强, 朱亚光
Owner: ZHEJIANG LAB