The invention discloses a lip-reading recognition method and system based on cross-modal attention enhancement. The method comprises: extracting a lip image sequence and lip motion information; obtaining a corresponding lip feature sequence and lip motion sequence through a pretrained feature extractor; inputting the obtained feature sequences into a cross-modal attention network to obtain an enhanced lip feature sequence; and, through a multi-branch attention mechanism, establishing the temporal correlation within the intra-modal feature sequence and selectively attending to the relevant input information at the output end. The method takes the correlation between time-sequence information into account. Optical flow is computed between adjacent frames to obtain the motion information between visual features, and this motion information is used to represent, fuse, and enhance the lip visual features, making full use of the context within the modality. Finally, the multi-branch attention mechanism performs correlation representation and selection of the intra-modal features, which improves lip-reading recognition accuracy.
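The pipeline above can be illustrated with a minimal NumPy sketch. Several assumptions are made here that the abstract does not confirm. The abstract does not name its optical flow algorithm, so a single-window (global) Lucas-Kanade estimate stands in for it. The cross-modal step is modeled as lip features querying motion features, with the result added back to the lip features as a residual. "Multi-branch attention" is read as parallel self-attention branches (heads) whose outputs are concatenated. All function names, dimensions, and random weights are hypothetical. In the described system, the motion features would come from a pretrained extractor applied to the flow; here random arrays stand in for both feature streams.

```python
import numpy as np


def global_lk_flow(prev, nxt):
    """Estimate one (u, v) displacement between two grayscale frames by solving
    the Lucas-Kanade least-squares system over a single window covering the frame."""
    prev = prev.astype(np.float64)
    nxt = nxt.astype(np.float64)
    iy, ix = np.gradient(0.5 * (prev + nxt))  # spatial gradients (axis 0 = y, axis 1 = x)
    it = nxt - prev                            # temporal gradient
    a = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    b = -np.array([np.sum(ix * it), np.sum(iy * it)])
    return np.linalg.lstsq(a, b, rcond=None)[0]  # array([u, v])


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention: each query row is a weighted mix of the rows of v."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def cross_modal_enhance(lip, motion, wq, wk, wv):
    """Hypothetical cross-modal step: lip features query the motion features and
    the attended motion context is added back to the lip features (residual fusion)."""
    return lip + attention(lip @ wq, motion @ wk, motion @ wv)


def multi_branch_self_attention(x, branches, wo):
    """Each branch is an independent (wq, wk, wv) self-attention over time;
    branch outputs are concatenated and projected back with wo."""
    outs = [attention(x @ wq, x @ wk, x @ wv) for wq, wk, wv in branches]
    return np.concatenate(outs, axis=-1) @ wo


# Optical flow on a toy pair of frames: a Gaussian blob moves 1 pixel to the right.
ys, xs = np.mgrid[0:33, 0:33]


def blob(cx):
    return np.exp(-((xs - cx) ** 2 + (ys - 16.0) ** 2) / 32.0)


u, v = global_lk_flow(blob(16.0), blob(17.0))  # u close to 1, v close to 0

# Attention stages with random (hypothetical) features and weights.
rng = np.random.default_rng(0)
T, d, n_branches = 8, 16, 4
dh = d // n_branches
lip = rng.standard_normal((T, d))          # stand-in for pretrained lip features
motion = rng.standard_normal((T - 1, d))   # one motion feature per adjacent-frame pair


def w(*shape):
    return rng.standard_normal(shape) / np.sqrt(shape[0])


enhanced = cross_modal_enhance(lip, motion, w(d, d), w(d, d), w(d, d))
branches = [(w(d, dh), w(d, dh), w(d, dh)) for _ in range(n_branches)]
out = multi_branch_self_attention(enhanced, branches, w(d, d))
print(enhanced.shape, out.shape)  # (8, 16) (8, 16)
```

The motion sequence has T - 1 entries because flow is defined between adjacent frames. Cross-attention does not require the two modalities to have equal length, which is one reason attention is a convenient fusion mechanism here.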