The invention discloses an RGB-T (visible-light and thermal infrared) image salient object detection method based on multi-level deep feature fusion, which mainly solves the problem that, in the prior art, salient objects cannot be detected completely and consistently in complex and changeable scenes. The implementation scheme comprises the following steps: (1) extracting coarse multi-level features from the input images; (2) constructing an adjacent deep feature fusion module to improve the single-modality features; (3) constructing a multi-branch-group fusion module to fuse the multi-modality features; (4) obtaining a fused output feature map; (5) training the network; (6) predicting a pixel-level saliency map of the RGB-T image. The method effectively fuses complementary information from the images of the different modalities, detects salient objects completely and consistently in complex and changeable scenes, and can be used as an image preprocessing step in computer vision.
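As an illustration only, the data flow of steps (1) to (4) and (6) might be sketched as follows. This is a toy sketch under stated assumptions, not the disclosed network: average pooling stands in for a learned feature extractor, plain averaging stands in for the adjacent deep feature fusion module, and three hand-picked element-wise branches (sum, product, maximum) stand in for the multi-branch-group fusion module. All function names are hypothetical, inputs are single-channel grids whose sides are divisible by 4, and the training of step (5) is omitted because nothing here is learned.

```python
import math

def avg_pool2(x):
    """Halve the resolution of a 2-D grid by 2x2 average pooling."""
    h, w = len(x) // 2, len(x[0]) // 2
    return [[(x[2*i][2*j] + x[2*i][2*j+1] + x[2*i+1][2*j] + x[2*i+1][2*j+1]) / 4.0
             for j in range(w)] for i in range(h)]

def upsample2(x):
    """Double the resolution by nearest-neighbour repetition."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def mean_maps(maps):
    """Element-wise mean of equally sized grids."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(w)]
            for i in range(h)]

def extract_levels(image, n_levels=3):
    """Step 1 (stand-in): a pyramid of progressively coarser feature maps."""
    levels = [image]
    for _ in range(n_levels - 1):
        levels.append(avg_pool2(levels[-1]))
    return levels

def fuse_adjacent(levels):
    """Step 2 (stand-in): refine each level with its resized neighbours."""
    fused = []
    for k, feat in enumerate(levels):
        group = [feat]
        if k > 0:
            group.append(avg_pool2(levels[k - 1]))   # finer neighbour, downsampled
        if k + 1 < len(levels):
            group.append(upsample2(levels[k + 1]))   # coarser neighbour, upsampled
        fused.append(mean_maps(group))
    return fused

def fuse_modalities(rgb, thermal):
    """Step 3 (stand-in): several fusion branches, then merge the branches."""
    h, w = len(rgb), len(rgb[0])
    add = [[rgb[i][j] + thermal[i][j] for j in range(w)] for i in range(h)]
    mul = [[rgb[i][j] * thermal[i][j] for j in range(w)] for i in range(h)]
    mx = [[max(rgb[i][j], thermal[i][j]) for j in range(w)] for i in range(h)]
    return mean_maps([add, mul, mx])

def predict_saliency(rgb_image, thermal_image, n_levels=3):
    """Steps 4 and 6 (stand-in): merge levels coarse-to-fine, then squash to (0, 1)."""
    rgb_levels = fuse_adjacent(extract_levels(rgb_image, n_levels))
    t_levels = fuse_adjacent(extract_levels(thermal_image, n_levels))
    fused = [fuse_modalities(r, t) for r, t in zip(rgb_levels, t_levels)]
    out = fused[-1]
    for feat in reversed(fused[:-1]):
        out = mean_maps([upsample2(out), feat])
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in out]

# Toy input: an 8x8 scene with a bright 4x4 object visible in both modalities.
rgb = [[1.0 if 2 <= i <= 5 and 2 <= j <= 5 else 0.0 for j in range(8)] for i in range(8)]
thermal = [row[:] for row in rgb]
saliency = predict_saliency(rgb, thermal)
```

In this toy input the object pixels receive a higher saliency score than the background corners, which is the qualitative behaviour the abstract describes. A practical implementation would presumably replace every stand-in with learned convolutional layers and supervise the output with ground-truth saliency masks.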