The invention provides a cross-modal retrieval method based on modal relation learning, which comprises the following steps: S1, inputting image-text pairs with the same semantics in a data set, together with the class labels to which the image-text pairs belong, into a cross-modal retrieval network model based on modal relation learning, and training until the model converges, thereby obtaining a trained network model M.
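A minimal training sketch for step S1 is given below, assuming a PyTorch dual-encoder that combines a contrastive pair loss with a shared label classifier; the encoder architectures, loss formulation, and hyperparameters are illustrative assumptions and are not fixed by the invention.

```python
# Illustrative sketch of step S1: training a dual-encoder until convergence.
# The architectures, losses, and hyperparameters below are assumptions for
# demonstration; the patent does not fix these details.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """Maps precomputed image features to the shared embedding space."""
    def __init__(self, in_dim=4096, embed_dim=512):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                nn.Linear(1024, embed_dim))
    def forward(self, x):
        return F.normalize(self.fc(x), dim=-1)

class TextEncoder(nn.Module):
    """Maps precomputed text features to the same shared embedding space."""
    def __init__(self, in_dim=1024, embed_dim=512):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(in_dim, 1024), nn.ReLU(),
                                nn.Linear(1024, embed_dim))
    def forward(self, x):
        return F.normalize(self.fc(x), dim=-1)

def train_model(loader, num_classes, epochs=50, device="cpu"):
    img_enc, txt_enc = ImageEncoder().to(device), TextEncoder().to(device)
    classifier = nn.Linear(512, num_classes).to(device)  # shared label predictor
    params = (list(img_enc.parameters()) + list(txt_enc.parameters())
              + list(classifier.parameters()))
    opt = torch.optim.Adam(params, lr=1e-4)
    for _ in range(epochs):  # "until convergence" approximated by a fixed budget
        for img_feat, txt_feat, labels in loader:
            img_feat, txt_feat = img_feat.to(device), txt_feat.to(device)
            labels = labels.to(device)
            v, t = img_enc(img_feat), txt_enc(txt_feat)
            # Pair term: matched image-text pairs should align in the shared space.
            logits = v @ t.T / 0.07
            targets = torch.arange(v.size(0), device=device)
            pair_loss = (F.cross_entropy(logits, targets)
                         + F.cross_entropy(logits.T, targets)) / 2
            # Label term: both modalities should predict the shared class label.
            cls_loss = (F.cross_entropy(classifier(v), labels)
                        + F.cross_entropy(classifier(t), labels))
            loss = pair_loss + cls_loss
            opt.zero_grad()
            loss.backward()
            opt.step()
    return img_enc, txt_enc
```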
S2, using the network model M obtained by training in S1, respectively extracting feature vectors of the image/text to be queried and of each text/image in the candidate library, calculating the similarity between the image/text to be queried and each text/image in the candidate library, sorting the candidates in descending order of similarity, and returning the retrieval results with the highest similarity.
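Step S2 then amounts to nearest-neighbour search in the learned embedding space. The sketch below assumes cosine similarity over the L2-normalized embeddings produced by model M; the patent specifies only similarity computation followed by a descending sort, so the exact metric is an assumption.

```python
# Illustrative sketch of step S2: rank candidates by similarity to the query.
import torch

@torch.no_grad()
def retrieve(query_feat, query_encoder, candidate_feats, candidate_encoder, top_k=10):
    """Encode a query with model M, score every candidate, and return indices
    sorted by descending similarity (highest-similarity results first)."""
    q = query_encoder(query_feat.unsqueeze(0))   # (1, d), L2-normalized
    c = candidate_encoder(candidate_feats)       # (N, d), L2-normalized
    sims = (q @ c.T).squeeze(0)                  # cosine similarity of unit vectors
    order = torch.argsort(sims, descending=True) # descending sort by similarity
    return order[:top_k], sims[order[:top_k]]

# Usage: image query against a text candidate library (roles swap for text-to-image):
# top_idx, top_sims = retrieve(img_feat, img_enc, all_txt_feats, txt_enc)
```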
For modal relation learning, an inter-modal and intra-modal dual fusion mechanism is established: multi-scale features are fused within each modality, complementary relation learning is performed directly on the fused features using the label relation information between the modalities, and in addition an inter-modal attention mechanism is added for joint feature embedding, so that multi-scale feature fusion is realized and cross-modal retrieval performance is further improved.
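One plausible instantiation of the dual fusion mechanism is sketched below: concatenation-based multi-scale fusion within a modality, and scaled dot-product attention between modalities. Both operator choices, and the sharing of attention weights across the two directions, are assumptions; the patent does not commit to these exact operators.

```python
# Illustrative sketch of the dual fusion mechanism: multi-scale feature fusion
# within a modality, then inter-modal attention for joint embedding.
# Concatenation-based fusion and dot-product attention are assumptions here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Intra-modal fusion: project features from several scales to a common
    width, concatenate them, and fuse with a linear layer."""
    def __init__(self, scale_dims, embed_dim=512):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, embed_dim) for d in scale_dims)
        self.fuse = nn.Linear(embed_dim * len(scale_dims), embed_dim)
    def forward(self, feats):  # feats: list of (B, d_i) tensors, one per scale
        z = torch.cat([F.relu(p(f)) for p, f in zip(self.proj, feats)], dim=-1)
        return self.fuse(z)

class InterModalAttention(nn.Module):
    """Inter-modal attention: each modality attends over the other before the
    two representations are jointly embedded."""
    def __init__(self, embed_dim=512, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
    def forward(self, x, y):  # x, y: (B, L, embed_dim) sequences, one per modality
        x2y, _ = self.attn(query=x, key=y, value=y)  # image attends to text
        y2x, _ = self.attn(query=y, key=x, value=x)  # text attends to image
        return x + x2y, y + y2x                      # residual joint embedding
```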