The invention discloses an image-text cross-modal retrieval method based on category information alignment, which aims to preserve the discrimination between instances (images and texts) of different semantic categories while eliminating the heterogeneity gap between modalities. To this end, category information is introduced into a common representation space, namely the image-text common space, to minimize a discrimination loss, and a cross-modal loss is introduced to align the information of the different modalities. In addition, a category information embedding method is adopted to generate fake features, in place of the label information used by other DNN-based methods; at the same time, a modality invariance loss is minimized in the category common space to learn modality-invariant features. Guided by this learning strategy, the pairwise similarity semantic information of coupled image-text items is exploited as fully as possible, ensuring that the learned representations are both discriminative in semantic structure and invariant across modalities.
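
The abstract names three objectives (a discrimination loss, a cross-modal loss, and a modality invariance loss) but gives no formulas. The following is a minimal sketch of how such terms could be combined, not the claimed method. The functional forms are assumptions made for illustration: softmax cross-entropy over dot products with per-category embeddings for discrimination, squared distance between paired image and text features for cross-modal alignment, and squared distance to the category embedding for modality invariance. The names `combined_loss`, `class_embeds`, `alpha`, and `beta` are hypothetical.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy of softmax(logits) against one class index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def combined_loss(img_feats, txt_feats, labels, class_embeds, alpha=1.0, beta=1.0):
    """Illustrative combination of the three loss terms (assumed forms).

    img_feats, txt_feats: paired feature vectors already mapped into the common space.
    labels: category index of each image-text pair.
    class_embeds: one embedding vector per category (assumed stand-in for the
        embedded category information).
    alpha, beta: hypothetical weights for the cross-modal and invariance terms.
    """
    n = len(labels)
    disc = cross = inv = 0.0
    for x, t, y in zip(img_feats, txt_feats, labels):
        for f in (x, t):
            # Discrimination: score each feature against every category embedding.
            logits = [sum(a * b for a, b in zip(f, c)) for c in class_embeds]
            disc += softmax_cross_entropy(logits, y)
            # Modality invariance: pull the feature toward its category embedding.
            inv += squared_distance(f, class_embeds[y])
        # Cross-modal alignment: pull the paired image and text features together.
        cross += squared_distance(x, t)
    disc, cross, inv = disc / (2 * n), cross / n, inv / (2 * n)
    return {
        "discrimination": disc,
        "cross_modal": cross,
        "invariance": inv,
        "total": disc + alpha * cross + beta * inv,
    }


if __name__ == "__main__":
    class_embeds = [[1.0, 0.0], [0.0, 1.0]]
    losses = combined_loss(
        img_feats=[[0.9, 0.1], [0.2, 0.8]],
        txt_feats=[[0.8, 0.0], [0.1, 1.0]],
        labels=[0, 1],
        class_embeds=class_embeds,
    )
    print(losses)
```

In this sketch the cross-modal and invariance terms vanish when paired image and text features coincide with their category embedding, while the discrimination term stays positive and shrinks as features separate along category directions. In an actual system the features would come from learned image and text encoders, and the loss would be minimized by gradient descent.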