The invention provides a text-based visual question-answering
system and method based on concept interaction and associated
semantics. The system comprises an object position extraction module, a first fully connected layer, a text
information extraction module, a second fully connected layer, an OCR-object graph convolutional network, a multi-gate-step mechanism graph convolutional network, a Transformer network, and a Bidirectional Encoder Representations from Transformers (BERT) encoder.
According to the invention, the positional relationships between objects and text in an image are modeled first. The OCR-object graph convolutional network then models the text information and object information jointly, and a gate mechanism learns rich, direction-aware features for relationship encoding. Finally, a Transformer network attends precisely to the objects and text in the image, yielding a more accurate answer.
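The abstract does not specify how the positional relationships are encoded or how the gate is computed. The following is a minimal sketch under assumed choices, not the patented design. It uses an 11-way directed spatial relation vocabulary between object and OCR bounding boxes (contains, inside, overlaps, plus eight direction sectors) and a single gated, relation-specific graph convolution step. The names `spatial_relation`, `build_ocr_object_graph`, and `GatedRelationalGCN`, the IoU threshold of 0.5, and the random weights are all hypothetical.

```python
import math
import numpy as np

# Hypothetical relation vocabulary: 0 = a contains b, 1 = a inside b,
# 2 = heavy overlap, 3..10 = eight direction sectors of b's centre seen from a's centre.
NUM_RELATIONS = 11


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def contains(a, b):
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]


def spatial_relation(a, b):
    """Directed relation label from box a to box b (assumed encoding)."""
    if contains(a, b):
        return 0
    if contains(b, a):
        return 1
    if iou(a, b) > 0.5:  # hypothetical threshold
        return 2
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    angle = math.atan2(by - ay, bx - ax) % (2 * math.pi)
    return 3 + int(angle // (math.pi / 4)) % 8


def build_ocr_object_graph(obj_boxes, ocr_boxes):
    """Directed (src, dst, rel) edges; objects are nodes 0..n_obj-1, OCR tokens follow."""
    n_obj = len(obj_boxes)
    edges = []
    for i, ob in enumerate(obj_boxes):
        for j, tb in enumerate(ocr_boxes):
            t = n_obj + j
            edges.append((i, t, spatial_relation(ob, tb)))
            edges.append((t, i, spatial_relation(tb, ob)))
    return edges


class GatedRelationalGCN:
    """One message-passing step: relation-specific weights and a sigmoid gate per edge."""

    def __init__(self, dim, num_relations=NUM_RELATIONS, seed=0):
        rng = np.random.default_rng(seed)
        self.W_rel = rng.normal(0.0, 0.1, size=(num_relations, dim, dim))
        self.w_gate = rng.normal(0.0, 0.1, size=(2 * dim,))

    def __call__(self, h, edges):
        msgs = np.zeros_like(h)
        deg = np.zeros(h.shape[0])
        for src, dst, rel in edges:
            # The gate decides how much of the neighbour's message passes through.
            score = np.concatenate([h[dst], h[src]]) @ self.w_gate
            gate = 1.0 / (1.0 + np.exp(-score))
            msgs[dst] += gate * (self.W_rel[rel] @ h[src])
            deg[dst] += 1
        deg[deg == 0] = 1.0
        # Residual connection, degree-normalised messages, then ReLU.
        return np.maximum(h + msgs / deg[:, None], 0.0)


obj_boxes = [(0.1, 0.1, 0.5, 0.6)]      # one detected object
ocr_boxes = [(0.2, 0.3, 0.4, 0.4),      # text lying on the object
             (0.7, 0.1, 0.9, 0.2)]      # text to the object's upper right
edges = build_ocr_object_graph(obj_boxes, ocr_boxes)
h = np.random.default_rng(1).normal(size=(3, 8))
h_new = GatedRelationalGCN(dim=8)(h, edges)
print(edges)
print(h_new.shape)
```

Because each edge carries its own label in each direction (for example "contains" from object to text but "inside" from text to object), the relation-specific weights make the learned features direction-aware. This is one plausible reading of the abstract's "directional features", not a confirmed detail of the invention.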