67 results about "Semantic alignment" patented technology

Computer-assisted computing method of semantic distance between short texts

A computer-assisted method for computing the semantic distance between short texts, belonging to the technical field of Chinese text information processing, defines the semantic distance between two short texts as the sum of a syntactic structure distance and a unit semantic distance. Web page markup removal, variant short-text handling, and word segmentation are applied to the texts to obtain a series of word strings. Semantic alignment is then performed on the corresponding word strings of the two short texts according to a word similarity matrix, and the syntactic structure distance is obtained from the number of word adjustments made during this process. Using the five-level structure of words in the "Extended Synonym Thesaurus", together with Chinese keywords and near-synonym concepts, 5 kinds of operations, including insertion, deletion, and replacement, are performed on the words on the basis of word-level semantic alignment, and the weighted sum of these operations represents the unit semantic distance between the word strings. The resulting semantic distance between texts is more accurate than that of the classical edit distance algorithm.
Owner:BEIJING UNIV OF TECH
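
The abstract above describes the unit semantic distance as a weighted sum of word-level edit operations. A minimal Python sketch of that idea follows; it is not the patented method. The function names, the cost values, and the toy synonym table are assumptions made for illustration, and the syntactic structure distance is simply passed in as a number because the abstract does not say how word adjustments are counted.

```python
from typing import Callable, Sequence

def weighted_edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Callable[[str, str], float],
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Word-level edit distance with per-operation weights (dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + del_cost,                           # delete a[i-1]
                dp[i][j - 1] + ins_cost,                           # insert b[j-1]
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),   # replace or match
            )
    return dp[m][n]

# Hypothetical stand-in for a thesaurus lookup: synonyms are cheap to replace.
TOY_SYNONYMS = {frozenset(("buy", "purchase"))}

def toy_sub_cost(x: str, y: str) -> float:
    if x == y:
        return 0.0
    return 0.3 if frozenset((x, y)) in TOY_SYNONYMS else 1.0

def semantic_distance(a: Sequence[str], b: Sequence[str], structure_distance: float) -> float:
    """Sum of a given syntactic structure distance and the unit semantic distance."""
    return structure_distance + weighted_edit_distance(a, b, toy_sub_cost)
```

With these toy costs, replacing "buy" with "purchase" costs less than replacing it with an unrelated word, which is the behavior a plain edit distance cannot express.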

Discrete supervision cross-modal hashing retrieval method based on semantic alignment

The invention discloses a DSAH (discrete semantic alignment hashing) method based on semantic alignment for cross-modal retrieval. In the training process, the heterogeneous gap is reduced with the aid of image attributes and modal alignment semantic information. To reduce memory overhead and training time, a latent semantic space is learned by collaborative filtering, and the internal relation between a hash code and a label is built directly. Finally, to decrease quantization errors, a discrete optimization method is proposed to obtain a better-performing hash function. In the retrieval process, samples in the testing set are mapped to a binary space by the hash function, the Hamming distance between the binary code of a query sample and each heterogeneous sample to be retrieved is calculated, and the top-ranked samples are returned in ascending order of distance. Experimental results on two representative multi-modal data sets demonstrate the superior performance of DSAH.
Owner:LUDONG UNIVERSITY
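
The retrieval step in the abstract above (rank candidates by Hamming distance to the query code, smallest first) can be sketched in a few lines of Python. This covers only the ranking step, not the hash function learning; storing codes as integers and the function names are assumptions for illustration.

```python
from typing import List, Sequence

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as integers."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query: int, database: Sequence[int], top_k: int) -> List[int]:
    """Return indices of the top_k database codes, nearest first (ties by index)."""
    order = sorted(range(len(database)), key=lambda i: (hamming(query, database[i]), i))
    return order[:top_k]

# Example: a 4-bit query code against four stored codes.
codes = [0b1010, 0b0101, 0b1000, 0b1111]
nearest = rank_by_hamming(0b1010, codes, top_k=3)
```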

Image-text mutual retrieval method based on complementary semantic alignment and symmetric retrieval

The invention belongs to the technical field of computer vision and natural language processing, and discloses an image-text mutual retrieval method based on complementary semantic alignment and symmetric retrieval, comprising: using convolutional neural networks to extract deep visual features of images, with an object-based convolutional neural network and a scene-based convolutional neural network used together so that the visual features contain complementary semantic information about both the object and the scene; encoding the text with a long short-term memory (LSTM) network and extracting the corresponding semantic features; mapping the visual features and text features into the same cross-modal embedding space by using two mapping matrices; retrieving an initial list in the cross-modal embedding space using the k-nearest neighbor method; and re-ranking the initial retrieval list using the neighborhood relation of symmetric bidirectional retrieval based on the mutual nearest neighbor method to obtain the final ranked retrieval list. The invention has the advantage of high accuracy.
Owner:XIDIAN UNIV
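
One simple reading of the re-ranking step in the abstract above is sketched below in Python: retrieve the k nearest texts for a query image, then promote those texts whose own nearest images include the query. The abstract does not give the actual re-ranking rule, so the promotion policy, the cosine similarity, and all names (`knn`, `mutual_rerank`, `k_back`) are assumptions for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def cosine(u: Vector, v: Vector) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn(query: Vector, items: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k items most similar to the query (ties by index)."""
    return sorted(range(len(items)), key=lambda i: (-cosine(query, items[i]), i))[:k]

def mutual_rerank(q_idx: int, images: Sequence[Vector], texts: Sequence[Vector],
                  k: int, k_back: int) -> List[int]:
    """Image-to-text retrieval with mutual-nearest-neighbor promotion.

    Texts that also retrieve the query image within their own top k_back
    images are moved ahead of the remaining candidates.
    """
    initial = knn(images[q_idx], texts, k)
    mutual = [t for t in initial if q_idx in knn(texts[t], images, k_back)]
    rest = [t for t in initial if t not in mutual]
    return mutual + rest

# Toy embeddings assumed to live in a shared 2-D space.
images = [[1.0, 0.0], [1.0, 0.2]]
texts = [[1.0, 0.18], [1.0, -0.6]]
reranked = mutual_rerank(0, images, texts, k=2, k_back=1)
```

In this toy example text 0 is closest to the query image, but its own nearest image is image 1, so text 1, which does point back to the query, is ranked first after re-ranking.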

Attention mechanism-based multi-modal emotion feature learning and recognition method

The invention relates to an attention mechanism-based multi-modal emotion feature learning and recognition method, and the method comprises the steps: carrying out the feature extraction of an audio and text sample, and obtaining an FBank acoustic feature and a word vector feature; taking the obtained features as original input features of an audio emotion feature encoder and a text emotion feature encoder respectively, and extracting emotion semantic features of different modes through the encoders; performing audio attention, modal jump attention and text attention learning on the obtained emotion semantic features respectively, and extracting four complementary emotion features including an audio feature with remarkable emotion, an audio feature with semantic alignment, a text feature with semantic alignment and a text feature with remarkable emotion; and fusing the four features and then classifying to obtain corresponding emotion categories. According to the invention, the problem of low emotion recognition rate caused by intra-modal emotion irrelevant factors and inter-modal emotion semantics inconsistency in traditional multi-modal emotion recognition is solved, and the multi-modal emotion recognition accuracy can be effectively improved.
Owner:JIANGSU UNIV

Cross-modal retrieval method based on multilayer semantic alignment

The invention discloses a cross-modal retrieval method based on multilayer semantic alignment, which comprises the following steps: acquiring salient fine-grained regions by utilizing a self-attention mechanism, promoting the alignment of entities and relationships among modal data, providing an image-text matching strategy based on semantic consistency, extracting semantic tags from a given text data set, and performing global semantic constraint through multi-label prediction, so that more accurate cross-modal association is obtained. The semantic gap between cross-modal data is thereby addressed.
Owner:BEIFANG UNIV OF NATITIES

Multi-modal emotion recognition method based on attention enhancing mechanism

The invention belongs to the technical field of affective computing and relates to a multi-modal emotion recognition method based on an attention enhancement mechanism. The method comprises steps of obtaining a voice coding matrix through a multi-head attention mechanism, and obtaining a text coding matrix through a pre-trained BERT model; taking the dot product of the coding matrices of the voice and the text to obtain alignment matrices of the voice and the text, and calibrating the alignment matrices with the original modal coding information to obtain more local interaction information; splicing the coding information, the semantic alignment matrix and the interaction information of each mode as features to obtain a feature matrix of each mode; aggregating the voice feature matrix and the text feature matrix by using a multi-head attention mechanism; converting the aggregated feature matrix into a vector representation through an attention mechanism; and splicing the vector representations of the voice and the text, and obtaining a final emotion classification result by using a fully connected network. According to the method, the problem of multi-modal interaction is addressed, and the accuracy of multi-modal emotion recognition is improved.
Owner:HANGZHOU DIANZI UNIV
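
The alignment step in the abstract above (multiplying the voice and text coding matrices to get an alignment matrix) can be illustrated with a short Python sketch. The row-wise softmax and the weighted sum in `attend` are a common way to use such a matrix and are an assumption here; the abstract does not specify how the calibration is done. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]

def alignment_matrix(speech: Matrix, text: Matrix) -> List[List[float]]:
    """Entry [i][j] is the dot product of speech frame i and text token j."""
    return [[sum(s * t for s, t in zip(s_row, t_row)) for t_row in text]
            for s_row in speech]

def softmax(row: Sequence[float]) -> List[float]:
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(speech: Matrix, text: Matrix) -> List[List[float]]:
    """For each speech frame, a softmax-weighted average of the text encodings."""
    weights = [softmax(row) for row in alignment_matrix(speech, text)]
    dim = len(text[0])
    return [[sum(w * t_row[d] for w, t_row in zip(w_row, text)) for d in range(dim)]
            for w_row in weights]

# Two speech frames and three text tokens, each encoded in 2 dimensions.
speech_enc = [[1.0, 0.0], [0.0, 1.0]]
text_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = alignment_matrix(speech_enc, text_enc)
aligned_text = attend(speech_enc, text_enc)
```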

Knowledge graph question and answer training and application service system with automatically generated template

The invention discloses a knowledge graph question and answer training system with an automatic template generation function, and the system comprises a predicate dictionary and category dictionary construction module which is used for constructing a predicate dictionary and a category dictionary in a remote supervision mode; a backbone query generation module which is used for obtaining sub-graphs of the theme entity and the answer entity of each training question and answer pair in the knowledge graph, and using variables to replace answer nodes in the sub-graphs to form backbone queries; a semantic alignment module which is used for aligning question phrases with backbone query semantic elements by using dependency syntactic analysis and a shaping linear alignment technology; a template generalization module which is used for storing the dependency syntax tree, the backbone query and the corresponding relationship into a template library as templates; and a sorting model training module which is used for performing classified learning on every two matching templates by using a machine learning binary classifier according to the matching degree to obtain a question template sorting model, so that the problems of high labor cost and low question coverage rate in the prior art are solved.
Owner:来康生命科技有限公司

Small sample picture classification model and method based on semantic auxiliary attention mechanism

The invention discloses a small sample picture classification model and method based on a semantic auxiliary attention mechanism, and belongs to the field of small sample picture classification in computer vision. The system comprises a convolutional neural network, an extension model for zero sample picture classification, a spatial attention module and a semantic alignment module. The method comprises the following steps: selecting a training data set; constructing a network structure of the small sample picture classification model based on the semantic aided attention mechanism; preprocessing the training data, dividing the training data into a training set, a verification set and a test set, and subdividing each sub-data set into data packets including a support set and a test set; training a small sample picture classification model; and verifying the small sample picture classification model. According to the method, an attention mechanism and a multi-module learning principle are combined, the method is divided into two sub-modules, namely a spatial attention module and a semantic alignment module, the method can focus on a local area, and better small sample picture classification can be realized.
Owner:CHENGDU KOALA URAN TECH CO LTD

Cross-language text representation method and device

The invention provides a cross-language text representation method and device, and the method comprises the steps: obtaining a first training text and a first cross-language representation model corresponding to a first language, the first cross-language representation model comprising a first universal vector sub-model and a text representation sub-model; obtaining a second training text of a target language corresponding to the to-be-processed text; training the first universal vector sub-model according to the first training text and the second training text to obtain a second universal vector sub-model; and obtaining a second cross-language representation model of the target language according to the second universal vector sub-model and the text representation sub-model. Therefore, the universal vectors among different languages are mined based on semantic alignment processing, and cross-language text processing is performed based on the universal vectors, so that the representation effect of the cross-language processing model is ensured. This addresses the technical problem in the prior art that a cross-language processing model has difficulty crossing the barriers between different languages, which results in a poor representation effect.
Owner:BAIDU ONLINE NETWORK TECH (BEIJIBG) CO LTD

Training method and device for multi-language semantic representation model, equipment and storage medium

The invention discloses a training method and device for a multi-language semantic representation model, electronic equipment and a storage medium, and relates to the field of natural language processing based on artificial intelligence. According to the specific implementation scheme, the method comprises the steps of: adopting a plurality of training corpora containing multiple languages to train a multi-language semantic representation model, so that the multi-language semantic representation model learns the semantic representation capacity of the various languages; for each training corpus in the plurality of training corpora, generating a corresponding hybrid language corpus, the hybrid language corpus comprising corpora of at least two languages; and training the multi-language semantic representation model by adopting each hybrid language corpus and the corresponding training corpus, so that the multi-language semantic representation model learns semantic alignment information of different languages. According to the technical scheme, the multi-language semantic representation model can learn semantic alignment information between different languages, semantic interaction between different languages can be achieved on the basis of the multi-language semantic representation model, and practicability is very high.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Visual question-answering method and system based on semantic alignment and storage medium

The invention provides a visual question-answering method and system based on semantic alignment and a storage medium, and relates to the technical field of visual question-answering. According to the embodiment of the invention, the method comprises the steps: firstly obtaining and preprocessing a data set, extracting original image features and target position features according to an original image, generating an image description statement according to the target position features, obtaining an image description word, question features and image description statement features, and carrying out the semantic alignment of the original image features and the image description word; and obtaining a first image feature, obtaining a second image feature according to the original image feature and the image description statement feature, obtaining a third image feature according to the original image feature and the question feature, fusing the three image features, the image description statement feature and the question feature to obtain a comprehensive feature, and predicting a final answer result. Therefore, the importance of the image information is highlighted, the information involved in the feature fusion process is perfected, and the finally generated answer result is more accurate.
Owner:HEFEI UNIV OF TECH

Semantic enhanced hash method for zero-sample image retrieval

A semantic enhanced hash method for zero-sample image retrieval belongs to the technical field of computers and comprises the following steps: 1) performing image feature semantic alignment; 2) maintaining a domain structure; 3) hash code learning; 4) constructing and optimizing a total objective function; and 5) hash function learning for new data. The method mainly aims at solving the problem of large-scale image retrieval, and due to the fact that large-scale image data is generated from the Internet, for some newly-generated affairs and new categories, it is difficult for an existing algorithm to collect enough training pictures of new affairs to train a retrieval model. Therefore, the category semantic space is used as the middle transition space between the image visual features and the binary codes, alignment of the visual space and the category semantic space is achieved, and the purpose of migrating knowledge from visible data to invisible data is achieved. Experimental verification shows that knowledge can be effectively learned from visible data and migrated to an invisible class, and the problem of zero-sample image retrieval is solved.
Owner:DALIAN UNIV OF TECH

Dynamic knowledge graph representation learning method and system based on anchor points

The invention provides a dynamic knowledge graph representation learning method and system based on anchor points, and the method comprises the steps: firstly finding key entities which play a role in supporting global information in an existing knowledge graph, and building a base coordinate system through the vectors of the entities; secondly, performing semantic alignment, including entity alignment and relationship fusion, on the newly added knowledge and the existing knowledge graph; finally, carrying out representation learning under a base coordinate system, so that only newly-added knowledge and related local knowledge of an existing knowledge graph need to be combined for training, a new knowledge entity is placed at a proper position in a knowledge space, and self-adaptive growth of the dynamic knowledge graph is achieved. The method has the beneficial effects that text information of entities and relationships is used as a semantic basis, and an information basis of knowledge fusion is provided, so that entity alignment and relationship fusion are more comprehensive and sufficient; a word2vec vector generation model is utilized to convert text information of entities and relations into a vector form, so that the text information is used for mathematical operation.
Owner:CHINA UNIV OF GEOSCIENCES (WUHAN)

News event searching method and system based on multistage image-text semantic alignment model

The invention provides a multi-level vision-text semantic alignment model, MSAVT, for image-text matching, together with a news event retrieval method based on this model, which achieves cross-modal image-text search for news events and meets current news retrieval requirements. The image-text alignment precision of the cross-modal retrieval model provided by the invention is higher, and when the model is applied to cross-modal image-text retrieval of news events, indexes such as recall rates at multiple levels and average accuracy are remarkably improved. Meanwhile, a pre-trained BERT model is introduced to extract text features, so that the generalization performance of the algorithm is improved. The model adopts a public space feature learning method, so vector representations of images and texts can be obtained independently; that is, vector representations of retrieval results can be stored in advance, retrieval time is short, and the method can be applied to actual scenes.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Visual dialogue generation system based on semantic alignment

The invention relates to a visual dialogue generation system based on semantic alignment. According to the invention, image information is extracted from two aspects: global image information and local image information. A semantics-based global image representation is obtained through semantic alignment, while a local dense image description is obtained through dense captioning, and the high-level semantics of the text representation help with better information acquisition. The two jointly provide clues of image information for generating replies. Meanwhile, comprehensive constraint is carried out from the aspects of text fluency, text coherence and correctness, and generation of replies is guided. In addition, the embodiment of the invention provides a keyword constraint method to constrain the correctness of the reply, so as to enrich the representation form of the generated reply.
Owner:HEFEI UNIV OF TECH

Fine-grained image weak supervision target positioning method based on deep learning

The invention relates to a fine-grained image weak supervision target positioning method based on deep learning. The fine-grained image weak supervision target positioning method is used for solving the problem that only weak supervision language description information easy to collect is used for recognizing and positioning a fine-grained image. According to the fine-grained image weak supervision target positioning method, inter-modal fine-grained semantic alignment is directly carried out on the pixel level of the image and the word described by the language; the image is input into a convolutional neural network to extract a feature vector, and the language description is encoded to extract the feature vector of the language description; and feature matching is performed on the convolution feature map and the language description feature vector, and the feature matching map is processed to obtain a saliency map of the target to obtain a final positioning result according to the feature matching map. According to the fine-grained image weak supervision target positioning method, weak supervision target positioning of the fine-grained image is realized under the condition of not needing a strongly supervised annotated bounding box.
Owner:BEIJING UNIV OF TECH

Employing abstract meaning representation to lay the last mile towards reading comprehension

An autonomous agent creates a first semantic tree from a question and second semantic tree from a candidate answer. The agent identifies, between the first semantic tree and the second semantic tree, common subtrees and calculates a semantic alignment score from a sum of sizes of each of the common subtrees. The agent forms a first syntactic tree for the question and a second syntactic tree for the candidate answer. The agent identifies a number of common syntactic nodes between the first syntactic tree and the second syntactic tree. The agent calculates a syntactic alignment score based on the number of common syntactic nodes. Responsive to determining that a sum of the semantic alignment score and the syntactic alignment score is greater than a threshold, the agent outputs the candidate answer to a device.
Owner:ORACLE INT CORP
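
The scoring rule in the abstract above (sum of common-subtree sizes plus a count of common syntactic nodes, compared against a threshold) can be sketched in Python. This is a rough illustration under stated assumptions: trees are nested tuples `(label, child, ...)`, only maximal common subtrees are counted so nested matches are not double counted, and syntactic nodes are compared by label. None of these choices come from the source, and the function names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple

Tree = Tuple  # (label, child_tree, child_tree, ...)

def subtrees(tree: Tree) -> Iterator[Tree]:
    """Yield the tree itself and every subtree below it."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def size(tree: Tree) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1:])

def maximal_common_subtrees(a: Tree, b: Tree) -> List[Tree]:
    """Subtrees of `a` that also occur in `b`, without descending into a match."""
    b_set = set(subtrees(b))
    found: List[Tree] = []

    def visit(node: Tree) -> None:
        if node in b_set:
            found.append(node)
        else:
            for child in node[1:]:
                visit(child)

    visit(a)
    return found

def semantic_alignment_score(question: Tree, answer: Tree) -> int:
    return sum(size(t) for t in maximal_common_subtrees(question, answer))

def syntactic_alignment_score(q_labels: Sequence[str], a_labels: Sequence[str]) -> int:
    """Count of syntactic node labels shared by the two syntactic trees."""
    return len(set(q_labels) & set(a_labels))

def accept_answer(question: Tree, answer: Tree, q_labels: Sequence[str],
                  a_labels: Sequence[str], threshold: float) -> bool:
    total = semantic_alignment_score(question, answer) + \
        syntactic_alignment_score(q_labels, a_labels)
    return total > threshold

# Toy semantic trees: the question tree appears inside the answer tree.
q_tree = ("buy", ("person",), ("book",))
a_tree = ("want", ("person",), ("buy", ("person",), ("book",)))
```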

Training method of image search model and image search method

The invention provides a training method of an image search model and an image search method, relates to the field of artificial intelligence, in particular to a computer vision and deep learning technology, and can be applied to an image search scene. According to the specific implementation scheme, a sample text is obtained, wherein the sample text comprises a first language text and a second language text; and a semantic transformation network in the cross-modal image-text retrieval model is trained based on the sample text to obtain a target semantic transformation network, and a final target cross-modal image-text retrieval model is generated based on the target semantic transformation network. Therefore, the cross-modal image-text retrieval method and the cross-modal image-text retrieval system have the advantages that rich and accurate feature representation of the cross-modal image-text retrieval model subjected to large-scale data training can be kept, migration is not lost, meanwhile, only semantic alignment is performed, cross-modal search from any language text to image is realized, and efficiency and reliability in the training process of the image search model are improved.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Model training method and device for image classification and storage medium

The invention discloses a model training method and device for image classification and a storage medium, and is used for solving the technical problem that a model obtained by an existing model training method cannot achieve a better image classification effect. The method comprises the steps of obtaining a visual feature vector of a sample picture; extracting a shallow semantic feature and a deep semantic feature in the visual feature vector based on a preset algorithm, and integrating the shallow semantic feature and the deep semantic feature to obtain a joint semantic feature; performing semantic space alignment on the joint semantic features to obtain a semantic alignment loss function; reconstructing the visual features, and determining an auto-encoder loss function according to the reconstructed visual features; and determining a target function training neural network model based on the semantic alignment loss function, the auto-encoder loss function and a preset parameter regular term. According to the method, the discrimination of the semantic embedding space is improved, and the domain bias problem of the zero sample learning model is relieved.
Owner:云鹏智汇(深圳)科技有限公司 +1

Language conversion method and device, electronic equipment and storage medium

The invention provides a language conversion method and device, electronic equipment and a storage medium, and belongs to the technical field of artificial intelligence. The method comprises the steps of obtaining an original question text; the original question text is a natural language text; carrying out keyword extraction on a pre-obtained database table to obtain a header keyword; performing field extraction on the original problem text according to the header keyword to obtain a problem index field; performing semantic alignment processing on the header keyword and the problem index field to obtain a problem feature sequence, and splitting the problem feature sequence into candidate problem column features and candidate query condition features; performing classification processing on the candidate problem column features through a classification function to obtain target label column features; screening the candidate query condition features to obtain target query features; and splicing the target tag column feature and the target query feature to obtain a target text, the target text comprising an SQL statement. According to the method, the accuracy of generating the SQL statement can be improved.
Owner:PING AN TECH (SHENZHEN) CO LTD

Human face key point detection method, system and device based on semantic alignment

The invention belongs to the field of face recognition and particularly relates to a face key point detection method, system and device based on semantic alignment. The objective of the invention is to improve the accuracy of face key point detection. The method comprises the following steps of: obtaining a basically converged face key point detection network by a traditional method; adopting the constructed training sample comprising the face image sample marked with the key points and the standard Gaussian response graph taking the positions of the key points as the center, and using the probability model containing the hidden variables as the target of maximum likelihood estimation to optimize the face key point detection network; and predicting the coordinates of the face key points through the finally optimized face key point detection network. According to the method, the problem of training oscillation caused by labeling randomness is effectively solved in the network training process, and the accuracy of face key point detection is improved.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

Image matching method based on deep semantic alignment network model

The invention discloses an image matching method based on a deep semantic alignment network model, and the method comprises the steps of gradually estimating the alignment between two semantic similar images through building an object position perception semantic alignment network model OLASA; using a triple sampling strategy to train a network model OLASA, and estimating translation, affine transformation and spline transformation respectively through three sub-networks Ntran, Naffi and Nttps of potential object co-location POCL, affine transformation ATR and bidirectional thin-plate spline regression TTPS; and obtaining an image matching result by establishing and optimizing an alignment relationship between the images in a layering manner. By using the technical scheme provided by the invention, the image alignment effect with relatively large position difference can be improved, and the image matching accuracy is improved. The invention can be applied to target tracking, semantic segmentation, multi-view three-dimensional reconstruction and the like in the field of computer vision.
Owner:PEKING UNIV

Statement similarity judgment method and judgment system

The invention discloses a statement similarity judgment method and judgment system, and relates to the technical field of natural language semantic similarity calculation. Improvements are performed on a modeling layer, a multi-semantic embedding layer, a semantic importance calculation layer, a semantic alignment layer and an output layer, a multi-granularity level similarity matrix is calculated by utilizing the multi-semantic matrix, and true semantic alignment of the two sentences is discovered according to the matrix. The fact that different semantics have different importance is considered, and semantic importance calculation is provided. The proposed model does not need sparse features, WordNet and other external resources, is successfully trained in a short time, and obtains a competitive result on a similarity calculation task. Visual analysis shows that the model has good performance and interpretability.
Owner:CHONGQING UNIV

Video description method based on target space semantic alignment

The invention discloses a video description method based on target space semantic alignment. According to the method, firstly, appearance features and action features are extracted from sampled video frames containing text description, and the appearance features and the action features are spliced and then input into a temporal Gaussian mixture dilated convolution encoder to obtain temporal Gaussian features; a decoder is constructed from two layers of long short-term memory (LSTM) networks to obtain the probability distribution and hidden vectors of generated statements; a semantic reconstruction network is established and the semantic reconstruction loss is calculated; and the model is optimized by using a stochastic gradient descent algorithm, after which the steps are carried out in sequence on a new video to obtain the statement generation probability distribution, and a video description statement is obtained by using a greedy search algorithm. According to the method, the long-term temporal relation of the video is modeled by using the temporal Gaussian mixture dilated convolution, and the statement-level probability distribution difference is obtained through the semantic reconstruction network, so that the semantic gap between the generated statement and the video content can be reduced, and a natural statement that more accurately describes the video content is generated.
Owner:HANGZHOU DIANZI UNIV

Three-dimensional face recognition method based on semantic alignment multi-region template fusion

The invention provides a three-dimensional face recognition method based on semantic alignment multi-region template fusion, which comprises the following steps of: 1, determining a data set of registered faces and test faces in a three-dimensional face database; 2, preprocessing all registered and to-be-identified three-dimensional face models, and performing dense alignment on the preprocessed three-dimensional face models and the reference model; 3, pre-dividing the face region into a plurality of template regions which do not contain expression influence and can be overlapped; 4, for each template area, directly calculating a similarity value between the template areas on the three-dimensional structure of the human face; and 5, independently voting each region according to the similarity value, synthesizing a plurality of region matching results, and determining a final matching result by adopting a majority voting mode. According to the face recognition method provided by the invention, similarity prediction is carried out by utilizing mutual independence of the multi-template areas, the dependence of an algorithm on accurate division of a single area is reduced, and meanwhile, a multi-area template common voting strategy is adopted, so that certain robustness is also achieved on expressions and other factors influenced by the areas.
Owner:JIAXING UNIV
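Step 5 of the abstract above, independent per-region voting followed by a majority decision, can be sketched as follows. This is only an illustration under stated assumptions: the function name `match_by_region_voting`, the region names, and the similarity values are hypothetical, and ties are resolved in favor of the identity that received its first vote earliest, which the abstract does not specify.

```python
from collections import Counter
from typing import Dict


def match_by_region_voting(region_scores: Dict[str, Dict[str, float]]) -> str:
    """Each region votes for its most similar registered face; majority wins.

    region_scores maps region name -> {registered identity: similarity}.
    """
    # One vote per region: the identity with the highest similarity there.
    votes = [max(scores, key=scores.get) for scores in region_scores.values()]
    # Majority vote across regions (assumed tie rule: first voted wins).
    return Counter(votes).most_common(1)[0][0]


# Hypothetical similarities between a test face and two registered faces.
example = {
    "nose": {"A": 0.90, "B": 0.40},
    "forehead": {"A": 0.70, "B": 0.80},  # this region disagrees
    "eyes": {"A": 0.85, "B": 0.60},
}
winner = match_by_region_voting(example)
```

Because each region votes on its own, one region distorted by expression (here the forehead) is outvoted by the others rather than corrupting a single fused score.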

Monitoring video intelligent early warning method based on multimedia semantic analysis

The invention discloses a monitoring video intelligent early warning method based on multimedia semantic analysis. The method builds a cross-modal semantic alignment model to accurately understand complex objects and their interactions in a video, generates a spatio-temporal position map of video clips and a video semantic tree, and introduces a text encoding module based on a bidirectional long short-term memory network to understand and represent the text semantics of the query sentence in depth. The invention maps multi-modal features into a common space and fuses them there; video clip-query sentence pairs are first screened at coarse granularity through a semantic pruning strategy, and fine-grained semantic matching is then computed on the refined pairs, which ensures both the precision and the efficiency of cross-modal video localization.
Owner:SHANDONG ARTIFICIAL INTELLIGENCE INST +3
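The coarse screening followed by fine-grained matching described in the abstract above can be sketched in a few lines. The sketch is not the patented model: it assumes that every clip and query already has embeddings in a common space, and the names `cosine` and `coarse_to_fine_rank`, the `keep` parameter, and the scoring rules (clip-level cosine for pruning, mean best-frame cosine per query word for fine matching) are illustrative choices.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def coarse_to_fine_rank(
    query_global: Vector,
    query_words: List[Vector],
    clips: Dict[str, Tuple[Vector, List[Vector]]],
    keep: int = 2,
) -> List[str]:
    """Prune clips with a cheap clip-level score, then rank survivors finely.

    clips maps clip name -> (clip-level vector, list of frame-level vectors).
    """
    # Coarse stage: one cosine per clip, keep only the top `keep` clips.
    by_coarse = sorted(
        clips, key=lambda n: cosine(query_global, clips[n][0]), reverse=True
    )
    survivors = by_coarse[:keep]

    # Fine stage: each query word is matched to its best frame in the clip.
    def fine(name: str) -> float:
        frames = clips[name][1]
        best = [max(cosine(w, f) for f in frames) for w in query_words]
        return sum(best) / len(best)

    return sorted(survivors, key=fine, reverse=True)


# Hypothetical 2-D embeddings for three clips and one two-word query.
clips = {
    "a": ([1.0, 0.1], [[1.0, 0.0]]),
    "b": ([0.9, 0.3], [[1.0, 0.0], [0.0, 1.0]]),
    "c": ([0.0, 1.0], [[0.0, 1.0], [1.0, 0.0]]),
}
ranking = coarse_to_fine_rank([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], clips)
```

In this toy data clip "c" is pruned at the coarse stage and never reaches the expensive fine matching, which is the efficiency gain; the cost is that a clip with a poor coarse score cannot be recovered later.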

Image semantic feature matching method based on geometric consistency

The invention discloses an image semantic feature matching method based on geometric consistency. The method comprises the steps of semantic feature extraction, feature matching initialization, feature matching positioning optimization, and image semantic alignment. Semantic feature extraction uses a convolutional neural network to extract high-level semantic features and construct a five-layer semantic feature pyramid. Feature matching initialization designs a semantic feature matching constraint rule on the top layer of the feature pyramid based on geometric consistency and constructs an energy function. Feature matching positioning optimization improves the positioning precision of feature matching, raising the accuracy of the feature matching pairs layer by layer through a pyramid back propagation algorithm. Finally, the geometric transformation model parameters between the images to be matched are estimated with a local geometric transformation model, and image deformation is performed to realize image semantic alignment. The method improves the precision of semantic feature matching and aligns the geometric pose and orientation of the foreground target.
Owner:BEIHANG UNIV

Image recognition method and device

The invention provides an image recognition method comprising the steps of: S1, obtaining a training image set, training on each training image in the set together with the category index corresponding to that image, and learning to extract a non-semantic visual representation; S2, aligning each training image in the set with the semantic label corresponding to that image, and learning to extract a semantically aligned visual representation; S3, identifying and analyzing the non-semantic visual representation and the semantically aligned visual representation to obtain a visual bias elimination model; and S4, inputting the image to be identified into the visual bias elimination model and identifying it. By establishing the visual bias elimination model, the image recognition method provided by the invention improves perception of samples from both known and unknown domains and thereby achieves accurate, unbiased zero-shot recognition.
Owner:UNIV OF SCI & TECH OF CHINA +1

Image description generation method based on text hierarchical structure

The invention discloses an image description generation method based on a text hierarchical structure. A double-layer LSTM decoder is constructed and a visual and language information selection mechanism is introduced: global image features, word embeddings, and an attention guiding mechanism are used to select effectively between image features and language information, so that the decoded description sentences carry more accurate semantic information. To address the insufficient semantic feature extraction capability of traditional language models, an ordered long short-term memory network improved by a FARIMA filter is introduced in the decoding stage. Semantic information at different text levels is preserved through the hierarchical structure of the encoded sentences, and the content is semantically aligned using image spatial information. This improves the cross-modal representation capability of the decoder when aligning image features with semantic features and strengthens the network's long-term dependence. The descriptions produced by the proposed method have richer semantic relations and conform better to natural language habits.
Owner:HUBEI UNIV OF TECH