Image-based data processing method, apparatus and device, and readable storage medium

An image-based data processing technology, applied in the field of computer vision, that addresses problems such as low accuracy and text processing errors, improving accuracy and avoiding the dispersion of attention.

Active Publication Date: 2019-06-07
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] Current image-based data processing methods must learn the association between the text and the objects in the image from image features and text features. The association learned this way has low accuracy, which leads to text processing errors.

Examples

Embodiment 1

[0034] Figure 1a is a flowchart of an image-based data processing method provided by Embodiment 1 of the present invention. This embodiment applies to processing text by recognizing images. The method can be executed by an image-based data processing apparatus, which may be composed of hardware and/or software and is generally integrated in an electronic device. The method specifically includes the following operations:

[0035] S110. Acquire an image and text to be processed.

[0036] In this embodiment, the image may be a photo, a screenshot, a video frame, and the like. The text to be processed is free-form, open-ended natural language text about the image. It may be text to be understood, for example a statement to be judged true or false, or content to be interpreted; it may also be a natural language question, and the types of questions include but are not limited to fine-grained rec...

Embodiment 2

[0050] This embodiment is further optimized on the basis of the optional implementations of the foregoing embodiments. Optionally, before the operation "fuse the features of the multiple objects into the fusion features of the image according to the matching degree between the text and the features of each of the multiple objects", the following operation is added: "input, in turn, the image corresponding to the bounding box of each object and the text into the matching model, and obtain the matching degree, output by the matching model, between the features of each object and the features of each word in the text; then obtain the matching degree between the text and the features of each object from the matching degrees between the features of that object and the features of each word in the text". Figure 2a is a flowchart of an image-based data processing method provided in Embodiment 2 of the present invention. The method provided in this embodi...
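The two-stage matching described in this embodiment (object-to-word matching degrees first, then one text-to-object matching degree per object) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the excerpt does not specify the matching model, so cosine similarity between feature vectors stands in for it, and it does not say how per-word degrees are combined, so the hypothetical `aggregate` parameter offers max or mean.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed stand-in for the matching model: cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)


def text_object_matching(
    object_features: List[Vector],
    word_features: List[Vector],
    aggregate: str = "max",  # hypothetical choice; the source does not fix it
) -> List[float]:
    """Return one text-to-object matching degree per object.

    Each object is first matched against every word, then the per-word
    degrees are combined into a single degree for that object.
    """
    if not word_features:
        raise ValueError("the text must contain at least one word feature")
    degrees = []
    for obj in object_features:
        per_word = [cosine_similarity(obj, word) for word in word_features]
        if aggregate == "max":
            degrees.append(max(per_word))
        elif aggregate == "mean":
            degrees.append(sum(per_word) / len(per_word))
        else:
            raise ValueError(f"unknown aggregate: {aggregate!r}")
    return degrees
```

With max aggregation, an object scores highly as soon as one word in the text refers to it; mean aggregation instead rewards objects related to the text as a whole. Which behavior the patent intends is not recoverable from this excerpt.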

Embodiment 3

[0077] Figure 3 is a flowchart of an image-based data processing method provided by Embodiment 3 of the present invention. This embodiment refines the operations of the technical solutions of the foregoing embodiments. Optionally, the operation "fuse the features of the multiple objects into the fusion features of the image according to the matching degree between the text and the features of each of the multiple objects" is refined to "perform a weighted summation of the features of each object according to the matching degree between the text and the features of each object, to obtain the fusion features of the image". As shown in Figure 3, the image-based data processing method includes:

[0078] S310. Acquire the image and the text to be processed.

[0079] S320. Extract features of multiple objects in the image, and extract features of text.

[0080] S330. Perform weighted summation on the features of each object accord...
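The weighted summation in S330 can be sketched as follows, treating each object's feature as a plain vector and each matching degree as its weight. The function name `fuse_object_features` and the optional `normalize` flag are hypothetical; the excerpt does not say whether the matching degrees are normalized before summation, so normalization is off by default.

```python
from typing import List, Sequence


def fuse_object_features(
    object_features: List[Sequence[float]],
    matching_degrees: Sequence[float],
    normalize: bool = False,  # assumption: normalization is not stated in the source
) -> List[float]:
    """Fuse per-object features into one image feature by weighted summation."""
    if not object_features:
        raise ValueError("at least one object feature is required")
    if len(object_features) != len(matching_degrees):
        raise ValueError("need exactly one matching degree per object")

    weights = list(matching_degrees)
    if normalize:
        total = sum(weights)
        if total <= 0:
            raise ValueError("matching degrees must sum to a positive value")
        weights = [w / total for w in weights]

    dim = len(object_features[0])
    fused = [0.0] * dim
    for feature, weight in zip(object_features, weights):
        if len(feature) != dim:
            raise ValueError("all object features must have the same length")
        for i, value in enumerate(feature):
            fused[i] += weight * value  # objects that match the text contribute more
    return fused
```

For example, `fuse_object_features([[1.0, 2.0], [3.0, 4.0]], [0.75, 0.25])` gives `[1.5, 2.5]`: the first object, which matches the text more closely, dominates the fused feature.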

Abstract

Embodiments of the invention disclose an image-based data processing method, apparatus, and device, and a readable storage medium. The method comprises: obtaining an image and a text to be processed; extracting features of a plurality of objects in the image, and extracting features of the text; fusing the features of the plurality of objects into fusion features of the image according to the matching degree between the text and the features of each of the plurality of objects; and processing the text according to the fusion features of the image and the features of the text. According to the embodiments of the invention, the association between the text and each object in the image can be learned accurately, and processing accuracy is improved.

Description

Technical field

[0001] Embodiments of the present invention relate to computer vision technology, and in particular to an image-based data processing method, apparatus, device, and readable storage medium.

Background

[0002] With the development of computer vision technology, image-based data processing methods such as visual question answering have emerged. Visual Question Answering (VQA) is one of the cutting-edge applications of multimodal data mining and aims to answer natural language questions about visual images. As a research direction of visual understanding that connects vision and language, VQA must handle specific text questions on the basis of understanding the image.

[0003] In current image-based data processing methods, two different underlying representation systems are first used to extract the low-level features of the image and of the text, and learn the high-level features of...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06V30/224
CPC: G06V2201/10, G06V10/82, G06V30/19173, G06T11/60, G06T7/11, G06T2207/20081, G06V10/40, G06F18/22, G06T2210/12, G06F16/583, G06F40/30, G06V30/224
Inventor: 黄剑辉, 黄苹苹, 乔敏, 李盈
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD