
Document layout classification method based on multi-modal fusion

A classification method based on multi-modal technology, applied in the field of deep learning. It addresses the problem that regions with similar image features are difficult to distinguish from image features alone, and achieves improved classification accuracy.

Pending Publication Date: 2021-11-23
达观数据(苏州)有限公司

AI Technical Summary

Problems solved by technology

[0006] 1. Header and footer target-detection methods cannot extract positional features well. The image features of headers and footers resemble those of some paragraphs, so they are hard to distinguish based on image features alone;
[0007] 2. Paragraphs and titles can also have similar image features, and text information is needed to distinguish them reliably.
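The positional signal described above can be sketched in a few lines. This is an illustrative example, not code from the patent: the function names, the normalization scheme, and the 8% margin threshold are all assumptions.

```python
# Illustrative sketch (not from the patent): normalized box coordinates
# carry the positional signal that pure image features miss.

def position_features(box, page_width, page_height):
    """Return normalized coordinates for a detection box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return (x0 / page_width, y0 / page_height,
            x1 / page_width, y1 / page_height)

def is_margin_candidate(box, page_height, margin=0.08):
    """A box hugging the top or bottom of the page is a header/footer
    candidate even when its image features look like a short paragraph.
    The 0.08 margin fraction is an assumed, tunable value."""
    _, y0, _, y1 = box
    return y0 / page_height < margin or y1 / page_height > 1 - margin
```

For example, a box spanning y = 5..20 on a 1000-unit-tall page is flagged as a margin candidate, while one at y = 400..450 is not.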




Embodiment Construction

[0024] The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

[0025] In describing the present invention, it should be understood that the terms "longitudinal", "transverse", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and simplifying the descriptio...



Abstract

The invention discloses a document layout classification method based on multi-modal fusion. For a target document, the method comprises the following steps: detecting the target document to obtain the detection frames to be classified; obtaining the text information, frame coordinate information and image features of each detection frame; and feeding the text information, frame coordinate information and image features into a multi-modal fusion model, which outputs the type of the detection frame. By fusing text, position and image information, the method improves the accuracy of document layout classification.
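The fusion step described in the abstract can be sketched as early fusion: concatenate the per-modality feature vectors and feed the result to a classifier. This is a minimal illustration under assumed toy dimensions and an assumed linear head; the patent does not disclose the model architecture, and all names here are hypothetical.

```python
# Hedged sketch, not the patent's actual model: text embedding,
# normalized box coordinates, and image features are concatenated
# (early fusion) and scored by an assumed linear classification head.

CLASSES = ["header", "footer", "title", "paragraph",
           "table_of_contents", "table", "image"]

def fuse(text_emb, box_feat, img_feat):
    """Early fusion: concatenate the per-modality feature vectors."""
    return list(text_emb) + list(box_feat) + list(img_feat)

def classify(fused, weights, bias):
    """Linear head over the fused vector; returns the argmax class."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return CLASSES[max(range(len(logits)), key=logits.__getitem__)]

# Toy usage: a 1-d text embedding and a 2-d box feature, no image part.
fused = fuse([1.0], [0.5, 0.1], [])
weights = [[1.0, 1.0, 1.0]] + [[0.0, 0.0, 0.0]] * 6  # favors "header"
prediction = classify(fused, weights, [0.0] * 7)
```

In practice each modality would come from a learned encoder (e.g. a text model, coordinate embedding, and an image backbone), but the concatenate-then-classify structure is the same.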

Description

technical field

[0001] The invention belongs to the field of deep learning, and in particular relates to a document layout classification method based on multi-modal fusion.

Background technique

[0002] The layout information of a document is needed to analyze and extract the information in the document. Layout information generally covers several categories: header, footer, title, paragraph, table of contents, table and image.

[0003] Documents are generally divided into electronic documents and image documents. Electronic documents can be parsed to obtain the character information in the document, including text and position information, but the layout information cannot be obtained directly. For image documents, text and position information cannot be obtained directly either; they must be recovered through OCR (Optical Character Recognition) technology.

[0004] The layout information of an electronic document can be divided according to the text and po...
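The two document sources in the background section imply a simple routing step before classification: parse electronic documents directly, run OCR on image documents. The sketch below illustrates that branch only; `parse_pdf` and `run_ocr` are hypothetical stand-ins, not APIs named by the patent.

```python
# Hedged sketch of the background's two extraction paths ([0003]):
# electronic documents are parsed directly, image documents go through
# OCR. The extractor callables are hypothetical stand-ins.

def extract_characters(document, is_electronic, parse_pdf, run_ocr):
    """Return (text, boxes) via the extractor appropriate to the source."""
    extractor = parse_pdf if is_electronic else run_ocr
    return extractor(document)

# Usage with stub extractors standing in for a real parser/OCR engine.
text, boxes = extract_characters(
    "report.pdf", True,
    parse_pdf=lambda d: ("Hello", [(0, 0, 50, 12)]),
    run_ocr=lambda d: ("", []),
)
```

Either path yields the same (text, position) shape, so the downstream multi-modal classifier does not need to know which source the document came from.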


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/62, G06K9/00
CPC: G06F18/241, G06F18/253
Inventors: 陶提, 许诺, 高翔, 纪达麒, 陈运文
Owner: 达观数据(苏州)有限公司