Vision-based document segmentation

A document segmentation technology, applied in fields such as permanent visual display devices, unstructured text data retrieval, and text database browsing/visualization, that addresses problems such as the reduced accuracy of the search process.

Status: Inactive
Publication Date: 2005-02-09
MICROSOFT CORP

AI Technical Summary

Problems solved by technology

[0004] These characteristics of web pages can reduce the accuracy of the search process

Embodiment Construction

[0020] This invention describes vision-based document segmentation. Vision-based document segmentation identifies the portions of a document that carry its semantic content, based on the document's visual appearance. Vision-based document segmentation can be used in a number of different ways. For example, when searching for documents, segmentation allows search results to be based on the semantic content portions of the documents.
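
The search use case in [0020] can be illustrated with a minimal sketch. It assumes a segmentation step has already labeled each portion of a page as semantic content or not; the `Segment` class, the `is_semantic` flag, the `score` function, and the sample pages are all hypothetical and are not taken from the patent. The scoring is a plain term count, chosen only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    is_semantic: bool  # hypothetical label produced by a segmentation step


def score(query, segments):
    """Count query-term occurrences in semantic-content segments only."""
    terms = query.lower().split()
    words = []
    for seg in segments:
        if seg.is_semantic:  # navigation bars, footers, etc. are skipped
            words.extend(seg.text.lower().split())
    return sum(words.count(term) for term in terms)


# Two toy pages: page_a mentions "camera" only in its navigation bar.
page_a = [
    Segment("Home Products Camera reviews Contact", False),
    Segment("Our guide to hiking trails in the Alps", True),
]
page_b = [
    Segment("Home About Contact", False),
    Segment("This camera review covers lens and sensor quality", True),
]

pages = [("page_a", page_a), ("page_b", page_b)]
ranked = sorted(pages, key=lambda p: score("camera", p[1]), reverse=True)
```

With these inputs, `page_b` ranks first because the only occurrence of the query term on `page_a` sits outside its semantic content.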

[0021] The discussion that follows is in terms of documents and the models used to describe the structure of documents. Documents may be in any of a variety of formats, such as formats in accordance with Standard Generalized Markup Language (SGML), for example Extensible Markup Language (XML) format or Hypertext Markup Language (HTML) format. In several embodiments, these documents are web pages in HTML format. The model discussed here may be any of a variety of models that describe the structure of a document. In several embodiments, the model ...

Abstract

Vision-based document segmentation identifies one or more portions of semantic content of a document. The one or more portions are identified by identifying a plurality of visual blocks in the document, and detecting one or more separators between the visual blocks of the plurality of visual blocks. A content structure for the document is constructed based at least in part on the plurality of visual blocks and the one or more separators, and the content structure identifies the one or more portions of semantic content of the document. The content structure obtained using the vision-based document segmentation can optionally be used during document retrieval.
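
The three steps named in the abstract (identify visual blocks, detect separators between them, construct a content structure) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: blocks are assumed to be non-overlapping and stacked in a single column, a separator is taken to be the vertical gap between consecutive blocks, and the structure is built by splitting recursively at the widest gap. The names `Block`, `Node`, `find_separators`, `build_structure`, and `outline` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    top: int     # y coordinate of the upper edge, in pixels
    bottom: int  # y coordinate of the lower edge, in pixels


@dataclass
class Node:
    blocks: list
    children: list = field(default_factory=list)


def find_separators(blocks):
    """Return (start, end) vertical gaps between consecutive blocks."""
    ordered = sorted(blocks, key=lambda b: b.top)
    seps = []
    for upper, lower in zip(ordered, ordered[1:]):
        if lower.top > upper.bottom:
            seps.append((upper.bottom, lower.top))
    return seps


def build_structure(blocks):
    """Recursively split the blocks at their widest separators."""
    node = Node(sorted(blocks, key=lambda b: b.top))
    seps = find_separators(node.blocks)
    if not seps:
        return node  # a single block (or no gaps): this is a leaf
    widest = max(end - start for start, end in seps)
    cuts = {start for start, end in seps if end - start == widest}
    groups, group = [], []
    for block in node.blocks:
        group.append(block)
        if block.bottom in cuts:  # a widest separator starts right below
            groups.append(group)
            group = []
    if group:
        groups.append(group)
    node.children = [build_structure(g) for g in groups]
    return node


def outline(node):
    """Render the content structure as nested lists of block names."""
    if not node.children:
        return [b.name for b in node.blocks]
    return [outline(child) for child in node.children]


page = [
    Block("header", 0, 50),
    Block("para1", 80, 200),
    Block("para2", 210, 300),
    Block("footer", 340, 380),
]
separators = find_separators(page)
root = build_structure(page)
```

Splitting at the widest gap first means that blocks separated only by narrow gaps, such as `para1` and `para2` here, stay grouped together until deeper levels of the tree. A real implementation would also need to handle horizontal layout and visual cues other than whitespace.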

Description

Technical field

[0001] The present invention relates to segmenting documents, and more particularly to vision-based document segmentation.

Background art

[0002] People have access to vast amounts of information. However, finding the specific information they need in any given situation can be quite difficult. For example, through the Internet, a vast amount of information is accessible to people in the form of web pages. The number of such web pages may be on the order of 1 million or more. In addition, the available web pages are constantly changing, with some pages being added, others being deleted, and others being modified.

[0003] Thus, when one desires to find out certain information, such as an answer to a question, the ability to extract specific information from this large source of information becomes very important. Processes and technologies have been developed to allow users to search for information over the Internet, and are generally made available to...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F15/00, G06F, G06F17/00, G06F17/30, G06F40/143, G06K9/72, G06K15/00
CPC: G06F17/30716, G06F17/218, G06F17/2247, G06F16/34, G06F40/117, G06F40/143, G06F15/00, G06F17/00
Inventor: 文继荣 (Ji-Rong Wen), 俞诗鹏 (Shipeng Yu), 蔡登 (Deng Cai), 马维英 (Wei-Ying Ma)
Owner: MICROSOFT CORP