Systems and methods to automatically classify electronic documents using extracted image and text features and using a machine learning subsystem

Status: Inactive. Publication date: 2009-05-07
GRUNTWORX
Cites 41 documents; cited by 40.

AI Technical Summary

Benefits of technology

[0013] Systems and methods are provided that automatically organize electronic jobs by classifying electronic documents using extracted image and text features and a machine-learning recognition subsystem. In some embodiments, a document analysis system is provided that automatically classifies documents by recognizing, in each document, distinctive features that the system has learned automatically, so that it can organize jobs according to the categories of documents they contain. The document analysis system includes a document acquisition system, a document recognition training system, a document classification system, a document recognition system, and a job organization system. The document acquisition system receives jobs from a plurality of users; each job contains at least one electronic document having at least one page that includes image aspects and text. The document feature recognition system automatically extracts image and text features from each received electronic document. The document classification system automatically classifies each recognized electronic document as belonging to a category of document by finding the best match between the document's extracted features and the feature set associated with each category. Each feature set consists of image and text features, with corresponding weights, that distinguish its category from the other categories of documents.
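The extraction and best-match steps can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a page's text features are its lowercase tokens, that its only image feature is page orientation, and that a category's feature set is a map from feature to weight scored by a simple weighted sum; the source does not specify the actual features, weights, or matching function. The names `Page`, `extract_features`, `classify`, the `threshold` parameter, and the example categories are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page representation: OCR text plus image dimensions."""
    text: str
    width: int   # image width in pixels
    height: int  # image height in pixels


def extract_features(page: Page) -> set[str]:
    """Hypothetical extractor: lowercase text tokens plus one coarse image feature."""
    text_features = {f"text:{token}" for token in page.text.lower().split()}
    orientation = "landscape" if page.width > page.height else "portrait"
    return text_features | {f"img:{orientation}"}


def classify(features: set[str],
             feature_sets: dict[str, dict[str, float]],
             threshold: float = 1.0) -> str | None:
    """Return the category whose weighted feature set best matches `features`.

    The score is the sum of the weights of the features the document contains.
    If even the best score is below `threshold`, return None so the document
    can be treated as unrecognized.
    """
    best_category, best_score = None, float("-inf")
    for category, weights in feature_sets.items():
        score = sum(weights.get(feature, 0.0) for feature in features)
        if score > best_score:
            best_category, best_score = category, score
    return best_category if best_score >= threshold else None


# Hypothetical, hand-written feature sets for two example categories.
FEATURE_SETS = {
    "invoice": {"text:invoice": 2.0, "text:due": 1.0, "img:portrait": 0.1},
    "bank_statement": {"text:statement": 2.0, "text:balance": 1.0, "img:portrait": 0.1},
}

if __name__ == "__main__":
    page = Page("Invoice number 1234 amount due", width=850, height=1100)
    print(classify(extract_features(page), FEATURE_SETS))  # invoice
```

Returning `None` below a threshold is one simple way to model the source's notion of an unrecognized document; in the described system the weights would be learned rather than written by hand.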
The document recognition training system automatically trains the feature set for each category of documents. It uses the extracted features of unrecognized electronic documents to modify a category's feature set automatically, so that the classification system's ability to classify documents improves as the training system processes more unrecognized documents and the feature sets are updated accordingly. Finally, the job organization system automatically organizes each job by the categories of documents it contains, arranging the job's electronic documents according to at least one business rule that corresponds to those categories.
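The training step might look like the following sketch, which swaps in a simple perceptron-style weight update; the source does not say which learning rule the training system uses. It also assumes that an unrecognized document eventually receives a confirmed category (for example, from a reviewer) before its features are used for training. `best_match`, `update_feature_set`, and the learning rate `lr` are hypothetical names.

```python
from __future__ import annotations


def best_match(features: set[str],
               feature_sets: dict[str, dict[str, float]],
               threshold: float = 1.0) -> str | None:
    """Weighted-sum best match; None means the document is unrecognized."""
    scores = {category: sum(weights.get(f, 0.0) for f in features)
              for category, weights in feature_sets.items()}
    if not scores:
        return None
    category = max(scores, key=scores.get)
    return category if scores[category] >= threshold else None


def update_feature_set(features: set[str],
                       true_category: str,
                       feature_sets: dict[str, dict[str, float]],
                       predicted: str | None = None,
                       lr: float = 0.5) -> None:
    """Perceptron-style update (an assumed learning rule, not from the source).

    Raise the weight of every observed feature in the confirmed category,
    creating that category's feature set if it does not exist yet. If the
    classifier had picked a different category, lower the same features'
    weights there so the mistake becomes less likely.
    """
    target = feature_sets.setdefault(true_category, {})
    for feature in features:
        target[feature] = target.get(feature, 0.0) + lr
    if predicted is not None and predicted != true_category:
        wrong = feature_sets[predicted]
        for feature in features:
            wrong[feature] = wrong.get(feature, 0.0) - lr


if __name__ == "__main__":
    feature_sets = {"invoice": {"text:invoice": 2.0, "text:total": 0.5}}
    receipt = {"text:receipt", "text:total", "text:thank"}
    before = best_match(receipt, feature_sets)        # unrecognized
    update_feature_set(receipt, "receipt", feature_sets, predicted=before)
    print(before, best_match(receipt, feature_sets))  # None receipt
```

Each confirmed document nudges the weights, which is one concrete reading of "the ability to classify improves as the training system processes more unrecognized documents"; the actual system may use a different learner entirely.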

Problems solved by technology

In many instances, paper documents are scanned in a random, unorganized sequence, which makes finding a particular page within the resulting electronic document difficult and time-consuming.
One solution is to organize the paper documents manually before scanning; however, the person organizing or scanning them may lack the skill, knowledge, or time to organize them correctly.
Additionally, organizing the paper documents before scanning can be very time-consuming and expensive.
Further, organizing the pages before scanning may put them in the proper order, but it does not generate a table of contents, metadata, bookmarks, or a hierarchical index that would help locate a particular page within the complete set.
Manually organizing an electronic document, including typing a table of contents, metadata, bookmarks, or a hierarchical index, is time-consuming and expensive.
Manual organization also tends to be ad hoc and fails to deliver a standardized table of contents, metadata, bookmarks, or hierarchical index for the electronic document.
Another approach requires the recipient to categorize each page manually, which is likewise time-consuming and expensive.

Embodiment Construction

[0028] Although the prior art attempts to reduce the cost of electronic document organization through software, none of the above methods (1) eliminates human labor and the accompanying requirements of education, domain expertise, training, and/or software knowledge; (2) minimizes the time spent entering and quality-checking page categorizations; (3) minimizes errors; or (4) protects the privacy of the owners of the data in the electronic documents being organized. What is needed, therefore, is a method of electronic document organization that overcomes these limitations and provides the features enumerated above.

[0029]Preferred embodiments of the present invention provide a method and system for converting paper and digital documents into well-organized electronic documents that are indexed, searchable and editable. The resulting organized electronic documents support more rapid and accurate data entry, retrieval and review th...

Abstract

A document analysis system that automatically classifies documents by recognizing distinctive features in each document comprises a document acquisition system, a document recognition training system, a document classification system, a document recognition system, and a job organization system. The document acquisition system receives jobs, each containing at least one electronic document. The document feature recognition system automatically extracts image and text features from each received document. The document classification system automatically classifies recognized electronic documents by finding the best match between each document's extracted features and the feature sets associated with each category of document. The document recognition training system automatically trains the feature set for each category of documents, using the extracted features of unrecognized documents to modify a category's feature set automatically. The job organization system automatically organizes each job according to the document categories it contains.
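Job organization can be sketched as sorting a job's classified documents by a per-category business rule and emitting a table of contents, one of the artifacts that manual organization, as noted above, fails to produce consistently. The rule format (a section title plus a sort priority per category), the `ClassifiedDoc` type, and `organize_job` are hypothetical; the source states only that organization follows at least one business rule corresponding to the document categories.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClassifiedDoc:
    name: str
    category: str | None  # None: the document was not recognized
    pages: int


# Hypothetical business rules: category -> (section title, sort priority).
BUSINESS_RULES = {
    "invoice": ("Invoices", 1),
    "receipt": ("Receipts", 2),
}


def organize_job(docs: list[ClassifiedDoc]
                 ) -> tuple[list[ClassifiedDoc], list[tuple[str, str, int]]]:
    """Order a job's documents by rule priority and build a table of contents.

    Documents with no matching rule (including unrecognized ones) go last,
    in an "Unclassified" section. Each TOC entry is
    (section title, document name, starting page number).
    """
    def sort_key(doc: ClassifiedDoc):
        rule = BUSINESS_RULES.get(doc.category)
        return (rule[1] if rule else float("inf"), doc.name)

    ordered = sorted(docs, key=sort_key)
    toc, next_page = [], 1
    for doc in ordered:
        section = BUSINESS_RULES.get(doc.category, ("Unclassified", None))[0]
        toc.append((section, doc.name, next_page))
        next_page += doc.pages
    return ordered, toc


if __name__ == "__main__":
    job = [ClassifiedDoc("scan_03", "receipt", 1),
           ClassifiedDoc("scan_01", None, 2),
           ClassifiedDoc("scan_02", "invoice", 3)]
    _, contents = organize_job(job)
    for section, name, start in contents:
        print(f"{section}: {name} (p. {start})")
```

Run as a script, this prints the invoice first (p. 1), the receipt next (p. 4), and the unrecognized scan last (p. 5). Keeping the rules in a plain mapping means a different ordering policy needs no code change, which is one reasonable way to make rules "correspond to the categories of documents".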

Description

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit under 35 U.S.C. §119(e) of U.S. Provisional Patent Application Ser. No. 60/985,851, filed on Nov. 6, 2007, which is hereby incorporated by reference herein in its entirety.

[0002] This application is related to the following applications filed concurrently herewith, the entire contents of which are incorporated by reference:

[0003] U.S. patent application Ser. No. (TBA), entitled “Systems and Methods for Classifying Electronic Documents by Extracting and Recognizing Text and Image Features Indicative of Document Categories;”
[0004] U.S. patent application Ser. No. (TBA), entitled “Systems and Methods for Training a Document Classification System Using Documents from a Plurality of Users;”
[0005] U.S. patent application Ser. No. (TBA), entitled “Systems and Methods for Parallel Processing of Document Recognition and Classification Using Extracted Image and Text Features;”
[0006] U.S. patent application Ser. No. (TBA)...

Application Information

IPC (8): G06K9/62; G06V30/40
CPC: G06K9/6885; G06K9/00442; G06V30/40; G06V30/1985
Inventors: NEOGI, DEPANKAR; LADD, STEVEN K.; AHMED, DILNAWAJ; KUMAR, ARJUN; MAHATA, TUSHAR
Owner GRUNTWORX