239 results about "Document recognition" patented technology

Intelligent document recognition is a technology for automating business document processing. An intelligent document recognition system analyzes the content of each document it receives and looks for keywords that match its database of business terms.

Query-less searching

Some embodiments of the invention provide a method for identifying relevant documents. The method receives a set of reference documents. The method analyzes the received set of reference documents. Based on this analysis, the method then identifies one or more documents that are potentially relevant to the discussion in one or more reference documents. In some embodiments, the method identifies the relevant documents by examining candidate documents that are on a computer or are accessible by a computer through a computer network (e.g., a local area network, a wide area network, or a network of networks, such as the Internet). In these embodiments, the method uses its analysis of the reference document set to determine whether the discussion (i.e., content) of the candidate document is relevant to the topics discussed in one or more of the reference documents. If so, the method of some embodiments identifies the candidate document as a potentially relevant document (i.e., as a document that is potentially relevant or related to the reference document set).
Owner:CALIFORNIA INST OF TECH +1

Method and system for semantic search and retrieval of electronic documents

A system and method for semantic search of electronic documents stored on a computer-readable medium, providing a search result in response to a query. The system includes a corpus including a plurality of electronic documents that are domain tagged at a document level and analyzed based on the tags to identify word usage patterns. An index of word usage patterns is provided that indexes the plurality of documents in the corpus according to their word usage patterns. The system also includes a query pre-processing module that receives a query from a user, and analyzes the query to determine probable word usage patterns in the query. The system further includes a processor that uses the index to identify documents having word usage patterns that match the probable word usage patterns in the query as candidate electronic documents, and retrieves the candidate electronic documents.
Owner:TEXTDIGGER

Information processor, document management system, and processing method and program of information processor

A client terminal acquires from a server terminal one or more pieces of document information, each of which includes at least a thumbnail image and document identification information for identifying document data corresponding to the thumbnail image, and includes first annotation data and/or second annotation data associated with the document identification information. If first annotation data is included in the respective acquired document information, the client terminal displays a thumbnail image with which the first annotation data is combined, as a list with thumbnail view, on a display unit. If the second annotation data is included in specified document data, the client terminal individually displays the specified document data with which the second annotation data is combined, on a display unit.
Owner:CANON KK

Methods and systems for analyzing XML documents

Methods and systems for analyzing XML documents. The system scans an XML document, identifies different dimensions that span the XML document and detects scoping relationships amongst them. The system uses the dimensional information to create a logical hierarchical scoped dimension analysis model, maps the logical XML tree to this model, and then implements the analytical method over the logical model. The logical model allows both structural features and numeric/non-numeric data to be used for analysis. The analytical method allows users to query irregular structural properties of the XML documents using the XPath navigational API.
Owner:IBM CORP

Assigning document identification tags

Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags is produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
Owner:GOOGLE LLC

System, method, and computer program product for identifying multi-page documents in hypertext collections

A system, method, and computer program product for identifying compound documents as a coherent body of hyperlinked material on a single topic as created by an author or collaborating authors, analyzing the content and structure of the compound documents and related hyperlinks, and responsively selecting a preferred entry point at which to begin processing such documents. The body of material may include the internet, an intranet, or other digital library that typically has content distributed over several separate pages or URLs, sometimes in a hierarchical directory structure. The processing may include creating at least one taxonomy, as well as searching or indexing the compound documents. The identification and analysis schemes include an observation of a number of heuristics run on component documents in the compound documents.
Owner:IBM CORP

Web page authoring apparatus, web page authoring method, and program

The present invention improves application of a style to a view object when a document for a Web page to be edited is edited on a browser-type edit screen. First, a view object is detected from a managed document. Then, a direct style directly described in the managed document and an indirect style identified only by referring to an external document are collected. A browser-type edit screen is generated in which the direct and indirect styles are applied to each view object. The content of the managed document is synchronized with the edited content on the browser-type edit screen based on the editing operations on the browser-type edit screen.
Owner:IBM CORP

System and method for using text analytics to identify a set of related documents from a source document

A system and method for processing a document to generate a set of related documents. A system is provided that includes a textual analytics system that analyzes unstructured data contained in a source document and extracts a set of structured information about the source document; and a compare system that identifies a set of related documents by comparing the set of structured information with metadata indexed from a set of publications.
Owner:IBM CORP

A self-learning system and methods for automatic document recognition, authentication, and information extraction

A computerized system for classifying and authenticating documents is provided. The Classification process involves the creation of a Unique Pair Feature Vector which provides the best discrimination information for each pair of Document Classes at every node in a Pairwise Comparison Nodal Network. The Nodal Network has a plurality of nodes, each node corresponding to the best discrimination information between two potential document classes. By performing a pairwise comparison of the potential documents using this nodal network, the document is classified. After classification, the document can be authenticated for validity.
Owner:META PLATFORMS INC

System and method for performing electronic information retrieval using keywords

Output documents similar to an input document are identified. A query is formulated using a list of best keywords from the input document to search for a first set of output documents. The list of best keywords is defined with a maximum number of keywords less than the total number of keywords in the list of best keywords that are identified as belonging to a domain specific dictionary of words and as having no measurable linguistic frequency. Lists of keywords are identified for each output document in the first set of documents. A second set of similar documents is determined using a measure of similarity that is computed between keywords identified in the input document and each output document in the first set of documents.
Owner:XEROX CORP

Network system for directing the transmission of facsimiles

A general document recognition system is described which is intended to be used in connection with an electronic document transmission function used on a computer network. The general document recognition system is set up to recognize any number of document types created by application programs in the network and is also set up with rules as to how to extract destination data from each document type. The extracted data from each document can be the actual intended destination, such as a facsimile telephone number, or can be the identity of the intended recipient individual. If a recipient, rather than a destination, is extracted from the document, the general document recognition system can query a previously designated external database to recover the destination information for that recipient. An LDAP database is the preferred external database for this function.
Owner:ESKER SA

Insurance document imaging and processing system

According to some embodiments, an insurance document is received at a document conversion system. The received document may be converted to a document image, and document identification data may be assigned to the document image. The assigned document identification data may be automatically matched to (and/or associated with) insurance information. It may then be arranged to provide the document image, the insurance information, and/or the document identification data for review. Subsequent to review, an insurance claim may be processed in accordance with the document image, the insurance information, and/or the document identification data.
Owner:HARTFORD FIRE INSURANCE

Systems, methods and computer program products for labeled forms processing

A system, method, and computer product for processing paper documents for electronic storage and retrieval where a label containing a document identification code is generated and is affixed to a paper document. The paper document is then converted to a digital format and transmitted to a central processing center. The digital document is separated into two or more individual pages and may be presented to a user with a viewer program. Through the viewer program the user may then identify a portion of the label to image and convert the imaged portion of the label to textual data relating to the document and its contents. The textual data may then be used in archiving the documents in an archiving program. The data also may be retrieved from a stored database location and verified with information entered in a particular field by a user.
Owner:U S SECURITY ASSOCS

Authoritative document identification

A system determines documents that are associated with a location, identifies a group of signals associated with each of the documents, and determines authoritativeness of the documents for the location based on the signals.
Owner:GOOGLE LLC

Print control mechanism based on printing environment

An image forming apparatus implements a print restriction depending on the environment of the image forming apparatus, such as who is or is not near the image forming apparatus. The image forming apparatus communicates with a short-range wireless terminal for authenticating print data with reference to access right information in which document identifying information identifying the print data is associated with wireless terminal identifying information identifying the short-range wireless terminal. The image forming apparatus includes an acquiring unit for acquiring the wireless terminal identifying information from the short-range wireless terminal; a determining unit for determining whether the printing of the print data should be permitted or not based on the wireless terminal identifying information acquired by the acquiring unit and the access right information; and a control unit for controlling the printing of the print data depending on a result of the determination made by the determining unit.
Owner:RICOH KK

Near-duplicate document detection for web crawling

A system generates a hash value for a fetched document and compares the hash value with a set of stored hash values to identify ones of the stored hash values with a sequence of bit positions, less than all of the bit positions, that match a corresponding sequence of bit positions of the hash value. The system also determines whether any of the identified hash values are substantially similar to the hash value and identifies the fetched document as a near-duplicate of another document when one of the identified hash values is substantially similar to the hash value.
Owner:GOOGLE LLC
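
The two-stage comparison described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the abstract does not say which bit positions are compared or how "substantially similar" is measured, so the sketch assumes 64-bit fingerprints, uses the leading 16 bits as the matched bit sequence, and treats a Hamming distance of at most 3 as substantially similar. The names `hamming_distance` and `find_near_duplicates` are hypothetical.

```python
from typing import Iterable, List


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def find_near_duplicates(
    fingerprint: int,
    stored: Iterable[int],
    prefix_bits: int = 16,   # assumption: compare only the leading bits first
    total_bits: int = 64,    # assumption: 64-bit fingerprints
    max_distance: int = 3,   # assumption: similarity cutoff in differing bits
) -> List[int]:
    """Return stored fingerprints that look like near-duplicates.

    Stage 1 keeps stored values whose leading `prefix_bits` match exactly
    (a sequence of bit positions, fewer than all of them).
    Stage 2 keeps those candidates within `max_distance` differing bits.
    """
    shift = total_bits - prefix_bits
    prefix = fingerprint >> shift
    candidates = [h for h in stored if (h >> shift) == prefix]
    return [h for h in candidates if hamming_distance(h, fingerprint) <= max_distance]


if __name__ == "__main__":
    stored_hashes = [0xFFFF000000000000, 0xFFFF000000000003, 0x0000FFFF00000000]
    matches = find_near_duplicates(0xFFFF000000000001, stored_hashes)
    print([hex(h) for h in matches])
```

With a single prefix filter, a stored fingerprint that differs from the query in one of the leading bits is never examined, even if the overall distance is small. A practical system would presumably repeat the first stage over several different bit sequences; that detail is outside this sketch.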

Systems and methods to automatically classify electronic documents using extracted image and text features and using a machine learning subsystem

A document analysis system that automatically classifies documents by recognizing in each document distinctive features comprises a document acquisition system, a document recognition training system, a document classification system, a document recognition system, and a job organization system. The document acquisition system receives jobs, wherein each job contains at least one electronic document. The document feature recognition system automatically extracts image and text features from each received document. The document classification system automatically classifies recognized electronic documents by finding the best match between the extracted features of each document and the feature sets associated with each category of document. The document recognition training system automatically trains the feature set for each corresponding category of documents, wherein the training system, using extracted features of unrecognized documents, automatically modifies the feature set for a document category. The job organization system automatically organizes each job according to the document categories it contains.
Owner:GRUNTWORX

Preserving user applied markings made to a hardcopy original document

What is disclosed is a novel system and method for identifying and removing print defects from an original document such that user markings applied to the hardcopy original can be more readily identified and extracted. In one embodiment, an image of an original document and a marked document are received. The original document was printed using a print device which caused a print defect in the hardcopy print. Methods for identifying the print defect in the difference image are provided herein. The identified print defect is removed from the difference image. The difference image retains the user-applied markings once the print defects have been identified and removed. The user markings can then be provided to a storage device for subsequent retrieval and added into the image of the original document to generate an image of a new marked document containing the user markings without the defect. Various embodiments are disclosed.
Owner:XEROX CORP

Systems and methods for providing data-driven document suggestions

Systems and methods are disclosed for providing at least one document suggestion from a computer system using at least one information source, the method comprising storing in the information source a plurality of associations, each of which includes a numeric coefficient that corresponds to at least one action of a user and at least one document; receiving a triggering action related to the at least one action of the user; comparing the numeric coefficients stored in the information source with a suggestion threshold based on the triggering action; and for each numeric coefficient that exceeds the suggestion threshold, identifying the corresponding at least one document as a suggested document.
Owner:APPSENSE
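
The threshold comparison in this abstract can be illustrated with a short sketch. Everything concrete here is an assumption made for the example: the associations are held in a plain dictionary keyed by (action, document), the per-action thresholds and the default of 0.5 are invented values, and ordering the suggestions by coefficient is an addition the abstract does not mention. The function name `suggest_documents` is hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical information source: (user action, document) -> numeric coefficient.
Associations = Dict[Tuple[str, str], float]


def suggest_documents(
    associations: Associations,
    triggering_action: str,
    thresholds: Dict[str, float],
    default_threshold: float = 0.5,  # assumption: fallback when an action has no threshold
) -> List[str]:
    """Return documents whose coefficient for the triggering action exceeds
    the suggestion threshold chosen for that action."""
    threshold = thresholds.get(triggering_action, default_threshold)
    matches = [
        (coefficient, document)
        for (action, document), coefficient in associations.items()
        if action == triggering_action and coefficient > threshold
    ]
    # Ordering by coefficient (strongest first) is an illustrative choice.
    return [document for _, document in sorted(matches, reverse=True)]


if __name__ == "__main__":
    source = {
        ("open_crm", "q3.xlsx"): 0.9,
        ("open_crm", "notes.txt"): 0.4,
        ("open_crm", "plan.docx"): 0.7,
        ("open_mail", "q3.xlsx"): 0.95,
    }
    print(suggest_documents(source, "open_crm", {"open_crm": 0.6}))
```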

Dynamically and customizably managing data in compliance with privacy and security standards

Systems and methods for managing data in compliance with privacy, security and/or retention standards in business industries. A dynamic and customizable archival and retrieval system allows for information and documentation to be placed and made available in the system. The document type and identifying information for that document type are described. Definitions are established for the documents being managed, the data identifying the documents, and the retention policies for the documents. The documents are associated with the identifying data for a particular set of records. A single point of entry is provided for external and/or internal requests, and/or a single point of exit is provided for transmissions of information, wherein the transmissions to requestors include information that is individually approved. Moreover, digital authorizations and consents for retrieval from external data sources may be utilized.
Owner:VERISMA SYST

Efficient passage retrieval using document metadata

A system, method and computer program product for efficiently retrieving passages relevant to questions from a corpus of data. A processor device receives an input query and performs a query analysis to obtain searchable query terms. The processor device matches metadata associated with one or more documents against the query terms. The document metadata includes one or more of: a title of the documents, one or more user tags or clouds. The processor device then maps the matched document metadata to the corresponding one or more documents; identifies the corresponding matched documents to form a subcorpus of documents; and conducts a search in the subcorpus using the searchable query terms to obtain, from the identified documents, one or more passages relevant to the input query.
Owner:IBM CORP
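
A minimal sketch of the metadata-first retrieval flow in this abstract is given below. It is not the patented implementation: query analysis is reduced to lowercase word tokenization, matching is a plain term overlap, and the `Document` class, `tokenize`, and `retrieve_passages` are hypothetical names introduced for the example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Document:
    """Hypothetical document record: metadata (title, tags) plus its passages."""
    title: str
    tags: List[str] = field(default_factory=list)
    passages: List[str] = field(default_factory=list)


def tokenize(text: str) -> Set[str]:
    """Very simplified query/text analysis: the set of lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_passages(query: str, corpus: List[Document]) -> List[str]:
    terms = tokenize(query)
    # Step 1: match query terms against document metadata only (title and tags).
    subcorpus = [
        doc for doc in corpus
        if terms & (tokenize(doc.title) | tokenize(" ".join(doc.tags)))
    ]
    # Step 2: search passages, but only inside the matched subcorpus.
    return [
        passage
        for doc in subcorpus
        for passage in doc.passages
        if terms & tokenize(passage)
    ]


if __name__ == "__main__":
    corpus = [
        Document("Python basics", ["programming"],
                 ["Python is a language.", "Lists hold items."]),
        Document("Cooking", ["food"], ["A python tutorial for chefs"]),
    ]
    print(retrieve_passages("python tutorial", corpus))
```

The efficiency gain comes from step 2 scanning only the documents that survived step 1. The trade-off, visible in the example, is that a relevant passage in a document whose metadata does not match the query is never searched.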

E-dictionary search apparatus and method for document in which Korean characters and Chinese characters are mixed

A method for providing a correct e-dictionary search result for a document recognition result includes performing character recognition of a document in which Korean characters (Hangul) and Chinese characters are mixed and displaying a recognition result. If a character string to be searched is selected by a user from the recognition result, the method determines whether the selected character string corresponds to Hangul or Chinese characters, detects a Hangul word or a Chinese word included in the selected character string, and outputs an e-dictionary search result corresponding to the detected Hangul or Chinese word. Accordingly, the user can use an e-dictionary function without directly inputting a search word and obtain a correct e-dictionary search result for a document in which Hangul and Chinese characters are mixed.
Owner:SAMSUNG ELECTRONICS CO LTD

Intelligently driven visual interface on mobile devices and tablets based on implicit and explicit user actions

A method for identifying a desired document is provided to include forming K clusters of documents and, for each cluster: for each respective document of the cluster determining a sum of distances between (i) the respective document and (ii) each of the other documents of the cluster; and identifying a medoid document of the cluster as the document of the cluster having the smallest sum of determined distances of all of the documents of the cluster. The method also includes selecting M representative documents for each cluster, identifying for dynamic display toward the user K groupings of documents, wherein each of the K groupings of documents identifies the selected M representative documents of a corresponding cluster, and, in response to user selection of one of the K groupings of documents, identifying for dynamic display toward the user P documents of the cluster that corresponds to the selected grouping.
Owner:EVOLV TECH SOLUTIONS INC
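
The medoid step in this abstract (pick the document with the smallest sum of distances to the other documents in its cluster) can be sketched as follows. The clustering itself, the choice of distance function, and the selection of the M representative documents are not covered; the function name `medoid` and the use of absolute difference as a stand-in distance are assumptions for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def medoid(cluster: Sequence[T], distance: Callable[[T, T], float]) -> T:
    """Return the member of `cluster` with the smallest sum of distances
    to the other members.

    The distance from a member to itself is assumed to be zero, so including
    it in the sum does not change the result. Ties go to the earliest member.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one document")
    return min(
        cluster,
        key=lambda doc: sum(distance(doc, other) for other in cluster),
    )


if __name__ == "__main__":
    # Stand-in "documents": points on a line, with absolute difference as distance.
    points = [0, 1, 2, 3, 10]
    print(medoid(points, lambda a, b: abs(a - b)))
```

This direct form evaluates every pair of documents, so its cost grows quadratically with cluster size.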

Method for generating a graph lattice from a corpus of one or more data graphs

A document recognition system and method, where images are represented as a collection of primitive features whose spatial relations are represented as a graph. Useful subsets of all the possible subgraphs representing different portions of images are represented over a corpus of many images. The data structure is a lattice of subgraphs, and algorithms are provided to build and use the graph lattice efficiently and effectively.
Owner:PALO ALTO RES CENT INC

Information processing device, information processing system, information processing method, program, and storage medium

An information processing device includes: a feature extracting section for extracting, as format information, a format feature of a process-target document from image data of the process-target document, on which filling-in spaces of plural items are printed; a document recognizing section for comparing the format information of the process-target document with registered format information stored in a storage device, and specifying a registered document that corresponds to the process-target document, the registered format information regarding format features of registered documents; a data acquiring section for converting characters in the image data of the process-target document into text data; and a distributing section for grouping the image data and text data of the characters into plural groups according to a separation rule that is set for the registered document, the characters being written in the fill-in spaces of the items of the process-target document, and for transmitting the different groups to different external devices. With this, information such as personal information to be protected can be processed, preventing an operator dealing with the information from obtaining the whole information.
Owner:SHARP KK

A method and a terminal for creating paper document structured data based on a deep learning model

The invention relates to a method and a terminal for creating paper document structured data based on a deep learning model. The method comprises the following steps: presetting a document training sample set, wherein each sample in the training sample set comprises a paper document OCR recognition result and a labeled document corresponding to the paper document OCR recognition result, and wherein the labeled document records position information and category information of each key field in the OCR recognition result of the paper document; training a preset first deep learning model by using the training sample set to obtain a second deep learning model; enabling the second deep learning model to analyze a first paper document OCR recognition result to obtain position information and category information of each key field in the first paper document OCR recognition result; and creating a structured document corresponding to the first paper document OCR recognition result according to the position information and the category information of each key field in the first paper document OCR recognition result. The accuracy of converting the OCR result of the paper document into the structured document is improved.
Owner:厦门商集网络科技有限责任公司