117 results about "Document structuring" patented technology

Document structuring is a subtask of natural language generation (NLG) that involves deciding the order and grouping (for example, into paragraphs) of sentences in a generated text. It is closely related to the content determination task in NLG.

Method and system for automated structuring of textual documents

Active | US7251777B1 | Maximize accuracy | Maximize completeness | Data processing applications | Natural language data processing | Document structuring | Ambiguity

Disclosed is a method for customizable, schema-guided conversion of plain-text documents, rich-text documents, and textual data records to an XML-compatible structured form. The method makes substantial use of element content model definitions from a chosen target XML schema/DTD to optimize, closely guide, and disambiguate element pattern matching and recognition. Highly granular structure can be inferred, in the best possible conformance with the schema. One embodiment operates based on a finite state machine derived via recursive aggregation of the schema element content models. Additionally disclosed is a method for automated document structuring within the environment of an XML-enabled word processor application. The method entails using the host's API to perform element pattern search and matching and to apply markup to the document in accordance with the inferred XML structure. A GUI framework integrated into the word processor workspace can be provided for developing and executing document conversion/structuring definitions.

Owner:ICTECT

Method for classifying sub-trees in semi-structured documents

Inactive | US20060288275A1 | Easy to organize | Semi-structured data indexing | Semi-structured data querying | Document structuring | Classification methods

A method and system for classifying semi-structured documents by distinguishing sub-tree structural information as a distinct representative characteristic of a fragment of the document structure identified by a sub-tree node therein. The structural information comprises both an inner structure and an outer structure which individually can be exploited as representative data in a probabilistic classifier for classifying the sub-tree itself or the entire document. Additional representative feature data can also be independently used for classification and comprises the data content of the fragment structurally represented by the sub-tree and additionally with node attributes. The classification values independently generated from each of the different sets of features can then be combined in an assembly classifier to generate an automated classification system.

Owner:XEROX CORP

Full text retrieval inquiry index method for extensible markup language document in relational database

Inactive | CN102033954A | Improve management efficiency | Improve execution efficiency | Special data processing applications | Document structuring | Relational database

The invention provides a full-text retrieval query index method for Extensible Markup Language (XML) documents in a relational database. The method comprises four steps: storing XML document data in a mark sequence-based dimensional relation table; constructing a document structure basic information table; creating a word-based inverted index on the node text column of the document structure basic information table; and carrying out full-text retrieval queries on the basis of the index. The index method can effectively improve the management efficiency of XML documents and the execution efficiency of full-text retrieval operations on them, shortening query execution time. The method is relatively general and can be integrated seamlessly with an existing relational database, because both the XML document data and the index data are stored relationally. The method can also be applied to keyword search queries over XML document data, improving their execution efficiency.

Owner:NORTHEASTERN UNIV
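
The four steps in the abstract above (relational storage of XML nodes, a node information table, a word-based inverted index on the node text column, and index-backed querying) can be illustrated with a minimal sketch. This is not the patented method: the table names `node` and `inverted`, the functions `build_index` and `search`, and the use of an in-memory SQLite database are all assumptions made for illustration, and the "mark sequence-based" encoding is simplified to a plain element path.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_index(xml_text):
    """Store XML nodes relationally and build a word-level inverted index.

    Hypothetical schema: `node` holds one row per element (path + text),
    `inverted` maps each lowercase word to the nodes whose text contains it.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, path TEXT, text TEXT)")
    conn.execute("CREATE TABLE inverted (word TEXT, node_id INTEGER)")
    conn.execute("CREATE INDEX idx_word ON inverted (word)")

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        text = (elem.text or "").strip()
        cur = conn.execute("INSERT INTO node (path, text) VALUES (?, ?)", (path, text))
        # One inverted-index row per distinct word in this node's text.
        for word in set(text.lower().split()):
            conn.execute("INSERT INTO inverted VALUES (?, ?)", (word, cur.lastrowid))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return conn


def search(conn, word):
    """Return the paths of all nodes whose text contains `word`, in document order."""
    rows = conn.execute(
        "SELECT n.path FROM inverted i JOIN node n ON n.id = i.node_id "
        "WHERE i.word = ? ORDER BY n.id",
        (word.lower(),),
    )
    return [row[0] for row in rows]
```

Because both tables live in the same relational store, a keyword lookup becomes an indexed join rather than a scan over the XML text, which is the efficiency argument the abstract makes.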

Topic model based document keyword extraction method and system

Active | CN105843795A | Good effect | Reduce the importance | Natural language data processing | Special data processing applications | Document summarization | Document structuring

The invention discloses a topic model based document keyword extraction method and system. The document keyword extraction method comprises the following steps of document information preprocessing, document structure graph construction, document topic distribution extraction, word weight extraction and keyword generation. The document keyword extraction system comprises the following modules: a document information preprocessing module, a document structure graph construction module, a document topic distribution extraction module, a word weight extraction module and a keyword generation module. According to the method and system, extracted keywords are more reasonable and related to a topic of a document more closely; and partial deficiencies in the keyword extraction field at present are overcome, a better document summarization effect is achieved, and a user can conveniently and quickly know an abstract of the document.

Owner:SOUTH CHINA UNIV OF TECH

Method and expert system for deducing document structure in document conversion

Inactive | US7313754B2 | Efficiently deducing | Improve efficiency | Digital computer details | Natural language data processing | Document preparation | Documentation

An expert system for more efficiently and accurately deducing document structure from document formatting, the expert system including a conversion engine for converting an unstructured file to a structured file, and a verification engine, responsive to the output of the conversion engine, for generating and displaying a representation of the structured file annotated with visual depictions of the classified components thereof, so that the annotations can be modified and/or classifications can be added and/or classifications can be suggested, and/or rules for classification can be suggested, and the structured file reprocessed by the conversion engine.

Owner:TEXTERITY INC

A method and a terminal for creating paper document structured data based on a deep learning model

Active | CN109800761A | Improve efficiency | Improve accuracy | Character and pattern recognition | Neural architectures | Document structuring | Document recognition

The invention relates to a method and a terminal for creating paper document structured data based on a deep learning model. The method comprises the following steps: obtaining a training sample set from a preset document sample set, wherein each sample in the document sample set comprises a paper document OCR recognition result and a labeled document corresponding to that OCR recognition result, and the labeled document records the position information and category information of each key field in the OCR recognition result; training a preset first deep learning model with the training sample set to obtain a second deep learning model; using the second deep learning model to analyze a first paper document OCR recognition result to obtain the position information and category information of each key field in it; and creating a structured document corresponding to the first paper document OCR recognition result according to that position and category information. The accuracy of converting the OCR result of a paper document into a structured document is improved.

Owner:厦门商集网络科技有限责任公司

An API knowledge graph construction method based on a reference document

Active | CN109739994A | Energy efficient computing | Text database clustering/classification | Structure analysis | Software development

The invention belongs to the technical field of software engineering and intelligent software development, and particularly relates to an API knowledge graph construction method based on a reference document. The method comprises the following steps: obtaining a basic skeleton structure of the API elements through document structure analysis, and recognizing function descriptions and usage descriptions by automatically classifying sentences in the descriptive content of each API element; performing common concept identification and linking across the description information of different API elements to realize internal knowledge fusion; and linking the common concepts in the API element description information to related technical concepts in a general knowledge graph to realize external knowledge fusion. The constructed API knowledge graph comprises API packages, classes, interfaces, methods, attributes, exceptions, method parameters, return values, and the relations among these elements. Through its structured knowledge representation, the constructed API knowledge graph supports intelligent applications such as semantic querying of API knowledge, automatic question answering, assisted code understanding, and code recommendation.

Owner:FUDAN UNIV

Model-based job supporting system and method thereof

Inactive | US6141665A | Easy to change | Digital computer details | Office automation | Service model | Supporting system

A job model with which an organization model representing an organization structure, a document model representing a document structure, and a work model representing a work procedure are correlated, is stored independent from a service model defining each service. When a service is performed, with reference to the job model corresponding to the service model, a service executing module causes a tool control module to control a tool. Thus, the required service is accomplished.

Owner:FUJITSU LTD

Method and system for document classification based on document structure and written style

Active | US8082248B2 | Richer search experience | High descriptive-images | Digital data processing details | Metadata still image retrieval | Web search engine | Document structuring

A classification method and system for documents containing text sentences and images having meta-data. The classification method and system categorizes document sentences into subjective and non-subjective sentences and categorizes document images into descriptive and non-descriptive. The categorization is further used to calculate subjectivity and descriptive-images classification of a document. This classification system can be used by a web search engine to filter, sort or tag a set of document references based on user selection.

Owner:ABOUYOUNES RANIA

A design system and method for thermal power generation projects

Inactive | CN106407572A | Real-time interaction | Real-time | Software design | Design optimisation/simulation | Software engineering | Structure of Management Information

The invention relates to a design system and method for thermal power generation projects. The layers of the system include, from bottom to top, a data object layer, a computing and processing layer, a document flow management layer, a conversion output layer and a physical layer. The data object layer includes KKS codes, static attributes, correlation relationships and document structures. The computing and processing layer is used for the computing and finished drawing generation of each subject in project design. The document flow management layer sets corresponding operating permissions according to the roles of users in different projects and different flows. The conversion output layer converts required documents and data into a data format that can be analyzed by software in project design. The physical layer comprises a server cluster and clients. The system can design drawings and inventories without manual intervention. The system can automatically complete computing and computing sheet output, realize real-time interaction and concurrent collaboration work of data of different subjects, and generally increase the electronic control process design efficiency by more than 50%.

Owner:四川电力设计咨询有限责任公司

System for validating a document conforming to a first schema with respect to a second schema

Active | US20090063952A1 | Improve computing efficiency | Natural language data processing | Special data processing applications | Document structuring | Document preparation

An improved system for determining compliance between a source document structure in accordance with a source schema and a target schema includes: data storage; and a processor for executing software code. The software code causes the processor to: create a source schema description and a target schema description; receive the source document which includes an ordered tree structure with labeled elements and including a subtree; identify all corresponding element types in the source and target schemas for grouping the corresponding element types into element type pairs; classify each element type pair; and confirm compliance of the source document.

Owner:DOMO

Data version comparison method used for Excel documents

Active | CN108009264A | Improve the efficiency of data comparison | Reduce complexity | Special data processing applications | XML schema | Document structuring

The invention relates to a data version comparison method for Excel documents. The method comprises the steps of: 1) selecting the Excel documents, performing data structuring on them, and converting them into structured data; 2) after the structured data of step 1) is obtained, describing the data of the original Excel documents in a predefined XML Schema data format and converting the structured data of the Excel documents into structured data of XML documents, thereby obtaining the structured data of the first XML document; 3) repeating steps 1)-2) to obtain the structured data of the second XML document; and 4) comparing the results of step 2) and step 3) with a bidirectional comparison method, through traversing an XML data mode according to different data of an XML document memory, to obtain a comparison result for the structured data of the first XML document and a comparison result for the structured data of the second XML document, then storing and displaying the comparison results.

Owner:BEIJING AEROSPACE MEASUREMENT & CONTROL TECH
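
The bidirectional comparison in step 4) of the abstract above can be sketched as two passes, one from each version toward the other. This is only an illustration under stated assumptions: each version is assumed to have already been reduced to a mapping from cell reference to value (the Excel-to-XML conversion is skipped), and the function name `compare_versions` and the result keys are hypothetical.

```python
def compare_versions(first, second):
    """Compare two {cell_reference: value} mappings in both directions.

    Pass 1 walks the first version and looks each cell up in the second;
    pass 2 walks the second version to catch cells the first pass cannot see.
    """
    removed = {}   # present only in the first version
    changed = {}   # present in both, with different values
    for cell, value in first.items():          # first -> second pass
        if cell not in second:
            removed[cell] = value
        elif second[cell] != value:
            changed[cell] = (value, second[cell])

    added = {cell: value for cell, value in second.items()  # second -> first pass
             if cell not in first}

    return {"removed": removed, "added": added, "changed": changed}
```

A single pass over only one version would miss cells that exist solely in the other, which is why the sketch traverses both.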

Financial document information processing method and device, electronic equipment and storage medium

Active | CN110909226A | Improve review efficiency | Improve accuracy | Database management systems | Finance | Information processing | Document structuring

The embodiment of the invention discloses a financial document information processing method and device, electronic equipment, and a storage medium. The financial document information processing method comprises the steps of: generating document structured data from a to-be-audited financial document through a document processing module; generating financial subject structured data based on the document structured data; inputting the document structured data into a text error correction model and outputting an error correction result; inputting the document structured data into a manager information casual inspection and verification module to generate a verification result for the manager information; inputting the financial subject structured data into a financial index formula calculation module, a financial subject change verification module, and a financial statement extraction verification module, respectively, to generate a verification result for the financial index formula, a verification result for financial subject changes, and a verification result for the financial subject data against the corresponding reference data; and displaying all verification results and error correction results. According to the technical scheme provided by the embodiment of the invention, the efficiency of financial document auditing can be improved.

Owner:DATAGRAND TECH INC

A method and a device for acquiring document information

Active | CN109685056A | Character and pattern recognition | Document structuring | Algorithm

The invention relates to a document information extraction method and device based on sequence labeling and a learning model. The method comprises the following steps: training at least one sequence labeling algorithm model to obtain at least one offline sequence labeling algorithm model; determining the accuracy of the annotation information in each of the offline sequence labeling algorithm models, and converting a to-be-processed document into a text document; obtaining document structure format property information from the to-be-processed document; and inputting the text document and the structure format property information into the offline sequence labeling algorithm model to obtain labeling information corresponding to the document information in the document. With this method, the key information of a document can be extracted using sequence labeling technology, and by using a multi-model fusion technology, different key information in the document can be extracted with an optimal model. In addition, business rule reasoning and calculation are carried out on a typeface extraction result, and the application range is wider.

Owner:DATAGRAND TECH INC

Method and system relating to salient content extraction for electronic content

Active | US20130311169A1 | Improve art | Natural language translation | Semantic analysis | Structure analysis | Subject matter

Individuals receive an overwhelming barrage of information which must be filtered, processed, analysed, reviewed, consolidated and distributed or acted upon. Automatic approaches to "scraping" salient content from sources of content are provided, allowing the salient content to be provided to the user or subjected to further processing such as clustering or sentiment analysis, for example. Embodiments of the invention provide for: automated scraper induction based on document and/or contextual semantic cues and document structure analysis; identifying salient text, removing boiler-plate text, off-topic content and other non-salient content; deriving reusable descriptive extraction patterns for subsequent documents; applying descriptive extraction patterns for extraction from subsequent documents from the same source; intelligent identification of extraction success confidence score, using historical success scores; and employing confidence scores to automatically trigger new extraction pattern identification if extracted confidence is below an acceptable confidence threshold.

Owner:WHYZ TECH

Systems and methods for machine content generation

Active | US20220237368A1 | Low cost | Improve communication efficiency | Semantic analysis | Digital data authentication | Document structuring | Computerized system

Computerized systems and methods are disclosed to generate a document by providing a document structure having one or more seed landmark texts therein, each landmark text including a milestone overview text and a plurality of component texts; from the milestone overview text, generating one or more computer-generated text suggestions to supplement the milestone overview text; combining the milestone overview text with each component text and generating one or more computer-generated component text suggestions; and creating the document by combining the milestone overview, the one or more computer-generated text suggestions, and each component text with corresponding one or more computer-generated component text suggestions.

Owner:TRAN BAO

Link management of document structures

Inactive | US7275066B2 | Avoid learning | Data processing applications | Digital data processing details | Link management | Data science

Links are managed and units of information are linked based on a list having identifiers placed in a hierarchical order relative to other identifiers, the identifiers for identifying the units of information. Lists are stored and examined to determine the hierarchical order of the identifiers relative to the other identifiers, and a unit of information is linked to at least one other unit of information based on a relative hierarchical order between an identifier identifying the unit of information and another identifier identifying at least one other unit of information.

Owner:GOOGLE LLC

Electronic conversation text summarization

Active | US7886012B2 | Natural language data processing | Multiple digital computer combinations | Document structuring | Document preparation

Summarization of text in a document may be requested in dependence upon the position of the text in relation to other text within the document or the position of the document containing the text within a plurality of documents in a document structure. Summarization of text in a document may also be requested in dependence upon a user's interaction with an application in conjunction with a version of the document or with a document structure including the document. Different levels of summarization may be applied to different segments of text within a document.

Owner:SNAP INC

Probabilistic learning method for XML annotation of documents

Inactive | US20070022373A1 | Natural language data processing | Special data processing applications | Document structuring | Theoretical computer science

A document processor includes a parser that parses a document using a grammar having a set of terminal elements for labeling leaves, a set of non-terminal elements for labeling nodes, and a set of transformation rules. The parsing generates a parsed document structure including terminal element labels for fragments of the document and a nodes tree linking the terminal element labels and conforming with the transformation rules. An annotator annotates the document with structural information based on the parsed document structure.

Owner:XEROX CORP

Search methods and various applications

Inactive | US20110112993A1 | Easy access | Digital data processing details | Digital computer details | Information retrieval | Knowledge structure

The present invention relates to a system and method for information processing using an artificially constructed apparatus. More specifically, in one preferred embodiment, documents can be processed so that the most relevant terms of their contents can be obtained and searched. In another preferred embodiment, the invention provides a system and method that can search for information in a document structure and provide precise results by analyzing the inputs and search results using the executing system and the knowledge structure of the think system.

Owner:ZHANG QIN

OFD document webpage end browsing method and system

Active | CN110765385A | Reduce time loss | Reduce waiting time | Special data processing applications | Web data browsing optimisation | Document structuring | Documentation

The invention provides an OFD document webpage end browsing method and system, and the method comprises the steps: a browser transmitting an OFD document identification to a server, and the server returning the page structure information of an OFD document to the browser; and the browser grouping the OFD documents according to the received information, generating HTML tags of all groups, and generating HTML tags containing pages under the current group; if the current page is the nth page, the browser querying whether the (n-1)th page, the nth page, the (n + 1)th page and the (n + 2)th page are loaded or not in a browser cache, if yes, not processing, and if not, requesting corresponding page data to be loaded to the server. An asynchronous loading mode is adopted, loading is carried out according to needs, waiting time of a browser end is shortened, an HTML document structure is simplified, rendering pressure of a browser can be reduced, and response speed of the browser is improved.

Owner:BEIJING THUNISOFT INFORMATION TECH
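
The on-demand loading rule in the abstract above (when page n is current, make sure pages n-1 through n+2 are loaded, and request only those missing from the cache) can be sketched as follows. The function name `pages_to_request`, the 1-based page numbering, and the representation of the browser cache as a set of page numbers are assumptions for illustration; the sketch is written in Python rather than browser-side code to keep the logic easy to check.

```python
def pages_to_request(current, total_pages, cached):
    """Return the page numbers that still need to be fetched from the server.

    The window is n-1 .. n+2 around the current page n, clipped to the
    document's page range (pages are assumed to be numbered from 1).
    """
    window = range(current - 1, current + 3)  # n-1, n, n+1, n+2
    return [page for page in window
            if 1 <= page <= total_pages and page not in cached]
```

For instance, with page 5 current in a 10-page document and pages 4 and 5 already cached, only pages 6 and 7 would be requested; at the end of a fully cached document nothing is requested at all.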

Document structured data embedding method and system

Active | CN111259202A | Extended editing | Easy to manage | Semi-structured data indexing | Natural language data processing | Data ingestion | Data control

The invention relates to the fields of computer knowledge management and data embedding, and in particular to a document structured data embedding method and system. The system comprises a template generator, a document editor, a structured data collector, a data authentication processor, a structured data controller, a template library, and a data extraction and conversion interface. The method comprises the following steps: constructing a document structured framework template; pre-loading it into a document editor; completing the editing of the structured data labels and the extensible semi-structured data labels; extracting the edited document data and converting it into XML structural body data and document attribute fields; embedding the structural body data into a target format file; and extracting the structured data and the extensible semi-structured data from the structural body data. With this method, documents can meet the requirements of manual reading, understanding, use, and filing; documents embedded with the structural data can be collected and processed automatically; and the requirements for document standardization and data precision can be effectively controlled.

Owner:XINING NINGGUANG ENG CONSULTATION +1

Global normalized reader systems and methods

Active | CN108733742A | Mathematical models | Semantic analysis | Beam search | Theoretical computer science

Presented herein are systems and methods for question answering (QA). In embodiments, extractive question answering (QA) is cast as an iterative search problem through the document's structure: select the answer's sentence, start word, and end word. This representation reduces the space of each search step and allows computation to be conditionally allocated to promising search paths. In embodiments, globally normalizing the decision process and back-propagating through beam search makes this representation viable and learning efficient. Various model embodiments, referred to as Globally Normalized Readers (GNR), achieve excellent performance. Also introduced are embodiments of data augmentation that produce semantically valid examples by aligning named entities to a knowledge base and swapping in new entities of the same type. This methodology also improved the performance of GNR models and is of independent interest for a variety of natural language processing (NLP) tasks.

Owner:BAIDU USA LLC

Document structuring method and device

Pending | CN110175322A | Realize the structure | Guaranteed accuracy | Semantic analysis | Special data processing applications | Document structuring | Documentation

The invention provides a document structuring method and device. The method comprises the steps of: dividing a to-be-structured document into a plurality of single-chapter documents according to a text structure recognition model; calculating the similarity between each chapter title and each template name in the structured template to obtain an adaptive template name; calculating the similarity between the elements corresponding to the adaptive template name and the subordinate statements of the corresponding chapter title to obtain the adaptive statements; and filling the adaptive statements of all the single-chapter documents into the corresponding fillable areas of the structured template to obtain the structured document. With the method and device provided by the invention, an unstructured document can be accurately divided according to the preset structured template, and a structured document that corresponds to the template names and elements can be accurately generated, which ensures the accuracy of subsequently determining the key points.

Owner:ZHONGKE DINGFU BEIJING TECH DEV

Encoding/decoding apparatus, method and computer program

Inactive | US8250465B2 | Small data size | Digital data processing details | Code conversion | Information processing | Document preparation

An information processing apparatus comprises a readout unit adapted to read out, from a storage unit, correspondence information that includes a document structure of a structured document and a first code for encoding the document structure; a verification unit adapted to verify whether grammar of a portion included in a structured document for processing is valid, based on the document structure included in the correspondence information; and an encoding unit adapted to encode the structured document using the first code, in relation to a portion whose grammar is verified as being valid by the verification unit.

Owner:CANON KK

Document structuration organizing method and device

Active | CN103678302A | Implement automatic organization | Easy to read | Special data processing applications | Text database clustering/classification | Knowledge Field | Documentation

The invention discloses a document structuration organizing method and device. The document structuration organizing method includes the steps of obtaining a theme framework of a hierarchical structure, forming a searching condition through a theme text in the theme framework, carrying out searching in a preset document set with the searching condition, and adding a document into a corresponding theme document set in the theme framework according to the matching condition of the searching result and the searching condition. Compared with the prior art, the technical scheme of the document structuration organizing method and device can be used for automatically building proper classification systems according to different knowledge fields; as the theme framework is built with mature expert knowledge, inner links of classifications can be well reflected, and a user can conveniently read a large number of texts in a systematized mode.
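
The steps above can be sketched roughly as follows. This is a sketch under assumptions, not the patented implementation: the search condition is assumed to be a conjunction of the theme text and its ancestor themes, matching is plain case-insensitive substring search, and `organize` and the sample data are hypothetical.

```python
def organize(theme_tree, documents):
    """Assign documents to the themes of a hierarchical theme framework.

    theme_tree : nested dict {theme text: subtree}
    documents  : {document id: document text}
    Returns {theme path (tuple): [matching document ids]}.
    """
    assignments = {}

    def visit(node, path):
        for theme, children in node.items():
            theme_path = path + (theme,)
            # Assumed search condition: every theme text on the path must occur.
            terms = [term.lower() for term in theme_path]
            assignments[theme_path] = [
                doc_id
                for doc_id, text in documents.items()
                if all(term in text.lower() for term in terms)
            ]
            visit(children, theme_path)

    visit(theme_tree, ())
    return assignments

themes = {"battery": {"anode": {}, "cathode": {}}}
docs = {
    "d1": "A silicon anode for a lithium battery.",
    "d2": "Battery cathode coating method.",
    "d3": "A document editing system.",
}
by_theme = organize(themes, docs)
```

Documents that match no theme, such as `d3` here, simply stay out of every theme document set.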

Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Text segmentation and topic annotation for document structuring

Inactive | CN1894686A | Natural language analysis | Special data processing applications | Document structuring | Computerized system

The invention relates to a method, a computer program product and a computer system for structuring an unstructured text by making use of statistical models trained on annotated training data. Each section into which the text is segmented is further assigned to a topic, which is associated with a set of labels. The statistical models for the segmentation of the text and for the assignment of a topic and its associated labels to a section of text explicitly account for: correlations between a section of text and a topic, a topic transition between sections, a topic position within the document, and a (topic-dependent) section length. Hence structural information of the training data is exploited in order to perform segmentation and annotation of unknown text.

Owner:KONINKLIJKE PHILIPS ELECTRONICS NV

Document online editing system and method based on authority control

Pending | CN112258140A | Achieve sharing | Guaranteed independence | Natural language data processing | Digital data authentication | Document structuring | Internet privacy

One embodiment of the invention discloses a document online editing system and method based on authority control. The system comprises: a document management module used for storing document templates and all documents edited by users, and for setting the authority levels corresponding to the document templates and the documents; a personnel management module used for dividing the authority levels of the users; a document structuralization module used for performing structuralization division on the document templates or the documents; an authority module used for distributing the structurally divided document templates or documents to the users according to the authority levels of the document templates or documents and the authority levels of the users; and an online editing module used for the users to edit the document templates or the documents online and to store the content.
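
The interplay of the structuralization, authority and editing modules can be sketched as below. The abstract does not say how authority levels are compared, so this sketch assumes integer levels where a user may see and edit a section only if the user's level is at least the section's level; `Section`, `sections_for` and `edit_section` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Section:
    title: str
    body: str
    level: int  # assumed: minimum user authority level needed for this section

def sections_for(user_level, sections):
    """Authority module: distribute only the sections the user is cleared for."""
    return [s for s in sections if user_level >= s.level]

def edit_section(user_level, sections, title, new_body):
    """Online editing module: store new content if the user's level allows it."""
    for section in sections:
        if section.title == title:
            if user_level < section.level:
                raise PermissionError(f"level {user_level} cannot edit {title!r}")
            section.body = new_body
            return section
    raise KeyError(title)

# A document after structuralization division into sections.
document = [
    Section("Overview", "Public summary.", level=1),
    Section("Test results", "Internal figures.", level=2),
    Section("Design secrets", "Restricted details.", level=3),
]

visible = [s.title for s in sections_for(2, document)]
edit_section(2, document, "Test results", "Updated internal figures.")
```

Keeping the level on each section rather than on the whole document is what lets several users edit the same document while each sees only the parts matching their authority.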

Owner:BEIJING SIMULATION CENT

General document identification method and system, terminal and storage medium

Pending | CN112699234A | Improve accuracy | Field changes less | Natural language data processing | Neural architectures | Document structuring | Structured content

The invention provides a universal document recognition method, which comprises the following steps: obtaining text information of one or more text fields in a document, the text information comprising text content and a text bounding box; obtaining category information in one-to-one correspondence with the one or more text fields in the document, wherein the category information at least comprises a primary key field category Key and a value field category Value; obtaining a connection relationship between each text field whose category is Key and the other text fields; and, on the basis of the connection relationship, obtaining a Value-class text field connected or disconnected with a Key-class text field and/or a Key-class text field as structured content corresponding to the Key-class text field, determining the class information and text information of the structured content, and completing identification of the document. The invention also provides a corresponding system, a terminal and a storage medium. The method and the device improve the accuracy and universality of document structured content identification.
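
The data flow of this method (text fields with bounding boxes and Key/Value categories, then connections, then structured content) can be sketched as follows. The abstract does not state how the connection relationship is obtained, so this sketch assumes a simple geometric rule, linking each Value field to the Key field with the nearest box center; `Field`, `center` and `link_fields` are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Field:
    text: str        # recognized text content
    box: tuple       # bounding box as (x0, y0, x1, y1)
    category: str    # "Key" or "Value"

def center(box):
    """Center point of a bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def link_fields(fields):
    """Attach every Value field to its nearest Key field (assumed connection rule)."""
    keys = [f for f in fields if f.category == "Key"]
    values = [f for f in fields if f.category == "Value"]
    structured = {k.text: [] for k in keys}
    if not keys:
        return structured
    for value in values:
        nearest = min(keys, key=lambda k: math.dist(center(k.box), center(value.box)))
        structured[nearest.text].append(value.text)
    return structured

fields = [
    Field("Name", (0, 0, 40, 10), "Key"),
    Field("Alice", (50, 0, 90, 10), "Value"),
    Field("Date", (0, 20, 40, 30), "Key"),
    Field("2020-01-01", (50, 20, 110, 30), "Value"),
]
structured = link_fields(fields)
```

A Key with no linked Value keeps an empty list, which loosely mirrors the "disconnected" case mentioned in the abstract.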

Owner:上海深杳智能科技有限公司 +1

Structured document retrieval device and program

Inactive | CN103425719A | Special data processing applications | Document structuring | Root element

The invention provides a structured document retrieval device and program, capable of performing structure retrieval combining both of the structure information based on an XML label and the structure information based on a comment label. The device comprises: a processor, which executes the program; a first storage region, which stores the program; a second storage region, which stores a structured document satisfying a tree structure condition and comment data added onto the document; a document structure list building part, which aims at a root element generalized structure of a DOM tree individually obtained based on including relations of the labels of the structured document and the comment data, distributes a text of the structured document, and generates a text common DOM tree; and a retrieval process part, which indexes elements according with the retrieval from the text common DOM tree.

Owner:HITACHI LTD
