117 patent results for "Document structuring"

Document structuring is a subtask of natural language generation (NLG) that involves deciding the order and grouping (for example, into paragraphs) of sentences in a generated text. It is closely related to the NLG task of content determination.

Method and system for automated structuring of textual documents

Disclosed is a method for customizable, schema-guided conversion of plain-text documents, rich-text documents and textual data records to an XML-compatible structured form. The method makes substantial use of element content model definitions from a chosen target XML schema / DTD to optimize, closely guide, and disambiguate element pattern matching and recognition. Highly granular structure can be inferred in the best possible conformance with the schema. One embodiment operates based on a finite state machine derived via recursive aggregation of the schema element content models. Additionally disclosed is a method for automated document structuring within the environment of an XML-enabled word processor application. The method entails using the host's API to perform element pattern search and matching and to apply markup to the document in accordance with the inferred XML structure. A GUI framework integrated into the word processor workspace can be provided for developing and executing document conversion / structuring definitions.
Owner:ICTECT
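
Because an XML element content model is a regular language, it can be recognized by a finite-state machine. The minimal Python sketch below illustrates that idea only; it is not the patented method. It assumes a toy content model (not taken from the patent) in which a `section` holds one `title` followed by one or more `para` or `list` children, and it lets Python's `re` engine stand in for the derived state machine. The names `CONTENT_MODEL` and `conforms` are hypothetical.

```python
import re

# Hypothetical DTD-style content model: section = (title, (para | list)+)
# Each child element name is encoded as a token followed by a single space.
CONTENT_MODEL = r"title (?:(?:para|list) )+"

# The compiled regex plays the role of the finite-state recognizer.
_recognizer = re.compile(CONTENT_MODEL)


def conforms(children: list[str]) -> bool:
    """Return True if the sequence of child element names matches the model."""
    encoded = "".join(name + " " for name in children)
    return _recognizer.fullmatch(encoded) is not None


print(conforms(["title", "para", "list", "para"]))  # True
print(conforms(["para", "title"]))                   # False
```

In a schema-guided converter, a check like this would be used to prefer the element labeling under which a candidate sequence still conforms to the target schema.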

Method for classifying sub-trees in semi-structured documents

A method and system for classifying semi-structured documents by distinguishing sub-tree structural information as a distinct representative characteristic of the document fragment identified by a sub-tree node. The structural information comprises both an inner structure and an outer structure, each of which can be exploited as representative data in a probabilistic classifier for classifying the sub-tree itself or the entire document. Additional representative feature data, which can also be used independently for classification, comprises the data content of the fragment represented by the sub-tree, together with node attributes. The classification values generated independently from each of these feature sets can then be combined in an assembly classifier to produce an automated classification system.
Owner:XEROX CORP

Full text retrieval inquiry index method for extensible markup language document in relational database

The invention provides a full-text retrieval query index method for Extensible Markup Language (XML) documents stored in a relational database. The method comprises four steps: storing XML document data in a dimensional relation table based on tag sequences; constructing a basic document-structure information table; creating a word-based inverted index on the node-text column of that table; and performing full-text retrieval queries using the index. The index method effectively improves both the management efficiency of XML documents and the execution efficiency of full-text retrieval over them, shortening query execution time. Because the XML document data and the index data are both stored relationally, the method is broadly applicable and integrates seamlessly with existing relational databases. The method can also be applied to keyword queries over XML document data, improving their execution efficiency.
Owner:NORTHEASTERN UNIV
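
A word-based inverted index on a node-text column can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed schemas: the `node` table (one row per XML node with its path and text) and the `postings` table are hypothetical simplifications, not the patent's tag-sequence relation table.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical "document structure basic information" table: one row per XML node.
cur.execute("CREATE TABLE node (doc_id INTEGER, node_id INTEGER, path TEXT, text TEXT)")
# Word-based inverted index built over the node text column.
cur.execute("CREATE TABLE postings (word TEXT, doc_id INTEGER, node_id INTEGER)")
cur.execute("CREATE INDEX idx_postings_word ON postings (word)")

nodes = [
    (1, 1, "/book/title", "XML retrieval in relational databases"),
    (1, 2, "/book/chapter/para", "An inverted index maps each word to nodes"),
    (2, 1, "/article/title", "Relational storage of XML"),
]
cur.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", nodes)

# Tokenize each node's text and add one posting per distinct lowercase word.
for doc_id, node_id, _path, text in nodes:
    for word in set(re.findall(r"\w+", text.lower())):
        cur.execute("INSERT INTO postings VALUES (?, ?, ?)", (word, doc_id, node_id))


def search(word: str) -> list[tuple[int, str]]:
    """Return (doc_id, path) for every node whose text contains the word."""
    rows = cur.execute(
        "SELECT n.doc_id, n.path FROM postings p "
        "JOIN node n ON n.doc_id = p.doc_id AND n.node_id = p.node_id "
        "WHERE p.word = ? ORDER BY n.doc_id, n.node_id",
        (word.lower(),),
    )
    return rows.fetchall()


print(search("XML"))  # [(1, '/book/title'), (2, '/article/title')]
```

Keeping both tables relational is what lets such an index live alongside existing relational data, which is the integration benefit the abstract claims.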

Topic model based document keyword extraction method and system

The invention discloses a topic-model-based document keyword extraction method and system. The extraction method comprises the following steps: document information preprocessing, document structure graph construction, document topic distribution extraction, word weight extraction, and keyword generation. The extraction system comprises corresponding modules: a document information preprocessing module, a document structure graph construction module, a document topic distribution extraction module, a word weight extraction module, and a keyword generation module. With this method and system, the extracted keywords are more reasonable and more closely related to the document's topic; some current deficiencies in keyword extraction are overcome, a better document summarization effect is achieved, and users can quickly grasp a summary of the document.
Owner:SOUTH CHINA UNIV OF TECH

Method and expert system for deducing document structure in document conversion

An expert system for more efficiently and accurately deducing document structure from document formatting. The expert system includes a conversion engine for converting an unstructured file to a structured file, and a verification engine, responsive to the output of the conversion engine, for generating and displaying a representation of the structured file annotated with visual depictions of its classified components, so that annotations can be modified, classifications can be added or suggested, rules for classification can be suggested, and the structured file can be reprocessed by the conversion engine.
Owner:TEXTERITY INC

A method and a terminal for creating paper document structured data based on a deep learning model

The invention relates to a method and a terminal for creating structured data from paper documents based on a deep learning model. The method comprises the following steps: presetting a document training sample set, wherein each sample comprises an OCR recognition result of a paper document and a labeled document corresponding to that result, the labeled document recording the position information and category information of each key field in the OCR recognition result; training a preset first deep learning model with the training sample set to obtain a second deep learning model; using the second deep learning model to analyze the OCR recognition result of a first paper document to obtain the position information and category information of each key field in it; and creating a structured document corresponding to that OCR recognition result according to the position information and category information of each key field. The method improves the accuracy of converting paper-document OCR results into structured documents.
Owner:Xiamen Shangji Network Technology Co., Ltd. (厦门商集网络科技有限责任公司)

An API knowledge graph construction method based on a reference document

The invention belongs to the technical field of software engineering and intelligent software development, and particularly relates to a method for constructing an API knowledge graph from reference documentation. The method comprises: obtaining the basic skeleton structure of each API element through document structure analysis, and recognizing functional descriptions and usage-pattern descriptions by automatically classifying the sentences in the element's descriptive content; identifying and linking common concepts across the description information of different API elements to achieve internal knowledge fusion; and linking the common concepts in the API element descriptions to related technical concepts in a general knowledge graph to achieve external knowledge fusion. The constructed API knowledge graph contains API packages, classes, interfaces, methods, attributes, exceptions, method parameters, return values, and the relations among these elements. Through this structured knowledge representation, the graph supports intelligent applications such as semantic API knowledge queries, automatic question answering, assisted code comprehension, and code recommendation.
Owner:FUDAN UNIV

Model-based job supporting system and method thereof

A job model with which an organization model representing an organization structure, a document model representing a document structure, and a work model representing a work procedure are correlated, is stored independent from a service model defining each service. When a service is performed, with reference to the job model corresponding to the service model, a service executing module causes a tool control module to control a tool. Thus, the required service is accomplished.
Owner:FUJITSU LTD

Method and system for document classification based on document structure and written style

A classification method and system for documents containing text sentences and images having meta-data. The classification method and system categorizes document sentences into subjective and non-subjective sentences and categorizes document images into descriptive and non-descriptive. The categorization is further used to calculate subjectivity and descriptive-images classification of a document. This classification system can be used by a web search engine to filter, sort or tag a set of document references based on user selection.
Owner:ABOUYOUNES RANIA

A design system and method for thermal power generation projects

The invention relates to a design system and method for thermal power generation projects. The layers of the system include, from bottom to top, a data object layer, a computing and processing layer, a document flow management layer, a conversion output layer and a physical layer. The data object layer includes KKS codes, static attributes, correlation relationships and document structures. The computing and processing layer is used for the computing and finished drawing generation of each subject in project design. The document flow management layer sets corresponding operating permissions according to the roles of users in different projects and different flows. The conversion output layer converts required documents and data into a data format that can be analyzed by software in project design. The physical layer comprises a server cluster and clients. The system can design drawings and inventories without manual intervention. The system can automatically complete computing and computing sheet output, realize real-time interaction and concurrent collaboration work of data of different subjects, and generally increase the electronic control process design efficiency by more than 50%.
Owner:Sichuan Electric Power Design & Consulting Co., Ltd. (四川电力设计咨询有限责任公司)

Data version comparison method used for Excel documents

Active · CN108009264A · Improve the efficiency of data comparison · Reduce complexity · Special data processing applications · XML schema · Document structuring
The invention relates to a data version comparison method for Excel documents. The method comprises the following steps: 1) selecting the Excel documents, performing data structuring on them, and converting them into structured data; 2) after the structured data from step 1) is obtained, describing the original Excel documents in a predefined XML Schema data format and converting their structured data into the structured data of an XML document, thereby obtaining the structured data of a first XML document; 3) repeating steps 1) and 2) to obtain the structured data of a second XML document; and 4) comparing the results of steps 2) and 3) with a bidirectional comparison method that traverses the XML data schema according to the differing data held in XML document memory, obtaining comparison results for the structured data of the first and second XML documents, and then storing and displaying the comparison results.
Owner:BEIJING AEROSPACE MEASUREMENT & CONTROL TECH
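
The convert-to-XML-then-compare-both-ways idea can be illustrated with the standard library's `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions: sheets are given as in-memory lists of rows (no real Excel file is read), and the `<sheet>/<row>/<cell>` layout and the helper names `rows_to_xml`, `cells`, and `compare` are hypothetical stand-ins for the patent's predefined XML Schema format.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows: list[list[str]]) -> ET.Element:
    """Convert a sheet (list of rows) into a hypothetical <sheet><row><cell> tree."""
    sheet = ET.Element("sheet")
    for r, row in enumerate(rows, start=1):
        row_el = ET.SubElement(sheet, "row", index=str(r))
        for c, value in enumerate(row, start=1):
            cell = ET.SubElement(row_el, "cell", col=str(c))
            cell.text = value
    return sheet


def cells(sheet: ET.Element) -> dict[tuple[int, int], str]:
    """Traverse the XML tree and flatten it into {(row, col): value}."""
    out = {}
    for row_el in sheet.iter("row"):
        for cell in row_el.iter("cell"):
            out[(int(row_el.get("index")), int(cell.get("col")))] = cell.text or ""
    return out


def compare(old: ET.Element, new: ET.Element) -> dict[str, list]:
    """Bidirectional comparison: scan old against new, then new against old."""
    a, b = cells(old), cells(new)
    removed = sorted(k for k in a if k not in b)
    changed = sorted(k for k in a if k in b and a[k] != b[k])
    added = sorted(k for k in b if k not in a)
    return {"removed": removed, "changed": changed, "added": added}


v1 = rows_to_xml([["Item", "Qty"], ["Bolt", "10"]])
v2 = rows_to_xml([["Item", "Qty"], ["Bolt", "12"], ["Nut", "5"]])
print(compare(v1, v2))
# {'removed': [], 'changed': [(2, 2)], 'added': [(3, 1), (3, 2)]}
```

Scanning in both directions is what catches cells that exist in only one version; a one-way scan would report changes and removals but miss additions.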

Financial document information processing method and device, electronic equipment and storage medium

The embodiment of the invention discloses a financial document information processing method and device, electronic equipment, and a storage medium. The method comprises the following steps: generating document structured data from a financial document to be audited through a document processing module; generating financial-subject structured data based on the document structured data; inputting the document structured data into a text error correction model and outputting an error correction result; inputting the document structured data into a manager-information spot-check and verification module to generate a verification result for the manager information; inputting the financial-subject structured data into a financial index formula calculation module, a financial-subject change verification module, and a financial statement extraction verification module, respectively, to generate a verification result for the financial index formulas, a verification result for financial-subject changes, and a verification result comparing the financial-subject data with the corresponding reference data; and displaying all verification results and error correction results. The technical scheme provided by the embodiment improves the efficiency of financial document auditing.
Owner:DATAGRAND TECH INC

A method and a device for acquiring document information

The invention relates to a document information extraction method and device based on sequence labeling and learning models. The method comprises the following steps: training at least one sequence labeling algorithm model to obtain at least one offline sequence labeling algorithm model; determining the accuracy of the annotation information in each offline sequence labeling algorithm model, and converting a document to be processed into a text document; obtaining document structure format property information from the document to be processed; and inputting the text document and the structure format property information into the offline sequence labeling algorithm model to obtain labeling information corresponding to the document information in the document. With this method, key information in the document can be extracted using sequence labeling, and a multi-model fusion technique allows each kind of key information to be extracted by the best-performing model. In addition, business-rule reasoning and calculation are applied to the extraction results, which widens the range of applications.
Owner:DATAGRAND TECH INC

Method and system relating to salient content extraction for electronic content

Individuals receive an overwhelming barrage of information that must be filtered, processed, analysed, reviewed, consolidated, and distributed or acted upon. Automatic approaches to “scraping” salient content from content sources are provided, allowing the salient content to be delivered to the user or subjected to further processing such as clustering or sentiment analysis. Embodiments of the invention provide for:
  • automated scraper induction based on document and / or contextual semantic cues and document structure analysis;
  • identifying salient text and removing boilerplate text, off-topic content, and other non-salient content;
  • deriving reusable descriptive extraction patterns for subsequent documents;
  • applying descriptive extraction patterns to extract content from subsequent documents from the same source;
  • intelligently identifying an extraction-success confidence score using historical success scores; and
  • using confidence scores to automatically trigger identification of a new extraction pattern when the extraction confidence falls below an acceptable threshold.
Owner:WHYZ TECH
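
The last two items, a confidence score built from historical success scores and a threshold that triggers new pattern induction, can be sketched in a few lines of Python. This is a hypothetical illustration, not the patent's scoring: it assumes confidence is the mean of the last five success scores and a threshold of 0.6, and the names `ExtractionPattern` and `record_and_check` are invented for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExtractionPattern:
    """Hypothetical per-source extraction pattern with a bounded score history."""
    source: str
    selector: str
    history: deque = field(default_factory=lambda: deque(maxlen=5))

    def confidence(self) -> float:
        # Assumption: confidence is the mean of the most recent success scores.
        return sum(self.history) / len(self.history) if self.history else 0.0


def record_and_check(pattern: ExtractionPattern, score: float, threshold: float = 0.6) -> bool:
    """Record a new success score; return True if a new pattern should be induced."""
    pattern.history.append(score)
    return pattern.confidence() < threshold


p = ExtractionPattern("example-news-site", "article > p")
for s in (0.9, 0.8, 0.4, 0.3, 0.2):
    needs_new_pattern = record_and_check(p, s)
print(round(p.confidence(), 2), needs_new_pattern)  # 0.52 True
```

A bounded history makes the confidence track recent behaviour, so a source whose layout changes drags the score below the threshold within a few documents.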

Systems and methods for machine content generation

Computerized systems and methods are disclosed to generate a document by providing a document structure having one or more seed landmark texts therein, each landmark text including a milestone overview text and a plurality of component texts; from the milestone overview text, generating one or more computer-generated text suggestions to supplement the milestone overview text; combining the milestone overview text with each component text and generating one or more computer-generated component text suggestions; and creating the document by combining the milestone overview, the one or more computer-generated text suggestions, and each component text with corresponding one or more computer-generated component text suggestions.
Owner:TRAN BAO

Link management of document structures

Links are managed and units of information are linked based on a list having identifiers placed in a hierarchical order relative to other identifiers, the identifiers for identifying the units of information. Lists are stored and examined to determine the hierarchical order of the identifiers relative to the other identifiers, and a unit of information is linked to at least one other unit of information based on a relative hierarchical order between an identifier identifying the unit of information and another identifier identifying at least one other unit of information.
Owner:GOOGLE LLC
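
One possible reading of "linking based on relative hierarchical order" is that each unit is linked to the nearest earlier identifier at a shallower level, i.e. its parent in an outline. The Python sketch below implements only that interpretation; the `(level, identifier)` input format and the function name `link_by_hierarchy` are assumptions, not the patent's data model.

```python
def link_by_hierarchy(outline: list[tuple[int, str]]) -> dict[str, str]:
    """Link each unit to the nearest earlier identifier at a shallower level.

    `outline` is a hypothetical list of (level, identifier) pairs in document order.
    """
    links = {}
    stack: list[tuple[int, str]] = []  # ancestors of the current position
    for level, ident in outline:
        # Drop entries that are at the same depth or deeper than this identifier.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            links[ident] = stack[-1][1]
        stack.append((level, ident))
    return links


outline = [(1, "intro"), (2, "scope"), (2, "terms"), (3, "xml"), (1, "method")]
print(link_by_hierarchy(outline))
# {'scope': 'intro', 'terms': 'intro', 'xml': 'terms'}
```

The stack keeps only the current chain of ancestors, so the whole list is processed in a single linear pass.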

Probabilistic learning method for XML annotation of documents

A document processor includes a parser that parses a document using a grammar having a set of terminal elements for labeling leaves, a set of non-terminal elements for labeling nodes, and a set of transformation rules. The parsing generates a parsed document structure including terminal element labels for fragments of the document and a node tree linking the terminal element labels and conforming to the transformation rules. An annotator annotates the document with structural information based on the parsed document structure.
Owner:XEROX CORP

Search methods and various applications

The present invention relates to a system and method for information processing using an artificially constructed apparatus. More specifically, in one preferred embodiment, documents can be processed so that the most relevant terms of their contents can be obtained and searched. In another preferred embodiment, the present invention provides a system and method that can search for information in a document structure and provide precise results by analyzing the inputs and search results using the executing system and the knowledge structure of the think system.
Owner:ZHANG QIN

OFD document webpage end browsing method and system

The invention provides a method and system for browsing OFD documents in a web page. The method comprises the following steps: the browser sends an OFD document identifier to the server, and the server returns the page structure information of the OFD document to the browser; the browser groups the OFD document according to the received information, generates HTML tags for all groups, and generates HTML tags for the pages in the current group; if the current page is page n, the browser checks its cache for pages n-1, n, n+1, and n+2, and for any that are not yet loaded, requests the corresponding page data from the server. Because pages are loaded asynchronously and on demand, browser-side waiting time is shortened, the HTML document structure is simplified, rendering load on the browser is reduced, and browser response speed is improved.
Owner:BEIJING THUNISOFT INFORMATION TECH
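
The prefetch window described above (pages n-1 through n+2, skipping anything already cached) reduces to a small set computation. The Python sketch below is a hypothetical illustration: it assumes 1-based page numbers, clips the window to the document's page range, and only computes which pages to request; the asynchronous request itself is not modeled.

```python
def pages_to_request(current: int, total: int, cached: set[int]) -> list[int]:
    """Return pages in the window n-1 .. n+2 that are not yet cached.

    Pages are 1-based; the window is clipped to the range 1 .. total.
    """
    window = range(max(1, current - 1), min(total, current + 2) + 1)
    return [page for page in window if page not in cached]


print(pages_to_request(5, 10, {4, 5}))  # [6, 7]
print(pages_to_request(1, 10, set()))   # [1, 2, 3]
print(pages_to_request(9, 10, {8}))     # [9, 10]
```

Looking two pages ahead but only one page back reflects the assumption that readers mostly scroll forward.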

Document structured data embedding method and system

The invention relates to the field of computer knowledge management, and in particular to data embedding: a method and system for embedding structured data in documents. The system comprises a template generator, a document editor, a structured data collector, a data authentication processor, a structured data controller, a template library, and a data extraction and conversion interface. The method comprises the following steps: constructing a structured document framework template; preloading it into the document editor; completing the editing of structured data labels and extensible semi-structured data labels; extracting the edited document data and converting it into XML structure data and document attribute fields; embedding the structure data into a target-format file; and extracting the structured data and extensible semi-structured data from the structure data. With this method, documents meet the requirements for human reading, understanding, use, and filing; documents with embedded structured data can be collected and processed automatically; and the standardization level and data precision of documents can be effectively controlled.
Owner:XINING NINGGUANG ENG CONSULTATION +1

Global normalized reader systems and methods

Presented herein are systems and methods for question answering (QA). In embodiments, extractive QA is cast as an iterative search problem through the document's structure: select the answer's sentence, start word, and end word. This representation reduces the space of each search step and allows computation to be conditionally allocated to promising search paths. In embodiments, globally normalizing the decision process and back-propagating through beam search makes this representation viable and learning efficient. Various model embodiments, referred to as Globally Normalized Readers (GNR), achieve excellent performance. Also introduced are embodiments of data augmentation that produce semantically valid examples by aligning named entities to a knowledge base and swapping in new entities of the same type. This methodology also improved the performance of GNR models and is of independent interest for a variety of natural language processing (NLP) tasks.
Owner:BAIDU USA LLC
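
The three-step search (sentence, then start word, then end word) with a beam and a single global normalization can be sketched as follows. This is an inference-only illustration under stated assumptions: the per-step scores are hypothetical numbers rather than outputs of a trained GNR network, and normalization is taken over the complete paths surviving in the beam; the training-time back-propagation through beam search is not shown.

```python
import math


def gnr_beam_search(sent_scores, start_scores, end_scores, beam=2):
    """Search sentence -> start word -> end word, keeping `beam` paths per step.

    sent_scores[s], start_scores[s][i] and end_scores[s][i][j] are hypothetical
    unnormalized scores. A path's score is the sum of its step scores, and the
    surviving complete paths are normalized with one softmax (global
    normalization) instead of one softmax per decision.
    """
    # Step 1: choose the answer sentence.
    paths = sorted(((sc, (s,)) for s, sc in enumerate(sent_scores)), reverse=True)[:beam]
    # Step 2: choose the start word inside each kept sentence.
    paths = sorted(
        ((sc + st, (s, i)) for sc, (s,) in paths for i, st in enumerate(start_scores[s])),
        reverse=True,
    )[:beam]
    # Step 3: choose the end word, which may not precede the start word.
    paths = sorted(
        (
            (sc + end_scores[s][i][j], (s, i, j))
            for sc, (s, i) in paths
            for j in range(i, len(end_scores[s][i]))
        ),
        reverse=True,
    )[:beam]
    z = sum(math.exp(sc) for sc, _ in paths)
    return [(path, math.exp(sc) / z) for sc, path in paths]


sent_scores = [2.0, 0.5]
start_scores = [[1.0, 0.2, 0.1], [0.3, 0.9]]
end_scores = [
    [[0.0, 1.5, 0.2], [0.0, 0.1, 0.3], [0.0, 0.0, 0.4]],
    [[0.2, 0.1], [0.0, 0.5]],
]
for path, prob in gnr_beam_search(sent_scores, start_scores, end_scores):
    print(path, round(prob, 3))
# (0, 0, 1) 0.786
# (0, 0, 2) 0.214
```

Pruning after each step is what "reduces the space of each search step": the second sentence is dropped before its end words are ever scored.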

Document structuring method and device

The invention provides a document structuring method and device. The method comprises the following steps: dividing a document to be structured into several single-chapter documents according to a text structure recognition model; calculating the similarity between each chapter title and each template name in the structuring template to obtain the matching template name; calculating the similarity between the elements corresponding to the matching template name and the statements under the corresponding chapter title to obtain the matching statements; and filling the matching statements of all single-chapter documents into the corresponding fillable areas of the structuring template to obtain the structured document. With this method and device, an unstructured document can be accurately divided according to a preset structuring template, and a structured document that corresponds to the template names and elements can be accurately generated, which in turn ensures the accuracy of subsequently determining key points.
Owner:ZHONGKE DINGFU BEIJING TECH DEV
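
The two similarity-matching steps (chapter title to template name, then template element to statement) can be sketched with the standard library's `difflib.SequenceMatcher`. This is a minimal sketch: the patent does not specify its similarity measure, so a character-level ratio is swapped in; chapter splitting by the text structure recognition model is assumed to be already done; and `best_match`, `structure`, and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def best_match(query: str, candidates: list[str]) -> str:
    """Return the candidate with the highest character-level similarity to `query`."""
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, query.lower(), c.lower()).ratio(),
    )


def structure(chapters: dict[str, list[str]], template: dict[str, list[str]]) -> dict:
    """Fill a structuring template from text that is already split into chapters.

    chapters: {chapter title: [statements under that title]}
    template: {template name: [element names]}
    """
    filled = {}
    for title, statements in chapters.items():
        name = best_match(title, list(template))  # matching template name
        filled[name] = {el: best_match(el, statements) for el in template[name]}
    return filled


template = {"Project Overview": ["budget", "schedule"], "Risk Analysis": ["risk"]}
chapters = {
    "1. Project overview": ["The budget is 2 million.", "The schedule spans 6 months."],
    "2. Risks": ["Main risk: supplier delay."],
}
print(structure(chapters, template))
```

A production system would more likely use semantic similarity (for example, sentence embeddings), since character overlap fails on paraphrased titles.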

Encoding/decoding apparatus, method and computer program

An information processing apparatus comprises a readout unit adapted to read out, from a storage unit, correspondence information that includes a document structure of a structured document and a first code for encoding the document structure; a verification unit adapted to verify whether grammar of a portion included in a structured document for processing is valid, based on the document structure included in the correspondence information; and an encoding unit adapted to encode the structured document using the first code, in relation to a portion whose grammar is verified as being valid by the verification unit.
Owner:CANON KK

Text segmentation and topic annotation for document structuring

The invention relates to a method, a computer program product, and a computer system for structuring unstructured text using statistical models trained on annotated training data. Each section into which the text is segmented is further assigned to a topic associated with a set of labels. The statistical models for segmenting the text and for assigning a topic and its associated labels to a section explicitly account for: correlations between a section of text and a topic, topic transitions between sections, a topic's position within the document, and (topic-dependent) section length. Structural information in the training data is thus exploited to segment and annotate unseen text.
Owner:KONINKLIJKE PHILIPS ELECTRONICS NV

Document online editing system and method based on authority control

One embodiment of the invention discloses a document online editing system and method based on authority control. The system comprises: a document management module for storing document templates and all documents edited by users, and for setting the authority levels of the document templates and documents; a personnel management module for assigning authority levels to users; a document structuring module for dividing the document templates or documents into structured parts; an authority module for distributing the divided document templates or documents to users according to the authority levels of the templates or documents and the authority levels of the users; and an online editing module with which users edit document templates or documents online and save the content.
Owner:BEIJING SIMULATION CENT
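
The authority module's distribution rule can be illustrated with a short Python sketch. It assumes a simple numeric scale where a user may edit any structured section whose required level does not exceed the user's own; the patent does not define the comparison rule, and `Section` and `sections_for_user` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    level: int  # required authority level (hypothetical scale: higher = more restricted)


def sections_for_user(sections: list[Section], user_level: int) -> list[str]:
    """Return titles of the structured sections a user is allowed to edit."""
    return [s.title for s in sections if user_level >= s.level]


doc = [Section("Summary", 1), Section("Design", 2), Section("Budget", 3)]
print(sections_for_user(doc, 2))  # ['Summary', 'Design']
```

Structuring the document first is what makes this per-section filtering possible; without it, authority could only be granted for the whole document.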

General document identification method and system, terminal and storage medium

The invention provides a universal document recognition method comprising the following steps: obtaining text information for one or more text fields in a document, the text information comprising text content and a text bounding box; obtaining category information in one-to-one correspondence with the text fields, the categories including at least a key field category (Key) and a value field category (Value); obtaining the connection relationships between each Key-category text field and the other text fields; and, on the basis of these connection relationships, taking the Value-category and/or Key-category text fields connected to (or disconnected from) a Key-category text field as the structured content corresponding to that Key field, determining the category and text information of the structured content, and thereby completing recognition of the document. The invention also provides a corresponding system, terminal, and storage medium. The method and device improve the accuracy and universality of recognizing structured content in documents.
Owner:Shanghai Shenyao Intelligent Technology Co., Ltd. (上海深杳智能科技有限公司) +1
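
The patent obtains Key-Value connection relationships from a model; as a rough stand-in, the Python sketch below links each Value field to the nearest Key field on its left whose bounding box overlaps it vertically. This geometric heuristic, the `(x0, y0, x1, y1)` box convention, and the names `Field` and `link_key_values` are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Field:
    text: str
    category: str  # "Key" or "Value"
    box: tuple[int, int, int, int]  # (x0, y0, x1, y1) text bounding box


def link_key_values(fields: list[Field]) -> dict[str, str]:
    """Attach each Value field to the closest Key field to its left on the same line.

    A geometric heuristic standing in for the learned connection relationship.
    If several Values link to one Key, the last one wins in this sketch.
    """
    keys = [f for f in fields if f.category == "Key"]
    result = {}
    for v in (f for f in fields if f.category == "Value"):
        vx0, vy0, _, vy1 = v.box
        # Candidate keys end left of the value and overlap it vertically.
        candidates = [
            k for k in keys
            if k.box[2] <= vx0 and k.box[1] < vy1 and vy0 < k.box[3]
        ]
        if candidates:
            nearest = max(candidates, key=lambda k: k.box[2])  # rightmost edge = closest
            result[nearest.text] = v.text
    return result


fields = [
    Field("Invoice No.", "Key", (10, 10, 90, 30)),
    Field("INV-0042", "Value", (100, 12, 180, 28)),
    Field("Date", "Key", (10, 40, 50, 60)),
    Field("2021-03-01", "Value", (100, 42, 190, 58)),
]
print(link_key_values(fields))  # {'Invoice No.': 'INV-0042', 'Date': '2021-03-01'}
```

Layout rules like this break on multi-column or table-style forms, which is the motivation for learning the connection relationship instead.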

Structured document retrieval device and program

The invention provides a structured document retrieval device and program capable of performing structural retrieval that combines structure information based on XML tags with structure information based on comment tags. The device comprises: a processor that executes the program; a first storage region that stores the program; a second storage region that stores a structured document satisfying a tree-structure condition, together with comment data added to the document; a document structure list building part that, for a generalized root-element structure of the DOM trees obtained separately from the inclusion relations of the tags in the structured document and in the comment data, distributes the text of the structured document and generates a text-common DOM tree; and a retrieval processing part that retrieves the elements matching the search condition from the text-common DOM tree.
Owner:HITACHI LTD