247 results about "Semi-structured data" patented technology

Semi-structured data is a form of structured data that does not obey the formal structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Therefore, it is also known as self-describing structure.
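
As a small illustration of the "self-describing" property, the sketch below parses two JSON records whose field sets differ and reads the structure back out of the data's own tags. The record contents and the `field_paths` helper are invented for this example.

```python
import json

# Two hypothetical records: same kind of entity, different fields.
raw = '''
[
  {"name": "Ada", "email": "ada@example.com"},
  {"name": "Lin", "phones": [{"type": "home", "number": "555-0100"}]}
]
'''

records = json.loads(raw)

def field_paths(value, prefix=""):
    """Yield the dotted path of every leaf field, read from the data's own tags."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from field_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for child in value:
            yield from field_paths(child, prefix + "[]")
    else:
        yield prefix

# One sorted list of field paths per record; no external schema is consulted.
paths = [sorted(set(field_paths(r))) for r in records]
```

No fixed table definition could hold both records without nulls or extra tables, yet each record still carries enough markers to recover its own hierarchy.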

System and method for creating dynamic folder hierarchies

A dynamic foldering system automatically manages the creation and deletion of run-time dynamic folders and selection of documents, items, or object graphs found in the run-time dynamic folders. The system comprises a query / predicate for a design-time folder group based on which dynamic folders are automatically managed, a variable binding mechanism, automatic identifications of dynamic folders containing a particular object, parallel navigation, and customized combination of foldering results. The query / predicate for a design-time folder group can be a parameterized query. The dynamic folder hierarchy is defined by a hierarchy of design-time folder groups defined by a query / predicate on a document comprising structured or semi-structured data. Modifiable criteria for creating the dynamic folder hierarchy are provided by the user. Variable binding provides powerful query / predicate definitions on hierarchical data or graph-structured data. Given a document, the system automatically identifies which dynamic folders contain the document. Parallel navigation allows a user to navigate along additional paths in a hierarchy and combine the navigation results using set operations.
Owner:SERVICENOW INC
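
The foldering idea can be sketched roughly as follows. This is an illustration only, not the patented system: the documents, the `folder_group` and `folders_containing` helpers, and the use of a single field name as the "parameterized query" are all assumptions.

```python
from collections import defaultdict

# Hypothetical semi-structured documents.
docs = [
    {"id": "d1", "status": "open", "owner": "ann"},
    {"id": "d2", "status": "closed", "owner": "ann"},
    {"id": "d3", "status": "open", "owner": "raj"},
]

def folder_group(documents, field):
    """Design-time folder group: one run-time folder per distinct value of `field`."""
    folders = defaultdict(set)
    for doc in documents:
        if field in doc:  # semi-structured: the field may be absent
            folders[doc[field]].add(doc["id"])
    return dict(folders)

by_status = folder_group(docs, "status")
by_owner = folder_group(docs, "owner")

# Parallel navigation: follow two paths, then combine the results with a set operation.
open_and_ann = by_status["open"] & by_owner["ann"]

def folders_containing(doc_id, **groups):
    """Identify every run-time folder (group name, folder key) that holds a document."""
    return sorted(
        (name, key)
        for name, group in groups.items()
        for key, ids in group.items()
        if doc_id in ids
    )

where_is_d3 = folders_containing("d3", status=by_status, owner=by_owner)
```

Folders appear and disappear with the data: removing the last "closed" document and rebuilding the group leaves no "closed" folder behind.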

Specifying a Parser Using a Properties File

A system for generating a parser and using the parser to parse a target file includes a target file description, an output format description, a Parser generator, a Parser, a target file, and a result object. The target file description and the output format description are included in one or more “properties files”, which are text files that include one or more name / value pairs (“properties”). The target file description and the output format description are input into the Parser generator, which outputs the Parser. The target file is input into the Parser, which outputs the result object. The target file description specifies one or more parsers and / or tokenizers that can be used to parse the target file. The parsers and / or tokenizers specified by the target file description are part of the generated Parser. These parsers and / or tokenizers make the Parser more flexible, which enables the Parser to parse semi-structured data.
Owner:HEWLETT-PACKARD ENTERPRISE DEV LP
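
A minimal sketch of the properties-file approach, assuming a made-up pair of property names (`tokenizer.delimiter` and `output.fields`); the abstract does not give the actual keys, so these and both helper functions are hypothetical.

```python
import re

# Hypothetical properties text; the key names are assumptions for this sketch.
PROPERTIES = r"""
# target file description
tokenizer.delimiter = \s*\|\s*
# output format description
output.fields = timestamp, level, message
"""

def load_properties(text):
    """Parse 'name = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        props[name.strip()] = value.strip()
    return props

def generate_parser(props):
    """Parser generator: return a parse function built from the two descriptions."""
    delimiter = re.compile(props["tokenizer.delimiter"])
    fields = [f.strip() for f in props["output.fields"].split(",")]

    def parse(line):
        # Split into at most len(fields) tokens so the last field keeps any extras.
        tokens = delimiter.split(line.strip(), maxsplit=len(fields) - 1)
        return dict(zip(fields, tokens))

    return parse

parse = generate_parser(load_properties(PROPERTIES))
result = parse("2024-01-01T00:00:00 | WARN | disk | nearly full")
```

Changing the target format then means editing the properties text, not the parser code.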

Structured data translation apparatus, system and method

An apparatus is provided that includes a processor and a memory. The processor is configured to determine a category for a group of isolated noun phrases in a structured or semi-structured data source stored in the memory. The group of isolated noun phrases includes one or more isolated noun phrases. The processor is also configured to translate the group of isolated noun phrases from a source language to a target language using a category-driven isolated noun phrase translation. The determination of the category and the category-driven isolated noun phrase translation are performed based on context derived from the group of isolated noun phrases.
Owner:NTREPID LLC

LDAP-based distributed cache technology for XML

The invention presents the design, internal data representation, and query model of a hierarchical distributed caching system for semi-structured documents based on LDAP technology. The system brings the semi-structured data model and the LDAP data model together to provide the ideal characteristics for the efficient processing of XPath queries over XML documents. Transformation algorithms and experimental results are also presented that demonstrate the feasibility of the invention as a distributed caching system especially tailored to semi-structured data.
Owner:MARRON PEDRO JOSE +1

Fast processing of an XML data stream

To answer one or more queries of semistructured data, an answer automaton is constructed, based at least in part on the queries and on a schema of the data. The answer automaton is applied to the data to answer the queries. Preferably, to construct the answer automaton, a schema automaton is constructed for the schema, a query automaton is constructed for the queries, and the schema automaton and the query automaton are merged. If there is more than one query, separate query automata are constructed for the different queries and then are united to provide a joint query automaton. Preferably, all the automata are deterministic finite automata. Most preferably, all the automata are isostate automata.
Owner:RAMOT AT TEL AVIV UNIV LTD

System and method for processing semi-structured business data using selected template designs

A method for processing semi-structured data. The method includes receiving semi-structured data into a first format from a real business process. Preferably, the semi-structured data are machine generated. The method includes tokenizing the semi-structured data into a second format and storing the semi-structured data in the second format into one or more memories and clustering the tokenized data to form a plurality of clusters. The method also includes identifying a selected low frequency term in each of the clusters, and processing at least two of the clusters and the associated selected low frequency terms to form a single template for the at least two of the clusters. In a preferred embodiment, the method replaces the selected low frequency term with a wild card character.
Owner:OPENSPAN
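
The tokenize, cluster, and wildcard steps can be illustrated with a loose sketch. It is not the patented method: clustering by token count and the `min_share` frequency cutoff are simplifying assumptions, and the log lines are invented.

```python
from collections import Counter, defaultdict

def tokenize(line):
    """Second format: a list of whitespace-separated tokens."""
    return line.split()

def build_templates(lines, min_share=0.5):
    # Cluster by token count: a deliberately crude stand-in for real clustering.
    clusters = defaultdict(list)
    for line in lines:
        tokens = tokenize(line)
        clusters[len(tokens)].append(tokens)

    templates = []
    for rows in clusters.values():
        template = []
        for column in zip(*rows):
            term, count = Counter(column).most_common(1)[0]
            # If even the most common term at this position is rare,
            # treat the position as variable data and emit a wildcard.
            template.append(term if count / len(rows) > min_share else "*")
        templates.append(" ".join(template))
    return templates

logs = [
    "user alice logged in",
    "user bob logged in",
    "user carol logged in",
    "disk sda1 usage 91 percent",
    "disk sdb1 usage 47 percent",
]
templates = build_templates(logs)
```

Five machine-generated lines collapse into two templates, with the low-frequency terms (user names, device names, readings) replaced by `*`.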

Direct loading of semistructured data

Techniques and systems are disclosed for directly storing semistructured data in a database. According to one aspect, a client application reads data that comprises instances of a parent type. The client application invokes routines associated with the parent type. An array is created for storing instances of the parent type. These routines invoke routines associated with a child type of the parent type. An array is created for storing instances of the child type. The arrays are populated with values specified in the data. According to one aspect, some columns of the arrays may be populated with other values to be stored in hidden columns of database tables. The client application converts the arrays into a data stream that conforms to the format of the database's data blocks. The client application then streams the data to a database server, which writes the data blocks directly into one or more data blocks in the database.
Owner:ORACLE INT CORP

Indexing, rewriting and efficient querying of relations referencing semistructured data

The invention discloses methods and apparatus that facilitate efficient querying of tables referencing semistructured data such as digraphs and other domains with complex grouping structure. The invention methods enable meaningful indexing of the tables as well as rewriting of queries with respect to the structures. Dynamic schema extraction using proper coloring algorithms is disclosed that structures the semistructured data in such a way that complex set operations and grouping are replaced with traditional relational joins. This enables a relational database system to harness its entire query optimizing capability when querying tables referencing semistructured data.
Owner:DECODE GENETICS EHF

Indexing and querying semi-structured data

Generating an inverted index is disclosed. Semi-structured data from a plurality of sources is parsed to extract structure from at least a portion of the semi-structured data. The inverted index is generated using the extracted structure. The inverted index includes a location identifier and a data type identifier for one or more entries of the inverted index.
Owner:VMWARE INC
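
One way to picture such an index is sketched below, under the assumption that the sources are JSON, that a slash-separated path serves as the location identifier, and that the Python type name serves as the data type identifier. None of these choices come from the abstract.

```python
import json
from collections import defaultdict

def walk(value, path=""):
    """Yield (path, leaf) pairs; the path is the structure extracted from the data."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk(child, f"{path}/{key}")
    elif isinstance(value, list):
        for child in value:
            yield from walk(child, path)
    else:
        yield path, value

def build_index(sources):
    """Map each term to postings of (source id, location path, data type name)."""
    index = defaultdict(list)
    for source_id, text in sources.items():
        for path, leaf in walk(json.loads(text)):
            index[str(leaf).lower()].append((source_id, path, type(leaf).__name__))
    return index

# Hypothetical sources with different shapes.
sources = {
    "a": '{"host": "web01", "port": 8080}',
    "b": '{"server": {"host": "web01"}, "tags": ["prod", "8080"]}',
}
index = build_index(sources)
```

Because each posting records where the term was found and what type it had, a query can distinguish the integer port `8080` in source `a` from the string tag `"8080"` in source `b`.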

Scalable Analysis Platform For Semi-Structured Data

A method of operating a data analysis system includes retrieving objects from a data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. The method further includes dynamically creating a cumulative schema by, for each object of the retrieved objects: (i) inferring a schema from the object based on the metadata of the object and inferred data types of elements of the data of the object, (ii) creating a unified schema, wherein the unified schema describes both (a) the object described by the inferred schema and (b) a cumulative set of objects described by the cumulative schema, and (iii) storing the unified schema as the cumulative schema. The method further includes exporting the data of each of the retrieved objects to a data warehouse.
Owner:AMAZON TECH INC
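
The infer, unify, and store loop can be sketched in a few lines. This is a simplified illustration: it handles flat objects only, infers types from values alone (the abstract also uses per-object metadata), and the `infer_schema` and `unify` names are hypothetical.

```python
def infer_schema(obj):
    """Infer a schema from one object: field name -> set of observed type names."""
    return {key: {type(value).__name__} for key, value in obj.items()}

def unify(cumulative, inferred):
    """Return a schema describing everything either input schema describes."""
    unified = {key: set(types) for key, types in cumulative.items()}
    for key, types in inferred.items():
        unified.setdefault(key, set()).update(types)
    return unified

# Hypothetical retrieved objects; note that "id" changes type and "rating" is optional.
objects = [
    {"id": 1, "name": "pump"},
    {"id": "A-2", "name": "valve", "rating": 4.5},
]

cumulative = {}
for obj in objects:
    # Store the unified schema as the new cumulative schema.
    cumulative = unify(cumulative, infer_schema(obj))
```

The cumulative schema only ever widens, so every object seen so far remains valid against it.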

Data mining method based on extraction of Web numerical value tables

The invention discloses a data mining method based on extraction of Web numerical value tables. The method is based on domain knowledge base, adopts generation of numerical value knowledge element base as a basic target, and mainly comprises construction of domain knowledge base basic set, positioning of Web numerical value tables, recognition of table structure, integration of table content, semantic representation of extract result, data retrieval, automatic learning of domain knowledge and data mining processing and the like. The invention is based on the method of extraction of Web numerical value tables in specific domain, can carry out extraction to data, information and knowledge included in numerical value tables in Web pages, converts semi-structured data into structured data and provides services of data retrieval, data mining analysis and the like. The data mining method can completely and accurately extract valuable numerical value knowledge in large amount of Web numerical tables dispersed on Web and meets the requirements of data query and data analysis of a user.
Owner:TONGFANG KNOWLEDGE NETWORK TECH CO LTD (BEIJING)

Scalable analysis platform for semi-structured data

A data transformation system includes a schema inference module and an export module. The schema inference module is configured to dynamically create a cumulative schema for objects retrieved from a first data source. Each of the retrieved objects includes (i) data and (ii) metadata describing the data. Dynamically creating the cumulative schema includes, for each object of the retrieved objects, (i) inferring a schema from the object and (ii) selectively updating the cumulative schema to describe the object according to the inferred schema. The export module is configured to output the data of the retrieved objects to a data destination system according to the cumulative schema.
Owner:AMAZON TECH INC

Knowledge graph establishment method and system

The invention provides a knowledge graph establishment method and system. The knowledge graph establishment method comprises the following steps: a basic architecture of a knowledge graph is established according to general data standards; the relations of entities in the basic architecture are normalized in a unified mode to obtain a standard dictionary table having standard specifications; semi-structured data associated with contents in the knowledge graph is obtained; entity information of key entities is extracted from the semi-structured data; data fusion is conducted on the entity information according to the standard dictionary table to form structured data; the structured data forms corresponding data structure pairs, and the data structure pairs are stored as the knowledge graph. By establishing the basic architecture of the knowledge graph and utilizing multiple network channels, the data is obtained and data fusion is completed, the semi-structured data is converted into structured data, and a foundation is laid for further development of artificial intelligence technologies.
Owner:法玛门多(常州)生物科技有限公司

Core object-oriented type system for semi-structured data

A type system employing structural subtyping is disclosed herein. A core type system supports several structural types, such as stream, choice, intersection and sequence. Also part of the core type system is a new invariant type, which denotes values whose dynamic type is the same as its static type, and type restrictions for limiting a range of a base type. Furthermore, a streamlined structural version of delegates, called structural delegates, and a validation method thereof are introduced into the type system. To further facilitate type safety, strict statically checked interface casts are introduced.
Owner:MICROSOFT TECH LICENSING LLC

System and method to search and generate reports from semi-structured data

Embodiments of the present invention provide a system and method for searching and reporting on semistructured data that can include dynamic metadata. One embodiment can comprise providing a user interface to a user based on an object type definition for an object type that allows the user to specify search criteria associated with a set of metadata, mapping the user search criteria to a query that comprises at least one structured query constraint and at least one unstructured query constraint, processing the query to search a set of data objects containing semistructured data associated with the object type according to the query and returning a set of results to the user. The search results can be returned to a user based on user-specified reporting parameters. Additionally, the reporting definition can be saved as an object for future execution.
Owner:OPEN TEXT SA ULC

Method and device for establishing NoSQL database index for semi-structured data

Semi-structured source data is preprocessed to obtain text partitions to be stored into a data table with a first combined primary key including a structure thread primary key and a sequence value primary key. The structure thread primary key identifies a structure thread that is segmented into several consecutive intervals according to a determined or predetermined sequence. An inverted index table, created for the preprocessed text partitions, includes a second combined primary key including the structure thread primary key and a keyword primary key. Corresponding to values of the primary keys in the second combined primary key, related text partition sequence IDs are recorded as index values of the inverted index table. Index values having a same keyword primary key value but different structure thread primary key values are located in different rows in the inverted index table. The present techniques improve query efficiency of database index and facilitate updating.
Owner:ALIBABA GRP HLDG LTD

Web data extraction method based on visual customization of extraction template

The invention discloses a Web data extraction method based on visual customization of an extraction template. The Web data extraction method comprises the following steps: A. pretreatment of template pages: converting and showing source codes of the template pages; B. visual customization of the extraction template: providing a drag selection function on a user interface, setting the corresponding relationship between attribute tags and data values on the template pages and attributes in a domain model by a user, and establishing the extraction template; C. setting of mass extraction frequency of the pages: extracting the crawled HTML (Hypertext Markup Language) pages in large quantity once every 8 hours; and D. mass extraction of the pages: extracting the crawled HTML pages in large quantity by the corresponding extraction template, converting semi-structured data into structured data and then storing the structured data in a local database.
Owner:SHANDONG UNIV

Storage and management of semi-structured data

Data that has a desirable, machine-readable structure that is not known in advance may be thought of as semi-structured data. Semi-structured data may be represented in Resource Description Framework (RDF) format, and such documents may be parsed to form a table of triples. Relatively small amounts of data give rise to a substantial number of triples, meaning that a triple store for relatively small amounts of data will have a relatively large number of rows. A management programme for a triple store monitors the number of occasions on which a given query is executed, and if the frequency of the query exceeds a given threshold, then the triples forming the result set of the query are migrated to an auxiliary triple store, thus reducing the number of rows searchable as a result of execution of the given query.
Owner:HEWLETT PACKARD DEV CO LP
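
The monitoring and migration behaviour might be sketched as below. The `TripleStore` class, the wildcard pattern form, and the choice to copy (not move) result triples into the auxiliary store are assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter

class TripleStore:
    def __init__(self, triples, threshold=2):
        self.main = set(triples)      # main store: one row per (s, p, o) triple
        self.auxiliary = {}           # query pattern -> migrated result set
        self.counts = Counter()       # how often each pattern has been executed
        self.threshold = threshold

    def query(self, pattern):
        """Match an (s, p, o) pattern in which None acts as a wildcard."""
        if pattern in self.auxiliary:
            # Hot query: answered from the small auxiliary store, no full scan.
            return self.auxiliary[pattern]
        result = {
            triple for triple in self.main
            if all(want is None or want == got for want, got in zip(pattern, triple))
        }
        self.counts[pattern] += 1
        if self.counts[pattern] > self.threshold:
            self.auxiliary[pattern] = result
        return result

store = TripleStore(
    [
        ("alice", "type", "Person"),
        ("bob", "type", "Person"),
        ("alice", "knows", "bob"),
        ("acme", "type", "Company"),
    ],
    threshold=2,
)
pattern = (None, "type", "Person")
for _ in range(3):
    result = store.query(pattern)
```

After the third execution the pattern's frequency exceeds the threshold, so later runs search two rows in the auxiliary store and skip the four rows in the main store. This sketch does not keep the auxiliary copy in sync with later updates to the main store.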

Creation and enrichment of search based taxonomy for finding information from semistructured data

Techniques are provided for creating and updating an entity hierarchy (taxonomy) based on information captured about user interaction with a system. Techniques are also provided for using the taxonomy to determine the nature of entities represented by terms submitted to a search engine. Search logs are analyzed for related sets of entities and used to improve the taxonomy for storing information. Once the taxonomy is created, information across data sources is fetched and aggregated based on the taxonomy. When the system is queried, the query is modified to a predefined template, and the best fit result is promptly returned. A feedback mechanism is also provided to enhance taxonomy and entity data based on search volumes. This system enables search engines to provide accurate answers when entities, their attributes and relationships are involved.
Owner:R2 SOLUTIONS

Big data storage method and device

The invention provides a big data storage method and device. The method includes the steps of receiving object data, recognizing attribute information of the object data, and storing the object data in a first storage subsystem in a storage system according to the attribute information of the object data. According to the big data storage method and device, structured data, semi-structured data and unstructured data are uniformly stored as objects in a database platform and a Hadoop platform; the performance advantages of a relational database, the fault tolerance of the Hadoop platform and the MapReduce framework, and the support for dynamic data models are effectively made use of; and the data modes of the objects and the corresponding attribute information are stored in metadata, so that the data can be conveniently sent to a proper execution engine to complete a query when data analysis is carried out. Therefore, unified management of large quantities of structured and unstructured data is achieved, management cost is reduced, flexibility and usability of data processing are improved, and the learning cost of a user of the big data storage device is reduced.
Owner:SUGON INFORMATION IND

Index to a semi-structured database

A method and apparatus for generating an index entry for a record in a semi-structured database involves analysing each field to identify an entry within each field and to identify a sequence of characters having a format corresponding to a predetermined format. Thereafter, the method and apparatus operate to generate an index entry for the identified entry, and for at least one field, define any characters not identified as an entry as a free text entry.
Owner:BRITISH TELECOMM PLC
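
A rough sketch of indexing one field this way, assuming the "predetermined format" is a UK-style postcode expressed as a simplified regular expression. The pattern, the record, and the `index_field` helper are all hypothetical.

```python
import re

# Assumed "predetermined format": a simplified UK-style postcode pattern.
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\b")

def index_field(field_text):
    """Split one field into formatted entries and leftover free text."""
    entries = POSTCODE.findall(field_text)
    # Whatever was not identified as an entry is kept as a free text entry.
    free_text = " ".join(POSTCODE.sub(" ", field_text).split())
    return {"entries": entries, "free_text": free_text}

record = {"address": "12 High Street, Ipswich IP1 3AB"}
index_entry = {name: index_field(text) for name, text in record.items()}
```

The postcode becomes a searchable index entry, while the rest of the address remains available as free text.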

Basic service system for medical large data application

The invention relates to a basic service system for medical large data application. The system comprises a large data cloud ETL subsystem, a large data basic kernel subsystem, a medical large data analysis and mining subsystem, and a system running management and monitoring subsystem. The large data cloud ETL subsystem is responsible for extracting, cleaning, converting, and loading related data from a hospital information system; the large data basic kernel subsystem uses technologies such as Storm stream calculation, MapReduce batch calculation, Spark, medical large data metadata, and HBase semi-structured data storage to realize storage, calculation, and analysis of medical large data; the medical large data analysis and mining subsystem analyzes the data stored in the medical large data basic kernel platform by means of data exploration, mining modeling, and model evaluation; and the management and monitoring subsystem is responsible for providing support, monitoring, configuration, and security services for the whole medical large data application platform.
Owner:HUNAN INTERACTIVE MEDIA

Processing method and system for tree structured data

The invention provides a processing method and system for tree structured data (STEED) and relates to the technical field of data processing. The system supports reading of text data and analyzes the text data into row or column type binary data, wherein in the analysis process, a grammar tree is dynamically generated and definitions of semi-structured data are stored; the row or column type binary data is stored, wherein the row or column type binary data is mutually converted and the binary data is directly output as JSON data in a text format; and based on the binary data, the semi-structured data is subjected to query operation.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI

Database security structure

A database security structure that may be used for semistructured databases assigns each node of a database to a collection. For each collection, create rights, retrieve rights, associate rights and disassociate rights are provided to one or more users, the rights being assigned in common for all nodes of the collection. Users can only carry out a task if they have the appropriate rights. In this way, a flexible database security structure is provided that can deliver appropriate security for different elements of a semistructured database.
Owner:HEWLETT-PACKARD ENTERPRISE DEV LP
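
The collection-level rights model can be sketched as follows. The `Right` flags mirror the four rights named in the abstract; the `SecureStore` class, the node ids, and the user names are invented for illustration.

```python
from enum import Flag, auto

class Right(Flag):
    CREATE = auto()
    RETRIEVE = auto()
    ASSOCIATE = auto()
    DISASSOCIATE = auto()

class SecureStore:
    def __init__(self):
        self.collection_of = {}   # node id -> collection name
        self.rights = {}          # (user, collection) -> combined Right flags

    def grant(self, user, collection, rights):
        """Grant rights on a collection; they apply to every node assigned to it."""
        current = self.rights.get((user, collection), Right(0))
        self.rights[(user, collection)] = current | rights

    def allowed(self, user, node, right):
        """Check a right on a node by looking up the node's collection."""
        collection = self.collection_of[node]
        return right in self.rights.get((user, collection), Right(0))

store = SecureStore()
store.collection_of.update({"n1": "public", "n2": "payroll"})
store.grant("ann", "public", Right.RETRIEVE | Right.ASSOCIATE)
store.grant("ann", "payroll", Right.RETRIEVE)
```

Here `store.allowed("ann", "n1", Right.ASSOCIATE)` is true, while the same check on `n2` is false, because rights are held per collection and not per node.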

A method and system for constructing a health knowledge graph

The invention relates to a method for constructing a health knowledge graph. The method comprises the following steps: directly extracting entities of users, symptoms, diseases, experts, treatment schemes and commodities, which belong to generalized representations, from the structured and semi-structured data of a network data source by utilizing HTML tags and regular expressions; extracting entities belonging to the six generalized representations from the unstructured data by using a conditional random field algorithm; using a Bi-LSTM algorithm to carry out relation classification on pairs of entities extracted in the same context and determine the relation between the entities; calculating the correlation between the entity names and the entity descriptions to achieve disambiguation of the entity information; and complementing the knowledge graph relations by using the OWL reasoning function of the Jena tool, capturing ambiguous triples by using a criterion, and feeding back the triples judged to be possibly wrong to a domain expert for verification. The method has the following beneficial effects: a health knowledge graph of traditional Chinese medicine theory is constructed, incomplete relations are automatically complemented by applying knowledge reasoning technology, and a more complete health graph is constructed.
Owner:JILIN UNIV

Multi-source heterogeneous network security knowledge graph construction method and device

The invention discloses a multi-source heterogeneous network security knowledge graph construction method and device. The method comprises the following steps: in response to a triggering request for constructing a network security knowledge graph, extracting matched entities and entity relationships from the semi-structured data set and the structured data set according to the entities and entity relationships defined by a preset network security knowledge ontology to generate triples; identifying entities matched with the entities defined by the network security knowledge ontology from the unstructured data set according to the different categories of the entities and a preset identification mode, wherein the data in the unstructured data set is text data; inputting the text data into a word vector recognition model to obtain a word vector of each entity; inputting the entity pairs selected by a preset rule and the corresponding word vectors into a relationship extraction model to obtain the relationships among the entity pairs, and generating triples fusing the word vectors of the entities according to the entity pairs, the corresponding word vectors and the relationships among the entity pairs; and constructing a network security knowledge graph according to the triples.
Owner:NSFOCUS INFORMATION TECHNOLOGY CO LTD +1