Data association method and system for open data set

This open data and data association technology, applied to data association methods and systems for open datasets, addresses three problems: open datasets are difficult for data users to understand and use, the data descriptions in datasets lack semantic association, and the value of open data cannot be fully exploited.

Pending Publication Date: 2021-06-08
SOUTH CHINA NORMAL UNIVERSITY
Cites: 8, Cited by: 0

AI Technical Summary

Problems solved by technology

Because the data in open datasets comes from government departments and business systems at different levels, the vocabulary used to describe open datasets is inconsistent. Combined with the lack of semantic association descriptions for the data within datasets, this makes open datasets difficult for data users to understand and use, so the value of open data cannot be fully exploited.

Examples

Embodiment

[0071] As shown in Figure 1, this embodiment provides a data association method for an open dataset, including the following steps:

[0072] S1. Preprocess the open datasets, converting datasets in different file formats into JSON files;
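As an illustration of the format conversion in step S1, the following is a minimal sketch that turns CSV text into a JSON array of records using only the Python standard library. The patent text does not say which source formats or tools are used; the function name `csv_text_to_json` and the sample columns are hypothetical.

```python
import csv
import io
import json


def csv_text_to_json(csv_text: str) -> str:
    """Convert CSV text (first row is the header) into a JSON array of records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = [dict(row) for row in reader]
    # ensure_ascii=False keeps non-ASCII field values (e.g. Chinese) readable.
    return json.dumps(records, ensure_ascii=False, indent=2)


# Hypothetical sample data, not taken from the patent.
sample = "station,district,capacity\nA01,Tianhe,120\nB07,Yuexiu,80\n"
converted = csv_text_to_json(sample)
print(converted)
```

Note that `csv.DictReader` yields every value as a string; a real preprocessing step would probably also need type inference and handling for other source formats such as XLS or XML.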

[0073] S2. Analyze the preprocessed open dataset to obtain its feature data, which consists of the metadata description of the dataset and the metadata description of the data;
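One way to picture the two kinds of feature data in step S2 is sketched below. It assumes, purely for illustration, that each preprocessed dataset is a JSON object whose `records` key holds the rows and whose remaining keys (such as `title` and `description`) describe the dataset as a whole. This layout and the function name `extract_feature_data` are assumptions, not something the patent specifies.

```python
import json


def extract_feature_data(dataset_json: str) -> dict:
    """Split a dataset into dataset-level and data-level metadata descriptions."""
    dataset = json.loads(dataset_json)
    records = dataset.get("records", [])
    # Dataset-level metadata: every top-level key except the records themselves.
    dataset_metadata = {k: v for k, v in dataset.items() if k != "records"}
    # Data-level metadata: each field name with the value types observed for it.
    field_types = {}
    for record in records:
        for field, value in record.items():
            field_types.setdefault(field, set()).add(type(value).__name__)
    data_metadata = {field: sorted(types) for field, types in field_types.items()}
    return {"dataset_metadata": dataset_metadata, "data_metadata": data_metadata}


# Hypothetical sample dataset, not taken from the patent.
sample = json.dumps({
    "title": "Public parking lots",
    "description": "Locations and capacity of public parking lots",
    "records": [
        {"name": "Lot A", "capacity": 120},
        {"name": "Lot B", "capacity": 80},
    ],
})
features = extract_feature_data(sample)
print(features)
```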

[0074] S3. Use machine learning techniques to analyze the metadata description of the dataset and obtain the theme of the open dataset;

[0075] More specifically, step S3 includes the following steps:

[0076] S31. Segment the metadata description of the dataset with a tokenizer to obtain a word segmentation result;

[0077] S32. Calculate the tf-idf feature vector of the dataset's metadata description according to the word segment...
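Steps S31 and S32 can be sketched as follows. This is a minimal standard-library illustration, not the patent's implementation. The regex tokenizer is a stand-in that only handles Latin-script text (the original descriptions are presumably Chinese and would need a dedicated word segmenter), and the weighting shown, term frequency times ln(N / document frequency), is one common tf-idf variant chosen here as an assumption. The names `tokenize` and `tfidf_vectors` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Stand-in tokenizer: lowercase alphanumeric runs (assumption, Latin text only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(descriptions):
    """Return one sparse tf-idf vector (term -> weight) per description."""
    tokenized = [tokenize(d) for d in descriptions]
    n_docs = len(tokenized)
    # Document frequency: in how many descriptions each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical metadata descriptions, not taken from the patent.
descriptions = [
    "bus routes and bus stops",
    "hospital locations",
    "bus timetable",
]
vectors = tfidf_vectors(descriptions)
for description, vector in zip(descriptions, vectors):
    print(description, vector)
```

With this weighting, a term that appears in every description receives weight zero, so the vectors emphasize the words that distinguish one dataset description from another.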

Abstract

The invention discloses a data association method and system for open data sets, applied to data association analysis of the open data sets on a government open data platform. The method comprises the steps of preprocessing an open data set; analyzing the preprocessed open data set to obtain its feature data; performing association analysis on the feature data; establishing a data association description from the result; and storing the data association description in a database. Compared with traditional data association techniques, the method can establish fine-grained data associations and better mine the relationships between data. In addition, the association descriptions within an open data set are established automatically without affecting the existing process for publishing open data sets, which reduces manual effort and human error.
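The final step of the abstract, storing the data association description in a database, might look like the sketch below. The patent excerpt does not name a database or a schema. SQLite, the table `data_association`, its columns, and the field-level reading of "fine-grained" are all assumptions made for illustration.

```python
import sqlite3


def store_associations(conn, associations):
    """Persist (source dataset, source field, target dataset, target field, relation) rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS data_association (
               source_dataset TEXT NOT NULL,
               source_field   TEXT NOT NULL,
               target_dataset TEXT NOT NULL,
               target_field   TEXT NOT NULL,
               relation       TEXT NOT NULL
           )"""
    )
    # Parameterized inserts avoid building SQL strings from data values.
    conn.executemany(
        "INSERT INTO data_association VALUES (?, ?, ?, ?, ?)", associations
    )
    conn.commit()


# Hypothetical association between two datasets, not taken from the patent.
conn = sqlite3.connect(":memory:")
store_associations(
    conn,
    [("parking_lots", "district", "bus_stops", "district", "same_concept")],
)
rows = conn.execute(
    "SELECT source_dataset, source_field, target_dataset, target_field, relation "
    "FROM data_association"
).fetchall()
print(rows)
```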

Description

Technical field

[0001] The invention belongs to the technical field of open data and data association, and in particular relates to a data association method and system for open data sets.

Background

[0002] Government open data means that the government releases the large volume of data it has accumulated in various fields to the public as machine-readable data sets, provided that privacy, security, and copyright are not compromised. Anyone can freely acquire and use the data sets. According to the "2019 China Local Government Data Opening Report" released by Fudan University and the Digital China Research Institute of the National Information Center, as of the first half of 2019, 82 provincial, sub-provincial, and prefectural governments in China had launched open data platforms. The total number of open data sets nationwide grew rapidly from 8,398 in 2017 to 62,801 in 2019, an increase of nearly sevenfold. However, due to the fact that ...

Application Information

IPC(8): G06F16/31, G06F16/35, G06F16/33, G06N20/00, G06F40/289, G06F40/295
CPC: G06F16/313, G06F16/353, G06F16/3347, G06N20/00, G06F40/289, G06F40/295
Inventors: 范冰冰, 郭光雄
Owner: SOUTH CHINA NORMAL UNIVERSITY