Large-scale industrial data compression and storage method and system and medium

An industrial data compression and storage technology, applied in file systems, file system management, electrical digital data processing, and related fields. It addresses problems such as wasted disk space, reduces time and effort, ensures consistency, and shortens working time.

Active Publication Date: 2021-01-12
上海微亿智造科技有限公司 +1
Cites: 13 · Cited by: 2

AI Technical Summary

Problems solved by technology

As data volumes in big data processing grow, many companies keep two backup copies of their data, which wastes disk space.

Embodiment

[0041] As shown in Figure 1, the large-scale industrial data compression and storage method provided by the present invention comprises:

[0042] Industrial data extraction step: configure a different Flume source for each data source, and operate the Flume source configuration through an interface so that the settings are general and configurable;

[0043] Step of temporarily preloading data into Avro: define the conversion chain, configure the schema of the Avro data format, and configure Morphline to temporarily convert the different data formats into Avro format data;

[0044] Dataset creation step: through Dataset, create a dataset in HDFS with Parquet as the storage format, compress the data with the GPL protocol, and declare that the data finally written is in Parquet format with snappy compression;

[0045] Combined operation step: connect and run the above steps through the Flume configuration, so that finally the data is compressed from a large amo...
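
The extraction and conversion steps above can be illustrated with a minimal Python sketch. This is not Flume or Morphline configuration, and it is not the patent's implementation; it only mimics the idea of one conversion chain per source type that normalizes differently formatted inputs into a single Avro-style record schema. The schema fields (`device_id`, `ts`, `value`), the source names, and every function name are hypothetical.

```python
import csv
import io
import json

# Hypothetical Avro-style record schema: field name -> Python type.
SCHEMA = {"device_id": str, "ts": int, "value": float}


def parse_json_line(raw):
    """Parse one JSON-encoded line into a dict."""
    return json.loads(raw)


def parse_csv_line(raw):
    """Parse one CSV line whose columns follow the schema field order."""
    row = next(csv.reader(io.StringIO(raw)))
    return dict(zip(SCHEMA, row))


def coerce_to_schema(record):
    """Keep only schema fields and cast each to its declared type."""
    return {name: typ(record[name]) for name, typ in SCHEMA.items()}


# One conversion chain per source type, loosely analogous to a Morphline
# command list (an assumption for illustration); steps run left to right.
CHAINS = {
    "json_source": [parse_json_line, coerce_to_schema],
    "csv_source": [parse_csv_line, coerce_to_schema],
}


def convert(source_type, raw):
    """Run the raw input through the chain configured for its source."""
    value = raw
    for step in CHAINS[source_type]:
        value = step(value)
    return value


if __name__ == "__main__":
    print(convert("json_source", '{"device_id": "m1", "ts": 1, "value": "2.5", "extra": 0}'))
    print(convert("csv_source", "m2,2,3.75"))
```

Supporting a new data source in this sketch means adding one entry to `CHAINS`, which mirrors the configurable, per-source setup the steps describe.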

Abstract

The invention provides a large-scale industrial data compression and storage method, system and medium. The method comprises the following steps: 1, configuring different data collection systems according to the types of data sources, and extracting the data collected by the data collection systems through interface operation; 2, defining a conversion chain, and temporarily converting the extracted data of different types into the Avro format through a data cleaning plug-in; and 3, compressing the Avro-format data using a GPL protocol, with snappy as the compression format, creating a data set in the distributed file system with Parquet as the storage format, and storing the compressed data. Conversion chains and compression and storage formats can be defined for any type of data, and the data processing speed and the data compression ratio of the computing platform are greatly increased.
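
Step 3 of the abstract (column-oriented storage plus compression) can be illustrated with a dependency-free Python sketch. It makes two loud substitutions: `zlib` stands in for snappy, and a JSON column layout stands in for Parquet. The record fields and function names are hypothetical, and the sketch assumes a non-empty list of records that all share the same fields.

```python
import json
import zlib


def to_columns(records):
    """Pivot a list of same-schema records into one list per field."""
    fields = list(records[0])
    return {name: [rec[name] for rec in records] for name in fields}


def pack(records):
    """Serialize records column by column and compress the result.

    zlib stands in for snappy and JSON for Parquet; both are assumptions
    made to keep the sketch free of third-party dependencies.
    """
    payload = json.dumps(to_columns(records)).encode("utf-8")
    return zlib.compress(payload)


def unpack(blob):
    """Invert pack(): decompress and pivot the columns back into records."""
    columns = json.loads(zlib.decompress(blob).decode("utf-8"))
    fields = list(columns)
    rows = zip(*(columns[name] for name in fields))
    return [dict(zip(fields, row)) for row in rows]


if __name__ == "__main__":
    records = [{"device_id": "m1", "ts": t, "value": 20.0} for t in range(1000)]
    blob = pack(records)
    raw_size = len(json.dumps(records).encode("utf-8"))
    print(f"row-wise JSON bytes: {raw_size}, compressed columnar bytes: {len(blob)}")
    assert unpack(blob) == records
```

Grouping values by column places similar values next to each other, which is one reason columnar formats tend to compress repetitive sensor-style data well; the actual ratio depends on the data and the codec.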

Description

Technical field

[0001] The present invention relates to the technical field of data compression and storage, in particular to a large-scale industrial data compression and storage method, system and medium.

Background art

[0002] With the vigorous development of new infrastructure, more and more traditional industrial enterprises have begun to use Internet technology to improve productivity, and data is the most critical part of this. In the traditional Internet, the volume of data handled in big data processing keeps growing, and many companies keep two backup copies of their data, which wastes disk space.

[0003] Patent document CN108304472A (application number: 201711455790.2) discloses a data compression storage method and a data compression storage device. The data compression method includes the following steps: a segmentation step, which divides the original data into multiple fields; and a compression step, in which, depending on the data content, different compression strategies...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/11, G06F16/16, G06F16/174, G06F16/182
CPC: G06F16/116, G06F16/16, G06F16/1744, G06F16/182
Inventor: 高响
Owner: 上海微亿智造科技有限公司