Data processing method and device based on big data
A data processing method and device, applied in the field of data processing, that address problems such as unreasonable resource allocation, an excessive number of data blocks, and small files each occupying a separate data block.
Inactive Publication Date: 2019-10-11
CHINA UNITECHS
AI Technical Summary
Problems solved by technology
[0003] In related technologies, storing data as individual files is inefficient: even a file holding only a small amount of data occupies a separate HDFS data block. This unreasonable resource allocation leaves the entire HDFS cluster with too many data blocks, which affects the computational efficiency of ...
Abstract
An embodiment of the invention provides a data processing method and device based on big data. The method comprises: performing data cleaning on the original target files under a target directory to obtain at least one first target file; dividing all the first target files into a plurality of combinations according to the file size of each first target file, wherein the sum of the file sizes of the first target files in each combination does not exceed a first preset reference value, and the number of combinations is the minimum of all possible numbers; and merging the first target files in each combination into a second target file, each second target file corresponding to one data block of a distributed file system. With this method and device, mass data is stored more efficiently, resources are allocated more reasonably, and subsequent computation performs better.
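The grouping step described above can be read as a bin-packing problem, with the first preset reference value as the bin capacity. Finding the true minimum number of combinations is NP-hard in general, so the sketch below swaps in the first-fit decreasing heuristic, which approximates the minimum but does not guarantee it. The function name `group_files`, the `limit` parameter, the example file names, and the choice to reject files larger than the limit are all assumptions for illustration; the source does not specify them.

```python
from typing import Dict, List


def group_files(sizes: Dict[str, int], limit: int) -> List[List[str]]:
    """Group file names so that each group's total size stays within `limit`.

    Uses the first-fit decreasing heuristic: it approximates, but does not
    guarantee, the minimum number of groups.
    """
    groups: List[List[str]] = []
    remaining: List[int] = []  # free capacity left in each group

    # Place the largest files first; ties are broken by name for determinism.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > limit:
            # Assumption: oversized files are out of scope for merging.
            raise ValueError(f"{name} ({size}) exceeds the limit {limit}")
        for i, free in enumerate(remaining):
            if size <= free:
                groups[i].append(name)
                remaining[i] -= size
                break
        else:
            # No existing group has room, so open a new one.
            groups.append([name])
            remaining.append(limit - size)
    return groups


# Hypothetical file sizes in MB, with a 128 MB reference value.
example = {"a.log": 70, "b.log": 60, "c.log": 50, "d.log": 40, "e.log": 30}
groups = group_files(example, 128)
for members in groups:
    print(members, sum(example[m] for m in members))
```

Each resulting group would then be merged into one second target file. A heuristic is used here only to keep the sketch short; an exact solver would be needed to meet the "minimum of all possible numbers" condition in every case.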
Description
Technical field
[0001] This application relates to the field of data processing, in particular to a data processing method and device based on big data.
Background
[0002] Hadoop is a distributed system infrastructure developed by the Apache Software Foundation. Users can develop distributed programs without knowing the underlying details of the distribution, and can make full use of the power of the cluster for high-speed computing and storage. The core of the Hadoop framework is HDFS (Hadoop Distributed File System) and MapReduce. HDFS provides storage for massive data, while MapReduce provides computation over massive data.
[0003] In related technologies, storing data as individual files is inefficient: even a file holding only a small amount of data occupies a separate HDFS data block. This unreasonable resource allocation leaves the entire HDFS cluster with too many data blocks, which affects the operational effi...
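To make the small-file problem in [0003] concrete, the following rough calculation compares the number of blocks tracked when files are stored separately with the number after merging. The 128 MB block size is an assumption (it is the default `dfs.blocksize` in Hadoop 2.x and later), and the workload of 1,000 files of 2 MB each is hypothetical. Merging everything into a single file is a simplification; the method itself merges files in groups bounded by the first preset reference value.

```python
from typing import Iterable


def blocks_used(file_sizes: Iterable[int], block_size: int) -> int:
    """Count blocks when every file in `file_sizes` is stored separately.

    Each non-empty file needs ceil(size / block_size) blocks, so even a
    tiny file accounts for one block of its own.
    """
    # -(-a // b) is integer ceiling division, avoiding float rounding.
    return sum(-(-size // block_size) for size in file_sizes if size > 0)


MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed block size

small_files = [2 * MB] * 1000  # hypothetical workload
separate = blocks_used(small_files, BLOCK_SIZE)
merged = blocks_used([sum(small_files)], BLOCK_SIZE)
print(f"stored separately: {separate} blocks, merged: {merged} blocks")
```

Under these assumptions the block count falls by well over an order of magnitude, which is the kind of reduction the merging step aims for.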
Application Information
IPC(8): G06F16/16, G06F16/17, G06F16/11, G06F16/182
CPC: G06F16/119, G06F16/162, G06F16/1737, G06F16/182
Inventor: 周朝卫, 刘垒
Owner: CHINA UNITECHS