Data cleaning method and device in soil big data analysis

A data cleaning technology for big data, applied in the field of data analysis. It addresses the outliers that remain in cleaned data and the resulting loss of analysis accuracy, and achieves higher cleaning accuracy and improved efficiency.

Pending Publication Date: 2022-05-06
GUANGXI FORESTRY RES INST

AI Technical Summary

Problems solved by technology

[0005] The patent mentions related data cleaning technical solutions, but its cleaning method still relies on existing conventional technology, so abnormal values remain in the cleaned data, which reduces the accuracy of subsequent data analysis.

Examples

Embodiment 1

[0030] As shown in Figure 1, the data cleaning method in soil big data analysis performs the following steps:

[0031] Step 1: Collect soil data, and obtain environmental data at the time of collection. The collected soil data include at least: soil effective water content, sand content, silt content, clay content, soil bulk density, and organic carbon content. The environmental data include: ambient temperature, ambient humidity, and ambient light intensity.

[0032] Step 2: Disperse the collected soil data by category to obtain several dispersed data sets. The dispersal process is: first classify the collected soil data by data category to obtain multiple sets of classified data, then enlarge each set of classified data according to the set ratio to obtain dispersed data.

[0033] Step 3: Build a dispersed data sphere based on the data structure and data volume of each dispersed d...
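The record layout of Step 1 and the classification half of Step 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `SoilSample`, the field names, the function `classify_by_category`, and the sample values are all hypothetical, and the text does not specify units.

```python
from dataclasses import dataclass


@dataclass
class SoilSample:
    # Soil variables listed in Step 1 (field names and units are assumptions).
    effective_water_content: float
    sand_content: float
    silt_content: float
    clay_content: float
    bulk_density: float
    organic_carbon_content: float
    # Environmental data recorded at collection time.
    ambient_temperature: float
    ambient_humidity: float
    ambient_light_intensity: float


# Only the soil variables are classified in Step 2.
SOIL_CATEGORIES = (
    "effective_water_content",
    "sand_content",
    "silt_content",
    "clay_content",
    "bulk_density",
    "organic_carbon_content",
)


def classify_by_category(samples):
    """Group soil readings into one list of values per soil category."""
    return {cat: [getattr(s, cat) for s in samples] for cat in SOIL_CATEGORIES}


# Illustrative values only; they are not taken from the patent.
samples = [
    SoilSample(0.21, 40.0, 35.0, 25.0, 1.30, 1.8, 24.5, 60.0, 12000.0),
    SoilSample(0.18, 42.0, 33.0, 25.0, 1.35, 1.6, 25.0, 58.0, 11500.0),
]
classified = classify_by_category(samples)
```

Keeping the environmental readings on the same record as the soil readings preserves the pairing that Step 1 asks for, while the classification step looks only at the soil fields.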

Embodiment 2

[0040] On the basis of the previous embodiment, the ratio set in Step 2 ranges from 3 to 8, and its value depends on the type of classified data: 3 for soil effective water content, 4 for sand content, 5 for silt content, 6 for clay content, 7 for soil bulk density, and 8 for organic carbon content.
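The ratio table of this embodiment can be written as a simple lookup. The sketch below assumes that "enlarging" a set of classified data means multiplying each value by its set ratio; the text does not define the operation, so treat that as an interpretation. The names `SET_RATIOS` and `disperse` and the category keys are hypothetical.

```python
# Set ratios from Embodiment 2, keyed by hypothetical category names.
SET_RATIOS = {
    "effective_water_content": 3,
    "sand_content": 4,
    "silt_content": 5,
    "clay_content": 6,
    "bulk_density": 7,
    "organic_carbon_content": 8,
}


def disperse(classified):
    """Enlarge each category's values by its set ratio.

    Assumption: enlargement is element-wise multiplication by the ratio.
    """
    dispersed = {}
    for category, values in classified.items():
        ratio = SET_RATIOS[category]  # raises KeyError for an unknown category
        dispersed[category] = [value * ratio for value in values]
    return dispersed


example = disperse({"sand_content": [40.0, 42.0], "bulk_density": [1.5]})
```

With this interpretation, the sand readings are scaled by 4 and the bulk density reading by 7, so each category ends up on its own scale.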

Embodiment 3

[0042] On the basis of the previous embodiment, the method for constructing a dispersed data sphere in Step 3 specifically includes: calculating the data volume of the dispersed data, using that data volume as the radius of the dispersed data sphere, and constructing the sphere with a preset data sphere construction model so that the dispersed data is evenly distributed on its outer surface.
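The text does not describe the preset data sphere construction model. As a stand-in, the sketch below places the values on a Fibonacci (golden-angle) lattice, a common way to spread points approximately evenly over a sphere. It also assumes that "data volume" means the number of values. The function name `build_dispersed_sphere` is hypothetical.

```python
import math


def build_dispersed_sphere(values):
    """Place each value on a sphere whose radius equals the number of values.

    Assumptions: "data volume" is the value count, and a Fibonacci lattice
    stands in for the unspecified preset construction model.
    Returns (radius, [(x, y, z, value), ...]).
    """
    n = len(values)
    radius = float(n)
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i, value in enumerate(values):
        z = 1.0 - (2.0 * i + 1.0) / n      # unit-sphere height, strictly inside (-1, 1)
        r_xy = math.sqrt(1.0 - z * z)      # radius of the horizontal circle at height z
        theta = golden_angle * i           # rotate by the golden angle at each step
        points.append((
            radius * r_xy * math.cos(theta),
            radius * r_xy * math.sin(theta),
            radius * z,
            value,
        ))
    return radius, points


radius, points = build_dispersed_sphere([160.0, 168.0, 172.0, 150.0])
```

Every returned point lies at distance `radius` from the origin, so the data sits on the outer surface as the embodiment requires; the evenness of the spacing is approximate rather than exact.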

[0043] Specifically, a consistency check examines whether the data meets requirements according to the reasonable value range of each variable and the relationships between variables, and finds data that exceeds the normal range, is logically unreasonable, or is contradictory. For example, a value of 0 for a variable measured on a scale of 1-7, or a negative weight, should be considered as exceeding the normal range. Computer software such as SPSS, SAS, and Excel can automatica...
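A range-based consistency check of the kind described above can be sketched in a few lines. The function name `find_out_of_range`, the field names, and the records are hypothetical; the two rules mirror the examples in the text (a 1-7 scale and a non-negative weight).

```python
def find_out_of_range(records, valid_ranges):
    """Return (record index, field, value) for every value outside its valid range."""
    violations = []
    for index, record in enumerate(records):
        for field, (low, high) in valid_ranges.items():
            value = record.get(field)
            if value is None:
                continue  # missing values are handled separately from range checks
            if not (low <= value <= high):
                violations.append((index, field, value))
    return violations


# Valid ranges taken from the examples in the text.
valid_ranges = {"rating": (1, 7), "weight_kg": (0.0, float("inf"))}

# Illustrative records only.
records = [
    {"rating": 5, "weight_kg": 62.0},
    {"rating": 0, "weight_kg": 70.5},   # 0 is outside the 1-7 scale
    {"rating": 3, "weight_kg": -4.0},   # negative weight
]
violations = find_out_of_range(records, valid_ranges)
```

The check only flags suspicious values; deciding whether to correct or discard them is a separate step.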

Abstract

The invention relates to the field of data analysis, in particular to a data cleaning method and device in soil big data analysis. The method comprises the following steps: acquiring soil data, and acquiring environmental data when the soil data is acquired; dispersing the acquired soil data by category to obtain a plurality of dispersed data sets; constructing a dispersed data sphere on the basis of the data structure and data size of each dispersed data set; and finally, constructing a data cleaning cube and integrating it with the dispersed data spheres to obtain the final cleaned data. Unlike the prior art, which only searches the data for abnormal values, the method marks the normal data by constructing a data cube and corrects the abnormal data accordingly. The correction model is also built to account for the abnormalities that environmental conditions frequently cause in soil data, so that the accuracy of data cleaning is remarkably improved.

Description

Technical field

[0001] The invention belongs to the field of data analysis, and in particular relates to a data cleaning method and device in soil big data analysis.

Background technique

[0002] Data cleaning refers to the process of re-examining and verifying data, with the purpose of deleting duplicate information, correcting existing errors, and ensuring data consistency.

[0003] As the name suggests, data cleaning "washes out" the "dirty" data. It is the last procedure for finding and correcting identifiable errors in data files, including checking data consistency and dealing with invalid and missing values. Because the data in a data warehouse is a collection of data oriented to a certain topic, extracted from multiple business systems and containing historical data, it is unavoidable that some data are wrong and some data conflict with each other. These wrong or conflicting data are obviously unwanted, called "di...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/215, G01N33/24
CPC: G06F16/215, G01N33/246
Inventor: 石媛媛, 邓明军, 唐健, 赵隽宇, 覃祚玉, 宋贤冲, 王会利, 潘波, 覃其云
Owner: GUANGXI FORESTRY RES INST