Three-way decision oversampling method for imbalanced data based on the Spark big data platform

The method combines a big data platform with oversampling technology and is applicable to instruments, computing, and character and pattern recognition. It addresses problems such as reduced efficiency, and it helps solve classification problems, reduce learning time, and improve performance.

Active Publication Date: 2019-07-19
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

Another feature of an RDD is that it is elastic (resilient). When the machine runs out of memory while a job is running, the RDD exchanges data with the hard disk. This reduces efficiency, but it ensures that the job still runs normally.




Embodiment Construction

[0035] The technical solutions in the embodiments of the present invention will be described clearly and in detail below with reference to the drawings in the embodiments of the present invention. The described embodiments are only some of the embodiments of the invention.

[0036] The technical solution by which the present invention solves the above technical problems is as follows:

[0037] The three-way decision oversampling method for imbalanced data based on the Spark big data platform includes the following steps:

[0038] Obtain the sample set to be sampled from the system; HDFS stores it in distributed form automatically. Then use Spark to transform the entire sample set into a normalized sample set in LabeledPoint format. Specific steps: first create a SparkContext object, and then use its textFile(URL) function to create a distributed dataset (RDD). Once created, this distributed dataset can be operated on in parallel; se...
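The per-record work in this step can be sketched in plain Python, without a Spark dependency. This is only an illustration under stated assumptions: the input lines are assumed to be comma-separated with the label in the first column, and min-max scaling is assumed as the normalization, since the text does not say which one is used. The names parse_line and min_max_normalize are hypothetical. On Spark, a function like parse_line would typically be passed to the map transformation of the RDD returned by textFile.

```python
from typing import List, Tuple

# (label, features): a plain-Python stand-in for a LabeledPoint record.
Sample = Tuple[float, List[float]]


def parse_line(line: str, sep: str = ",") -> Sample:
    """Parse 'label,f1,f2,...' into (label, [f1, f2, ...]).

    The column layout is an assumption made for this sketch.
    """
    parts = [p.strip() for p in line.split(sep)]
    return float(parts[0]), [float(p) for p in parts[1:]]


def min_max_normalize(samples: List[Sample]) -> List[Sample]:
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*(features for _, features in samples)))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for label, features in samples:
        scaled = [
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(features, lows, highs)
        ]
        normalized.append((label, scaled))
    return normalized


# Hypothetical input lines, standing in for the contents of a text file.
lines = ["1,2.0,10.0", "0,4.0,30.0", "0,6.0,20.0"]
normalized = min_max_normalize([parse_line(line) for line in lines])
```

In a distributed setting the column minima and maxima would be computed with an aggregation over the RDD rather than over an in-memory list; the per-record scaling is the same.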


Abstract

The present invention claims a three-way decision oversampling method for imbalanced data based on the Spark big data platform, and relates to the field of data mining and Spark big data technology. First, Spark's RDD is used for data transformation to obtain a normalized sample set in LabeledPoint format <label:[features]>, which is divided into a training set and a test set. Second, Spark's RDD is used for data transformation to compute the distances between samples and determine the neighborhood radius; according to the neighborhood three-way decision model, the samples in the entire training set are divided into positive region samples, boundary region samples, and negative region samples. The boundary region samples and the negative region samples are then oversampled separately. Finally, the Spark MLlib machine learning algorithms are invoked to verify the sampling effect. The invention effectively addresses the classification of large-scale imbalanced data sets in the fields of machine learning and pattern recognition.
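The partition-then-oversample idea in the abstract can be illustrated with a small plain-Python sketch. It is not the patented procedure: the abstract does not state the region rule or the oversampling scheme, so both are assumptions here. A minority sample is placed in the positive region when no majority sample lies within the radius, in the negative region when every neighbor is a majority sample, and in the boundary region otherwise. New samples are produced by SMOTE-style linear interpolation, which is swapped in purely for illustration. The names partition_minority and oversample are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Sample = Tuple[int, List[float]]  # (label, features)


def partition_minority(samples: List[Sample], minority_label: int,
                       radius: float) -> Dict[str, List[List[float]]]:
    """Split minority samples into positive/boundary/negative regions.

    The rule is an assumption for this sketch, not taken from the source.
    """
    regions: Dict[str, List[List[float]]] = {
        "positive": [], "boundary": [], "negative": []}
    for i, (label, x) in enumerate(samples):
        if label != minority_label:
            continue
        # Labels of all other samples inside the neighborhood of x.
        neighbors = [lab for j, (lab, y) in enumerate(samples)
                     if j != i and math.dist(x, y) <= radius]
        same = sum(1 for lab in neighbors if lab == minority_label)
        if same == len(neighbors):   # no majority neighbor (or isolated)
            regions["positive"].append(x)
        elif same == 0:              # only majority neighbors
            regions["negative"].append(x)
        else:                        # mixed neighborhood
            regions["boundary"].append(x)
    return regions


def oversample(seeds: List[List[float]], pool: List[List[float]],
               n_new: int, rng: random.Random) -> List[List[float]]:
    """Create n_new points by interpolating from a seed toward a pool sample."""
    if not seeds or not pool:
        return []
    synthetic = []
    for _ in range(n_new):
        a, b, t = rng.choice(seeds), rng.choice(pool), rng.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


# Hypothetical toy data: label 1 is the minority class.
data: List[Sample] = [
    (1, [0.0, 0.0]), (1, [0.1, 0.0]),
    (1, [1.0, 1.0]), (0, [1.0, 1.1]), (0, [1.1, 1.0]), (0, [0.9, 1.0]),
    (1, [0.5, 0.5]), (0, [0.5, 0.6]), (1, [0.5, 0.4]),
]
regions = partition_minority(data, minority_label=1, radius=0.15)
minority_pool = [x for label, x in data if label == 1]
rng = random.Random(0)
# The two regions are oversampled separately, as the abstract describes.
new_boundary = oversample(regions["boundary"], minority_pool, 2, rng)
new_negative = oversample(regions["negative"], minority_pool, 2, rng)
```

The pairwise distance scan is quadratic in the number of samples, which is the part a Spark implementation would distribute across the cluster.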

Description

Technical field

[0001] The invention belongs to the fields of data mining, pattern recognition and big data processing, and specifically relates to a three-way decision oversampling method for imbalanced data based on the Spark big data platform.

Background

[0002] In recent years, mobile phones have become daily necessities, and users replace them more and more frequently. On the one hand, the faster users change their mobile phones, the greater the market value and the higher the manufacturer's income. Manufacturers therefore do everything possible to design new products that encourage users to replace their phones. On the other hand, major operators are increasingly using data mining technology to improve marketing efficiency. In actual work, the analysis of customer terminal preferences in the current communication industry is simply based on busine...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62
CPC: G06F18/217, G06F18/24133, G06F18/214
Inventor: 胡峰, 王蕾, 欧阳卫华, 于洪, 王进, 雷大江, 李智星, 瞿原, 赵蕊, 张其龙
Owner: CHONGQING UNIV OF POSTS & TELECOMM