The invention discloses a method for constructing a prediction and evaluation model for time-series surface water quality big data. The method first removes values that obviously contradict common sense. Then, for each time point that has missing values, it uses all the data available at that time point to find the time point nearest to it in Mahalanobis distance, and fills the missing values with the data from that time point. Next, outliers in the water quality data are detected with an improved KMeans++ clustering algorithm combined with Z-score detection, and the outliers are filled by support vector regression. A random forest algorithm is then used to extract the important features of the water quality indicators, and the indicators of high importance are selected to evaluate the overall state of water quality. An LSTM model then predicts the time series of the overall water quality state. Finally, Hadoop MapReduce programs execute the algorithms in parallel, which improves the execution efficiency of each algorithm and completes construction of the prediction and evaluation model, improving the efficiency, completeness and accuracy of water quality big data analysis.
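The cleaning and gap-filling steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented procedure. The distance named in the abstract is taken here to be the Mahalanobis distance. Its covariance is estimated from the fully observed time points, restricted to the indicators that the incomplete time point actually has. The plausibility bounds, column meanings and helper names (`drop_implausible`, `fill_nearest_mahalanobis`) are hypothetical. The sketch needs at least two complete time points and a non-singular covariance matrix.

```python
import math


def drop_implausible(rows, bounds):
    """Blank out (set to None) values outside hypothetical per-indicator (low, high) bounds."""
    return [[v if v is not None and lo <= v <= hi else None
             for v, (lo, hi) in zip(row, bounds)]
            for row in rows]


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _covariance(rows):
    """Sample covariance matrix of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def mahalanobis(u, v, cov):
    """sqrt((u - v)^T cov^-1 (u - v)), using a linear solve instead of an explicit inverse."""
    diff = [a - b for a, b in zip(u, v)]
    return math.sqrt(sum(d * s for d, s in zip(diff, _solve(cov, diff))))


def fill_nearest_mahalanobis(rows):
    """Fill each incomplete time point from its Mahalanobis-nearest complete time point."""
    complete = [r for r in rows if None not in r]
    filled = []
    for row in rows:
        obs = [j for j, v in enumerate(row) if v is not None]
        if len(obs) == len(row):
            filled.append(list(row))
            continue
        cov = _covariance([[r[j] for j in obs] for r in complete])
        target = [row[j] for j in obs]
        donor = min(complete,
                    key=lambda r: mahalanobis(target, [r[j] for j in obs], cov))
        filled.append([donor[j] if v is None else v for j, v in enumerate(row)])
    return filled


# Hypothetical columns: pH, dissolved oxygen (mg/L), turbidity (NTU).
bounds = [(0.0, 14.0), (0.0, 20.0), (0.0, 1000.0)]
rows = [[7.0, 8.0, 1.0], [7.2, 7.5, 2.0], [6.8, 6.0, 5.0],
        [7.5, 9.0, 0.5], [7.1, 7.9, -3.0]]  # negative turbidity is implausible
filled = fill_nearest_mahalanobis(drop_implausible(rows, bounds))
print(filled[-1])  # [7.1, 7.9, 1.0]
```

Solving `cov @ x = diff` rather than inverting the covariance is the usual, numerically safer way to evaluate the quadratic form.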
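The outlier-detection step pairs KMeans++ clustering with a Z-score (standard score) test. The abstract does not say what the "improved" KMeans++ changes, so the sketch below uses plain KMeans++ seeding followed by Lloyd iterations, keeping the lowest-inertia run of `n_init` restarts. It then flags points whose distance to their own cluster centre has an absolute Z-score above a threshold. The choice of `k`, the threshold of 3 and the function names are assumptions for illustration.

```python
import math
import random


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _kmeanspp_init(points, k, rng):
    """KMeans++ seeding: each new centre is drawn with probability proportional to D(x)^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(_sqdist(p, c) for c in centers) for p in points]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(list(p))
                break
        else:  # guard against floating-point round-off
            centers.append(list(points[-1]))
    return centers


def kmeans(points, k, rng, max_iter=100):
    """Lloyd iterations from a KMeans++ start; returns (centers, labels, inertia)."""
    centers = _kmeanspp_init(points, k, rng)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: _sqdist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda j: _sqdist(p, centers[j])) for p in points]
    inertia = sum(_sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, inertia


def detect_outliers(points, k=2, z_thresh=3.0, n_init=5, seed=0):
    """Indices whose distance to their cluster centre has |Z-score| > z_thresh."""
    rng = random.Random(seed)
    centers, labels, _ = min((kmeans(points, k, rng) for _ in range(n_init)),
                             key=lambda result: result[2])
    dists = [math.sqrt(_sqdist(p, centers[lab])) for p, lab in zip(points, labels)]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    if std == 0.0:
        return []
    return [i for i, d in enumerate(dists) if abs(d - mean) / std > z_thresh]


# Hypothetical 2-D readings: two tight groups plus one stray point.
offsets = [(0, 0), (0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5),
           (0.3, 0.3), (-0.3, -0.3), (0.3, -0.3), (-0.3, 0.3), (0.1, 0.1)]
points = offsets + [(10 + dx, 10 + dy) for dx, dy in offsets] + [(0, 4)]
print(detect_outliers(points))  # [20]
```

Scoring distance-to-centre instead of raw values lets one Z-score threshold work across clusters whose absolute levels differ widely.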
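Filling the detected outliers by support vector regression can be illustrated with a deliberately small model. The sketch assumes the time index is the only input feature, which the abstract does not state. It also swaps a kernel SVR solver for a linear epsilon-insensitive SVR trained by full-batch subgradient descent, with iterate averaging. `fit_linear_svr`, `fill_outliers_svr` and all hyperparameter values are hypothetical.

```python
import math


def fit_linear_svr(xs, ys, eps=0.1, lam=1e-4, lr0=0.5, iters=5000):
    """Linear epsilon-insensitive SVR on one feature, trained by full-batch
    subgradient descent; the iterates of the second half are averaged."""
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / n) or 1.0
    zs = [(x - mu) / sd for x in xs]  # standardise the feature
    w = b = 0.0
    w_avg = b_avg = 0.0
    count = 0
    for t in range(1, iters + 1):
        gw, gb = lam * w, 0.0  # gradient of the (lam / 2) * w^2 regulariser
        for z, y in zip(zs, ys):
            r = y - (w * z + b)
            if abs(r) > eps:  # only residuals outside the epsilon tube contribute
                s = 1.0 if r > 0 else -1.0
                gw -= s * z / n
                gb -= s / n
        lr = lr0 / math.sqrt(t)  # decaying step size
        w -= lr * gw
        b -= lr * gb
        if t > iters // 2:
            count += 1
            w_avg += (w - w_avg) / count
            b_avg += (b - b_avg) / count
    return lambda x: w_avg * (x - mu) / sd + b_avg


def fill_outliers_svr(series, outlier_idx):
    """Replace flagged points with SVR predictions fitted on the remaining points,
    using the time index as the (assumed) single input feature."""
    bad = set(outlier_idx)
    xs = [t for t in range(len(series)) if t not in bad]
    model = fit_linear_svr(xs, [series[t] for t in xs])
    return [model(t) if t in bad else v for t, v in enumerate(series)]


series = [2.0 * t + 1.0 for t in range(10)]
series[5] = 100.0  # flagged outlier; the underlying trend gives 11.0 here
filled = fill_outliers_svr(series, [5])
print(round(filled[5], 2))
```

In practice the other indicators measured at the same time point would likely be better regressors than the time index alone, and a kernel SVR would capture non-linear trends.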
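Feature importance from a random forest can be computed as the mean decrease in impurity. Each split's reduction in squared error is credited to the feature it splits on, summed over bootstrap-trained regression trees, and normalised. The sketch below grows shallow trees only to accumulate these importances, so no prediction path is kept. Using this particular importance measure, and the names `forest_importance` and `_grow`, are assumptions; the abstract does not say which measure is used.

```python
import math
import random


def _sse(values):
    """Sum of squared deviations from the mean (the node's impurity)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def _grow(X, y, idx, depth, max_depth, max_features, rng, imp):
    """Grow one regression tree on row indices `idx`, adding each split's
    impurity decrease to imp[feature]. Only importances are kept."""
    node_sse = _sse([y[i] for i in idx])
    if depth >= max_depth or len(idx) < 4 or node_sse == 0.0:
        return
    best = None
    for f in rng.sample(range(len(X[0])), max_features):  # random feature subset
        for t in sorted({X[i][f] for i in idx})[:-1]:
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            sse = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, f, left, right)
    if best is None:  # every candidate feature is constant in this node
        return
    sse, f, left, right = best
    imp[f] += node_sse - sse
    _grow(X, y, left, depth + 1, max_depth, max_features, rng, imp)
    _grow(X, y, right, depth + 1, max_depth, max_features, rng, imp)


def forest_importance(X, y, n_trees=25, max_depth=4, seed=0):
    """Normalised mean-decrease-in-impurity importance of each column of X."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    max_features = max(1, math.isqrt(d))
    imp = [0.0] * d
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        _grow(X, y, boot, 0, max_depth, max_features, rng, imp)
    total = sum(imp)
    return [v / total for v in imp] if total > 0 else imp


# Hypothetical data: the target depends only on indicator 0.
X = [[i, (7 * i) % 11, (3 * i) % 5] for i in range(30)]
y = [2.0 * row[0] for row in X]
print([round(v, 2) for v in forest_importance(X, y)])
```

Indicators would then be ranked by these scores and the top ones kept for the overall evaluation.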
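The LSTM forecasting step rests on the standard LSTM cell update. The input, forget and output gates use sigmoids, the candidate cell uses tanh, and the update is `c' = f * c + i * g` and `h' = o * tanh(c')`, applied once per time step over a sliding window of past values. The sketch shows only this forward pass, a linear read-out and the window construction. Training (for example backpropagation through time in a deep-learning framework) is omitted, so the read-out below is untrained. `LSTMCell`, `make_windows`, the layer sizes and the sample series are hypothetical.

```python
import math
import random

GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate cell


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class LSTMCell:
    """Forward pass of a single LSTM layer with a linear read-out (weights untrained)."""

    def __init__(self, n_in, n_hidden, seed=0, scale=0.1):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        # One weight row per hidden unit and gate, applied to the concatenation [x, h].
        self.W = {g: [[rng.uniform(-scale, scale) for _ in range(n_in + n_hidden)]
                      for _ in range(n_hidden)] for g in GATES}
        self.b = {g: [0.0] * n_hidden for g in GATES}
        self.w_out = [rng.uniform(-scale, scale) for _ in range(n_hidden)]

    def step(self, x, h, c):
        xh = list(x) + list(h)
        pre = {g: [sum(w * v for w, v in zip(row, xh)) + bias
                   for row, bias in zip(self.W[g], self.b[g])] for g in GATES}
        i = [_sigmoid(v) for v in pre["i"]]
        f = [_sigmoid(v) for v in pre["f"]]
        o = [_sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def forecast(self, window):
        """Run the cell over a window of feature vectors; read out one value from the last h."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in window:
            h, c = self.step(x, h, c)
        return sum(w * hv for w, hv in zip(self.w_out, h)), h


def make_windows(series, window):
    """Supervised (input window, next value) pairs from a 1-D series."""
    return [(series[t:t + window], series[t + window]) for t in range(len(series) - window)]


series = [0.52, 0.55, 0.61, 0.58, 0.63, 0.66]  # hypothetical normalised water-quality index
cell = LSTMCell(n_in=1, n_hidden=4)
for window, target in make_windows(series, 3):
    y_hat, _ = cell.forecast([[v] for v in window])
    print(round(y_hat, 4), target)
```

Inputs are assumed to be normalised; large unscaled inputs can push the sigmoid's `math.exp` toward overflow.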
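The parallel execution on Hadoop follows the map, shuffle and reduce pattern. In Hadoop Streaming, the mapper and reducer are separate executables that read lines on standard input and write tab-separated key/value lines on standard output, and the framework sorts mapper output by key before the reduce phase. The sketch below simulates that flow in one process, using `sorted()` in place of the shuffle. As an example job it computes per-indicator count, mean and standard deviation, statistics that a Z-score step could consume. The `timestamp,indicator,value` input format and the function names are assumptions.

```python
import math
from itertools import groupby


def mapper(lines):
    """Map: 'timestamp,indicator,value' lines -> (indicator, value) pairs."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip malformed records
        _, indicator, value = parts
        try:
            yield indicator, float(value)
        except ValueError:
            continue  # skip unparsable values


def reducer(pairs):
    """Reduce: key-sorted (indicator, value) pairs -> (indicator, count, mean, std)."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        vals = [v for _, v in group]
        n = len(vals)
        mean = sum(vals) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
        yield key, n, mean, std


def run_local(lines):
    """Simulate one job: map, sort by key (stand-in for Hadoop's shuffle), reduce."""
    return list(reducer(sorted(mapper(lines))))


lines = ["2020-01-01T00,pH,7.0", "2020-01-01T00,DO,8.0",
         "2020-01-01T04,pH,7.4", "2020-01-01T04,DO,6.0", "malformed"]
for indicator, n, mean, std in run_local(lines):
    print(indicator, n, round(mean, 3), round(std, 3))
```

Because each reducer sees all values for one key, per-indicator statistics parallelise naturally across indicators or monitoring stations.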