The invention discloses a high-risk
pollution source classification forecasting method based on
principal component analysis and
random forest. The method includes the steps of collecting and integrating environmental
pollution source behavior data of enterprises into preliminary candidate indexes, and screening for the
pollution source behavior indexes that influence whether a pollution source is illegal, to serve as a high-risk pollution source
index system; conducting data cleaning and data normalization
processing on the environmental pollution source behavior data; determining a functional relationship between the high-risk pollution source
index system and whether or not the pollution sources are illegal, and building a
random forest model; conducting model training and evaluating the precision of the
random forest model after training is finished; sorting the pollution source behavior indexes by importance; conducting the
principal component analysis to obtain principal components, and weighting the principal components to compute comprehensive scores; according to the comprehensive scores, determining the risk
score coefficient of each enterprise, automatically
ranking the risk score coefficients and generating a TOP enterprise
list, wherein the risk
score coefficients indicate the
occurrence probability of illegal behaviors of the corresponding enterprises. The high-risk pollution source classification forecasting method based on the
principal component analysis and the random forest can reduce complexity of operations and improve forecasting precision and the
quality of results.
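
The steps above can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: it assumes scikit-learn and NumPy, and the function name `rank_enterprises` and its parameters (`top_n`, `variance_kept`, `seed`) are hypothetical. Several details are assumptions, because the abstract does not specify them: min-max scaling stands in for the normalization step, held-out accuracy stands in for "precision", the principal component analysis is run on all normalized indexes, and each principal component is weighted by its share of explained variance. The composite score is left unscaled, so it is a ranking value rather than a calibrated probability.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def rank_enterprises(X, y, index_names, top_n=10, variance_kept=0.85, seed=0):
    """Hypothetical helper: one row of X per enterprise, one column per behavior index.

    y holds 1 for enterprises with a recorded violation and 0 otherwise.
    Returns (held-out accuracy, index names sorted by importance,
    row numbers of the TOP enterprises, composite scores).
    """
    # Data normalization (assumption: min-max scaling of each index to [0, 1]).
    X_norm = MinMaxScaler().fit_transform(X)

    # Random forest relating the index system to the illegal / not-illegal label,
    # evaluated on a held-out split after training.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_norm, y, test_size=0.3, random_state=seed, stratify=y
    )
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_tr, y_tr)
    accuracy = forest.score(X_te, y_te)

    # Sort the behavior indexes by impurity-based importance, highest first.
    order = np.argsort(forest.feature_importances_)[::-1]
    ranked_indexes = [index_names[i] for i in order]

    # PCA: a float n_components keeps just enough components to explain
    # that fraction of the variance (requires the full SVD solver).
    pca = PCA(n_components=variance_kept, svd_solver="full")
    components = pca.fit_transform(X_norm)

    # Comprehensive score: components weighted by explained-variance share
    # (assumption). The sign of a principal component is arbitrary, so the
    # direction of the score should be checked against known violators.
    ratios = pca.explained_variance_ratio_
    scores = components @ (ratios / ratios.sum())

    # TOP list: enterprises with the highest composite scores.
    top = np.argsort(scores)[::-1][:top_n]
    return accuracy, ranked_indexes, top, scores
```

Calling `rank_enterprises(X, y, names, top_n=20)` on a cleaned data matrix would return the 20 highest-scoring row numbers. A variant closer to the ordering of steps in the abstract might run the principal component analysis only on the most important indexes from the forest; the abstract does not say which is intended.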