The invention relates to a method for predicting protein complexes from sample data. The method comprises the following steps: (1) constructing a weighted PPI (Protein-Protein Interaction) network from the sample data and denoising it, where denoising consists of weighting each interaction in the PPI network by the Gene Ontology (GO) semantic similarity of the two interacting proteins; (2) constructing a dynamic weighted PPI network from the denoised weighted PPI network; and (3) predicting protein complexes with a hybrid clustering algorithm. Introducing GO semantic similarity effectively reduces the noise in the PPI data and gives the network weights a biological meaning. The dynamic network also reflects actual protein activity and captures the dynamic nature of the PPI network. In addition, the method overcomes the overfitting or underfitting of traditional clustering algorithms and improves clustering accuracy, which in turn improves the accuracy of the predicted protein complexes.
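The three steps can be illustrated with a minimal Python sketch. Every concrete choice in it is an assumption and does not come from the abstract:

- Jaccard overlap of GO term sets stands in for GO semantic similarity. Published measures usually exploit the GO graph structure.
- Denoising is taken to mean dropping edges whose weight falls below a hypothetical `threshold`.
- The dynamic network is built from time-course expression data. A protein counts as active at a time point when its expression is at or above its own mean.
- Plain connected components replace the unspecified hybrid clustering algorithm.

All function names are hypothetical.

```python
def go_similarity(terms_a, terms_b):
    """Jaccard overlap of two GO annotation sets.
    Assumption: a simple stand-in for a real GO semantic similarity measure."""
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def weighted_ppi(edges, go_terms, threshold=0.1):
    """Step 1: weight each PPI edge by GO similarity, then drop weak edges as noise.
    The threshold value is a hypothetical parameter."""
    weighted = {}
    for a, b in edges:
        if a == b:  # ignore self-interactions
            continue
        w = go_similarity(go_terms.get(a, set()), go_terms.get(b, set()))
        if w >= threshold:
            weighted[frozenset((a, b))] = w
    return weighted


def dynamic_networks(weighted, expression):
    """Step 2: build one subnetwork per time point.
    Assumption: a protein is active at time t if its expression is >= its own mean,
    and an edge is kept only when both endpoints are active."""
    n_points = len(next(iter(expression.values())))
    subnets = []
    for t in range(n_points):
        active = {p for p, series in expression.items()
                  if series[t] >= sum(series) / len(series)}
        subnets.append({e: w for e, w in weighted.items() if e <= active})
    return subnets


def components(subnet, min_size=3):
    """Step 3 stand-in: connected components of a subnetwork with at least min_size proteins."""
    adj = {}
    for e in subnet:
        a, b = tuple(e)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        if len(comp) >= min_size:
            clusters.append(frozenset(comp))
    return clusters


def predict_complexes(edges, go_terms, expression, threshold=0.1, min_size=3):
    """Run steps 1-3 and merge the candidate complexes found at all time points."""
    weighted = weighted_ppi(edges, go_terms, threshold)
    found = set()
    for subnet in dynamic_networks(weighted, expression):
        found.update(components(subnet, min_size))
    return found
```

Consider a toy case where A, B and C share GO terms and are expressed together at the first time point. In that case `predict_complexes` returns `{frozenset({"A", "B", "C"})}`. An edge between proteins with no GO terms in common is discarded in step 1. A substantive implementation would replace `go_similarity` and `components` with the similarity measure and hybrid clustering the method actually specifies.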