The invention belongs to the field of bioinformatics and discloses a cancer classification and feature gene selection method comprising the following steps. Establishing the primary learners: T logistic regression models are established together with the corresponding sparse group lasso (SGL) regularized loss-function solving models, and a training set for the secondary learner is output. Establishing the secondary learner: a multi-response regression model is established together with the corresponding L1-regularized loss-function solving model, and a prediction result for the training set is output. Establishing the prognosis feature selection model: a prognosis feature selection SGL model is established. The method satisfies the three criteria of prediction, stability and selection: stacking integration improves the accuracy and stability of the model for cancer classification prediction; oncogenes and cancer-related genes are accurately selected, which enhances the interpretability of the model; and prior knowledge of genes and gene pathways is incorporated, which improves both the accuracy of cancer classification and the effectiveness of feature selection.
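
As an illustration only, the sparse group lasso regularization named above can be sketched in a few lines of Python. The abstract does not give the exact form of the penalty, so the sketch assumes the commonly used formulation lam * (alpha * ||beta||_1 + (1 - alpha) * sum over groups of sqrt(p_g) * ||beta_g||_2), with each group standing for one gene pathway. The names sgl_penalty, sgl_prox, lam, alpha and step are hypothetical and are not taken from the invention.

```python
import math


def soft_threshold(x, t):
    """Shrink x toward zero by t (proximal operator of t * |x|)."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sgl_penalty(beta, groups, lam, alpha):
    """Assumed sparse group lasso penalty.

    beta   : list of coefficients (one per gene)
    groups : list of index lists, assumed to partition the coefficients
             (one list per gene pathway)
    lam    : overall regularization strength
    alpha  : mix between the L1 term (alpha) and the group term (1 - alpha)
    """
    l1 = sum(abs(b) for b in beta)
    grouped = sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[j] ** 2 for j in g))
        for g in groups
    )
    return lam * (alpha * l1 + (1.0 - alpha) * grouped)


def sgl_prox(beta, groups, lam, alpha, step=1.0):
    """Proximal step for the assumed penalty, as used in proximal gradient
    solvers: elementwise soft-thresholding followed by groupwise shrinkage."""
    out = [0.0] * len(beta)
    for g in groups:
        # L1 part: shrink each coefficient in the group individually.
        z = [soft_threshold(beta[j], step * lam * alpha) for j in g]
        norm = math.sqrt(sum(v * v for v in z))
        # Group part: shrink the whole group, or zero it out entirely.
        t = step * lam * (1.0 - alpha) * math.sqrt(len(g))
        scale = max(1.0 - t / norm, 0.0) if norm > 0.0 else 0.0
        for j, v in zip(g, z):
            out[j] = scale * v
    return out


# Two hypothetical pathways: genes 0-1 carry strong signal, genes 2-3 weak.
beta = [3.0, -4.0, 0.5, -0.2]
groups = [[0, 1], [2, 3]]
shrunk = sgl_prox(beta, groups, lam=1.0, alpha=0.5)
```

In this example the weak second group is removed as a whole, while the first group is kept with shrunken coefficients. This is the combined group-level and gene-level selection behaviour that a sparse group lasso penalty is generally chosen for.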