The invention provides a multi-subject extraction method based on semantic categories. The method comprises the following steps: first, a document is preprocessed by conventional methods to obtain a preliminary vector of feature words; second, synonyms are merged using the correspondence between word senses and concepts in 'HowNet', polysemous words are disambiguated according to the correlation between semantic categories and the context, and a concept vector model is constructed to represent the document; the concept vector model is then converted into a semantic category model according to the one-to-one correspondence between concepts and semantic categories; concept similarity is calculated from the semantic information that 'HowNet' associates with each concept, and the semantic category similarity is derived from it; the semantic categories are clustered with a K-means algorithm improved by presetting seeds, forming a plurality of subject semantic category clusters; finally, a plurality of sub-subject word sets are obtained in a
reverse manner by following the correspondences between semantic categories and concepts and between concepts and words. The method takes semantic information into account, overcomes the sensitivity of the K-means algorithm to its initial centers and the instability of its time and space cost, and improves the quality of the extracted subjects.
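
The steps above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and everything specific in it is an assumption made for illustration: the toy lexicon WORD_TO_CONCEPT and the sememe sets in CONCEPT_SEMEMES stand in for 'HowNet', each concept identifier doubles as its semantic category (relying on the one-to-one correspondence), similarity is a plain Jaccard overlap of sememe sets, and the clustering loop uses medoids as centers because only pairwise similarities are available, which makes it a K-means-style variant rather than K-means proper. The seeds are passed in explicitly, since the abstract does not say how they are chosen, and polysemous word disambiguation is omitted.

```python
from collections import defaultdict

# Hypothetical toy lexicon standing in for HowNet: word -> concept id.
# Synonyms share a concept, which is how synonym merging falls out.
WORD_TO_CONCEPT = {
    "car": "vehicle", "automobile": "vehicle", "truck": "lorry",
    "doctor": "physician", "physician": "physician", "nurse": "nurse",
}
# Hypothetical sememe sets per concept, used only for the toy similarity.
CONCEPT_SEMEMES = {
    "vehicle": {"machine", "transport", "land"},
    "lorry": {"machine", "transport", "land", "cargo"},
    "physician": {"human", "medical", "cure"},
    "nurse": {"human", "medical", "care"},
}


def concept_vector(words):
    """Merge synonyms by counting occurrences per concept."""
    counts = defaultdict(int)
    for w in words:
        if w in WORD_TO_CONCEPT:
            counts[WORD_TO_CONCEPT[w]] += 1
    return dict(counts)


def similarity(a, b):
    """Jaccard overlap of sememe sets (an assumed stand-in measure)."""
    sa, sb = CONCEPT_SEMEMES[a], CONCEPT_SEMEMES[b]
    return len(sa & sb) / len(sa | sb)


def seeded_cluster(items, seeds, max_iter=20):
    """K-means-style loop started from preset seeds, with medoids as centers."""
    centers = list(seeds)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each item joins its most similar center.
        clusters = [[] for _ in centers]
        for item in items:
            best = max(range(len(centers)),
                       key=lambda i: similarity(item, centers[i]))
            clusters[best].append(item)
        # Update step: the new center is the member most similar to the rest.
        new_centers = [
            max(members, key=lambda m: sum(similarity(m, o) for o in members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


def words_for_clusters(clusters, words):
    """Map each cluster back to the document words that produced it."""
    return [
        sorted({w for w in words if WORD_TO_CONCEPT.get(w) in members})
        for members in clusters
    ]


doc = ["car", "automobile", "truck", "doctor", "physician", "nurse"]
vec = concept_vector(doc)
clusters = seeded_cluster(sorted(vec), seeds=["vehicle", "physician"])
subjects = words_for_clusters(clusters, doc)
print(subjects)
```

Because the centers start from the supplied seeds rather than from random picks, repeated runs on the same input give the same clusters, which is the kind of stability the preset-seed step is meant to provide. The concept counts in vec are computed but not used as weights here; a fuller version would presumably weight the categories by them.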