The invention discloses a cluster-based text duplicate checking method comprising the following steps: (1) data acquisition and processing: text data are stored in a database and on a file server; (2) preprocessing: the text data are subjected to word segmentation and feature vector extraction; (3) clustering: the preprocessed text data in the database are clustered, and the center feature vector of each class cluster is calculated; (4) primary duplicate checking: the feature vector of the text to be checked is extracted and compared with the center feature vectors of the class clusters in the database, and every class cluster whose center feature vector lies at a distance less than a set threshold is recorded; (5) secondary duplicate checking: the feature vector of the text to be checked is compared with the feature vectors of the text data in the recorded class clusters, and each text whose feature vector lies at a distance less than a set threshold is recorded as duplicate text, which completes the duplicate check. Because the text-by-text comparison is restricted to the members of the recorded class clusters, the method avoids unnecessary duplicate comparisons and improves text duplicate checking efficiency.
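The two-stage procedure can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices are assumptions because the abstract does not specify them: whitespace tokenization stands in for word segmentation, term-frequency vectors with cosine distance stand in for the feature vectors and distance metric, a basic k-means stands in for the clustering algorithm, and an in-memory list replaces the database and file server. The names `DuplicateChecker`, `center_threshold`, and `text_threshold` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for word segmentation: lowercase + whitespace split.
    # A real system for Chinese text would use a dedicated segmenter instead.
    return text.lower().split()


def tf_vector(tokens, vocab):
    # Term-frequency feature vector over a fixed vocabulary;
    # tokens outside the vocabulary are ignored.
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (na * nb)


def mean_vector(vectors):
    # Center feature vector of a class cluster: component-wise mean.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iterations=10):
    # Basic k-means under cosine distance (assumed algorithm); the first k
    # vectors seed the centers, so this assumes len(vectors) >= k.
    centers = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: cosine_distance(v, centers[c]))
            for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centers[c] = mean_vector(members)
    return centers, assignment


class DuplicateChecker:
    def __init__(self, corpus, k, center_threshold, text_threshold):
        # Steps 1-3: store texts, extract feature vectors, cluster them.
        token_lists = [tokenize(t) for t in corpus]
        self.vocab = sorted({w for ts in token_lists for w in ts})
        self.vectors = [tf_vector(ts, self.vocab) for ts in token_lists]
        self.centers, self.assignment = kmeans(self.vectors, k)
        self.center_threshold = center_threshold
        self.text_threshold = text_threshold

    def check(self, text):
        query = tf_vector(tokenize(text), self.vocab)
        # Step 4 (primary check): keep clusters whose center is close enough.
        candidates = {
            c for c, center in enumerate(self.centers)
            if cosine_distance(query, center) < self.center_threshold
        }
        # Step 5 (secondary check): compare only with texts in those clusters.
        return [
            i for i, (v, c) in enumerate(zip(self.vectors, self.assignment))
            if c in candidates and cosine_distance(query, v) < self.text_threshold
        ]


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "stock prices rose sharply today",
        "the cat sat on a mat",
        "stock prices fell sharply today",
    ]
    checker = DuplicateChecker(corpus, k=2, center_threshold=0.5, text_threshold=0.1)
    print(checker.check("the cat sat on the mat"))  # → [0]
```

In this sketch the secondary check scans only the texts assigned to candidate clusters, so the number of text-by-text comparisons drops from the corpus size to the combined size of the matched clusters, which is the source of the efficiency gain the abstract describes. The center threshold is set looser than the text threshold here because a center is an average of its members and typically sits farther from any single text than a near-duplicate does.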