The invention relates to the technical field of
natural language processing, and in particular to a multi-factor fused TextRank
keyword extraction algorithm. The influence factors of the
TextRank keyword extraction algorithm include five factors: word coverage, word position, word frequency,
word length and word span. 1, the influence of global factors is greater than that of local factors in the
keyword extraction process; 2, the influence weights of word coverage,
word length, word frequency, word span and word position increase in that order; 3, the influence weights of word coverage and
word length are basically equivalent, as are those of word span and word frequency. Accordingly, when the keywords of a text are extracted using the TextRank
algorithm, only the two factors of word position and word span need to be considered, with the two factors weighted in a ratio of 7:3. Because the text must be traversed again after the
word graph has been established in order to calculate the word span, a certain amount of additional
running time is consumed; if the requirement on the running speed of the algorithm is strict, the word span can be replaced by the word frequency, which slightly reduces the extraction quality but still gives good results.
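
The two-factor weighting described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not give formulas for word position or word span, so the definitions below (position as one minus the relative index of the first occurrence, span as the distance from first to last occurrence divided by text length), the co-occurrence window of 3, the damping factor 0.85, and every function name are assumptions made for illustration only. The sketch also assumes the 7:3 ratio applies to position and span in that order.

```python
from collections import defaultdict


def build_word_graph(words, window=3):
    """Undirected co-occurrence graph: words within `window` tokens are linked."""
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    return graph


def factor_weights(words, use_frequency=False, ratio=(0.7, 0.3)):
    """Fuse word position with word span (or word frequency) into one weight."""
    n = len(words)
    first, last, count = {}, {}, defaultdict(int)
    for i, w in enumerate(words):
        first.setdefault(w, i)   # index of first occurrence
        last[w] = i              # index of last occurrence
        count[w] += 1
    max_count = max(count.values())
    weights = {}
    for w in first:
        # Assumed position score: words appearing earlier score higher.
        position = 1.0 - first[w] / n
        if use_frequency:
            # Substitute factor: normalized word frequency.
            second = count[w] / max_count
        else:
            # Assumed span score: fraction of the text the word stretches over.
            second = (last[w] - first[w] + 1) / n
        weights[w] = ratio[0] * position + ratio[1] * second
    return weights


def multi_factor_textrank(words, d=0.85, iterations=50, use_frequency=False):
    """Rank words; each neighbor passes on score in proportion to fused weights."""
    graph = build_word_graph(words)
    weights = factor_weights(words, use_frequency=use_frequency)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        new_scores = {}
        for w in graph:
            rank = 0.0
            for v in graph[w]:
                # Share of v's score that flows to w, by fused weight.
                out_total = sum(weights[u] for u in graph[v])
                rank += weights[w] / out_total * scores[v]
            new_scores[w] = (1 - d) + d * rank
        scores = new_scores
    return sorted(scores, key=scores.get, reverse=True)


tokens = ["keyword", "extraction", "graph", "keyword", "rank", "graph", "span"]
print(multi_factor_textrank(tokens)[:3])
print(multi_factor_textrank(tokens, use_frequency=True)[:3])
```

Passing `use_frequency=True` swaps word span for word frequency, mirroring the substitution suggested for speed-sensitive use. In this toy version both factors are gathered in the same pass, so the flag only demonstrates the substitution and does not itself show a running-time difference.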