The invention relates to an intelligent word segmentation method based on a
hidden Markov model. The method comprises the following steps: (1) constructing the parameter set
Lambda<0>=(N, M, L, Pi, A, B<1>, B<2>) of the
hidden Markov model; (2) determining the state set Theta of an article; (3) abbreviating
Lambda<0>=(N, M, L, Pi, A, B<1>, B<2>) as
Lambda=(Pi, A, B<1>, B<2>) after determining N, M and L; (4) segmenting a large number of articles with a mechanical word segmentation method implemented as a computer program, and then labeling the states of the articles by computer to form an initial Pi matrix, an initial A matrix, an initial B<1> matrix and an initial B<2> matrix; (5) training the initial A matrix, B<1> matrix and B<2> matrix on articles using the Baum-Welch (BW)
algorithm, and re-estimating according to the BW
re-estimation formulas to obtain a new Pi matrix, a new A matrix, a new B<1> matrix and a new B<2> matrix; and (6) carrying out
Chinese word segmentation using the
Viterbi algorithm with the new parameters of the
hidden Markov model, dividing the article into sentences at
punctuation marks, and carrying out
Chinese word segmentation on each
sentence, thereby obtaining the segmented article. The method enables accurate and efficient word segmentation of large volumes of Chinese text.
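
Steps (4) and (6) can be illustrated with a minimal Python sketch. It is not the claimed method: it assumes a conventional four-state tag set (B, M, E, S for begin, middle, end and single-character word) in place of the unspecified state set Theta, uses a single emission matrix rather than the two matrices B<1> and B<2>, and omits the Baum-Welch re-estimation of step (5). The function names `estimate`, `viterbi` and `segment`, the `unseen` penalty and the toy corpus are all hypothetical.

```python
import math
import re
from collections import defaultdict

STATES = "BMES"  # assumed tag set: Begin, Middle, End, Single-character word
NEG_INF = float("-inf")
PUNCT = "，。！？；、,.!?;"  # assumed sentence-splitting punctuation


def tag_word(word):
    """Label each character of one segmented word with a B/M/E/S state."""
    if len(word) == 1:
        return "S"
    return "B" + "M" * (len(word) - 2) + "E"


def log_normalize(counts):
    """Turn raw counts into log-probabilities (empty input stays empty)."""
    total = sum(counts.values())
    return {key: math.log(value / total) for key, value in counts.items()}


def estimate(segmented_sentences):
    """Step (4): count-based initial Pi, A and B from pre-segmented text."""
    pi = defaultdict(float)
    trans = {s: defaultdict(float) for s in STATES}
    emit = {s: defaultdict(float) for s in STATES}
    for words in segmented_sentences:
        chars = "".join(words)
        tags = "".join(tag_word(w) for w in words)
        if not chars:
            continue
        pi[tags[0]] += 1
        for i, (ch, tag) in enumerate(zip(chars, tags)):
            emit[tag][ch] += 1
            if i > 0:
                trans[tags[i - 1]][tag] += 1
    return (log_normalize(pi),
            {s: log_normalize(trans[s]) for s in STATES},
            {s: log_normalize(emit[s]) for s in STATES})


def viterbi(sentence, pi, trans, emit, unseen=-20.0):
    """Step (6): most probable state sequence for one sentence (log domain)."""
    if not sentence:
        return ""
    score = {s: pi.get(s, NEG_INF) + emit[s].get(sentence[0], unseen)
             for s in STATES}
    back = []
    for ch in sentence[1:]:
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + trans[p].get(s, NEG_INF))
            cur[s] = score[best] + trans[best].get(s, NEG_INF) + emit[s].get(ch, unseen)
            ptr[s] = best
        score = cur
        back.append(ptr)
    # A sentence must end on a word boundary, so only E or S may be final.
    tags = [max("ES", key=lambda s: score[s])]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    return "".join(reversed(tags))


def segment(text, model):
    """Split text at punctuation, then segment each sentence with Viterbi."""
    pi, trans, emit = model
    words = []
    for piece in re.split("([" + re.escape(PUNCT) + "])", text):
        if not piece:
            continue
        if piece in PUNCT:
            words.append(piece)  # keep the punctuation mark as its own token
            continue
        tags = viterbi(piece, pi, trans, emit)
        start = 0
        for i, tag in enumerate(tags):
            if tag in "ES":  # a word ends at every E or S state
                words.append(piece[start:i + 1])
                start = i + 1
        if start < len(piece):
            words.append(piece[start:])
    return words


# Hypothetical toy corpus standing in for mechanically segmented articles.
TOY_CORPUS = [["我", "爱", "北京"], ["北京", "很", "大"], ["我", "很", "爱", "北京"]]

if __name__ == "__main__":
    model = estimate(TOY_CORPUS)
    print(segment("我爱北京。北京很大", model))
```

Working in log-probabilities avoids numerical underflow on long sentences, and the fixed `unseen` penalty is only a crude stand-in for proper smoothing of characters absent from the training data.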