
Intelligent word segmentation method based on hidden Markov model

A word segmentation method using hidden Markov model technology, applied in character and pattern recognition, special data processing applications, instruments, etc.; it can solve problems such as the lack of language-environment analysis.

Active Publication Date: 2016-03-02
GANSU ZHICHENG NETWORK TECH CO LTD
4 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

[0007] These two documents introduce a hidden Markov Chinese word segmentation model based on word tagging. The model inherits the advantages of the word tagging model and can balance the recognition of vocabulary words and unregistered words, but it lacks analysis of the language environment.




Embodiment Construction

[0051] An intelligent word segmentation method based on a hidden Markov model comprises the following steps:

[0052] (1) Establish the hidden Markov model parameters λ0 = (N, M, L, π, A, B1, B2),

[0053] where:

[0054] N is the number of states of the Markov chain in the model; denote the N states as θ1, ..., θN; the state of the Markov chain at time t takes one of the values (θ1, ..., θN);

[0055] M is the number of possible single-Chinese-character observations corresponding to each state; denote the M observations as V1, ..., VM; the value observed at time t is one of (V1, ..., VM);

[0056] L is the number of possible multiple-Chinese-character observations corresponding to each state; the L extended observations are recorded in the same way, and the extended value observed at time t is one of these L values;

[0057] π gives the probability of choosing each state at the beginning of the sequence, π = (π1, ..., πN), where πi is the probability that the sequence starts in state θi, 1 ≤ i ≤ N;

[0058] A Represents th...
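The parameter tuple described above can be sketched as a small container. This is only an illustration: the names `HmmParams` and `uniform_params` are hypothetical, and the matrix shapes (A as N×N, B1 as N×M, B2 as N×L) are assumptions, because the source text is cut off before it defines A, B1 and B2.

```python
import math
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class HmmParams:
    """Container for lambda0 = (N, M, L, pi, A, B1, B2).

    Shapes are assumptions, not taken from the patent text:
    pi has N entries, A is N x N, B1 is N x M, B2 is N x L.
    """
    pi: List[float]
    A: Matrix
    B1: Matrix
    B2: Matrix

    @property
    def N(self) -> int:  # number of states
        return len(self.pi)

    @property
    def M(self) -> int:  # number of single-character observations
        return len(self.B1[0])

    @property
    def L(self) -> int:  # number of multi-character observations
        return len(self.B2[0])

    def validate(self) -> None:
        """Check shapes and that every row is a probability distribution."""
        shapes = {"A": (self.A, self.N), "B1": (self.B1, self.M), "B2": (self.B2, self.L)}
        rows = [("pi", self.pi)]
        for name, (matrix, width) in shapes.items():
            if len(matrix) != self.N or any(len(row) != width for row in matrix):
                raise ValueError(f"{name} has the wrong shape")
            rows.extend((name, row) for row in matrix)
        for name, row in rows:
            if min(row) < 0 or not math.isclose(sum(row), 1.0, abs_tol=1e-9):
                raise ValueError(f"a row of {name} is not a probability distribution")


def uniform_params(n: int, m: int, l: int) -> HmmParams:
    """Hypothetical helper: start every distribution as uniform."""
    return HmmParams(
        pi=[1.0 / n] * n,
        A=[[1.0 / n] * n for _ in range(n)],
        B1=[[1.0 / m] * m for _ in range(n)],
        B2=[[1.0 / l] * l for _ in range(n)],
    )
```

Once N, M and L are fixed they are implied by the matrix shapes, which matches the idea of abbreviating the model to the four matrices alone.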


Abstract

The invention relates to an intelligent word segmentation method based on a hidden Markov model. The method comprises the following steps: (1) establishing the hidden Markov model parameters λ0 = (N, M, L, π, A, B1, B2); (2) determining the state set Θ of an article; (3) after N, M and L are determined, abbreviating λ0 = (N, M, L, π, A, B1, B2) as λ = (π, A, B1, B2); (4) segmenting a large number of articles by computer with a mechanical word segmentation method, and then labeling the states of the articles by computer to form an initial π matrix, A matrix, B1 matrix and B2 matrix; (5) training the initial A matrix, B1 matrix and B2 matrix on articles with the Baum-Welch (BW) algorithm and re-estimating them according to the BW re-estimation formulas to obtain a new π matrix, a new A matrix, a new B1 matrix and a new B2 matrix; and (6) performing Chinese word segmentation with the Viterbi algorithm according to the new hidden Markov model parameters, dividing the article into sentences at punctuation marks and segmenting each sentence, thereby obtaining the segmented article. With this method, large volumes of Chinese text can be segmented accurately and efficiently.
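Step (6) can be illustrated with a minimal sketch of Viterbi decoding applied sentence by sentence. Several things here are assumptions rather than the patent's method: the four-tag state set B/M/E/S (begin, middle, end, single), the use of only single-character emissions (the B2 term is omitted because the source does not define it), the punctuation list, and all names (`viterbi`, `segment`, `TOY_*`). The Baum-Welch training of step (5) is not shown; the toy parameters are hand-written for illustration only.

```python
import math
import re

STATES = "BMES"   # assumed tag set: Begin, Middle, End, Single
NEG = -1e9        # stand-in log-probability for "impossible"
PUNCT = "，。！？；：、,.!?;:"


def viterbi(chars, log_pi, log_a, log_b):
    """Return the most likely state index sequence for one sentence."""
    n = len(STATES)

    def emit(state, ch):
        return log_b[state].get(ch, NEG)

    delta = [log_pi[s] + emit(s, chars[0]) for s in range(n)]
    back = []
    for ch in chars[1:]:
        prev = delta
        # best predecessor for each target state j
        ptr = [max(range(n), key=lambda i: prev[i] + log_a[i][j]) for j in range(n)]
        delta = [prev[ptr[j]] + log_a[ptr[j]][j] + emit(j, ch) for j in range(n)]
        back.append(ptr)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers to the start
        state = ptr[state]
        path.append(state)
    return path[::-1]


def tags_to_words(chars, path):
    """Cut the character sequence after every End or Single tag."""
    words, current = [], ""
    for ch, s in zip(chars, path):
        current += ch
        if STATES[s] in "ES":
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


def segment(text, log_pi, log_a, log_b):
    """Split text at punctuation, then segment each sentence separately."""
    words = []
    for piece in re.split("([" + re.escape(PUNCT) + "])", text):
        if not piece:
            continue
        if piece in PUNCT:
            words.append(piece)
        else:
            words.extend(tags_to_words(piece, viterbi(piece, log_pi, log_a, log_b)))
    return words


def _log(p):
    return math.log(p) if p > 0 else NEG


# Toy parameters (hypothetical values, not trained on any corpus).
TOY_LOG_PI = [_log(p) for p in (0.6, 0.0, 0.0, 0.4)]
TOY_LOG_A = [[_log(p) for p in row] for row in (
    (0.0, 0.3, 0.7, 0.0),   # B -> M or E
    (0.0, 0.3, 0.7, 0.0),   # M -> M or E
    (0.6, 0.0, 0.0, 0.4),   # E -> B or S
    (0.6, 0.0, 0.0, 0.4),   # S -> B or S
)]
TOY_LOG_B = [{"北": _log(0.9)}, {}, {"京": _log(0.9)}, {"我": _log(0.5), "爱": _log(0.5)}]

print(segment("我爱北京。", TOY_LOG_PI, TOY_LOG_A, TOY_LOG_B))
# prints ['我', '爱', '北京', '。']
```

Working in log space avoids numerical underflow on long sentences, and splitting at punctuation first keeps each Viterbi pass short.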

Description

Technical field

[0001] The invention relates to a Chinese word segmentation method, in particular to an intelligent word segmentation method based on a hidden Markov model.

Background

[0002] With the development of Internet technology, people place ever higher demands on computer text processing. Software needs to be able to input, display, edit, and output articles, and the basis for these functions is the recognition of words in the text. Unlike English, however, Chinese words have no natural boundaries, so improving the ability of Chinese software to process text requires good Chinese word segmentation.

[0003] At present, the main methods used for Chinese word segmentation are mechanical word segmentation, comprehension-based methods, and statistical methods. The mechanical word segmentation method is based on the existing strings in the dictionary, but its word segmentation requires a large a...
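As a rough illustration of dictionary-based (mechanical) segmentation, the sketch below uses forward maximum matching, one common variant; the source does not say which variant it has in mind, and the function name, `max_len` parameter and toy dictionary are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy dictionary segmentation: at each position take the longest
    dictionary word (up to max_len characters); fall back to one character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


toy_dictionary = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dictionary))
# prints ['研究生', '命', '起源']
```

The example shows the typical weakness of a purely dictionary-driven approach: the greedy match picks 研究生 and leaves 命 stranded, although 研究 / 生命 / 起源 is the intended reading.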


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06K9/62
CPC: G06F40/284, G06F18/295
Inventor: 邓剑波, 马润宇, 刘毓智
Owner: GANSU ZHICHENG NETWORK TECH CO LTD