The invention discloses a semi-automatic device for labeling and training word segmentation corpora, which aims to overcome the shortcomings of the corpora used in word segmentation labeling and training. The device is realized through the following technical scheme. A text corpus annotation preparation module manages the corpora to be annotated and the segmented corpora and, based on a plurality of word segmentation algorithms, such as bidirectional maximum matching over an integrated dictionary, CRF, and JIEBA, submits the word segmentation annotation work for the raw corpus to a semi-automatic corpus word segmentation annotation module. The annotation module creates word segmentation labeling tasks, selects an applicable labeling algorithm model, and carries out automatic annotation; on the basis of fusing the automatic labeling results, the training model corpus and the labeling model generated by the text corpus annotation preparation module are fed back to a feedback model learning and training module. The training module selects and carries out model learning and training, calls a unified training model interface to generate a core dictionary, updates the word segmentation training model table, and establishes a comprehensive evaluation model for the labeling algorithms to evaluate the labeling effect of the model, thereby completing a new word segmentation labeling task.
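
Bidirectional maximum matching over a dictionary, one of the segmentation algorithms named above, can be illustrated with a minimal sketch. This is not the implementation of the disclosed device: the function names, the toy dictionary, and the tie-breaking heuristic (fewer tokens, then fewer single-character tokens, with remaining ties going to the backward result) are assumptions chosen for illustration.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Scan left to right, always taking the longest dictionary word."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Scan right to left, always taking the longest dictionary word."""
    tokens: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def bidirectional_max_match(text: str, dictionary: Set[str]) -> List[str]:
    """Run both scans and keep the result preferred by the assumed heuristic."""
    if not text:
        return []
    max_len = max((len(word) for word in dictionary), default=1)
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    # Assumed heuristic 1: prefer the segmentation with fewer tokens.
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    # Assumed heuristic 2: prefer fewer single-character tokens; ties go backward.
    singles_fwd = sum(1 for token in fwd if len(token) == 1)
    singles_bwd = sum(1 for token in bwd if len(token) == 1)
    return fwd if singles_fwd < singles_bwd else bwd


# Hypothetical toy dictionary, standing in for the integrated dictionary.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
result = bidirectional_max_match("研究生命起源", toy_dictionary)
```

With this toy dictionary the forward scan greedily takes "研究生" and leaves "命" stranded as a single character, while the backward scan does not, so the heuristic keeps the backward result. In a semi-automatic workflow, output of this kind would be one candidate among several (alongside CRF or JIEBA output) for fusion and human correction.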