Disambiguation method and system in traditional Chinese medicine text word segmentation process, equipment and medium

A text word segmentation technology for traditional Chinese medicine, applied in text database clustering/classification, unstructured text data retrieval and special data processing applications. It addresses inaccurate segmentation of words with combination ambiguity, so as to achieve accurate word segmentation results and eliminate incorrect word segmentation.

Active Publication Date: 2019-11-26
SHANDONG NORMAL UNIV
6 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

[0004] In the existing word segmentation process for TCM texts, the segmentation results are not accurate enough. In particular, words with combination ambiguity cannot be accurately segmented or disambiguated, resulting in unsatisfactory word segmentation results.

Examples

Embodiment 1

[0026] Embodiment 1 of the present disclosure provides a disambiguation method for the word segmentation process of TCM texts.

[0027] As shown in Figure 1, the disambiguation method in the word segmentation process of TCM texts includes:

[0028] S1: Obtain the TCM text to be segmented and preprocess it; the preprocessing includes deleting stop words, repeated words and modal particles;

[0029] S2: Perform word segmentation on the preprocessed TCM text;

[0030] S3: Match the word segmentation result against the pre-built combined ambiguous word lexicon, and separate the result into combined ambiguous words and non-combined ambiguous words; store the non-combined ambiguous words in the word segmentation result database;

[0031] S4: Tag the word frequency and part of speech of the selected combined ambiguous words, calculate the mutual information vector of the current combined ambiguous ...
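
Steps S1 to S3 can be illustrated with a minimal Python sketch. It is not the implementation from the disclosure: the word lists, the toy sentence and the function names are hypothetical, and forward maximum matching is assumed as the segmenter because the text does not name one.

```python
import re

# Hypothetical toy resources; none of these lists come from the patent.
STOP_WORDS = {"的"}
MODAL_PARTICLES = {"啊", "吧"}
LEXICON = {"患者", "舌体", "胖大", "稍"}
COMBINED_AMBIGUOUS = {"舌体"}  # assumed entry of the combined ambiguous lexicon


def preprocess(text):
    """S1: delete stop words, modal particles and immediately repeated words."""
    # Naive string-level deletion; a real system must avoid cutting
    # inside longer words that contain a stop word.
    for word in STOP_WORDS | MODAL_PARTICLES:
        text = text.replace(word, "")
    # Collapse runs such as "稍稍" into a single occurrence.
    for word in sorted(LEXICON, key=len, reverse=True):
        text = re.sub(f"(?:{re.escape(word)}){{2,}}", word, text)
    return text


def segment(text, lexicon, max_len=4):
    """S2: forward maximum matching (an assumed segmenter)."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing matches.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def partition(tokens, ambiguous):
    """S3: separate combined ambiguous words from all other tokens."""
    combined = [t for t in tokens if t in ambiguous]
    plain = [t for t in tokens if t not in ambiguous]
    return combined, plain


tokens = segment(preprocess("患者的舌体稍稍胖大啊"), LEXICON)
combined, plain = partition(tokens, COMBINED_AMBIGUOUS)
```

In this sketch `plain` corresponds to the tokens that would be stored directly in the word segmentation result database, while `combined` holds the candidates passed on to S4.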

Example 1

[0125] Example 1: Weight loss of 15kg within 2 years.

Example 2

[0126] Example 2: The patient's tongue is slightly swollen.

Abstract

The invention discloses a disambiguation method for the word segmentation process of traditional Chinese medicine texts. The method comprises: obtaining a traditional Chinese medicine text to be segmented; preprocessing the text; performing word segmentation on the preprocessed text; matching the word segmentation result against a pre-constructed combined ambiguous word lexicon, and screening out combined ambiguous words and non-combined ambiguous words from the result; storing the non-combined ambiguous words in a word segmentation result database; tagging the word frequency and part of speech of the screened combined ambiguous words, calculating a mutual information vector of the current combined ambiguous word according to the part of speech and the part-of-speech frequency of the screened combined ambiguous words, inputting the mutual information vector into a pre-trained support vector machine model, and outputting whether the current combined ambiguous word belongs to the splittable category; and splitting or not splitting the current combined ambiguous word according to that category. Incorrect segmentation of combined vocabulary in the word segmentation process of traditional Chinese medicine texts is eliminated, and accurate disambiguation of combined traditional Chinese medicine vocabulary is achieved.
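
The mutual information step can be sketched as follows. The abstract does not give the exact formula, so this sketch assumes standard pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))), computed over part-of-speech co-occurrence counts; the counts, tag names and function names are hypothetical.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw counts (assumed formula)."""
    if min(count_xy, count_x, count_y) <= 0:
        return 0.0  # treat unseen events as carrying no association
    # p(x, y) / (p(x) * p(y)) simplifies to count_xy * total / (count_x * count_y).
    return math.log2((count_xy * total) / (count_x * count_y))


def mi_vector(pairs, pair_counts, unit_counts, total):
    """Build one feature vector: a PMI value for each part-of-speech pair."""
    return [
        pmi(pair_counts.get(pair, 0),
            unit_counts.get(pair[0], 0),
            unit_counts.get(pair[1], 0),
            total)
        for pair in pairs
    ]


# Hypothetical part-of-speech statistics for one combined ambiguous word.
unit_counts = {"n": 16, "v": 32}
pair_counts = {("n", "n"): 16, ("n", "v"): 8}
features = mi_vector([("n", "n"), ("n", "v")], pair_counts, unit_counts, total=64)
```

A vector such as `features` would then be passed to the pre-trained support vector machine, which decides whether the word is split; any standard SVM implementation could play that role, and the classifier is left out here to keep the sketch dependency-free.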

Description

Technical field

[0001] The present disclosure relates to the technical field of text word segmentation, and in particular to a disambiguation method, system, device and medium for the word segmentation process of traditional Chinese medicine texts.

Background art

[0002] The statements in this section merely mention background art related to the present disclosure and do not necessarily constitute prior art.

[0003] In the process of realizing the present disclosure, the inventors found that the following technical problems exist in the prior art:

[0004] In the existing word segmentation process for TCM texts, the segmentation results are not accurate enough. In particular, words with combination ambiguity cannot be accurately segmented or disambiguated, resulting in unsatisfactory word segmentation results.

Contents of the invention

[0005] In order to solve the deficiencies of the prior art, the disclosure provides a disambi...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27, G06F16/35
CPC: G06F16/353, Y02A90/10
Inventor: 袁锋, 王冰, 郑向伟, 于凤洋
Owner: SHANDONG NORMAL UNIV