Named entity identification method for medical text data

A named entity recognition method for text data, applied in the field of information extraction, that addresses the problem of recognizing medical named entities in medical text data.

Inactive Publication Date: 2017-09-15
BEIJING UNIV OF CHEM TECH
6 Cites, 10 Cited by
AI Technical Summary

Problems solved by technology

The complexity of Chinese natural language processing and the particular characteristics of medical text data make medical named entity recognition a difficult problem.

Examples

Embodiment

[0024] In this embodiment, a Hidden Markov Model (HMM) performs sequence labeling on the original medical text to obtain a predicted word segmentation result. Once the predicted segmentation is complete, a semi-supervised learning method iteratively self-learns on that result to obtain accurate word segmentation and named entity recognition results. The embodiment compares the advantages and disadvantages of several supervised learning methods, combines them with semi-supervised learning for error correction, and studies the longitudinal named entity recognition of diseases. The aim is to arrive at methods that extract accurate information quickly and with little manual intervention.
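The iterative self-learning step described above can be illustrated with a generic self-training loop. This is a minimal sketch, not the patent's procedure: the per-character tag counter in `fit` is a toy stand-in for the HMM, and the B/E tag set, the `threshold` and `max_rounds` parameters, and all function names are assumptions introduced for illustration.

```python
from collections import Counter, defaultdict


def fit(labeled):
    """Toy stand-in model (NOT the patent's HMM): per-character tag counts."""
    counts = defaultdict(Counter)
    for chars, tags in labeled:
        for ch, tag in zip(chars, tags):
            counts[ch][tag] += 1
    return counts


def predict(model, chars):
    """Return (tags, confidence); confidence is the lowest per-character vote share."""
    tags, conf = [], 1.0
    for ch in chars:
        votes = model.get(ch)
        if not votes:
            return None, 0.0  # unseen character: refuse to pseudo-label
        tag, n = votes.most_common(1)[0]
        tags.append(tag)
        conf = min(conf, n / sum(votes.values()))
    return tags, conf


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Grow the labeled set with confident predictions, retraining each round."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = fit(labeled)
        accepted, remaining = [], []
        for chars in pool:
            tags, conf = predict(model, chars)
            if tags is not None and conf >= threshold:
                accepted.append((chars, tags))
            else:
                remaining.append(chars)
        if not accepted:  # nothing new was learned, so stop early
            break
        labeled.extend(accepted)
        pool = remaining
    return fit(labeled), labeled, pool
```

Calling `self_train([("胃炎", ["B", "E"]), ("肺炎", ["B", "E"])], ["胃炎肺炎", "肝炎"])` pseudo-labels the first unlabeled string and leaves the second in the pool, because it contains a character the toy model has never seen. A real system would replace `fit` and `predict` with the trained sequence labeler and its own confidence measure.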

[0025] The HMM is used to solve the named entity labeling problem: given an observation sequence (1):

[0026] $P(Y \mid X) = p(x_{1,n}), \quad X = \{x_1, x_2, \ldots, x_n\}$ (1)

[0027] find an optimal label sequence (2) that maximizes the co...
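Finding the most probable label sequence for an observation sequence under an HMM is commonly done with the Viterbi algorithm. The source text is truncated here and does not say which decoder it uses, so the following is only a sketch under that assumption. The function name, the `floor` smoothing constant, the B/E/S tag set, and the toy probabilities are all hypothetical.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p, floor=1e-12):
    """Most probable state sequence for obs under an HMM, computed in log space."""
    if not obs:
        return []

    def lg(p):
        # Floor zero probabilities so the logarithm stays finite.
        return math.log(max(p, floor))

    # scores[t][s]: best log-probability of any path ending in state s at step t
    scores = [{s: lg(start_p.get(s, 0)) + lg(emit_p[s].get(obs[0], 0))
               for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at step t
    for x in obs[1:]:
        prev = scores[-1]
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + lg(trans_p[r].get(s, 0)))
            col[s] = prev[best] + lg(trans_p[best].get(s, 0)) + lg(emit_p[s].get(x, 0))
            ptr[s] = best
        scores.append(col)
        back.append(ptr)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy parameters, invented for illustration only (B = begin, E = end, S = single).
STATES = ["B", "E", "S"]
START = {"B": 0.6, "S": 0.4}
TRANS = {"B": {"E": 1.0}, "E": {"B": 0.5, "S": 0.5}, "S": {"B": 0.5, "S": 0.5}}
EMIT = {"B": {"胃": 0.5, "肺": 0.5}, "E": {"炎": 1.0}, "S": {"患": 1.0}}

print(viterbi("患胃炎", STATES, START, TRANS, EMIT))  # ['S', 'B', 'E']
```

Working in log space avoids numeric underflow on long character sequences. In practice the start, transition, and emission tables would be estimated from annotated medical text, not written by hand.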

Abstract

The invention provides a named entity identification method for medical text data and belongs to the technical field of information extraction. A hidden Markov model performs sequence labeling on the original medical text to obtain a predicted word segmentation result. Once the predicted segmentation is complete, a semi-supervised learning method iteratively self-learns on that result to obtain accurate word segmentation and named entity identification results.

Description

Technical field

[0001] The invention relates to the technical field of information extraction, in particular to a named entity recognition method for medical text data.

Background art

[0002] With the current rapid development of information technology, many medical institutions are building or have completed medical information systems. As these systems develop and improve, the medical data they accumulate will provide reliable data support for future research in medicine and information science. In recent years, mathematical research on statistical data has become relatively mature, and big data research on massive medical statistical data is well under way; it has played a useful role in prediction, prevention, and control.

[0003] A large amount of text data, such as text medical records, medical literature, health information standards, etc., contains a lot of res...

Application Information

IPC(8): G06F17/27, G06N99/00, G06F19/00
CPC: G06F40/279, G06F40/284, G06N20/00, G16Z99/00
Inventor: 史晟辉, 徐梓豪, 李五锁, 黄定琦, 陈晓宇, 张永健, 朱群雄, 林晓勇
Owner BEIJING UNIV OF CHEM TECH