
A medical text de-privacy method and system based on Stacking ensemble learning

An ensemble learning and de-privacy technology, applied in special data processing applications, instruments, biological neural network models, etc., that addresses the problem of removing private information from medical texts

Active Publication Date: 2018-12-07
黑龙江鉴成生物技术有限公司
Cites: 4; Cited by: 11

AI Technical Summary

Problems solved by technology

[0004] To address the need in the prior art to remove private information from medical texts, the present invention proposes a method and system for removing privacy from medical texts based on Stacking ensemble learning. The method finds the protected health information (PHI) in a text, determines the PHI category to which each piece of information belongs, and then outputs each PHI entity together with its corresponding PHI category.
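The output described in [0004], PHI entities paired with their categories, can be illustrated with a minimal sketch. It assumes that token-level labels use a BIO scheme (`B-`/`I-` prefixes plus `O`) and that categories are named `NAME` and `DATE`; the source specifies neither the tagging scheme nor the label names, so the function name and labels below are hypothetical.

```python
from typing import List, Optional, Tuple


def extract_phi_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (entity_text, category) pairs.

    Assumption: tags look like "B-NAME", "I-NAME", or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_cat: Optional[str] = None

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_cat
        )
        if starts_new:
            # Close any open entity, then open a new one. A stray "I-" tag
            # with no matching "B-" is treated as the start of an entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_cat))
            current_tokens, current_cat = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:
            # "O" tag: close any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_cat))
            current_tokens, current_cat = [], None

    if current_tokens:
        entities.append((" ".join(current_tokens), current_cat))
    return entities


tokens = ["Mr.", "John", "Smith", "was", "admitted", "on", "2018-03-02", "."]
tags = ["O", "B-NAME", "I-NAME", "O", "O", "O", "B-DATE", "O"]
print(extract_phi_entities(tokens, tags))
# [('John Smith', 'NAME'), ('2018-03-02', 'DATE')]
```

Joining tokens with a single space is a simplification; a real system would more likely keep character offsets so that the original text can be redacted in place.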



Examples


Embodiment 1

[0093] A medical text de-privacy system based on Stacking ensemble learning. The system includes:

[0094] A text segmentation module, used to segment the input text into processing units (tokens);

[0095] A feature extraction module, used to obtain the relevant features of each token;

[0096] A rule-based PHI tagging module, built on the training data by automatic acquisition of transformation rules (conversion rules);

[0097] A conditional random field-based PHI tagging module, built on the training data;

[0098] A neural network-based PHI tagging module, built on the training data;

[0099] A PHI entity recognition module, used to tag each token with the rule-based PHI tagging module, the conditional random field-based PHI tagging module and the neural network-based PHI tagging module, in order to identify the PHI entity in each proces...
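The first two modules, [0094] and [0095], can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how text is segmented or which features are extracted, so the regular expression, the function names `segment` and `token_features`, and the feature set (lowercase form, capitalization, word shape, neighboring tokens) are all hypothetical choices that are common in sequence tagging.

```python
import re
from typing import Dict, List

# Assumption: a token is a run of word characters or one punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def segment(text: str) -> List[str]:
    """Split the input text into processing units (tokens)."""
    return TOKEN_RE.findall(text)


def word_shape(token: str) -> str:
    """Map uppercase letters to X, lowercase letters to x, and digits to 9."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    return re.sub(r"\d", "9", shape)


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a small feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "shape": word_shape(tok),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


tokens = segment("Dr. Smith, age 45")
for i in range(len(tokens)):
    print(tokens[i], token_features(tokens, i))
```

Feature dictionaries of this kind could feed a conditional random field tagger directly, while a neural tagger would more likely consume token or character embeddings; the source does not state which inputs each tagger uses.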

Embodiment 2

[0156] A medical text de-privacy method based on Stacking ensemble learning. The method includes:

[0157] A text segmentation step, used to segment the input text into processing units (tokens);

[0158] A feature extraction step, used to obtain the relevant features of each token;

[0159] A transformation-rule automatic acquisition step, used to build an automatic acquisition model based on transformation rules on the training data;

[0160] A conditional random field learner step, used to build a conditional random field-based learner model on the training data;

[0161] A neural network learner step, used to build a neural network-based learner model on the training data;

[0162] A step used to tag each token using the transformation rule-based automatic acquisition model, the conditional random field-based learner model, and the neural network...
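The idea of stacking three base taggers can be sketched as below. The source does not describe the meta-learner, so this example substitutes a deliberately simple one: a lookup table that maps each combination of base-tagger outputs to the gold label most often seen with it, with majority voting as a fallback for unseen combinations. The names `train_meta` and `predict_meta` and the toy labels are hypothetical. In practice, the base predictions used to train the meta-learner are normally produced on held-out data (for example by cross-validation), so that the meta-learner does not learn from overfitted base outputs.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

Label = str
BasePreds = Tuple[Label, ...]  # one label per base tagger (rule, CRF, neural)


def train_meta(
    base_preds: Sequence[BasePreds], gold: Sequence[Label]
) -> Dict[BasePreds, Label]:
    """Learn the most frequent gold label for each base-output combination."""
    counts: Dict[BasePreds, Counter] = defaultdict(Counter)
    for preds, label in zip(base_preds, gold):
        counts[preds][label] += 1
    return {preds: c.most_common(1)[0][0] for preds, c in counts.items()}


def predict_meta(table: Dict[BasePreds, Label], preds: BasePreds) -> Label:
    """Use the learned table; fall back to majority vote if unseen."""
    if preds in table:
        return table[preds]
    return Counter(preds).most_common(1)[0][0]


# Toy per-token outputs of the three base taggers, in the order
# (rule-based, CRF, neural), with the gold label for each token.
train_preds = [
    ("NAME", "NAME", "O"),
    ("O", "O", "O"),
    ("O", "DATE", "DATE"),
    ("NAME", "NAME", "O"),
    ("O", "DATE", "DATE"),
    ("NAME", "O", "O"),  # rule-based false positive
]
train_gold = ["NAME", "O", "DATE", "NAME", "DATE", "O"]

table = train_meta(train_preds, train_gold)
print(predict_meta(table, ("NAME", "NAME", "O")))  # seen combination
print(predict_meta(table, ("NAME", "O", "O")))     # learned to distrust the rule
print(predict_meta(table, ("DATE", "DATE", "O")))  # unseen: majority vote
```

A table-based meta-learner ignores token features; a more typical stacking setup would train a classifier on the base outputs, possibly together with the original features.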



Abstract

The invention relates to a medical text de-privacy method and system based on Stacking ensemble learning, belonging to the technical field of computer medical software. The system comprises a text segmentation module, an automatic acquisition module based on conversion rules, a conditional random field based learning module, a neural network based learning module, a PHI entity identification module and a Stacking ensemble learning module. The method comprises a text segmentation step, an automatic acquisition step based on conversion rules, a conditional random field based learning step, a neural network based learning step, a PHI entity recognition step and a Stacking ensemble learning step.

Description

Technical field

[0001] The invention relates to a medical text de-privacy method and system based on Stacking ensemble learning, and belongs to the technical field of computer medical software.

Background art

[0002] Removing private information about the parties mentioned in a text is an important step before the text data is released to the public. For example, when a legal department discloses case information to the public, it must remove the private information of the persons involved; when an NLP research institution releases a research corpus to the public, it must remove the information that involves personal privacy.

[0003] In the medical field, the United States passed the Health Insurance Portability and Accountability Act (HIPAA) in 1996. This act defines 18 categories of private information about patients, their friends, colleagues, and family members, and stipulates that this private information is in the public It must be deleted fro...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06N3/04
CPC: G06F40/289, G06N3/045
Inventor: 杨沐昀, 赵臻宇, 赵铁军, 朱聪慧, 曹海龙, 徐冰, 郑德权
Owner: 黑龙江鉴成生物技术有限公司