
A method and system for removing privacy of medical texts based on stacking ensemble learning

An ensemble learning and de-identification technology, applied in unstructured text data retrieval, text database clustering/classification, instruments, etc., that addresses the problem of removing private information from medical texts

Active Publication Date: 2019-09-10
黑龙江鉴成生物技术有限公司
Cites: 4; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] To address the prior-art need to remove private information from medical texts, the present invention proposes a method and system for de-identifying medical texts based on Stacking ensemble learning. The system finds the Protected Health Information (PHI) in the text, determines the PHI category to which each item belongs, and then outputs each PHI entity together with its PHI category.
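The output described above (PHI entities paired with their categories) can be illustrated with a minimal sketch. It assumes the labelers emit token-level BIO tags, which the source does not state; the function name `bio_to_entities` and the category names `DOCTOR` and `DATE` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse token-level BIO tags into (entity text, PHI category) pairs.

    Assumption: tags look like "B-<CATEGORY>", "I-<CATEGORY>" or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_cat: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if any.
        if current_tokens and current_cat is not None:
            entities.append((" ".join(current_tokens), current_cat))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_cat
        )
        if starts_new:
            # A "B-" tag, or a stray "I-" tag of a different category, opens an entity.
            flush()
            current_tokens, current_cat = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:
            # "O": close any open entity.
            flush()
            current_tokens, current_cat = [], None
    flush()
    return entities


tokens = ["Seen", "by", "Dr.", "Jane", "Smith", "on", "2019-09-10", "."]
tags = ["O", "O", "O", "B-DOCTOR", "I-DOCTOR", "O", "B-DATE", "O"]
print(bio_to_entities(tokens, tags))
# [('Jane Smith', 'DOCTOR'), ('2019-09-10', 'DATE')]
```

Once entity spans and categories are available in this form, each span could be masked or replaced by a surrogate before the text is released.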



Examples


Embodiment 1

[0093] A medical text de-identification system based on Stacking ensemble learning. The technical solution adopted is as follows; the system includes:

[0094] A text segmentation module for segmenting the input text into processing units (tokens);

[0095] A feature extraction module for obtaining the relevant features of each token;

[0096] A rule-based PHI labeling module, built on the training data by automatically acquiring transformation-based rules;

[0097] A conditional random field (CRF)-based PHI labeling module, built and trained on the training data;

[0098] A neural network-based PHI labeling module, built and trained on the training data;

[0099] A PHI entity recognition module for labeling each token with the rule-based PHI labeling module, the conditional random field-based PHI labeling module and the neural network-based PHI labeling module, and for identifying the PHI entity in each proces...
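The way three base labelers feed a Stacking layer can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the source text is truncated before it describes the meta-learner, so the table-based meta-learner below (class `TableMetaLearner`, a hypothetical name) is a deliberately simple stand-in, and the labels such as `B-NAME` are invented for the example. Each input tuple holds the labels proposed for one token by the rule-based, CRF-based and neural network-based labelers, in that order.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

BasePreds = Tuple[str, ...]  # one label per base labeler for a single token


def majority(labels: Sequence[str]) -> str:
    """Most common label; ties go to the earliest labeler's vote."""
    counts = Counter(labels)
    best = max(counts.values())
    return next(label for label in labels if counts[label] == best)


class TableMetaLearner:
    """Toy meta-learner: memorises, for each combination of base predictions
    seen on held-out data, the gold label that most often accompanied it."""

    def __init__(self) -> None:
        self.table: Dict[BasePreds, str] = {}

    def fit(self, base_preds: List[BasePreds], gold: List[str]) -> "TableMetaLearner":
        votes: Dict[BasePreds, Counter] = defaultdict(Counter)
        for preds, label in zip(base_preds, gold):
            votes[preds][label] += 1
        self.table = {p: c.most_common(1)[0][0] for p, c in votes.items()}
        return self

    def predict(self, base_preds: List[BasePreds]) -> List[str]:
        # Unseen combinations fall back to a plain majority vote.
        return [self.table.get(p, majority(p)) for p in base_preds]


# Held-out tokens: (rule-based, CRF, neural network) predictions and gold labels.
held_out = [
    ("O", "O", "O"),
    ("B-NAME", "O", "B-NAME"),
    ("O", "B-DATE", "O"),
    ("O", "B-DATE", "O"),
    ("B-NAME", "B-NAME", "O"),
]
gold = ["O", "B-NAME", "B-DATE", "B-DATE", "B-NAME"]

meta = TableMetaLearner().fit(held_out, gold)
result = meta.predict([("O", "B-DATE", "O"), ("B-AGE", "O", "B-AGE"), ("O", "O", "O")])
print(result)
# ['B-DATE', 'B-AGE', 'O']
```

The first prediction shows what stacking adds over voting: a majority vote would output `O`, but the meta-learner has seen on held-out data that a lone CRF `B-DATE` vote tends to be correct. A practical system would more likely train a classifier over the base predictions plus token features.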

Embodiment 2

[0156] A medical text de-identification method based on Stacking ensemble learning. The technical solution adopted is as follows; the method comprises:

[0157] A text segmentation step for segmenting the input text into processing units (tokens);

[0158] A feature extraction step for obtaining the relevant features of each token;

[0159] A transformation-based rule acquisition step for building, on the training data, a model that acquires transformation-based rules automatically;

[0160] A conditional random field learner step for building a conditional random field-based learner model on the training data;

[0161] A neural network learner step for building a neural network-based learner model on the training data;

[0162] A step for labeling each token using the transformation-based rule acquisition model, the conditional random field-based learner...
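The first two steps of the method, text segmentation and feature extraction, can be sketched as below. The source does not specify the tokenizer or the feature set, so the regular expression, the helper names `segment`, `word_shape` and `extract_features`, and the chosen features are all assumptions made for illustration on English text.

```python
import re
from typing import Dict, List, Union

# Assumed tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def segment(text: str) -> List[str]:
    """Split raw text into processing units (tokens)."""
    return TOKEN_RE.findall(text)


def word_shape(token: str) -> str:
    """Map upper-case letters to X, lower-case letters to x and digits to d."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    return re.sub(r"\d", "d", shape)


def extract_features(tokens: List[str], i: int) -> Dict[str, Union[str, bool]]:
    """Features for token i, including one token of context on each side."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "is_digit": tok.isdigit(),
        "is_title": tok.istitle(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


tokens = segment("Mr. Lee, age 47.")
print(tokens)
# ['Mr', '.', 'Lee', ',', 'age', '47', '.']
print(extract_features(tokens, 5))
# {'lower': '47', 'shape': 'dd', 'is_digit': True, 'is_title': False, 'prev': 'age', 'next': '.'}
```

Feature dictionaries of this kind are a common input format for CRF taggers; a neural labeler would typically consume the tokens directly or via embeddings instead.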


Abstract

The invention relates to a method and system for de-identifying medical texts based on Stacking ensemble learning, belonging to the technical field of medical computer software. The system includes a text segmentation module, a transformation-based rule acquisition module, a conditional random field learner module, a neural network learner module, a PHI entity recognition module and a Stacking ensemble learning module; the method includes a text segmentation step, a transformation-based rule acquisition step, a conditional random field learner step, a neural network learner step, a PHI entity recognition step and a Stacking ensemble learning step.
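The automatic acquisition of rules from training data named in the abstract reads like transformation-based (Brill-style) learning, although the source does not describe the procedure. Under that assumption, a minimal sketch of the idea is a greedy loop that repeatedly picks the rule correcting the most remaining errors. The single rule template ("change tag A to B when the previous word is W"), the names `Rule`, `apply_rule` and `learn_rules`, and the `B-DOCTOR` label are hypothetical.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    from_tag: str   # tag to rewrite
    to_tag: str     # replacement tag
    prev_word: str  # trigger: lower-cased previous token


def apply_rule(rule: Rule, tokens: List[str], tags: List[str]) -> List[str]:
    """Return a copy of tags with the rule applied wherever it fires."""
    out = list(tags)
    for i in range(1, len(tokens)):
        if tags[i] == rule.from_tag and tokens[i - 1].lower() == rule.prev_word:
            out[i] = rule.to_tag
    return out


def errors(tags: List[str], gold: List[str]) -> int:
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(tokens: List[str], initial: List[str], gold: List[str],
                max_rules: int = 5) -> List[Rule]:
    """Greedily learn an ordered list of rules that reduce tagging errors."""
    tags, learned = list(initial), []
    for _ in range(max_rules):
        # Propose candidate rules only at positions that are currently wrong.
        candidates = {
            Rule(tags[i], gold[i], tokens[i - 1].lower())
            for i in range(1, len(tokens)) if tags[i] != gold[i]
        }
        best, best_gain = None, 0
        for rule in sorted(candidates):
            gain = errors(tags, gold) - errors(apply_rule(rule, tokens, tags), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:  # no rule yields a net improvement
            break
        tags = apply_rule(best, tokens, tags)
        learned.append(best)
    return learned


tokens = ["Dr", "Chen", "saw", "the", "patient", ";", "Dr", "Wu", "agreed"]
initial = ["O"] * len(tokens)
gold = ["O", "B-DOCTOR", "O", "O", "O", "O", "O", "B-DOCTOR", "O"]
rules = learn_rules(tokens, initial, gold)
print(rules)
# [Rule(from_tag='O', to_tag='B-DOCTOR', prev_word='dr')]
```

At labeling time the learned rules would be applied in order to an initial tagging of new text; richer templates (next word, word shape, neighbouring tags) follow the same pattern.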

Description

Technical field

[0001] The invention relates to a method and system for de-identifying medical texts based on Stacking ensemble learning, belonging to the technical field of medical computer software.

Background

[0002] Removing private information about the individuals involved is an important step before text data is released to the public. For example, when a legal department discloses case information to the public, it needs to remove the private information of the persons involved; when an NLP research institution releases a research corpus to the public, it needs to remove information related to personal privacy.

[0003] In the medical field, the United States passed HIPAA (the Health Insurance Portability and Accountability Act) in 1996, which defines 18 types of private information about patients and their friends, colleagues and family members, and stipulates that these private information should be disclosed to the society...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35, G06F16/36, G06F17/27, G06N3/04
CPC: G06F40/289, G06N3/045
Inventors: 杨沐昀, 赵臻宇, 赵铁军, 朱聪慧, 曹海龙, 徐冰, 郑德权
Owner: 黑龙江鉴成生物技术有限公司