Text desensitization method and device, electronic equipment and computer readable storage medium
A computer program and text technology, applied in computer security devices, computing, electrical digital data processing, etc. It addresses problems such as reduced accuracy of sensitive-data identification, reduced system availability and ease of use, and missed sensitive data.
Examples
Embodiment 1
[0043] As shown in Figure 1, according to an embodiment of the first aspect of the present invention, a text desensitization method is proposed. The method includes:
[0044] Step 102, obtaining the text to be processed and the Hidden Markov Model;
[0045] Step 104, performing word segmentation processing on the text to be processed according to the word segmentation database to obtain vocabulary information;
[0046] Step 106, determining, according to the vocabulary information and the Hidden Markov Model, the context information corresponding to the vocabulary information;
[0047] Step 108, determining whether the context information satisfies the preset context information; if so, go to Step 110; if not, go to Step 112;
[0048] Step 110, desensitizing the vocabulary information;
[0049] Step 112, performing no desensitization processing.
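A minimal sketch of Steps 106 to 112. The text does not disclose the Hidden Markov Model's states, parameters, or how the desensitization itself is done, so everything below is an assumption: three hypothetical hidden context states (`CUE`, `SENSITIVE`, `NORMAL`) with hand-set probabilities, a hypothetical cue-word list, Viterbi decoding as the way to "determine the context information", and full asterisk masking as the desensitization step. The input is assumed to be already segmented (Step 104).

```python
import math

# Hypothetical HMM: one hidden "context" state behind each segmented word.
STATES = ("CUE", "SENSITIVE", "NORMAL")
START_P = {"CUE": 0.2, "SENSITIVE": 0.05, "NORMAL": 0.75}
TRANS_P = {
    "CUE": {"CUE": 0.1, "SENSITIVE": 0.7, "NORMAL": 0.2},
    "SENSITIVE": {"CUE": 0.1, "SENSITIVE": 0.2, "NORMAL": 0.7},
    "NORMAL": {"CUE": 0.1, "SENSITIVE": 0.05, "NORMAL": 0.85},
}
# Emissions are defined over coarse word classes so unseen numbers still score.
EMIT_P = {
    "CUE": {"cue": 0.9, "digits": 0.01, "other": 0.09},
    "SENSITIVE": {"cue": 0.01, "digits": 0.9, "other": 0.09},
    "NORMAL": {"cue": 0.05, "digits": 0.15, "other": 0.8},
}
CUE_WORDS = {"phone", "mobile", "id", "card", "account"}  # hypothetical cue list


def observe(word):
    """Map a segmented word to the observation symbol the HMM emits."""
    if word.lower() in CUE_WORDS:
        return "cue"
    if word.isdigit():
        return "digits"
    return "other"


def viterbi(words):
    """Return the most likely context state for each word (log-space Viterbi)."""
    if not words:
        return []
    obs = [observe(w) for w in words]
    score = {s: math.log(START_P[s]) + math.log(EMIT_P[s][obs[0]]) for s in STATES}
    back = []  # back[t][s]: best predecessor of state s at word t + 1
    for o in obs[1:]:
        ptr, new_score = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS_P[p][s]))
            ptr[s] = prev
            new_score[s] = score[prev] + math.log(TRANS_P[prev][s]) + math.log(EMIT_P[s][o])
        back.append(ptr)
        score = new_score
    state = max(STATES, key=score.get)  # best final state
    path = [state]
    for ptr in reversed(back):  # follow back-pointers to the first word
        state = ptr[state]
        path.append(state)
    return path[::-1]


def desensitize(words, preset_context="SENSITIVE"):
    """Steps 106-112: mask a word only if its decoded context is the preset one."""
    contexts = viterbi(words)
    return ["*" * len(w) if c == preset_context else w for w, c in zip(words, contexts)]


print(desensitize(["mobile", "13800138000", "room", "101"]))
# ['mobile', '***********', 'room', '101']
```

Because the decoded state depends on the transitions as well as on the word itself, a digit string right after a cue word such as `mobile` is masked, while `101` after `room` is left alone; this is the kind of context dependence the check in Step 108 relies on.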
[0050] In this embodiment, the text to be processed is segmented in combination with the word segmentation library to obtain vocabulary in...
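The text does not say which algorithm applies the word segmentation database in Step 104. Embodiment 3 names the maximum matching algorithm for Step 304, so the sketch below uses forward maximum matching as one plausible choice. The lexicon contents and the rule that keeps a run of digits together as one token are assumptions; returning each word's character offset anticipates the "vocabulary position" used in Step 310.

```python
def forward_max_match(text, lexicon):
    """Forward maximum matching: at each offset take the longest lexicon word.

    Runs of digits are kept as one token (an assumption), and any other
    character not covered by the lexicon becomes a single-character token.
    Returns (word, offset) pairs so later steps can use each word's position.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        if text[i].isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append((text[i:j], i))
            i = j
            continue
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append((piece, i))
                i += size
                break
    return tokens


lexicon = {"我的", "手机", "手机号", "是"}  # hypothetical word segmentation database
print(forward_max_match("我的手机号是13800138000", lexicon))
# [('我的', 0), ('手机号', 2), ('是', 5), ('13800138000', 6)]
```

Preferring the longest match is what makes `手机号` ("phone number") win over the shorter `手机` ("phone") here.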
Embodiment 2
[0052] As shown in Figure 2, according to an embodiment of the present invention, a text desensitization method is proposed. The method includes:
[0053] Step 202, obtaining the text to be processed and the Hidden Markov Model;
[0054] Step 204, performing word segmentation processing on the text to be processed according to the word segmentation database to obtain vocabulary information;
[0055] Step 206, determining, according to the vocabulary information and the Hidden Markov Model, the context information corresponding to the vocabulary information;
[0056] Step 208, determining whether the context information satisfies the preset context information; if so, go to Step 210; if not, go to Step 212;
[0057] Step 210, determining whether the vocabulary text in the vocabulary information matches the privacy vocabulary; if so, go to Step 214; if not, go to Step 212;
[0058] Step 212, performing no desensitization processing;
[0059] Step 214, marking the vocabulary text as sensitive data...
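Embodiment 2 adds a second check: a word is marked as sensitive only if its context passes (Step 208) and its text matches the privacy vocabulary (Step 210). The text does not define what form the privacy vocabulary takes, so this sketch assumes a small, hypothetical set of regular-expression patterns (a mainland China mobile number, an 18-character ID number, an email address) and assumes each word already carries a context label such as the one decoded in Step 206.

```python
import re
from dataclasses import dataclass

# Hypothetical privacy vocabulary: a sensitive word must fully match one entry.
PRIVACY_PATTERNS = {
    "mobile": re.compile(r"1[3-9]\d{9}"),
    "id_card": re.compile(r"\d{17}[\dXx]"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
}


@dataclass
class Word:
    text: str
    context: str            # e.g. the state decoded by the HMM in Step 206
    sensitive: bool = False
    category: str = ""


def mark_sensitive(words, preset_context="SENSITIVE"):
    """Steps 208-214: mark a word only if BOTH checks pass."""
    for w in words:
        if w.context != preset_context:                  # Step 208 fails -> Step 212
            continue
        for name, pattern in PRIVACY_PATTERNS.items():   # Step 210
            if pattern.fullmatch(w.text):
                w.sensitive, w.category = True, name     # Step 214
                break
    return words


words = [
    Word("mobile", "CUE"),
    Word("13800138000", "SENSITIVE"),
    Word("12345", "SENSITIVE"),      # context passes, vocabulary check fails
    Word("13800138000", "NORMAL"),   # vocabulary would match, context fails
]
print([(w.text, w.sensitive, w.category) for w in mark_sensitive(words)])
# [('mobile', False, ''), ('13800138000', True, 'mobile'), ('12345', False, ''), ('13800138000', False, '')]
```

Requiring both checks is what lets this embodiment skip a number that merely looks private when its context does not support it, and skip a word in a private context that does not look like private data.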
Embodiment 3
[0065] As shown in Figure 3, according to an embodiment of the present invention, a text desensitization method is proposed. The method includes:
[0066] Step 302, obtaining the target text;
[0067] Step 304, using the maximum matching algorithm to perform word segmentation processing on the target text to obtain the second target vocabulary;
[0068] Step 306, counting the frequency of occurrence of the second target vocabulary in the target text;
[0069] Step 308, updating the thesaurus with the second target vocabulary whose frequency of occurrence is greater than or equal to the preset frequency;
[0070] Step 310, performing word segmentation processing on the target text according to the word segmentation database to obtain the first target vocabulary, together with the vocabulary position and semantics corresponding to the first target vocabulary;
[0071] Step 312, according to the first target vocabulary, vocabulary position, semantics and context pattern library, ...
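Steps 306 and 308 amount to counting the words produced in Step 304 and promoting the frequent ones into the thesaurus. A minimal sketch, assuming the Step 304 output is already available as a list of words; the threshold value, the `min_len` filter for single-character leftovers, and the function name are hypothetical, and the text does not say which dictionary Step 304 matches against.

```python
from collections import Counter


def update_lexicon(segmented_words, lexicon, preset_frequency=2, min_len=2):
    """Steps 306-308: add words occurring at least `preset_frequency` times.

    `segmented_words` is the second target vocabulary from Step 304.
    Words shorter than `min_len` are skipped (an assumption, to keep
    single-character matching leftovers out of the thesaurus).
    Updates `lexicon` in place and returns the set of newly added words.
    """
    counts = Counter(segmented_words)                     # Step 306
    new_words = {
        w for w, n in counts.items()
        if n >= preset_frequency and len(w) >= min_len and w not in lexicon
    }
    lexicon |= new_words                                  # Step 308
    return new_words


lexicon = {"电话", "订单"}
words = ["客户", "张三", "电话", "张三", "订单", "客户", "张"]
added = update_lexicon(words, lexicon, preset_frequency=2)
print(sorted(added))
```

Feeding frequent domain terms such as customer names back into the thesaurus before Step 310 is what lets the later segmentation treat them as whole words rather than splitting them into characters.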