Sensitive word detection method based on decision tree and variant recognition

A sensitive word detection method, applicable to fields such as instruments, digital data processing, and computing. It addresses the limited consideration of variants and the low filtering accuracy of existing sensitive word detection, improving accuracy and optimizing time complexity.

Active Publication Date: 2022-02-25
万商云集(成都)科技股份有限公司
Cites: 7 · Cited by: 2

AI Technical Summary

Problems solved by technology

[0004] Existing research still leaves many problems unsolved. For example, methods that only compare the Chinese characters or pinyin of sensitive words directly give little consideration to variants of those characters and pinyin, which results in low filtering accuracy. This application aims to solve these more prominent problems in current research and to provide a method with higher recall and precision for sensitive word detection.


Examples


Embodiment Construction

[0044] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations.

[0045] First of all, the sensitive words and decision tree in this application are explained as follows:

[0046] The location of a sensitive word is represented by a tuple: the first element of the tuple is the starting position of the sensitive word in the text, and the second element of the tuple is the ending position o...
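The position tuple described in [0046] can be illustrated with a minimal sketch. The paragraph above is truncated, so whether the ending position is inclusive is not stated; this sketch assumes the end index points at the last character of the match. The function name `locate` is hypothetical and not taken from the patent.

```python
from typing import List, Tuple

def locate(text: str, word: str) -> List[Tuple[int, int]]:
    """Return a (start, end) tuple for every occurrence of `word` in `text`.

    Assumption: `end` is the index of the last matched character (inclusive).
    """
    if not word:
        return []
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append((start, start + len(word) - 1))
        # Continue searching one character further so overlapping hits are found.
        start = text.find(word, start + 1)
    return positions

print(locate("abcabc", "bc"))
```

With an inclusive end, the matched span can be recovered as `text[start:end + 1]`.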


Abstract

The invention belongs to the technical field of natural language processing and provides a sensitive word detection method based on a decision tree and variant recognition. The method comprises the following steps: S1, constructing a sensitive word dictionary and updating it according to a preset period, wherein Chinese character forms and their corresponding pinyin are added to the sensitive word dictionary according to an initial sequence, and constructing a decision tree from the sensitive word dictionary; S2, inputting a text into a sensitive word detection model, wherein the model detects sensitive words in the text through a matching algorithm and a matching standard on the basis of the decision tree, and locates the sensitive words. Through direct comparison and similarity comparison of character forms and pinyin, the method effectively addresses two problems: sensitive words that are missing from the dictionary, and detection being evaded by modifying the character forms, pinyin, or English of sensitive words.
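Steps S1 and S2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented algorithm: the "decision tree" is modeled here as a nested-dict prefix tree, only direct comparison is shown (the similarity comparison is omitted), and the names `build_tree`, `detect`, and `END` are hypothetical. Each dictionary entry is inserted in both its character form and its pinyin form, and matches are reported with inclusive (start, end) positions.

```python
from typing import Dict, List, Tuple

END = "$end"  # marker key (assumed): a dictionary entry ends at this node

def build_tree(entries: Dict[str, str]) -> dict:
    """S1 (sketch): build a nested-dict prefix tree from {word: pinyin} entries."""
    root: dict = {}
    for word, pinyin in entries.items():
        for form in (word, pinyin):  # insert the character form and the pinyin form
            node = root
            for ch in form:
                node = node.setdefault(ch, {})
            node[END] = word  # remember which dictionary word this form stands for
    return root

def detect(text: str, root: dict) -> List[Tuple[str, Tuple[int, int]]]:
    """S2 (sketch): return (word, (start, end)) for the longest match at each position."""
    hits = []
    i = 0
    while i < len(text):
        node, j, last = root, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                last = (node[END], (i, j - 1))  # inclusive end index
        if last:
            hits.append(last)
            i = last[1][1] + 1  # resume scanning after the match
        else:
            i += 1
    return hits

tree = build_tree({"敏感": "mingan"})
print(detect("这是敏感词,也是mingan词", tree))
```

Walking the tree from each text position keeps the cost per position bounded by the longest dictionary entry rather than by the dictionary size, which is one plausible reading of the time-complexity benefit the summary mentions.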

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a sensitive word detection method based on decision tree and variant recognition.

Background technique

[0002] With the development and prosperity of the Internet age, massive network resources have made it more convenient and quick for people to obtain information, communicate in life, consume and manage money. However, while people are enjoying the convenience brought by the Internet, many people take advantage of the rapid and extensive information dissemination characteristics of the Internet to publish all kinds of illegal information such as pornography, violence, reactionary, superstition, etc. It has caused great harm and brought many adverse effects to society.

[0003] In order to deal with this problem, many Internet companies and public information management departments are reviewing and filtering information published on the Internet all the ti...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/242, G06F40/194
CPC: G06F40/242, G06F40/194
Inventors: 王飞, 田文洪, 刘文鑫
Owner: 万商云集(成都)科技股份有限公司