Keyword-based bad text detection method and device
The invention relates to text detection and keyword technology, applied in unstructured text data retrieval, text database clustering/classification, special data processing applications, etc. It addresses the problems of missed illegal words, hard-to-identify camouflage words, and the low accuracy of web page recognition, with the effect of improving recognition accuracy.
Examples
Embodiment 1
[0053] This embodiment provides a keyword-based method for detecting bad text, which can be executed by a computer with an information processing function, a network server, or the like. Bad text refers to text content containing bad information related to pornography, gambling, or drugs. Keywords are words carrying bad or sensitive information, such as "sex" and other illegal words, that are acquired in advance by the detector for use in bad text detection. As an application scenario of the present invention, in this embodiment a web server applies the method to webpage text transmitted as a data stream over the network. It can be understood that, for detection, webpage text in data-stream form can be restored to webpage text in natural-language form. The keyword-based bad text detection method provided in this embodiment is described below.
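The scenario above can be illustrated with a minimal sketch. It assumes the data stream arrives as UTF-8 encoded HTML byte chunks and that detection is a plain case-insensitive substring match; the source specifies neither detail at this point. The function names `restore_text` and `contains_keyword` and the sample keywords are hypothetical.

```python
import html
import re

def restore_text(stream_chunks, encoding="utf-8"):
    """Join raw byte chunks and strip markup to recover natural-language text.

    Assumption: the stream carries HTML; tag removal here is a rough heuristic.
    """
    raw = b"".join(stream_chunks).decode(encoding, errors="replace")
    no_tags = re.sub(r"<[^>]+>", " ", raw)      # drop HTML tags
    text = html.unescape(no_tags)                # decode entities such as &amp;
    return re.sub(r"\s+", " ", text).strip()     # collapse whitespace

def contains_keyword(text, keywords):
    """Return the keywords found in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]

# Hypothetical stream split across two chunks, as it might arrive on the wire.
chunks = [
    b"<html><body><p>Win big at our ",
    b"casino &amp; poker tables</p></body></html>",
]
page_text = restore_text(chunks)
hits = contains_keyword(page_text, ["casino", "lottery"])
```

Buffering all chunks before decoding avoids splitting a multi-byte character across a chunk boundary; a streaming detector would need an incremental decoder instead.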
[0054] Figure 1 is a flow chart of the keyword-based bad ...
Embodiment 2
[0108] Corresponding to the keyword-based bad text detection method provided in the first embodiment, the second embodiment provides a keyword-based bad text detection device. The device may specifically be a computer with an information processing function, a network server, or the like. As shown in Figure 2, the keyword-based bad text detection device 100 includes:
[0109] A seed word obtaining unit 101, which is used to obtain a plurality of seed words, a seed word being a word used to represent bad information;
[0110] A semantically associated word expansion unit 102, which is used to expand the seed words obtained by the seed word obtaining unit 101 according to a semantic clustering method, to obtain semantically associated words that are semantically associated with the seed words, and to use the seed words and the semantically associated words as the keywords for detecting bad text;
[0111] A bad text judging unit 103, when the webpage text is transmitte...
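The roles of units 101 and 102 can be sketched as follows. The source does not detail the semantic clustering method here, so this sketch substitutes a simple cosine-similarity threshold over word vectors as a stand-in. The function `expand_seed_words`, the `threshold` value, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_seed_words(seed_words, vectors, threshold=0.9):
    """Return the seed words plus every vocabulary word close to some seed word.

    Assumption: `vectors` maps each word to a precomputed embedding; a real
    system would obtain these from a trained model and a clustering step.
    """
    keywords = set(seed_words)
    known_seeds = [s for s in seed_words if s in vectors]
    for word, vec in vectors.items():
        if word in keywords:
            continue
        if any(cosine(vec, vectors[s]) >= threshold for s in known_seeds):
            keywords.add(word)
    return keywords

# Toy three-dimensional embeddings, invented purely for illustration.
toy_vectors = {
    "casino": (0.9, 0.1, 0.0),
    "betting": (0.85, 0.2, 0.05),
    "weather": (0.0, 0.1, 0.95),
}
keywords = expand_seed_words(["casino"], toy_vectors)
```

The resulting set combines the seed words with their expansions, matching the description of unit 102, which uses both together as detection keywords.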