Three-folded webpage text content recognition and filtering method based on the Chinese punctuation
A punctuation and content recognition technology, applied in the field of network information security, that addresses problems such as slow filtering speed, low filtering accuracy and filtering rate, and filters that are easy to bypass. It achieves improved speed, high efficiency, and low CPU usage.
Embodiment Construction
[0067] The webpage's tags are used to construct its tag tree, which regularizes the webpage into nested content blocks. Then, for a set of webpages generated from the same template, content blocks that appear many times across the set are treated as noise content, and content blocks that appear less frequently across the set are treated as valid information blocks. Fudan University proposed an Internet filtering system and filtering method based on a Content Filtering Agent (CFA). The system framework includes three parts: the Content Filtering Agent (CFA), the Query Server (QS), and the Content Analysis and Management Server (CAMS). The filtering process of the network content filtering system is: ...
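The template-noise idea described above (blocks repeated across pages from the same template are noise, rarer blocks are valid information) can be illustrated with a minimal Python sketch. It is not the method of the cited prior art. The names `BLOCK_TAGS`, `extract_blocks`, `split_noise`, and the `noise_ratio` threshold are all hypothetical, and the sketch flattens the page into a list of text blocks instead of keeping the nested tag tree.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags delimit content blocks; the source does not name them.
BLOCK_TAGS = {"div", "td", "p", "li"}


class BlockExtractor(HTMLParser):
    """Collects the text between block-level tags as a flat list of blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def _flush(self):
        # Join buffered text, collapse whitespace, and keep non-empty blocks.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()  # keep any trailing text after the last tag


def extract_blocks(html):
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def split_noise(pages, noise_ratio=0.5):
    """Return (noise_blocks, valid_blocks_per_page) for same-template pages.

    A block counts as noise when it occurs in more than `noise_ratio`
    of the pages; this threshold is an illustrative assumption.
    """
    block_lists = [extract_blocks(html) for html in pages]
    page_frequency = Counter()
    for blocks in block_lists:
        page_frequency.update(set(blocks))  # count each block once per page
    cutoff = noise_ratio * len(pages)
    noise = {b for b, n in page_frequency.items() if n > cutoff}
    valid = [[b for b in blocks if b not in noise] for blocks in block_lists]
    return noise, valid


pages = [
    "<div>Home | About</div><div>Story one</div>",
    "<div>Home | About</div><div>Story two</div>",
    "<div>Home | About</div><div>Story three</div>",
]
noise, valid = split_noise(pages)
```

Here the navigation block shared by all three pages is classified as noise, and each page keeps only its own story block. A real system would compare subtrees of the tag tree and not only flattened text.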