Word segmentation method and device for URLs
A word segmentation method and word segmentation technology, applied in natural language data processing, special data processing applications, and the retrieval of web data using information identifiers, to achieve efficient URL segmentation and improved task accuracy
Embodiment Construction
[0032] In order to make the above objects, features, and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the specific drawings.
[0033] The present invention provides a word segmentation method for URLs. The flow of the method is shown in Figure 1, and the main steps include:
[0034] (1) Hierarchical segmentation. First, the URL, which is semi-structured data, is segmented according to its internal hierarchical structure to obtain five hierarchical parts;
[0035] (2) Symbol segmentation and regular expression filtering. These are carried out in turn on each level: the level is segmented at special symbols, and fields with specific formats, such as IP addresses, dates, and numbers, are filtered with regular expressions, which further cleans non-alphabetic characters out of the URL;
[0036] (3) Character string segmentation, using the bidirectional maximum matching algorithm and probability model ...
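The three steps above can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions that do not come from the text: the five hierarchical parts are taken to be the five components returned by the standard library function urllib.parse.urlsplit (scheme, network location, path, query, fragment); the IP and date patterns are simplified examples; the dictionary WORDS is a tiny hypothetical vocabulary; and because the description of the probability model is cut off in the source, a common heuristic (fewest tokens, then fewest single-character tokens) stands in for it when the forward and backward passes disagree.

```python
import re
from urllib.parse import urlsplit

# Simplified, hypothetical patterns for fields with a fixed format.
IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
DATE_RE = re.compile(r"\b\d{4}[-/]\d{1,2}[-/]\d{1,2}\b")

# Tiny hypothetical dictionary; a real system would load a large vocabulary.
WORDS = {"http", "www", "example", "com", "sports", "sport", "news",
         "football", "foot", "ball", "match", "id"}
MAX_LEN = max(len(w) for w in WORDS)


def split_levels(url):
    """Step 1 (assumed mapping): split the URL into five parts."""
    p = urlsplit(url)
    return [p.scheme, p.netloc, p.path, p.query, p.fragment]


def symbol_split(level):
    """Step 2: drop IP and date fields, then keep only alphabetic runs.

    Keeping alphabetic runs both splits at special symbols and removes
    plain numbers and other non-alphabetic characters.
    """
    level = IP_RE.sub(" ", level)
    level = DATE_RE.sub(" ", level)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", level)]


def forward_mm(s):
    """Forward maximum matching: longest dictionary word from the left."""
    out, i = [], 0
    while i < len(s):
        for size in range(min(MAX_LEN, len(s) - i), 0, -1):
            piece = s[i:i + size]
            if size == 1 or piece in WORDS:  # single char is the fallback
                out.append(piece)
                i += size
                break
    return out


def backward_mm(s):
    """Backward maximum matching: longest dictionary word from the right."""
    out, j = [], len(s)
    while j > 0:
        for size in range(min(MAX_LEN, j), 0, -1):
            piece = s[j - size:j]
            if size == 1 or piece in WORDS:
                out.append(piece)
                j -= size
                break
    return out[::-1]


def bidirectional_mm(s):
    """Step 3: run both passes and pick one with a stand-in heuristic."""
    f, b = forward_mm(s), backward_mm(s)

    def cost(seg):
        # Assumption: fewer tokens, then fewer single-character tokens.
        return (len(seg), sum(len(t) == 1 for t in seg))

    return f if cost(f) < cost(b) else b


def segment_url(url):
    tokens = []
    for level in split_levels(url):
        for chunk in symbol_split(level):
            tokens.extend(bidirectional_mm(chunk))
    return tokens


if __name__ == "__main__":
    print(segment_url(
        "http://www.example.com/sportsnews/2024-01-15/footballmatch?id=42"))
```

In this sketch, strings that contain no dictionary word fall back to single characters, which is where a probability model such as the one named in step (3) would normally take over.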