RNNs-based method for automatic security review of short messages
An SMS security technology, applied in natural language data processing, instruments, computing and related fields, that addresses problems such as discrimination, difficult expansion, and inconsistent prison rules.
Examples
Embodiment 1
[0060] An RNNs-based method for automatic security review of SMS messages, as shown in Figure 1. The specific steps include:
[0061] (1) The historical SMS data is preprocessed. Preprocessing includes noise removal and Chinese word segmentation. Noise removal consists of removing the punctuation marks in each message and discarding messages whose word count is less than 3. Chinese word segmentation:
[0062] A word is the smallest meaningful unit of language that can be used independently. English words are naturally delimited by spaces, whereas Chinese uses the character as its basic writing unit and has no explicit marker between words. Chinese word segmentation is therefore the foundation of, and the key to, Chinese information processing. Each word in a sentence is tagged according to its part of speech. For example, for "We form a team", the part-of-speech tagging is: we ad / combination v / cheng v / one m / team n /, and the result of Chinese word segmentation is...
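The preprocessing in step (1) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names (`strip_punctuation`, `whitespace_segment`, `preprocess`) are hypothetical, punctuation is identified by Unicode category, and the "fewer than 3 words" rule is assumed to apply to the token count after segmentation. The default segmenter only splits on whitespace and is a placeholder; for real Chinese text a proper word segmenter would be passed in through the `segment` parameter.

```python
import unicodedata
from typing import Callable, Iterable, List


def strip_punctuation(text: str) -> str:
    """Remove every character whose Unicode category is punctuation (P*)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def whitespace_segment(text: str) -> List[str]:
    """Placeholder segmenter (assumption): splits on whitespace only.

    Chinese text has no spaces between words, so a real Chinese word
    segmenter must be supplied in place of this function.
    """
    return text.split()


def preprocess(
    messages: Iterable[str],
    segment: Callable[[str], List[str]] = whitespace_segment,
    min_words: int = 3,
) -> List[List[str]]:
    """Strip punctuation, segment each message, drop messages that are too short."""
    cleaned = []
    for message in messages:
        tokens = segment(strip_punctuation(message))
        # Assumption: the length threshold is applied to segmented tokens.
        if len(tokens) >= min_words:
            cleaned.append(tokens)
    return cleaned
```

Keeping the segmenter as a parameter means the noise-removal logic can be checked independently of whichever segmentation tool is used.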
Embodiment 2
[0091] The RNNs-based method for automatic security review of SMS messages according to Embodiment 1, differing in that:
[0092] In step (2), features are extracted with a CBOW model based on Hierarchical Softmax; the block diagram of the CBOW model is shown in Figure 2. Specifically, the optimization function of the Hierarchical Softmax-based CBOW model is maximized, and training yields the word vector of each word produced by Chinese word segmentation. The optimization function of the Hierarchical Softmax-based CBOW model is shown in formula (I):
[0093] $$\mathcal{L} = \sum_{w \in C} \log p\big(w \mid \mathrm{Context}(w)\big) \qquad \text{(I)}$$
[0094] C is a corpus, and w refers to any word obtained after Chinese word segmentation in step (1); Context(w) is the context of w.
[0095] The word vector of each word is trained by maximizing this likelihood function. Once training converges, words with similar meanings are mapped to nearby positions in the vector space. In our model, word vectors are trained on the Sogou corpus, ...
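To make the likelihood being maximized more concrete, the sketch below evaluates log p(w | Context(w)) for a single word under hierarchical softmax. It is an illustration under stated assumptions rather than the patent's training code: the function name `cbow_hs_log_prob` and the `path` representation are hypothetical, the context vectors are averaged (summing is an equally common choice), and the branch convention follows the widely used word2vec formulation, where code 0 contributes sigmoid(h·θ) and code 1 contributes 1 − sigmoid(h·θ). Training would adjust the word vectors and inner-node vectors to increase the sum of this quantity over the corpus.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cbow_hs_log_prob(
    context_vectors: Sequence[Sequence[float]],
    path: Sequence[Tuple[Sequence[float], int]],
) -> float:
    """Return log p(w | Context(w)) under hierarchical softmax.

    context_vectors: word vectors of the words in Context(w).
    path: (inner_node_vector, code) pairs from the root of the binary
          (Huffman) tree down to the leaf of w; code is 0 or 1.
    """
    dim = len(context_vectors[0])
    # Projection layer: average of the context word vectors (assumption).
    hidden: List[float] = [
        sum(vec[i] for vec in context_vectors) / len(context_vectors)
        for i in range(dim)
    ]
    log_prob = 0.0
    for theta, code in path:
        score = sum(h * t for h, t in zip(hidden, theta))
        # Each inner node makes a binary decision; the word probability
        # is the product of the branch probabilities along the path.
        branch = sigmoid(score) if code == 0 else 1.0 - sigmoid(score)
        log_prob += math.log(branch)
    return log_prob
```

Because every inner node splits its probability mass between two branches, the probabilities of all leaves sum to 1 without an explicit normalization over the vocabulary, which is the motivation for using hierarchical softmax with large vocabularies.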