Sensitive word recognition method and device, equipment, storage medium and program product
A sensitive word recognition technology, applied in the field of data processing, that addresses the low recognition accuracy of sensitive words. It improves recognition accuracy, labeling quality, and boundary recognition ability.
Examples
Embodiment 1
[0037] Figure 1 is a flowchart of an embodiment of the sensitive word recognition method provided in Embodiment 1 of the present application. This embodiment can be applied to a sensitive word recognition device, which may be located in a server or a client; this embodiment does not limit its location.
[0038] As shown in Figure 1, this embodiment may include the following steps:
[0039] Step 110: based on the pre-generated domain dictionary library, determine the word set of the text to be recognized, where each word in the word set includes head position information and tail position information.
[0040] In practice, the text to be recognized can come from different sources and serve different purposes, depending on the requirements and the application scenario. For example, the text to be recognized may be text obtained after speech is recognized by an ASR (Automatic Speech Recognition) system and denoised and clean...
Embodiment 2
[0075] Figure 3 is a flowchart of an embodiment of the sensitive word recognition method provided in Embodiment 2 of the present application. This embodiment gives a more detailed description on the basis of Embodiment 1. As shown in Figure 3, this embodiment may include the following steps:
[0076] Step 310: in the pre-generated domain dictionary library, use a matching algorithm to perform word matching on the text to be recognized, obtaining the word set of the text to be recognized together with the head position information and tail position information of each word in the text to be recognized.
[0077] Performing word matching on the text to be recognized with a matching algorithm over the domain dictionary library yields the word set of the text to be recognized. For example, the matching algorithm may include, but is not limited to: a forward maximum matching algorithm, a reverse maximum matching algorithm, or a bidirection...
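As an illustration of Step 310, the following is a minimal sketch of forward maximum matching that returns each matched word with its head and tail positions. It is not the implementation of the present application: the function name `forward_maximum_match`, the sample dictionary entries, the use of inclusive character indices for the positions, and the choice to skip characters that match no dictionary entry are all assumptions made for the example.

```python
def forward_maximum_match(text, dictionary, max_len=None):
    """Return (word, head, tail) triples for dictionary words found in text.

    head and tail are inclusive character indices (an assumption of this
    sketch). Characters that match no dictionary entry are skipped.
    """
    if max_len is None:
        # Longest dictionary entry bounds the matching window.
        max_len = max((len(w) for w in dictionary), default=1)
    matches = []
    i = 0
    while i < len(text):
        matched = False
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                matches.append((candidate, i, i + size - 1))
                i += size
                matched = True
                break
        if not matched:
            i += 1
    return matches


# Hypothetical domain dictionary and input text.
domain_dictionary = {"南京", "南京市", "长江", "长江大桥", "大桥"}
word_set = forward_maximum_match("南京市长江大桥", domain_dictionary)
print(word_set)  # [('南京市', 0, 2), ('长江大桥', 3, 6)]
```

A reverse maximum matching variant would scan from the end of the text with the same window logic.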
Embodiment 3
[0147] Figure 6 is a structural block diagram of an embodiment of a sensitive word recognition device provided in Embodiment 3 of the present application. The device may include the following modules:
[0148] The word set determination module 610 is used to determine the word set of the text to be recognized based on the pre-generated domain dictionary library, where each word in the word set includes head position information and tail position information;
[0149] The word-building part acquisition module 620 is used to split each word in the word set into word-building parts and obtain the word-building parts corresponding to each word;
[0150] The input vector determination module 630 is used to obtain the word vector corresponding to each word and the word-building part vector corresponding to the word-building parts of each word, and to generate the input vector of each word based on the word vector and the word-building part vector of that word;
[0...
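The roles of modules 620 and 630 can be sketched roughly as follows. The text above does not state how a word is split or how the word vector and the word-building part vector are combined, so this sketch assumes a hypothetical lookup table `part_table` for the split, mean pooling over the part vectors, and concatenation to form the input vector. All names and values here are illustrative only.

```python
def split_into_parts(word, part_table):
    """Split a word into word-building parts via a lookup table.

    Characters missing from the table are kept whole (an assumption).
    """
    parts = []
    for ch in word:
        parts.extend(part_table.get(ch, [ch]))
    return parts


def mean_vector(vectors, dim):
    """Element-wise mean of the vectors, or a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def build_input_vector(word, word_vectors, part_vectors, part_table, part_dim):
    """Concatenate the word vector with the pooled word-building part vector."""
    parts = split_into_parts(word, part_table)
    known = [part_vectors[p] for p in parts if p in part_vectors]
    pooled = mean_vector(known, part_dim)
    return word_vectors[word] + pooled


# Hypothetical tables with tiny 2-dimensional vectors.
part_table = {"好": ["女", "子"]}
word_vectors = {"好": [1.0, 2.0]}
part_vectors = {"女": [0.0, 4.0], "子": [2.0, 0.0]}

vec = build_input_vector("好", word_vectors, part_vectors, part_table, part_dim=2)
print(vec)  # [1.0, 2.0, 1.0, 2.0]
```

In a trained model the vectors would come from learned embedding tables rather than hand-written dictionaries.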