Junk corpus screening method, system and device based on LGBM model and BTM model
A screening method and model technology, applied in the fields of character and pattern recognition, natural language data processing, and instruments, that addresses problems such as the influence of subjective factors, with the effects of reducing workload, lowering the cost of manual labeling, and ensuring inference speed.
Examples
Embodiment 1
[0056] Referring to Figures 1 to 3, this example discloses a specific implementation of a garbage corpus screening method based on the LGBM (LightGBM) model and the BTM (Biterm Topic Model) model (hereinafter referred to as "the method").
[0057] Referring specifically to Figures 1 and 2, the method disclosed in this embodiment mainly includes the following steps:
[0058] Step S1: extract comments on products to obtain comment data.
[0059] Specifically, in some embodiments, analysis texts that meet the required conditions are taken from a massive e-commerce text database, and comment data is extracted for each category of commodity. For example, data is extracted for products such as "milk" and "cosmetics".
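Step S1 can be pictured as grouping raw reviews by product category. The following is a minimal sketch only: the record layout (a mapping with hypothetical `category` and `text` keys) and the function name `extract_comments` are assumptions, since the patent does not describe the database schema or the filtering conditions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


# Hypothetical record shape: each raw review is a mapping with
# "category" and "text" keys. The patent does not specify the schema.
def extract_comments(
    records: Iterable[Mapping[str, str]],
    categories: Iterable[str],
) -> Dict[str, List[str]]:
    """Group non-empty review texts by product category (Step S1 sketch)."""
    wanted = set(categories)
    by_category: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        category = record.get("category")
        text = (record.get("text") or "").strip()
        # Keep only the requested categories and skip empty reviews.
        if category in wanted and text:
            by_category[category].append(text)
    return dict(by_category)


reviews = [
    {"category": "milk", "text": "Fresh and tasty, will buy again"},
    {"category": "cosmetics", "text": "add my contact for discount codes"},
    {"category": "milk", "text": "   "},
    {"category": "books", "text": "great read"},
]
print(extract_comments(reviews, ["milk", "cosmetics"]))
```

Categories outside the requested set and whitespace-only reviews are dropped; in practice the filtering conditions would be whatever the text database applies.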
[0060] Step S2 is then executed: the BTM model is used to perform topic mining on the comment data, and the high-frequency words of spam comments are summarized from the mining results.
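For readers unfamiliar with BTM, the sketch below shows one common way to fit it: every unordered word pair (biterm) in a short comment is assigned a topic by collapsed Gibbs sampling, following the original formulation by Yan et al. (2013). It is not the patent's implementation. The hyperparameters (alpha = 50/K, beta = 0.01), whitespace tokenization (Chinese comments would first need word segmentation), and the hypothetical helper `spam_keywords`, which pools the top words of topics a reviewer has judged to be spam, are all assumptions.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def extract_biterms(docs: Sequence[Sequence[str]]) -> List[Tuple[str, str]]:
    """Each unordered pair of word positions in a short comment is one biterm."""
    return [pair for tokens in docs for pair in combinations(tokens, 2)]


def train_btm(docs, num_topics=2, alpha=None, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for BTM; returns (vocab, phi) with phi[k][w]."""
    rng = random.Random(seed)
    alpha = 50.0 / num_topics if alpha is None else alpha
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    m = len(vocab)
    biterms = [(index[a], index[b]) for a, b in extract_biterms(docs)]

    n_z = [0] * num_topics                       # biterms assigned to topic k
    n_wz = [[0] * m for _ in range(num_topics)]  # word counts per topic
    assign = []
    for wi, wj in biterms:                       # random initial topics
        z = rng.randrange(num_topics)
        assign.append(z)
        n_z[z] += 1
        n_wz[z][wi] += 1
        n_wz[z][wj] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for b, (wi, wj) in enumerate(biterms):
            z = assign[b]                        # remove current assignment
            n_z[z] -= 1
            n_wz[z][wi] -= 1
            n_wz[z][wj] -= 1
            # P(z=k | rest) is proportional to
            # (n_k + alpha)(n_wi|k + beta)(n_wj|k + beta) / (2 n_k + M beta)^2
            weights = [(n_z[k] + alpha) * (n_wz[k][wi] + beta) * (n_wz[k][wj] + beta)
                       / (2 * n_z[k] + m * beta) ** 2 for k in topics]
            z = rng.choices(topics, weights=weights)[0]
            assign[b] = z                        # add the new assignment back
            n_z[z] += 1
            n_wz[z][wi] += 1
            n_wz[z][wj] += 1

    # Every biterm adds two word counts, so sum_w n_wz[k][w] == 2 * n_z[k].
    phi = [[(n_wz[k][w] + beta) / (2 * n_z[k] + m * beta) for w in range(m)]
           for k in topics]
    return vocab, phi


def top_words(vocab, phi, n=5):
    """Top-n words per topic by phi, highest probability first."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in phi]


def spam_keywords(vocab, phi, spam_topics, n=5):
    """Hypothetical helper: pool the top words of topics judged to be spam."""
    words = top_words(vocab, phi, n)
    return sorted({w for k in spam_topics for w in words[k]})


docs = [
    "add wechat for discount coupon".split(),
    "discount coupon add wechat now".split(),
    "milk tastes fresh and creamy".split(),
    "fresh creamy milk fast delivery".split(),
]
vocab, phi = train_btm(docs, num_topics=2, iterations=100, seed=42)
for k, words in enumerate(top_words(vocab, phi, n=4)):
    print(k, words)
```

In this sketch the spam topics are picked by hand after inspecting the printed top words; the patent only states that the high-frequency spam words are summarized from the mining results, so how that selection is made is left open.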
[0061] Specifically, in some of the e...
Embodiment 2
[0074] Building on the garbage corpus screening method based on the LGBM model and the BTM model disclosed in Embodiment 1, this embodiment discloses a specific implementation of a garbage corpus screening system based on the LGBM model and the BTM model (hereinafter referred to as "the system").
[0075] Referring to Figure 4, the system includes:
[0076] an extraction module 100, configured to extract comments on commodities to obtain comment data;
[0077] a mining module 200, configured to use the BTM model to perform topic mining on the comment data and to summarize the high-frequency words of spam comments from the mining results;
[0078] a training module 300, configured to train an LGBM model based on the comment data and the high-frequency words of the spam comments; and
[0079] a screening module 400, configured to screen the spam comment corpus using the trained LGBM model.
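The training and screening modules can be chained as a simple pipeline. The sketch below is hypothetical throughout: the feature set in `comment_features` (token count, spam-word hits, spam-word share, presence of digits) is an assumed way of combining comment data with the mined spam words, and `screen_spam` accepts any scoring function so that it can be exercised without LightGBM installed. `train_lgbm_scorer` shows how the training module could wrap `lightgbm.LGBMClassifier`, which requires the third-party `lightgbm` package.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[List[List[float]]], List[float]]


def comment_features(text: str, spam_words: Sequence[str]) -> List[float]:
    """Hypothetical feature vector for one comment (not specified by the patent)."""
    vocab = set(spam_words)
    tokens = text.lower().split()
    hits = sum(1 for t in tokens if t in vocab)
    return [
        float(len(tokens)),                        # comment length in tokens
        float(hits),                               # hits on mined spam words
        hits / len(tokens) if tokens else 0.0,     # share of spam words
        float(any(ch.isdigit() for ch in text)),   # digits, e.g. contact numbers
    ]


def train_lgbm_scorer(texts: Sequence[str], labels: Sequence[int],
                      spam_words: Sequence[str]) -> Scorer:
    """Training module sketch: fit LightGBM on the hypothetical features."""
    import lightgbm  # third-party dependency, imported only when training

    X = [comment_features(t, spam_words) for t in texts]
    model = lightgbm.LGBMClassifier(n_estimators=100)
    model.fit(X, list(labels))
    # Probability of the positive (spam) class for each row.
    return lambda rows: list(model.predict_proba(rows)[:, 1])


def screen_spam(texts: Sequence[str], spam_words: Sequence[str],
                scorer: Scorer, threshold: float = 0.5) -> List[str]:
    """Screening module sketch: keep comments whose spam score >= threshold."""
    features = [comment_features(t, spam_words) for t in texts]
    scores = scorer(features)
    return [t for t, s in zip(texts, scores) if s >= threshold]


def toy_scorer(rows: List[List[float]]) -> List[float]:
    # Stand-in for a trained model: doubled spam-word share, capped at 1.
    return [min(1.0, 2 * row[2]) for row in rows]


spam_words = ["add", "wechat", "discount", "coupon"]
comments = ["add wechat 123456 for discount coupon", "the milk tastes fresh"]
print(screen_spam(comments, spam_words, toy_scorer))
# prints ['add wechat 123456 for discount coupon']
```

The `toy_scorer` stand-in exists only to make the example runnable; in the described system the scores would come from the trained LGBM model, and the 0.5 threshold is likewise an assumption.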
[0080] Specifically, in some of these embodiments, a review classification module 500 is also include...
Embodiment 3
[0088] Referring to Figure 5, this embodiment discloses a specific implementation of a computer device. The computer device may include a processor 81 and a memory 82 storing computer program instructions.
[0089] Specifically, the processor 81 may include a central processing unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
[0090] The memory 82 may include mass storage for data or instructions. By way of example and not limitation, the memory 82 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 82 may comprise removable or non-r...