Deduplication method and apparatus for short text
The invention relates to text processing technology, specifically a deduplication method and apparatus for short texts. It addresses problems such as overly strict duplicate-judgment conditions, and it improves generalization ability and efficiency while reducing the amount of computation.
Examples
Embodiment 1
[0016] Figure 1 is a flow chart of the short-text deduplication method in Embodiment 1 of the present invention. The method can be executed by a device with a document processing function. The device can be implemented in software and/or hardware, for example as a typical user terminal such as a mobile phone or a computer. In this embodiment, a generalization relationship is the relationship between a general description and a specific description of an element, where the specific description extends the general one. Generalization means operating on elements to make them more general. The method in this embodiment includes steps S110, S120, S130 and S140.
[0017] Step S110, acquiring text string information of the short text.
[0018] Specifically, the user inputs a text string to be processed to ...
Embodiment 2
[0027] Figure 2 is a flow chart of the short-text deduplication method in Embodiment 2 of the present invention. This embodiment further explains steps S120, S130 and S140 on the basis of Embodiment 1. In step S120, obtaining the keywords of the text string from its word segmentation information includes: removing stop words from the word segmentation information and performing normalization. In step S130, the factors that affect a keyword's weight include at least the keyword's frequency and/or its inverse document frequency. Obtaining a text substring that contains a threshold number of keywords includes: removing from the text string the keywords whose weight is below a preset weight threshold; or selecting the threshold number of keywords from the text string according to their weights; and combining two or more keywords into phrases. In step S140, removing duplicates o...
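The weighting and selection in step S130 can be illustrated with a minimal Python sketch. It assumes a plain TF-IDF weight with a smoothed inverse document frequency, which is one common choice and is not confirmed by the source. The function names (`keyword_weights`, `select_by_threshold`, `select_top_k`) and the representation of the reference corpus as a list of keyword sets are hypothetical.

```python
import math
from collections import Counter


def keyword_weights(keywords, corpus):
    """Weight each keyword by term frequency times inverse document frequency.

    keywords: keyword list of one text string (stop words already removed).
    corpus:   list of keyword sets, one per reference text (an assumption).
    """
    counts = Counter(keywords)
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        # Smoothed IDF (assumption): stays finite when doc_freq is 0.
        idf = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0
        weights[word] = (count / len(keywords)) * idf
    return weights


def select_by_threshold(weights, min_weight):
    """Option 1: remove keywords whose weight is below the preset threshold."""
    return {word for word, weight in weights.items() if weight >= min_weight}


def select_top_k(weights, k):
    """Option 2: keep the k highest-weighted keywords (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]
```

Both selection functions take the same weight dictionary, so a caller can switch between the threshold-based and count-based variants without recomputing the weights.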
Embodiment 3
[0039] Figure 3 shows the short-text deduplication method in Embodiment 3 of the present invention. Building on Embodiments 1 and 2, this embodiment, as a preferred embodiment, describes the deduplication operation between two text strings. Specifically, the method in this embodiment includes steps S310, S320, S330, S340, S350, S360 and S370.
[0040] Step S310, acquiring information of the first text string and the second text string.
[0041] Step S320, performing word segmentation on the first text string to obtain word segmentation information of the first text string, and performing word segmentation on the second text string to obtain word segmentation information of the second text string.
[0042] Step S330, performing stop word removal and normalization operations on the word segmentation of the first text string to obtain keyword information of the first text string; and performing stop w...
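Steps S310 to S330 can be sketched in Python as follows. This is only an illustration under stated assumptions: splitting on non-word characters stands in for a real word segmenter, lowercasing stands in for normalization, and the stop-word list is hypothetical. The later steps are not covered because their description is cut off in the text above.

```python
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = frozenset({"a", "an", "the", "of", "for", "to", "is"})


def segment(text):
    """S320 stand-in: split on runs of non-word characters.

    A real system would use a proper word segmenter; this is an assumption.
    """
    return [token for token in re.split(r"\W+", text) if token]


def to_keywords(tokens):
    """S330: normalize tokens (here, lowercase) and drop stop words."""
    normalized = [token.lower() for token in tokens]
    return [token for token in normalized if token not in STOP_WORDS]


def prepare_pair(first, second):
    """Run S310 to S330 on both text strings and return their keyword lists."""
    return to_keywords(segment(first)), to_keywords(segment(second))
```

For example, `prepare_pair("The price of the iPhone", "iPhone price!")` yields two keyword lists that contain the same keywords in different order, which is the kind of input the later comparison steps would operate on.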