Method and system for searching target theme
A topic retrieval method and system, applied in fields such as instruments, computing, and electrical digital data processing. It addresses the complexity, incompleteness, and limited accuracy of existing topic clustering methods, and aims for higher precision and broader coverage of the retrieved content.
Examples
Embodiment 1
[0031] This embodiment provides a method for retrieving a target topic, including the following steps:
[0032] S1: Determine the related words of the target topic by expanding the target subject words to obtain their related words. Any prior-art method for expanding search keywords can be used here. This embodiment provides one way of calculating the related words of the target topic, as follows:
[0033] First, search the database for the target subject to obtain all hit sentences.
[0034] Then, obtain the related sentences around each hit sentence; the sentence immediately before it and the sentence immediately after it can be taken.
[0035] In other implementations, the two preceding sentences or the two following sentences may be taken instead.
[0036] Next, segment the hit sentences and the related sentences into words.
[0037] Finally, count the word frequency after all w...
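The windowing and counting steps above can be sketched in Python as follows. This is only an illustrative sketch, not the patent's implementation: the function names, the regex-based sentence splitter, and the `\w+` tokenizer are assumptions standing in for whatever sentence splitter and word segmenter a real system would use, and counting each windowed sentence only once when windows overlap is also an assumption.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive splitter on Western and CJK sentence-ending punctuation (assumption)."""
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", text)
    return [p.strip() for p in parts if p.strip()]

def window_sentences(sentences, subject, before=1, after=1):
    """Return each hit sentence plus its neighbouring sentences.

    A sentence is a hit if it contains the subject (case-insensitive).
    Overlapping windows are merged so each sentence appears once (assumption).
    """
    keep = set()
    for i, sentence in enumerate(sentences):
        if subject.lower() in sentence.lower():
            lo = max(0, i - before)
            hi = min(len(sentences), i + after + 1)
            keep.update(range(lo, hi))
    return [sentences[i] for i in sorted(keep)]

def tokenize(sentence):
    """Stand-in word segmenter: lowercase runs of word characters (assumption)."""
    return re.findall(r"\w+", sentence.lower())

def word_frequencies(sentences):
    """Count how often each token occurs across the given sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts
```

Setting `before=2, after=0` or `before=0, after=2` would correspond to the variant in paragraph [0035] that takes two preceding or two following sentences.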
Embodiment 2
[0049] This embodiment provides a search method based on a subject word: the subject word is used to obtain related content, which can then be used in scenarios such as clustering and classification. The specific process, shown in Figure 2, is as follows:
[0050] 1. Establish a corpus containing full-text content.
[0051] 2. Use the subject word to perform a full-text search in the corpus.
[0052] 3. For each search hit, extract the sentence containing the hit together with the sentence before it and the sentence after it, three sentences in total, to form the screening sentences.
[0053] 4. Use a tokenizer to segment all the screening sentences, sort the words by frequency in descending order, and take the top N words as related words.
[0054] 5. Search the text to be searched with each of these words separately to obtain the result set R1.
[0055] 6. Segment the subject words with a tokenizer to obtain several w...
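Steps 4 and 5 above can be sketched in Python as shown below. This is a hedged illustration rather than the patent's code: `top_related_words` and `search_r1` are hypothetical names, the optional stopword filter is an added assumption, the documents are assumed to be a dictionary of id to text, and a case-insensitive substring match stands in for a real full-text search engine. Step 6 is truncated in the source and is not covered here.

```python
from collections import Counter

def top_related_words(tokenized_sentences, n, stopwords=frozenset()):
    """Return the n most frequent tokens across the screening sentences.

    tokenized_sentences: list of token lists produced by some tokenizer.
    stopwords: optional tokens to ignore (an assumption, not in the source).
    """
    counts = Counter(
        token
        for sentence in tokenized_sentences
        for token in sentence
        if token not in stopwords
    )
    return [word for word, _ in counts.most_common(n)]

def search_r1(documents, related_words):
    """Search with each related word separately and collect all hits into R1.

    documents: dict mapping a document id to its text (assumed layout).
    Matching is a plain case-insensitive substring test (assumption).
    """
    r1 = set()
    for word in related_words:
        needle = word.lower()
        for doc_id, text in documents.items():
            if needle in text.lower():
                r1.add(doc_id)
    return r1
```

Collecting the per-word hits into a set means a document matched by several related words appears in R1 only once.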
Embodiment 3
[0060] This embodiment addresses only the problem of topic content aggregation: starting from a topic word, expand it into related words and search with those related words to get the result R1; take an intersection to get the result set R2; then merge R1 and R2 to generate the topic, which solves the topic aggregation problem. The specific process is as follows:
[0061] 1. Perform full-text retrieval on the corpus using the topic words specified by the user.
[0062] 2. For each hit sentence, use a windowing method to take the hit sentence plus one sentence before it and one sentence after it, three sentences in total.
[0063] 3. Segment the three sentences into words.
[0064] 4. Process all hit sentences through steps 2 and 3 in turn, then count the word frequencies after segmentation and sort the words by frequency. After sorting, the first few words are selected according to a certain ...
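The merging of R1 and R2 described at the start of this embodiment can be sketched in Python as below. The source text does not say which result sets are intersected to form R2, so this sketch assumes, loudly and only for illustration, that the topic word is segmented into several words, each is searched separately, and R2 is the intersection of those hits. The names `search`, `search_r2`, and `merge_topic` are hypothetical, and substring matching stands in for a real search engine.

```python
def search(documents, term):
    """Return ids of documents whose text contains term (case-insensitive).

    documents: dict mapping a document id to its text (assumed layout).
    """
    needle = term.lower()
    return {doc_id for doc_id, text in documents.items() if needle in text.lower()}

def search_r2(documents, topic_segments):
    """ASSUMPTION: R2 is the set of documents that match every segment
    of the topic word. The source only says an intersection is taken."""
    hit_sets = [search(documents, segment) for segment in topic_segments]
    if not hit_sets:
        return set()
    return set.intersection(*hit_sets)

def merge_topic(r1, r2):
    """Merge the two result sets into the final topic; duplicates collapse."""
    return r1 | r2
```

Under this assumption, R1 broadens recall through related words while R2 keeps documents that match the full topic word even when none of the related words appear in them.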