Document retrieval method and device
A document retrieval technology, applied in the fields of instruments, computing, electrical digital data processing, etc., that addresses the problem of retrieval results that cannot be sorted and achieves accurate sorting results that meet user needs.
Examples
Embodiment 1
[0072] This embodiment describes the index-building process, as follows:
[0073] Step 01: Segment each document field that requires precise retrieval, word by word, into one or more retrieval tokens, and create an index entry for each retrieval token;
[0074] Step 02: Add an extra marker (Term) to the index to mark the end of the field. The text of this Term is a predefined character, END. END is an illegal character in the character encoding set, which ensures that it can never collide with normal text;
[0075] Step 03: Record and save the field length of each document, that is, the number of retrieval tokens the field contains. Length values greater than 255 are stored as 255 to simplify storage and calculation.
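As an illustration only, Steps 01 to 03 could be sketched as a small positional index. Everything named here is an assumption made for the sketch, not part of the described method: the function `build_index`, the choice of a Unicode noncharacter as the `END` marker, character-level segmentation, and the in-memory dictionary layout.

```python
from collections import defaultdict

# Hypothetical end-of-field marker. The text only says END is a character that
# is illegal in the encoding set; a Unicode noncharacter stands in for it here.
END = "\uffff"
MAX_LENGTH = 255  # Step 03: lengths above 255 are stored as 255


def build_index(fields):
    """Build a positional index from {doc_id: field_text}.

    Returns (postings, lengths): postings maps a token to a list of
    (doc_id, position) pairs, lengths maps doc_id to the capped token count.
    """
    postings = defaultdict(list)
    lengths = {}
    for doc_id, text in fields.items():
        tokens = list(text)  # Step 01 (assumed: one token per character)
        for position, token in enumerate(tokens):
            postings[token].append((doc_id, position))
        # Step 02: the END marker is indexed one position past the last token.
        postings[END].append((doc_id, len(tokens)))
        # Step 03: record the field length, capped at 255.
        lengths[doc_id] = min(len(tokens), MAX_LENGTH)
    return dict(postings), lengths


if __name__ == "__main__":
    postings, lengths = build_index({1: "abc", 2: "ab"})
    print(postings["a"], postings[END], lengths)
```

Indexing END like any other token means that a query anchored to the end of the field can be answered with the same position comparison used between ordinary tokens.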
Embodiment 2
[0077] This embodiment describes the document retrieval process, as follows:
[0078] Step 11: Segment the search keyword in the search request by character to obtain N search word segments. If the request involves a positional relationship with the end of the field, append END as the (N+1)th search word segment;
[0079] Step 12: Analyze the search keyword and the wildcards it contains, then obtain and record the positional relationships between the search word segments, including:
[0080] The positional relationship between the first search word segment and the beginning of the document, the positional relationship between the second search word segment and the first, ..., and the positional relationship between the Nth search word segment and the end of the document;
[0081] Each positional relationship can be represented by a pair of minimum and maximum position values, denoted (min, max). The smallest possible value of min is 0, meaning the two positions coincide, and the largest possible value of max is MAX, ...
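One way to picture how such (min, max) pairs constrain a match is the brute-force check below. It is a sketch under stated assumptions rather than the described retrieval procedure: `matches` is a hypothetical helper, `MAX` is set to `sys.maxsize` purely as a stand-in for "unbounded" because this excerpt does not give its definition, and `END` is a hypothetical marker appended to the field tokens.

```python
import sys

MAX = sys.maxsize  # placeholder for "unbounded"; the real MAX is not given here
END = "\uffff"     # hypothetical end-of-field marker


def matches(field_tokens, query_tokens, relations):
    """Return True if query_tokens occur in field_tokens such that every
    position difference lies inside its (min, max) relation.

    relations[0] is measured from the start of the field (position 0);
    relations[i] is measured from the position of the previous query token.
    """
    last = len(field_tokens) - 1

    def search(i, prev_pos):
        if i == len(query_tokens):
            return True
        lo, hi = relations[i]
        # Try every candidate position allowed by the (min, max) window.
        for pos in range(prev_pos + lo, min(prev_pos + hi, last) + 1):
            if field_tokens[pos] == query_tokens[i] and search(i + 1, pos):
                return True
        return False

    return search(0, 0)


if __name__ == "__main__":
    field = list("abcd") + [END]
    print(matches(field, ["a", "c"], [(0, 0), (1, 2)]))      # 'c' within 2 of 'a'
    print(matches(field, ["d", END], [(0, MAX), (1, 1)]))    # field ends with 'd'
```

A production index would intersect posting lists instead of scanning the field, but the window test on position differences is the same.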
Embodiment 3
[0086] This embodiment illustrates a specific implementation: searching the entry fields of "Ci Hai" in an enterprise search application.
[0087] Searching the entry fields of "Ci Hai" requires finding the documents that contain the search word at a specific position, and sorting them with the above rules according to the hit position and the length of the hit document.
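The ordering by hit position and document length can be pictured as a two-part sort key. The ranking rules themselves are not reproduced in this excerpt, so the direction used below (earlier hits first, then shorter fields first) is purely an assumption, as are the function name `rank_hits` and the tuple layout.

```python
def rank_hits(hits):
    """Sort hits given as (doc_id, hit_position, field_length) tuples.

    Assumed ordering: smaller hit position first; ties broken by the
    shorter field length. The actual rules may differ.
    """
    return sorted(hits, key=lambda hit: (hit[1], hit[2]))


if __name__ == "__main__":
    hits = [("doc_a", 2, 5), ("doc_b", 0, 9), ("doc_c", 0, 3)]
    for doc_id, position, length in rank_hits(hits):
        print(doc_id, position, length)
```

Because the field length is stored capped at 255, it fits in a single byte and is cheap to use as a tie-breaker at query time.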
[0088] The wildcards "?" and "*" are supported in the search request, where "?" represents 0 or 1 character and "*" represents 0 or more characters.
[0089] The following is a detailed explanation of the use of various types of wildcards:
[0090]
[0091] During retrieval, it is necessary to match not only the positional relationships between the search words, but also the positional relationships between the search words and the beginning and end of the document.
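As a rough sketch of how a wildcard query might be turned into such positional relationships, the parser below emits one (min, max) pair per search character plus one for the end marker. The names `parse_query`, `END` and `MAX` are hypothetical, `MAX` is only a stand-in for "unbounded", and the convention that adjacent characters have a position difference of exactly 1 is an assumption of this sketch.

```python
import sys

MAX = sys.maxsize  # stand-in for "unbounded"; not the patent's definition
END = "\uffff"     # hypothetical end-of-field marker


def parse_query(query):
    """Convert a wildcard query into (tokens, relations).

    relations[i] is the allowed (min, max) position difference between
    tokens[i] and the previous token, or the field start for i == 0.
    END is appended so the last relation anchors the match to the field end.
    """
    tokens, relations = [], []
    lo, hi = 0, 0  # the first token sits at offset 0 unless a wildcard precedes it
    for ch in query:
        if ch == "?":        # "?" stands for 0 or 1 character
            hi = min(hi + 1, MAX)
        elif ch == "*":      # "*" stands for 0 or more characters
            hi = MAX
        else:
            tokens.append(ch)
            relations.append((lo, hi))
            lo, hi = 1, 1    # without wildcards the next token is adjacent
    tokens.append(END)
    relations.append((lo, hi))
    return tokens, relations


if __name__ == "__main__":
    print(parse_query("a?c"))
    print(parse_query("*b"))
```

Under these assumptions, "a?c" requires "a" at the very start, "c" one or two positions later, and the field to end immediately after "c", while "*b" only requires the field to end with "b".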
[0092] Before retrieval, an index building process is required, as...