Full-text search method supporting search request containing missing symbols

A full-text retrieval technique from the field of information technology that answers query requests containing missing symbols, which traditional rotation-index methods cannot handle.

Inactive Publication Date: 2014-08-06
PEKING UNIV

AI Technical Summary

Problems solved by technology

However, this method is seriously inefficient when it has to intersect large matching sets: intersecting the two very long suffix tables of frequently occurring keywords can cost more than an online system can tolerate.
However, traditional rotation-index methods do not account for missing symbols in the query request: if the request contains a missing symbol "_", the query cannot be answered.
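
To make the problem concrete, the following minimal sketch shows what a query with missing symbols is expected to return. It is a brute-force scan over the symbol sequence, not the indexed method of the invention. It assumes that symbols are whitespace-separated words and that each "_" stands for exactly one arbitrary symbol; the names `naive_match` and `MISSING` are hypothetical.

```python
MISSING = "_"  # assumption: one "_" stands for exactly one arbitrary symbol


def naive_match(symbols, query):
    """Return every start position where `query` matches `symbols`.

    A query symbol equal to MISSING matches any single text symbol;
    every other query symbol must match exactly.
    """
    m = len(query)
    hits = []
    for start in range(len(symbols) - m + 1):
        window = symbols[start:start + m]
        if all(q == MISSING or q == s for q, s in zip(query, window)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    text = "to be or not to be that is the question".split()
    print(naive_match(text, ["to", "_", "or"]))       # one gap of length 1
    print(naive_match(text, ["be", "_", "_", "to"]))  # one gap of length 2
```

The scan costs time proportional to the text length times the query length for every request, which is the kind of cost an index is meant to avoid.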

Detailed Description of the Embodiments

[0029] Hereinafter, the present invention will be described in detail through specific embodiments and accompanying drawings.

[0030] Figure 1 is a flowchart of the steps of the full-text search method supporting query requests containing missing symbols in this embodiment. For a given collection of text content, this embodiment builds a symbol rotation index. The index structure consists of three parts: a wavelet tree built on the results of the several BWT transforms; table C, a mapping from each symbol to the total number of symbols that precede its first occurrence in column F generated by the BWT; and the FT array, which records the mapping from the index of each element in column F to the index of the corresponding symbol in the original text T. For a query request Q issued by the user against the text content, this embodiment can return all the matching positions of Q in the original text content within the time compl...
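
The three-part index described above can be sketched for the plain BWT only. This is a minimal illustration under stated assumptions, not the patented structure: it indexes characters rather than word symbols, it ignores the skipping and ahead variants of the BWT, it appends a hypothetical sentinel "$" that is assumed to sort before every text symbol, and it replaces the wavelet tree with a linear `str.count` scan. The names `build_index` and `locate` are hypothetical.

```python
def build_index(text, sentinel="$"):
    """Build column L (the BWT), table C, and the FT array for `text`."""
    t = text + sentinel  # assumption: sentinel sorts before all text symbols
    n = len(t)
    # Sort all rotations of t. ft[row] is the position in t of the symbol
    # that sits in row `row` of column F (the first column).
    ft = sorted(range(n), key=lambda i: t[i:] + t[:i])
    # Column L holds the symbol that precedes each rotation's first symbol.
    last = "".join(t[i - 1] for i in ft)
    # C[sym] = number of symbols in column F before the first `sym`.
    c = {}
    for row, sym in enumerate(sorted(t)):
        c.setdefault(sym, row)
    return last, c, ft


def locate(pattern, last, c, ft):
    """Backward search: return sorted positions of `pattern` in the text."""
    lo, hi = 0, len(last)  # half-open range of matching rows
    for sym in reversed(pattern):
        if sym not in c:
            return []
        # str.count stands in for the wavelet tree's rank query here.
        lo = c[sym] + last.count(sym, 0, lo)
        hi = c[sym] + last.count(sym, 0, hi)
        if lo >= hi:
            return []
    return sorted(ft[row] for row in range(lo, hi))


if __name__ == "__main__":
    last, c, ft = build_index("abracadabra")
    print(last)                         # column L
    print(locate("abra", last, c, ft))  # match positions in the text
```

Each step of the loop narrows the row range using only table C and a rank query on column L, and the FT array converts the surviving rows back into text positions.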

Abstract

The invention provides a full-text search method that supports search requests containing missing symbols. The method comprises the following steps: symbolizing the content of each text into a symbol string and splicing the symbol strings into one long symbol string; applying four transforms to the long symbol string, namely the BWT (Burrows-Wheeler transform), the (l,m)-skipping-BWT, the w-ahead-BWT, and the SET-skipping-and-w-ahead-BWT; constructing a wavelet tree on the four transform results, building the C table and the FT array, and thereby establishing a symbol rotation index; and matching a given search request against the full text and outputting the search results, where the search request may contain one or more runs of consecutive missing symbols. The method supports different symbolization methods as well as search requests containing missing symbols, and so meets different kinds of search requirements.
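
A wavelet tree, as named in the abstract, answers rank queries (how many times a symbol occurs in a prefix of a sequence) without scanning the sequence. The sketch below is a simple pointer-based version over a single sequence, offered only as an illustration: how the invention combines the four transform results in its tree is not shown here, the bit vectors are stored as plain prefix-sum lists rather than compressed bitmaps, and the class name `WaveletTree` is hypothetical.

```python
class WaveletTree:
    """Pointer-based wavelet tree over a sequence of orderable symbols."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.left = self.right = None
        if len(self.alphabet) <= 1:
            return  # leaf: every symbol routed here is the same
        mid = len(self.alphabet) // 2
        # Symbols in the upper half of the alphabet get bit 1 (go right).
        self.right_syms = set(self.alphabet[mid:])
        # prefix_ones[i] = number of 1 bits among the first i symbols.
        self.prefix_ones = [0]
        for s in seq:
            self.prefix_ones.append(self.prefix_ones[-1] + (s in self.right_syms))
        self.left = WaveletTree(
            [s for s in seq if s not in self.right_syms], self.alphabet[:mid])
        self.right = WaveletTree(
            [s for s in seq if s in self.right_syms], self.alphabet[mid:])

    def rank(self, sym, i):
        """Return the number of occurrences of `sym` in seq[:i]."""
        node = self
        while node.left is not None:  # descend until a leaf is reached
            ones = node.prefix_ones[i]
            if sym in node.right_syms:
                i, node = ones, node.right
            else:
                i, node = i - ones, node.left
        # At a leaf, i counts the symbols routed here; they all equal the
        # leaf's single symbol, so i is the rank if that symbol is `sym`.
        return i if node.alphabet == [sym] else 0


if __name__ == "__main__":
    wt = WaveletTree("ard$rcaaaabb")
    print(wt.rank("a", 6), wt.rank("r", 6), wt.rank("b", 12))
```

Each rank query visits one node per tree level, so its cost grows with the logarithm of the alphabet size rather than with the sequence length.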

Description

Technical field

[0001] The invention belongs to the field of information technology and relates to a retrieval method, in particular to a full-text retrieval method based on a rotation index that can respond quickly to query requests containing missing symbols.

Background

[0002] With the development of electronic information, the electronic data managed by computers is growing at an unprecedented speed. From the perspective of data structure, this data can be divided into structured data and unstructured data. Structured data is data with a predefined data model, such as flight schedules and employee information tables; unstructured data is data without a predefined data model, and usually consists mainly of large amounts of text. With the development of the World Wide Web, unstructured data is growing explosively.

[0003] The traditional relational database management system can manage structured data well. But for unstr...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 孟必平, 王腾蛟, 李红燕, 高军, 杨冬青, 唐世渭
Owner: PEKING UNIV