Database substring filtering index system and methods for constructing and querying it

An indexing system and database technology applied in the database field. It addresses problems such as slowed data loading, query results that are not guaranteed to be complete, and degraded index performance. Its effects are reduced disk access and data decompression overhead, fewer disk reads and writes, and improved full table scan performance.

Inactive Publication Date: 2012-12-19
天津神舟通用数据技术有限公司

AI Technical Summary

Problems solved by technology

[0004] 1. Accurate results cannot be obtained
An inverted index can only index candidate keywords produced by word segmentation; it is not a true index of every substring in the full text. A query therefore cannot be guaranteed to return all data containing the query substring, especially for irregular or coined words. Data that contains the query substring may never have been indexed because segmentation failed on it, so an inverted index is unusable for exact queries in a database.
[0005] 2. The space cost is high
The index must preprocess every segmented word that might be queried, so the index is very large, sometimes larger than the data itself, and it is hard to compress. This puts heavy pressure on the data import process. Massive historical data usually compresses well, so compression can greatly reduce its volume; but once this type of index is used, the huge and nearly incompressible index almost cancels out the benefit of data compression.
[0006] 3. High maintenance cost and slow indexing speed
There are two ways to create the index. The first is to build it in advance, at the cost of maintaining the index synchronously while data is imported; random index insertions can be very time-consuming, and a separate compression pass is still required after the index is built. The second is to build it afterwards, which means that once data loading completes, a large amount of data must be re-segmented and indexed. If each index covers less data, the number of indexes increases, which reduces the performance of the index itself.
Taken together, whichever mode is used, maintaining a full-text index seriously slows data loading.
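
The first problem above can be illustrated with a minimal sketch. It assumes a toy inverted index in which whitespace splitting stands in for a real word segmenter; the function name `build_inverted_index` and the sample rows are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each whitespace-delimited token to the ids of the rows containing it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        # Assumption: whitespace splitting plays the role of word segmentation.
        for token in text.lower().split():
            index[token].add(row_id)
    return index


rows = ["error code xabcdy reported", "abcd seen in log", "nothing relevant"]
index = build_inverted_index(rows)

# Lookup through the index only finds rows where "abcd" is a whole token.
indexed_hits = sorted(index.get("abcd", set()))

# A true substring scan also finds "abcd" embedded inside "xabcdy".
exact_hits = [i for i, text in enumerate(rows) if "abcd" in text]

print(indexed_hits, exact_hits)
```

Row 0 contains the substring but is missed by the index lookup, because the segmenter never produced "abcd" as a token for that row.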

Examples

Embodiment Construction

[0042] Embodiments of the present invention are described in further detail below in conjunction with the accompanying drawings:

[0043] A database substring filtering index system, as shown in Figure 1, includes a sharded data storage module and a filter index storage module. The sharded data storage module stores data imported into the database as sharded data. Each shard is equivalent to a row in an ordinary database, except that each shard is larger and is stored contiguously, which gives high access performance. Shards must be quickly locatable in the actual storage by shard number; that is, the shard storage provides random access externally. The filter index storage module stores the substring feature filter bitmaps. The substring feature filter bitmap is to extract the general knowledge describing the su...
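
The two modules described above can be sketched as follows. This is only an illustration under stated assumptions: the source text is truncated before it explains how substring features are extracted, so the sketch assumes character trigrams hashed with CRC32 into a single fixed-size bitmap per shard. `BITMAP_BITS`, `NGRAM`, and all function names are hypothetical.

```python
import zlib

BITMAP_BITS = 1024  # assumed bitmap size per shard
NGRAM = 3           # assumed substring feature: character trigrams


def ngrams(text, n=NGRAM):
    """Return the set of all length-n substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def bit_for(gram):
    """Hash one n-gram to a bit position in the bitmap."""
    return zlib.crc32(gram.encode("utf-8")) % BITMAP_BITS


def build_bitmap(shard_rows):
    """Build one filter bitmap (stored as a Python int) for a shard."""
    bitmap = 0
    for row in shard_rows:
        for gram in ngrams(row):
            bitmap |= 1 << bit_for(gram)
    return bitmap


def may_contain(bitmap, pattern):
    """False means the shard cannot contain pattern; True means it might."""
    grams = ngrams(pattern)
    if not grams:
        # Pattern shorter than NGRAM: the filter cannot rule anything out.
        return True
    return all((bitmap >> bit_for(g)) & 1 for g in grams)


def query(shards, bitmaps, pattern):
    """Scan only the shards whose bitmap does not exclude the pattern."""
    hits = []
    for shard_no, bitmap in enumerate(bitmaps):
        if not may_contain(bitmap, pattern):
            continue  # skip reading and decompressing this shard
        hits.extend(row for row in shards[shard_no] if pattern in row)
    return hits


shards = [["hello world", "foo bar"], ["xxabcdyy", "zzz"]]
bitmaps = [build_bitmap(s) for s in shards]
print(query(shards, bitmaps, "abcd"))
```

In this sketch the bitmap can produce false positives (a shard is scanned without containing a match) but never false negatives, because every trigram of a matching pattern is also a trigram of the row that contains it. The final scan of the surviving shards keeps the result exact.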

Abstract

The invention relates to a database substring filtering index system and to methods for constructing and querying it. The system comprises a sharded data storage module and a filter index storage module. The sharded data storage module stores input data in shards and extracts, from each shard, features describing the substrings of that shard to construct one or more substring feature filter bitmaps. The filter index storage module stores the substring feature filter bitmaps as substring feature indexes. The invention further provides methods for constructing and querying the substring filtering index system. The system is reasonably designed, solves the problems that traditional indexes occupy a large amount of space and that compressed data is difficult to index, and supports efficient substring and keyword queries over massive text data, thereby greatly reducing the number of disk reads and writes and improving the database's full table scan performance.

Description

Technical field

[0001] The invention belongs to the technical field of databases, and in particular relates to a database substring filtering index system and methods for constructing and querying it.

Background

[0002] Many industries generate large amounts of data continuously, which creates a need to store massive historical data. Substring queries are often needed on text data within it. A substring query returns the rows in which a given column contains a given substring, i.e. a database LIKE query. It usually takes the form select * from t where name like '%abcd%', which is a substring query for abcd on the name column. Such requirements call for an index that makes this type of query fast. At the same time, due to the continuous generation of new data that needs to be imported into the storage system, the database sy...
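
The LIKE query quoted in the background can be tried with a short sketch. It uses Python's standard `sqlite3` module and an in-memory database; the table `t` and column `name` follow the query in the text, while the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany(
    "INSERT INTO t (name) VALUES (?)",
    [("xxabcdyy",), ("abcd",), ("abxcd",)],  # hypothetical sample rows
)

# The substring query from the text: every row whose name contains "abcd".
rows = [
    r[0]
    for r in conn.execute("SELECT * FROM t WHERE name LIKE '%abcd%' ORDER BY name")
]
conn.close()
print(rows)
```

Because the pattern starts with a wildcard, an ordinary B-tree index on `name` generally cannot narrow the search, so the engine falls back to scanning the table. That is the cost the filtering index is meant to reduce.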

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 蔡华林冯柯徐昶何清法周丽霞蒋志勇毛云青赵殿奎李海峰
Owner: 天津神舟通用数据技术有限公司