
A real-time indexing method based on inverted index

An inverted-index and temporary-index-library technique, applied to real-time indexing based on an inverted index. It addresses problems such as limited applicability, degraded retrieval performance, and index libraries that cannot be merged in time, with the aim of improving retrieval efficiency and user experience.

Active Publication Date: 2017-10-31
BEIJING ZHONGSOU NETWORK TECH

AI Technical Summary

Problems solved by technology

[0008] In the main index library + auxiliary index library approach, the auxiliary index library must be held entirely in memory to improve update performance and shorten update time. As the amount of data arriving in a single period grows, the auxiliary index library consumes a large amount of memory, so the amount of data updated at one time must be limited, which greatly restricts where this technique can be applied;
[0009] In the multiple disk index libraries + independent memory index library scheme, the memory index library is flushed to disk once it reaches a certain capacity, which avoids the capacity limit of the memory index. However, because multiple disk index libraries exist and cannot be merged in time, a retrieval must query several independent index libraries (multiple disk libraries plus the memory library), which reduces retrieval performance and degrades the user experience.
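
The query fan-out described in [0009] can be illustrated with a minimal sketch. The toy libraries, the `build_index` and `search_all` helpers, and the whitespace tokenization are all assumptions made for illustration; the source does not specify how the libraries are stored or queried.

```python
from collections import defaultdict

def build_index(docs):
    """Build a toy inverted index: term -> sorted list of doc ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id].split())):
            index[term].append(doc_id)
    return dict(index)

def search_all(term, libraries):
    """Query every independent index library and merge the postings.

    The number of lookups grows with the number of unmerged libraries,
    which is the retrieval cost that [0009] points to.
    """
    hits = set()
    lookups = 0
    for library in libraries:
        lookups += 1
        hits.update(library.get(term, []))
    return sorted(hits), lookups

# Two unmerged disk libraries plus one memory library (hypothetical data).
disk_a = build_index({1: "real time index", 2: "inverted index"})
disk_b = build_index({3: "index merge"})
memory = build_index({4: "real time update"})

results, lookups = search_all("index", [disk_a, disk_b, memory])
```

Here a single term query touches all three libraries even though only two of them contain the term, so every additional unmerged library adds work to every query.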



Examples


Detailed Description of the Embodiments

[0056] The present invention will be described in further detail below in conjunction with the accompanying drawings.

[0057] The present invention provides a real-time indexing method based on an inverted index. The method includes the following steps:

[0058] Step 1: Preprocess the data;

[0059] Step 2: Update the memory index library;

[0060] Step 3: Update the disk index library.
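
The three steps above can be sketched roughly as follows. This is not the patented implementation: the source text available here does not describe how the memory and disk libraries are updated, so the `RealTimeIndexer` class, the document-count flush threshold, and the JSON file used as a "disk index library" are all hypothetical choices made to keep the example runnable.

```python
import json
import os
import tempfile
from collections import defaultdict

class RealTimeIndexer:
    """Toy pipeline: preprocess -> update memory index -> update disk index."""

    def __init__(self, disk_path, flush_threshold=2):
        self.disk_path = disk_path              # assumed: one JSON file on disk
        self.flush_threshold = flush_threshold  # assumed: flush after N documents
        self.memory_index = defaultdict(set)
        self.docs_in_memory = 0

    def preprocess(self, text):
        # Step 1 (simplified): lowercase, split on whitespace, sort unique terms.
        return sorted(set(text.lower().split()))

    def update_memory(self, doc_id, terms):
        # Step 2: add the document to the in-memory inverted index.
        for term in terms:
            self.memory_index[term].add(doc_id)
        self.docs_in_memory += 1

    def update_disk(self):
        # Step 3: merge the memory index into the disk index, then clear memory.
        disk = {}
        if os.path.exists(self.disk_path):
            with open(self.disk_path, encoding="utf-8") as f:
                disk = json.load(f)
        for term, doc_ids in self.memory_index.items():
            disk[term] = sorted(set(disk.get(term, [])) | doc_ids)
        with open(self.disk_path, "w", encoding="utf-8") as f:
            json.dump(disk, f)
        self.memory_index.clear()
        self.docs_in_memory = 0

    def add(self, doc_id, text):
        self.update_memory(doc_id, self.preprocess(text))
        if self.docs_in_memory >= self.flush_threshold:
            self.update_disk()

with tempfile.TemporaryDirectory() as tmp:
    indexer = RealTimeIndexer(os.path.join(tmp, "disk_index.json"))
    indexer.add(1, "real time index")
    indexer.add(2, "inverted index")  # second document triggers the flush
    with open(indexer.disk_path, encoding="utf-8") as f:
        flushed = json.load(f)
```

Merging into a single disk file on every flush keeps the number of libraries a query must visit at two (memory plus disk) in this sketch; a real system would have to weigh that against the cost of rewriting the disk index.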

[0061] Step 1 comprises the following sub-steps:

[0062] Step 1-1: Parse the updated data or new data;

[0063] Step 1-2: Perform word segmentation on the parsed data;

[0064] Step 1-3: Pre-sort the word-segmented data.
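
As a rough illustration of steps 1-2 and 1-3, the sketch below segments a document into postings and pre-sorts them. The whitespace tokenizer and the (term, document id, position) sort key are assumptions; the source does not state which segmenter or sort order is used, and Chinese text would need a real word segmenter.

```python
def segment(doc_id, text):
    """Split text on whitespace and record (term, doc_id, position) postings."""
    return [(term, doc_id, pos) for pos, term in enumerate(text.split())]

def presort(postings):
    """Sort postings by term, then document id, then position.

    After sorting, all postings for the same term are contiguous, so they
    can be appended to an inverted list in a single pass.
    """
    return sorted(postings)

postings = segment(7, "index update index")
ordered = presort(postings)
```

With this input, the two occurrences of "index" end up adjacent in `ordered`, ahead of "update".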

[0065] In step 1-1, the data to be indexed is first selected according to the index configuration file. The selected data is then denoised by removing useless symbols, which completes the parsing of the updated or new data.
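
Step 1-1 might look something like the sketch below. The `INDEX_CONFIG` dictionary stands in for the index configuration file, and treating every character that is neither a word character nor whitespace as a "useless symbol" is an assumption; the source defines neither the configuration format nor the symbol set.

```python
import re

# Hypothetical stand-in for the index configuration file:
# only these fields of a record are indexed.
INDEX_CONFIG = {"fields": ["title", "body"]}

# Assumed definition of "useless symbols": runs of characters that are
# neither word characters nor whitespace.
NOISE = re.compile(r"[^\w\s]+")

def parse_record(record, config=INDEX_CONFIG):
    """Select the configured fields, then strip noise symbols from each."""
    selected = {name: record[name] for name in config["fields"] if name in record}
    cleaned = {}
    for name, value in selected.items():
        text = NOISE.sub(" ", value)
        cleaned[name] = " ".join(text.split())  # collapse repeated whitespace
    return cleaned

parsed = parse_record({
    "title": "Real-time ## index!!",
    "body": "<<inverted>> index",
    "crawl_ts": "2017",  # not listed in the config, so it is dropped
})
```

Because `\w` is Unicode-aware in Python 3, this pattern keeps Chinese characters and removes only punctuation-like symbols. Note that it also splits hyphenated words such as "Real-time", which may or may not be desirable.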

[0066] In step 1-2, word segmentation is first performed on the data to be processed as required, ...


Abstract

The invention provides a real-time indexing method based on an inverted index. The method comprises the following steps: preprocessing the data; updating the memory index library; and updating the disk index library. While affecting retrieval performance as little as possible, the method reduces the number of update index libraries, removes the limit that the memory library places on data size, is optimized specifically for multi-way processors, and better supports concurrent, multithreaded data updates. It provides a more flexible update mode and more efficient updates, improves data timeliness, and gives users a better search experience.

Description

Technical field

[0001] The invention relates to an indexing method, in particular to a real-time indexing method based on an inverted index.

Background

[0002]-[0003] In the field of text indexing, indexing new and updated data quickly and in real time without affecting the normal operation of the running system has long been a key research problem. Real-time update speed is especially important for information services, where it directly affects the user experience and the timeliness of pushed information. Retrieval systems currently use a variety of real-time update methods, but all of them optimize updates around the index structure to make updates as fast as possible. By index update method, they fall broadly into two categories:

[0004] (1) Main index library + auxiliary index library

[0005] The main index library contains most of the data, and the auxiliary i...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/2272, G06F16/2315
Inventor: 张宏利, 高勇, 秦飞, 樊云红, 郭永福
Owner: BEIJING ZHONGSOU NETWORK TECH