Looking for breakthrough ideas for innovation challenges? Try Patsnap Eureka!

Estimation of postings list length in a search system using an approximation table

Inactive Publication Date: 2011-02-17
GLOBALSPEC
View PDF51 Cites 8 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Benefits of technology

[0019]The present invention provides, in a first aspect, a method of minimizing accesses to secondary storage when searching an inverted index for a search term. The method comprises automatically obtaining a predetermined size of a posting list for the search term, the predetermined size based on document frequency for the search term, wherein the posting list is stored in secondary storage, and reading at least a portion of the posting list into memory based on the predetermined size.
[0020]The present invention provides, in a second aspect, a computer system for minimizing accesses to secondary storage when searching an inverted index for a search term. The computer system comprises a memory, and a processor in communication with the memory to perform a method. The method comprises automatically obtaining a predetermined size of a posting list for the search term based on document frequency for the search term, wherein the posting list is stored in secondary storage, and reading at least a portion of the posting list into memory based on the predetermined size.
[0021]The present invention provides, in a third aspect, a program product for minimizing accesses to secondary storage when searching an inverted index for a search term. The program product comprises a storage medium readable by a processor and storing instructions for execution by the processor for performing a method. The method comprises automatically obtaining a predetermined size of a posting list for the search term based on document frequency for the search term, the posting list being stored in secondary storage, and reading at least a portion of the posting list into memory based on the size approximated.

Problems solved by technology

A large inverted index may not fit into a computer's main memory, requiring secondary storage, typically disk storage, to help store the posting file, lexicon, or both.
Each separate access to disk may incur seek time on the order of several milliseconds if it is necessary to move the hard drive's read heads, which is very expensive in terms of runtime performance compared to accessing main memory.

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • Estimation of postings list length in a search system using an approximation table
  • Estimation of postings list length in a search system using an approximation table
  • Estimation of postings list length in a search system using an approximation table

Examples

Experimental program
Comparison scheme
Effect test

case 1

imateReadSize<=bufsize

[0056]Build a two-stage predetermined buffer fill size strategy as indicated below in Table II.

TABLE IIStageFill SizeNumber of Times to Use1approximateReadSize128 kilobytesRepeat as necessary

[0057]The above two-stage strategy, when installed in an Enhanced Buffered Reader and used to read the posting list, will with high probability result in a single disk seek and read of exactly approximateReadSize bytes. As many supplemental 8 kilobyte reads as necessary may then be issued to handle the relatively rare case when the approximateReadSize is insufficient.

case 2

imateReadSize>bufsize

[0058]For this discussion, let “ / ” represent the operation of integer division, and “%” represent the operation of integer modulo.

[0059]In this case, we build a predetermined buffer fill size strategy that generally has three stages, as indicated in the following table. However, the second stage is not necessary when the bufsize divides the approximateReadSize evenly).

TABLE IIIStageFill SizeNumber of Times to Use1bufsizeapproximateReadSize / bufsize2approximateReadSize % bufsize138 kilobytesRepeat as necessary

[0060]The above strategy, when installed in an Enhanced Buffered Reader and used to read the posting list, will utilize the available input buffer of size bufsize bytes to read approximateReadSize bytes of data using a minimal number of disk seeks and minimal data transfer. The approximateReadSize is sufficient to read the entire posting list with high probability; however, as many supplemental 8 kilobyte reads as necessary will be issued to handle the relati...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

PUM

No PUM Login to View More

Abstract

The present invention provides a method of minimizing accesses to secondary storage when searching an inverted index for a search term. The method comprises automatically obtaining a predetermined size of a posting list for the search term, the predetermined size based on document frequency for the search term, wherein the posting list is stored in secondary storage, and reading at least a portion of the posting list into memory based on the predetermined size. Corresponding computer system and program products are also provided.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS[0001]This application claims priority under 35 U.S.C. §119 to the following U.S. Provisional Applications, which are herein incorporated by reference in their entirety:[0002]Provisional Patent Application Ser. No. 61 / 233,411, by Flatland et al., entitled “ESTIMATION OF POSTINGS LIST LENGTH IN A SEARCH SYSTEM USING AN APPROXIMATION TABLE,” filed on Aug. 12, 2009; and[0003]Provisional Patent Application No. 61 / 233,420, by Flatland et al., entitled “EFFICIENT BUFFERED READING WITH A PLUG IN FOR INPUT BUFFER SIZE DETERMINATION,” filed on Aug. 12, 2009;[0004]Provisional Patent Application Ser. No. 61 / 233,427, by Flatland et al., entitled “SEGMENTING POSTINGS LIST READER,” filed on Aug. 12, 2009.[0005]This application contains subject matter which is related to the subject matter of the following applications, each of which is assigned to the same assignee as this application and filed on the same day as this application. Each of the below listed ap...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

Application Information

Patent Timeline
no application Login to View More
IPC IPC(8): G06F17/30
CPCG06F17/30622G06F16/319
Inventor FLATLAND, STEINARDALTON, JEFF J.
Owner GLOBALSPEC
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products