A custom scoring method suitable for the Lucene full-text search engine

A custom scoring technique for the Lucene engine, applied in the field of information retrieval. It addresses scoring values that exceed expectations, high adjustment costs, and unintuitive similarity scores, so as to improve the ability to intervene in scoring, reduce maintenance cost, and improve development efficiency.

Active Publication Date: 2022-07-22
FOCUS TECH

AI Technical Summary

Problems solved by technology

[0005] 1) Scoring by arithmetically adding up clause-hit scores runs into cross-class problems, where the resulting score exceeds the expected range.
[0006] 2) Each scoring clause can only be given a fixed score, so a change in the weight of one scoring clause incurs a large adjustment cost.
[0007] 3) The returned similarity score carries no readable meaning, so the final similarity score is not intuitive. Although Lucene provides the Explanation facility to output score explanations, it carries a certain maintenance cost in some custom scenarios.
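
One possible reading of the cross-class problem in [0005] can be illustrated with plain arithmetic. The sketch below is not taken from the patent: the clause names and the fixed scores (100 and 30) are assumptions chosen only to show how several low-priority hits can add up to more than one high-priority hit.

```java
/** Illustrative only: additive clause scoring with hypothetical fixed scores. */
final class AdditiveScoreDemo {
    // Hypothetical fixed scores; the source does not give concrete values.
    static final int TITLE_HIT = 100; // intended high-priority class
    static final int BODY_HIT = 30;   // intended low-priority class

    /** Sums the fixed score of every clause that hit. */
    static int additiveScore(int titleHits, int bodyHits) {
        return titleHits * TITLE_HIT + bodyHits * BODY_HIT;
    }

    public static void main(String[] args) {
        int titleOnly = additiveScore(1, 0); // one high-priority hit
        int bodyOnly = additiveScore(0, 4);  // four low-priority hits
        // The low-priority document outranks the high-priority one.
        System.out.println("title only = " + titleOnly + ", body only = " + bodyOnly);
    }
}
```

Under these assumed values, a document that hits only low-priority clauses scores 120 and overtakes a document with a single high-priority hit at 100. Raising the high-priority score to compensate means revisiting every other fixed score, which matches the adjustment cost described in [0006].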



Examples


Embodiment Construction

[0040] The present invention will be further explained below in conjunction with the accompanying drawings and exemplary embodiments:

[0041] Figure 1 shows a custom scoring method suitable for the Lucene full-text search engine. Under Lucene (specifically, the Apache Lucene 8.1.1 release), a document collector that calls a custom scoring plug-in is constructed to obtain and use the result score of the document similarity calculation, thereby realizing custom scoring. The method includes the following steps:

[0042] Step S1: build a custom document collector. By the design of the Lucene full-text search engine (8.1.1), the result score of the document similarity calculation is obtained and used by the document collector. To customize scoring, it is therefore necessary to build a document collector that can call a custom scoring plug-in. The present invention adopts the implementation of the custom hit document co...
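
The idea in Step S1, a collector that hands each engine score to a pluggable scorer, can be sketched in plain Java. This is a stand-in under stated assumptions, not the patent's implementation and not Lucene's API: the names `CollectorSketch`, `CustomCollector`, and `Hit` are hypothetical, and the plug-in is modeled with the standard `DoubleToLongFunction` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleToLongFunction;

/** Stand-in sketch of a collector that delegates scoring to a plug-in. */
final class CollectorSketch {
    /** One collected hit: document id plus the plug-in's custom score. */
    static final class Hit {
        final int docId;
        final long customScore;

        Hit(int docId, long customScore) {
            this.docId = docId;
            this.customScore = customScore;
        }
    }

    /** Hypothetical collector; a real one would extend the engine's collector type. */
    static final class CustomCollector {
        private final DoubleToLongFunction plugin;
        private final List<Hit> hits = new ArrayList<>();

        CustomCollector(DoubleToLongFunction plugin) {
            this.plugin = plugin;
        }

        /** Called once per matching document with the engine's similarity score. */
        void collect(int docId, double engineScore) {
            hits.add(new Hit(docId, plugin.applyAsLong(engineScore)));
        }

        /** Returns a copy of the hits ordered by custom score, highest first. */
        List<Hit> topHits() {
            List<Hit> sorted = new ArrayList<>(hits);
            sorted.sort((a, b) -> Long.compare(b.customScore, a.customScore));
            return sorted;
        }
    }
}
```

A caller might construct `new CollectorSketch.CustomCollector(s -> Math.round(s * 1000))`, feed it documents through `collect`, and read the ranking from `topHits()`. Keeping the scoring rule behind a single function parameter is what lets it change without touching the collector.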



Abstract

The invention discloses a custom scoring method suitable for the Lucene full-text search engine, comprising the steps of constructing a custom document collector, constructing a clause hit information collector, constructing a search clause assembly plug-in, and constructing a search and scoring plug-in based on Long-type numerical values. These steps map the actual similarity ranking requirements onto a limited number of Boolean clauses. According to which clauses are hit, and using the Long-based search and scoring model proposed by the invention, the document's final similarity score is mapped to a Long value in the form of high and low bits. The invention thus yields a search and scoring plug-in with strong business extensibility, and the document similarity score output by the plug-in is highly readable and interpretable, which makes development and debugging more convenient. It reduces the difficulty of customizing and extending search scoring on the Lucene full-text search engine and improves the efficiency of related work in the field.
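
The abstract's mapping of clause hits to high and low bits of a Long value can be sketched as follows. The layout is an assumption made for illustration, since the source does not spell it out here: one bit per Boolean clause, with the highest-priority clause in the highest bit. The clause names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: pack clause hits into a long, one bit per clause. */
final class LongBitScore {
    // Hypothetical clause names, ordered from highest to lowest priority.
    static final String[] CLAUSES = {"exactTitle", "phraseTitle", "termBody"};

    /** Sets bit (n - 1 - i) when clause i hit, so clause 0 owns the highest bit. */
    static long pack(boolean[] hits) {
        long score = 0L;
        int n = CLAUSES.length;
        for (int i = 0; i < n; i++) {
            if (hits[i]) {
                score |= 1L << (n - 1 - i);
            }
        }
        return score;
    }

    /** Decodes a packed score back into the names of the clauses that hit. */
    static List<String> explain(long score) {
        List<String> hit = new ArrayList<>();
        int n = CLAUSES.length;
        for (int i = 0; i < n; i++) {
            if ((score & (1L << (n - 1 - i))) != 0) {
                hit.add(CLAUSES[i]);
            }
        }
        return hit;
    }

    public static void main(String[] args) {
        long score = pack(new boolean[] {true, false, true});
        System.out.println(Long.toBinaryString(score) + " -> " + explain(score));
    }
}
```

With this assumed layout, a hit on the highest-priority clause alone (binary 100) always outranks any combination of lower-priority hits (at most binary 011), and the score can be decoded directly into the clauses that produced it.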

Description

Technical field

[0001] The invention relates to the field of information retrieval, in particular to a self-defined scoring method suitable for a Lucene full-text retrieval engine.

Background technique

[0002] In the field of full-text retrieval, the open source project Apache Lucene, as the mainstream search engine in the industry, provides search service support for thousands of projects. Lucene not only excels in terms of performance, but also has good support in terms of scalability. Whether you use the Lucene engine directly or use the search engine system built on the Lucene engine, you can do personalized search customization.

[0003] As the core part of full-text retrieval, the document similarity calculation process is often intervened and customized by different search services, and returns a customized score according to the actual scoring needs of the business. Although the default similarity algorithm provided by Lucene has a good performance in document sim...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/33, G06F40/194, G06F40/216
CPC: G06F16/3341
Inventor: 赵亮亮
Owner: FOCUS TECH