Full-text retrieval matching method and system based on a Lucene custom lexicon

A technique for building a custom lexicon and matching against it, applied in the field of big data search, which can solve problems such as phrases that cannot be retrieved.

Active Publication Date: 2016-10-12
泉州奇兔网络科技有限公司

AI Technical Summary

Problems solved by technology

If word segmentation is not performed for a specific phrase, the phrase cannot be retrieved


Detailed Description of the Embodiments

[0033] The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

[0034] As shown in Figure 1, an embodiment of the present invention provides a full-text retrieval matching method based on a Lucene custom lexicon, comprising the following steps:

[0035] S1. Establish a Lucene custom lexicon that supports Lucene full-text retrieval: in a search environment based on the Lucene full-text search engine, obtain the search words entered by the user in real time and check whether the search returns results. If no results are found, remove the special characters from the search word that returned no results and store it in the Lucene custom lexicon. If results are found, perform word segmentation on the search word to obtain several phrases, then continue by searching for each of these phrases ...
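The branching in step S1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patent's implementation: `search` and `segment` are hypothetical callables standing in for the Lucene query and the word segmenter, the definition of "special characters" (anything that is not a word character or whitespace) is a guess, and the `log` record layout is invented for the example. The sketch stops where the source text is truncated.

```python
import re
import time
from typing import Callable, Dict, List, Set

# Hypothetical stand-ins: the source does not say how the search engine or
# the segmenter is invoked, so both are passed in as plain callables.
SearchFn = Callable[[str], List[str]]
SegmentFn = Callable[[str], List[str]]

# Assumption: "special characters" means anything that is neither a word
# character nor whitespace. Note that "_" counts as a word character here.
_SPECIAL = re.compile(r"[^\w\s]")


def strip_special(text: str) -> str:
    """Remove special characters and surrounding whitespace."""
    return _SPECIAL.sub("", text).strip()


def store(term: str, lexicon: Set[str]) -> None:
    """Clean a term and add it to the custom lexicon if anything is left."""
    cleaned = strip_special(term)
    if cleaned:
        lexicon.add(cleaned)


def process_query(query: str, search: SearchFn, segment: SegmentFn,
                  lexicon: Set[str], log: List[Dict[str, object]]) -> None:
    """Apply the S1 decision flow to one user query."""
    if not search(query):
        # No result for the whole query: store the cleaned query itself.
        store(query, lexicon)
        return
    # The query has results: segment it and search each phrase in turn.
    for phrase in segment(query):
        hits = search(phrase)
        if hits:
            # Record search time, segmented phrase, and feedback.
            log.append({"time": time.time(), "phrase": phrase, "hits": hits})
        else:
            # A phrase with no result goes into the custom lexicon.
            store(phrase, lexicon)
```

A caller would keep one `lexicon` set and one `log` list across many queries, so the lexicon grows as users enter search words that the index cannot match.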


Abstract

The invention discloses a full-text retrieval matching method and system based on a Lucene custom lexicon, and relates to the field of big data search. The method comprises the following steps: in a Lucene search environment, obtain the search word entered by the user in real time and detect whether the search returns a result. If no result is returned, remove the special characters from the search word and store it in the Lucene custom lexicon. If a result is returned, segment the search word, search for each of the resulting phrases, and detect whether each returns a result. For a phrase that returns no result, remove its special characters and store it in the Lucene custom lexicon; for a phrase that returns a result, record the search time, the segmented search word, and the search feedback information. In this way, a Lucene custom lexicon that supports Lucene full-text retrieval is established. Based on the search words entered by users, the method can quickly and effectively build a customized Lucene lexicon.

Description

Technical field

[0001] The invention relates to the field of big data search, and in particular to a full-text retrieval matching method and system based on a Lucene custom lexicon.

Background

[0002] Apache Lucene is an open-source full-text search engine toolkit. It is not a complete full-text search engine but a full-text search engine architecture, providing a complete query engine and index engine, along with part of a text analysis engine.

[0003] To aid the reader's understanding, the relevant terms are briefly explained below:

[0004] Apache Lucene: an open-source full-text retrieval project under Apache;

[0005] Full-text search: unlike traditional fuzzy matching, the search word is first segmented according to certain rules, the resulting segments are then matched against the source data, and the search results are obtained by scoring according to the number of occurrences of the segments, segment adja...
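The definition of full-text search in [0005] can be made concrete with a toy ranking function. This is a minimal sketch, not Lucene's scoring: whitespace splitting stands in for a real segmenter, the score counts term occurrences only, and the further scoring factor that the truncated text begins to name is deliberately left out. The names `tokenize` and `rank` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Stand-in segmenter: lowercase and split on whitespace (an assumption)."""
    return text.lower().split()


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, int]]:
    """Segment the query, match segments against each document, and
    score by the total number of occurrences of the query segments."""
    terms = tokenize(query)
    scored = []
    for doc_id, body in docs.items():
        counts = Counter(tokenize(body))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties are broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored
```

Because matching happens per segment, a document can rank for a query even when the query string as a whole never appears in it, which is the contrast with substring-style fuzzy matching that the paragraph draws.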


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/3325, G06F16/3334, G06F16/334, G06F16/3326, G06F16/35, G06F16/00, G06F16/3335, G06F16/374
Inventor: 白凡
Owner: 泉州奇兔网络科技有限公司