A search optimization method based on local chronicle research

A search optimization method for local chronicle research, applied in the field of information search. It addresses three problems: the inability to distinguish different sentences, the loss of search-sentence information, and the failure to consider interval words. Its stated aims are reliable search results, preserved semantic features, and improved accuracy.

Active Publication Date: 2021-01-29
HUAZHONG NORMAL UNIV
Cites: 6; Cited by: 0

AI Technical Summary

Problems solved by technology

…when the "Sentence Similarity Model and Most Similar Sentence Search Algorithm" is used for word tagging, only "I" can be marked as a non-repeated word in d, so important information in the search sentence is lost.
[0004] Second, it does not take into account the other spaced (interval) words in the sentence.
The algorithm in "Sentence Similarity Model and Most Similar Sentence Search Algorithm" completely fails to distinguish certain sentences from one another.

Examples

Example 1

[0038] Example 1: q="In the campus, I like to draw."

[0039] d="I like to study on campus and also like to exercise."

[0040] After word segmentation, we get:

[0041] q: in / campus / inside / I / like / painting

[0042] d: I / like / in / campus / inside / study / also / like / sports

[0043] Set(q,d) denotes the number of words that appear in both q and d. Here Set(q,d) = 5: "in", "campus", "inside", "I", and "like". P_q(q,d) denotes the vector of the position numbers, in q, of the words in Set(q,d). The words of q correspond to these numbers as follows:
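The following minimal Python sketch shows the bookkeeping behind Set(q,d). It assumes the segmented tokens of Example 1, writing the shared pronoun as "I" and the second locative word as "inside" to match the Set(q,d) list. The function name `shared_positions` is hypothetical. The sketch numbers each shared word by its 1-based position in q and records where that word occurs in d. The correspondence table in [0044] is not reproduced in this excerpt, so this numbering is an assumption. It gives five numbers for the five shared words, whereas P_q(q,d) has six components.

```python
def shared_positions(q, d):
    """Return (word, position in q, positions in d) for each word of q also found in d.

    Positions are 1-based. Assumes no word repeats within q.
    """
    positions_in_d = {}
    for j, word in enumerate(d, start=1):
        positions_in_d.setdefault(word, []).append(j)
    return [(word, i, positions_in_d[word])
            for i, word in enumerate(q, start=1) if word in positions_in_d]


q = ["in", "campus", "inside", "I", "like", "painting"]
d = ["I", "like", "in", "campus", "inside", "study", "also", "like", "sports"]

rows = shared_positions(q, d)
print(len(rows))  # 5 shared words, i.e. Set(q,d) = 5
for row in rows:
    print(row)
# ('in', 1, [3])
# ('campus', 2, [4])
# ('inside', 3, [5])
# ('I', 4, [1])
# ('like', 5, [2, 8])
```

The two d-positions listed for "like" are the ambiguity that the repeated-word rule in [0048] has to resolve.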

[0044]

[0045] This gives P_q(q,d) = (1,2,3,4,5,6). P_d(q,d) denotes the vector obtained by rearranging the components of P_q(q,d) in the order in which the corresponding words occur in d. Arranged in that order, the components are:
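When a shared word occurs more than once in d, the rearrangement is not unique. This sketch enumerates every choice of d-occurrence with `itertools.product`, then reads the q-position numbers in d order. It is an illustrative assumption, not the patent's procedure. It uses the assumed numbering in=1, campus=2, inside=3, I=4, like=5, because the patent's own table is not reproduced here, so its vectors have five components. The name `candidate_orderings` is hypothetical.

```python
from itertools import product


def candidate_orderings(q, d):
    """For each way of matching every shared word to one of its occurrences in d,
    return the q-position numbers sorted by the matched d-positions.

    Positions are 1-based. Assumes no word repeats within q.
    """
    positions_in_d = {}
    for j, word in enumerate(d, start=1):
        positions_in_d.setdefault(word, []).append(j)
    shared = [(i, word) for i, word in enumerate(q, start=1) if word in positions_in_d]
    vectors = []
    for choice in product(*(positions_in_d[word] for _, word in shared)):
        # Pair each chosen d-position with its q-number, then read q-numbers in d order.
        pairs = sorted(zip(choice, (i for i, _ in shared)))
        vectors.append(tuple(i for _, i in pairs))
    return vectors


q = ["in", "campus", "inside", "I", "like", "painting"]
d = ["I", "like", "in", "campus", "inside", "study", "also", "like", "sports"]
print(candidate_orderings(q, d))
# [(4, 5, 1, 2, 3), (4, 1, 2, 3, 5)]
```

The first vector matches "like" to its first occurrence in d (position 2). The second matches it to the later occurrence (position 8).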

[0046]

[0047]

[0048] The word "like" appears more than once, so the occurrence with the smallest total inversion number and the fewest spaced w...
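The inversion number of a sequence is the count of pairs that appear out of order (i < j but x_i > x_j). The minimal sketch below counts inversions with merge sort in O(n log n). It applies the count to the two candidate vectors that arise from arranging the assumed numbering in=1, campus=2, inside=3, I=4, like=5 in d order, once for each occurrence of "like". Those vectors and the name `count_inversions` are illustrative assumptions. The spaced-word criterion is truncated in the source and is not implemented.

```python
def count_inversions(seq):
    """Number of pairs i < j with seq[i] > seq[j], counted during a merge sort."""

    def sort_count(xs):
        if len(xs) <= 1:
            return list(xs), 0
        mid = len(xs) // 2
        left, inv_left = sort_count(xs[:mid])
        right, inv_right = sort_count(xs[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every remaining element of left.
                merged.append(right[j])
                j += 1
                inv += len(left) - i
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_count(list(seq))[1]


print(count_inversions((4, 5, 1, 2, 3)))  # 6: "like" matched to its first occurrence
print(count_inversions((4, 1, 2, 3, 5)))  # 3: "like" matched to its later occurrence
```

On inversions alone, matching "like" to its later occurrence (3 inversions) keeps d closer to q's word order than the first occurrence does (6 inversions). How the patent weighs this against spaced words cannot be recovered from this excerpt.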

Abstract

The invention belongs to the technical field of information search and provides a search optimization method based on local chronicle research. The method comprises two steps. (1) It invokes a local chronicle word segmentation algorithm, which statistically generates supplementary vocabulary for local chronicles and adds words missing from the default lexicon to the user-defined word list. (2) It invokes a search optimization algorithm that integrates word order features and corrects the score of the default search algorithm by comparing word order features. The word order similarity is quantified as a score, which then modifies the score of the BM25 or VSM algorithm to give the final score. Articles with both high term frequency and high word order similarity receive high scores and are ranked at the top of the list. The returned results therefore better match the user's intended meaning, and search accuracy improves. The method optimizes the formula that the search algorithm uses to compute matching degree, which makes search results more accurate.
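The abstract does not give the correction formula, so what follows is only a hypothetical sketch of the re-ranking idea. It takes standard Okapi BM25 (k1 = 1.2, b = 0.75, with the non-negative IDF variant used by Lucene) and multiplies it by (1 + weight × order similarity). Order similarity is defined here as 1 minus the normalized inversion count of the shared query words read in document order. The multiplicative form, the `weight` parameter, the first-occurrence matching, and all function names are assumptions, not the patent's method.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each pre-segmented doc (list of tokens) for a query."""
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in set(query):
            if tf[term]:
                idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
                score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def order_similarity(query, doc):
    """HYPOTHETICAL word-order similarity in [0, 1]: 1 - inversions / max_inversions.

    Each shared query word is matched to its FIRST occurrence in doc (a simplification).
    """
    first_pos = {}
    for i, tok in enumerate(doc):
        first_pos.setdefault(tok, i)
    shared = [qi for qi, tok in enumerate(query) if tok in first_pos]
    # Query indices of the shared words, read in the order they occur in doc.
    seq = sorted(shared, key=lambda qi: first_pos[query[qi]])
    n = len(seq)
    if n < 2:
        return 1.0  # zero or one shared word: order cannot disagree
    inversions = sum(seq[i] > seq[j] for i in range(n) for j in range(i + 1, n))
    return 1 - inversions / (n * (n - 1) / 2)


def rerank(query, docs, weight=0.5):
    """HYPOTHETICAL correction: final = bm25 * (1 + weight * order_similarity)."""
    base = bm25_scores(query, docs)
    final = [s * (1 + weight * order_similarity(query, doc))
             for s, doc in zip(base, docs)]
    ranking = sorted(range(len(docs)), key=lambda i: final[i], reverse=True)
    return base, final, ranking


query = ["I", "like", "campus"]
docs = [["campus", "study", "like", "I"],  # same words as doc 1, reversed order
        ["I", "like", "campus", "study"]]  # same words, query order preserved
base, final, ranking = rerank(query, docs)
print(ranking)  # [1, 0]
```

The two documents contain the same bag of words, so BM25 alone scores them identically. The order term breaks the tie in favor of the document whose word order matches the query.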

Description

Technical Field

[0001] The invention belongs to the technical field of information search, and in particular relates to a search optimization method based on local chronicle research.

Background

[0002] The search algorithms in common use today are based on VSM (Vector Space Model) and BM25. Neither algorithm considers the order of words in a sentence. The paper "Sentence Similarity Model and Most Similar Sentence Searching Algorithm" by Lu Xueqiang introduced the concept of word order. However, formula derivation and data verification show that the algorithm in that paper still has the following shortcomings.

[0003] First, only words that appear in both sentence A and sentence B, and appear only once, are marked. This inevitably discards many words, possibly including important ones, resulting in a decrease in...
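The first shortcoming can be illustrated with a minimal sketch of marking only the words that occur exactly once in both sentences. The sketch applies this rule to the tokens of Example 1, writing the shared pronoun as "I" and the second locative word as "inside". The function name `once_in_both` is hypothetical. It only mirrors the rule as stated in [0003] and is not the cited paper's code.

```python
from collections import Counter


def once_in_both(a, b):
    """Words of a that occur exactly once in a and exactly once in b, in a's order."""
    count_a, count_b = Counter(a), Counter(b)
    return [w for w in a if count_a[w] == 1 and count_b[w] == 1]


q = ["in", "campus", "inside", "I", "like", "painting"]
d = ["I", "like", "in", "campus", "inside", "study", "also", "like", "sports"]

print(once_in_both(q, d))       # ['in', 'campus', 'inside', 'I']
print(sorted(set(q) & set(d)))  # ['I', 'campus', 'in', 'inside', 'like']
```

"like" occurs twice in d, so the once-only rule drops it even though both sentences share it.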

Application Information

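Patent type and authority: Patents (China)
IPC(8): G06F16/903; G06F40/289; G06F40/216; G06F40/30
Inventors: Huang Tao (黄涛), Zhang Hao (张浩), Yang Huali (杨华利), Zhang Chenchen (张晨晨), Zhang Huifang (张慧芳), Xiong Huimin (熊慧敏)
Owner: HUAZHONG NORMAL UNIV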