BERT-based pseudo-relevance feedback model information retrieval method and system

A pseudo-relevance feedback information retrieval technique, applied in the field of information retrieval methods and systems. It addresses problems such as excessive information volume, long computation time, and high difficulty, and achieves improved accuracy, good discrimination, and better retrieval results.

Active Publication Date: 2019-11-12
HUAZHONG NORMAL UNIV
Cites: 3; Cited by: 12

AI Technical Summary

Problems solved by technology

[0009] BERT (Bidirectional Encoder Representations from Transformers) was recently proposed as a successor to Word2Vec. Built on the Transformer architecture, it gave a new direction to a field in which Word2Vec had reached a bottleneck, and it substantially improved accuracy on 11 NLP (Natural Language Processing) tasks. Its source code and models for multiple languages have been open-sourced, which gives it very high commercial value, and it can be regarded as the most significant breakthrough since residual networks in recent years. Despite these advantages, BERT is very difficult to apply to information retrieval: if every document is scored with BERT, the amount of information to process is too large and the computation takes too long. It is therefore not appropriate to use BERT directly for information retrieval.


Examples


Embodiment Construction

[0030] The technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and embodiments.

[0031] The present invention first filters the documents with the BM25 retrieval model, then scores each sentence in the filtered documents against the original query Q by BERT semantic similarity, and takes the b highest sentence scores in each document (b is preferably set to 4 in the embodiment) as the score of that document. It then scans each important candidate expansion word generated by Rocchio; the BERT score of a word is the sum of the sentence scores of all sentences in which the word appears. This semantic similarity is fused into the pseudo-relevance feedback model as an additional weight for the final document evaluation and query expansion, improving retrieval accuracy.
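The scoring described in [0031] can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `toy_similarity` is a hypothetical stand-in (token overlap) for a real BERT sentence-similarity model, all function names are invented for this example, and combining the b highest sentence scores by summation is an assumption, since the text does not state how they are aggregated.

```python
from typing import Callable, Dict, List


def toy_similarity(query: str, sentence: str) -> float:
    """Hypothetical stand-in for BERT similarity: Jaccard overlap of tokens."""
    q, s = set(query.lower().split()), set(sentence.lower().split())
    union = q | s
    return len(q & s) / len(union) if union else 0.0


def sentence_scores(
    query: str,
    sentences: List[str],
    sim: Callable[[str, str], float] = toy_similarity,
) -> List[float]:
    """Score every sentence of one document against the original query Q."""
    return [sim(query, s) for s in sentences]


def document_score(scores: List[float], b: int = 4) -> float:
    """Combine the b highest sentence scores (assumption: by summation)."""
    return sum(sorted(scores, reverse=True)[:b])


def word_scores(
    candidates: List[str], sentences: List[str], scores: List[float]
) -> Dict[str, float]:
    """Score of a candidate word = sum of the scores of sentences containing it."""
    result = {}
    for word in candidates:
        result[word] = sum(
            score
            for sentence, score in zip(sentences, scores)
            if word in sentence.lower().split()
        )
    return result


sentences = ["bert improves retrieval", "cats sleep", "retrieval with bm25"]
scores = sentence_scores("bert retrieval", sentences)
print(document_score(scores, b=2))
print(word_scores(["bm25", "retrieval"], sentences, scores))
```

In a real system the `sim` argument would wrap a BERT model; keeping it as a parameter lets the aggregation logic be checked independently of the model.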

[0032] The embodiment proposes an information retrieval method that integr...


Abstract

The invention provides a BERT-based pseudo-relevance feedback model information retrieval method. The method comprises: in the first round of pseudo-relevance feedback retrieval, evaluating a target document set D with a BM25 model and screening out a document set D'; evaluating the documents in D' again with a BERT model to obtain their BERT scores; linearly fusing the document scores obtained from the BM25 retrieval model and the BERT model to obtain a pseudo-relevant document set D1; performing query expansion based on D1 to select candidate expansion words, and refining the candidate expansion words by BERT sentence semantic similarity to obtain the final expansion words; and combining the final expansion words with the original query Q to generate a new query keyword set, then performing a second round of retrieval on the target document set D with the BM25 model to obtain the final retrieval result. The method removes a large amount of useless and irrelevant information from massive information, obtains more accurate candidate words, and improves the precision of query expansion and of the final retrieval.
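The two-round flow in the abstract can be sketched as below. This is an illustrative outline under several assumptions that the source does not specify: the fusion weight `alpha`, min-max normalization before fusion, the sizes `top_n` and `k`, and a simplified frequency-based candidate selection in place of Rocchio. `semantic_score` is a hypothetical callable standing in for the BERT model, and all names are invented for this example.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

Tokens = List[str]


def bm25_scores(query: Tokens, docs: List[Tokens],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Plain Okapi BM25; here b is the length-normalization parameter."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def min_max(xs: List[float]) -> List[float]:
    """Rescale to [0, 1] so the two score types are comparable (assumption)."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def two_round_retrieval(
    query: Tokens,
    docs: List[Tokens],
    semantic_score: Callable[[Tokens, Tokens], float],
    top_n: int = 3, k: int = 2, alpha: float = 0.5, n_expand: int = 2,
) -> Tuple[Tokens, List[int]]:
    # Round 1: BM25 over D, keep the top_n documents as D'.
    first = bm25_scores(query, docs)
    d_prime = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)[:top_n]
    # Re-score D' with the semantic model and fuse the two scores linearly.
    bm = min_max([first[i] for i in d_prime])
    sem = min_max([semantic_score(query, docs[i]) for i in d_prime])
    fused = {i: alpha * x + (1 - alpha) * y for i, x, y in zip(d_prime, bm, sem)}
    d1 = sorted(fused, key=lambda i: fused[i], reverse=True)[:k]
    # Simplified expansion: weight non-query terms in D1 by the fused score.
    weights = Counter()
    for i in d1:
        for t in docs[i]:
            if t not in query:
                weights[t] += 1.0 + fused[i]
    expanded = list(query) + [t for t, _ in weights.most_common(n_expand)]
    # Round 2: BM25 over the full set D with the expanded query.
    second = bm25_scores(expanded, docs)
    ranking = sorted(range(len(docs)), key=lambda i: second[i], reverse=True)
    return expanded, ranking
```

Restricting the expensive semantic scoring to D' is what keeps the cost manageable: BM25 runs over all of D twice, while the BERT-like model only sees `top_n` documents.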

Description

Technical field

[0001] The invention belongs to the technical field of information retrieval, and in particular relates to an information retrieval method and system that integrates BERT into a pseudo-relevance feedback model.

Background

[0002] In modern society, the rapid development of Internet technology has made global resource sharing possible; network resources are extremely rich, and the total amount of information is expanding rapidly. In the network environment, people's lifestyles and ways of thinking are undergoing major changes, and the digital environment and information network technology are affecting and changing traditional ways of learning. Facing this vast ocean of information, people urgently need more effective information processing technology to deal with the growing mass of data, and should rely fully on advanced technology to search for and absorb useful knowledge. As a classic text processing technol...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/9532, G06F16/332, G06F17/27
CPC: G06F16/9532, G06F16/332
Inventor: 何婷婷, 王俊美, 潘敏, 王雪彦, 黄翔, 应志为
Owner: HUAZHONG NORMAL UNIV