
Multisource semantic analysis based information retrieval method

A semantic analysis and information retrieval technology, applied in the field of information retrieval based on multi-source semantic analysis, that addresses problems such as reduced query accuracy.

Inactive
Publication Date: 2016-11-23
BEIJING UNIV OF TECH
Cites: 2; Cited by: 34

AI Technical Summary

Problems solved by technology

However, this method is based on the initial query. If the initial retrieval results are poor, it may extract terms unrelated to the query topic for expansion, which reduces query accuracy.



Examples


Embodiment Construction

[0039] In order to enable those skilled in the art to better understand the solutions of the present invention, the implementation of the solutions of the present invention will be described in detail below in conjunction with the accompanying drawings in the examples of the present invention.

[0040] As shown in Figure 1, the general idea of the information retrieval method based on multi-source semantic analysis in the present invention is as follows. First, LDA modeling is performed on the preprocessed documents, which yields each term's ability to represent a document at the hidden-topic level. An inverted index is then built that stores each term together with its ability to represent the document, so that text information can be represented in low-dimensional topic form. Next, the user's initial query text is obtained and preprocessed, and then, according to whether each query term is a professional medical term, carry ou...
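The indexing step in [0040] can be illustrated with a minimal sketch. It assumes the LDA outputs are already available from some LDA implementation: a topic-word distribution `phi` and a document-topic distribution `theta`. The scoring formula (summing p(term | topic) × p(topic | document) over topics) is an assumption chosen for illustration, not a formula stated in the source, and all names and values below are hypothetical.

```python
from collections import defaultdict


def term_doc_score(term, doc_id, phi, theta):
    """Assumed score: sum over topics k of p(term | k) * p(k | doc)."""
    return sum(
        topic_words.get(term, 0.0) * theta[doc_id][k]
        for k, topic_words in enumerate(phi)
    )


def build_inverted_index(docs, phi, theta):
    """Map each term to {doc_id: score} for the documents that contain it."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for term in set(tokens):
            index[term][doc_id] = term_doc_score(term, doc_id, phi, theta)
    return dict(index)


# Toy, hand-written stand-ins for fitted LDA outputs (hypothetical values).
docs = {
    "d1": ["insulin", "diabetes", "glucose"],
    "d2": ["tumor", "biopsy", "glucose"],
}
phi = [  # phi[k][term] = p(term | topic k)
    {"insulin": 0.4, "diabetes": 0.4, "glucose": 0.2},
    {"tumor": 0.5, "biopsy": 0.4, "glucose": 0.1},
]
theta = {  # theta[doc][k] = p(topic k | doc)
    "d1": [0.9, 0.1],
    "d2": [0.2, 0.8],
}

index = build_inverted_index(docs, phi, theta)
```

Storing the score alongside each posting means a lookup for a term returns both the documents containing it and how strongly the term represents each of them at the topic level.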



Abstract

The invention discloses an information retrieval method based on multi-source semantic analysis. The method comprises the following steps: documents are acquired and preprocessed; the documents are modeled with an LDA model and an inverted index is built; the user's initial query is obtained and preprocessed; multi-dimensional analysis is performed according to whether each query term is a professional medical term, and term weighting and query expansion are performed based on WordNet and the UMLS Metathesaurus; the similarity between the expanded query term set and the LDA dimensionality-reduced documents is calculated, the documents are ranked in decreasing order of similarity, and those whose similarity is not lower than a preset threshold are returned to the user. By integrating the characteristics of WordNet and the UMLS Metathesaurus and performing multi-dimensional analysis, weighting and expansion of the initial query, the method understands the user's query intention more accurately. By modeling documents with the LDA model, it analyzes how well terms represent documents at the hidden-topic level and improves document retrieval performance for the user.
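The query-side steps of the abstract (term weighting, expansion, similarity ranking, thresholding) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the two small dictionaries are hypothetical stand-ins for lookups in WordNet and the UMLS Metathesaurus, the weights (`medical_weight`, `general_weight`, `decay`) are invented for the example, cosine similarity in topic space is an assumed choice of similarity measure, and `phi` and `theta` are toy stand-ins for LDA outputs.

```python
import math

# Hypothetical stand-ins for WordNet and UMLS Metathesaurus lookups.
GENERAL_SYNONYMS = {"treatment": ["therapy"]}
MEDICAL_SYNONYMS = {"diabetes": ["hyperglycemia"]}


def expand_query(terms, medical_weight=2.0, general_weight=1.0, decay=0.5):
    """Return {term: weight}; medical terms get a higher weight (assumed)."""
    weighted = {}
    for term in terms:
        if term in MEDICAL_SYNONYMS:  # treated as a professional medical term
            base, synonyms = medical_weight, MEDICAL_SYNONYMS[term]
        else:
            base, synonyms = general_weight, GENERAL_SYNONYMS.get(term, [])
        weighted[term] = max(weighted.get(term, 0.0), base)
        for syn in synonyms:  # expansion terms carry a reduced weight
            weighted[syn] = max(weighted.get(syn, 0.0), base * decay)
    return weighted


def to_topic_vector(weighted_terms, phi):
    """Project weighted terms onto topics using p(term | topic)."""
    return [
        sum(w * topic.get(t, 0.0) for t, w in weighted_terms.items())
        for topic in phi
    ]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_terms, phi, theta, threshold=0.5):
    """Rank documents by decreasing similarity; keep those >= threshold."""
    q = to_topic_vector(expand_query(query_terms), phi)
    scored = [(doc_id, cosine(q, vec)) for doc_id, vec in theta.items()]
    hits = [(d, s) for d, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for LDA outputs (hypothetical values).
phi = [
    {"diabetes": 0.4, "insulin": 0.3, "hyperglycemia": 0.2, "therapy": 0.1},
    {"tumor": 0.5, "biopsy": 0.3, "therapy": 0.2},
]
theta = {"d1": [0.9, 0.1], "d2": [0.2, 0.8]}

results = retrieve(["diabetes", "treatment"], phi, theta)
```

With these toy values, only the diabetes-heavy document `d1` clears the default threshold; lowering `threshold` to 0.0 returns both documents in decreasing order of similarity.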

Description

Technical field

[0001] The invention belongs to the technical field of information retrieval, and in particular relates to an information retrieval method based on multi-source semantic analysis.

Background technique

[0002] Information retrieval research is a field that has risen with the development of science and technology and the sharp increase in information of all forms. With the popularity of the Internet, medical researchers and doctors often use search engines to obtain the medical information they need. How to accurately grasp the user's retrieval intention, and how to accurately extract the information the user is interested in from massive data and return it to the user, has therefore become a primary topic. In response to this problem, using query expansion technology to discover and exploit the content of medical literature has become one of the most popular means of improving retrieval performance.

[0003] Query expansion is one ...


Application Information

IPC(8): G06F17/30
CPC: G06F16/903
Inventor: 亢阳阳, 李建强, 田猛, 孙靖超, 赵旭, 莫豪文
Owner: BEIJING UNIV OF TECH