Retrieval method and method using same for establishing text semantic extraction module

A model and document technology applied to establishing text semantic extraction models based on implicit semantic analysis. It addresses problems such as limited functionality and very high time complexity, and achieves the effect of removing redundancy.

Inactive Publication Date: 2011-10-12
无锡科利德斯科技有限公司
Cites: 3, Cited by: 13

AI Technical Summary

Problems solved by technology

However, WordNet's thesaurus is still very limited in some highly specialized fields, which makes it difficult to measure the similarity between related concepts accurately.
In addition, finding the shortest path length between concepts and the depth of the nearest parent node in the tree structure has very high time complexity.
[0008] The corpus-based method measures the correlation between concepts by calculating the maximum information content shared by two concepts in the corpus. The information content between concepts is calculated from the instance overlap probability of the two concepts and their sub-concepts in the corpus. This method requires a comprehensive corpus to provide rich background information, which in turn gives the method very high time complexity.
Moreover, because the correlation between concepts is limited by the specific corpus selected, this method is of limited use for some expert databases.


Examples


Embodiment 1

[0056] When m≤n, singular value decomposition is performed on the keyword_document matrix $A$ (m×n) according to step 2402 above. The decomposition yields the keyword vector matrix $U$, the diagonal matrix $\Sigma$, and the document vector matrix $V^T$. To simplify the matrices and highlight the relationship between their dimensions, the elements are represented by "*", as follows:

[0057]

[0058] According to the above step 1404, the corresponding generated target matrix is:

[0059]

[0060] Assume $D_1$ and $D_2$ are two rows randomly selected from the document_keyword matrix $D$, and $C_1$ and $C_2$ are the two rows of the matrix $C$ that correspond to $D_1$ and $D_2$, respectively. Then:

[0061] $C_1 = D_1 U$ (8)

[0062] $C_2 = D_2 U$ (9)

[0063] Since $D_1$ and $D_2$ are expressed as $\{w_{1,1}, w_{1,2}, \ldots, w_{1,m}\}$ and $\{w_{2,1}, w_{2,2}, \ldots, w_{2,m}\}$ respectively, then, for $D_1$ and $D_2$, the inner product o...
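The source text breaks off here, so the following is not a reconstruction of the patent's derivation. It is a minimal numerical sketch of formulas (8) and (9) for the case m≤n, assuming NumPy and random toy data (the sizes, the seed, and all variable names are illustrative assumptions). It also checks one standard property that holds in this case: the full matrix $U$ from the SVD is a square orthogonal matrix, so multiplying rows by $U$ leaves their inner product unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, illustrative only
n, m = 6, 4                             # n documents, m keywords, with m <= n
D = rng.random((n, m))                  # toy document_keyword matrix (n x m)
A = D.T                                 # keyword_document matrix (m x n)

# Full SVD: A = U @ Sigma @ Vt, with U of shape (m, m) and Vt of shape (n, n)
U, s, Vt = np.linalg.svd(A, full_matrices=True)

C = D @ U                               # target matrix, shape (n, m)

D1, D2 = D[0], D[1]                     # two rows of D
C1, C2 = D1 @ U, D2 @ U                 # formulas (8) and (9)

inner_D = D1 @ D2                       # inner product before the transform
inner_C = C1 @ C2                       # inner product after the transform
print(inner_D, inner_C)                 # the two values agree up to rounding
```

Because `U` is orthogonal here, `C1 @ C2` equals `D1 @ U @ U.T @ D2`, which reduces to `D1 @ D2`.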

Embodiment 2

[0098] When m>n, singular value decomposition is likewise performed on the keyword_document matrix $A$ according to step 2402 above, yielding the keyword vector matrix $U$, the diagonal matrix $\Sigma$, and the document vector matrix $V^T$. As before, to simplify the matrices and highlight the relationship between their dimensions, the elements are represented by "*", as follows:

[0099]

[0100] When m>n, the present invention uses only the matrix $U_1$ (m×n) to construct the target matrix $C$, where $U_1$ is the economy-size form of the matrix $U$. Its column count n is determined by the number of singular values in the matrix $\Sigma$; that is, n equals the number of documents in the document set.

[0101] Therefore, when m>n, the target matrix C can be defined as:

[0102] $C = D U_1$ (29)

[0103] The details are as follows:

[0104]

[0105] It can be seen from formula (30) that when m>n, C is an n×n matrix, and ...
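A minimal sketch of formula (29) for the case m>n, assuming NumPy and random toy data (sizes, seed, and variable names are illustrative assumptions, not taken from the patent). NumPy's reduced SVD (`full_matrices=False`) returns the m×n factor that plays the role of $U_1$ here. The sketch confirms that $C$ is n×n and, as a side check that follows from the SVD identity rather than from the source text, that $C$ equals $V\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)          # fixed seed, illustrative only
n, m = 3, 7                             # n documents, m keywords, with m > n
D = rng.random((n, m))                  # toy document_keyword matrix (n x m)
A = D.T                                 # keyword_document matrix (m x n)

# Reduced ("economy-size") SVD: U1 is (m, n), s has n entries, Vt is (n, n)
U1, s, Vt = np.linalg.svd(A, full_matrices=False)

C = D @ U1                              # formula (29): target matrix, (n, n)

# Side check: D = V @ diag(s) @ U1.T and U1.T @ U1 = I, so C = V @ diag(s)
C_alt = Vt.T * s                        # scales column j of V by s[j]
print(C.shape, np.allclose(C, C_alt))
```

The target matrix has n columns instead of m, which is where the reduction in size comes from when there are more keywords than documents.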


Abstract

The invention provides a retrieval method comprising the following steps. First, the database to be retrieved is represented as a document_keyword matrix whose number of rows equals the number n of documents and whose number of columns equals the number m of keywords. Second, a target matrix is generated to represent the improved database to be retrieved: the document_keyword matrix is transposed to form a keyword_document matrix; the keyword_document matrix is decomposed by a singular value decomposition algorithm into the product of a keyword vector matrix, a diagonal matrix, and a document vector matrix; and the document_keyword matrix is multiplied by the keyword vector matrix to form the target matrix. Third, retrieval is performed in the improved database represented by the target matrix. With this retrieval method, retrieval speed and efficiency are greatly improved.
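The steps in the abstract can be sketched as follows, assuming NumPy and a toy database of 4 documents and 5 keywords. The function names `build_target_matrix` and `cosine_rank` are hypothetical. The abstract does not say how a query is matched against the target matrix, so projecting the query with the same matrix $U$ and ranking by cosine similarity are assumptions made only for illustration.

```python
import numpy as np

def build_target_matrix(D):
    """Return (C, U): C = D @ U, where U comes from the SVD of A = D.T."""
    A = D.T                                            # keyword_document matrix (m x n)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
    C = D @ U                                          # target matrix
    return C, U

def cosine_rank(C, q_vec):
    """Rank the rows of C by cosine similarity to q_vec (assumed scoring rule)."""
    norms = np.linalg.norm(C, axis=1) * np.linalg.norm(q_vec)
    sims = (C @ q_vec) / np.where(norms == 0, 1.0, norms)
    return np.argsort(-sims), sims

# Toy document_keyword matrix: rows are documents, columns are keyword weights
D = np.array([
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 2, 1, 1],
], dtype=float)

C, U = build_target_matrix(D)

query = np.array([1, 1, 0, 0, 0], dtype=float)  # query over the same 5 keywords
q_vec = query @ U                               # assumption: project like a document row
order, sims = cosine_rank(C, q_vec)
print(order, np.round(sims, 4))
```

In this toy case the query uses only the first two keywords, so the first two documents rank highest and the other two score approximately zero.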

Description

【Technical field】

[0001] The invention relates to a retrieval method and to the establishment of a text semantic extraction model, and in particular to a method for establishing a text semantic extraction model based on implicit semantic analysis.

【Background technique】

[0002] With the rapid development of Internet technology, the amount of text information on the Internet has grown exponentially in recent decades. How to organize and manage a large amount of text information quickly and effectively has become the primary challenge of modern information retrieval technology.

[0003] Figure 1 is a schematic diagram of an environment suitable for information retrieval. As shown in Figure 1, the computer 102 is connected to the server 104A through a local area network (LAN), and the server 104A is connected to the server 104B or other servers to obtain all network resources from the server 104B to the server 104N, so that the computer 102 can obtain all co...


Application Information

IPC(8): G06F17/30
Inventor: 宋威, 梁久祯
Owner: 无锡科利德斯科技有限公司