Text similarity determination method and device

A text similarity determination method and device, applied in the field of text similarity determination, that can solve the problems of inaccurate text similarity, increased labor cost, and reduced work efficiency.

Pending Publication Date: 2020-05-12
BEIJING MININGLAMP SOFTWARE SYST CO LTD
0 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

Current text similarity determination algorithms are all aimed at short texts consisting of one or several words. When a text includes many words, the similarity obtained by prior-art methods is inaccurate, which reduces work efficiency and increases labor costs.
[0003] To address these technical problems, this application provides a text similarity determination method and device that solve the prior-art problem that the similarity of long texts containing many words cannot be accurately reflected.




Embodiment Construction

[0048] To make the purpose, technical solution and advantages of the present invention clearer, the embodiments of the present invention are described in detail below in conjunction with the accompanying drawings. It should be noted that, where no conflict arises, the embodiments in the present application and the features in the embodiments can be combined with each other arbitrarily.

[0049] The steps shown in the flowcharts of the figures may be performed in a computer system, such as a set of computer-executable instructions. Also, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that shown or described herein.

[0050] Word2vec is an important technology for measuring the meaning of words. It converts each word into a vector, which is also called a word vector, and the distance between the word vectors corresponding to words with similar semantics is relatively cl...
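The idea that semantically similar words have nearby vectors can be illustrated with a minimal sketch. The vectors below are hand-made toy values chosen for illustration, not the output of a trained Word2vec model, and the names `TOY_VECTORS`, `cosine_similarity` and `nearest` are hypothetical helpers that do not appear in the source.

```python
import math

# Hand-made toy vectors (assumption): real Word2vec vectors are learned
# from a corpus and usually have hundreds of dimensions.
TOY_VECTORS = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "banana": [0.10, 0.05, 0.95],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors):
    """Return the other word whose vector is most similar to `word`."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )


print(nearest("king", TOY_VECTORS))
```

With these toy values, "king" sits much closer to "queen" than to "banana", which is the property a similarity measure built on word vectors relies on.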


Abstract

The invention provides a text similarity determination method and device. The method comprises the following steps: performing word segmentation on a first text and a second text to obtain a first segmented word set and a second segmented word set; searching a preset word vector library, according to the correspondence between word vectors and segmented words, for the word vectors corresponding to all segmented words in the first and second segmented word sets, to obtain a first word vector set and a second word vector set respectively; when it is determined that neither the first word vector set nor the second word vector set is empty, calculating the maximum cosine distance between the first word vector set and the second word vector set; and determining the similarity between the first text and the second text according to that maximum cosine distance and preset word frequency information of the first word vector set and the second word vector set.
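The steps in the abstract can be sketched roughly as follows. The abstract does not give the exact formula, so several choices here are assumptions rather than the patented method: "cosine distance" is read as cosine similarity, each word is matched to its best counterpart in the other text, matches are weighted by an inverse word-frequency factor, and the two directions are averaged. The sketch returns `None` when either vector set is empty, since the abstract only covers the non-empty case. All names (`text_similarity`, `vector_library`, `word_freq`, `segment`) are hypothetical, and whitespace splitting stands in for a real word segmenter.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def lookup(tokens, vector_library):
    """Keep only the segmented words that have an entry in the library."""
    return {t: vector_library[t] for t in tokens if t in vector_library}


def directed_score(src, dst, word_freq):
    """Weighted mean, over words in src, of the best cosine match in dst.

    Assumption: frequent words get a smaller weight, 1 / (1 + frequency).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for word, vec in src.items():
        best = max(cosine(vec, other) for other in dst.values())
        weight = 1.0 / (1.0 + word_freq.get(word, 0.0))
        weighted_sum += weight * best
        weight_total += weight
    return weighted_sum / weight_total


def text_similarity(text_a, text_b, vector_library, word_freq, segment=str.split):
    """Similarity of two texts, or None if either has no known words."""
    set_a = lookup(segment(text_a), vector_library)
    set_b = lookup(segment(text_b), vector_library)
    if not set_a or not set_b:
        return None  # assumption: the abstract leaves this case undefined
    forward = directed_score(set_a, set_b, word_freq)
    backward = directed_score(set_b, set_a, word_freq)
    return 0.5 * (forward + backward)
```

For Chinese text, `segment` would need to be replaced with a proper word segmentation function, and `vector_library` would typically be loaded from a pretrained word-vector file.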

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and device for determining text similarity.

Background

[0002] With the rapid development of the Internet, text data has grown explosively, and more and more application scenarios need to analyze and mine text data. Short text similarity is one such text mining task and plays an important role in search, product recommendation and intelligent question answering. Short texts are generally composed of several phrases or sentences, and their average length is generally in the tens of words. Since Chinese has many synonyms and many words with similar semantics, measuring the similarity of short texts is particularly important. The current text similarity determination algorithms are all aimed at short texts consisting of one or several words. When the text includes many words, the text similarity obtained accordi...


Application Information

IPC(8): G06F40/289, G06F40/216, G06F40/247, G06F40/30, G06K9/62
CPC: G06F18/22, Y02D10/00
Inventor: 张文剑, 牟小峰
Owner: BEIJING MININGLAMP SOFTWARE SYST CO LTD