Text similarity calculation and deduplication method, system, medium and device

A text similarity technology, applied in computing, computer parts, instruments, etc., that addresses the problems of insufficient accuracy, lack of pertinence and excessive noise, and achieves enhanced features, accurate similarity calculation results and a good deduplication effect.

Pending Publication Date: 2022-07-29
西安金讯通软件技术有限公司

AI Technical Summary

Problems solved by technology

[0005] In recent years, text similarity calculation methods have been continuously proposed, but most of them are general-purpose methods applied across many fields, with insufficient accuracy and a lack of pertinence. In the hotline field, these methods introduce too much noise and the results are not ideal. Although text similarity plays a very important role in the hotline field, no ideal text similarity calculation method is currently applied there, so a good deduplication effect cannot be achieved.




Embodiment Construction

[0049] The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments, without creative effort, shall fall within the protection scope of the present invention.

[0050] In the description of the present invention, it is to be understood that the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

[0051] It should also be understood that the terminology used in t...



Abstract

The invention discloses a text similarity calculation and deduplication method, system, medium and device. The method comprises the following steps: importing a hotline text data set into a database; inputting the hotline text data set into a trained model, which performs feature extraction, feature fusion and feature enhancement to obtain the final features of the data, and storing those features in serialized form; inputting the to-be-calculated data into the model to extract its final features; computing the cosine similarity between the to-be-calculated data and the hotline text data for the corresponding date to obtain a similarity result; and outputting the top 50 similarity results and performing deduplication. The method is oriented to government service convenience hotlines: the final sentence features of the hotline text content are extracted by the model for similarity calculation and deduplication. An external calling mechanism is also provided, in which four interfaces operate the model to achieve different model functions. The method is convenient to operate, highly practical, and suited to the hotline field.
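The retrieval steps named in the abstract (serialized storage of features, cosine similarity, top-50 ranking, deduplication) can be sketched as follows. This is a minimal illustration and not the patented implementation: the feature vectors are toy values standing in for the trained model's output, `pickle` is only one possible serialization format, and the function names and the 0.9 duplicate threshold are assumptions that do not appear in the source.

```python
import math
import pickle


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k_similar(query, stored, k=50):
    """Return up to k (record_id, score) pairs, highest score first."""
    scored = [(rid, cosine_similarity(query, vec)) for rid, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def is_duplicate(ranked, threshold=0.9):
    """Hypothetical rule: the best match at or above the threshold is a duplicate."""
    return bool(ranked) and ranked[0][1] >= threshold


# Toy features for the stored hotline records of one date (assumed values).
stored_features = {
    "t1": [1.0, 0.0, 0.0],
    "t2": [0.9, 0.1, 0.0],
    "t3": [0.0, 1.0, 0.0],
}

# Serialized storage and reload; pickle is an assumption, not the source's format.
blob = pickle.dumps(stored_features)
restored = pickle.loads(blob)

# Features of the to-be-calculated text (assumed value).
query = [1.0, 0.0, 0.0]
ranked = top_k_similar(query, restored, k=2)
duplicate = is_duplicate(ranked)
```

Sorting every score is adequate for one day's records. For larger collections, a heap-based selection such as `heapq.nlargest` avoids a full sort.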

Description

Technical field

[0001] The invention belongs to the technical field of text similarity, and in particular relates to a method, system, medium and device for text similarity calculation and deduplication.

Background

[0002] In the era of intelligence, natural language processing technology has developed rapidly and is widely used. Applications often need to measure and calculate the similarity between two texts. Text similarity is used in many domains, such as knowledge retrieval, duplicate text detection, automatic text summarization, recommender systems, text classification and machine translation. It is an important and indispensable technique in natural language processing.

[0003] Text similarity refers to the similarity between two texts. The text is in a high-dimensional semantic space, and it needs to be abstractly decomposed to be able to quantify the similarity between them from a mathe...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F40/30, G06F40/295, G06F40/242, G06F16/335
CPC: G06F40/295, G06F40/242, G06F40/30, G06F16/335, G06F18/22, G06F18/253
Inventor: 韩召宁, 赵国帅, 罗一玲, 钱学明
Owner: 西安金讯通软件技术有限公司