
Text duplicate checking method and device, electronic equipment and storage medium

A text duplicate checking technology, applied in electronic digital data processing, instruments, computing, etc., that addresses the low accuracy and low efficiency of text duplicate checking, reducing the amount of computation while improving accuracy and efficiency.

Pending Publication Date: 2022-05-06
BEIJING QIANXIN TECH +1

AI Technical Summary

Problems solved by technology

[0004] The present invention provides a text plagiarism checking method, device, electronic equipment and storage medium that address the low accuracy and low efficiency of text plagiarism checking in the prior art.



Examples


Detailed Description of Embodiments

[0037] To make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0038] Text plagiarism checking is an application of text similarity calculation. For two given texts, text similarity calculation measures how similar they are semantically. Generally, the smaller the semantic similarity value, the greater the semantic difference between the two texts, that is, the less similar they are at the semantic level; conversely, the larger t...


Abstract

The invention provides a text duplicate checking method and device, electronic equipment and a storage medium. The method comprises: obtaining an original text data set, and building a synonym library and a word weight library based on it, where the synonym library corresponds to the word weight library; performing synonym replacement on the feature words of each original text in the data set, based on the synonym library, to obtain a replaced text; performing fingerprint extraction on the replaced text, based on the word weight library, to obtain a fingerprint of the replaced text; constructing a fingerprint database for the original texts from the fingerprint of each replaced text and the original text it corresponds to, so that each fingerprint maps to its original text; and determining a fingerprint of the text to be checked, then performing duplicate checking on that text based on the fingerprint database and its fingerprint. The method and device improve the accuracy and efficiency of text duplicate checking.
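The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not name a fingerprint algorithm, so a weighted SimHash-style scheme is swapped in; the synonym library and word weight library are taken as ready-made dictionaries because the text does not say how they are built; and applying synonym replacement to the text being checked is also an assumption. All function names, the 64-bit fingerprint width and the `max_distance` threshold are hypothetical.

```python
import hashlib
from collections import defaultdict

FINGERPRINT_BITS = 64  # assumed width; the source does not specify one


def canonicalize(words, synonyms):
    """Synonym replacement: map each word to its canonical form if listed."""
    return [synonyms.get(w, w) for w in words]


def word_hash(word):
    """Stable 64-bit hash of a word (first 8 bytes of its MD5 digest)."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprint(words, weights, default_weight=1.0):
    """Weighted SimHash-style fingerprint (an assumed technique)."""
    totals = [0.0] * FINGERPRINT_BITS
    for w in words:
        h = word_hash(w)
        wt = weights.get(w, default_weight)
        for bit in range(FINGERPRINT_BITS):
            # Each word votes +weight or -weight on every bit position.
            totals[bit] += wt if (h >> bit) & 1 else -wt
    fp = 0
    for bit, total in enumerate(totals):
        if total > 0:
            fp |= 1 << bit
    return fp


def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def build_database(originals, synonyms, weights):
    """Map the fingerprint of each replaced text to its original text ids."""
    db = defaultdict(list)
    for doc_id, words in originals.items():
        fp = fingerprint(canonicalize(words, synonyms), weights)
        db[fp].append(doc_id)
    return db


def check(words, db, synonyms, weights, max_distance=3):
    """Return ids of originals whose fingerprint is close to the query's."""
    fp = fingerprint(canonicalize(words, synonyms), weights)
    hits = []
    for stored, ids in db.items():
        if hamming(fp, stored) <= max_distance:
            hits.extend(ids)
    return sorted(hits)


# Toy data, invented for illustration only.
synonyms = {"automobile": "car", "quick": "fast"}
weights = {"car": 2.0, "fast": 2.0}
originals = {"a": ["the", "car", "is", "fast"]}
db = build_database(originals, synonyms, weights)
hits = check(["the", "automobile", "is", "quick"], db, synonyms, weights)
```

Because the query reduces to the same canonical words as original `a`, the two fingerprints are identical and `a` is reported as a match, even though the surface wording differs. The linear scan in `check` is for clarity only; a large fingerprint database would need an index.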

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a text plagiarism checking method, device, electronic equipment and storage medium.

Background

[0002] In related technologies, large-scale text similarity calculation applications often store texts as fingerprints and compute similarity by comparing those fingerprints, thereby achieving text plagiarism checking.

[0003] Currently, fingerprints can be stored as byte data, and the distance between fingerprints can be calculated quickly through XOR operations. This reduces fingerprint storage space and improves the efficiency of text duplicate checking. However, because the semantics of the text are not considered when fingerprints are extracted, the extracted fingerprints differ, which leads to a decrease in the...
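The XOR-based distance mentioned in the background can be illustrated with a short Python sketch. It assumes the distance in question is the Hamming distance (the count of differing bits), which is the usual choice for byte-stored fingerprints but is not stated in the text; the function name and sample fingerprints are hypothetical.

```python
def hamming_distance(fp_a: bytes, fp_b: bytes) -> int:
    """Count differing bits between two equal-length byte fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    a = int.from_bytes(fp_a, "big")
    b = int.from_bytes(fp_b, "big")
    # XOR leaves a 1 exactly where the two fingerprints disagree.
    return bin(a ^ b).count("1")


# Two invented 16-bit fingerprints that differ in two bit positions.
fp1 = bytes([0b10110010, 0b00001111])
fp2 = bytes([0b10110000, 0b00001011])
print(hamming_distance(fp1, fp2))  # 2
```

A single XOR and a bit count replace any word-by-word comparison, which is why fingerprint comparison stays cheap even over a large collection.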


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/289, G06F40/216, G06F40/30
CPC: G06F40/289, G06F40/216, G06F40/30
Inventors: 郭峰, 沈矗, 杨宇轩
Owner: BEIJING QIANXIN TECH