Electronic text document plagiarism recognition method based on similar string matching distance

An electronic text recognition method, applied in the fields of electronic digital data processing, special-purpose data processing applications, instruments, etc., that addresses problems such as a large amount of computation, sensitivity to local features, and susceptibility to misjudgment.

Status: Inactive; Publication date: 2010-04-14
WENZHOU UNIVERSITY
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

The statistical model is strongly robust to noise, but it is prone to misjudgment and fails to recognize non-uniform plagiarism.
The matching model recognizes structural plagiarism well and with high accuracy, but it is sensitive to local features and computationally expensive.

Detailed Description of Embodiments

[0050] Refer to Figure 1, which is a structural diagram of a preferred embodiment of the present invention. The system includes a document reading device 101, a document segmenting device 102, a plagiarism identifier 103, a document storage 104, and an output device 105. The document reading device 101, the document segmenting device 102, and the plagiarism identifier 103 are each connected to the document storage 104, and the plagiarism identifier 103 is connected to the output device 105. The document reading device 101 reads a number of electronic text documents from the local computer system, from other computer systems, or from the Internet, and sends them to the document storage 104. The document segmenting device 102 then divides document A and document B, the documents to be examined in the document storage 104, into paragraphs; in practice, this division amounts to inserting paragraph delimiters. In this embodiment, the carriage return character in the document is ...
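The paragraph-division step can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the excerpt is cut off before stating how the carriage return is used, so treating every line break as a paragraph boundary is an assumption, as is dropping blank lines. The delimiter character `\x1e` and the function names `insert_paragraph_delimiters` and `paragraphs_of` are hypothetical.

```python
# Hypothetical paragraph delimiter: the ASCII record separator, chosen because
# it rarely appears in ordinary text.
PARAGRAPH_DELIMITER = "\x1e"


def insert_paragraph_delimiters(text: str) -> str:
    """Assumption: each carriage return / line break ends a paragraph."""
    # Normalize Windows ("\r\n") and old Mac ("\r") line endings to "\n".
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop blank lines so they do not produce empty paragraphs.
    paragraphs = [line.strip() for line in normalized.split("\n") if line.strip()]
    return PARAGRAPH_DELIMITER.join(paragraphs)


def paragraphs_of(delimited: str) -> list[str]:
    """Recover the paragraph list from a delimited document."""
    return delimited.split(PARAGRAPH_DELIMITER) if delimited else []


doc = "First paragraph.\r\nSecond paragraph.\r\n\r\nThird paragraph."
print(paragraphs_of(insert_paragraph_delimiters(doc)))
# prints ['First paragraph.', 'Second paragraph.', 'Third paragraph.']
```

Marking boundaries with a delimiter, rather than immediately splitting into a list, mirrors the description of division as "inserting paragraph delimiters" and lets the divided document be written back to storage as a single string.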

Abstract

The invention relates to a method for identifying plagiarism of electronic text documents. The method identifies plagiarism mainly from the approximate string matching distances of the document's paragraphs. To determine whether document A plagiarizes document B, the specific steps are as follows. First, for each paragraph of document A, the approximate string matching distance and the approximate matching segment in document B are calculated. Second, the retroversion number and the forward jump number are calculated from the approximate matching segments. The retroversion number is the number of times the head of the next approximate matching segment lies before the tail of the previous approximate matching segment, or alternatively the total number of segments passed over backwards. The forward jump number is the number of times the next approximate matching segment lies after the previous one and is separated from it by at least one segment, or alternatively the total number of segments skipped. Finally, the approximate string matching distances, the retroversion number, and the forward jump number are summed; the sum is taken as the plagiarism distance from document A to document B, and if this distance is less than a certain threshold, document A is suspected of plagiarizing document B.
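The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under several assumptions that this excerpt does not confirm: the approximate string matching distance is taken to be Sellers-style edit distance (the minimum Levenshtein distance between a paragraph of A and any substring of B); a matching segment's head and tail are the indices of the B paragraphs containing its first and last characters; the retroversion and forward jump numbers use the "number of times" variant rather than the "total number of segments" variant; and the three quantities are summed without weighting. All function names and the `threshold` parameter are hypothetical.

```python
from bisect import bisect_right


def split_paragraphs(text: str) -> list[str]:
    """Assumption: paragraphs are separated by line breaks; blank lines are dropped."""
    return [p.strip() for p in text.splitlines() if p.strip()]


def approx_match(pattern: str, text: str) -> tuple[int, int, int]:
    """Sellers-style approximate matching (an assumed choice): minimum edit
    distance between `pattern` and any substring text[start:end].
    Returns (distance, start, end)."""
    n = len(text)
    prev = [0] * (n + 1)             # empty pattern matches anywhere at cost 0
    prev_start = list(range(n + 1))  # start offset of the alignment ending at j
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * n
        cur_start = [0] * (n + 1)
        for j in range(1, n + 1):
            sub = prev[j - 1] + (pattern[i - 1] != text[j - 1])
            dele = prev[j] + 1       # skip a pattern character
            ins = cur[j - 1] + 1     # skip a text character
            best = min(sub, dele, ins)
            cur[j] = best
            # Propagate the start offset along the chosen predecessor.
            if best == sub:
                cur_start[j] = prev_start[j - 1]
            elif best == dele:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev, prev_start = cur, cur_start
    end = min(range(n + 1), key=prev.__getitem__)  # first minimal end position
    return prev[end], prev_start[end], end


def plagiarism_distance(doc_a: str, doc_b: str) -> int:
    paras_b = split_paragraphs(doc_b)
    text_b = "\n".join(paras_b)
    starts, offset = [], 0
    for p in paras_b:
        starts.append(offset)
        offset += len(p) + 1         # +1 for the joining "\n"

    def paragraph_of(pos: int) -> int:
        return bisect_right(starts, pos) - 1

    match_distance = 0
    segments = []                    # (head, tail) paragraph indices in B
    for para in split_paragraphs(doc_a):
        d, start, end = approx_match(para, text_b)
        match_distance += d
        segments.append((paragraph_of(start), paragraph_of(max(start, end - 1))))

    retroversions = forward_jumps = 0
    for (_, prev_tail), (next_head, _) in zip(segments, segments[1:]):
        if next_head < prev_tail:             # next segment starts before previous tail
            retroversions += 1
        elif next_head - prev_tail >= 2:      # at least one whole paragraph skipped
            forward_jumps += 1
    return match_distance + retroversions + forward_jumps


def is_suspected_plagiarism(doc_a: str, doc_b: str, threshold: int) -> bool:
    return plagiarism_distance(doc_a, doc_b) < threshold
```

With B = "alpha beta\ngamma delta\nepsilon zeta", copying B's first two paragraphs in order gives a distance of 0, swapping their order adds one retroversion (distance 1), and copying the first and third paragraphs adds one forward jump (distance 1). Each paragraph costs O(|paragraph| x |B|) time, which illustrates the heavy computation attributed to matching models above.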

Description

Technical Field

[0001] The invention belongs to the field of intelligent information processing and computer technology, and in particular relates to a method for identifying plagiarism of electronic text documents using a computer system.

Background Art

[0002] Electronic documents are increasingly common: students submit experiment reports, computer programs, and electronic homework; employees submit work summaries and reflections on political study; and researchers write papers and reports. Because plagiarizing an electronic document is convenient and fast, leaves no trace of copying, and does not require submission in person, plagiarism has become increasingly prominent with the spread of computer networks and office automation software. Electronic plagiarism readily occurs when completing assignments, in scientific research misconduct, when dealing with tasks from higher-level authorities, and so on. The identification of pla...

Claims

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/22
Inventor: 胡明晓 (Hu Mingxiao)
Owner: WENZHOU UNIVERSITY