Rank-Based Text Matching Method for Plagiarism Detection

A text matching method applicable to unstructured text data retrieval, text database clustering/classification, semantic analysis, and similar tasks. It addresses poor detection performance and is reported to improve statistical significance and detection performance.

Active Publication Date: 2021-09-03

HEILONGJIANG INST OF TECH

Cites: 5 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a ranking-based text matching method for plagiarism detection, in order to solve the problem that heuristic methods rely on expert experience, which results in poor detection performance.

Embodiment Construction

[0068] As shown in Figures 1 and 2, this embodiment describes the ranking-based text matching method for plagiarism detection as follows:

[0069] 1 About Plagiarism

[0070] Plagiarism can generally be divided into low-ambiguity plagiarism (such as full copying, partial copying, and simple modification) and high-ambiguity plagiarism (including paraphrase plagiarism, summary plagiarism, cross-language plagiarism, etc.) (Alzahrani et al., 2012). Poor performance on high-ambiguity plagiarism is currently the biggest problem in plagiarism detection, and heuristic methods remain far from satisfactory on it. The main reason is that the vocabulary of a high-ambiguity plagiarized text differs substantially from that of its source text; because very few words match, plagiarized matches are difficult to identify accurately.
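To make the vocabulary-mismatch argument concrete, the minimal sketch below computes word-level Jaccard overlap between a source sentence, a verbatim copy, and a paraphrase. The sentences, the tokenizer, and the choice of Jaccard overlap are illustrative assumptions, not taken from the patent or the PAN corpora.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude tokenizer used only for illustration."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap |A & B| / |A | B| between the word sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


source = "The committee approved the new budget after a long debate."
verbatim = "The committee approved the new budget after a long debate."
paraphrase = "Following lengthy discussion, the panel passed the revised spending plan."

print(f"verbatim copy: {jaccard(source, verbatim):.2f}")    # 1.00
print(f"paraphrase:    {jaccard(source, paraphrase):.2f}")  # 0.06 (only "the" is shared)
```

The paraphrase carries the same meaning but shares a single word with the source, which is why overlap-counting heuristics struggle with high-ambiguity plagiarism.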

[0071] 2 Analysis of Plagiarism Matching Problems

[0072] To i...

Abstract

The invention provides a ranking-based text matching method for plagiarism detection and relates to the technical field of plagiarism detection. To detect high-ambiguity plagiarism, the invention addresses the problem that heuristic methods rely on expert experience and cannot integrate multiple effective features. Plagiarized text matching is formalized as a ranking task: given a suspicious text segment, the method applies a sequence-based learning-to-rank method to obtain the segment of the source document that the suspicious segment most likely plagiarized. The invention introduces METEOR, an evaluation metric from machine translation, to capture both lexical and semantic similarity. The method is evaluated on the PAN 2012 and PAN 2013 plagiarism detection datasets and compared with the best-performing methods in the PAN 2012, 2013, and 2014 evaluations. On the high-ambiguity plagiarism and summary plagiarism subsets, the invention improves the Plagdet score by 22% and 43%, respectively, over the baseline method. The method is also more time-efficient than the baseline.
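The ranking formulation can be sketched as follows: score every candidate source segment against the suspicious segment and return the candidates best-first. This is a simplified illustration under stated assumptions, not the patent's model. The METEOR feature keeps only the exact-unigram module of the original metric (recall-weighted mean 10PR/(R+9P) and fragmentation penalty 0.5(chunks/matches)^3) with a greedy alignment, so it omits the stemming and synonym matching that give METEOR its semantic sensitivity. The two-feature set and the weights in the hypothetical `WEIGHTS` constant are hand-set, whereas the patent learns its ranking model from data.

```python
def meteor_exact(hyp: list[str], ref: list[str]) -> float:
    """Simplified METEOR using exact unigram matches only.

    Assumptions: greedy left-to-right alignment instead of METEOR's
    crossing-minimizing alignment; no stemming or synonym modules.
    """
    used = [False] * len(ref)
    alignment: list[tuple[int, int]] = []  # (hyp index, ref index) pairs
    for i, tok in enumerate(hyp):
        for j, rtok in enumerate(ref):
            if not used[j] and rtok == tok:
                used[j] = True
                alignment.append((i, j))
                break
    m = len(alignment)
    if m == 0:
        return 0.0
    precision, recall = m / len(hyp), m / len(ref)
    # Recall-weighted harmonic mean from the original METEOR: 10PR / (R + 9P).
    f_mean = 10 * precision * recall / (recall + 9 * precision)
    # A chunk is a maximal run of matches that are adjacent in both texts.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if (i1, j1) != (i0 + 1, j0 + 1):
            chunks += 1
    penalty = 0.5 * (chunks / m) ** 3
    return f_mean * (1 - penalty)


def word_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard overlap of the two word sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


# Hypothetical hand-set weights for (METEOR, overlap); the patent instead
# learns its ranking model from training data.
WEIGHTS = (0.7, 0.3)


def rank_candidates(suspicious: str, candidates: list[str]) -> list[tuple[float, str]]:
    """Score every candidate source segment and sort best-first."""
    s = suspicious.lower().split()
    scored = []
    for cand in candidates:
        c = cand.lower().split()
        features = (meteor_exact(s, c), word_overlap(s, c))
        scored.append((sum(w * f for w, f in zip(WEIGHTS, features)), cand))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


susp = "the panel passed the revised budget"
cands = [
    "the committee approved the new budget",
    "weather was mild for the season",
    "the panel passed the revised budget after debate",
]
for score, cand in rank_candidates(susp, cands):
    print(f"{score:.3f}  {cand}")
```

Treating matching as ranking means the method only needs the relative order of candidates, so features on different scales can be combined by a learned scoring function instead of a single hand-tuned similarity threshold.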

Description

Technical Field

[0001] The invention relates to a text matching method for plagiarism detection and belongs to the technical field of plagiarism detection.

Background

[0002] Plagiarized text matching is the core task of plagiarism detection: it aims to obtain the plagiarized fragments that match between a suspicious document and the source document it plagiarized (Potthast et al., 2012a; 2013a; 2014). Researchers have done much work on plagiarized text matching, most of it based on heuristic methods. These methods represent the suspicious and source documents by words or characters, then identify exact or likely plagiarism matches either by counting the characters and words shared by suspicious and source document fragments or by the similarity of their text vectors.

[0003] Such methods perform well on low-ambiguity plagiarism detection but unsatisfactorily on high-ambiguity plagiarism detection. For example, ta...
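The heuristic approach described in the background can be sketched as shared word n-gram counting with a fixed decision threshold. This is a generic illustration, not any specific PAN system: the trigram size, the hypothetical `threshold=0.5` default, and the example sentences are all assumptions. The hand-picked threshold is exactly the kind of expert-tuned parameter the background identifies as a weakness.

```python
def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Set of word n-grams after lowercasing and whitespace tokenization."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngram_ratio(suspicious: str, source: str, n: int = 3) -> float:
    """Fraction of the suspicious fragment's n-grams that also occur in the source."""
    susp_grams = word_ngrams(suspicious, n)
    if not susp_grams:
        return 0.0
    return len(susp_grams & word_ngrams(source, n)) / len(susp_grams)


def heuristic_match(suspicious: str, source: str, n: int = 3, threshold: float = 0.5) -> bool:
    """Flag a match when the shared n-gram ratio reaches a hand-tuned threshold."""
    return shared_ngram_ratio(suspicious, source, n) >= threshold


source = "the committee approved the new budget after a long debate"
near_copy = "the committee approved the new budget after a short debate"
paraphrase = "following lengthy discussion the panel passed the revised spending plan"

print(heuristic_match(near_copy, source))   # True  (6 of 8 trigrams shared)
print(heuristic_match(paraphrase, source))  # False (no trigram shared)
```

The near copy is caught easily, while the paraphrase, which expresses the same content, shares no trigram with the source and is missed.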

Application Information

Patent Type & Authority: Patents (China)

IPC (8): G06F40/30; G06F16/35

CPC: G06F40/30

Inventors: Kong Leilei (孔蕾蕾), Han Zhongyuan (韩中元), Qi Haoliang (齐浩亮)

Owner: HEILONGJIANG INST OF TECH