Text plagiarism detection method and system

A detection method and system, applied in the fields of unstructured text data retrieval and text database querying. It addresses the inaccurate detection of plagiarized content and the inability to adapt to a massive text environment by reducing the size of the index and the number of sentence fingerprints, which improves efficiency.

Pending Publication Date: 2019-07-16
SHENGTING INFORMATION TECH SHANGHAI

AI Technical Summary

Problems solved by technology

[0013] To overcome the defects of existing text plagiarism detection systems, which cannot adapt to massive text environments and cannot accurately detect slightly modified plagiarized content, the present invention provides a text plagiarism detection system.

Examples

Embodiment Construction

[0030] To make the technical problems solved, the technical solutions, and the beneficial effects of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings. The specific embodiments described here only explain the invention and do not limit it. Referring to Figure 2, the invention provides a text plagiarism detection system comprising: a text sentence-splitting module (1), a sentence screening and refining module (2), a sentence fingerprint extraction module (3), a local search engine module (4), and a plagiarized sentence labeling module (5).
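As a rough illustration of how the local search engine module (4) and the plagiarized sentence labeling module (5) could cooperate, the sketch below uses an in-memory inverted index from sentence fingerprint to source document. Everything here is an assumption made for illustration: the class `FingerprintIndex`, the function `label_plagiarism`, and in particular the definition of plagiarism degree as the fraction of a suspect text's fingerprints found in a given source. The patent text shown here does not specify any of these.

```python
from collections import defaultdict


class FingerprintIndex:
    """Hypothetical stand-in for the local search engine module (4)."""

    def __init__(self):
        # fingerprint -> set of source document ids containing it
        self._postings = defaultdict(set)

    def add(self, doc_id, fingerprints):
        for fp in fingerprints:
            self._postings[fp].add(doc_id)

    def lookup(self, fp):
        # .get avoids creating empty entries for unknown fingerprints
        return self._postings.get(fp, set())


def label_plagiarism(index, fingerprints):
    """Return {source doc id: assumed plagiarism degree} for one suspect text.

    Assumption: degree = matched sentence fingerprints / all fingerprints
    of the suspect text.
    """
    hits = defaultdict(int)
    for fp in fingerprints:
        for doc_id in index.lookup(fp):
            hits[doc_id] += 1
    total = len(fingerprints)
    if total == 0:
        return {}
    return {doc_id: count / total for doc_id, count in hits.items()}
```

A real deployment at the scale the patent targets would replace the dictionary with a persistent search index; the lookup-then-count structure is the only part this sketch is meant to convey.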

[0031] The text sentence-splitting module (1) divides a text into several sentences. It treats every symbol in the text that is not a Chinese character, an English letter, or a digit as a separator. Sentence as a unit has a good detec...
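The splitting rule in [0031] can be sketched with a single regular expression. This is a minimal sketch under stated assumptions: "Chinese" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), "English" by ASCII letters, and "number" by ASCII digits. The name `split_sentences` is hypothetical.

```python
import re

# Any run of characters that is NOT a CJK ideograph, an ASCII letter,
# or an ASCII digit is treated as a separator (assumed character ranges).
SEPARATORS = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def split_sentences(text: str) -> list[str]:
    """Split text on separator runs and drop empty fragments."""
    return [part for part in SEPARATORS.split(text) if part]


print(split_sentences("今天天气很好，我们去公园。OK123!"))
```

Under this literal reading, whitespace also counts as a separator, so English prose would be split into individual words; the rule fits Chinese text, where punctuation is the main boundary marker.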

Abstract

The invention discloses a text plagiarism detection method and system. The method reduces the number and length of extracted sentence fingerprints by deleting short sentences and adopting a truncated-character fingerprint mode. Sentence fingerprints are extracted after deleting personal names, place names, organization names, times, and other redundant information from the sentences. This allows accurate detection of slightly changed plagiarized content, for example content in which personal names, place names, or organization names have been altered, and so improves robustness. Compared with traditional text plagiarism detection methods, the technical scheme greatly reduces the computational burden and improves detection speed. It is better suited to quickly finding passages in a file under examination that are the same as or similar to copyrighted originals within a massive (billion-level) collection of original texts, and it outputs all plagiarized texts together with the corresponding plagiarism degree of the file under examination.
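The fingerprinting idea in the abstract can be illustrated as follows. This is only a sketch under several assumptions that the abstract does not confirm: entity strings are supplied by the caller (a real system would need a named-entity recognizer), the short-sentence filter is applied after entity removal, "truncated character" is read as keeping the first `fp_chars` characters, and SHA-256 is used as the hash. The thresholds and the name `sentence_fingerprint` are hypothetical.

```python
import hashlib

MIN_LEN = 6    # assumed minimum sentence length, not taken from the patent
FP_CHARS = 12  # assumed truncation length, not taken from the patent


def sentence_fingerprint(sentence, entities, min_len=MIN_LEN, fp_chars=FP_CHARS):
    """Return a hex fingerprint for a sentence, or None if it is too short."""
    cleaned = sentence
    # Remove longer entity strings first so that substrings of them
    # do not leave fragments behind.
    for entity in sorted(entities, key=len, reverse=True):
        cleaned = cleaned.replace(entity, "")
    if len(cleaned) < min_len:
        return None  # short sentences are dropped
    truncated = cleaned[:fp_chars]  # keep only the leading characters
    return hashlib.sha256(truncated.encode("utf-8")).hexdigest()


a = sentence_fingerprint("张三昨天在上海发表了重要讲话", ["张三", "上海"])
b = sentence_fingerprint("李四昨天在北京发表了重要讲话", ["李四", "北京"])
print(a == b)  # both sentences reduce to the same text once entities are removed
```

Because the personal and place names are stripped before hashing, swapping them does not change the fingerprint, which is the robustness property the abstract describes.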

Description

Technical field

[0001] The invention discloses a text plagiarism detection method and system, and relates to plagiarism detection for a specific text in a massive text environment. Because such detection must process a large amount of text data and perform a large number of matching operations, the corresponding method or system must be fast, accurate, and robust against attempts to evade detection.

Background technique

[0002] In the granted patent "Anti-plagiarism System and Method for Electronic Homework Based on Paragraph Plagiarism Detection" (Application No. 201310631663.9), a vector is generated for each paragraph from information such as the word frequency of keywords obtained by word segmentation, and the cosine function is then used to calculate the similarity between paragraphs. This method can detect plagiarism between paragraphs. If an article plagiarizes multiple articles, all plagiarized articles can be detected. The disadvant...
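For context, the paragraph-vector comparison attributed to the cited prior patent can be sketched as a cosine similarity over term-frequency vectors. This is a generic illustration, not that patent's implementation: tokenization is assumed to have been done already, and the function name `cosine_similarity` is chosen here for illustration.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both paragraphs contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty paragraph is treated as similar to nothing
    return dot / (norm_a * norm_b)


print(cosine_similarity(["text", "plagiarism", "detection"],
                        ["plagiarism", "detection", "system"]))
```

Comparing every paragraph pair this way requires one vector comparison per pair, which is one reason a fingerprint lookup scales better to very large collections.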

Application Information

IPC(8): G06F16/33
CPC: G06F16/3344
Inventor: 张亿光, 郑杰, 王旭
Owner: SHENGTING INFORMATION TECH SHANGHAI