
Cross-linguistic electronic text plagiarism detection system and detection method

An electronic text plagiarism detection technology, applied in the field of intelligent information processing and computer technology, that addresses problems such as mistranslation and synonym substitution, which degrade the quality of text copy detection.

Active Publication Date: 2014-05-28
XI AN JIAOTONG UNIV
Cites: 4 · Cited by: 10

AI Technical Summary

Problems solved by technology

The problem with the machine translation approach is that translation quality has a critical impact on detection results.
At present, the accuracy of machine translation for long passages of text is still very poor.
There is a large gap between machine translation quality and human translation quality.
Therefore, although machine translation converts texts in different languages into a single language, the output contains mistranslations, synonym substitutions, and word-order reversals.
These errors greatly reduce the quality of the subsequent text copy detection.



Examples


Embodiment Construction

[0054] The present invention will be described in detail below in conjunction with the accompanying drawings.

[0055] The invention provides a cross-language electronic text plagiarism detection method, comprising the following steps:

[0056] Step 1: the electronic text to be tested and the reference electronic text are each divided into paragraphs, yielding the paragraph set to be tested and the reference paragraph set.
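The paragraph division in Step 1 might be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_paragraphs` and the rule that paragraphs are separated by blank lines are assumptions, since the source does not say how paragraph boundaries are detected.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split a text into paragraphs.

    Assumption (not from the source): paragraphs are separated by
    one or more blank lines.
    """
    parts = re.split(r"\n\s*\n", text.strip())
    # Drop empty fragments and trim surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]


# Both texts go through the same routine, giving the two paragraph sets.
suspect_paragraphs = split_paragraphs("First paragraph.\n\nSecond paragraph.")
reference_paragraphs = split_paragraphs("Premier paragraphe.\n\n\nDeuxième paragraphe.")
```

Applying one routine to both inputs keeps the paragraph set to be tested and the reference paragraph set comparable in granularity.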

[0057] Specifically, this includes converting the input electronic text to be tested and the reference electronic text into a unified encoding format, such as UTF-8. The electronic texts are natural language texts in Chinese, English, French, German, Russian, Japanese, Spanish, or other languages, rather than audio, video, pictures, or other non-text information. The text to be tested and the reference text are natural language texts in different languages, rather than in a single language.
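The encoding unification described here could look roughly like the sketch below. The helper name `to_utf8` and the list of candidate encodings are hypothetical; the source only states that inputs are converted to a unified format such as UTF-8 and does not describe how the original encoding is determined.

```python
def to_utf8(raw: bytes, candidates=("utf-8", "gb18030", "latin-1")) -> bytes:
    """Re-encode raw bytes as UTF-8.

    Assumption: the source encoding is one of `candidates`, tried in order.
    "latin-1" accepts any byte sequence, so it acts as a last-resort fallback
    and may produce wrong characters for inputs in other encodings.
    """
    for enc in candidates:
        try:
            return raw.decode(enc).encode("utf-8")
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode input with any candidate encoding")


# A GB18030-encoded Chinese text and a UTF-8 French text end up in the same encoding.
chinese_utf8 = to_utf8("中文".encode("gb18030"))
french_utf8 = to_utf8("café".encode("utf-8"))
```

Trying strict encodings first and a permissive one last is a simple heuristic; a production system would more likely rely on declared metadata or a dedicated detection library.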

[005...


Abstract

The invention discloses a cross-language electronic text plagiarism detection system and detection method. The method comprises the following steps: the electronic text to be tested and the reference electronic text are each divided into paragraphs to obtain a paragraph set to be tested and a reference paragraph set; the concepts corresponding to the terms in both paragraph sets are looked up in a cross-language ontology, and the two sets are represented, according to the concepts found, as a multi-concept sequence to be tested and a reference multi-concept sequence; the reference multi-concept sequence sharing the most concepts with the multi-concept sequence to be tested is retrieved; the multi-concept sequences are compared to generate a plagiarism evidence list; the evidence list is merged and sorted to generate a detection result; and the detection result is output and displayed. Because the constructed multi-concept sequences allow the text to be tested and the reference text to be searched thoroughly, detection accuracy is improved.
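The concept lookup and retrieval steps in the abstract can be illustrated with a small sketch. Everything named here is hypothetical: `TOY_ONTOLOGY` stands in for the cross-language ontology, each term maps to a single concept (the patent's multi-concept sequences presumably allow several), tokenization assumes whitespace-delimited languages, and "most common concepts" is read as the size of the set intersection.

```python
import re

# Hypothetical toy ontology: terms from two languages map to shared concept IDs.
TOY_ONTOLOGY = {
    "dog": "C_DOG", "chien": "C_DOG",
    "cat": "C_CAT", "chat": "C_CAT",
    "house": "C_HOUSE", "maison": "C_HOUSE",
}


def to_concepts(paragraph: str, ontology=TOY_ONTOLOGY) -> list[str]:
    """Map a paragraph to its concept sequence; terms not in the ontology are skipped."""
    tokens = re.findall(r"\w+", paragraph.lower())
    return [ontology[t] for t in tokens if t in ontology]


def best_reference(suspect: list[str], references: list[list[str]]) -> tuple[int, int]:
    """Return (index, shared concept count) of the reference sharing the most concepts.

    Raises ValueError if `references` is empty.
    """
    suspect_set = set(suspect)
    scores = [len(suspect_set & set(ref)) for ref in references]
    best = max(range(len(references)), key=scores.__getitem__)
    return best, scores[best]


suspect = to_concepts("The dog sleeps in the house")
references = [
    to_concepts("Le chat dort"),
    to_concepts("Le chien dort dans la maison"),
]
match = best_reference(suspect, references)
```

Because both languages are projected onto the same concept IDs, the English paragraph matches the French one without any machine translation step, which is the motivation the document gives for the concept-based representation.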

Description

Technical field

[0001] The invention belongs to the field of intelligent information processing and computer technology, and in particular relates to a cross-language electronic text plagiarism detection system and a detection method thereof.

Background art

[0002] With the rapid development of information technology, there are massive numbers of electronic texts on the Internet, and the number is still increasing. Protecting the intellectual property rights of electronic texts has become a consensus across all sectors at home and abroad. Text copy detection, also known as text plagiarism detection, is a technology for judging whether a text copies one or more other texts, and it provides technical support for protecting the intellectual property rights of electronic texts. With deepening internationalization, text copying is no longer limited to a single language, and copying by cross-language translation is also very common. Therefore, cross-language text ...


Application Information

IPC(8): G06F17/30, G06F17/27
CPC: G06F16/334, G06F40/247
Inventors: 鲍军鹏, 张昭
Owner: XI AN JIAOTONG UNIV