Method and device for detecting document

A document detection method and detection device, applied in the fields of special data processing applications, instruments, electrical digital data processing, etc., which can solve problems such as detection that is prone to omissions and high processing pressure on servers or computers that affects their working efficiency.

Active Publication Date: 2011-08-17
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
Cites: 1; Cited by: 28

AI Technical Summary

Problems solved by technology

However, such an approach has the following disadvantages. First, querying by the title, author, and word information of the document is prone to omissions: for example, if the title and author information of the document are modified or deleted, or the document is divided into multiple parts, it is impossible to accurately query or compare other documents through the word information…

Examples

Embodiment Construction

[0108] The present invention will be described in detail below in conjunction with various embodiments shown in the drawings. However, these embodiments do not limit the present invention, and any structural, method, or functional changes made by those skilled in the art according to these embodiments are included in the protection scope of the present invention.

[0109] As shown in Figure 1, in an embodiment of the present invention, the document detection method includes:

[0110] S1. Obtain the paragraph feature information corresponding to the document, where the document refers to an electronic file whose main body is text. Preferably, in this embodiment, the document may also be an editable electronic file, for example a txt file, a doc file, etc. By identifying the line breaks in the electronic file, the paragraph information of the document can be obtained, and the document can be divided into one or more paragraphs. In the best embodiment of the ...
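Step S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the document is plain text already decoded into a Python string, treats each non-blank line as one paragraph (the text above says only that paragraphs are found by identifying line breaks), and uses a SHA-1 digest of each whitespace-normalized paragraph as a hypothetical stand-in for "paragraph feature information", which this excerpt does not define. The function names are illustrative.

```python
import hashlib
import re


def split_paragraphs(text: str) -> list[str]:
    """Split a document into paragraphs at line breaks, dropping blank lines."""
    # str.splitlines() recognizes \n, \r\n, \r and other Unicode line boundaries.
    return [line.strip() for line in text.splitlines() if line.strip()]


def paragraph_feature(paragraph: str) -> str:
    """Hypothetical paragraph feature: SHA-1 hex digest of the normalized text."""
    # Collapse runs of whitespace so trivial spacing edits do not change the feature.
    normalized = re.sub(r"\s+", " ", paragraph).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def document_features(text: str) -> list[str]:
    """Step S1 sketch: one feature per paragraph, in document order."""
    return [paragraph_feature(p) for p in split_paragraphs(text)]


if __name__ == "__main__":
    doc = "First paragraph.\r\n\r\nSecond   paragraph,\tspaced oddly.\nThird."
    print(split_paragraphs(doc))
    # ['First paragraph.', 'Second   paragraph,\tspaced oddly.', 'Third.']
    print(len(document_features(doc)))  # 3
```

Hashing a normalized form rather than storing raw text keeps each paragraph's feature fixed-size, which matters for long documents such as novels; the choice of normalization and hash here is an assumption made for illustration.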

Abstract

The invention provides a method for detecting a document. The method comprises the following steps: acquiring paragraph feature information corresponding to the document; comparing the paragraph feature information of the document with the paragraph feature information of at least one existing document; and judging, according to the comparison result, whether there is an existing document similar to the document. Because the document is detected by means of its paragraph feature information, the similarity of documents can be compared accurately, and cheating by splitting a document into sections is prevented; moreover, checking efficiency is higher and the processing pressure on the server is lower. The method can be used to improve online document copyright detection: a document can be checked when it is uploaded, which avoids a later copyright check that would place unnecessary pressure on the server, and copyright detection of existing documents can be processed in bulk, so efficiency is higher.
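The compare-and-judge steps in the abstract can be illustrated with the sketch below. It rests on assumptions not stated in the source: the paragraph feature is a hypothetical SHA-1 of each whitespace-normalized non-blank line, existing documents are held in a hypothetical in-memory inverted index (`ParagraphIndex`) mapping each feature to the IDs of documents containing it, and a candidate counts as similar to an existing document when the share of its distinct paragraphs found in that document reaches an illustrative threshold of 0.8. Measuring the share against the candidate's own paragraphs, rather than the existing document's, is what lets a part split off from a longer document still match, since that part is almost entirely contained in the original.

```python
import hashlib
import re
from collections import Counter, defaultdict


def paragraph_features(text: str) -> set[str]:
    """Hypothetical feature set: SHA-1 of each non-blank, whitespace-normalized line."""
    features = set()
    for line in text.splitlines():
        normalized = re.sub(r"\s+", " ", line).strip()
        if normalized:
            features.add(hashlib.sha1(normalized.encode("utf-8")).hexdigest())
    return features


class ParagraphIndex:
    """Hypothetical inverted index: paragraph feature -> IDs of existing documents."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        """Register the paragraph features of an existing document."""
        for feature in paragraph_features(text):
            self._postings[feature].add(doc_id)

    def similar_documents(self, text: str, threshold: float = 0.8) -> dict[str, float]:
        """Compare a candidate with the indexed documents.

        Returns {doc_id: fraction of the candidate's distinct paragraphs found
        in doc_id} for every document whose fraction is at least `threshold`.
        """
        features = paragraph_features(text)
        if not features:
            return {}
        hits: Counter[str] = Counter()
        # Only documents sharing at least one paragraph are looked up, so
        # documents with nothing in common are never examined.
        for feature in features:
            for doc_id in self._postings.get(feature, ()):
                hits[doc_id] += 1
        total = len(features)
        return {d: n / total for d, n in hits.items() if n / total >= threshold}


if __name__ == "__main__":
    index = ParagraphIndex()
    index.add("novel-1", "Chapter one.\nIt was a dark night.\nThe end.")
    index.add("essay-7", "Unrelated text.\nNothing shared.")
    # A part split off from novel-1, with its title line dropped, is still found.
    print(index.similar_documents("It was a dark night.\nThe end."))  # {'novel-1': 1.0}
    # One shared paragraph out of two (0.5) falls below the 0.8 threshold.
    print(index.similar_documents("Brand new work.\nNothing shared."))  # {}
    # Judgment step: a similar existing document exists if the result is non-empty.
    print(bool(index.similar_documents("The end.")))  # True
```

Because checks go through the index instead of a pairwise scan, the same structure could plausibly serve both the check at upload time and a bulk pass over existing documents that the abstract mentions; the threshold and the containment measure are design choices for this sketch, not values taken from the patent.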

Description

Technical field

[0001] The present invention relates to a document detection method and device, and in particular to a document detection method and device for comparing the similarity of long documents.

Background art

[0002] Usually, document detection methods for document similarity confirm similarity through the title, author, and word information of the document. However, such an approach has the following disadvantages. First, querying by the title, author, and word information of the document is prone to omissions: for example, if the title and author information of the document are modified or deleted, or the document is divided into multiple parts, it is impossible to accurately query or compare other documents through the word information. Second, if the document to be queried is long, such as a novel, querying through the word information is inefficient, and the processing pressure on the server or computer is high...

Application Information

IPC(8): G06F17/22
Inventors: 周纾, 李彦宏, 徐兴军, 张雯
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD