Sentence-level text feature extraction method and document copy detection system
A feature extraction and subsystem technology, applied in the field of copy detection, that can solve problems such as the inability of existing copy detection methods to detect copying
Embodiment Construction
[0046] Suppose the document collection D contains two papers, P1 and P2, and that the third paragraph of P2 is copied from the second paragraph of P1. This paragraph spans sentences S3-S5 in P1 and sentences S6-S8 in P2. The document reading subsystem splits the collection D into the two separate documents P1 and P2. The sentence breaking subsystem divides each document into a set of sentences. The feature extraction subsystem converts each sentence from its text representation into a feature vector and adds it to the inverted index. The copy detection subsystem uses the inverted index to perform copy detection and finds the following pairs of sentences that are copies of each other: (P1S3, P2S6), (P1S4, P2S7), (P1S5, P2S8). After the sequence matching subsystem arranges these copy pairs, it outputs (P1[S3-S5], P2[S6-S8]), that is, the third sentence to the fifth sentence in P1 in the doc...
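The five-stage pipeline in the example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: the sentence breaker (split after `.`, `!` or `?`), the feature representation (a set of lowercase word tokens), the Jaccard similarity test and the `threshold=0.8` value are all hypothetical stand-ins, because the actual sentence-level feature vector and matching rule are not given in this excerpt. The function names are likewise illustrative. The toy documents reproduce the scenario of paragraph [0046], where P2 sentences S6-S8 copy P1 sentences S3-S5.

```python
import re
from collections import defaultdict

def split_sentences(text):
    """Hypothetical sentence breaker: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def sentence_features(sentence):
    """Stand-in feature: the set of lowercase word tokens (not the patent's vector)."""
    return frozenset(re.findall(r"\w+", sentence.lower()))

def build_inverted_index(collection):
    """collection: doc_id -> text. Sentences are numbered from 1 (S1, S2, ...)."""
    index = defaultdict(set)  # token -> {(doc_id, sentence_no)}
    feats = {}                # (doc_id, sentence_no) -> feature set
    for doc_id, text in collection.items():
        for no, sentence in enumerate(split_sentences(text), start=1):
            f = sentence_features(sentence)
            feats[(doc_id, no)] = f
            for token in f:
                index[token].add((doc_id, no))
    return index, feats

def detect_copy_pairs(index, feats, threshold=0.8):
    """Return cross-document sentence pairs whose Jaccard similarity >= threshold."""
    pairs = set()
    for key, f in feats.items():
        # Only sentences sharing at least one token with this one are candidates.
        candidates = set().union(*(index[t] for t in f))
        for cand in candidates:
            # Skip the same document, and report each document pair only once.
            if cand[0] <= key[0]:
                continue
            g = feats[cand]
            if len(f & g) / len(f | g) >= threshold:
                pairs.add((key, cand))
    return pairs

def merge_runs(pairs):
    """Sequence matching: merge pairs whose sentence numbers both advance by 1."""
    open_runs = {}  # (docA, docB, last_i, last_j) -> (first_i, first_j)
    ordered = sorted(pairs, key=lambda p: (p[0][0], p[1][0], p[0][1], p[1][1]))
    for (a, i), (b, j) in ordered:
        # Extend a run ending at (i-1, j-1) if one exists, else start a new run.
        start = open_runs.pop((a, b, i - 1, j - 1), (i, j))
        open_runs[(a, b, i, j)] = start
    return sorted((a, si, i, b, sj, j) for (a, b, i, j), (si, sj) in open_runs.items())

def format_run(run):
    a, si, ei, b, sj, ej = run
    return f"({a}[S{si}-S{ei}], {b}[S{sj}-S{ej}])"

# Toy collection D: P2 sentences S6-S8 are copied from P1 sentences S3-S5.
P1 = ("Alpha beta gamma. Delta epsilon zeta. The cat sat on the mat. "
      "Dogs bark at night. Birds sing in the morning. Omega ends here.")
P2 = ("One two three. Four five six. Seven eight nine. Ten eleven twelve. "
      "Thirteen fourteen fifteen. The cat sat on the mat. "
      "Dogs bark at night. Birds sing in the morning.")

index, feats = build_inverted_index({"P1": P1, "P2": P2})
pairs = detect_copy_pairs(index, feats)
print(sorted(pairs))  # → [(('P1', 3), ('P2', 6)), (('P1', 4), ('P2', 7)), (('P1', 5), ('P2', 8))]
for run in merge_runs(pairs):
    print(format_run(run))  # → (P1[S3-S5], P2[S6-S8])
```

Keying open runs on their last sentence pair lets the sequence-matching step extend several runs between the same two documents independently, so a sentence that is copied twice does not break an adjacent run. Swapping in a different feature vector or similarity measure only changes `sentence_features` and the test inside `detect_copy_pairs`.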