Statistical method and system for text similarity
A technology of text similarity and statistical methods, applied in computing, special data processing applications, instruments, etc., can solve problems such as difficult to accurately reflect the degree of similarity
- Summary
- Abstract
- Description
- Claims
- Application Information
AI Technical Summary
Problems solved by technology
Method used
Image
Examples
Embodiment 1
[0020] figure 1 It is a flowchart of a statistical method for text similarity in an embodiment, including the following steps:
[0021] S110. Acquire text T1 and text T2 for which similarity needs to be determined.
[0022] S120. Separate the text T1 and the text T2 into several natural segments, compare all the natural segments in the text T1 with all the natural segments in the text T2, and record the number of identical natural segments as k3.
[0023] In this embodiment, the number of natural paragraphs in the text T1 is recorded as k1, and the number of natural paragraphs in the text T2 is recorded as k2. i ranges from 1 to k1, j ranges from 1 to k2, compare whether paragraph i of text T1 is the same as paragraph j of text T2, and record the number of identical natural paragraphs as k3.
[0024] S130, delete the same natural segment from the text T1 and the text T2, the text T1 is deleted to obtain the text T3, and the text T2 is deleted to obtain the text T4.
[0025]...
Embodiment 2
[0053] S210. Acquire text T1 and text T2 for which similarity needs to be determined.
[0054] S220. Separate the text T1 and the text T2 into several natural segments, compare all the natural segments in the text T1 with all the natural segments in the text T2, and record the number of identical natural segments as k3.
[0055] In this embodiment, the number of natural paragraphs in the text T1 is recorded as k1, and the number of natural paragraphs in the text T2 is recorded as k2. i ranges from 1 to k1, j ranges from 1 to k2, compare whether paragraph i of text T1 is the same as paragraph j of text T2, and record the number of identical natural paragraphs as k3.
[0056] S230, delete the same natural segment from the text T1 and the text T2, the text T1 is deleted to obtain the text T3, and the text T2 is deleted to obtain the text T4.
[0057] S240. Separate the text T3 and the text T4 into several words, compare all the words in the text T3 with all the words in the text...
Embodiment 3
[0069] S310. Acquire text T1 and text T2 for which similarity needs to be determined.
[0070] S320. Separate the text T1 and the text T2 into several sentences, compare all the sentences in the text T1 with all the sentences in the text T2, and record the number of identical sentences as k3.
[0071] In this embodiment, the number of sentences in the text T1 is denoted as k1, and the number of sentences in the text T2 is denoted as k2. i is from 1 to k1, j is from 1 to k2, compare whether the i-th sentence of the text T1 is the same as the j-th sentence of the text T2, and record the number of identical sentences as k3.
[0072] S330, delete the same sentence from the text T1 and the text T2, the text T1 is deleted to obtain the text T3, and the text T2 is deleted to obtain the text T4.
[0073] S340. Separate the text T3 and the text T4 into several words, compare all the words in the text T3 with all the words in the text T4, and record the number of identical words as k6....
PUM
Abstract
Description
Claims
Application Information
- R&D Engineer
- R&D Manager
- IP Professional
- Industry Leading Data Capabilities
- Powerful AI technology
- Patent DNA Extraction
Browse by: Latest US Patents, China's latest patents, Technical Efficacy Thesaurus, Application Domain, Technology Topic, Popular Technical Reports.
© 2024 PatSnap. All rights reserved.Legal|Privacy policy|Modern Slavery Act Transparency Statement|Sitemap|About US| Contact US: help@patsnap.com