Method and system for detecting and repairing garbled codes in a text document
A technology for detecting garbled codes in text documents, applicable to memory systems, instruments, computing, and similar fields. It addresses the problem that garbled codes cannot be effectively repaired, and achieves repair with small errors.
Examples
Embodiment 1
[0081] The text document garbled code detection and repair method described in this embodiment, as shown in Figure 1, includes the following steps:
[0082] Establishing a coding range library: the coding range library includes the coding ranges formed by all character codes in the encoding format of the text document.
[0083] Determining the character codes: according to the encoding format of the text document, the code of each character in the text document is obtained.
[0084] Garbled code determination: each code is checked against the coding ranges, and any code that falls outside the coding ranges is judged to be a garbled code. The codes from the first garbled code through the last garbled code constitute the garbled interval.
[0085] Garbled code repair: the bytes in the garbled interval that cause the garbled codes are deleted, thereby repairing the text document.
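The detection steps above can be sketched in Python. This is a minimal illustration, not the patented implementation. It assumes a fixed-width two-byte encoding (UTF-16BE) and a deliberately narrow, hypothetical coding range library that covers only the line feed and printable ASCII. The names CODE_WIDTH, CODE_RANGES, read_codes, is_valid and find_garbled_interval are all introduced here for illustration.

```python
CODE_WIDTH = 2  # assumption: every character code occupies two bytes (UTF-16BE)

# Hypothetical coding range library: line feed plus printable ASCII only.
CODE_RANGES = [(0x000A, 0x000A), (0x0020, 0x007E)]


def read_codes(data):
    """Split the document into fixed-width codes; a short trailing unit yields None."""
    codes = []
    for offset in range(0, len(data), CODE_WIDTH):
        unit = data[offset:offset + CODE_WIDTH]
        if len(unit) == CODE_WIDTH:
            codes.append(int.from_bytes(unit, "big"))
        else:
            codes.append(None)  # incomplete code at the end of the document
    return codes


def is_valid(code):
    """A code is valid when it lies inside one of the coding ranges."""
    return code is not None and any(lo <= code <= hi for lo, hi in CODE_RANGES)


def find_garbled_interval(data):
    """Return (start, end) byte offsets from the first to the last garbled code, or None."""
    bad = [i for i, code in enumerate(read_codes(data)) if not is_valid(code)]
    if not bad:
        return None
    return bad[0] * CODE_WIDTH, min((bad[-1] + 1) * CODE_WIDTH, len(data))


clean = "hello".encode("utf-16-be")
damaged = clean[:2] + clean[3:]  # simulate damage: the first byte of 'e' is lost

print(find_garbled_interval(clean))    # None
print(find_garbled_interval(damaged))  # (2, 9)
```

With ranges this narrow, every misaligned code falls outside the library. With a wide range (for example a full CJK block), a misaligned code can still land inside a valid range, so range checking alone may miss some garbled codes.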
[0086] The main reason for the garbled characters...
Embodiment 2
[0092] On the basis of Embodiment 1, the text document garbled code detection and repair method described in this embodiment, as shown in Figure 2, further includes the following steps:
[0093] Establishing a dictionary database: the dictionary contains commonly used words in different languages.
[0094] Decoding: the character codes of the text document obtained from the garbled code repair step are decoded to obtain characters.
[0095] Word segmentation: a word segmentation operation is performed on the decoded text document to obtain a number of garbled-interval words and a number of non-garbled-interval words.
[0096] Setting a threshold T_th.
[0097] Obtaining the comparison result: the same number of garbled-interval words and non-garbled-interval words are taken out and compared with the commonly used words in the dictionary, and determine the garbled interval words and the non-garbled intervals respectively ...
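The dictionary comparison can be illustrated with a short Python sketch. The source text is truncated before it states exactly how T_th is applied, so the decision rule below is an assumption: the repair is accepted when the dictionary hit rate of the garbled-interval words is at most T_th lower than that of the non-garbled-interval words. The tiny dictionary, the threshold value, and the function names are likewise hypothetical.

```python
# Hypothetical miniature dictionary of commonly used words.
COMMON_WORDS = {"the", "text", "document", "is", "repaired", "and", "readable"}

T_TH = 0.5  # hypothetical threshold value


def hit_rate(words, dictionary):
    """Fraction of words that appear in the dictionary."""
    if not words:
        return 0.0
    return sum(word.lower() in dictionary for word in words) / len(words)


def repair_looks_acceptable(garbled_words, clean_words,
                            dictionary=COMMON_WORDS, threshold=T_TH):
    """Compare equal-sized samples of both word groups against the dictionary."""
    n = min(len(garbled_words), len(clean_words))  # take the same number of each
    garbled_rate = hit_rate(garbled_words[:n], dictionary)
    clean_rate = hit_rate(clean_words[:n], dictionary)
    # ASSUMPTION: the source is truncated here; this acceptance rule is a guess.
    return (clean_rate - garbled_rate) <= threshold


clean_words = "the text document is readable".split()
print(repair_looks_acceptable("the documnt is repaired".split(), clean_words))
print(repair_looks_acceptable("xq zzv the kk".split(), clean_words))
```

Whitespace splitting stands in for the word segmentation step here. Languages written without spaces would need a real segmenter.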
Embodiment 3
[0105] On the basis of Embodiment 1 or Embodiment 2, in the text document garbled code detection and repair method described in this embodiment, as shown in Figure 3, the garbled code repair step further includes:
[0106] Byte-by-byte deletion: the bytes in the garbled interval that cause the garbled codes are deleted one at a time, forming a new garbled interval.
[0107] Second comparison and judgment: it is judged whether the codes in the new garbled interval are all within the coding ranges. If so, the repair is complete; otherwise, the method returns to the byte-by-byte deletion step until the repair is complete.
[0108] In the byte-by-byte deletion step, the total number of deleted bytes is less than the number of bytes occupied by one character code.
[0109] Because the destroyed code must be located at the initial position of the garbled interval, starting from the initial position of the garbled interval, one byte is del...
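The deletion loop of this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a fixed-width two-byte encoding (UTF-16BE), a hypothetical coding range library limited to the line feed and printable ASCII, and a garbled interval whose byte offsets are already known. All names are introduced here for illustration.

```python
CODE_WIDTH = 2  # assumption: two bytes per character code (UTF-16BE)

# Hypothetical coding range library: line feed plus printable ASCII only.
CODE_RANGES = [(0x000A, 0x000A), (0x0020, 0x007E)]


def all_codes_valid(data):
    """True when data splits evenly into codes that all lie in the coding ranges."""
    if len(data) % CODE_WIDTH:
        return False
    for offset in range(0, len(data), CODE_WIDTH):
        code = int.from_bytes(data[offset:offset + CODE_WIDTH], "big")
        if not any(lo <= code <= hi for lo, hi in CODE_RANGES):
            return False
    return True


def repair_interval(data, start, end):
    """Delete bytes one at a time from the start of the garbled interval [start, end).

    Returns the repaired document, or None if no repair is found while the
    total number of deleted bytes stays below CODE_WIDTH.
    """
    interval = data[start:end]
    for _ in range(1, CODE_WIDTH):  # at most CODE_WIDTH - 1 deletions
        interval = interval[1:]     # delete one byte at the initial position
        if all_codes_valid(interval):  # second comparison and judgment
            return data[:start] + interval + data[end:]
    return None


clean = "hello".encode("utf-16-be")
damaged = clean[:2] + clean[3:]  # the first byte of 'e' is lost
repaired = repair_interval(damaged, 2, len(damaged))
print(repaired.decode("utf-16-be"))  # hllo
```

The damaged character itself is not recovered. Deleting its leftover byte only restores the alignment of the codes that follow it.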