A method and system for detecting and repairing garbled characters in text documents

A garbled-character detection technology for text documents, applied in memory systems, instruments, computing, etc. It addresses problems such as garbled characters that cannot be effectively repaired, and achieves the effect of small errors.

Inactive Publication Date: 2017-11-28
NEW FOUNDER HLDG DEV LLC +1

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is that, in the prior art, garbled characters are repaired only by converting the encoding format, which cannot effectively repair garbled characters caused by damage to the text document. The invention therefore provides a method and system for detecting and repairing garbled characters in text documents that can effectively repair the garbled characters produced by a damaged text document, thereby improving the user's reading experience.


Examples

Embodiment 1

[0081] The method for detecting and repairing garbled characters in text documents described in this embodiment, as shown in Figure 1, includes the following steps:

[0082] The step of establishing a code range library: the code range library includes the code ranges composed of all character codes in the encoding format of the text document.

[0083] The step of determining the character codes: according to the encoding format of the text document, the code of each character in the text document is obtained.

[0084] The garbled code determination step: each code is checked against the code ranges; a code that falls outside them is judged to be a garbled code, and the codes from the first garbled code to the last garbled code constitute the garbled interval.

[0085] The garbled code repair step: the bytes in the garbled interval that cause the garbled codes are deleted, thereby repairing the text document.

[0086] The main reason for the garbled characters...
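The detection steps of Embodiment 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: it assumes a fixed-width two-byte encoding, and the range check is modelled loosely on GB2312 two-byte codes (first byte 0xA1 to 0xF7, second byte 0xA1 to 0xFE). The names `in_code_range`, `split_codes` and `find_garbled_interval` are hypothetical.

```python
BYTES_PER_CHAR = 2  # assumption: every character code is exactly two bytes


def in_code_range(code):
    """Stand-in for the code range library (modelled on GB2312 two-byte codes)."""
    return (len(code) == BYTES_PER_CHAR
            and 0xA1 <= code[0] <= 0xF7
            and 0xA1 <= code[1] <= 0xFE)


def split_codes(data, width=BYTES_PER_CHAR):
    """Cut the raw bytes into consecutive character codes of fixed width."""
    return [data[i:i + width] for i in range(0, len(data), width)]


def find_garbled_interval(data):
    """Return (start, end) byte offsets spanning the first to the last
    garbled code, or None when every code lies inside the code range."""
    codes = split_codes(data)
    bad = [i for i, code in enumerate(codes) if not in_code_range(code)]
    if not bad:
        return None
    start = bad[0] * BYTES_PER_CHAR
    end = min((bad[-1] + 1) * BYTES_PER_CHAR, len(data))
    return start, end


clean = "文本文档".encode("gb2312")
damaged = clean[:2] + b"\x41" + clean[2:]  # one stray byte after the first character
print(find_garbled_interval(clean))
print(find_garbled_interval(damaged))
```

Note that some misaligned codes inside the interval may happen to fall within the code range; the interval is still bounded by the first and last codes that do not.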

Embodiment 2

[0092] On the basis of Embodiment 1, the method described in this embodiment, as shown in Figure 2, further includes the following steps:

[0093] The step of establishing a dictionary database: the dictionary database contains commonly used words in different languages.

[0094] The decoding step: the character codes of the text document obtained in the garbled code repair step are decoded to obtain characters.

[0095] The word segmentation step: word segmentation is performed on the decoded text document to obtain a number of garbled-interval words and a number of non-garbled-interval words.

[0096] The step of setting a threshold T_th.

[0097] The step of obtaining the comparison result: equal numbers of garbled-interval words and non-garbled-interval words are taken out and compared with the commonly used words in the dictionary, and the garbled-interval words and the non-garbled intervals are respectively determined ...
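One way the dictionary comparison of Embodiment 2 might look is sketched below. The source text is truncated before it says how the threshold is applied, so the decision rule here is purely an assumption: the sketch computes the share of words found in the dictionary for equal-sized samples from both groups, and treats the repair as plausible when the garbled-interval share is no more than `t_th` below the non-garbled share. The function names and the sample dictionary are hypothetical, and the words are assumed to be already segmented.

```python
def hit_rate(words, dictionary):
    """Fraction of the given words that appear in the dictionary."""
    if not words:
        return 0.0
    return sum(1 for w in words if w in dictionary) / len(words)


def repair_looks_plausible(garbled_words, clean_words, dictionary, t_th):
    """Assumed rule: compare equal-sized samples from both groups and accept
    the repair when the hit-rate gap does not exceed the threshold t_th."""
    n = min(len(garbled_words), len(clean_words))
    garbled_rate = hit_rate(garbled_words[:n], dictionary)
    clean_rate = hit_rate(clean_words[:n], dictionary)
    return (clean_rate - garbled_rate) <= t_th


# Hypothetical dictionary and word lists for illustration only.
dictionary = {"the", "text", "document", "is", "repaired"}
clean_words = ["the", "text", "document"]
print(repair_looks_plausible(["is", "repaired", "text"], clean_words, dictionary, 0.2))
print(repair_looks_plausible(["xq", "zzk", "text"], clean_words, dictionary, 0.2))
```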

Embodiment 3

[0105] On the basis of Embodiment 1 or Embodiment 2, in the method described in this embodiment, as shown in Figure 3, the garbled code repair step further includes:

[0106] The byte-by-byte deletion step: the bytes that cause the garbled codes in the garbled interval are deleted one at a time, forming a new garbled interval.

[0107] The second comparison and judgment step: it is judged whether the codes in the new garbled interval all fall within the code range. If so, the repair is complete; otherwise, the process returns to the byte-by-byte deletion step until the repair is complete.

[0108] In the byte-by-byte deletion step, the total number of deleted bytes is less than the number of bytes corresponding to a character code.

[0109] Because the damaged code must be located at the initial position of the garbled interval, starting from the initial position of the garbled interval, one byte is del...
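The deletion loop of Embodiment 3 can be sketched as follows, again assuming a fixed-width two-byte encoding with a GB2312-like range check. The sketch deletes one byte at a time from the start of the garbled interval, re-checks all codes after each deletion, and stops before the number of deleted bytes reaches the width of a character code. `WIDTH`, `in_code_range`, `all_in_range` and `repair_interval` are hypothetical names.

```python
WIDTH = 2  # assumption: every character code is exactly two bytes


def in_code_range(code):
    """Stand-in for the code range library (modelled on GB2312 two-byte codes)."""
    return (len(code) == WIDTH
            and 0xA1 <= code[0] <= 0xF7
            and 0xA1 <= code[1] <= 0xFE)


def all_in_range(data):
    """True when the bytes split evenly into codes that all lie in range."""
    if len(data) % WIDTH != 0:
        return False
    return all(in_code_range(data[i:i + WIDTH])
               for i in range(0, len(data), WIDTH))


def repair_interval(interval):
    """Delete bytes one at a time from the start of the garbled interval,
    re-checking after each deletion. At most WIDTH - 1 bytes are deleted.
    Returns the repaired bytes, or None if no deletion count works."""
    candidate = interval
    for _ in range(1, WIDTH):
        candidate = candidate[1:]  # drop one byte from the initial position
        if all_in_range(candidate):
            return candidate
    return None


garbled = b"\x41" + "本文档".encode("gb2312")  # stray byte at the start
repaired = repair_interval(garbled)
print(repaired.decode("gb2312") if repaired is not None else None)
```

The cap on deleted bytes reflects the condition in [0108]; with a wider character code the loop would simply try more deletions before giving up.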

Abstract

The invention relates to a method and system for detecting and repairing garbled characters in text documents. A code range library is established that includes the code ranges composed of all character codes in the encoding format of the text document. The number of bytes corresponding to each character's code is determined according to the encoding format of the text document, and the code of each character in the text document is obtained accordingly. Each code is checked against the code ranges; a code that falls outside them is judged to be a garbled code, and the codes from the first garbled code to the last garbled code constitute the garbled interval. Several bytes in the garbled interval that cause the garbled codes are deleted, thereby repairing the text document. Because the codes of the garbled interval after byte deletion all fall within the code ranges, the damaged text document is effectively repaired. This is a considerable improvement over the current approach of repairing a text document only by converting its encoding format, which cannot effectively repair it.

Description

Technical field

[0001] The invention relates to a method and system for detecting and repairing garbled characters, in particular to a method and system for detecting and repairing garbled characters in text documents, and belongs to the technical field of word processing.

Background technique

[0002] Garbled characters (such as garbled text on web pages, in documents, etc.) are a problem that often troubles users of terminal devices when reading, and they degrade the reading experience. Garbled characters arise because the terminal device's system lacks support for certain characters, which produces confusing characters that cannot be read normally.

[0003] At present, there are two commonly used recovery methods for garbled characters. One is to manually adjust the region and language settings in the computer's control panel; ... However, both of these recovery methods have the following problems. The first is that users need to perform manual operati...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F9/45
Inventor: 童征宇丁力张鹏
Owner: NEW FOUNDER HLDG DEV LLC