
Chinese text compression method

A Chinese text compression technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the problems noted below (compressed code length, and Chinese differing from English in having no word delimiters), and achieves a high compression rate while taking processor capability into account and reducing hardware costs.

Inactive Publication Date: 2015-03-25
LAUNCH TECH CO LTD
4 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

[0009] The length of the compressed code is significantly reduced; such codes are more likely to appear when compressing documents that contain many proper nouns.
[0010] However, unlike English, Chinese has no delimiters such as spaces to separate words.
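
Because Chinese has no word delimiters, the text must be segmented before phrases can be counted. The source does not say which segmentation algorithm is used; the sketch below assumes a simple dictionary-based forward maximum matching approach. The function name `segment`, the `max_len` parameter, and the `VOCAB` set are all hypothetical and chosen for illustration only.

```python
def segment(text, vocabulary, max_len=4):
    """Greedy forward maximum matching (an assumed technique, not the patent's).

    At each position, take the longest vocabulary entry that matches;
    if none matches, emit a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical vocabulary, for illustration only.
VOCAB = {"中文", "文本", "压缩", "方法"}
print(segment("中文文本压缩方法", VOCAB))
```

Characters that are not covered by the vocabulary fall through as single-character tokens, so every input can still be segmented.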



Examples


Detailed Description of Embodiments

[0029] To further illustrate the technical means adopted by the present invention and their effects, preferred embodiments are described in detail below with reference to the accompanying drawings.

[0030] Referring to Figure 1, the present invention provides a Chinese text compression method comprising the following steps:

[0031] Step 10: build a coding dictionary, comprising:

[0032] Step 101: perform word segmentation on the Chinese text, dividing it into multiple Chinese phrases, each composed of multiple Chinese characters.

[0033] Step 102: count the frequency of each phrase.

[0034] Step 103: encode the phrases and other characters together with a Huffman code according to their frequencies, obtaining a Huffman binary tree, and build the coding dictionary from that tree.

[0035] In the encoding process, phrases with high word frequency are represented with fewer bits, and t...
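
Steps 102 and 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it takes an already segmented token list, counts frequencies, and builds a standard Huffman tree with a heap. The name `build_code_table` and the sample tokens are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_code_table(tokens):
    """Return a Huffman code table {token: bit string} from token frequencies."""
    freq = Counter(tokens)  # Step 102: word frequency statistics
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct token still needs a one-bit code.
        return {next(iter(freq)): "0"}

    tie = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(n, next(tie), tok) for tok, n in freq.items()]
    heapq.heapify(heap)

    # Step 103: repeatedly merge the two least frequent nodes.
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tie), (left, right)))

    table = {}

    def walk(node, prefix):
        # Internal nodes are (left, right) tuples; leaves are token strings.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            table[node] = prefix

    walk(heap[0][2], "")
    return table


# Hypothetical token stream: phrases plus a punctuation character.
sample = ["压缩"] * 5 + ["文本"] * 2 + ["方法"] + ["，"]
table = build_code_table(sample)
for token, code in sorted(table.items(), key=lambda kv: len(kv[1])):
    print(token, code)
```

In this sample the most frequent phrase receives the shortest code and the two rarest tokens receive the longest, which matches the behavior described in paragraph [0035].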



Abstract

The invention relates to a Chinese text compression method. Based on the characteristics of Chinese text, a dictionary compression algorithm is combined with Huffman coding: the Chinese text is segmented into a plurality of Chinese word groups, word frequencies are counted, and word groups with high frequencies are expressed with fewer bits while word groups with low frequencies are expressed with more bits. The Chinese text is thereby compressed at a high compression ratio while both processor capability and memory overhead are taken into account, which lowers hardware cost.
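
To make the bit-length trade-off concrete, the sketch below encodes and decodes a short token sequence with a hand-written prefix-free code table. The table, the token sequence, and the 16-bits-per-character baseline are assumptions made for illustration; the source does not give an actual dictionary or state the uncompressed encoding.

```python
def encode(tokens, table):
    """Concatenate the bit string of each token."""
    return "".join(table[tok] for tok in tokens)


def decode(bits, table):
    """Decode a bit string produced by encode(); relies on the table being prefix-free."""
    inverse = {code: tok for tok, code in table.items()}
    tokens, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            tokens.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return tokens


# Hypothetical code table: the frequent phrase gets the shortest code.
TABLE = {"压缩": "0", "文本": "10", "方法": "110", "，": "111"}
tokens = ["压缩", "文本", "压缩", "方法", "，", "压缩"]

bits = encode(tokens, TABLE)
assert decode(bits, TABLE) == tokens

# Assumed baseline: 16 bits per character in the uncompressed text.
fixed_width_bits = 16 * sum(len(tok) for tok in tokens)
print(len(bits), "bits encoded vs", fixed_width_bits, "bits fixed-width")
```

The comparison ignores the cost of storing the coding dictionary itself, which a real system would need to include.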

Description

Technical field

[0001] The invention relates to the field of data storage, in particular to a Chinese text compression method.

Background

[0002] When handling Chinese text, the text is often so large that the external flash memory can no longer hold it, and replacing the hardware outright increases cost. To solve this problem without replacing the hardware, the Chinese text needs to be compressed. Commonly used large-scale compression algorithms demand high processing power from the processor and cannot be applied in all situations. Chinese text contains a large number of repeated phrases, and therefore a large amount of redundant content. However, the existing Huffman compression algorithm has a low compression rate and can only compress about one-third. Therefore, it is necessary to provide a Chinese text compression method with a high compression rate th...


Application Information

IPC(8): H03M7/30, G06F17/22
Inventors: 刘均, 杨向辉
Owner: LAUNCH TECH CO LTD