Text compression method

A text compression method based on dictionary compression, applicable to electrical components, code conversion, and similar fields. It addresses the low compression ratio and weak adaptability of existing methods, and achieves a high compression ratio while saving memory space and reducing cost.

Pending Publication Date: 2020-04-17
HARBIN UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

The existing Huffman compression algorithm has a low compression ratio, depends on strong statistical characteristics of the data, and adapts poorly. Therefore, it is necessary to provide a text compression method w



Examples


Specific Embodiment 1

[0018] The specific steps of the method of the present invention are as follows:

[0019] Step a, converting the source file into a binary file, applying dictionary compression, and using a hash table as the storage structure for dictionary entries;

[0020] Step b, performing unified encoding on the output of the dictionary compression;

[0021] Step c, processing the file on the hardware platform according to the encoding dictionary.
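
The dictionary compression of step a can be illustrated with a minimal sketch. The source does not name the dictionary algorithm, so the LZW-style scheme below is an assumption, as are the function names dict_compress and dict_decompress. A Python dict, which is a hash table, stands in for the entry storage structure.

```python
def dict_compress(data: bytes) -> list:
    """LZW-style dictionary compression (assumed variant, not taken from the source)."""
    # Hash table of known entries, seeded with every single-byte string.
    table = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit code of the longest known entry
            table[candidate] = len(table)    # store the new entry in the hash table
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def dict_decompress(codes: list) -> bytes:
    """Rebuild the same dictionary while decoding, so it is never transmitted."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Entry being defined by this very step: previous plus its first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid code")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)
```

Repetitive text yields fewer codes than input bytes, because long repeated substrings collapse to single dictionary indices. Those indices would then be the input to the encoding of step b.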

Specific Embodiment 2

[0023] This embodiment differs from Embodiment 1 in that step a, converting the source file into a binary file, defines the following directories:

[0024] src stores the source files;

[0025] include stores the header files;

[0026] lib stores the compiled library files;

[0027] bin stores the compiled executable binary files;

[0028] Step a includes the following steps:

[0029] Step a11, creating a new folder and naming it code;

[0030] Step a12, creating subdirectories under the code directory, named src, include, lib and bin respectively, creating a cmake build file named CMakeLists.txt under the code directory, and writing it according to the following steps:

[0031] Step a121, setting the cmake version and the project name;

[0032] Step a122, setting the compiler and the directory in which the compiled executable binary file is generated, and setting the link directory;

[0033] Step a123, setting the directory where ...
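
The directory layout of steps a11 and a12, together with steps a121 and a122, can be sketched as a short script. The cmake version, the project name and the helper name create_layout are hypothetical, since the source gives none of them. Using lib as the link directory is one reading of step a122. Step a123 and later steps are truncated in the source and are deliberately left out.

```python
from pathlib import Path

# Hypothetical values: the source does not state the cmake version or project name.
CMAKE_VERSION = "3.10"
PROJECT_NAME = "text_compression"

CMAKELISTS = f"""\
# Step a121: cmake version and project name (both assumed)
cmake_minimum_required(VERSION {CMAKE_VERSION})
project({PROJECT_NAME})

# Step a122: executables go to bin; lib is used as the link directory (one reading)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${{PROJECT_SOURCE_DIR}}/bin)
link_directories(${{PROJECT_SOURCE_DIR}}/lib)
"""


def create_layout(root: Path) -> Path:
    """Create code/ with src, include, lib, bin and a CMakeLists.txt (steps a11, a12)."""
    code = root / "code"
    for sub in ("src", "include", "lib", "bin"):
        (code / sub).mkdir(parents=True, exist_ok=True)
    (code / "CMakeLists.txt").write_text(CMAKELISTS)
    return code
```

Calling create_layout(Path(".")) would produce the code directory in the current working directory.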

Specific Embodiment 3

[0050] This embodiment differs from Embodiment 1 or Embodiment 2 in that the finite state entropy coding described in step b uses a single number to store the information before and after compression, which saves memory space, and records information in fractional bits. Finite state entropy coding requires only one natural number, the state, to keep track of its current position. This is achieved by distributing the symbols uniformly rather than assigning each symbol a range, i.e. by putting the information in the least significant position. A state x ∈ N contains log2(x) bits of information. Furthermore, no multiplication or division is required to update the state. The rule for changing the state when processing symbol s is:

[0051]

[0052] Step b includes the following steps:

[0053] Step b1, creating a coding table according to the probability distribution of the symbols in the alphabet;

[0054] Step b2, to shorten the execution time, generating three variables for each symbol, namel...
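
The table-driven state update described in this embodiment can be illustrated with a minimal sketch of table-based finite state entropy coding (tANS). It is not the patent's own procedure: the symbol spreading step, the function names and the example frequencies are assumptions, and the three per-symbol variables of step b2 are not modelled because the source truncates them. The sketch shows a coding table built from symbol frequencies (step b1) and a single integer state updated with shifts and table lookups only.

```python
def build_tables(freqs):
    """Build encode/decode tables; the frequencies must sum to a power of two L."""
    L = sum(freqs.values())
    assert L > 0 and (L & (L - 1)) == 0, "table size must be a power of two"
    # Assumed spreading rule: an odd step visits every slot of a power-of-two table once.
    step = ((L >> 1) + (L >> 3) + 3) | 1
    spread = [None] * L
    pos = 0
    for sym, f in freqs.items():
        for _ in range(f):
            spread[pos] = sym
            pos = (pos + step) & (L - 1)
    enc, dec = {}, {}
    nxt = dict(freqs)              # per-symbol sub-state, runs from f to 2f - 1
    for i, sym in enumerate(spread):
        sub = nxt[sym]
        nxt[sym] += 1
        enc[(sym, sub)] = L + i    # states live in [L, 2L)
        dec[L + i] = (sym, sub)
    return L, enc, dec


def encode(msg, freqs):
    """Return (final_state, bits). Symbols are processed in reverse order."""
    L, enc, _ = build_tables(freqs)
    x, bits = L, []
    for sym in reversed(msg):
        f = freqs[sym]
        while x >= (f << 1):       # shift out low bits until x is in [f, 2f)
            bits.append(x & 1)
            x >>= 1
        x = enc[(sym, x)]          # table lookup, no multiplication or division
    return x, bits


def decode(x, bits, n, freqs):
    """Recover n symbols from the final state and the bit stack."""
    L, _, dec = build_tables(freqs)
    bits = list(bits)
    out = []
    for _ in range(n):
        sym, x = dec[x]
        out.append(sym)
        while x < L:               # pull bits back until x is in [L, 2L) again
            x = (x << 1) | bits.pop()
    return "".join(out)
```

With frequencies such as {"a": 10, "b": 4, "c": 2}, frequent symbols need fewer shifts per step than rare ones, which is how the state accumulates a fractional number of bits per symbol. The bits are consumed in last-in, first-out order, so the decoder starts from the encoder's final state.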


Abstract

The invention discloses a text compression method, which is an application of a dictionary compression algorithm. The method exploits the redundancy and repetitiveness of text, combines finite-state entropy coding with a dictionary compression algorithm, and is implemented on a hardware platform. The text is converted into a binary file and dictionary-compressed, with the entries stored in a hash table. Finite-state entropy coding then uses a single number to store the information before and after compression, which saves memory space, and records information in fractional bits. As a result, the compression ratio is high, the adaptability is strong, the demands on processor capability and memory overhead are low, and the cost is reduced.

Description

Technical field

[0001] The invention relates to the field of data compression, and in particular to a text compression method.

Background art

[0002] In the field of data compression, high-speed data acquisition systems face the problems of large-capacity storage and lossless transmission, such as large space occupation and slow transmission speed. The redundancy of information in massive text data, together with big data processing technology, can be used to generate an encoding dictionary for the efficient storage and transmission of massive text. Solving these problems requires compressing the text. The existing Huffman compression algorithm has a low compression ratio, depends on strong statistical characteristics of the data, and adapts poorly. Therefore, it is necessary to provide a text compression method with a high compression ratio that takes into account processor capability and memory overhead, and change the traditional compression algor...


Application Information

IPC(8): H03M7/30
CPC: H03M7/3059
Inventor: 陈宝远, 叶洪娜
Owner: HARBIN UNIV OF SCI & TECH