Learning text hashing method based on autoencoder

An autoencoder-based, learning-based technology applied in the field of computer information processing. It addresses the high collision rate and long hashing time of traditional hash functions, achieving a lower hash collision rate, shorter hashing time, and higher efficiency.

Active Publication Date: 2022-05-27
GUILIN UNIV OF ELECTRONIC TECH
Cites: 12 | Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] The present invention aims to solve the problems of high collision rate and long hashing time in traditional hash functions, and provides a learning-based text hashing method based on an autoencoder.

Embodiment Construction

[0018] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to specific examples.

[0019] The learning-based text hashing method based on an autoencoder includes the following steps:

[0020] Step 1. Construct a training data set from actually collected text data and/or program-generated text data.

[0021] The program-generated text data is produced as follows. First, obtain the number of characters L of each text to be generated and the number of records N, and specify the path of the file in which the text is saved. In this example, the number of characters is fixed at 128, the number of records is 1000, and the save path is train.txt, so L is 128 and N is 1000. After obtaining these parameters, prepare a text set and generate the N pieces of text data one by one in a loop. Once all N pieces have been generated, exit the loop and save the text data in the text set to the specified path. ...
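The generation loop described in [0021] can be sketched as follows. This is a minimal illustration, not the patent's actual program: the function name `generate_text_dataset`, the character alphabet (ASCII letters and digits), the one-record-per-line file layout, and the optional `seed` parameter are all assumptions, since the source specifies only L, N, and the save path.

```python
import random
import string


def generate_text_dataset(num_chars=128, num_records=1000, path="train.txt", seed=None):
    """Generate num_records random texts of num_chars characters and save them.

    num_chars corresponds to L and num_records to N in the description.
    The alphabet and the one-text-per-line layout are assumptions.
    """
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits  # assumed character set

    text_set = []
    # Generate the N records one by one; the loop ends once N texts exist.
    for _ in range(num_records):
        text = "".join(rng.choice(alphabet) for _ in range(num_chars))
        text_set.append(text)

    # Save the collected texts to the specified path, one record per line.
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(text_set) + "\n")
    return text_set


if __name__ == "__main__":
    texts = generate_text_dataset(num_chars=128, num_records=1000, path="train.txt")
    print(len(texts), len(texts[0]))
```

Passing a fixed `seed` makes the generated data set reproducible, which can help when comparing hash functions on the same inputs.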

Abstract

The invention discloses a learning-based text hashing method based on an autoencoder. First, a training data set is constructed from actually collected text data and/or program-generated text data. Then a hash function model with a 5-layer autoencoder structure is constructed and trained on the training data set. Finally, the text data to be hashed is input into the trained hash function model to obtain its hash value. The present invention uses a machine learning method to build a learned hash function model that hashes text data. Compared with traditional hashing methods, it has a lower hash collision rate, and the hashing time is also greatly improved, which improves the efficiency of text hashing and makes the method suitable for hashing large-scale text data.
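The overall shape of such a model can be sketched as below. This is only a forward-pass illustration under stated assumptions, not the patented model: the layer widths (128, 64, 32, 64, 128), the tanh activation, the character-code input encoding, and the rule of thresholding the bottleneck activations into bits are all hypothetical, because the source says only that the autoencoder has 5 layers. The weights here are random and untrained; training (for example, by minimizing reconstruction error on the training data set) is omitted.

```python
import math
import random

# Assumed layer widths: input, hidden, bottleneck (hash code), hidden, output.
# The source only says "5-layer"; these numbers are hypothetical.
LAYER_SIZES = [128, 64, 32, 64, 128]
BOTTLENECK_LAYER = 1  # index of the weight layer whose output is the bottleneck


def encode_text(text, length=LAYER_SIZES[0]):
    """Turn a text into a fixed-length vector of character codes scaled to [0, 1]."""
    padded = text[:length].ljust(length, "\0")
    return [min(ord(ch), 255) / 255.0 for ch in padded]


def init_layers(sizes, seed=0):
    """Create random (untrained) weights and zero biases for each layer transition."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        bound = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def dense(x, weights, biases):
    """One fully connected layer with tanh activation (assumed activation)."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(text, layers):
    """Return (bottleneck activations, reconstruction) for one text."""
    h = encode_text(text)
    code = None
    for i, (weights, biases) in enumerate(layers):
        h = dense(h, weights, biases)
        if i == BOTTLENECK_LAYER:
            code = h
    return code, h


def hash_text(text, layers):
    """Threshold the bottleneck at zero to get a 32-bit integer (assumed rule)."""
    code, _ = forward(text, layers)
    bits = "".join("1" if v > 0 else "0" for v in code)
    return int(bits, 2)


if __name__ == "__main__":
    layers = init_layers(LAYER_SIZES)
    print(hash_text("example text to be hashed", layers))
```

Only the encoder half is needed at hashing time; the decoder half matters during training, where the reconstruction is compared against the input.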

Description

Technical field

[0001] The invention relates to the technical field of computer information processing, in particular to a learning-based text hashing method based on an autoencoder.

Background technique

[0002] Hashing is a basic technology used for information storage and query. It converts input data of any length, through a hash algorithm and compression mapping, into a fixed-length output called the hash value. The hash value is also known as the message digest. Hash algorithms are widely used in many fields, such as database indexing, blockchain, and information retrieval. A hash function maps a large range to a small range, usually to save space and make data easy to store. A hash function is mainly applied to arrays (e.g., strings), and its output is generally an integer. Generally speaking, hash functions can be divided into the following categories: mathematical operation hash, bit operation hash, table lo...
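To illustrate why mapping a large range to a small range leads to collisions, the sketch below hashes strings with a simple polynomial ("mathematical operation") hash and measures a collision rate. The function names, the base 31, and the definition of collision rate used here (one minus the ratio of distinct hash values to distinct inputs) are assumptions chosen for illustration; the source does not define how it measures collisions.

```python
def poly_hash(text, table_size, base=31):
    """Polynomial string hash reduced modulo table_size (illustrative only)."""
    h = 0
    for ch in text:
        h = (h * base + ord(ch)) % table_size
    return h


def collision_rate(texts, hash_fn):
    """Fraction of distinct inputs that fail to get a hash value of their own.

    Assumed definition: 1 - (distinct hash values / distinct inputs).
    """
    unique_texts = set(texts)
    hashes = {hash_fn(t) for t in unique_texts}
    return 1 - len(hashes) / len(unique_texts)


if __name__ == "__main__":
    texts = [f"record-{i}" for i in range(1000)]
    for size in (100, 1000, 100000):
        rate = collision_rate(texts, lambda t: poly_hash(t, size))
        print(size, rate)
```

With 1000 inputs and only 100 possible hash values, at least 900 inputs must share a value with another input, so the rate cannot fall below 0.9; larger output ranges leave room for fewer collisions.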

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/31, G06N3/04, G06N3/08
CPC: G06N3/08, G06F16/325, G06N3/045
Inventor: 林煜明, 黄正果, 李优, 周娅
Owner: GUILIN UNIV OF ELECTRONIC TECH