Learning-type text hashing method based on an autoencoder

An autoencoder-based, learning-type technique applied in the field of computer information processing. It addresses the high collision rate and long hashing time of traditional hash functions, achieving a low hash collision rate, shorter hash computation time, and improved text hashing efficiency.

Active Publication Date: 2021-09-28
GUILIN UNIV OF ELECTRONIC TECH

AI Technical Summary

Problems solved by technology

[0003] The present invention aims to solve the problems of high collision rate and long hashing time in traditional hash functions by providing a learning-type text hashing method based on an autoencoder.

Embodiment Construction

[0018] To make the objectives, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to specific examples.

[0019] A learning-type text hashing method based on an autoencoder, which specifically includes the following steps:

[0020] Step 1. Construct a training data set from actually collected text data and/or program-generated text data.

[0021] Program-generated text data is produced as follows. First, obtain the number of characters L of each text to be generated and the number of records N, and specify the path of the file in which the data is saved. For example, if the training set to be generated consists of 1000 short texts, each with a fixed length of 128 characters, and the save path is train.txt, then L is 128 and N is 1000. After obtaining the above parameters, prepare a text collection and generate N pieces of text data one by ...
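
The generation procedure in [0021] can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the source is truncated before it says how each record is drawn from the prepared text collection, so uniform random sampling of characters from an alphabet is an assumption, as are the function names `generate_texts` and `save_texts` and the one-record-per-line file layout.

```python
import random
import string


def generate_texts(num_chars, num_records, alphabet=None, seed=None):
    """Generate num_records texts of exactly num_chars characters each.

    Assumption: characters are sampled uniformly at random from `alphabet`.
    The source does not say how the text collection is actually used.
    """
    if alphabet is None:
        alphabet = string.ascii_letters + string.digits
    rng = random.Random(seed)
    return [
        "".join(rng.choice(alphabet) for _ in range(num_chars))
        for _ in range(num_records)
    ]


def save_texts(texts, path):
    """Write one record per line (the file layout is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        for text in texts:
            handle.write(text + "\n")
```

With the example values from the text, `save_texts(generate_texts(128, 1000), "train.txt")` would write N = 1000 records of L = 128 characters each to train.txt.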

Abstract

The invention discloses a learning-type text hashing method based on an autoencoder. The method comprises the following steps: constructing a training data set from actually collected text data and/or program-generated text data; constructing a hash function model with a five-layer autoencoder structure; training the hash function model with the training data set; and inputting the text data to be hashed into the trained hash function model to obtain its hash value. By using machine learning to build a learning-type hash function model for text data, the method achieves a lower hash collision rate and a shorter hash computation time than traditional hashing methods, improves text hashing efficiency, and is suitable for hashing large-scale text data.
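
The shape of such a model can be illustrated with a structural sketch in Python. It shows only a forward pass through a five-layer, autoencoder-shaped network and one possible way to read an integer from the middle layer. Nearly everything specific here is an assumption and is not taken from the source: the layer widths in `LAYER_SIZES`, the tanh activation, the character encoding in `encode_text`, the random untrained weights, and the sign-based readout in `hash_value`. Training, which the method requires, is omitted.

```python
import math
import random

# Hypothetical widths: input, hidden, bottleneck, hidden, output (five layers).
LAYER_SIZES = [128, 64, 16, 64, 128]


def init_layers(sizes, seed=0):
    """Create random (weights, biases) for each pair of adjacent layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def dense(inputs, weights, biases):
    """Fully connected layer followed by tanh (activation is an assumption)."""
    return [math.tanh(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def encode_text(text, length):
    """Map a text to `length` floats in [0, 1), truncating or zero-padding."""
    values = [(ord(ch) % 256) / 256.0 for ch in text[:length]]
    return values + [0.0] * (length - len(values))


def forward(layers, inputs):
    """Return (bottleneck activations, reconstruction)."""
    activations = [inputs]
    for weights, biases in layers:
        activations.append(dense(activations[-1], weights, biases))
    return activations[len(layers) // 2], activations[-1]


def hash_value(layers, text):
    """Assumed readout: one bit per bottleneck unit, taken from its sign."""
    bottleneck, _ = forward(layers, encode_text(text, LAYER_SIZES[0]))
    bits = "".join("1" if a >= 0 else "0" for a in bottleneck)
    return int(bits, 2)
```

With these hypothetical sizes, `hash_value(init_layers(LAYER_SIZES), "some text")` returns an integer below 2**16. A trained model would replace the random weights with learned ones.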

Description

Technical field

[0001] The invention relates to the technical field of computer information processing, in particular to a learning-type text hashing method based on an autoencoder.

Background

[0002] Hashing is a basic technique used in information storage and query. Through a hash algorithm, it compresses input data of any length into a fixed-length output, the hash value, which is also known as the message digest. Hash algorithms are widely used in many fields, such as database indexing, blockchain, and information retrieval. A hash function maps a large range to a small range, often to save space and make data easier to store. It is mainly applied to arrays (for example, strings), and its output is generally of an integer type. Generally speaking, hash functions can be divided into the following categories: mathematical ...
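
To make the background concrete, the sketch below shows one common traditional string hash, a polynomial rolling hash, which maps a string to an integer in a small range, along with a helper that estimates a collision rate over a set of texts. The source does not name a specific baseline hash or define how collision rate is measured. The choice of polynomial hashing, the names `polynomial_hash` and `collision_rate`, and the definition used here (the fraction of texts whose hash value was already produced by an earlier text) are therefore assumptions for illustration.

```python
def polynomial_hash(text, num_buckets, base=31):
    """Map a string to an integer in [0, num_buckets) with a rolling polynomial."""
    value = 0
    for ch in text:
        value = (value * base + ord(ch)) % num_buckets
    return value


def collision_rate(texts, hash_fn):
    """Fraction of texts whose hash value was already taken by an earlier text.

    Assumes the texts are distinct; this definition is an illustrative choice.
    """
    if not texts:
        return 0.0
    seen = set()
    collisions = 0
    for text in texts:
        value = hash_fn(text)
        if value in seen:
            collisions += 1
        else:
            seen.add(value)
    return collisions / len(texts)
```

For example, `collision_rate(texts, lambda t: polynomial_hash(t, 1024))` estimates how often distinct texts share one of 1024 buckets. The same helper could be applied to any other hash function for comparison.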

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N3/04, G06N3/08, G06F16/31
CPC: G06N3/08, G06F16/325, G06N3/045
Inventor: 林煜明, 黄正果, 李优, 周娅
Owner: GUILIN UNIV OF ELECTRONIC TECH