124 results about "Locality-sensitive hashing" patented technology

In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller than the universe of possible input items.) Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.
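
As a rough illustration of the bucketing idea above, the following minimal sketch uses random hyperplanes, one common LSH family for cosine similarity. It is not taken from any of the patents listed here; the function names, the number of bits, and the sample vectors are all assumptions made for the example.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian entries (assumed family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on the non-negative side, else 0."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_buckets(items, planes):
    """Group item names by signature; items with equal signatures share a bucket."""
    buckets = {}
    for name, vec in items.items():
        buckets.setdefault(signature(vec, planes), []).append(name)
    return buckets

# Hypothetical data: "a_scaled" points in the same direction as "a".
planes = make_hyperplanes(num_bits=8, dim=3)
items = {
    "a": [1.0, 2.0, 3.0],
    "a_scaled": [2.0, 4.0, 6.0],
    "b": [-3.0, 0.5, -1.0],
}
buckets = build_buckets(items, planes)
```

Vectors that point in the same direction always receive the same signature here, while vectors separated by a larger angle are more likely to differ in at least one bit and land in different buckets.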

System and method for efficiently finding near-similar images in massive databases

Massive amounts of multimedia data are stored in databases supporting web pages and servers, including text, graphics, video and audio. Searching and finding matching multimedia images can be time-consuming and computationally intensive. A method for storing and retrieving image data includes computing a descriptor, such as a Fourier-Mellin Transform (FMT), corresponding to a multidimensional space indicative of each of the stored images and organizing each of the descriptors according to a set similarity metric. The set similarity metric is based on Locality-Sensitive Hashing (LSH), and orders descriptors near to other descriptors in the database. The set similarity metric employs set theory which allows distance between descriptors to be computed consistent with LSH. A target image for which a match is sought is then received, and a descriptor indicative of the target image is computed. The database is referenced, or mapped, to determine close matches in the database. Mapping includes selecting a candidate match descriptor from among the descriptors in the database and employing a distance metric derived from the similarity metric to determine if the candidate match descriptor is a match to the target descriptor.
Owner:HEWLETT-PACKARD ENTERPRISE DEV LP

Method and Apparatus for Detection of Anomalies in Integrated Parameter Systems

A system, method, and tangible computing apparatus are disclosed for the detection of anomalies in an integrated data network. The system, method and apparatus comprise the construction of a mathematical model that uses multi-dimensional mutual information to detect interactions and interrelationships between pairs of data streams and among pluralities of data streams. Real-time analysis of the operations of an integrated data network is enhanced and expedited by locality sensitive hashing that relies on density determinations of clusters of data.
Owner:SCHNEIDER ELECTRIC SOFTWARE LLC

Distributed data stream clustering method and system

The invention discloses a distributed data stream clustering method and system that overcome the defects of most existing data stream clustering algorithms, which cannot run in a distributed cloud environment, are hard to extend, and have low run-time efficiency. The method includes: summarizing data streams to obtain a plurality of eigenvectors of the data streams; applying a locality-sensitive hashing algorithm to obtain a plurality of clusters, each comprising at least one eigenvector, and selecting at least one cluster as a candidate cluster; and periodically using the candidate cluster to cluster eigenvectors of newly arrived data streams. Because the method and system are based on the locality-sensitive hashing algorithm, they provide better real-time performance than the prior art.
Owner:CHINA INFORMATION TECH SECURITY EVALUATION CENT +1

Content aggregation method based on distributed web crawlers

The invention provides a content aggregation method based on distributed web crawlers, which comprises the steps that firstly different crawler platforms are arranged at different devices, a request is sent to a crawling network information source end, and the crawler platforms fabricate crawling rules according to target information required by a user and crawl information in which the target user is interested; the crawled network information is processed, similarity detection is carried out based on a data transmission and conversion method in a real-time database and by being combined with a locality sensitive hashing (LSH) method so as to reduce the redundancy of the information; and the information is classified and sorted by the system according to the category, the heat and keywords and then displayed on user equipment. According to the method provided by the invention, LSH and similarity comparison are carried out according to the data information acquired in an actual network so as to acquire a comparison result. Compared with a comparison result acquired by adopting a traditional mode of whole data duplication checking in the prior art, the content aggregation method is higher in calculation speed and more accurate in similarity comparison.
Owner:江苏未来网络集团有限公司

LSH (Locality Sensitive Hashing)-based clustering and indexing method and LSH-based clustering and indexing system

Active, CN103631928A: Evenly distributed data; Reduce matching performance instability; Relational databases; Special data processing applications; Data set; Relative stability
The invention relates to an LSH (Locality Sensitive Hashing)-based clustering and indexing method and an LSH-based clustering and indexing system. The LSH-based clustering and indexing method comprises: step 1, carrying out clustering analysis on a data set, dividing the data set into a plurality of categories, and determining the clustering center of each category; step 2, establishing a hash table in each category by adopting an LSH method; step 3, calculating the Euclidean distance between each clustering center and a query point, and selecting the several categories with the smallest Euclidean distances as candidate categories; step 4, calculating the hash value of the query point in each candidate category, and, using the hash tables established in step 2, selecting the data points in the candidate categories whose hash values equal that of the query point as candidate points; step 5, calculating the Euclidean distances between the candidate points and the query point, and taking the candidate point with the smallest Euclidean distance as the nearest neighbor of the query point. The disclosed method and system greatly increase query efficiency and keep query performance relatively stable at the cost of only a small loss of accuracy.
Owner:INST OF INFORMATION ENG CHINESE ACAD OF SCI
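
Steps 2 to 5 of the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the clustering of step 1 is taken as given (each category arrives with its center), and the hash family is a quantized random projection of the p-stable kind, which the abstract does not specify. The names `make_hash`, `build_index`, `query` and the parameters `k`, `w`, `m` are hypothetical.

```python
import math
import random

def euclid(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def make_hash(dim, k, w, rng):
    """Assumed hash family: k quantized random projections floor((a.v + b) / w)."""
    params = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
              for _ in range(k)]

    def h(v):
        return tuple(
            math.floor((sum(ai * x for ai, x in zip(a, v)) + b) / w)
            for a, b in params
        )
    return h

def build_index(categories, dim, k=2, w=4.0, seed=0):
    """Step 2: one hash table per category. categories = [(center, points), ...]."""
    rng = random.Random(seed)
    index = []
    for center, points in categories:
        h = make_hash(dim, k, w, rng)
        table = {}
        for p in points:
            table.setdefault(h(p), []).append(p)
        index.append((center, h, table))
    return index

def query(index, q, m=1):
    """Steps 3-5: pick the m nearest categories, collect bucket mates, rank by distance."""
    nearest = sorted(index, key=lambda entry: euclid(entry[0], q))[:m]
    candidates = [p for _, h, table in nearest for p in table.get(h(q), [])]
    if not candidates:
        return None  # no point shares a bucket with q in the candidate categories
    return min(candidates, key=lambda p: euclid(p, q))

# Hypothetical pre-clustered data (step 1 assumed done elsewhere).
categories = [
    ((0.0, 0.0), [(0.0, 0.0), (1.0, 1.0), (0.5, -0.5)]),
    ((10.0, 10.0), [(10.0, 10.0), (9.0, 11.0)]),
]
index = build_index(categories, dim=2)
result = query(index, (9.0, 11.0), m=1)
```

A query that coincides with an indexed point always finds it, since identical points hash identically. For other queries the result is approximate and may be `None` when the query's bucket is empty, which is the accuracy trade-off the abstract mentions.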

Method for managing metadata of redundancy deletion and storage system based on location sensitive Hash

The invention discloses a method for managing metadata of redundancy deletion and storage system based on location sensitive Hash, which combines the metadata of a similar file data block together rapidly by a location sensitive Hash function, so that when a data block is written into the redundancy deletion and storage system, the method can quickly search whether the data block has existed in the system, improves the metadata search performance of the redundancy deletion and storage system and finally improves the throughput rate of the system. In the method, the query speed, the memory overhead and the redundancy deletion effect of a metadata management system are changed by setting the number of the used location sensitive Hash functions and adjusting the identification rate of similar files. The method can lead the metadata management to be suitable for different demands of the redundancy deletion and storage system, can improve the identification rate of similar files by using a plurality of Hash functions, improves the redundancy deletion capability of the redundancy deletion and storage system and reduces the memory overhead of the metadata index.
Owner:TSINGHUA UNIV

Distributed index method based on LSH (Locality Sensitive Hashing)

The invention relates to a distributed index method based on LSH (Locality Sensitive Hashing). The distributed index method comprises the following steps: firstly utilizing a clustering algorithm to cluster mass data sets; then mapping clustering centers to different computational nodes; then mapping original mass picture or video characteristic data to the computational nodes corresponding to the type so that each node can process one type; finally utilizing the method based on p-stable distribution LSH to establish data index on different nodes. In order to reduce the merging time of search results on different computational nodes and improve the quality of the search results, the invention provides two methods to select m types recently for subsequent detailed search. The invention provides a guide for automatically mapping the mass data to the different computational nodes; moreover, according to the method, the detailed comparison times during the search period of the LSH further can be effectively reduced, so that the search of the mass data is more accurate and efficient.
Owner:NANJING UNIV

Garment attribute retrieval method based on deep convolutional neural network

The present invention discloses a garment attribute retrieval method based on a deep convolutional neural network. The method comprises: employing a rapid convolutional neural network based on an area to perform portrait detection of an input image; employing a pre-training deep convolutional neural network to perform attribute feature extraction, and retaining the features of a final pooling layer; employing a sharing layer to connect with the features retained by the pooling layer, and fusing the feature information of all the attributes; establishing an attribute tree, performing classification of garment attributes, performing branching of the sharing layer according to the classification, each attribute branching being configured for prediction of one group of related attributes; and performing attribute branching output series overlaying, performing normalization, performing similarity measurement through the locality-sensitive hashing method, and obtaining a result. The feature description of the garment attributes is used for garment attribute detection so as to observably improve the accuracy of prediction of garment attributes.
Owner:XIAN JIAOTONG LIVERPOOL UNIV

Locality-sensitive-hashing-based high-dimensional indexing method for large-scale multimedia data

The invention relates to a locality-sensitive-hashing-based high-dimensional indexing method for large-scale multimedia data. The method includes the following steps of extracting high-dimensional features of the multimedia data at the offline indexing stage; establishing an internal storage index, storing the multimedia high-dimensional features in a feature storage area, calculating the locality sensitive hashing vectors of the high-dimensional features, and storing feature numbers and the locality sensitive hashing vectors corresponding to the features in a hashing list storage area, wherein the internal storage index comprises the feature storage area and the hashing list storage area; establishing a first-stage disk index, wherein the first-stage disk index comprises a feature storage area, an index storage area and a plurality of hashing list storage areas; establishing a second-stage disk index, wherein the second-stage disk index comprises a hashing barrel storage area; repeatedly executing the steps mentioned above till all multimedia input is indexed. At the online query stage, features of the multimedia data used for queries are extracted, the queries are conducted on the basis of the established indexes, and similar query results are returned. By means of the method, the scheduling performance of internal storage and disks is improved, and the indexing speed and the retrieving speed of the multimedia data are increased.
Owner:PEKING UNIV

Efficient distributed locality sensitive Hashing method

The invention provides a distributed locality sensitive Hashing method. The method comprises the steps that original data is loaded from a distributed file system, an original data vector set is read, and a first elastic distributed dataset is generated; L composite Hash functions are constructed according to the number L of Hash tables and the number k of Hash functions designed by a user; L Hash values of each piece of data in the dataset are calculated, each piece of data is mapped into one Hash bucket of each Hash table, key value pairs composed of Hash table identifiers in all the data and values of the composite Hash functions are merged into a string, the string is mapped into digital key values, the digital key values and data identifiers form key value pairs, and the key value pairs are saved as a second elastic distributed dataset; and repartitioning is performed according to the digital key value of each piece of data in the second dataset, so that data with the same digital key value is saved in the same partition, and construction of the Hash tables is completed. Through the method, the shuffle amount generated in the Hash table construction process can be reduced, index construction efficiency can be improved, and message transmission overhead can be reduced during query.
Owner:NAT UNIV OF DEFENSE TECH
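
The key construction described in the abstract above can be sketched on a single machine. The sketch assumes sign-of-random-projection hash functions and a SHA-1-based mapping from the string to a numeric key, neither of which the abstract specifies. The distributed dataset is imitated with plain dictionaries, and all names are hypothetical.

```python
import hashlib
import random

def make_composites(L, k, dim, seed=0):
    """Build L composite hash functions, each made of k random hyperplanes (assumed family)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
            for _ in range(L)]

def composite_value(vec, planes):
    """Concatenate the k one-bit hash values into a string such as '1011'."""
    return "".join(
        "1" if sum(p * x for p, x in zip(plane, vec)) >= 0 else "0"
        for plane in planes
    )

def numeric_key(table_id, value):
    """Merge table identifier and composite value into a string, then map it to an integer."""
    merged = f"{table_id}:{value}"
    return int.from_bytes(hashlib.sha1(merged.encode()).digest()[:8], "big")

def build_partitions(data, composites, num_partitions):
    """Place (numeric key, data id) pairs so that equal keys share a partition."""
    partitions = [dict() for _ in range(num_partitions)]
    for data_id, vec in data.items():
        for table_id, planes in enumerate(composites):
            key = numeric_key(table_id, composite_value(vec, planes))
            partitions[key % num_partitions].setdefault(key, []).append(data_id)
    return partitions

# Hypothetical data: "a" and "b" are identical, "c" points the opposite way.
data = {"a": [1.0, 2.0, 3.0], "b": [1.0, 2.0, 3.0], "c": [-1.0, -2.0, -3.0]}
composites = make_composites(L=3, k=4, dim=3)
partitions = build_partitions(data, composites, num_partitions=4)
```

Because the numeric key already encodes both the table and the bucket, one repartitioning pass by key is enough to bring each bucket's members together, which is the property the abstract relies on to reduce shuffle volume.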

Multi-feature locality sensitive hashing (LSH) indexing combination-based remote sensing image retrieval method

The invention discloses a multi-feature locality sensitive hashing (LSH) indexing combination-based remote sensing image retrieval method and belongs to the technical field of remote sensing image retrieval. According to the multi-feature LSH indexing combination-based remote sensing image retrieval method disclosed by the invention, LSH indexing of one of the best indexing technologies in high-dimensional feature spaces is introduced into the field of the remote sensing image retrieval, so that the problems of curse of dimensionality and retrieval time consuming can be effectively solved on a large scale, and the rapid retrieval of remote sensing images is realized. Meanwhile, the invention provides a new indexing validation index-a feature discriminative-ness-based indexing validation index (FDIVI) by aiming at the LSH indexing, and features best capable of distinguishing targets and backgrounds are evaluated and selected by the LSH indexing in all feature spaces, and therefore, the accuracy of a retrieval result is effectively improved. Compared with the prior art, the multi-feature LSH indexing combination-based remote sensing image retrieval method disclosed by the invention is capable of more rapidly and accurately realizing the retrieval of a great amount of remote sensing image data.
Owner:HOHAI UNIV

Multi-label learning design method based on hashing method

The invention discloses a multi-label learning design method based on a hashing method. Through the combination of a hashing algorithm and a multi-label learning algorithm based on Bayesian statistics, the correlation between labels is effectively utilized so as to improve the predicting performance of a multi-label learning model, labels and neighbors of the labels are introduced to computation of the posterior probability through the characteristics of the neighbors, the correlation between the labels is fully considered, and the accuracy of the algorithms is improved; the problem that the label space in multi-label learning of large-scale data is higher in dimension and sparse is solved through an MinHash algorithm; the purpose of learning large-scale data is achieved by finding the neighbors through locality sensitive hashing (LSH), the neighbors can be rapidly and efficiently found, and the expandability of the multi-label learning algorithm is improved.
Owner:NANJING UNIV OF POSTS & TELECOMM

Indexing Method For Multimedia Feature Vectors Using Locality Sensitive Hashing

A computer implemented method for indexing multimedia vectors and for searching and retrieving a query vector using a locality sensitive hashing. Indexing is performed by calculating hash codes from the multimedia vectors using several hash functions. Each hash code is a different subset of the entries in the hash vector. The method utilizes the structure of the hash vector space in order to define the hash codes in a way that improves the retrieval efficiency. Retrieval is performed by applying the hash functions to a query vector and measuring the distances between the query vector and multimedia vectors with hash codes identical to the hash codes of the query vector.
Owner:CORRIGON

Methods for creating and searching a database of speakers

A method of performing a search of a database of speakers, includes: receiving a query speech sample spoken by a query speaker; deriving a query utterance from the query speech sample; extracting query utterance statistics from the query utterance; performing Kernelized Locality-Sensitive Hashing (KLSH) using a kernel function, the KLSH using as input the query utterance statistics and utterance statistics extracted from a plurality of utterances included in a database of speakers in order to select a subset of the plurality of utterances; and comparing, using an utterance comparison equation, the query utterance statistics to the utterance statistics for each utterance in the subset to generate a list of speakers from the database of utterances having a highest similarity to the query speaker.
Owner:MOTOROLA SOLUTIONS INC

Ciphertext image retrieval method and system under a cloud environment

Active, CN108959478A: Reduce feature extraction time; Improving the Efficiency of Encrypted Image Retrieval; Character and pattern recognition; Special data processing applications; Feature vector; Algorithm
The invention discloses a ciphertext image retrieval method and system under a cloud environment. Firstly, a Harris algorithm is optimized in two respects, an adaptive threshold value and feature point pre-screening, and image features are extracted. Secondly, the Harris corner eigenvector of each image is generated by the SURF algorithm and a sack model. Then, a locality sensitive hashing (LSH) algorithm is used to construct a searchable index of feature vectors, and the image and index are encrypted by traditional encryption schemes. Finally, a secure similarity retrieval is performed on a cloud server. The experimental result shows that, by optimizing the Harris corner selection, the feature description of the SURF and the sack model, and the parameters of the locality sensitive hashing algorithm, the proposed retrieval scheme shortens the feature extraction time and effectively improves the encrypted image retrieval efficiency compared with existing conventional encrypted retrieval schemes.
Owner:CENTRAL SOUTH UNIVERSITY OF FORESTRY AND TECHNOLOGY

Method for detecting repetition data of social media

The invention discloses a method for detecting duplicate data in social media. The method comprises the following steps: dividing each piece of text data in the social data into multiple text elements that form a set corresponding to that text data; using a hash function to map all text elements in the set to hash values and taking the minimum hash value, and repeating the mapping multiple times to obtain an array of minimum hash values that serves as the minimum hash signature of the text data; using a locality-sensitive hashing algorithm to map text elements of each minimum hash value to different detection queues; and calculating the Jaccard similarity between any two text elements in the same detection queue. Text elements whose Jaccard similarity exceeds a threshold are determined to be duplicate data. The method can increase the efficiency of duplicate detection for large texts.
Owner:EAST CHINA NORMAL UNIV
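
The pipeline in the abstract above (element sets, minimum hash signatures, detection queues, Jaccard check) might look like the sketch below. The word-trigram elements, the MD5-based hash family, and the band layout used to form the queues are assumptions for illustration; the abstract does not fix any of them.

```python
import hashlib
from itertools import combinations

def shingles(text, n=3):
    """Split a text into a set of word n-grams (the 'text elements', assumed form)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def h(seed, element):
    """Seeded hash of one element; each seed acts as a separate hash function."""
    return int.from_bytes(hashlib.md5(f"{seed}:{element}".encode()).digest()[:8], "big")

def minhash_signature(elements, num_hashes=20):
    """Array of minimum hash values, one per hash function."""
    return [min(h(seed, e) for e in elements) for seed in range(num_hashes)]

def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)

def find_duplicates(texts, num_hashes=20, bands=10, threshold=0.5):
    """Queue texts by signature band, then verify queue mates with exact Jaccard."""
    sets = {tid: shingles(t) for tid, t in texts.items()}
    rows = num_hashes // bands
    queues = {}
    for tid, s in sets.items():
        sig = minhash_signature(s, num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            queues.setdefault(key, set()).add(tid)
    pairs = set()
    for members in queues.values():
        for x, y in combinations(sorted(members), 2):
            if jaccard(sets[x], sets[y]) > threshold:
                pairs.add((x, y))
    return pairs

# Hypothetical posts: "a" and "b" are identical, "c" is unrelated.
texts = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog",
    "c": "completely unrelated sentence about locality sensitive hashing",
}
duplicates = find_duplicates(texts)
```

Only texts that share a queue are compared pairwise, so the expensive Jaccard computation is skipped for most dissimilar pairs. The final threshold check removes any false candidates that collide by chance.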

Method and device for obtaining similar object set and providing similar object set

The invention discloses a method and device for obtaining similar object set and providing similar object set. The method comprises as follows: obtaining input file comprising M objects, N attributes, attribute values corresponding to each attribute; inputting each attribute to first level of pre-created minimum hash function minhash, obtaining the returned value of the first level of minhash of each attribute; according to each attribute, weighted value corresponding to the attribute in the current object and the second level of pre-created minhash function, obtaining the returned value of the second level of the minhash of each attribute; calculating the combined minhash value of each attribute in each object respectively; determining the minimum value of the combined minhash value corresponding to each attribute of the same object as the minhash value of the object; circularly executing the operation to each object for K times, respectively obtaining K minhash values in allusion to each object; inputting K minhash values of each object to the locality sensitive hashing (LSH) computing framework. The method and device are capable of improving the operating efficiency, and improving the validity and accuracy degree of the similar object information.
Owner:ALIBABA GRP HLDG LTD

Content duplicate detection method of network image based on GIST (generalized search tree) global feature and SIFT (scale-invariant feature transform) local feature

The invention discloses a content duplicate detection method of a network image based on a GIST global feature and an SIFT local feature. The method comprises the steps as follows: the GIST global feature and the SIFT local feature of an image are extracted respectively; the GIST global feature is used for identifying the image global invariance; the SIFT local feature is used for identifying the image local invariance; and indexing and matching image features are constructed for the two types of the features with an LSH (Locality Sensitive Hashing) algorithm, and content duplicate detection is performed.
Owner:北京明日时尚信息技术有限公司

Locality-sensitive hash-based detection of malicious codes

Malicious code is detected in binary data by disassembling machine language instructions of the binary data into assembly language instructions. Opcodes of the assembly language instructions are normalized and formed into groups, with each group being a subsequence of a sequence of machine language instructions of the binary data. The subsequence is delimited by a predetermined machine language instruction. Locality-sensitive hashes are calculated for each group and compared to locality-sensitive hashes of known malicious machine language instructions to detect malicious code in the binary data.
Owner:TREND MICRO INC
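
The grouping and hashing steps in the abstract above can be illustrated with a small sketch. It assumes the binary has already been disassembled into a list of opcode mnemonics, uses `ret` as the delimiting instruction, normalizes by lowercasing only, and substitutes a 64-bit SimHash over opcode bigrams for the locality-sensitive hash. None of these choices come from the patent, and the function names are hypothetical.

```python
import hashlib

def split_groups(opcodes, delimiter="ret"):
    """Cut the opcode sequence into groups, each ending at the delimiter instruction."""
    groups, current = [], []
    for op in opcodes:
        op = op.lower()  # minimal normalization (assumption)
        current.append(op)
        if op == delimiter:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

def simhash(tokens, bits=64):
    """SimHash: per-bit vote over token hashes; similar token sets give nearby hashes."""
    counts = [0] * bits
    for t in tokens:
        hv = int.from_bytes(hashlib.md5(t.encode()).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (hv >> i) & 1 else -1
    return sum(1 << i for i, c in enumerate(counts) if c > 0)

def group_hash(group):
    """Hash a group through its opcode bigrams (falls back to single opcodes)."""
    bigrams = [f"{a} {b}" for a, b in zip(group, group[1:])] or group
    return simhash(bigrams)

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def is_suspicious(group, known_hashes, max_distance=8):
    """Flag a group whose hash is within max_distance bits of a known-bad hash."""
    gh = group_hash(group)
    return any(hamming(gh, k) <= max_distance for k in known_hashes)

# Hypothetical opcode stream and a known-bad hash derived from its first group.
stream = ["PUSH", "mov", "xor", "ret", "call", "add", "ret"]
groups = split_groups(stream)
known = {group_hash(groups[0])}
```

Comparing hashes by Hamming distance rather than equality is what lets slightly modified instruction sequences still match a known sample. The distance threshold here is arbitrary and would need tuning on real data.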

Method and device for retrieving similarity of picture messages

The invention discloses a method and a device for retrieving the similarity of pictures and belongs to the field of graphics and images. The method comprises the steps of obtaining the picture characteristics of to-be-retrieved pictures, hashing those characteristics with an LSH (Locality Sensitive Hashing) algorithm to generate hash values of the to-be-retrieved pictures, finding, in the hash table corresponding to each hash value of the to-be-retrieved pictures, database picture hash values that match or are similar to it, finding database pictures according to the database picture hash values, and, according to the Euclidean distances between the characteristics of the database pictures and those of the to-be-retrieved pictures, selecting a preset number of database pictures with the smallest Euclidean distances. The method and the device solve the problem that a user currently cannot retrieve the most similar pictures from the many results returned by the LSH algorithm, and, by combining the LSH algorithm with a linear retrieval algorithm, make better use of the advantages of LSH in reducing the time and space complexity of picture similarity retrieval and in supporting high-dimensional data retrieval.
Owner:BEIJING FEINNO COMM TECH

Method for Web service clustering

The invention discloses a method for Web service clustering which comprises a Web service library, a main control device and a tag library. The method comprises the following steps: step 1, using a vector space model (VSM) method to convert Web services into vector sets; step 2, determining the weight of the Web services according to the application demand; and step 3, using a locality sensitive hashing (LSH) method to cluster the vector sets. Compared with the prior art, the method has the following beneficial effects: 1, because the clustering targets web services description language (WSDL) files, compatibility with existing protocols and technologies is maintained; 2, the method is much more efficient than K-means and similar methods, and the advantage grows with the dimension of the Web service vector space; and 3, the Web service clustering result can be used for discovering and composing Web services, which gives the method stronger universality and strong backward compatibility.
Owner:ZHEJIANG UNIV

Malicious software detection method and device

The invention provides a malicious software detection method and device, relates to the field of computer system security and solves the problems that a dynamic detecting method is insufficient in expandability, and the detection result is short of accuracy. The method comprises the steps of calculating a unique digital signature of malicious software to be detected; calculating a target content fingerprint vector quantity of the malicious software to be detected; constituting a nearest neighbor set of the target content fingerprint vector quantity and generating a query set of the content fingerprint vector quantity; getting access to a preset location-sensitive hash table data structure according to the query set of the content fingerprint vector quantity and obtaining a candidate result set; selecting variant software of the malicious software to be detected in the candidate result set. According to the technical scheme of the malicious software detection method and device, the method and device are applicable to protection against variants of the malicious software, and the malicious software detection based on a location-sensitive hash table is achieved.
Owner:BEIJING VENUS INFORMATION SECURITY TECH +1

Power grid state similarity quantitative analyzing method based on locality sensitive hashing

The invention provides a power grid state similarity quantitative analyzing method based on locality sensitive hashing. The method comprises the following steps: power grid state information is read, power grid key information is counted, and key information is formed into signature vectors representing power grid states; angles among the signature vectors are computed; and power grid state similarity is judged. The method is applicable to the locality sensitive hashing (LSH in short) technology of an electrical power system. Power grid original state data are converted into the corresponding signature vectors, and the power grid states of corresponding moments are expressed via the signature vectors. On the aforementioned basis, computation of degree of similarity of the power grid states is realized by utilizing similarity of the signature vectors. Large data of power grid state information are converted into the signature vectors of small data so that analysis of overall data is avoided.
Owner:STATE GRID CORP OF CHINA +2
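
The angle computation at the core of the abstract above is simple to sketch. The signature vectors, the angle threshold, and the function names below are made up for illustration; how the patent derives signature vectors from grid state data is not reproduced here.

```python
import math

def angle(u, v):
    """Angle in radians between two non-zero signature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return math.acos(cos)

def similar_states(u, v, max_angle=0.1):
    """Treat two grid states as similar when their signature angle is small (assumed rule)."""
    return angle(u, v) <= max_angle

# Hypothetical signature vectors for three moments in time.
state_t0 = [1.0, 2.0, 0.5]
state_t1 = [2.0, 4.0, 1.0]   # same direction as state_t0
state_t2 = [-2.0, 1.0, 0.0]  # orthogonal to state_t0
```

The connection to LSH is that, for the random-hyperplane family, two vectors separated by angle θ receive the same hash bit with probability 1 − θ/π, so a small angle corresponds to a high collision rate.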

Indoor positioning method for efficient privacy protection based on Wi-Fi fingerprint

The invention provides an indoor positioning method for efficient privacy protection based on Wi-Fi fingerprint. The indoor positioning method comprises the following steps: at first, collecting the fingerprint of each position indoors and generating an index set; and then, transmitting the index set to a client through a wireless network, for enabling a user to look up the position information per se, wherein the fingerprint refers to a RSS signal of each position corresponding to each Wi-Fi access point, and the index set comprises multiple hash tables, parameters of a corresponding function group of each hash table and a position coordinate of each fingerprint marked by a fingerprint serial number. According to the indoor positioning method provided by the invention, a method of using a position sensitive hash function to encrypt position privacy is adopted to achieve high efficiency and privacy protection properties. The positioning is redesigned to achieve the purposes of reducing the computing time and space, improving the positioning precision and protecting the privacy of the position information of a user and a server.
Owner:广州高新兴机器人有限公司

K nearest neighbor approximation query method based on multi-layer locality sensitive hashing

The invention belongs to the field of data analysis, and relates to a k nearest neighbor approximation query method based on multi-layer locality sensitive hashing. The method comprises the following steps: firstly evaluating the number of data points mapped to each hash bucket, determining overloaded and underloaded hash buckets according to the number of data points in each bucket, then further hashing and dividing each overloaded hash bucket into a plurality of sub-buckets while merging the underloaded hash buckets; and recursively re-hashing the sub-buckets that are still overloaded after re-division, so that the sizes of the hash buckets are balanced as much as possible after multiple rounds of re-hashing. The LSH index structure thereby becomes a multi-layer tree-like structure. By reconstructing the initially built LSH hash table, the method improves the kNN search efficiency for query points in dense regions and the kNN search accuracy for query points in sparse regions. The hash buckets in the multi-layer locality sensitive hash structure are relatively uniform in size, and the advantages are very obvious when kNN search is carried out on big data that is skewed across the hash buckets.
Owner:NORTHEASTERN UNIV LIAONING
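
The recursive re-hashing of overloaded buckets described above can be sketched as a small tree builder. The sketch covers only the splitting half (merging of underloaded buckets is omitted), uses a random hyperplane as the re-hash function, and caps recursion depth; all of these are assumptions, as are the names `split_bucket`, `lookup` and `leaf_sizes`.

```python
import random

def side(vec, plane):
    """Which side of the hyperplane vec falls on: 1 for non-negative, else 0."""
    return 1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0

def split_bucket(points, dim, max_size, rng, depth=0, max_depth=8):
    """Re-hash a bucket into two sub-buckets until each holds at most max_size points."""
    if len(points) <= max_size or depth >= max_depth:
        return {"points": points}  # leaf bucket
    plane = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    halves = ([], [])
    for p in points:
        halves[side(p, plane)].append(p)
    children = [split_bucket(half, dim, max_size, rng, depth + 1, max_depth)
                for half in halves]
    return {"plane": plane, "children": children}

def lookup(node, q):
    """Follow the query down the tree to the leaf bucket it hashes into."""
    while "plane" in node:
        node = node["children"][side(q, node["plane"])]
    return node["points"]

def leaf_sizes(node):
    """Sizes of all leaf buckets, useful for checking how balanced the tree is."""
    if "plane" not in node:
        return [len(node["points"])]
    return [size for child in node["children"] for size in leaf_sizes(child)]

# Hypothetical overloaded bucket of 200 two-dimensional points.
data_rng = random.Random(1)
points = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(200)]
tree = split_bucket(points, dim=2, max_size=20, rng=random.Random(2))
```

A query descends through the same hyperplane tests that placed the data, so it only scans one small leaf instead of the original overloaded bucket. The depth cap means a leaf can still exceed `max_size` when points are hard to separate.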

Storage method for redundancy deletion block device based on location-sensitive hash

The invention discloses a storage method for a redundancy deletion block device based on location-sensitive hash, and belongs to the data storage field. The method comprises the following steps: putting data blocks of redundant writing detection operation and a corresponding digital finger print into the current operating queue; D: judging whether the number of the data blocks in the queue exceeds threshold value or not, if so, taking threshold value of data blocks as a data section, and executing the step F, and otherwise, executing the step E; E: judging whether the data block at the front of the queue is overtime or not, if so, taking the data blocks as the data section, and executing the step F, and otherwise, executing the step D; F: judging whether the set of metadata of similar data sections exists or not, if so, executing the step G, and otherwise, establishing an empty set, and executing the step G; and G: orderly judging whether digital finger prints of data blocks exist in the set of metadata of the similar data sections or not, if so, modifying the memory addresses of the data blocks, and otherwise, generating the metadata of the data blocks. The method reduces the time of accessing the metadata in the redundant writing detection operation process.
Owner:TSINGHUA UNIV

High efficiency clustering method based on locality-sensitive hashing and non-parametric Bayes method

The invention relates to a high efficiency clustering method based on a locality-sensitive hashing (LSH) and non-parametric Bayes method; the high efficiency clustering method can effectively process mass sequence data, such as 16s rRNA and 18s rRNA data; a high efficiency partitioning iteration method can prevent contrast of mass non-similar sequences, so the clustering result can be fast given for a large scale data set clustering problem; the high efficiency clustering method is the most efficient method for processing the large scale clustering problem in existing bioinformation field; in addition, a DP-means algorithm can more accurately estimate the cluster center, so the clustering result by the novel method can ensure very high accuracy.
Owner:TSINGHUA UNIV