33 results about "MinHash" patented technology

In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder (1997), and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words.
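
As a minimal sketch of the idea (not taken from any of the patents listed here): for a random hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity, so the fraction of matching minima over many hash functions estimates it. The function names and the use of seeded BLAKE2b as a stand-in for random permutations are assumptions made for illustration.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item` under hash function number `seed`."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes: int = 128) -> list:
    """Keep the minimum hash value of the set under each seeded hash function."""
    items = set(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of positions where the two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b) -> float:
    """Exact |A ∩ B| / |A ∪ B|, for comparison with the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc_a = set("the quick brown fox jumps over the lazy dog".split())
    doc_b = set("the quick brown cat sleeps near the lazy dog".split())
    sig_a = minhash_signature(doc_a, 256)
    sig_b = minhash_signature(doc_b, 256)
    print("estimated:", estimate_jaccard(sig_a, sig_b))
    print("exact:    ", exact_jaccard(doc_a, doc_b))
```

More hash functions give a tighter estimate at the cost of a longer signature; the standard error shrinks roughly with the square root of the signature length.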

Hashing techniques for data set similarity determination

Methods, systems and computer program product embodiments for hashing techniques for determining similarity between data sets are described herein. A method embodiment includes initializing a random number generator with a weighted min-hash value as a seed, wherein the weighted min-hash value approximates a similarity distance between data sets. A number of bits in the weighted min-hash value is determined by uniformly sampling an integer bit value using the random number generator. A system embodiment includes a repository configured to store a plurality of data sets and a hash generator configured to generate weighted min-hash values from the data sets. The system further includes a similarity determiner configured to determine a similarity between the data sets.
Owner:GOOGLE LLC

Multi-label learning design method based on hashing method

The invention discloses a multi-label learning design method based on a hashing method. By combining a hashing algorithm with a multi-label learning algorithm based on Bayesian statistics, the method exploits the correlation between labels to improve the predictive performance of the multi-label learning model. Labels and their neighbors are introduced into the computation of the posterior probability through the characteristics of the neighbors, so that label correlation is fully considered and the accuracy of the algorithm is improved. A MinHash algorithm addresses the high dimensionality and sparsity of the label space in multi-label learning on large-scale data. Locality-sensitive hashing (LSH) finds neighbors rapidly and efficiently, which makes learning on large-scale data feasible and improves the scalability of the multi-label learning algorithm.
Owner:NANJING UNIV OF POSTS & TELECOMM

Method and device for obtaining similar object set and providing similar object set

The invention discloses a method and device for obtaining and providing a similar object set. The method comprises: obtaining an input file containing M objects, N attributes, and the attribute value corresponding to each attribute; feeding each attribute into a pre-created first-level minimum hash (minhash) function to obtain the first-level minhash value of that attribute; obtaining the second-level minhash value of each attribute from the attribute, its weight in the current object, and a pre-created second-level minhash function; computing the combined minhash value of each attribute in each object; taking the minimum of the combined minhash values over the attributes of an object as the minhash value of that object; repeating this operation K times per object to obtain K minhash values for each object; and feeding the K minhash values of each object into a locality-sensitive hashing (LSH) computing framework. The method and device improve operating efficiency as well as the validity and accuracy of the similar object information.
Owner:ALIBABA GRP HLDG LTD
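
The abstract above does not say how the two minhash levels are combined, so the following sketch does not reproduce it. As a stand-in it uses a textbook way of folding integer weights into MinHash: an attribute with weight w is expanded into w replicas, and the minimum is taken over all replicas. This estimates the weighted Jaccard similarity (sum of minimum weights over sum of maximum weights). All names are hypothetical, and weights are assumed to be positive integers.

```python
import hashlib


def _hash(seed: int, attr: str, replica: int) -> int:
    """Deterministic 64-bit hash of one (attribute, replica) pair under a seed."""
    data = f"{seed}|{attr}|{replica}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def weighted_minhash(obj: dict, k: int = 128) -> list:
    """Return K minhash values for an object given as {attribute: integer weight}.

    Assumption: weights are positive integers. An attribute with weight w
    contributes w replicas, so heavier attributes are more likely to hold
    the minimum.
    """
    if not obj or any(w < 1 for w in obj.values()):
        raise ValueError("object must have at least one attribute with weight >= 1")
    return [
        min(_hash(seed, attr, j) for attr, w in obj.items() for j in range(w))
        for seed in range(k)
    ]


def estimate_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of the K positions on which two objects agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def weighted_jaccard(a: dict, b: dict) -> float:
    """Exact weighted Jaccard: sum of min weights over sum of max weights."""
    keys = set(a) | set(b)
    num = sum(min(a.get(x, 0), b.get(x, 0)) for x in keys)
    den = sum(max(a.get(x, 0), b.get(x, 0)) for x in keys)
    return num / den


if __name__ == "__main__":
    item_a = {"color": 3, "size": 1, "brand": 2}
    item_b = {"color": 1, "size": 1, "shape": 2}
    print("estimated:", estimate_similarity(weighted_minhash(item_a, 512),
                                            weighted_minhash(item_b, 512)))
    print("exact:    ", weighted_jaccard(item_a, item_b))
```

The K values per object produced this way could then be passed to an LSH stage, as the abstract describes for its own minhash values.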

A secure retrieval method for large-scale images in cloud environment

The invention belongs to the field of multimedia information security, and in particular relates to a secure image retrieval method that combines a bag-of-words model with the minimum hash principle and can be used for secure retrieval over large-scale image collections. The content owner combines the bag-of-words model with the minimum hash principle to construct a secure index of the image features. Noise index vectors are introduced into the secure index data set of image features, and the index vectors corresponding to some visual words are randomly extracted to construct the secure index table. The secure index table and the encrypted images are uploaded to the cloud server. When a user requests retrieval, the cloud server searches only the index table according to the index information of the query image, and the user obtains the images to be retrieved according to the similarity between index vectors. This retrieval method is more efficient and better suited to large-scale data set retrieval. Feature vectors based on SIFT descriptors and binary signatures achieve high-precision matching and high retrieval accuracy.
Owner:WUHAN UNIV

Method and system for identifying homologous binary files

The invention provides a method and a system for identifying homologous binary files in a database that comprises multiple binary base files. The method comprises: obtaining signatures of the file to be identified and of the base files using a min-hash algorithm; dividing each signature into buckets according to a bucketing method; building, from the bucketed signatures of all base files and an inverted indexing method, dictionaries in one-to-one correspondence with the buckets, where each dictionary contains at least one key-value pair; and traversing the corresponding dictionaries with the strings in the buckets of the file to be identified, then obtaining the homologous binary files from the values of the matching keys. Because the signatures are obtained with the min-hash algorithm and the bucketing uses a locality-sensitive hashing algorithm, the amount of computation is significantly reduced; and because an index table is built over all signatures with the inverted indexing method, homologous binary files are identified faster.
Owner:INST OF INFORMATION ENG CAS
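
The pipeline in this abstract (signature, bucketing, one dictionary per bucket, lookup) resembles the usual banded LSH scheme. The sketch below illustrates that general scheme only; it is not the patented implementation. The byte n-gram feature extractor, the function names, and the choice of 32 hash functions in 8 bands are assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Hypothetical feature extractor: the set of n-byte substrings of a file."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def minhash_signature(features: set, num_hashes: int = 32) -> list:
    """One minimum per seeded hash function; `features` must be non-empty."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(seed.to_bytes(4, "big") + f, digest_size=8).digest(),
                "big",
            )
            for f in features
        )
        for seed in range(num_hashes)
    ]


def band_keys(signature: list, bands: int) -> list:
    """Split a signature into `bands` equal slices and render each as a string key."""
    rows = len(signature) // bands
    return [",".join(map(str, signature[b * rows:(b + 1) * rows])) for b in range(bands)]


def build_index(base_signatures: dict, bands: int) -> list:
    """Inverted index: one dictionary per band, mapping band key -> set of file ids."""
    index = [defaultdict(set) for _ in range(bands)]
    for file_id, sig in base_signatures.items():
        for b, key in enumerate(band_keys(sig, bands)):
            index[b][key].add(file_id)
    return index


def find_candidates(index: list, signature: list, bands: int) -> set:
    """Return every base file that shares at least one band key with the query."""
    candidates = set()
    for b, key in enumerate(band_keys(signature, bands)):
        candidates |= index[b].get(key, set())
    return candidates


if __name__ == "__main__":
    base = {"a.bin": bytes(range(64)), "b.bin": b"completely different content here"}
    sigs = {name: minhash_signature(byte_ngrams(data)) for name, data in base.items()}
    index = build_index(sigs, bands=8)
    query = minhash_signature(byte_ngrams(bytes(range(64))))
    print(find_candidates(index, query, bands=8))
```

A query touches only one dictionary lookup per band instead of comparing against every stored signature, which is where the reduction in computation comes from. More bands with fewer rows each make the lookup more permissive; fewer bands with more rows make it stricter.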

Intelligent recommendation method and device, computer equipment and readable storage medium

The invention relates to the technical field of big data, and discloses an intelligent recommendation method and device, computer equipment and a readable storage medium. The method comprises the steps of: obtaining user information and characterizing it to obtain a user vector; calling a product quantization process to segment the user vector into a plurality of sub-vectors, identifying the category to which each sub-vector belongs, and collecting the categories into a user category set; calling a minimum hash process to compare the similarity of the user category set with each reference category set in a preset index library, and setting each reference category set whose similarity exceeds a preset similarity threshold as a target category set; and taking the associated information corresponding to the target category set as recommendation information. The method improves the fineness and accuracy of user vector category identification, improves the operating efficiency of the server, speeds up matching between the user information and the reference information in the index library, and reduces the amount of data computation and data storage.
Owner:CHINA PING AN LIFE INSURANCE CO LTD

Combination optimizing method based on Lucene index section

The invention relates to a combination (merge) optimizing method based on Lucene index sections (segments), and belongs to the technical field of computer indexing. The method comprises the following steps: combining the load information of the current node with the segment information of the index, and building a merge analysis module to judge whether a merge condition is met; obtaining, from the dictionary file contained in each index segment, a characteristic matrix of the index segments, and processing it with a minHash algorithm and a minimum hash signature algorithm to calculate the signature matrix of the index segments; calculating similarity coefficients between the index segments from the signature matrix and the Jaccard similarity principle, and dividing the index segments into different similar sets according to the similarity coefficients; and scoring each similar set with a similarity evaluation model, sorting by set score, and selecting the one or more highest-scoring sets to be merged by a merge thread. The optimizing method reduces the impact of merge operations on the performance of the index and search functions and effectively improves search speed.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
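
A rough sketch of the middle steps of this abstract, under stated assumptions: each segment is represented by the set of terms in its dictionary, the signature matrix holds one MinHash column per segment, and segments whose estimated Jaccard similarity reaches a threshold are grouped with union-find. The load check and the scoring model are not reproduced. Segment names, the threshold, and all function names are hypothetical.

```python
import hashlib
from itertools import combinations


def _hash(seed: int, term: str) -> int:
    """Deterministic 64-bit hash of a term under hash function number `seed`."""
    data = f"{seed}:{term}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def signature_matrix(segments: dict, num_hashes: int = 64) -> dict:
    """Map each segment name to its column of the signature matrix.

    `segments` maps a segment name to the non-empty set of terms in its dictionary.
    """
    return {
        name: [min(_hash(seed, t) for t in terms) for seed in range(num_hashes)]
        for name, terms in segments.items()
    }


def similarity(sig_a: list, sig_b: list) -> float:
    """Estimated Jaccard similarity: fraction of matching signature rows."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def group_similar(signatures: dict, threshold: float = 0.5) -> list:
    """Union-find grouping: pairs at or above the threshold share a set."""
    parent = {name: name for name in signatures}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(signatures, 2):
        if similarity(signatures[a], signatures[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in signatures:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    segments = {
        "seg_a": {"apple", "banana", "cherry", "date"},
        "seg_b": {"apple", "banana", "cherry", "date"},
        "seg_c": {"x-ray", "yacht", "zebra"},
    }
    print(group_similar(signature_matrix(segments)))
```

Grouping is transitive here: if A is similar to B and B to C, all three land in one set even when A and C fall below the threshold. Whether that is desirable for choosing merge candidates is a design choice the abstract does not settle.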

Cloud storage similar data detection method and system based on meta-semantic embedding

The invention provides a cloud storage similar data detection method and system based on meta-semantic embedding. The method comprises the following steps: carrying out CDC partitioning on all data in a cloud storage data domain; extracting feature vectors of all the CDC blocks by adopting a MinHash algorithm; processing the context feature vector of any CDC block based on a Mask algorithm, and inputting all the processed context feature vectors into a neural network model for training to obtain a meta-semantic model of the cloud storage data domain; extracting semantic feature vectors of the new data uploaded to the cloud storage data domain; and inputting the semantic feature vector of the new data into a new neural network model initialized by the meta-semantic model for similarity detection. According to the method, full-text semantics are embedded based on a meta-semantic embedding method, the reliability of data feature extraction is enhanced, repeated training of the neural network is avoided, and therefore the calculation overhead is reduced.
Owner:NANHUA UNIV

A Merge Optimization Method Based on Lucene Index Segment

The invention relates to a merge optimization method based on Lucene index segments, and belongs to the technical field of computer indexing. It includes the following steps. The load information of the current node is combined with the segment information of the index, and a merge analysis module is constructed to judge whether the merge condition is satisfied. From the dictionary files contained in each index segment, the feature matrix of the index segments is obtained and then processed with the minHash algorithm and the minimum hash signature algorithm to calculate the signature matrix of the index segments. From the signature matrix and the Jaccard similarity principle, the similarity coefficient between each pair of index segments is calculated, and the index segments are divided into different similar sets according to the similarity coefficient. A similarity evaluation model scores each similar set, the sets are sorted by score, and the one or more highest-scoring sets are selected to be merged by the merge thread. The optimization method reduces the impact of the merge operation on the performance of the index and retrieval functions and effectively improves retrieval speed.
Owner:CHONGQING UNIV OF POSTS & TELECOMM

Method and device for obtaining similar object collection and providing similar object information

The invention discloses a method and device for obtaining and providing a similar object set. The method comprises: obtaining an input file containing M objects, N attributes, and the attribute value corresponding to each attribute; feeding each attribute into a pre-created first-level minimum hash (minhash) function to obtain the first-level minhash value of that attribute; obtaining the second-level minhash value of each attribute from the attribute, its weight in the current object, and a pre-created second-level minhash function; computing the combined minhash value of each attribute in each object; taking the minimum of the combined minhash values over the attributes of an object as the minhash value of that object; repeating this operation K times per object to obtain K minhash values for each object; and feeding the K minhash values of each object into a locality-sensitive hashing (LSH) computing framework. The method and device improve operating efficiency as well as the validity and accuracy of the similar object information.
Owner:ALIBABA GRP HLDG LTD

Intelligent agent behavior responsibility investigation method based on social network privacy negotiation system

The invention discloses an intelligent agent behavior responsibility investigation method based on a social network privacy negotiation system. Agent behavior responsibility investigation is realized through a qualitative investigation process and a quantitative investigation process. The qualitative process uses a forward simulation negotiation process and a reverse reproduction negotiation process to judge accurately whether the privacy negotiation agent has behaved improperly and, if so, to locate exactly where the improper behavior occurred. Three quantitative methods are further provided: a simple quantification method, a weighted Mahalanobis distance method, and an improved MinHash method. These yield a responsibility quantification value for the privacy negotiation agent, which quantifies the severity of the improper behavior. The invention addresses the problems of untrusted, unsafe and malicious intelligent agent behavior in current social network privacy negotiation systems.
Owner:JINAN UNIVERSITY

A method and system for identifying homologous binary files

The invention provides a method and a system for identifying homologous binary files in a database that comprises multiple binary base files. The method comprises: obtaining signatures of the file to be identified and of the base files using a min-hash algorithm; dividing each signature into buckets according to a bucketing method; building, from the bucketed signatures of all base files and an inverted indexing method, dictionaries in one-to-one correspondence with the buckets, where each dictionary contains at least one key-value pair; and traversing the corresponding dictionaries with the strings in the buckets of the file to be identified, then obtaining the homologous binary files from the values of the matching keys. Because the signatures are obtained with the min-hash algorithm and the bucketing uses a locality-sensitive hashing algorithm, the amount of computation is significantly reduced; and because an index table is built over all signatures with the inverted indexing method, homologous binary files are identified faster.
Owner:INST OF INFORMATION ENG CHINESE ACAD OF SCI