33 results about "MinHash" patented technology

In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder (1997), and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words.
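
The estimate described above can be illustrated with a minimal sketch. It assumes seeded BLAKE2b hashes stand in for the random permutations; the helper names (`minhash_signature`, `estimate_jaccard`) and the choice of 256 hash functions are illustrative assumptions, not part of any patent listed here.

```python
import hashlib


def _hash(seed: int, token: str) -> int:
    """Seeded 64-bit hash; each seed plays the role of one random permutation."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """For every seed, keep the smallest hash value seen over the (non-empty) set."""
    items = set(items)
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """The fraction of positions where two signatures agree estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


doc_a = "the quick brown fox jumps over the lazy dog".split()
doc_b = "the quick brown fox leaps over a lazy dog".split()
sig_a = minhash_signature(doc_a)
sig_b = minhash_signature(doc_b)
print("exact:", exact_jaccard(doc_a, doc_b))
print("estimate:", estimate_jaccard(sig_a, sig_b))
```

The exact Jaccard similarity of the two example word sets is 0.7; the estimate fluctuates around that value, and its variance shrinks as `num_hashes` grows.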

DCGAN-based spectral imagery secure retrieval method

Active | CN106997380A | Improve precision | Improve retrieval efficiency | Digital data protection | Image data processing details | Generative adversarial network | Image retrieval

The invention discloses a secure retrieval method for spectral imagery based on a DCGAN (Deep Convolutional Generative Adversarial Network), and belongs to the field of spectral imagery. The method uses a DCGAN to obtain a high-level representation of spectral imagery features and proposes a new secure retrieval method for spectral imagery in the encrypted domain. First, the deep spectral-spatial features of the spectral imagery are jointly extracted with the DCGAN so that the image content is accurately represented. Second, to ensure security during remote sensing image retrieval, the deep features are encrypted with a Min-Hash method, under the criterion that the similarity of the encrypted features is unchanged, thereby protecting the deep features. Finally, without decryption, the Jaccard similarity of image features is measured directly by counting matching Min-Hash values, and images similar to the query image are returned. Information security is thus ensured while retrieval is performed.

Owner:数安信(北京)科技有限公司

Hashing techniques for data set similarity determination

Active | US9311403B1 | Reduce dimensionality | Web data indexing | Special data processing applications | Data set | Number generator

Methods, systems and computer program product embodiments for hashing techniques for determining similarity between data sets are described herein. A method embodiment includes, initializing a random number generator with a weighted min-hash value as a seed, wherein the weighted min-hash value approximates a similarity distance between data sets. A number of bits in the weighted min-hash value is determined by uniformly sampling an integer bit value using the random number generator. A system embodiment includes a repository configured to store a plurality of data sets and a hash generator configured to generate weighted min-hash values from the data sets. The system further includes a similarity determiner configured to determine a similarity between the data sets.

Owner:GOOGLE LLC

Multi-label learning design method based on hashing method

Active | CN104715021A | Troubleshoot tag dependencies | Improve accuracy | Special data processing applications | Text database clustering/classification | Algorithm | Predicting performance

The invention discloses a multi-label learning design method based on a hashing method. By combining a hashing algorithm with a multi-label learning algorithm based on Bayesian statistics, the correlation between labels is effectively used to improve the predictive performance of the multi-label learning model. Labels and their neighbors are introduced into the computation of the posterior probability through the characteristics of the neighbors, so the correlation between labels is fully considered and the accuracy of the algorithm is improved. A MinHash algorithm addresses the high dimensionality and sparsity of the label space in multi-label learning on large-scale data. Neighbors are found rapidly and efficiently through locality sensitive hashing (LSH), which makes learning on large-scale data feasible and improves the scalability of the multi-label learning algorithm.

Owner:NANJING UNIV OF POSTS & TELECOMM

Method for detecting repetition data of social media

Inactive | CN105677661A | Improve efficiency | Applicable to repeatability testing | Special data processing applications | Array data structure | Social media

The invention discloses a method for detecting duplicate data in social media. The method comprises the following steps: dividing each text item of the social data into multiple text elements that form a set corresponding to that text item; using a hash function to map all text elements in the set to hash values and taking the minimum hash value, and repeating the mapping multiple times to obtain an array of minimum hash values that serves as the minimum hash signature of the text item; using a locality-sensitive hashing algorithm to map the text items to different detection queues according to their minimum hash values; and calculating the Jaccard similarity between any two text items in the same detection queue. Text items whose Jaccard similarity exceeds a threshold are determined to be duplicate data. The method can increase the efficiency of duplicate detection on large volumes of text.
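
The pipeline in this abstract (element sets, minimum hash signatures, LSH detection queues, Jaccard check within a queue) can be sketched as follows. This is an illustrative sketch, not the patented implementation: character 4-grams as text elements, seeded BLAKE2b hashes, the band layout, and the names `shingles`, `signature` and `find_duplicates` are all assumptions.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed, token):
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def shingles(text, k=4):
    """Split a text into its set of character k-grams (assumed text elements)."""
    text = text.lower()
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def signature(elements, num_hashes=64):
    """Minimum hash signature: one minimum per seeded hash function."""
    return [min(_hash(seed, e) for e in elements) for seed in range(num_hashes)]


def find_duplicates(texts, num_hashes=64, bands=16, threshold=0.6):
    rows = num_hashes // bands
    sets = [shingles(t) for t in texts]
    sigs = [signature(s, num_hashes) for s in sets]

    # LSH step: texts whose signatures agree on a whole band share a queue.
    queues = defaultdict(list)
    for idx, sig in enumerate(sigs):
        for b in range(bands):
            queues[(b, tuple(sig[b * rows:(b + 1) * rows]))].append(idx)

    # Only pairs that met in some queue are compared exactly.
    candidates = set()
    for members in queues.values():
        candidates.update(combinations(members, 2))

    duplicates = []
    for i, j in sorted(candidates):
        jaccard = len(sets[i] & sets[j]) / len(sets[i] | sets[j])
        if jaccard > threshold:
            duplicates.append((i, j))
    return duplicates


posts = [
    "Big sale today only at the downtown store",
    "big sale today only at the downtown store",
    "Heavy rain expected across the region tomorrow",
]
print(find_duplicates(posts))
```

With more bands and fewer rows per band, more pairs land in a shared queue, which raises recall at the cost of more exact Jaccard comparisons.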

Owner:EAST CHINA NORMAL UNIV

Method and device for obtaining similar object set and providing similar object set

Active | CN104424254A | Consistent collision rate | Improve effectiveness | Special data processing applications | Algorithm | Data mining

The invention discloses a method and device for obtaining a similar object set and for providing a similar object set. The method comprises: obtaining an input file comprising M objects, N attributes, and the attribute value corresponding to each attribute; inputting each attribute into a pre-created first-level minimum hash (minhash) function to obtain the first-level minhash value of each attribute; obtaining the second-level minhash value of each attribute according to the attribute, the weight of the attribute in the current object, and a pre-created second-level minhash function; calculating the combined minhash value of each attribute in each object; taking the minimum of the combined minhash values over the attributes of an object as the minhash value of that object; repeating these operations K times for each object to obtain K minhash values per object; and inputting the K minhash values of each object into a locality sensitive hashing (LSH) computing framework. The method and device can improve operating efficiency and improve the validity and accuracy of the similar object information.

Owner:ALIBABA GRP HLDG LTD

A secure retrieval method for large-scale images in cloud environment

Active | CN108959567A | Improve security | Achieve dimensionality reduction | Character and pattern recognition | Digital data protection | Imaging Feature | Visual Word

The invention belongs to the field of multimedia information security protection, and in particular relates to a secure image retrieval method that combines a bag-of-words model with the minimum hash principle and can be used for secure retrieval of large-scale image collections. A content owner combines the bag-of-words model with the minimum hash principle to construct a secure index of the image features. Noise index vectors are introduced into the secure index data set of image features, and the index vectors corresponding to some visual words are randomly extracted to construct the secure index table. The secure image index table and the encrypted images are uploaded to the cloud server. When a user requests retrieval, the cloud service searches only the index table according to the index information of the query image, and the user obtains the images to be retrieved according to the similarity between index vectors. This retrieval method is more efficient and better suited to large-scale data set retrieval. Feature vectors based on SIFT descriptors and binary signatures achieve high-precision matching and high retrieval accuracy.

Owner:WUHAN UNIV

Set similarity calculation method and system based on minhash

Inactive | CN106681688A | High speed | Character and pattern recognition | Comparison of digital values | Array data structure | Computer science

The invention discloses a set similarity calculation method and system based on minhash. In the method, each element in a set is mapped by a hash function to a first hash value with a length of m bits, and 2^k class groups are established, where each class group corresponds to one tag, the tag is a second hash value with a length of k bits, and different class groups correspond to different tags. For any set, the first hash values of its elements are distributed into the class groups whose tags equal the first k bits of those hash values. The minhash value of the set for each class group is determined from this distribution, and the minhash values over all class groups form an array that serves as the minhash signature of the set. The similarity of any two sets is calculated from their minhash signatures. This scheme can greatly speed up minhash signature generation and therefore set similarity calculation.
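
A rough sketch of the grouping idea in this abstract: one hash pass per element, with the first k bits of the m-bit hash selecting the class group. The abstract does not spell out how the per-group value or the final similarity is computed, so this sketch assumes the per-group minhash is the smallest hash in the group and that similarity is the share of matching groups among groups that are non-empty in at least one set. The names `first_hash`, `grouped_signature` and `estimate_similarity`, and the values m = 32 and k = 8, are illustrative.

```python
import hashlib

M_BITS = 32  # assumed length m of the first hash value
K_BITS = 8   # assumed tag length k, giving 2**k class groups


def first_hash(element, m_bits=M_BITS):
    """Map an element to an m-bit hash value."""
    digest = hashlib.sha256(element.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - m_bits)


def grouped_signature(elements, k_bits=K_BITS, m_bits=M_BITS):
    """One pass over the set: keep the smallest hash per class group."""
    sig = [None] * (1 << k_bits)
    for element in set(elements):
        h = first_hash(element, m_bits)
        tag = h >> (m_bits - k_bits)  # first k bits select the group
        if sig[tag] is None or h < sig[tag]:
            sig[tag] = h
    return sig


def estimate_similarity(sig_a, sig_b):
    """Share of matching groups, ignoring groups empty in both sets (assumption)."""
    used = matches = 0
    for a, b in zip(sig_a, sig_b):
        if a is None and b is None:
            continue
        used += 1
        matches += a == b
    return matches / used if used else 0.0


set_a = [f"item{i}" for i in range(1000)]
set_b = [f"item{i}" for i in range(300, 1300)]
sig_a = grouped_signature(set_a)
sig_b = grouped_signature(set_b)
print(estimate_similarity(sig_a, sig_b))  # exact Jaccard here is 700/1300
```

The speed-up comes from hashing each element once instead of once per signature position; the trade-off is that small sets leave many groups empty.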

Owner:KUYUN INTERACTIVE TECH

Eclat-based multivariate time series association rule mining method

Inactive | CN107562865A | Increase digging speed | Save memory | Knowledge representation | Special data processing applications | Original data | Rule mining

The invention provides an Eclat-based multivariate time series association rule mining method. The method comprises the steps of: 1, generating a vertical dataset; 2, generating a MinHash matrix, which requires a designated parameter k; 3, using the MinHash matrix to estimate the candidate item sets in the original data set; 4, pruning the candidate item sets according to the minimum support to obtain frequent 1-itemsets; 5, combining two hashed frequent 1-itemsets to generate a new frequent 2-itemset; 6, repeating step 5 until no further combination is possible, at which point the algorithm ends. The association rule mining speed is remarkably increased, so that time series analysis results can be obtained in time. Although mining precision is lowered, mining efficiency is greatly improved and machine memory is saved.

Owner:HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL

Method and system for identifying homologous binary files

Active | CN107704501A | High speed | Small amount of calculation | Special data processing applications | Data mining | Computer science

The invention provides a method and a system for identifying homologous binary files in a database, where the database comprises multiple binary base files. The method comprises the steps of: obtaining signatures of the files to be identified and signatures of the base files according to a min-hash algorithm; dividing each signature into buckets according to a bucket division method; building, from the bucketed signatures of all the base files and an inverted indexing method, dictionaries in one-to-one correspondence with the buckets, where each dictionary comprises at least one key-value pair; and traversing the corresponding dictionaries with the character strings in the buckets of the file to be identified, and obtaining the homologous binary files of that file from the values of the matching keys. Because the signatures are obtained with the min-hash algorithm and bucketed with a locality-sensitive hashing algorithm, the amount of calculation is remarkably reduced; and because an index table is built over all the signatures with the inverted indexing method, homologous binary files are identified faster.
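
The bucket-plus-dictionary lookup described in this abstract might look like the following sketch. It is a sketch under stated assumptions rather than the patented procedure: byte 4-grams as file features, 16 buckets of 4 signature rows each, hexadecimal bucket strings as dictionary keys, and the names `build_index` and `find_homologous` are all hypothetical.

```python
import hashlib
from collections import defaultdict

BANDS, ROWS = 16, 4  # assumed bucket layout: 16 buckets of 4 rows each


def byte_ngrams(data, n=4):
    """Assumed feature extraction: the set of byte n-grams of the file."""
    return {data[i:i + n] for i in range(len(data) - n + 1)} or {data}


def signature(features, num_hashes=BANDS * ROWS):
    def h(seed, gram):
        raw = seed.to_bytes(4, "big") + gram
        return int.from_bytes(hashlib.blake2b(raw, digest_size=8).digest(), "big")
    return [min(h(seed, g) for g in features) for seed in range(num_hashes)]


def bucket_strings(sig):
    """Cut the signature into BANDS buckets and render each one as a string."""
    return [",".join(format(v, "x") for v in sig[b * ROWS:(b + 1) * ROWS])
            for b in range(BANDS)]


def build_index(base_files):
    """One dictionary per bucket position: bucket string -> names of base files."""
    index = [defaultdict(set) for _ in range(BANDS)]
    for name, data in base_files.items():
        for b, key in enumerate(bucket_strings(signature(byte_ngrams(data)))):
            index[b][key].add(name)
    return index


def find_homologous(index, data):
    """Look up each bucket string of the query in the matching dictionary."""
    hits = set()
    for b, key in enumerate(bucket_strings(signature(byte_ngrams(data)))):
        hits |= index[b].get(key, set())
    return hits


base = {"a.bin": bytes(range(0, 128)), "b.bin": bytes(range(128, 256))}
index = build_index(base)
print(find_homologous(index, bytes(range(0, 128)) + b"\x00"))
```

A query costs one dictionary lookup per bucket, independent of the number of base files, which is where the reduction in calculation comes from.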

Owner:INST OF INFORMATION ENG CAS

Intelligent recommendation method and device, computer equipment and readable storage medium

Active | CN112116436A | Improve fineness | Improve accuracy | Character and pattern recognition | Other databases indexing | Category recognition | Engineering

The invention relates to the technical field of big data, and discloses an intelligent recommendation method and device, computer equipment and a readable storage medium. The method comprises the steps of: obtaining user information and characterizing it to obtain a user vector; calling a product quantization process to segment the user vector into a plurality of sub-vectors, identifying the category to which each sub-vector belongs, and collecting the categories into a user category set; calling a minimum hash process to compare the similarity of the user category set with each reference category set in a preset index library, and setting the reference category sets whose similarity exceeds a preset similarity threshold as target category sets; and taking the associated information corresponding to the target category sets as recommendation information. The method improves the fineness and accuracy of user vector category identification, improves the operating efficiency of the server, speeds up matching between the user information and the reference information in the index database, and reduces the amount of data calculation and data storage.

Owner:CHINA PING AN LIFE INSURANCE CO LTD

Method and device for generating text fingerprint information

Active | CN105589962A | Efficient clustering | Improve accuracy | Relational databases | Special data processing applications | Feature vector | Minimum weight

The invention discloses a method and a device for generating text fingerprint information. The method comprises the following steps: an initial feature vector of a text is extracted; the weight of at least one element in the initial feature vector is set to a multiple of the minimum weight, and the weights of the other elements are set to the minimum weight; each such element is added to the initial feature vector as many times as its multiple indicates, forming a new feature vector; and the text fingerprint information is generated by applying a minimum hash algorithm to the new feature vector. The method and device can improve the accuracy of the fingerprint information, so that information clustering performs better.
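
The weighting-by-replication step in this abstract can be sketched as follows. The sketch makes several assumptions that the abstract does not state: weights are rounded to the nearest multiple of the minimum weight, each copy of an element is tagged with an index (`term#i`) so that the copies stay distinct inside a set, and seeded BLAKE2b hashes implement the minimum hash step. The function names are hypothetical.

```python
import hashlib


def _hash(seed, token):
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def expand_features(weights):
    """Repeat each term according to its weight, in units of the minimum weight."""
    w_min = min(weights.values())
    expanded = []
    for term, weight in weights.items():
        copies = max(1, round(weight / w_min))  # assumed rounding rule
        expanded.extend(f"{term}#{i}" for i in range(copies))
    return expanded


def text_fingerprint(weights, num_hashes=32):
    """Minimum hash over the expanded feature vector."""
    expanded = expand_features(weights)
    return [min(_hash(seed, e) for e in expanded) for seed in range(num_hashes)]


features = {"minhash": 3.0, "patent": 1.0, "text": 2.0}
print(expand_features(features))
print(text_fingerprint(features)[:4])
```

Because a heavier term contributes more copies, it is proportionally more likely to supply the minimum for any given hash function, so two texts that share their heavy terms agree on more fingerprint positions.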

Owner:BEIJING QIHOO TECH CO LTD

MES-oriented mass data redundancy elimination method and system

Active | CN112162977A | Improve efficiency | Reduce time complexity | Digital data information retrieval | Special data processing applications | Engineering | Time complexity

The invention relates to an MES-oriented mass data redundancy elimination method and system. A minimum hash algorithm compresses the preprocessed data into minimum hash signatures, and an LSH (Locality-Sensitive Hashing) algorithm is adopted to avoid similarity calculation by dividing the data into buckets according to hash values. This greatly reduces the time complexity of finding similar duplicate data in mass data and improves the overall efficiency of data processing. In addition, Jaccard similarity is used as a screening condition: data whose Jaccard similarity is greater than a threshold is defined as potentially similar data, and similarity detection from distribution to overall is then performed on the potentially similar data to remove similar duplicate data, which improves the redundancy removal capability.

Owner:BEIJING INSTITUTE OF TECHNOLOGY

Illegal request identification method and device

Inactive | CN110381017A | Avoid crawling | Illegal requests are identified in a timely manner | User identity/authority verification | Special data processing applications | Uniform resource locator | Data mining

The embodiment of the invention provides an illegal request identification method and device. The method comprises the steps of: receiving an access request for an interface, and parsing an interface identifier from the uniform resource locator (URL) of the access request; calculating the minimum hash signature of the URL to obtain the MinHash signature of the URL; and determining the similarity between the MinHash signature of the URL and the MinHash signature of a target illegal URL corresponding to the predetermined interface identifier, where the access request is determined to be an illegal request if the similarity is not less than a set similarity threshold. In this technical scheme, URLs are converted into minimum hash signatures in order to compare their similarity, which greatly improves the efficiency of illegal request identification.
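
A minimal sketch of the check described in this abstract, using `urllib.parse` from the Python standard library. Everything specific here is an assumption made for illustration: the URL path serves as the interface identifier, character 3-grams of the query string serve as the hashed elements, the threshold is 0.8, and the example URLs and function names are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit

NUM_HASHES = 128


def _hash(seed, token):
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def interface_id(url):
    """Assumed interface identifier: the path component of the URL."""
    return urlsplit(url).path


def url_signature(url, k=3):
    """MinHash signature over character k-grams of the query string (assumption)."""
    query = urlsplit(url).query
    grams = {query[i:i + k] for i in range(len(query) - k + 1)} or {query}
    return [min(_hash(seed, g) for g in grams) for seed in range(NUM_HASHES)]


def similarity(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def is_illegal(url, known_illegal, threshold=0.8):
    """Compare against stored signatures of illegal URLs for the same interface."""
    sig = url_signature(url)
    return any(similarity(sig, bad) >= threshold
               for bad in known_illegal.get(interface_id(url), []))


BAD_URL = ("https://example.com/api/items"
           "?id=1%27%20UNION%20SELECT%20password%20FROM%20users--")
known = {interface_id(BAD_URL): [url_signature(BAD_URL)]}

print(is_illegal(BAD_URL, known))
print(is_illegal("https://example.com/api/items?page=2", known))
```

Storing only signatures per interface keeps each check to a fixed number of integer comparisons, regardless of URL length.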

Owner:MICRO DREAM TECHTRONIC NETWORK TECH CHINACO

Combination optimizing method based on Lucene index section

Active | CN108920687A | Quick calculation | Improve retrieval speed | Special data processing applications | Degree of similarity | Data mining

The invention relates to a merge optimization method based on Lucene index segments, and belongs to the technical field of computer indexing. The method comprises the following steps: combining the load information of the current node with the segment information of the index, and building a merge analysis module to judge whether a merge condition is met; obtaining, from the dictionary files contained in each index segment, the feature matrix of the index segments in the index, and processing it with the minHash algorithm and the minimum hash signature algorithm to calculate the signature matrix of the index segments; calculating the similarity coefficient between index segments from the signature matrix and the Jaccard similarity principle, and dividing the index segments into different similar sets according to the similarity coefficient; and scoring each similar set with a similarity evaluation model, sorting by set score, and selecting one or more of the highest-scoring sets to be merged by a merge thread. The optimization method reduces the impact of merge operations on the performance of the index and search functions and effectively improves search speed.

Owner:CHONGQING UNIV OF POSTS & TELECOMM

Cloud storage similar data detection method and system based on meta-semantic embedding

Pending | CN114625315A | Faster and more stable generation | Accurate identification | Input/output to record carriers | Neural architectures | Feature vector | Feature extraction

The invention provides a cloud storage similar data detection method and system based on meta-semantic embedding. The method comprises the following steps: carrying out CDC partitioning on all data in a cloud storage data domain; extracting feature vectors of all the CDC blocks by adopting a MinHash algorithm; processing the context feature vector of any CDC block based on a Mask algorithm, and inputting all the processed context feature vectors into a neural network model for training to obtain a meta-semantic model of a cloud storage data field; extracting semantic feature vectors of the new data uploaded to the cloud storage data domain; and inputting the semantic feature vector of the new data into the new neural network model initialized by the meta-semantic model for similarity detection. According to the method, full-text semantics are embedded based on a meta-semantic embedding method, the reliability of data feature extraction is enhanced, repeated training of the neural network is avoided, and therefore the calculation overhead is reduced.

Owner:NANHUA UNIV

A learning method for multi-label learning based on hashing method

Active | CN104715021B | Troubleshoot tag dependencies | Improve accuracy | Special data processing applications | Text database clustering/classification | Algorithm | Predicting performance

The invention discloses a multi-label learning design method based on a hashing method. By combining a hashing algorithm with a multi-label learning algorithm based on Bayesian statistics, the correlation between labels is effectively used to improve the predictive performance of the multi-label learning model. Labels and their neighbors are introduced into the computation of the posterior probability through the characteristics of the neighbors, so the correlation between labels is fully considered and the accuracy of the algorithm is improved. A MinHash algorithm addresses the high dimensionality and sparsity of the label space in multi-label learning on large-scale data. Neighbors are found rapidly and efficiently through locality sensitive hashing (LSH), which makes learning on large-scale data feasible and improves the scalability of the multi-label learning algorithm.

Owner:NANJING UNIV OF POSTS & TELECOMM

Method and system for detecting open source components in mixed source software

Active | CN113721978B | Software metrics | Source code file | Algorithm

The embodiments disclosed in this application provide a method and system for detecting open-source components in mixed-source software. The method includes: obtaining the source code files in the target mixed-source software (the first source code files), classifying the first source code files, and performing the corresponding homology analysis. First source code files whose size exceeds a first threshold undergo homology analysis based on the Simhash algorithm; first source code files whose size does not exceed the first threshold undergo homology analysis based on the Minhash algorithm. Compared with existing technology, this scheme balances the conflict between efficiency and accuracy in open-source component detection for mixed-source software, and obtains acceptable detection results while ensuring detection efficiency.
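
The size-based choice between Simhash and Minhash in this abstract can be sketched as a small dispatcher. This is only an illustration of the idea: the 4096-byte threshold, whitespace tokenization, 64-bit fingerprints, and the function names are assumptions, and the abstract does not say how either algorithm is configured.

```python
import hashlib

SIZE_THRESHOLD = 4096  # hypothetical "first threshold", in bytes


def _hash(token, seed=0):
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def simhash(tokens, bits=64):
    """Each token votes +1/-1 per bit; the sign of each total gives one output bit."""
    counts = [0] * bits
    for token in tokens:
        h = _hash(token)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def simhash_similarity(a, b, bits=64):
    """One minus the normalized Hamming distance between two fingerprints."""
    return 1 - bin(a ^ b).count("1") / bits


def minhash(tokens, num_hashes=64):
    """Minimum of each seeded hash over the token set (empty input is guarded)."""
    tokens = set(tokens) or {""}
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def minhash_similarity(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fingerprint(source, size_threshold=SIZE_THRESHOLD):
    """Large files get a single compact Simhash; small files get a Minhash signature."""
    tokens = source.split()
    if len(source.encode("utf-8")) > size_threshold:
        return ("simhash", simhash(tokens))
    return ("minhash", minhash(tokens))


small = "int main() { return 0; }"
large = "x = 1\n" * 2000
print(fingerprint(small)[0], fingerprint(large)[0])
```

Two files are only comparable when their fingerprints were produced by the same algorithm, so a real system would need a rule for files that sit on either side of the threshold.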

Owner:PEKING UNIV

A Merge Optimization Method Based on Lucene Index Segment

Active | CN108920687B | Solve insufficient resources | Improve retrieval speed | Text database indexing | Algorithm | Engineering

The invention relates to a method for merging and optimizing based on Lucene index segments, and belongs to the technical field of computer indexing. It includes the following steps: combining the load information of the current node and the segment information of the index, constructing a merge analysis module to judge whether the merge condition is satisfied. According to the dictionary files contained in each index segment, the feature matrix of the index segment in the index is obtained, and then combined with the minHash algorithm and the minimum hash signature algorithm to calculate the signature matrix of the index segment. Combined with the signature matrix of the index segment and the Jaccard similarity principle, the similarity coefficient between each index segment is calculated, and the index segment is divided into different similar sets according to the similarity coefficient. Use the similarity evaluation model to score each similar set, and sort according to the set score, and select one or more sets with the highest score to be merged by the merge thread. The optimization method of the invention can reduce the impact of the merge operation on the performance of the index function and the retrieval function and can effectively improve the speed of retrieval.

Owner:CHONGQING UNIV OF POSTS & TELECOMM

A method and device for obtaining a collection of similar objects

Active | CN110019531B | Improve accuracy | Reduce computational complexity | Database distribution/replication | Special data processing applications | Algorithm | Theoretical computer science

The invention discloses a method and a device for acquiring similar object sets, and relates to the technical field of computers. A specific implementation of the method includes: obtaining a set of target objects and a set of candidate objects to be compared; setting a locality-sensitive comparison step size r; and, based on the feature data and the locality-sensitive comparison step size r, obtaining the similar object set of each target object from the candidate object set. This implementation uses a locality-sensitive minimum hash algorithm to obtain the similar object set of the target object from the candidate object set. It overcomes the weakness of the Hive SQL distributed method, which only compares objects that share a certain attribute and therefore misses most similar objects. It also reduces computational complexity, speeds up calculation, and improves the accuracy of the results and the coverage of similar objects.

Owner:BEIJING JINGDONG SHANGKE INFORMATION TECH CO LTD +1

Malicious file detection method and device, computer equipment and storage medium

Pending | CN113704761A | Reduce computing pressure | Easy to detect | Platform integrity maintenance | File allocation | Engineering

The invention discloses a malicious file detection method and device, computer equipment and a storage medium, relates to the technical field of information, and mainly aims at relieving the calculation pressure of malicious file detection. The method comprises the steps of obtaining a calling interface sequence corresponding to a to-be-detected file; determining a feature sequence corresponding to the calling interface sequence, and determining a minimum hash signature corresponding to the to-be-detected file according to the feature sequence; according to the minimum Hash signature corresponding to the to-be-detected file, distributing the to-be-detected file to corresponding Hash buckets under different Hash intervals, wherein a plurality of Hash buckets exist in any Hash interval; determining a first target sample file which is distributed to the same hash bucket with the to-be-detected file in the different hash intervals; and judging whether the to-be-detected file is a malicious file or not according to the category information corresponding to the first target sample file. The method and the device are suitable for malicious file detection.

Owner:SHANGHAI GUAN AN INFORMATION TECH

Malicious program visual detection method based on deep learning

Active | CN111125699A | Improve reliability | Improve practicality | Character and pattern recognition | Platform integrity maintenance | Graph mapping | Data stream

The invention discloses a malicious program visual detection method based on deep learning. The method comprises the steps of: running a malicious program and constructing its data flow graph; extracting feature sub-graphs from the data flow graph to form a sub-graph corpus; mapping the sub-graphs in the corpus to character strings; vectorizing the sub-graphs with a deep learning algorithm; hashing the sub-graph vectors contained in the malicious program with a minHash algorithm to construct a visual picture matrix of the malicious program; constructing a classification model; and, for a program to be detected, constructing its visual picture matrix and classifying that matrix with the classification model to obtain a classification result. The method is easy to implement, highly reliable, practical, and accurate in its detection results.

Owner:CENT SOUTH UNIV

Document detection processing method and device, storage medium and electronic equipment

Pending | CN114444464A | Natural language data processing | Text database indexing | Engineering | Data mining

The invention discloses a document detection processing method and device, a storage medium and electronic equipment. The method comprises the following steps: receiving a document detection request sent by a client; performing text processing and hash table generation processing on the document content read from the to-be-detected document to obtain a first hash signature; searching a minimum hash signature list in a pre-stored document fingerprint database according to the first hash signature to obtain the index value of a second hash signature that shares elements with the first hash signature; and locating the sample document corresponding to the second hash signature through the index value, and calculating a similarity value between the first hash signature and the second hash signature to obtain the overlap ratio between the to-be-detected document and the sample document. This addresses the technical problems that prior-art document detection processing methods cannot recognize extraction conditions of a small number of documents and cannot calculate the document overlap ratio.

Owner:北京明朝万达科技股份有限公司
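
The lookup flow in this abstract (find stored signatures that share elements with the query signature, then score the match) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented method: character 5-grams as the text features, a seeded BLAKE2b hash family, an inverted index keyed on (position, value), and all function names are hypothetical choices not taken from the source.

```python
import hashlib
from collections import defaultdict


def signature(text, num_perm=64, k=5):
    """MinHash signature over the character k-grams of a text."""
    grams = {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

    def h(seed, gram):
        data = f"{seed}:{gram}".encode()
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

    return [min(h(seed, g) for g in grams) for seed in range(num_perm)]


def build_index(signatures):
    """Inverted index: (position, hash value) -> ids of documents holding it."""
    index = defaultdict(set)
    for doc_id, sig in signatures.items():
        for pos, value in enumerate(sig):
            index[(pos, value)].add(doc_id)
    return index


def overlap(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lookup(query_sig, index, signatures):
    """Score only the stored documents that share at least one element."""
    hits = set()
    for pos, value in enumerate(query_sig):
        hits |= index.get((pos, value), set())
    scored = [(doc_id, overlap(query_sig, signatures[doc_id])) for doc_id in hits]
    return sorted(scored, key=lambda pair: -pair[1])


# Hypothetical fingerprint database with two sample documents.
store = {
    "doc1": signature("the quick brown fox jumps over the lazy dog"),
    "doc2": signature("minhash estimates the similarity of two sets"),
}
index = build_index(store)
print(lookup(signature("the quick brown fox jumps over the lazy dog"), index, store))
```

Because the index is consulted first, documents with no element in common with the query are never scored at all.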

Webshell malicious family clustering analysis method

Active | CN114036515A | Achieve integration | Improve discovery efficiency | Platform integrity maintenance | Cluster algorithm | Predictive learning

The invention discloses a Webshell malicious family clustering analysis method, and relates to the technical field of information security. The method comprises the following steps: step 1, obtaining function call information, parameter values and return value information during Webshell operation; step 2, cleaning, splicing and sequencing the function call information; step 3, vectorizing the function call sequence information from step 2; step 4, calculating the information entropies of the parameter values and the return values, and sorting them according to the function calling sequence; step 5, according to the func_seq, argv_seq and return_seq obtained in step 2 and step 4, building an RNN model to predict the three types of sequences respectively and learn code family features; step 6, applying MinHash processing to the original sequence data and the predicted sequence data, then mapping both into pixel points to form a pixel map; step 7, superposing the original pixel image and the predicted pixel image obtained in step 6, and drawing a final pixel image; and step 8, clustering the pixel image obtained in step 7 by using the DBSCAN clustering algorithm.

Owner:CENT SOUTH UNIV
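
Step 4 of this abstract computes information entropies of parameter and return values. A minimal Python sketch of that single step is given below, assuming character-level Shannon entropy over each value's string form; the abstract does not say which symbols the entropy is taken over, so that choice, the function name, and the sample values are illustrative only.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Shannon entropy in bits per character of a string (0.0 if empty)."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical parameter values, listed in function-call order.
argv_values = ["aaaa", "ab", "cmd=ls"]
argv_seq = [shannon_entropy(v) for v in argv_values]
print(argv_seq)
```

Keeping the entropies in call order yields a numeric sequence of the same length as the function call sequence, the kind of input a sequence model can consume.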

Method and device for obtaining similar object collection and providing similar object information

Active | CN104424254B | Consistent collision rate | Improve effectiveness | Special data processing applications | Algorithm | Data mining

The invention discloses a method and device for obtaining a similar object set and providing similar object information. The method comprises: obtaining an input file comprising M objects, N attributes, and the attribute value corresponding to each attribute; inputting each attribute to a pre-created first-level minimum hash (minhash) function to obtain the first-level minhash return value of each attribute; obtaining the second-level minhash return value of each attribute according to the attribute, the weight of that attribute in the current object, and a pre-created second-level minhash function; calculating the combined minhash value of each attribute in each object; taking the minimum of the combined minhash values over the attributes of an object as the minhash value of that object; repeating this operation K times for each object to obtain K minhash values per object; and inputting the K minhash values of each object to a locality-sensitive hashing (LSH) computing framework. The method and device can improve operating efficiency and the validity and accuracy of the similar object information.

Owner:ALIBABA GRP HLDG LTD

An Image Retrieval Method Based on Latent Semantic Minimal Hash

Active | CN106033426B | High precision | Improve efficiency | Digital data information retrieval | Character and pattern recognition | Quantization (image processing) | Data set

The invention relates to the technical field of image processing, and in particular to a latent semantic min-hash-based image retrieval method comprising the steps of (1) dividing the data into datasets; (2) establishing a latent semantic min-hash model; (3) solving a transformation matrix T; (4) performing hash encoding on the testing dataset Xtest; (5) performing image query. Because convolutional networks yield more expressive features and matrix decomposition can extract the latent semantics of the original features, a minimization constraint is imposed on the quantization error during encoding, so that after the original features are encoded, semantically similar images have smaller Hamming distances in the Hamming space and semantically dissimilar images have larger ones. Thus, the image retrieval precision and the indexing efficiency are improved.

Owner:XI'AN INST OF OPTICS & FINE MECHANICS - CHINESE ACAD OF SCI

A method and device for establishing an index

Active | CN107784110B | Reduce storage index | Improve retrieval speed | Text database indexing | Special data processing applications | Index mapping | Engineering

An embodiment of the invention discloses an index creating method and device. The method includes: extracting feature words of a target text; sorting the feature words to obtain a feature character string; applying the MinHash algorithm to the feature character string to obtain a hash value corresponding to the target text; and searching a mapping buffer pool to determine whether an index mapping bucket matching the hash value exists. If it does, an index between the hash value and the target text is created in that index mapping bucket; if not, an index mapping bucket matching the hash value is established and the index between the hash value and the target text is created in it. The method reduces the amount of index storage, and because the indexes of similar texts are created in the same index mapping bucket, similar texts are grouped together and similar-text retrieval becomes faster.

Owner:RUN TECH CO LTD BEIJING

Intelligent agent behavior responsibility investigation method based on social network privacy negotiation system

Pending | CN113837235A | Quantify severity | Lock accurately | Character and pattern recognition | Natural language data processing | Investigation methods | Mahalanobis distance

The invention discloses an intelligent agent behavior responsibility investigation method based on a social network privacy negotiation system. Agent behavior responsibility investigation is realized through a qualitative process and a quantitative process. The qualitative process adopts a forward simulated negotiation process and a reverse reproduced negotiation process to judge accurately whether the privacy negotiation agent has behaved improperly and, if so, to pinpoint where the improper behavior occurred. Three quantitative responsibility investigation methods are further provided: a simple quantification method, a weighted Mahalanobis distance method, and an improved MinHash method. These yield a responsibility quantification value for the privacy negotiation agent, which quantifies the severity of the improper behavior. The invention addresses the problems of untrusted, unsafe and malicious agent behavior in current social network privacy negotiation systems.

Owner:JINAN UNIVERSITY

A method and device for generating text fingerprint information

Active | CN105589962B | Efficient clustering | Improve accuracy | Relational databases | Special data processing applications | Feature vector | Minimum weight

The invention discloses a method and a device for generating text fingerprint information. The method comprises the following steps: extracting an initial feature vector of a text; assigning at least one element of the initial feature vector a weight that is a multiple of the minimum weight, and assigning the other elements the minimum weight; adding each such element to the initial feature vector according to its multiple to form a new feature vector; and generating the text fingerprint information by applying the minimum hash (MinHash) algorithm to the new feature vector. The method and device can improve the accuracy of the fingerprint information, so that information clustering is more effective.

Owner:BEIJING QIHOO TECH CO LTD
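
The weighting step in this abstract can be read as replicating each element in proportion to its weight before ordinary MinHash is applied. The Python sketch below illustrates that reading and is not the patented implementation: tagging each copy with an index so that copies survive in a set, rounding the multiple to an integer, the shared `unit` parameter, and all names are assumptions introduced for illustration.

```python
import hashlib


def expand_by_weight(weights, unit=None):
    """Replicate each term round(weight / unit) times as (term, copy_index).

    `unit` defaults to the smallest weight in this document; pass one shared
    value when several documents are to be compared with each other.
    """
    if unit is None:
        unit = min(weights.values())
    expanded = set()
    for term, weight in weights.items():
        copies = max(1, round(weight / unit))
        expanded.update((term, i) for i in range(copies))
    return expanded


def minhash_signature(items, num_perm=64):
    """Plain MinHash over the expanded set, using seeded BLAKE2b hashes."""
    def h(seed, item):
        data = f"{seed}:{item}".encode()
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

    return [min(h(seed, x) for x in items) for seed in range(num_perm)]


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


# Hypothetical term weights for two texts, both in units of 1.0.
doc_a = expand_by_weight({"alpha": 1.0, "beta": 3.0}, unit=1.0)
doc_b = expand_by_weight({"alpha": 1.0, "beta": 1.0}, unit=1.0)

# Jaccard on the expanded sets equals sum(min weights) / sum(max weights).
print(jaccard(doc_a, doc_b))
fingerprint = minhash_signature(doc_a)
```

With replication, a heavily weighted term occupies a larger share of the set, so it is more likely to supply the minimum hash and therefore has more influence on the fingerprint.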

Similar crowd extension algorithm based on locality sensitive Hash algorithm

Pending | CN113282775A | Improve accuracy | Reduce computation | Advertisements | Still image data indexing | Feature vector | Algorithm

The invention provides a similar crowd extension algorithm based on a locality-sensitive hash algorithm. The open source tool datasketch is used to process the original data features and obtain the weighted minimum hash of the feature vectors of all users, which greatly reduces the amount of calculation, improves calculation speed and accuracy, and reduces calculation cost. In addition, the locality-sensitive hash model built with datasketch can be configured according to the available memory size and the required calculation accuracy, so that the model achieves high accuracy.

Owner:上海垚亨信息科技有限公司

A method and system for identifying homologous binary files

Active | CN107704501B | High speed | Small amount of calculation | Special data processing applications | Database indexing | Algorithm | Engineering

The invention provides a method and a system for identifying homologous binary files in a database. The database comprises multiple binary basic files. The method comprises the steps of obtaining signatures of to-be-identified files and signatures of the basic files according to a min-hash algorithm; for any signature, performing bucket dividing processing on the signature according to a bucket dividing method; according to a reverse indexing method and the bucket-divided signatures of all the basic files, obtaining dictionaries in one-to-one correspondence with the buckets, wherein each dictionary comprises at least one key-value pair; and according to the character strings in the buckets of the to-be-identified files, traversing the corresponding dictionaries and obtaining the homologous binary files of the to-be-identified files from the values corresponding to the matching keys. Because the signatures are obtained with the min-hash algorithm and the bucket dividing is performed with a locality-sensitive hash algorithm, the amount of calculation can be remarkably reduced; and because the reverse indexing method establishes an index table for all the signatures, homologous binary files are identified faster.

Owner:INST OF INFORMATION ENG CHINESE ACAD OF SCI
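
The abstract credits bucket dividing with reducing the amount of calculation. Assuming the bucket dividing follows the standard banding construction (a signature split into `bands` groups of `rows` values, with two files becoming candidates when any group matches exactly), the chance that a pair is compared at all has a closed form. The Python sketch below evaluates it; the function name and the parameter values are illustrative and do not come from the source.

```python
def candidate_probability(similarity, bands, rows):
    """Probability that two sets with the given Jaccard similarity share
    at least one bucket, for a signature of bands * rows MinHash values."""
    return 1.0 - (1.0 - similarity ** rows) ** bands


# Compare a similar pair with a dissimilar pair for one assumed configuration.
for s in (0.3, 0.8):
    print(s, candidate_probability(s, bands=8, rows=4))
```

More rows per band make a match stricter and filter out dissimilar files, while more bands give similar files more chances to collide, so the two parameters together set the similarity level at which files start to be compared.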

© 2025 PatSnap. All rights reserved.