31 results about "Levenshtein distance" patented technology

In information theory, linguistics and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.
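
The definition above can be sketched as a short dynamic-programming routine. This is a minimal illustration of the standard unit-cost algorithm, not code taken from any of the patents listed here; the function name `levenshtein` is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory use grows with the length of `b` rather than with the product of both lengths.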

Method and device for matching between questions and answers

The invention provides a method and a device for matching questions with answers and relates to the technical field of intelligent question answering. The method includes extracting keywords from an input question text, determining target matching question texts from a pre-built question base by an index filtering method according to the keywords, determining the optimum matching question text, i.e. the one with the highest similarity to the input question text, from the target matching question texts on the basis of the Levenshtein distance algorithm, and outputting an answer text corresponding to the input question text according to the optimum matching question text. With the method and the device, answers matching the input questions can be output within a short time, so the time needed for matching is shortened while matching accuracy is improved.
Owner:CAPITAL NORMAL UNIVERSITY
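
The two-stage idea in this abstract (filter by keyword index, then rank by Levenshtein distance) might look roughly like the sketch below. The abstract does not say how keywords are extracted or how the index is built, so whitespace tokens and a plain inverted index are assumptions, and `build_index` and `best_answer` are hypothetical names.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(questions):
    """Inverted index: keyword -> ids of stored questions containing it.
    Assumption: keywords are simply lowercase whitespace tokens."""
    index = defaultdict(set)
    for qid, question in enumerate(questions):
        for word in question.lower().split():
            index[word].add(qid)
    return index


def best_answer(query, questions, answers, index):
    """Filter candidates through the index, then pick the closest question."""
    candidates = set()
    for word in query.lower().split():
        candidates |= index.get(word, set())
    if not candidates:
        return None  # no stored question shares a keyword with the query
    best = min(sorted(candidates),
               key=lambda qid: levenshtein(query.lower(), questions[qid].lower()))
    return answers[best]


questions = ["how do i reset my password",
             "how do i change my email",
             "what is the refund policy"]
answers = ["A1", "A2", "A3"]
index = build_index(questions)
print(best_answer("how to reset password", questions, answers, index))  # A1
```

The index step keeps the comparatively expensive distance computation away from questions that share no keyword with the input.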

Method of Syntactic Pattern Recognition of Sequences

Inactive; US20080208854A1; Fast and efficient and highly accurate method; Data processing applications; Digital data information retrieval; Pattern recognition; Syntactic pattern recognition
This invention relates to the Pattern Recognition (PR) of noisy / inexact strings and sequences and particularly to syntactic Pattern Recognition. The present invention presents a process by which a user can recognize an unknown string X, which is an element of a finite, but possibly large, Dictionary H, by processing the information contained in its noisy / inexact version, Y, where Y is assumed to contain substitution, insertion or deletion errors. The recognized string, which is the best estimate X+ of X, is defined as that element of H which minimizes the Generalized Levenshtein Distance D(X,Y) between X and Y, for all X ∈ H. Rather than evaluate D(X,Y) for every X ∈ H sequentially, the present invention achieves this simultaneously for every X ∈ H by representing the Dictionary as a Trie, and searching the Trie using a new AI-based search strategy.
Owner:3618633 CANADA
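
To illustrate why a trie helps, the sketch below uses the well-known textbook traversal that computes one row of the distance table per trie node, so words sharing a prefix share that work. It is not the AI-based search strategy claimed in the patent, and it uses unit costs rather than a generalized distance; `TrieNode`, `build_trie` and `closest_word` are names invented for this example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set only on nodes that end a dictionary word


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def closest_word(root, noisy):
    """Return (distance, word) for the dictionary word closest to noisy."""
    best = (float("inf"), None)
    first_row = list(range(len(noisy) + 1))  # distances from the empty prefix

    def visit(node, ch, prev_row):
        nonlocal best
        # Extend the distance table by one row for the trie edge labelled ch.
        row = [prev_row[0] + 1]
        for j in range(1, len(noisy) + 1):
            cost = 0 if noisy[j - 1] == ch else 1
            row.append(min(row[j - 1] + 1, prev_row[j] + 1, prev_row[j - 1] + cost))
        if node.word is not None and row[-1] < best[0]:
            best = (row[-1], node.word)
        # min(row) is a lower bound for every word below this node, so the
        # subtree is skipped when it cannot beat the best match found so far.
        if min(row) < best[0]:
            for next_ch, child in node.children.items():
                visit(child, next_ch, row)

    for ch, child in root.children.items():
        visit(child, ch, first_row)
    return best


trie = build_trie(["pattern", "recognition", "syntactic", "string"])
print(closest_word(trie, "sting"))  # (1, 'string')
```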

Table structure analyzing apparatus, table structure analyzing method, and table structure analyzing program

A table structure analyzing apparatus extracts first row data and second row data in table data. Similarity between the data is computed based on Levenshtein distance or the number of characters. Further, similarity between the first row and the second row as a whole is determined. When the similarity is equal to or less than a predetermined threshold value, it is determined that the boundary between the first and second rows is the boundary between a header part and a substantive part. A similar determination is made in the direction of columns.
Owner:JUSTSYSTEMS
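
A rough sketch of the row-wise check follows. The abstract does not specify how cell similarities are combined into a row similarity or what the threshold is, so the mean of normalized cell similarities and the value 0.3 are assumptions made for illustration only.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cell_similarity(a, b):
    """1.0 for identical cells, 0.0 for cells with nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def row_similarity(row_a, row_b):
    """Assumption: row similarity is the mean of the cell similarities.
    Both rows are expected to be non-empty and of equal length."""
    return sum(cell_similarity(a, b) for a, b in zip(row_a, row_b)) / len(row_a)


def header_boundary(rows, threshold=0.3):
    """Index of the first body row, or None if no boundary is detected."""
    for i in range(len(rows) - 1):
        if row_similarity(rows[i], rows[i + 1]) <= threshold:
            return i + 1
    return None


rows = [["Name", "Date", "Amount"],
        ["Alice", "2021-01-05", "100.00"],
        ["Alina", "2021-02-07", "120.50"]]
print(header_boundary(rows))  # 1
```

The same function can be applied to the transposed table to look for a header column.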

Intelligent question and answer method and system based on pet knowledge graph

Pending; CN110209787A; Fill in the lack of intelligent question and answer; Natural language data processing; Special data processing applications; Entity linking; Sequence graph
The invention discloses an intelligent question answering method and system based on a pet knowledge graph. The method comprises constructing a named entity dictionary and abstracting questions to facilitate their classification. A method combining word2vec with Levenshtein distance is provided to realize entity linking, and experiments show that the method is effective. Texts are trained by constructing a text classifier based on Naive Bayes. An improved TF-IDF Naive Bayes classification algorithm is provided that takes into account the distribution of feature words in the text set and the category distribution, and the improved TF-IDF effectively improves text classification. The result of the text classifier determines the intention of the natural language question, and the question is matched with the corresponding word sequence graph. The word sequence graph is converted into an SQL-like query statement for OrientDB, and the query is run against a graph database storing the knowledge graph. Finally, the constructed intelligent question answering system based on the knowledge graph is demonstrated with an example, and experiments show that the system has relatively high application value for question answering in the field of pets.
Owner:袁琦

Identifying Non-Exactly Matching Text

A computer-implemented method for matching user inputted text to stored text. The user inputted text is compared to each of the text strings stored in a database using a string similarity score determined using a Levenshtein distance algorithm, the n-gram or trigram methods, the Jaro-Winkler algorithm, the Cosine similarity algorithm, the Hamming distance algorithm, the Damerau-Levenshtein distance algorithm, or similar. For each comparison, the string similarity score is analyzed to determine exact matches, non-matches, and probable matches. Probable matches are further analyzed using a keyboard distance algorithm to differentiate between matches and non-matches.
Owner:BOTTOMLINE TECH
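
The three-way classification followed by a keyboard check could be sketched as below. The abstract gives neither thresholds nor the keyboard distance algorithm, so everything specific here is an assumption: the cut-off `max_probable`, the simplified QWERTY grid (row stagger ignored), and the rule that a probable match is accepted only when every differing character pair sits on neighbouring keys.

```python
# Hypothetical, simplified QWERTY layout used only for this illustration.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def levenshtein(a, b):
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def keys_adjacent(x, y):
    """True when both keys are on the grid and at most one row/column apart."""
    if x not in KEY_POS or y not in KEY_POS:
        return False
    (r1, c1), (r2, c2) = KEY_POS[x], KEY_POS[y]
    return abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1


def classify(user_text, stored_text, max_probable=2):
    distance = levenshtein(user_text, stored_text)
    if distance == 0:
        return "match"
    if distance > max_probable:
        return "non-match"
    # Probable match: assumed rule is that only same-length strings whose
    # differing characters are neighbouring keys count as typing slips.
    if len(user_text) != len(stored_text):
        return "non-match"
    typos = [(u, s) for u, s in zip(user_text, stored_text) if u != s]
    return "match" if all(keys_adjacent(u, s) for u, s in typos) else "non-match"


print(classify("smoth", "smith"))  # match ('o' and 'i' are neighbours)
print(classify("smqth", "smith"))  # non-match ('q' is far from 'i')
```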

Modified Levenshtein distance algorithm for coding

Active; US20070172124A1; Improved accuracy in the mapping of an OCR text string; Promote results; Character and pattern recognition; Theoretical computer science; Text string
Methods and systems of mapping of an optical character recognition (OCR) text string to a code included in a coding dictionary by supplementing the Levenshtein Distance Algorithm (LDA) with additional information in the form of adjustments based on particular character substitutions, insertions and deletions together with weighting based on multiple alternatives for the OCR text string. In one embodiment, an OCR text string mapping method (100) includes receiving (110) an OCR text string, comparing (120) it with selected text strings from a coding dictionary, computing (130) modified Levenshtein distances associated with the comparisons by determining (140) substitution penalties, determining (150) insertion penalties, determining (160) deletion penalties and combining (170) the penalties, selecting (180) the best matching text string from the coding dictionary based on the modified Levenshtein distances, determining (190) whether a maximum threshold distance is met, and assigning (200) a code associated with the best matching text string to the OCR text string when met, and assigning (210) a null or no code when not met.
Owner:LEIDOS INNOVATIONS TECH INC
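
A weighted variant of the distance, with cheaper penalties for character pairs that OCR commonly confuses, could be sketched as follows. The penalty values, the sample dictionary and the names `weighted_levenshtein` and `assign_code` are illustrative assumptions; the weighting over multiple OCR alternatives mentioned in the abstract is not modelled.

```python
def weighted_levenshtein(ocr, target, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Edit distance where (ocr_char, target_char) substitutions listed in
    sub_cost use their own penalty instead of the default 1.0."""
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, co in enumerate(ocr, 1):
        cur = [i * del_cost]
        for j, ct in enumerate(target, 1):
            sub = 0.0 if co == ct else sub_cost.get((co, ct), 1.0)
            cur.append(min(prev[j] + del_cost,     # drop an OCR character
                           cur[j - 1] + ins_cost,  # add a missing character
                           prev[j - 1] + sub))     # substitute or match
        prev = cur
    return prev[-1]


def assign_code(ocr, dictionary, sub_cost, max_distance):
    """Return the code of the closest dictionary text, or None when even the
    best candidate is farther away than max_distance."""
    best_text = min(sorted(dictionary),
                    key=lambda text: weighted_levenshtein(ocr, text, sub_cost))
    if weighted_levenshtein(ocr, best_text, sub_cost) <= max_distance:
        return dictionary[best_text]
    return None


# Assumed penalties: digit/letter look-alikes are cheap to substitute.
confusions = {("0", "O"): 0.2, ("1", "I"): 0.2}
coding_dictionary = {"OHIO": "OH", "IOWA": "IA"}
print(assign_code("0H1O", coding_dictionary, confusions, max_distance=1.0))  # OH
```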

Optimization of text-based training set selection for language processing modules

A device and a method provide for selection of a database from a corpus using an optimization function. The method includes defining a size of a database, calculating a distance using a distance function for each pair in a set of pairs, and executing an optimization function using the distance to select each entry saved in the database until the number of saved entries equals the size of the database. Each pair in the set of pairs includes either two entries selected from a corpus or one entry selected from a set of previously selected entries and another entry selected from a set of a remaining portion of the corpus. The distance function may be a Levenshtein distance function or a generalized Levenshtein distance function.
Owner:CORE WIRELESS LICENSING R L
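
One possible reading of this selection loop is a greedy "farthest first" strategy, sketched below. The abstract leaves the optimization function open, so starting from the most distant pair and then maximizing the minimum distance to the entries already selected is purely an assumption; `select_training_set` is a hypothetical name.

```python
from itertools import combinations


def levenshtein(a, b):
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def select_training_set(corpus, size):
    """Greedily pick `size` entries that are far apart from each other."""
    if size <= 0:
        return []
    if size >= len(corpus):
        return list(corpus)
    if size == 1:
        return [corpus[0]]
    # Seed with the two corpus entries that are farthest apart.
    first, second = max(combinations(corpus, 2), key=lambda pair: levenshtein(*pair))
    selected = [first, second]
    remaining = [entry for entry in corpus if entry not in selected]
    # Then repeatedly add the remaining entry whose nearest selected entry
    # is as far away as possible.
    while len(selected) < size:
        nxt = max(remaining,
                  key=lambda entry: min(levenshtein(entry, s) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


corpus = ["cat", "cot", "cut", "elephant", "dog"]
print(select_training_set(corpus, 3))
```

With this sample corpus the seed pair is "elephant" and "dog", which share no letters, and the third entry is one of the short words that is least similar to both.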

Method for building integrated enterprise process reference model based on model combination

The invention discloses a method for building an integrated enterprise process reference model based on model combination, which belongs to the field of modeling technology. The method is realized on a platform formed by sequentially connecting a plurality of users, the Internet and servers, and an enterprise process reference model can be obtained semi-automatically by converting original enterprise models into character strings for combination, wherein the enterprise models are two modeled enterprise process models which are selected from an enterprise model database, approved by modeling experts, and put into practice. A best-fit enterprise model is obtained by the following steps: firstly, obtaining a longest common subsequence of the two process sequences; then building, for each of the two process sequences, an auxiliary sequence based on the longest common subsequence; and finally obtaining the Levenshtein distance, i.e. the minimum number of basic operations for converting one process sequence into the other. The invention can effectively improve the reuse of enterprise model knowledge and thereby the modeling efficiency.
Owner:TSINGHUA UNIV

Lip detection and reading method based on cascade feature extraction

The invention discloses a lip detection and reading method based on cascade feature extraction. The method first detects the lip region of the input video through a Viola-Jones method based on a Haar classifier and an adaptive boosting (AdaBoost) algorithm. Secondly, according to the color features of the lip region, threshold binarization is performed on the detected area to extract the lip region, a discrete cosine transformation is applied to the image of the lip region to concentrate the information of the video image in a certain area of the data matrix, and the data are extracted by an appropriate screening method. Then the features with large contribution values are extracted by a principal component analysis algorithm, so that the dimension of the data is further reduced. From a data sample of a certain scale, a sequence dictionary tree is established for recognition, and the Levenshtein distance is used for similarity analysis and fuzzy matching of sequences. Finally, the static characteristics of each frame and the dynamic characteristics of the video are combined to query the dynamic sequence and complete the reading of the lip region. Through multi-level extraction and dimension reduction of lip region image features, the invention can improve the speed and accuracy of lip reading and has good practicability and robustness.
Owner:NANJING UNIV OF POSTS & TELECOMM

Multi-channel hand-written Chinese error correction method based on voice

The invention pertains to the field of man-machine interaction, and particularly relates to a multichannel handwritten Chinese error correction method based on voice. The user repeats the handwritten content by voice, and handwriting recognition errors are corrected by blending handwriting and voice. The method cuts the handwriting into a plurality of segment sequences, expresses both the handwriting and the voice as phonemes, calculates the Levenshtein distance between them, and calculates a divide-and-conquer blending cost; the Chinese character cutting result with the lowest divide-and-conquer blending cost is the final cutting result. The core of the method is the multichannel blending of handwriting and voice: handwriting recognition errors are corrected by using the voice, so that voice input and handwritten input complement each other.
Owner:INST OF SOFTWARE - CHINESE ACAD OF SCI

Dual authentication method for identifying non-exactly matching text

A computer-implemented method for matching user inputted text to stored text. The user inputted text is compared to each of the text strings stored in a database using a Levenshtein distance algorithm. For each comparison, the Levenshtein distance is analyzed to determine exact matches, non-matches, and probable matches. Probable matches are further analyzed using a keyboard distance algorithm to differentiate between matches and non-matches.
Owner:BOTTOMLINE TECH

System and A Method for Speech Analysis

A computer implemented method and system for processing an audio signal. The method includes the steps of extracting prosodic features from the audio signal, aligning the extracted prosodic features with a script derived from or associated with the audio signal, and segmenting the script with the aligned extracted prosodic features into structural blocks of a first type. The method may further include determining a distance measure between a structural block of a first type derived from the script with another structural block of the first type using, for example, the Damerau-Levenshtein distance.
Owner:BLUE PLANET TRAINING INC
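
For reference, the sketch below implements the restricted form of the Damerau-Levenshtein distance, often called optimal string alignment, which adds transposition of adjacent items to the three Levenshtein operations. The abstract does not say which variant is used or how structural blocks are encoded; representing blocks as lists of labels is an assumption made here.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and transpositions of two adjacent items.
    Works on strings or on any sequences of comparable items."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(osa_distance("ab", "ba"))  # 1 (plain Levenshtein would give 2)
# Hypothetical block labels: two blocks swapped count as one edit.
print(osa_distance(["intro", "verse", "chorus"], ["verse", "intro", "chorus"]))  # 1
```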

Company name matching method and device, computer equipment and storage medium

The invention relates to a company name matching method and device, computer equipment and a storage medium. The company name matching method comprises the following steps: S1, receiving a company name submitted by a user; S2, performing word segmentation on the company name, and calculating the frequency and weight of words according to the word segmentation result; S3, constructing a point-edge relationship by taking each company name as a point and each shared word as an edge; S4, for two company names connected by an edge, calculating the Levenshtein distance similarity, and calculating the cosine distance similarity of the two company names according to the word weights; S5, filtering out edges lower than a threshold value, and quantifying the similarity of company names; S6, calculating a connected graph according to the filtered edge data, dividing the data, and finding similar or identical company names. With the company name matching method and device, the computer equipment and the storage medium, pairwise calculation is avoided by constructing the edge relationship, so that the amount of calculation is greatly reduced and the calculation efficiency and accuracy are improved.
Owner:中邮消费金融有限公司
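
Steps S3 to S6 can be sketched as follows. This is a simplified illustration under several assumptions: names are split on whitespace rather than with a real word segmenter, only a normalized Levenshtein similarity is used (the word-weighted cosine similarity of step S4 is left out), and the threshold of 0.8 is arbitrary.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a, b):
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a, b):
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def group_names(names, threshold=0.8):
    # S3: names are points; two names get an edge when they share a word.
    by_word = defaultdict(set)
    for idx, name in enumerate(names):
        for word in name.lower().split():
            by_word[word].add(idx)
    edges = set()
    for members in by_word.values():
        edges.update(combinations(sorted(members), 2))

    # S6 helper: union-find to collect connected components.
    parent = list(range(len(names)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # S4/S5: only edges are scored; edges below the threshold are dropped.
    for i, j in edges:
        if similarity(names[i].lower(), names[j].lower()) >= threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for idx, name in enumerate(names):
        groups[find(idx)].append(name)
    return sorted(groups.values())


names = ["Acme Trading Ltd", "Acme Tradng Ltd", "Globex Finance Inc", "Acme Bakery"]
print(group_names(names))
```

Names that share no word are never compared, which is where the saving over all-pairs comparison comes from.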

Method for recognizing CSRF token elements in web pages

The invention discloses a method for recognizing CSRF token elements in web pages. The method comprises the following steps: establishing a first HTTP session; acquiring the source code of the target page and checking whether it contains <form> tags; searching the <form> tags of the page source code of the first session for form elements whose input type is hidden; establishing a second HTTP session; acquiring the source code of the target page and checking whether it contains <form> tags; searching the <form> tags of the page source code of the second session for form elements whose input type is hidden; comparing in turn the values of the hidden form elements in the forms of the first session and the second session; calculating the Levenshtein distance ratio of the values of the suspected CSRF token form elements; and judging whether the parameters are CSRF tokens. The CSRF token elements are recognized dynamically on the basis of an algorithm, the recognition rate of CSRF token elements is increased, and missed detections are greatly reduced.
Owner:成都知道创宇信息技术有限公司
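
The comparison step might be sketched as below. HTML fetching and parsing are left out: the two sessions are assumed to have already been reduced to dictionaries mapping hidden field names to values. How the abstract's "proportion" is defined is not stated, so the ratio of distance to the longer value's length and the threshold of 0.5 are assumptions.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def find_csrf_candidates(first_session, second_session, threshold=0.5):
    """Names of hidden fields whose values differ strongly between sessions.
    Both arguments map hidden <input> names to their values."""
    candidates = []
    for name in first_session.keys() & second_session.keys():
        a, b = first_session[name], second_session[name]
        longest = max(len(a), len(b))
        if longest == 0:
            continue  # empty in both sessions: nothing to compare
        if levenshtein(a, b) / longest >= threshold:
            candidates.append(name)
    return sorted(candidates)


first = {"csrf_token": "a1b2c3d4e5f6", "lang": "en"}
second = {"csrf_token": "9z8y7x6w5v4u", "lang": "en"}
print(find_csrf_candidates(first, second))  # ['csrf_token']
```

A static hidden field such as `lang` keeps the same value in both sessions and therefore has a ratio of zero.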

Deep learning evaluation model and input method pinyin error correction method and device

The invention provides a deep learning evaluation model, an input method pinyin error correction method and an input method pinyin error correction device, which use a method based on a state-transition automaton to realize efficient Levenshtein distance (edit distance) matching between an input pinyin string and standard syllables. Then, an evaluation model based on deep learning gives a combination score for the currently input pinyin string and each approximate standard syllable. Finally, an optimal combined pinyin analysis result is obtained through calculation based on dynamic programming. With the method, syllables that may have been input incorrectly can be corrected, the correct syllable division result with the maximum probability is output, the possible ambiguity problem is solved, a localization information platform is considered, and the operating efficiency of the input method is improved.
Owner:BEIJING THUNISOFT INFORMATION TECH

Escape behavior detection method based on multiple environments

The invention discloses an escape behavior detection method based on multiple environments. The method comprises the steps of obtaining a to-be-analyzed program; analyzing the to-be-analyzed program by adopting a multi-environment virtualization sandbox; extracting an API call sequence of each sample in the behavior analysis report of each sandbox; converting the API call sequence into an API character sequence; carrying out comparison detection on API character sequences of the same sample in different sandboxes based on the Smith-Waterman algorithm; extracting a difference subsequence in the comparison detection result; and calculating the Levenshtein distance of the difference subsequence and comparing the API character sequences of the same sample in multiple environments in pairs so as to judge whether the to-be-analyzed program has an escape detection behavior or not. The method is high in reliability, good in practicability and high in detection efficiency.
Owner:CENT SOUTH UNIV
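
The Smith-Waterman step can be illustrated with the scoring part of the algorithm alone. The sketch below computes the best local alignment score between two API character sequences; the traceback needed to extract difference subsequences is omitted, and the scoring values and the one-letter-per-API-call encoding are assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            # Scores never drop below zero, which is what makes the
            # alignment local rather than global.
            cur.append(max(0, diagonal, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical encoding: each letter stands for one API call.
sandbox_a = "XXABCYY"
sandbox_b = "ZZABCWW"
print(smith_waterman_score(sandbox_a, sandbox_b))  # 6, from the shared run "ABC"
```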

An English word spelling checking method

The invention relates to an English word spelling checking method, belonging to the technical field of natural language processing. Firstly, the Levenshtein distance is used to compute the edit distance between the input word and the words of an English dictionary, and a set of similar words is selected according to a threshold. Then, a key edit distance model is introduced to calculate the key edit distance between the input word and every word in the word set. Next, a visual edit distance model is used to calculate the visual edit distance between the input word and every word in the word set. Finally, weights are assigned to the similarities calculated above and a weighted edit distance is computed. Compared with the prior art, the invention mainly addresses the inaccuracy and redundancy of English spelling checking in current text editors, and can narrow the set of matched approximate words to a more accurate range.
Owner:KUNMING UNIV OF SCI & TECH

Power Internet of Things network security risk prediction method based on Levenshtein distance algorithm

Pending; CN113886811A; Facilitate long-term forecasting; Relational databases; Design optimisation/simulation; Attack; Engineering
The invention relates to the field of network security prediction, in particular to a power Internet of Things network security risk prediction method based on a Levenshtein distance algorithm. The method comprises the following steps of: firstly, taking an attack source IP, an attack behavior and an attack target IP in a single alarm event as effective alarm information, and aiming at each piece of alarm information, taking a current alarm event as a result, searching six most similar alarm events before the occurrence time of the alarm event as causes, thereby constructing a piece of causal data, and storing the causal data into a database to form a causal database; secondly, filtering the causal database; and finally, predicting an alarm event by using the Levenshtein distance algorithm. The problem of poor long-term prediction effect in the system in the prior art is solved.
Owner:INFORMATION & TELECOMM COMPANY SICHUAN ELECTRIC POWER

A lip detection and reading method based on cascade feature extraction

The invention discloses a lip detection and reading method based on cascade feature extraction. The invention first detects the lip area of the input video through the Viola-Jones method based on the Haar classifier and the adaptive boosting (AdaBoost) algorithm. It then performs threshold binarization on the detection area according to the color characteristics of the lip area to extract the lip area; the image information is concentrated in the corner area of the data matrix through the discrete cosine transform, and a threshold screening method is used to extract the data. Next, the features with the highest contribution values are extracted through principal component analysis, so that the dimension of the data is further reduced. The sample data are used to establish a sequence dictionary tree for recognition, and the Levenshtein distance is used for sequence similarity analysis and fuzzy matching. Finally, the static features of each frame are combined with the dynamic features of the video, and the lip area reading is completed through dynamic sequence query. Through multi-level extraction and dimension reduction of the image features of the lip area, the invention can improve the speed and accuracy of lip reading, and has good implementability and robustness.
Owner:NANJING UNIV OF POSTS & TELECOMM

Intelligent opinion clue collection method and system based on data repetition

The invention discloses an intelligent opinion clue collection method and system based on data repetition, belongs to the technical field of data processing and data supervision, and aims to solve the technical problems of repeated opinion clue data, long processing periods and low accuracy. According to the technical scheme, the method specifically comprises the following steps: obtaining the key indexes used to judge the repetition rate in opinion clue data and preprocessing them; calculating the repetition rate of the key indexes by using a Levenshtein distance algorithm; and batch processing the opinion clue data collected into one class. The system comprises an acquisition module, a calculation module and a processing module.
Owner:INSPUR SOFTWARE CO LTD