62 results about "Longest common subsequence problem" patented technology

The longest common subsequence (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from the longest common substring problem: unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences. The longest common subsequence problem is a classic computer science problem, the basis of data comparison programs such as the diff utility, and has applications in computational linguistics and bioinformatics. It is also widely used by revision control systems such as Git for reconciling multiple changes made to a revision-controlled collection of files.
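
The definition above can be illustrated with the textbook dynamic-programming solution for two sequences. This is a minimal sketch, not the method of any patent listed below; the function name `lcs` is chosen here for illustration.

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b as a list."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner to recover one optimal subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    out.reverse()
    return out


# The matched elements need not be adjacent, unlike a common substring.
print("".join(lcs("ABCDE", "AXCXE")))  # prints "ACE"
```

The table costs O(m·n) time and space for inputs of lengths m and n; the longest common substring of the same two strings would be a single character, which shows the difference between the two problems.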

Method and Computer Program Product for Finding the Longest Common Subsequences Between Files with Applications to Differential Compression

A differential compression method and computer program product combines hash value techniques and suffix array techniques. The invention finds the best matches for every offset of the version file, with respect to a certain granularity and above a certain length threshold. The invention has two variations depending on block size choice. If the block size is kept fixed, the compression performance of the invention is similar to that of the greedy algorithm, without the expensive space and time requirements. If the block size is varied linearly with the reference file size, the invention can run in linear-time and constant-space. It has been shown empirically that the invention performs better than certain known differential compression algorithms in terms of compression and speed.
Owner:IBM CORP

Automated document revision markup and change control

Automated comparison of Darwin Information Typing Architecture (DITA) documents for revision mark-up includes reading document data from first and second DITA documents into respective document object model trees of nodes, and identifying and collapsing emphasis subtree nodes in the trees into their parent nodes, the collapsing caching emphasis data from the identified subtree nodes. A preorder traversal transforms the model trees into respective pre-order node lists and captures adjacent sibling emphasis subtree nodes as single text nodes. The node lists are merged into a merged node list via a longest common subsequence process that recognizes matching node pairs having primary sort key information and document structure metadata meeting a match threshold, with differences between matching tokens of the node pairs saved. A merged document object model built from the refined merged node list is transformed into a hypertext mark-up language document.
Owner:IBM CORP

User classifying method and system based on mobile user trajectory similarity

The invention relates to a user classifying method and system based on mobile user trajectory similarity. The method comprises: receiving motion trajectory data of mobile users and extracting time-position information of each mobile user; using an FP (frequent pattern) tree to mine a trajectory frequent sequence of the corresponding mobile user according to the time-position information, with the average stay length of the mobile user at each base station as a weight; extracting a resident site of the corresponding mobile user according to the trajectory frequent sequence and a preset weighted support threshold; calculating trajectory similarity results of the mobile users through a longest common subsequence algorithm according to the resident sites of the mobile users; and classifying the mobile users according to the trajectory similarity results. Because the FP tree mines the trajectory frequent sequence and finds the resident site using the average stay length at each base station as a weight, the regularity of user trajectories is preserved, the data quantity is decreased, and the calculation complexity is reduced.
Owner:GCI SCI & TECH

Method for finding the longest common subsequences between files with applications to differential compression

A differential compression method and computer program product combines hash value techniques and suffix array techniques. The invention finds the best matches for every offset of the version file, with respect to a certain granularity and above a certain length threshold. The invention has two variations depending on block size choice. If the block size is kept fixed, the compression performance of the invention is similar to that of the greedy algorithm, without the expensive space and time requirements. If the block size is varied linearly with the reference file size, the invention can run in linear-time and constant-space. It has been shown empirically that the invention performs better than certain known differential compression algorithms in terms of compression and speed.
Owner:INT BUSINESS MASCH CORP

Measuring system for similarity between different tracks and measuring method for measuring system

The invention discloses a measuring system for similarity between different tracks. The system comprises a track data file uploading module and a calculation result visualization module, wherein data from the track data file uploading module passes through a data preprocessing module to a most similar section construction module, and lastly the output of the most similar section construction module is presented to a user through the calculation result visualization module. A measuring method for the measuring system comprises the following steps: 1. uploading data; 2. judging whether a track data file is legal and, if yes, carrying out preprocessing or, if no, displaying an error; 3. evaluating similarity between the tracks through a module for calculating the similarity between the tracks; 4. searching for the two most similar tracks through the most similar section construction module; and 5. displaying a calculation result through the calculation result visualization module. According to the system and the method, the result is more accurate, and the most similar sections between the tracks are constructed by utilizing a longest common subsequence method.
Owner:HOHAI UNIV

Method for automatically extracting sentence template

The invention relates to a method for automatically extracting sentence templates which comprises the following steps: a text is divided into a plurality of sentences according to the punctuation; serial numbers are assigned to the sentences in order; each sentence is divided into word-level blocks using word segmentation technology; after word segmentation, the sentences are divided into groups in ascending or descending order of the number of words in each sentence; and the sentence template is obtained by applying the LCS algorithm to the sentences in the same group to obtain a longest common subsequence. The invention can automatically and efficiently collect commonly used words and sentences from large amounts of text information.
Owner:IFLYTEK CO LTD
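
The template step described in this abstract can be sketched as an LCS over word tokens. The sketch below is an illustration under stated assumptions, not the patented procedure: it uses whitespace splitting in place of the word segmentation the abstract mentions, the names `lcs` and `extract_template` are hypothetical, and folding a two-sequence LCS across a group is a heuristic that always yields a common subsequence but not necessarily the longest one shared by all sentences.

```python
from functools import reduce


def lcs(a, b):
    """Return one longest common subsequence of token lists a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def extract_template(sentences):
    """Fold the pairwise LCS over a group of sentences (assumed tokenizer: str.split)."""
    token_lists = [s.split() for s in sentences]
    if not token_lists:
        return []
    return reduce(lcs, token_lists)


group = [
    "please send the report to Alice today",
    "please send the invoice to Bob today",
]
print(extract_template(group))
# prints ['please', 'send', 'the', 'to', 'today']
```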

Text similarity determination method

The invention provides a text similarity determination method which can improve matching accuracy. The method comprises the steps of expressing pre-determined unit names in a knowledge base with spelling; receiving text input by a user, extracting unit names in the received text, and using spelling to express the unit names; performing one-to-one matching between the unit names in the text expressed with spelling and each unit name in the knowledge base expressed with spelling, and calculating a spelling-based longest common subsequence similarity; sorting knowledge in the knowledge base according to the spelling-based longest common subsequence similarity, and selecting from the knowledge base the piece of knowledge closest to the text input by the user, wherein each piece of knowledge includes a unit name. The invention relates to the field of artificial intelligence.
Owner:北京四海心通科技有限公司

Data compression utilizing longest common subsequence template

In response to receipt of an input string, an attempt is made to identify, in a template store, a closely matching template for use as a compression template. In response to identification of a closely matching template that can be used as a compression template, the input string is compressed into a compressed string by reference to a longest common subsequence compression template. Compressing the input string includes encoding, in a compressed string, an identifier of the compression template, encoding substrings of the input string not having commonality with the compression template of at least a predetermined length as literals, and encoding substrings of the input string having commonality with the compression template of at least the predetermined length as a jump distance without reference to a base location in the compression template. The compressed string is then output.
Owner:IBM CORP

Method for automatically splicing scrap images through computer

The invention provides a fragment matching method based on contour curvature. The method comprises steps as follows: firstly, fragments are pre-processed, a curvature string of a contour curve of each fragment is acquired, then each closed contour curve is segmented into curve edges by the aid of contour corners, regular fragment boundaries are rejected, and finally, longest matched curve sections among contours are found out with an LCS (longest common subsequence) algorithm and spliced. According to the method, on one hand, the matched curve section search range is defined by the aid of the contour corners of the fragments, so that the time for searching the matched contours is shortened, and the time complexity of the algorithm is greatly reduced; on the other hand, the regular fragment boundaries with constant curvature are rejected, so that mismatching is reduced, and the accuracy of the algorithm is improved.
Owner:CHONGQING UNIV OF POSTS & TELECOMM

Text paragraph identification comparison method and system based on longest common subsequence

Active; CN108734110A; To achieve the purpose of comparison; Solve the problem of comparison; Character and pattern recognition; Longest common subsequence problem; Theoretical computer science
The invention discloses a text paragraph identification comparison method and a text paragraph identification comparison system based on a longest common subsequence. The text paragraph identification comparison method comprises the steps of acquiring a first text character string and a second text character string; performing paragraph identification on the first text character string and the second text character string; performing paragraph order adjustment on the first text character string and the second text character string; and comparing the first text character string and the second text character string which are subjected to paragraph order adjustment to obtain a difference item. The text paragraph identification comparison system comprises a front end, a conversion module, a paragraph identification module and a comparison module. With the text paragraph identification comparison method and the text paragraph identification comparison system based on the longest common subsequence, the problems that texts whose paragraph information cannot be acquired cannot be compared and the paragraph adjustment situation cannot be processed well in an existing text comparison tool are solved.
Owner:DATAGRAND TECH INC

Method and system for automatically extracting virus characteristics based on family samples

The invention provides a method and a system for automatically extracting virus characteristics based on family samples. A longest common subsequence algorithm is modified: a sequence A and a sequence B are established from samples in the family samples; hash values of the subsequences of sequence A and sequence B whose lengths equal a preset feature code length are calculated; and the hash values of the subsequences of sequence A and sequence B are matched by means of a red-black tree. If the hash values are the same, the corresponding subsequences are common subsequences of sequence A and sequence B, and the common subsequences are feature codes of the family samples. When the remaining samples are taken as sequence B and searched in the red-black tree, the feature codes of all family samples are obtained and combined into a feature set of the family samples; the quality of the established feature codes is judged with a weighting model, and the feature codes of the family samples are determined. The method reduces the time complexity of the algorithm and improves the extraction efficiency and accuracy of the feature codes.
Owner:HARBIN ANTIY TECH

Semantic recognition method and device

The invention discloses a semantic recognition method and device, and relates to the technical field of computers. The method comprises the specific steps that to-be-recognized statement information is acquired; the to-be-recognized statement information and preset statement templates are matched based on the longest common subsequence, and a matching result with a weight value is determined; semantic recognition is carried out according to the matching result. The method can perform more accurate semantic recognition on the acquired to-be-recognized statements on the basis of the matching method with the weight value of the longest common subsequence according to the preset statement templates, completely utilize statement template information, and be higher in flexibility and high in efficiency. In addition, the method can update the statement templates in real time in a short time to perform subsequent testing feedback.
Owner:BEIJING HUIJUN TECH CO LTD

Text comparison method and device, storage medium and electronic equipment

The invention relates to a text comparison method and device, a storage medium and electronic equipment. The method comprises the steps that a first text and a second text are partitioned; for a character block pair composed of a character block in the first text and a character block in the second text, the longest common subsequence pairs of the character block pair are obtained; and according to the character information between two adjacent longest common subsequence pairs, difference description information is generated, the difference description information being used for displaying the character difference of the character block pair. According to the technical scheme, text comparison is carried out on the basis of character block pairs, achieving higher efficiency and accuracy than text comparison that uses lines or words as the minimum unit.
Owner:NEUSOFT CORP

Analysis and integration method and device for sequencing of medium-short gene segment

The present invention provides an analysis and integration method and device for sequencing of a medium-short gene segment. The method comprises: checking a read sequence and removing gene sequences comprising errors and unreliable information; reading processed read data, analyzing the data and constructing a k-mer structure and a quad-tree structure; constructing an integration storage table and recording the progress condition of the integration process and read information which currently participates in integration; after selecting initial k-mer to start to carry out integration, continuously selecting subsequent k-mer according to an integration scoring formula, and updating the information in the integration storage table structure in real time so as to obtain contig sequences; and combining the contig sequences on the basis of a longest common subsequence method by utilizing read-pair information and generating and outputting super-contig. Aiming at the special requirements of the integration method for performance, the device provided by the present invention is of an embedded handheld structure; and by utilizing the method and the device which are provided by the present invention, analysis and integration on sequencing of the medium-short gene segment can be rapidly and accurately implemented.
Owner:XI AN JIAOTONG UNIV

Textual similarity calculation method and device, and intelligent robot

Embodiments of the invention provide a textual similarity calculation method, a textual similarity calculation device, and an intelligent robot. The embodiments of the invention comprise the steps of: first obtaining the longest common subsequences of two texts; then calculating the intersection and the union of the vocabulary sets corresponding to the two texts, and calculating a first similarity from the obtained intersection and union; calculating a second similarity from the vocabulary sets corresponding to the longest common subsequences and the previously obtained union; and finally calculating a target similarity of the two texts from the first similarity and the second similarity. This technical solution combines the longest common subsequences with each vocabulary item in the texts to calculate the similarity of the two texts, thereby effectively improving the calculation accuracy of the text similarity. Furthermore, chat robots or intelligent robots may provide users with more accurate answers by using accurate text similarity, which improves the service quality of the chat robots or intelligent robots and the user experience.
Owner:北京玄一科技有限公司
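
The two-part similarity described in this abstract can be sketched as follows. The abstract does not say how the first and second similarity are combined, so the weighted average with a hypothetical `weight` parameter is an assumption made here, as are whitespace tokenization and the names `lcs` and `text_similarity`.

```python
def lcs(a, b):
    """Return one longest common subsequence of token lists a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def text_similarity(text_a, text_b, weight=0.5):
    """Combine a vocabulary-overlap score with an LCS-based score.

    `weight` and the linear combination are assumptions, not from the source.
    """
    words_a, words_b = text_a.split(), text_b.split()
    vocab_a, vocab_b = set(words_a), set(words_b)
    union = vocab_a | vocab_b
    if not union:
        return 0.0
    # First similarity: intersection over union of the two vocabularies.
    first = len(vocab_a & vocab_b) / len(union)
    # Second similarity: vocabulary of the LCS over the same union.
    second = len(set(lcs(words_a, words_b))) / len(union)
    return weight * first + (1 - weight) * second


# Same words in reversed order: full vocabulary overlap, but a short LCS.
print(text_similarity("a b c", "c b a"))
```

The LCS term penalizes texts that share vocabulary but not word order, which the set-based term alone cannot detect.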

An Evolutionary Analysis Method for Association Networks in Forums

The invention provides a method for analyzing the evolution of association networks in forums, comprising: 1. reasonable time division and segment extraction; 2. measurement parameters for measuring community evolution; 3. a longest common subsequence algorithm for finding the common nodes in groups; 4. discovery of cut points in the community network graph based on a DFS algorithm. The invention is applicable to different applications and to networks of different scales and types; it can be used as long as the network can be converted into an undirected graph and follows the same principles. It can be used not only for visualizing community network discovery but also for analyzing the changes of the extracted network over time. Analysis of multiple posts shows that different posts evolve differently. Posts with unattractive and boring content receive a certain number of replies only in the initial stage after posting, generate relatively few communities, and have relatively few important active nodes. Attractive posts last for a long time and generate a large number of communities.
Owner:HARBIN INST OF TECH

Hotspot aggregation method and device

The present invention discloses a hotspot aggregation method and device. The method comprises: capturing network resources on the Internet; matching the network resources by means of a longest common subsequence (LCS) algorithm to acquire matching results; and generating hotspot phrases based on the matching results. By means of the technical solutions of the present invention, the operation and maintenance cost and the complexity of hotspot aggregation calculation can be reduced, the speed of hotspot aggregation is improved, real-time acquisition and real-time calculation can be achieved, and hotspot events can be discovered fast without substantial delay.
Owner:BEIJING QIHOO TECH CO LTD

Method for building integrated enterprise process reference model based on model combination

The invention discloses a method for building an integrated enterprise process reference model based on model combination, which belongs to the field of modeling technology. The method is realized on a platform formed by sequentially connecting a plurality of users, the Internet and servers, and an enterprise process reference model can be semi-automatically obtained by converting original enterprise models into character strings for combination, wherein the enterprise models are two modeled enterprise process models which are selected from an enterprise model database, approved by modeling experts, and put into practice. A best-fit enterprise model is obtained by the following steps: firstly, obtaining a longest common subsequence of two process sequences; then building, for each of the two process sequences, an auxiliary progression based on the longest common subsequence; and finally obtaining the Levenshtein distance, that is, the minimum number of basic operations for converting one process sequence into the other. The invention can effectively improve the reuse of enterprise model knowledge and thereby the modeling efficiency.
Owner:TSINGHUA UNIV

User interface for regular expression generation

Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
Owner:ORACLE INT CORP

Public resource transaction data-oriented cleaning and duplicate removal method and system

The invention relates to a public resource transaction data-oriented cleaning and duplicate removal method and system. The texts corresponding to public resource transaction data are stored in a dataset in a text data record form, wherein the data sets are grouped according to a preset rule, the number of the text data records in each group is controlled, and the data similarity among the text data records in each group is calculated based on the longest common subsequence. When the data similarity between the two text data records is larger than a preset threshold value, the named entity information of the two text data records is further compared, and when the named entity information of the two text data records is the same, it is judged that the two text data records belong to the repeated data, and otherwise it is judged that the two text data records belong to the non-repeated data. The repeated information in the public resource transaction data is determined in a multi-dimensional cross validation mode, so that the misjudgment of the repeated data can be further prevented on the basis of improving the text processing performance.
Owner:GLODON CO LTD

Length of the longest common subsequence algorithm optimization

Systems and methods perform various optimizations of a length of the longest common subsequence (LLCS) algorithm for use in determining whether a set of input sequences are similar to a query sequence. The optimizations include filtering out sequences from the set of input sequences whose estimated similarity with the query sequence is below a threshold value. The remaining sequences can then be provided to an LLCS algorithm, where the output of the LLCS algorithm is used in a similarity function to determine the actual similarity of an input sequence with the query sequence.
Owner:AVAST SOFTWARE
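
The filter-then-compute idea in this abstract can be sketched with a cheap upper bound on the LLCS. The abstract does not specify the estimate or the similarity function, so both are assumptions here: the estimate is the multiset overlap of element counts (which can never be smaller than the true LLCS), and the similarity function is 2·LLCS / (|a| + |b|). All function names are hypothetical.

```python
from collections import Counter


def llcs(a, b):
    """Length of the longest common subsequence, using two rows of the DP table."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def llcs_upper_bound(a, b):
    """Assumed estimate: shared element counts bound the LLCS from above."""
    return sum((Counter(a) & Counter(b)).values())


def similar_sequences(query, candidates, threshold):
    """Return (candidate, similarity) pairs whose similarity reaches the threshold."""
    results = []
    for cand in candidates:
        total = len(query) + len(cand)
        if total == 0:
            continue
        # Cheap filter: skip the quadratic LLCS when even the bound is too low.
        if 2 * llcs_upper_bound(query, cand) / total < threshold:
            continue
        sim = 2 * llcs(query, cand) / total
        if sim >= threshold:
            results.append((cand, sim))
    return results


hits = similar_sequences("abcdef", ["abcxef", "zzzzzz", "fedcba"], 0.8)
print([cand for cand, _ in hits])  # prints ['abcxef']
```

In this example "zzzzzz" is rejected by the filter alone, while "fedcba" passes the filter (same characters) and is rejected only after the exact LLCS is computed.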

Regular expression generation using longest common subsequence algorithm on combinations of regular expression codes

Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
Owner:ORACLE INT CORP

A streaming on-line log analysis method

The invention discloses a streaming on-line log analysis method, which partitions logs according to their length, that is, logs of the same length are distributed to the same partition. After log partitioning, the fast matching phase determines whether a log belongs to the current log type by calculating whether the intersection of the log and the log type meets a threshold. After the log type to which the log belongs is quickly matched, the log type extraction phase extracts the log type and log parameters by finding the longest common subsequence of the log and the log type. The method can effectively parse system logs from unstructured text into structured log types, the result can be used for anomaly detection on the logs, and the method is simple and effective.
Owner:XI AN JIAOTONG UNIV
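
The log type extraction phase described in this abstract can be sketched as follows: tokens shared with the log type (via the LCS) stay in the template, and the remaining tokens are treated as parameters. This is an illustration only; the `<*>` wildcard marker, whitespace tokenization, and the names `lcs_pairs` and `parse_log` are assumptions, and the partitioning and fast matching phases are omitted.

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) of one LCS alignment between a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def parse_log(log_line, log_type):
    """Split a log line into (template, parameters) relative to a known log type."""
    tokens = log_line.split()
    type_tokens = log_type.split()
    matched = {i for i, _ in lcs_pairs(tokens, type_tokens)}
    # Tokens outside the LCS are assumed to be variable parameters.
    template = [tok if i in matched else "<*>" for i, tok in enumerate(tokens)]
    params = [tok for i, tok in enumerate(tokens) if i not in matched]
    return " ".join(template), params


print(parse_log("Connection from 10.0.0.5 closed", "Connection from <*> closed"))
# prints ('Connection from <*> closed', ['10.0.0.5'])
```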

Method and apparatus for obtaining similar trademarks, computer device and storage medium

The present application relates to a method and apparatus for obtaining similar trademarks, a computer device and a storage medium. The method uses the longest common subsequence, the longest common string, and the edit distance to determine the overall character string similarity between the word mark to be detected and prior trademarks, thereby screening out the prior trademarks with higher similarity for further processing. The similarity of word and glyph is then determined from the characters that differ between the word mark to be detected and each similar prior trademark. The comprehensive similarity is calculated from the overall character string judgment result and the independent character judgment result, and prior trademarks having a high degree of similarity are fed back to the user, so that similar trademarks of the trademark to be detected are obtained quickly and the efficiency of similar trademark search is improved. At the same time, because the similarity judgment between the word mark to be detected and the prior trademark combines the overall string judgment result with the independent character judgment result, the accuracy of similar trademark search is effectively improved.
Owner:PING AN TECH (SHENZHEN) CO LTD

Data compression utilizing longest common subsequence template

In response to receipt of an input string, an attempt is made to identify, in a template store, a closely matching template for use as a compression template. In response to identification of a closely matching template that can be used as a compression template, the input string is compressed into a compressed string by reference to a longest common subsequence compression template. Compressing the input string includes encoding, in a compressed string, an identifier of the compression template, encoding substrings of the input string not having commonality with the compression template of at least a predetermined length as literals, and encoding substrings of the input string having commonality with the compression template of at least the predetermined length as a jump distance without reference to a base location in the compression template. The compressed string is then output.
Owner:IBM CORP

Update searching method and device for file comparison, storage medium and equipment

The invention relates to an updating and searching method and device for file comparison, a storage medium and equipment, and the method comprises the steps: taking the content of each preset unit as an element in a first file and a second file, and comparing the first file with the second file, so as to obtain the longest common subsequence of the first file and the second file; performing index alignment on the common elements in the first file and the common elements in the second file according to the longest common subsequence; and determining update elements in the first file and the second file according to a position corresponding relation between the common element gap where the deleted elements in the first file are located and the common element gap where the added elements in the second file are located. According to the method and the device, comparison between the files and alignment of the common contents can be realized without depending on a complex algorithm, so that update contents between the files can be searched according to the aligned common contents, the implementation difficulty can be reduced, and the method and the device are easy to implement.
Owner:NEUSOFT CORP
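
The gap-based classification in this abstract can be sketched as follows: the LCS aligns the common elements of the two files, and each gap between consecutive common elements is inspected. A gap holding both deleted and added elements is reported as an update. The labels "update", "delete" and "add" and the names `lcs_pairs` and `diff_gaps` are assumptions made for illustration.

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) of one LCS alignment between a and b."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    pairs, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def diff_gaps(old, new):
    """Classify the content of each gap between aligned common elements."""
    # A sentinel pair past both ends closes the final gap.
    pairs = lcs_pairs(old, new) + [(len(old), len(new))]
    changes = []
    prev_i, prev_j = -1, -1
    for i, j in pairs:
        deleted = old[prev_i + 1:i]
        added = new[prev_j + 1:j]
        if deleted and added:
            changes.append(("update", deleted, added))
        elif deleted:
            changes.append(("delete", deleted, []))
        elif added:
            changes.append(("add", [], added))
        prev_i, prev_j = i, j
    return changes


print(diff_gaps(["a", "b", "c", "d"], ["a", "x", "c", "d", "e"]))
# prints [('update', ['b'], ['x']), ('add', [], ['e'])]
```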

Regular expression generation using longest common subsequence algorithm on spans

Disclosed herein are techniques related to automated generation of regular expressions. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence.
Owner:ORACLE INT CORP

DRC-based hotspot detection considering edge tolerance and incomplete specification

A range-pattern-matching-type DRC-based process hotspot detection is provided that considers edge tolerances and incomplete specification (“don't care”) regions in foundry-provided hotspot patterns. First, all possible topological patterns are enumerated for the foundry-provided hotspot pattern. Next, critical topological features are extracted from each pattern topology and converted to critical design rules using Modified Transitive Closure Graphs (MTCGs). Third, the extracted critical design rules are arranged in an order that facilitates searching space reduction techniques, and then the DRC process is sequentially repeated on a user's entire layout pattern for each critical design rule in a first group, then searching space reduction is performed to generate a reduced layout pattern, and then DRC process is performed for all remaining critical design rules using the reduced layout pattern. Candidate locations are then identified using the DRC results, and then the true hotspot locations are confirmed using longest common subsequence and linear scan techniques.
Owner:SYNOPSYS INC

Literature classification method and system based on trie and LCS algorithm

The invention discloses a document classification method based on trie and LCS algorithm, comprising the following steps: step 1, pre-compiling an initial classification dictionary and an initial exclusion dictionary; step 2, extending each character string in the initial classification dictionary to obtain extended character strings, filtering the extended character strings according to the initial exclusion dictionary, and constructing a dictionary tree; step 3, calling the dictionary tree to look up all the strings that appear in each sentence of the literature to be classified, taking the longest character string in the initial classification dictionary as the longest common subsequence, taking the longest common subsequence and its corresponding class as the final character string and final class of the sentence, and taking the final class that appears most frequently in a document as the class to which the document belongs. The invention also discloses a document classification system based on trie and LCS algorithm. The invention omits the word segmentation process, takes stable character strings as features, has high accuracy, and reduces the dependence on context.
Owner:CHINA PETROLEUM & CHEM EXPLORATION & PRODION RES INST +1