1318 results for "Semantic similarity" patented technology

Semantic similarity is a metric defined over a set of documents or terms, where the distance between items is based on the likeness of their meaning or semantic content, as opposed to lexicographic similarity, which can be estimated from their syntactic representation (e.g., their string format). Semantic similarity measures are mathematical tools used to estimate the strength of the semantic relationship between units of language, concepts, or instances, through a numerical description obtained by comparing the information that supports their meaning or describes their nature. The term semantic similarity is often confused with semantic relatedness. Semantic relatedness includes any relation between two terms, while semantic similarity only includes "is a" relations. For example, "car" is similar to "bus", but is also related to "road" and "driving".
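
The distinction between similarity and relatedness can be illustrated with a path-based measure over an "is a" hierarchy. The following is a minimal Python sketch assuming a tiny, hypothetical taxonomy (IS_A) rather than a real lexical resource such as WordNet. It scores two terms as 1 / (1 + length of the shortest "is a" path between them), so "car" and "bus" (siblings under "vehicle") score higher than "car" and "road", while "driving", which is related to "car" but not in the hierarchy, scores 0.

```python
# Hypothetical "is a" taxonomy: child -> parent.
IS_A = {
    "car": "vehicle",
    "bus": "vehicle",
    "vehicle": "artifact",
    "road": "artifact",
}


def ancestors(term):
    """Return [term, parent, grandparent, ...] following "is a" links to the root."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def path_similarity(a, b):
    """1 / (1 + shortest is-a path between a and b); 0.0 if they share no ancestor."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    # Path length through each common ancestor = depth below it in each chain.
    distances = [i + chain_b.index(node) for i, node in enumerate(chain_a) if node in chain_b]
    return 1.0 / (1.0 + min(distances)) if distances else 0.0
```

Path-based scores like these capture only "is a" structure; measuring relatedness would require other relations (such as co-occurrence), which this sketch deliberately omits.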

Method For Information Retrieval

A method of retrieving documents using a search engine includes providing a reverse index including one or more keywords and a list of documents containing the one or more keywords, the reverse index further including a measure of confidence (MOC) value associated with the one or more keywords. One or more query terms are input into the search engine. The query terms are disambiguated, and a MOC value is associated with each meaning of each disambiguated query term. A list of documents containing the query terms is retrieved, wherein the documents are initially ranked based at least in part on the MOC values of the keywords and query terms. The list of documents may be re-ranked based at least in part on the semantic similarity of each document to the disambiguated query terms.
Owner:RGT UNIV OF CALIFORNIA
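
The two-stage ranking described above can be sketched in Python as follows. Everything here is an assumption for illustration: the reverse index contents, scoring a document as the sum of (keyword MOC × query-sense MOC), and re-ranking by cosine similarity between hypothetical document vectors and a vector for the chosen query sense. The abstract does not specify these formulas.

```python
import math
from collections import defaultdict

# Hypothetical reverse index: keyword -> {document id: measure of confidence (MOC)}.
REVERSE_INDEX = {
    "jaguar": {"d1": 0.9, "d2": 0.4},
    "speed": {"d1": 0.5, "d3": 0.6},
}


def initial_rank(query_mocs):
    """Rank documents by the sum of keyword MOC times query-sense MOC (assumed scoring)."""
    scores = defaultdict(float)
    for term, query_moc in query_mocs.items():
        for doc_id, keyword_moc in REVERSE_INDEX.get(term, {}).items():
            scores[doc_id] += keyword_moc * query_moc
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rerank(doc_ids, doc_vectors, sense_vector):
    """Re-rank by semantic similarity between each document and the chosen query sense."""
    return sorted(doc_ids, key=lambda d: (-cosine(doc_vectors[d], sense_vector), d))


ranked = initial_rank({"jaguar": 1.0, "speed": 0.5})
# Hypothetical 2-D vectors: axis 0 ~ "car" sense, axis 1 ~ "animal" sense.
doc_vectors = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
reranked = rerank(ranked, doc_vectors, sense_vector=[0.0, 1.0])
```

Here the MOC-based pass ranks d1 first, but once "jaguar" is disambiguated to its animal sense, the re-ranking pass moves d2 to the top.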

Techniques for similarity analysis and data enrichment using knowledge sources

The present disclosure relates to performing similarity metric analysis and data enrichment using knowledge sources. A data enrichment service can compare an input data set to reference data sets stored in a knowledge source to identify similar data. A similarity metric can be calculated corresponding to the semantic similarity of two or more data sets. The similarity metric can be used to identify data sets based on their metadata attributes and data values, enabling easier indexing and high-performance retrieval of data values. An input data set can be labeled with a category based on the reference data set that best matches it. The similarity of an input data set to a data set provided by a knowledge source can be used to query the knowledge source for additional information about the data set, and that information can be used to provide recommendations to the user.
Owner:ORACLE INT CORP
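
A minimal Python sketch of labeling an input data set by its best-matching reference data set. As an assumption, it uses Jaccard overlap of column values as the similarity metric and a small hypothetical knowledge source (REFERENCE_SETS). Jaccard measures value overlap rather than meaning, so it only stands in for whatever semantic metric a real system would use.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of values (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical knowledge source: category label -> reference data set.
REFERENCE_SETS = {
    "us_state": {"California", "Texas", "Ohio", "Nevada"},
    "color": {"red", "green", "blue"},
}


def label_dataset(values, references=REFERENCE_SETS):
    """Return (category, score) of the best-matching reference set, or (None, 0.0) if nothing overlaps."""
    best_label, best_score = None, 0.0
    for label, reference in references.items():
        score = jaccard(values, reference)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


label, score = label_dataset(["Texas", "Ohio", "Utah"])
```

The column ["Texas", "Ohio", "Utah"] shares two of five distinct values with the us_state reference set, so it is labeled "us_state" with a score of 0.4.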

Image captioning utilizing semantic text modeling and adversarial learning

The present disclosure includes methods and systems for generating captions for digital images. In particular, the disclosed systems and methods can train an image encoder neural network and a sentence decoder neural network to generate a caption from an input digital image. For instance, in one or more embodiments, the disclosed systems and methods train an image encoder neural network (e.g., a character-level convolutional neural network) utilizing a semantic similarity constraint, training images, and training captions. Moreover, the disclosed systems and methods can train a sentence decoder neural network (e.g., a character-level recurrent neural network) utilizing training sentences and an adversarial classifier.
Owner:ADOBE SYST INC

Text similarity and word-sense similarity calculation method, system, and application system

The invention discloses a method and system for calculating text similarity and word-sense similarity, and an application system. The method comprises the following steps: initializing based on a vocabulary database; calculating the initial word-sense similarity between words in the vocabulary database; calculating the initial semantic similarity between texts based on the initial word-sense similarity; iteratively updating the semantic similarity between texts and the word-sense similarity between words until convergence; constructing the final word-sense similarity matrix from the final word similarities; transforming the term-frequency vector of each original text into a new term-frequency vector; and calculating the similarity between texts in the text collection. The invention can improve the relevance of text similarity computation, especially for short texts.
Owner:Meng Shengguang (蒙圣光) +1
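
The final steps of the text-similarity method above (transforming each text's term-frequency vector with the word-sense similarity matrix, then comparing texts) can be sketched in Python. This is a minimal sketch under stated assumptions. The matrix S is taken as already produced by the iterative step, which is not shown. The three-word vocabulary and its similarity values are hypothetical. The transformation is assumed to be the matrix-vector product S·v, and cosine similarity is assumed for the final comparison.

```python
import math

VOCAB = ["car", "automobile", "banana"]
# Hypothetical final word-sense similarity matrix over VOCAB (symmetric, 1.0 on the diagonal).
S = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def term_frequencies(text):
    """Term-frequency vector of a text over VOCAB."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def transform(vector, matrix=S):
    """New frequency vector S · v: each term also receives weight from similar terms."""
    return [sum(matrix[i][j] * vector[j] for j in range(len(vector))) for i in range(len(matrix))]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def text_similarity(a, b):
    """Cosine similarity of the transformed term-frequency vectors of two texts."""
    return cosine(transform(term_frequencies(a)), transform(term_frequencies(b)))
```

With plain term frequencies, "car" and "automobile" share no terms and score 0. After the transformation they score about 0.99, which illustrates why such a transformation can help with short texts.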

Information block extraction apparatus and method for Web pages

A method and apparatus for identifying coherent areas within a Web page. First, a Web page is parsed into an HTML DOM tree and an HTML tag token stream. Next, repeated patterns are induced from the Web page. After improper repeated patterns are filtered out and corresponding instances of the repeated patterns are generated, the repeated patterns are mapped back to corresponding regions in the Web page. Based on the mappings, a hierarchical RST tree containing information blocks is generated. Information items within the information blocks are detected and then used to generate a hierarchical structural information block tree. Information blocks from the structural information block tree are then classified into text information blocks and link information blocks. Based on the classification and block semantic similarity, the blocks are clustered and then grouped into semantic information blocks. The semantic information blocks contain main text information blocks and related link blocks, which can be labeled if necessary.
Owner:FUJITSU LTD +1

System and method for similarity search of images

A system and method for an efficient semantic similarity search of images with a classification structure are provided. The system and method provide for building a semantic classification-search tree for the plurality of images, the classification tree including at least two categories of images, each category of images representing a subset of the plurality of images, receiving a query image, classifying the query image to select one of the at least two categories of images, and restricting the search for the image of interest using the query image to the selected one of the at least two categories of images.
Owner:THOMSON LICENSING SA
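
A minimal Python sketch of category-restricted search. It assumes images are already represented as 2-D feature vectors and that the classification structure is a single level of categories. The query is assigned to the category whose centroid is nearest, which stands in for the patent's classification-search tree. All names and vectors are hypothetical.

```python
import math

# Hypothetical categories: name -> list of (image id, feature vector).
CATEGORIES = {
    "beach": [("img1", (0.9, 0.1)), ("img2", (0.8, 0.2))],
    "forest": [("img3", (0.1, 0.9)), ("img4", (0.2, 0.7))],
}


def centroid(items):
    """Mean feature vector of a category."""
    dims = len(items[0][1])
    return tuple(sum(vec[k] for _, vec in items) / len(items) for k in range(dims))


CENTROIDS = {name: centroid(items) for name, items in CATEGORIES.items()}


def search(query):
    """Classify the query to the nearest centroid, then search only inside that category."""
    category = min(CENTROIDS, key=lambda name: math.dist(query, CENTROIDS[name]))
    best_id, _vec = min(CATEGORIES[category], key=lambda item: math.dist(query, item[1]))
    return category, best_id
```

Restricting the search to one category means only that category's images are compared with the query. The trade-off is that a misclassified query can miss a closer match in another category.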

Summarized network graph for semantic similarity graphs of large corpora

Provided is a process including: obtaining a clustered graph, the clustered graph having three or more clusters, each cluster having a plurality of nodes of the graph, the nodes being connected in pairs by one or more respective edges; determining visual attributes of cluster icons based on amounts of nodes in clusters corresponding to the respective cluster icons; determining positions of the cluster icons in a graphical visualization of the clustered graph; obtaining, for each cluster, a respective subset of nodes in the respective cluster; determining visual attributes of node icons based on attributes of corresponding nodes in the subsets of nodes, each node icon representing one of the nodes in the respective subset of nodes; determining positions of the node icons in the graphical visualization based on the positions of the corresponding cluster icons of clusters having the nodes corresponding to the respective node icons; and causing the graphical visualization to be displayed.
Owner:QUID LLC

Methods and systems for creating and using an adaptive thesaurus

Methods and systems are provided for creating an adaptive thesaurus. A term pair including an index term and an expansion term is received. A recall gain, an expansion independence, and a semantic similarity of the term pair are calculated using a processor. Whether to store the term pair is determined based on the recall gain, the expansion independence, and the semantic similarity. The term pair is stored based on the determination. Methods and systems are provided for searching using an adaptive thesaurus. A search query including a query term is received. An expansion term stored in association with an index term matching the query term in the adaptive thesaurus is retrieved. Using a processor, the search query is expanded using the expansion term based on a recall gain, an expansion independence, and a semantic similarity.
Owner:RELX INC

Method for detecting code similarity based on semantic analysis of program source code

The invention discloses a method for detecting code similarity based on semantic analysis of program source code, relating to computer program analysis technology and to methods for detecting complex code in computer software. The method addresses the low detection accuracy and high computational complexity of prior approaches on code that has different syntactic representations but similar semantics, as well as their inability to perform large-scale program code similarity detection. The method comprises the following steps: parsing each of the two source code segments to be compared into a control dependence tree of a system dependence graph and applying basic code normalization to each; using a metric-based method to extract candidate similar control dependence trees from the normalized trees; applying advanced code normalization to the extracted candidate similar code; and computing semantic similarity to obtain the similarity result, completing the code similarity detection. The method can be applied to source code piracy detection, software component library queries, software defect detection, program comprehension, and similar tasks.
Owner:HARBIN INST OF TECH

Method and apparatus for semantic search of schema repositories

Mechanisms for searching XML repositories for semantically related schemas from a variety of structured metadata sources, including web services, XSD documents and relational tables, in databases and Internet applications. A search is formulated as a problem of computing a maximum matching in pairwise bipartite graphs formed from query and repository schemas. The edges of such a bipartite graph capture the semantic similarity between corresponding attributes of the schema based on their name and type semantics. Tight upper and lower bounds are also derived on the maximum matching that can be used for fast ranking of matchings whilst still maintaining specified levels of precision and recall. Schema indexing is performed by ‘attribute hashing’, in which matching schemas of a database are found by indexing using query attributes, performing lower bound computations for maximum matching and recording peaks in the resulting histogram of hits.
Owner:IBM CORP
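
The matching formulation above can be illustrated with a small Python sketch that computes a maximum-weight matching between query and repository attributes by exhaustive search. The attribute-name similarity (token overlap on underscore-separated names) and the example schemas are assumptions. Exhaustive search is only practical for a handful of attributes. A real system would use a polynomial-time assignment algorithm (such as the Hungarian method) together with the bounds and attribute hashing described above, none of which are shown.

```python
from itertools import permutations


def name_similarity(a, b):
    """Hypothetical attribute similarity: token overlap of underscore-separated names."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def max_matching(query_attrs, repo_attrs, sim=name_similarity):
    """Exhaustive maximum-weight bipartite matching; returns (total weight, [(query, repo), ...])."""
    flipped = len(query_attrs) > len(repo_attrs)
    small, large = (repo_attrs, query_attrs) if flipped else (query_attrs, repo_attrs)
    best_score, best_pairs = 0.0, []
    # Try every injective assignment of the smaller side onto the larger side.
    for perm in permutations(large, len(small)):
        pairs = [(b, a) if flipped else (a, b) for a, b in zip(small, perm)]
        score = sum(sim(q, r) for q, r in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_score, best_pairs


score, pairs = max_matching(["customer_name", "customer_id"], ["id", "name", "address"])
```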

Question cluster-based automatic question answering method and device

The invention provides a question cluster-based automatic question answering method and device. The method comprises: clustering the questions in a question-answer database in advance based on semantic similarity to obtain one or more question clusters, and determining a high-quality answer for each cluster from the answers to the questions in that cluster, thereby forming a cluster-format question-answer database; and, when a question input by a user is received, determining the question cluster in the cluster-format database with the highest semantic similarity to the user's question and returning that cluster's high-quality answer to the user. The method and device can answer user questions automatically, efficiently, and accurately, and better meet user needs.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD
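
A minimal Python sketch of the online step of the question-cluster method above. It assumes the clusters and their best answers have already been built offline. Bag-of-words cosine similarity stands in for semantic similarity, and a cluster's score is assumed to be the best score of any of its member questions. Both choices, and the example data, are hypothetical.

```python
import math
from collections import Counter

# Hypothetical cluster-format QA database built offline.
CLUSTERS = [
    {"questions": ["how do i reset my password", "forgot my password how to reset"],
     "answer": "Use the 'Forgot password' link."},
    {"questions": ["how do i delete my account", "close my account"],
     "answer": "Go to Settings > Account > Delete."},
]


def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(user_question):
    """Return the best answer of the cluster most similar to the user's question."""
    q = Counter(user_question.lower().split())

    def cluster_score(cluster):
        return max(cosine(q, Counter(m.split())) for m in cluster["questions"])

    return max(CLUSTERS, key=cluster_score)["answer"]
```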

Question and answer recommendation method based on artificial intelligence, device and computer device

The application presents an artificial intelligence-based question and answer recommendation method, a question and answer recommendation device, and a computer device. The method includes: receiving search terms input by a user, wherein the search terms are questions; matching the search terms against the answered questions in a question and answer knowledge base; if no question in the knowledge base exactly matches the search terms, calculating the semantic similarity between the search terms and each answered question in the knowledge base; determining whether the knowledge base contains a question whose semantic similarity to the search terms exceeds a preset threshold; and, if so, using the answer to that question as the answer to the search terms and recommending it to the user. Because answers are recommended according to semantic similarity, the risk of meaning drift is reduced, as is interference from meaningless colloquial expressions.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD
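
The match-then-threshold logic above can be sketched in Python as follows. The knowledge base, the 0.5 threshold, and the use of word-set (Jaccard) overlap in place of a semantic similarity model are all assumptions for illustration.

```python
# Hypothetical question and answer knowledge base.
KB = {
    "how do i reset my password": "Use the 'Forgot password' link.",
    "how do i delete my account": "Go to Settings > Account > Delete.",
}


def similarity(a, b):
    """Stand-in for semantic similarity: Jaccard overlap of word sets."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def recommend(search_terms, threshold=0.5):
    query = " ".join(search_terms.lower().split())
    if query in KB:  # exact match: return the stored answer directly
        return KB[query]
    best = max(KB, key=lambda question: similarity(query, question))
    if similarity(query, best) > threshold:
        return KB[best]
    return None  # nothing is similar enough to recommend
```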

Text-based query expansion and ranking method for image retrieval

The invention belongs to the field of multimedia information retrieval and relates to a method for thesaurus-based query expansion and ranking in image retrieval. The method comprises a WordNet-based semantic similarity metric algorithm for English words, a HowNet-based semantic similarity metric algorithm for Chinese words, a rule-based algorithm for selecting and optimizing query expansion words, and an algorithm for evaluating and optimizing retrieval results. The method improves an image search engine using text processing techniques and semantic network dictionaries, and ranks retrieval results through semantic expansion, user interaction, and an improved similarity measure. Compared with traditional methods, it offers high precision, high recall, and low time and space cost. The method is significant for efficient image retrieval based on high-level semantic information over large-scale image data sets, and has broad application value in cross-language and cross-media retrieval.
Owner:FUDAN UNIV

Mongolian-Chinese machine translation method for enhancing semantic feature information based on Transformers

The invention provides a Mongolian-Chinese machine translation method that enhances semantic feature information based on a Transformer model. The method comprises the following steps: first, starting from the linguistic characteristics of Mongolian, identifying the characteristics of Mongolian word stems, affixes, and case markers, and incorporating these linguistic features into model training; second, using distributed representations to measure the similarity between two words, comprehensively analyzing how depth, density, and semantic overlap affect conceptual semantic similarity; and third, adopting a Transformer model in the translation process. The Transformer is a multi-layer encoder-decoder architecture that performs positional encoding with trigonometric functions and is built on an enhanced multi-head attention mechanism; it relies entirely on attention to capture global dependencies between input and output, eliminating recurrence and convolution.
Owner:INNER MONGOLIA UNIV OF TECH
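
The trigonometric positional encoding mentioned above is, in the original Transformer formulation, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal Python sketch of that standard formula follows. It does not reproduce the patent's enhanced attention mechanism or any Mongolian-specific features.

```python
import math


def positional_encoding(max_len, d_model):
    """Sinusoidal encodings: even dimensions use sin, odd dimensions use cos."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


pe = positional_encoding(max_len=4, d_model=6)
```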

Telecommunication field package recommending method based on intelligent customer service robot interaction

The invention provides a telecommunication-field package recommendation method based on intelligent customer service robot interaction. The method comprises the following steps: a. acquiring a user interest model; and b. recommending package services that meet the user's individual needs by using a decision tree algorithm. The invention further provides a recommendation engine device for an intelligent customer service robot system. Compared with existing recommendation methods, this method is based on a scene interaction model during recommendation. It calculates the similarity between labels by combining semantic similarity with traditional TF-IDF (Term Frequency-Inverse Document Frequency) weighting, represents resource and user models with labels to better reflect the characteristics of users and resources, and improves recommendation quality.
Owner:EAST CHINA NORMAL UNIV

Automatic question-answer processing method and automatic question-answer system

The invention discloses an automatic question-answer processing method and an automatic question-answer system. The method includes: acquiring question texts from question-answer pairs collected in advance, performing word segmentation on each question text to obtain its keywords, and building an index relation between the keywords and the question texts; when any target question text is received, performing word segmentation on it to obtain its target keywords; determining, according to the index relation, the keywords that match the target keywords, and taking the question texts indexed by those keywords as candidate question texts; calculating the semantic similarity between each candidate question text and the target question text; and determining the answer to the target question text according to the semantic similarity. Because the semantic similarity between the target question text and each candidate question text is taken into account when determining the answer, the accuracy of automatic question-answer processing is increased.
Owner:TENCENT TECH (SHENZHEN) CO LTD
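
A minimal Python sketch of the keyword index and candidate retrieval steps described above. Whitespace splitting stands in for word segmentation, a small hypothetical stopword list filters function words, and word-set (Jaccard) overlap stands in for semantic similarity when choosing among candidates. The question-answer pairs are invented for illustration.

```python
from collections import defaultdict

# Hypothetical question-answer pairs collected in advance.
QA_PAIRS = [
    ("how to reset password", "Use the 'Forgot password' link."),
    ("how to change username", "Edit it under Profile."),
    ("delete account permanently", "Go to Settings > Account > Delete."),
]
STOPWORDS = {"how", "to", "my", "i"}  # hypothetical


def keywords(text):
    """Stand-in for word segmentation: lowercase, split on whitespace, drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


# Index relation: keyword -> ids of question texts containing it.
INDEX = defaultdict(set)
for qid, (question, _answer) in enumerate(QA_PAIRS):
    for kw in keywords(question):
        INDEX[kw].add(qid)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def answer_for(target_question):
    target = keywords(target_question)
    # Candidate questions share at least one keyword with the target.
    candidates = set().union(*(INDEX.get(kw, set()) for kw in target))
    if not candidates:
        return None
    best = max(sorted(candidates), key=lambda qid: jaccard(target, keywords(QA_PAIRS[qid][0])))
    return QA_PAIRS[best][1]
```

The index keeps the similarity computation limited to candidates that share a keyword with the target, rather than every stored question.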

Pivoting from a graph of semantic similarity of documents to a derivative graph of relationships between entities mentioned in the documents

Provided is a process including: obtaining a first graph comprising nodes and edges, each of the first-graph edges linking two of the first-graph nodes and denoting semantic similarity of unstructured text in documents corresponding to the two linked first-graph nodes; for each of the first-graph nodes, selecting nodes for a second graph from attributes of the unstructured text documents to which the first-graph node corresponds, wherein the attributes are entities mentioned in the unstructured text documents, and wherein each of the second-graph nodes corresponds to a respective selected attribute; and for each pair of the second-graph nodes, determining a respective edge weight indicating similarity between a first entity corresponding to a first node of the respective pair and a second entity corresponding to a second node of the respective pair.
Owner:QUID LLC

Crowd profiling system and method based on microblog labels

The invention belongs to the technical field of wireless communication networks and discloses a crowd profiling system and method based on microblog labels. The system comprises two main modules: a microblog label recommendation module and a label topic clustering module. The first module uses a three-step label recommendation algorithm: the first step is homogeneous label recommendation; the second step is co-occurrence label expansion; and in the third step, a semantic network is built on the basis of a Chinese knowledge graph, the semantic similarity between labels is measured using the network's topological properties, and labels with the same or similar semantics are removed, ensuring that the labels used to profile a user are concise. The system and method exploit the broad commercial value of microblog user labels and point out research directions for label-mining algorithms for internet users and for applications of Chinese knowledge graphs.
Owner:FUDAN UNIV
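
The label deduplication step of the crowd profiling system above can be sketched in Python as a greedy filter that keeps a label only if it is not too similar to any label already kept. The similarity lookup table and the 0.8 threshold are hypothetical. The patent derives label similarity from the topology of a knowledge-graph-based semantic network, which is not reproduced here.

```python
# Hypothetical pairwise label similarities (symmetric).
PAIR_SIMILARITY = {
    frozenset({"football", "soccer"}): 0.9,
    frozenset({"football", "travel"}): 0.1,
    frozenset({"soccer", "travel"}): 0.1,
}


def label_similarity(a, b):
    """Identical labels score 1.0; unknown pairs score 0.0."""
    return 1.0 if a == b else PAIR_SIMILARITY.get(frozenset({a, b}), 0.0)


def dedupe_labels(labels, similarity=label_similarity, threshold=0.8):
    """Keep a label only if its similarity to every kept label is below the threshold."""
    kept = []
    for label in labels:
        if all(similarity(label, k) < threshold for k in kept):
            kept.append(label)
    return kept
```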

Intelligent response method, electronic device and storage medium

The invention provides an intelligent response method, comprising the following steps: after a consultation question is preprocessed, constructing an inverted index for a question and answer knowledge base; querying the knowledge base through the inverted index for a set of candidate questions related to the consultation question; for each candidate question in the set, calculating a question similarity between the consultation question and the candidate question, wherein the question similarity is a linear weighting of the text similarity, semantic similarity, topic similarity, and syntactic similarity between the two; and finally, selecting the candidate question with the highest calculated question similarity and retrieving its associated answer from the knowledge base as the target answer to be output. The invention also provides an electronic device and a storage medium. The intelligent response method can improve the accuracy and efficiency of intelligent responses and thereby improve service quality.
Owner:PING AN TECH (SHENZHEN) CO LTD
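
The linear weighting step of the intelligent response method above can be sketched in Python as follows. The four component scores are assumed to be precomputed in [0, 1], and the weights are hypothetical; the abstract does not state how the weights are chosen.

```python
# Hypothetical weights for the linear combination (they sum to 1.0).
WEIGHTS = {"text": 0.3, "semantic": 0.4, "topic": 0.2, "syntax": 0.1}


def question_similarity(scores, weights=WEIGHTS):
    """Linear weighting of text, semantic, topic and syntax similarity."""
    return sum(weights[name] * scores[name] for name in weights)


def target_answer(candidates, answers):
    """candidates: {question: component scores}; returns the answer of the best candidate."""
    best = max(candidates, key=lambda q: question_similarity(candidates[q]))
    return answers[best]


candidates = {
    "how to reset a password": {"text": 0.9, "semantic": 0.2, "topic": 0.5, "syntax": 0.5},
    "i forgot my login code": {"text": 0.4, "semantic": 0.9, "topic": 0.6, "syntax": 0.5},
}
answers = {"how to reset a password": "Use the reset link.", "i forgot my login code": "Request a new code."}
```

With these weights the second candidate wins (0.65 against 0.50) despite its lower surface-text score, because the semantic component carries the largest weight.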

Machine learning and training a computer-implemented neural network to retrieve semantically equivalent questions using hybrid in-memory representations

Determining semantically equivalent text or questions using hybrid representations based on neural network learning. Weighted bag-of-words and convolutional neural network (CNN) based distributed vector representations of questions or text may be generated to compute the semantic similarity between questions or text. Weighted bag-of-words and CNN based distributed vector representations may be jointly used to compute the semantic similarity. The neural network is trained with a pairwise ranking loss function. In one embodiment, the parameters of the system are trained by minimizing a pairwise ranking loss function over a training set using stochastic gradient descent (SGD).
Owner:IBM CORP
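
One common form of pairwise ranking loss is a margin hinge, max(0, m - s(q, q+) + s(q, q-)), which is zero once a semantically equivalent question q+ outscores a non-equivalent one q- by at least the margin m. The Python sketch below assumes that form, cosine similarity as s, and a margin of 0.5; the abstract does not give the exact loss, and the SGD update of the network parameters is not shown.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def pairwise_ranking_loss(q, positive, negative, margin=0.5):
    """Hinge loss: zero once s(q, positive) exceeds s(q, negative) by at least margin."""
    return max(0.0, margin - cosine(q, positive) + cosine(q, negative))


def average_loss(triples, margin=0.5):
    """Mean loss over (question, equivalent question, non-equivalent question) vectors."""
    return sum(pairwise_ranking_loss(q, p, n, margin) for q, p, n in triples) / len(triples)
```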

Method and system to compose software applications by combining planning with semantic reasoning

A system and method for composing application services includes an indexing module configured to index words in a request and available application descriptions to create a semantic similarity map. A semantic matcher is configured to determine semantic similarity between concepts / terms in both domain-independent and domain-specific ontologies for the semantic similarity map. A prefiltering module is configured to determine candidate compositions for the request based on the semantic similarity map and the available descriptions. A metric guided composition method is configured to run algorithms to generate a set of alternative compositions by determining which applications can be composed with which others using the semantic similarity map.
Owner:IBM CORP

Chinese online review sentiment classification method based on an ensemble learning framework

The invention discloses a Chinese online review sentiment classification method based on an ensemble learning framework. The method uses part-of-speech combination patterns, order-preserving submatrix patterns, and frequent word sequence patterns as input features. At the feature level, it accounts for Chinese word order information, gapped phrase features, and sentence length, and mitigates feature vector sparsity using semantic similarity. This addresses the large number of features in review text, ensures independence among base classifiers, and improves base classifier performance as much as possible. A base classifier algorithm built on product attributes comprehensively evaluates the sentiment information of each attribute in a review, and the sentence-level sentiment orientation of the review is then determined, making the final classification more accurate. The method is applicable to sentiment classification of e-commerce reviews in various domains; it lets potential consumers learn how a product is rated before purchase and helps merchants better understand consumer opinions, thereby improving service quality.
Owner:NANJING SILICON INTELLIGENCE TECH CO LTD

Topic keyword extraction method based on topic word vectors and network structure

The invention discloses a topic keyword extraction method based on topic word vectors and network structure, and relates to the technical field of extracting keywords from texts. The method comprises the following steps: performing topic clustering on a text corpus using an LDA topic model and obtaining, for each topic, the 100 keywords most relevant to that topic; representing each word in the corpus as a word vector using word2vec, calculating the semantic similarity between every pair of words, and finding for each keyword the 5 words with the highest semantic similarity to it, the keywords and these words together forming a new keyword set; and constructing a keyword network and taking the top 20 words in each set as the keywords of the topic. The method can extract keywords with relatively high word frequencies in documents and can also effectively discover keywords that have relatively low word frequencies but are strongly associated with topics.
Owner:SHANDONG UNIV OF SCI & TECH
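
The keyword-expansion step of the topic keyword method above (adding, for each topic keyword, the words whose vectors are most similar to it) can be sketched in Python. The 2-D vectors are hypothetical stand-ins for word2vec embeddings, cosine similarity is assumed as the semantic similarity, and k defaults to 5 as in the abstract. The LDA and keyword-network steps are not shown.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def expand_keywords(keywords, vectors, k=5):
    """Add, for each keyword, the k words whose vectors are most similar to it."""
    expanded = set(keywords)
    for kw in keywords:
        neighbors = sorted(
            (w for w in vectors if w != kw),
            key=lambda w: (-cosine(vectors[kw], vectors[w]), w),
        )
        expanded.update(neighbors[:k])
    return expanded


# Hypothetical 2-D "word vectors".
vectors = {
    "goal": (1.0, 0.1),
    "match": (0.9, 0.2),
    "score": (0.8, 0.3),
    "recipe": (0.0, 1.0),
    "oven": (0.1, 0.9),
}
```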

Translation confidence scores

A confidence scoring system can include a model trained using features extracted from translations that have received user translation ratings. The features can include, e.g., sentence length, the number of out-of-vocabulary or rare words, language model probability scores of the source or translation, or a semantic similarity between the source and a translation. Parameters of the confidence model can then be adjusted based on a comparison of the confidence model output and user translation ratings, where the user translation ratings can be selected or weighted based on a determination of individual user fluency. After the confidence model has been trained, it can produce confidence scores for new translations. If a confidence score is higher than a threshold, it can indicate the translation should be selected for automatic presentation to users. If the confidence score is below another threshold, it can indicate the translation should be updated.
Owner:META PLATFORMS INC
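
A minimal Python sketch of training and applying such a confidence model, under several assumptions: a linear model over numeric features, squared-error loss, per-sample weights standing in for user fluency, plain SGD, and hypothetical thresholds of 0.8 and 0.3. The abstract does not specify the model family or threshold values, and the "no action" outcome for scores between the thresholds is likewise an assumption.

```python
def predict(weights, bias, features):
    """Linear confidence score (assumed model family)."""
    return bias + sum(w * x for w, x in zip(weights, features))


def train(samples, lr=0.1, epochs=500):
    """samples: list of (features, user_rating, fluency_weight); SGD on weighted squared error."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for features, rating, fluency in samples:
            error = predict(weights, bias, features) - rating
            # Gradient of fluency * error**2 / 2 with respect to each parameter.
            bias -= lr * fluency * error
            weights = [w - lr * fluency * error * x for w, x in zip(weights, features)]
    return weights, bias


def route(score, present_threshold=0.8, update_threshold=0.3):
    """Map a confidence score to an action (thresholds are hypothetical)."""
    if score > present_threshold:
        return "present automatically"
    if score < update_threshold:
        return "update translation"
    return "no action"


# Hypothetical data: one feature (e.g. a normalized semantic-similarity score) and user ratings.
samples = [([0.0], 0.2, 1.0), ([1.0], 0.7, 1.0), ([0.5], 0.45, 0.5)]
weights, bias = train(samples)
```

Ratings from less fluent users (the third sample) pull on the parameters half as hard, which is one simple way to realize the rating weighting the abstract mentions.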

Text similarity measuring system based on multi-feature fusion

The invention provides a text similarity measuring system based on multi-feature fusion, relating to the field of intelligent information processing. The system measures text similarity by fusing multiple features based on word frequencies, word vectors, and Wikipedia labels. It aims to solve the loss of semantics caused by ignoring context in conventional text similarity measuring systems, and the low accuracy of similarity results caused by large differences in text length. The system operates through the following steps: preprocessing a training text, including word segmentation and stop-word removal; training a word vector model on the processed training corpus; and, for each input text pair, measuring the word-frequency-based similarity, the word-vector-based similarity, and the Wikipedia-label-based similarity, then combining them by weighted summation to obtain the final text semantic similarity. The system can improve the accuracy of text similarity measurement and thus meet the needs of intelligent information processing.
Owner:XINJIANG TECHN INST OF PHYSICS & CHEM CHINESE ACAD OF SCI

Measuring accuracy of semantic graphs with exogenous datasets

Provided is a process including: obtaining a semantic similarity graph having nodes corresponding to documents in an analyzed corpus and edges indicating semantic similarity between pairs of the documents; for at least a plurality of nodes in the graph, evaluating accuracy of the edges based on neighboring nodes and an external corpus by performing operations including: identifying the neighboring nodes based on adjacency to the respective node in the graph; selecting documents from an external corpus based on references in the selected documents to entities mentioned in the documents of the neighboring nodes; and determining how semantically similar the respective node is to the selected documents.
Owner:QUID LLC

Video bullet screen filtering method and device

The embodiment of the invention provides a video bullet screen filtering method and device. One specific embodiment of the method comprises the following steps: acquiring the current video bullet screen text to be pushed and the target users; determining whether the target users have set filtering conditions; if they have, performing semantic analysis on the current bullet screen text using a preset semantic analysis method to determine its semantic frame; calculating the semantic similarity between that semantic frame and each of at least one semantic frame used for filtering; and determining, based on these semantic similarities, whether to filter the current video bullet screen for the target users. This embodiment simplifies user operations and increases the efficiency of video bullet screen filtering.
Owner:BEIJING QIYI CENTURY SCI & TECH CO LTD

Statistics-based machine translation method and apparatus, and electronic device

The present invention discloses a statistics-based machine translation method, apparatus, and electronic device; a semantic similarity calculation method, apparatus, and electronic device; and a word quantization method, apparatus, and electronic device. The statistics-based machine translation method comprises: generating, for each candidate translation of a sentence to be translated, a translation probability from the candidate's features that affect translation probability and a pre-generated translation probability prediction model, wherein these features include at least the semantic similarity between the sentence to be translated and the candidate translation; and selecting a preset number of candidate translations with the highest translation probabilities as the translation of the sentence. With this statistics-based machine translation method, the translation model reaches deeply into the semantic level of natural language when it is constructed, avoiding semantic deviation between the translation and the source text and thereby improving translation quality.
Owner:Alibaba (China) Network Technology Co., Ltd. (阿里巴巴(中国)网络技术有限公司)