50 results about "Latent semantic indexing" patented technology

Latent semantic indexing (LSI) is an indexing and retrieval method that uses a mathematical technique called singular value decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts. LSI is also an application of correspondence analysis, a multivariate statistical technique developed by Jean-Paul Benzécri in the early 1970s, to a contingency table built from word counts in documents. Called Latent Semantic Indexing because of its ability to correlate semantically related terms that are latent in a collection of text, it was first applied to text at Bellcore in the late 1980s. The method, also called latent semantic analysis, uncovers the underlying latent semantic structure in the usage of words in a body of text and uses that structure to extract the meaning of the text in response to user queries, commonly referred to as concept searches.
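
The description above can be illustrated with a minimal sketch, assuming NumPy is available. The toy term-by-document matrix, the choice of k = 2 concepts, and the helper names `lsi_fit`, `fold_in` and `cosine` are all made up for illustration and are not taken from any of the patents listed here.

```python
import numpy as np

# Toy term-by-document count matrix (rows: terms, columns: documents).
# Documents 0-2 are about vehicles, documents 3-4 about plants.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0, 0],   # car
    [0, 1, 1, 0, 0],   # automobile
    [1, 1, 1, 0, 0],   # engine
    [0, 0, 0, 1, 1],   # flower
    [0, 0, 0, 1, 1],   # petal
], dtype=float)

def lsi_fit(matrix, k):
    """Truncated SVD: keep only the k largest singular triplets."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(query_counts, U_k, s_k):
    """Project a query's term counts into the k-dimensional concept space."""
    return (U_k.T @ query_counts) / s_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

U_k, s_k, Vt_k = lsi_fit(A, k=2)
doc_vectors = Vt_k.T                              # one row per document

query = np.array([1, 0, 0, 0, 0], dtype=float)    # the single term "car"
q_hat = fold_in(query, U_k, s_k)
scores = [cosine(q_hat, d) for d in doc_vectors]
```

In this toy setup, document 1 contains "automobile" and "engine" but not "car", yet it scores close to 1 against the query because it shares a context with documents that do contain "car". The two plant documents score close to 0.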

Method for document comparison and selection

Extensions to latent semantic indexing (LSI), including: phrase processing, creation of generalized entities, elaboration of entities, replacement of idiomatic expressions, and use of data fusion methods to combine the aforementioned extensions in a synergistic fashion. Additionally, novel methods tailored to specific applications of LSI are disclosed.
Owner:RELATIVITY ODA LLC

Word sense disambiguation

Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.
Owner:RELATIVITY ODA LLC

Method and system for facilitating the refinement of data queries

Refining a current query. Receiving information regarding the relevancy of documents retrieved from a document collection in response to a current query. Ranking the retrieved documents in accordance with the relevancy information. Forming a candidate query based on the rankings and analysis of locations of the retrieved documents in a latent semantic index vector space formed from the retrieved document. Applying the candidate query to the document collection. Ranking the documents retrieved in response to the candidate query in accordance with the received relevancy information. Comparing the ranking of documents retrieved in response to the candidate query and the ranking of documents retrieved in response to the current query with the received relevancy information. Choosing the query that produces the best ranking.
Owner:RELATIVITY ODA LLC

Automatic recommendation of products using latent semantic indexing of content

Techniques for using latent semantic structure of textual content ascribed to the items to provide automatic recommendations to the user. A user inputs a selected item and, in turn, a latent semantic algorithm is applied to the user selection and the textual content of the items in a database to generate a conceptual similarity between the selection and the items. A set of nearest items to the selected item is provided as a recommendation to the user of other items that may be of particular interest or relevance to the user's original selection based upon the conceptual similarity measure.
Owner:CONTENT ANALYST

System and method for hierarchical segmentation with latent semantic indexing in scale space

A system and method for automatically generating a hierarchical table of contents or outline for indexing a document and identifying clusters of related information in the document. The document may comprise text, audio, video, or a multimedia presentation. The invention employs a unique and novel combination of latent semantic indexing techniques to identify related blocks and major topic changes within the document with scale space segmentation techniques to respectively identify self-similar blocks within the document and to thus find topic changes of various sizes at block edges. The invention then produces a visual presentation of the semantic structure of the document.
Owner:IBM CORP

Differential LSI space-based probabilistic document classifier

A computerized method for automatic document classification based on a combined use of the projection and the distance of the differential document vectors to the differential latent semantics index (DLSI) spaces. The method includes the setting up of a DLSI space-based classifier to be stored in computer storage and the use of such classifier by a computer to evaluate the possibility of a document belonging to a given cluster using a posteriori probability function and to classify the document in the cluster. The classifier is effective in operating on very large numbers of documents such as with document retrieval systems over a distributed computer network.
Owner:SUNFLARE CO LTD

Semantic querying a peer-to-peer network

In a method of semantic querying in a peer-to-peer network, an item of information is mapped into a semantic vector based on the latent semantic indexing algorithm or any IR algorithms that can derive a vector representation. The semantic vector is associated with an address index as a key pair. The key pair is stored in an overlay network formed from the peer-to-peer network such that the stored key pair is proximally located to at least one other key pair having a similar semantic vector.
Owner:HEWLETT PACKARD DEV CO LP

Network-based method for analyzing opinion information in discrete text

The invention relates to a network-based system for analyzing opinion information in discrete text, belonging to the field of network information security. The system comprises the following modules: a discrete text information acquisition module, which acquires network information in a preset analysis cycle; a discrete text information tracking and restoring module, which restores ellipsis and remote anaphora in the original content to obtain a text with a relatively complete structure and semantic information; a semantic information mining and feature extraction module, which mines semantic information and extracts features from the text using latent semantic indexing; an opinion information clustering module, which clusters information by combining a niche genetic algorithm with the K-Means method; a hot opinion event discovery module, which mines hot opinions in the obtained topics and events; and a background information processing and data support center, which analyzes data and provides a network-specific lexicon, new network words, existing class information, and existing hot topics. The invention addresses the problem that analysis of network opinion information is hampered by incomplete text structure, frequent ellipsis and remote anaphora, and a large number of new network words, and it improves the accuracy of opinion and hot event discovery by adopting an efficient clustering method.
Owner:GUILIN UNIV OF ELECTRONIC TECH

Anchor Text-Based Focused Web Crawler Search Method and System

The invention discloses a search method for a focused web crawler based on anchor text, and a system thereof. The method mainly comprises the following steps: obtaining a URL (uniform resource locator) from a URL priority queue and downloading the corresponding Web page from the Internet; analyzing the downloaded Web page and extracting its URLs and their anchor text; screening the extracted URLs and anchor text; and using an algorithm that combines TF-IDF (term frequency-inverse document frequency) and LSI (latent semantic indexing) to calculate the topic relevance of each URL, then putting the URLs that satisfy the condition into the priority queue. The system comprises a URL priority queue, a web crawler downloader, a Web page library, a URL parser, a URL filter and a topic relevance identifier. The method and system improve both the topic relevance of the focused crawler's results and its crawling efficiency.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI
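
The crawl loop described in this abstract can be sketched roughly as follows. This is a simplified illustration only: the in-memory `FAKE_WEB`, the `TOPIC` word set, the `threshold` parameter and the `relevance` function are hypothetical stand-ins, and `relevance` uses plain word overlap rather than the combined TF-IDF and LSI score the abstract refers to, whose details are not given here.

```python
import heapq

# Hypothetical in-memory "web": URL -> list of (linked URL, anchor text).
FAKE_WEB = {
    "http://seed.example/": [
        ("http://a.example/lsi", "latent semantic indexing tutorial"),
        ("http://b.example/cats", "funny cat pictures"),
    ],
    "http://a.example/lsi": [
        ("http://c.example/svd",
         "singular value decomposition and semantic indexing"),
    ],
    "http://b.example/cats": [],
    "http://c.example/svd": [],
}

TOPIC = {"latent", "semantic", "indexing"}

def relevance(anchor_text):
    """Placeholder score: fraction of topic words found in the anchor text."""
    words = set(anchor_text.lower().split())
    return len(words & TOPIC) / len(TOPIC)

def crawl(seed, threshold=0.5):
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    queue = [(-1.0, seed)]
    seen = {seed}
    visited = []
    while queue:
        _, url = heapq.heappop(queue)
        visited.append(url)                      # "download" the page
        for link, anchor in FAKE_WEB.get(url, []):
            score = relevance(anchor)            # score the link by its anchor
            if link not in seen and score >= threshold:
                seen.add(link)
                heapq.heappush(queue, (-score, link))
    return visited

order = crawl("http://seed.example/")
```

Here the off-topic link is filtered out before it ever enters the queue, so the crawler never spends a download on it.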

System and Method for Configuring Voice Readers Using Semantic Analysis

A system and method for using semantic analysis to configure a voice reader is presented. A text file includes a plurality of text blocks, such as paragraphs. Processing performs semantic analysis on each text block in order to match the text block's semantic content with a semantic identifier. Once processing matches a semantic identifier with the text block, processing retrieves voice attributes that correspond to the semantic identifier (i.e. pitch value, loudness value, and pace value) and provides the voice attributes to a voice reader. The voice reader uses the text block to produce a synthesized voice signal with properties that correspond to the voice attributes. The text block may include semantic tags whereby processing performs latent semantic indexing on the semantic tags in order to match semantic identifiers to the semantic tags.
Owner:ATKIN STEVEN +2

Region based multiple features Integration and multiple-stage feedback latent semantic image retrieval method

The invention discloses a latent semantic image retrieval method based on region-oriented multi-feature integration and multi-level feedback. Starting from the result list returned by an initial keyword search, the method extracts a variety of region-oriented image features, constructs an attribute-image matrix, and applies the latent semantic indexing algorithm to obtain the semantic space of the image set and the semantic features of each image. It then uses similar images identified through user feedback to construct or update an image query vector, searches the semantic space again, calculates the similarity between each image's semantic features and the query vector, and returns the results in descending order of similarity; the retrieval can be repeated. The invention takes full advantage of image content information to make up for the deficiencies of keyword search. Through region-oriented multi-feature integration it raises image content information from the low-level physical layer to the object layer, and through human-computer interaction (HCI) feedback it raises it further to the semantic layer, thereby reducing the gap between low-level image features and high-level semantics and allowing Web image retrieval to achieve higher accuracy.
Owner:HUAZHONG UNIV OF SCI & TECH

A latent semantic min-Hash-based image retrieval method

The invention relates to the technical field of image processing and in particular to a latent semantic min-Hash-based image retrieval method comprising the steps of (1) dividing the data into datasets; (2) establishing a latent semantic min-Hash model; (3) solving for a transformation matrix T; (4) performing Hash encoding on the testing dataset Xtest; (5) performing the image query. Because convolutional network features are more expressive and the latent semantics of primitive features can be extracted by matrix decomposition, the quantization error of the encoding process is constrained to be minimal, so that after the primitive features are encoded, semantically similar images have smaller Hamming distances in the Hamming space and semantically dissimilar images have larger ones. Thus, the image retrieval precision and the indexing efficiency are improved.
Owner:XI'AN INST OF OPTICS & FINE MECHANICS - CHINESE ACAD OF SCI
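
The final query step (5) relies on ranking by Hamming distance between binary codes. A minimal sketch of that ranking alone, assuming the hash codes already exist; the 8-bit codes, image names and the `query` helper below are invented for illustration, and the model and transformation matrix T from steps (2) to (4) are not reproduced.

```python
def hamming(a, b):
    """Number of differing bits between two integer-encoded hash codes."""
    return bin(a ^ b).count("1")

# Hypothetical 8-bit codes for three indexed images.
database = {
    "cat_1": 0b10110010,
    "cat_2": 0b10110110,
    "car_1": 0b01001101,
}

def query(code, database, top_k=2):
    """Return the names of the top_k images closest to `code`."""
    ranked = sorted(database, key=lambda name: hamming(code, database[name]))
    return ranked[:top_k]

result = query(0b10110011, database)
```

Because the comparison is a bitwise XOR followed by a bit count, ranking stays cheap even for large image collections, which is the usual motivation for hashing-based retrieval.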

Computer-assisted memory translation scheme based on template automaton and latent semantic index principle

A new, more efficient memory translation algorithm that facilitates acquiring the most appropriate translation in a target language from among narrowed-down translation candidates by separately applying the dimension-reducing functions of a template automaton and the LSI (latent semantic index) technique. Both the template automaton and the LSI principle play an important role in efficiently narrowing the solution space from among the many example sentences in the target-language databases, each exploiting its own search space reduction function. Once developed into a fully operational system, an expert editor rather than an expert translator can tune the translation memory system, markedly widening the range of experts who can use it.
Owner:SUNFLARE CO LTD

Unit selection module and method for Chinese text-to-speech synthesis

Inactive | US20060095264A1 | Prevent inappropriate unit generation | Avoid it happening again | Special data processing applications | Speech synthesis | Natural language processing | Structural distance
This invention relates to a unit selection module for Chinese Text-to-Speech (TTS) synthesis, mainly comprising a probabilistic context free grammar (PCFG) parser, a latent semantic indexing (LSI) module, and a modified variable-length unit selection scheme. Any Chinese sentence is first input and then parsed into a context-free grammar (CFG) by the PCFG parser; there are several possible CFGs for every Chinese sentence, and the CFG (or syntactic structure) with the highest probability is taken as the best CFG (or syntactic structure) of the sentence. The LSI module is then used to calculate the structural distance between all the candidate synthesis units and the target unit in a corpus. Through the modified variable-length unit selection scheme, tagged with the dynamic programming algorithm, the units are searched to find the best synthesis unit concatenation sequence.
Owner:NAT CHENG KUNG UNIV

Semantic gene organizer

A semantic gene classification and annotation system, method and computer program can utilize Latent Semantic Indexing (LSI) to identify conceptually related genes based on textual information in biomedical literature, including MEDLINE citations. In addition, term weights calculated from the usage of the gene terms in and across gene documents can be used to automatically assign gene aliases and extend gene function annotation based upon primary biomedical literature.
Owner:UNIV OF TENNESSEE RES FOUND

Solution recommendation based on incomplete data sets

In accordance with one aspect of the present exemplary embodiment, a system determines a solution based on received data. An intake component receives an incomplete data set from one or more sources. A recommendation system transforms the incomplete data set into a semantic data set via latent semantic indexing, classifies the semantic data set into an existing cluster and provides one or more solutions of the existing cluster as one or more recommendations.
Owner:XEROX CORP

API (Application Programming Interface) tag recommendation method based on heterogeneous information

The invention discloses an API (Application Programming Interface) tag recommendation method based on heterogeneous information, which mainly adopts a random walk algorithm over that information. The method comprises the following steps: first, a heterogeneous network is established according to the relationships among APIs, mashups and mashup tags, where the network comprises the inclusion relationship between APIs and mashups, the correspondence between mashups and tags, and the isomorphic relationships among the three kinds of elements; then a corresponding transfer matrix is generated from the heterogeneous network and a random walk with restart is carried out on the basis of that matrix, iteratively moving from an API vertex to the mashup layer and the tag layer until a globally stable distribution is reached, which gives the probability of reaching each tag vertex from the API; finally, a text processing model (Latent Semantic Indexing) is introduced to calculate the semantic similarity between the API and each tag, and this similarity is combined with the obtained probability to generate a final ranked tag list, so that suitable tags are recommended for the API and tag recommendation accuracy is substantially improved.
Owner:ZHEJIANG UNIV
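
The random walk with restart step in this abstract can be sketched in a few lines. This is an illustrative toy only: the five-node graph, the `restart` probability of 0.15, the iteration count and the function name are assumptions, and the later combination with LSI similarity is left out because the abstract does not say how the two scores are combined.

```python
def random_walk_with_restart(neighbors, start, restart=0.15, iters=100):
    """Iterate p <- restart * e_start + (1 - restart) * W p until it settles.

    `neighbors` maps each node to its adjacent nodes; the walker moves to a
    uniformly chosen neighbor, or jumps back to `start` with prob. `restart`.
    """
    nodes = list(neighbors)
    p = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: (restart if n == start else 0.0) for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        p = nxt
    return p

# Hypothetical network: one API, two mashups that include it, two tags.
graph = {
    "api": ["m1", "m2"],
    "m1": ["api", "t1"],
    "m2": ["api", "t1", "t2"],
    "t1": ["m1", "m2"],
    "t2": ["m2"],
}

probs = random_walk_with_restart(graph, start="api")
tag_ranking = sorted(["t1", "t2"], key=lambda t: probs[t], reverse=True)
```

Tag t1 is attached to both mashups that use the API while t2 is attached to only one, so t1 ends up with the higher stationary probability.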

Selective latent semantic indexing method for information retrieval applications

A term-by-document (or part-by-collection) matrix can be used to index documents (or collections) for information retrieval applications. Reducing the rank of the indexing matrix can further reduce the complexity of information retrieval. A method for index matrix rank reduction can involve computing a singular value decomposition and then retaining singular values based on the singular values corresponding to singular values of multiple topics. The expected singular values corresponding to a topic can be determined using the roots of a specially formed characteristic polynomial. The coefficients of the special characteristic polynomial can be based on computing the determinants of a Gram matrix of term (or part) probabilities, a method of recursion, or a method of recursion further weighted by the probability of document (or collection) lengths.
Owner:SELECTIVE

Method, machine learning engines and file management platform systems for content and context aware data classification and security anomaly detection

Systems, methods and computer readable media are provided for performing a method for content and context aware data classification or a method for content and context aware data security anomaly detection. The method for content and context aware data confidentiality classification includes scanning one or more documents in one or more network data repositories of a computer network and extracting content features and context features of the one or more documents into one or more term frequency-inverse document frequency (TF-IDF) vectors and one or more latent semantic indexing (LSI) vectors. The method further includes classifying the one or more documents into a number of category classifications by machine learning the extracted content features and context features of the one or more documents at a file management platform of the computer network, each of the category classifications being associated with one or more confidentiality classifications.
Owner:DATHENA SCI PTE LTD
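
As a rough illustration of the TF-IDF feature extraction mentioned in this abstract, the sketch below builds sparse TF-IDF vectors with the Python standard library. The weighting used here (relative term frequency times log(N / document frequency)) is one common variant chosen for the example, not necessarily the one used in the patent, and the sample documents are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = [
    "salary report confidential",
    "lunch menu",
    "confidential salary data",
]
vectors = tfidf_vectors(docs)
```

Terms that occur in fewer documents, such as "report", receive a higher weight than terms shared across documents, such as "confidential". Vectors like these could then feed a downstream classifier.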

Method for establishing high-efficient semantic indexing for large-amount RDF (resource description framework) data

The invention discloses a method for establishing highly efficient semantic indexing for large amounts of RDF (Resource Description Framework) data. The method comprises the following steps: step 1, configuring an open-source distributed RDF database to be used as a persistent database for storing the RDF data; step 2, distinguishing TBox data and ABox data in the RDF database; step 3, generating a child-parent semantic relation index among the categories in the TBox data; step 4, generating a child-parent semantic relation index among the attributes in the TBox data; step 5, incorporating the generated semantic relations into the RDF data, including the original TBox data and ABox data, to form new RDF data; step 6, persisting the newly generated RDF data into the configured RDF database. The result is a scheme for establishing semantic relation indexes over large amounts of RDF data for querying and reasoning, so that query efficiency is guaranteed while rich offline reasoning is supported.
Owner:TIANJIN UNIV
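
Step 3 above (a child-parent relation index among categories) can be approximated by precomputing, for every class, the full set of its superclasses. The sketch below assumes the TBox has already been reduced to a plain mapping from each class to its direct parents; the class names and the `ancestor_index` helper are hypothetical, and no RDF database is involved.

```python
def ancestor_index(sub_class_of):
    """Map each class to the set of all its direct and indirect superclasses."""
    index = {}

    def ancestors(cls):
        if cls not in index:
            index[cls] = set()          # placeholder guards against cycles
            result = set()
            for parent in sub_class_of.get(cls, ()):
                result.add(parent)
                result |= ancestors(parent)
            index[cls] = result
        return index[cls]

    for cls in sub_class_of:
        ancestors(cls)
    return index

# Hypothetical TBox: direct subclass relations only.
tbox = {
    "Student": ["Person"],
    "Professor": ["Person"],
    "Person": ["Agent"],
}

index = ancestor_index(tbox)
```

With the index materialized ahead of time, a query such as "find all Agents" can match Student and Professor instances by lookup instead of reasoning at query time, which is the trade-off the abstract describes.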