3772 results about "Text database indexing" patented technology

Method and system for usage analyzer that determines user accessed sources, indexes data subsets, and associated metadata, processing implicit queries based on potential interest to users

The present invention relates to systems and methods providing content-access-based information retrieval. Information items from a plurality of disparate information sources that have been previously accessed or considered are automatically indexed in a data store, whereby a multifaceted user interface is provided to efficiently retrieve the items in a cognitively relevant manner. Various display output arrangements are possible for the retrieved information items including timeline visualizations and multidimensional grid visualizations. Input options include explicit, implicit, and standing queries for retrieving data along with explicit and implicit tagging of items for ease of recall and retrieval. In one aspect, an automated system is provided that facilitates concurrent searching across a plurality of information sources. A usage analyzer determines user accessed items and a content analyzer stores subsets of data corresponding to the items, wherein at least two of the items are associated with disparate information sources, respectively. An automated indexing component indexes the data subsets according to past data access patterns as determined by the usage analyzer. A search component responds to a search query, initiates a search across the indexed data, and outputs links to locations of a subset and / or provides sparse representations of the subset.
Owner:MICROSOFT TECH LICENSING LLC

Key indicators view

A system and computer-implemented method are provided for displaying a configurable metric relating to an environment in a graphical display along with a value of the metric calculated over a configurable time period. The metric is used to identify events of interest in the environment based on processing real time machine data from one or more sources. The configurable metric is selected and a corresponding value is calculated based on the events of interest over the configurable time period. The value of the metric may be continuously updated in real time based on receiving additional real-time machine data and displayed in a graphical interface as time progresses. Statistical trends in the value of the metric may also be determined over the configurable time period and displayed in the graphical interface as well as an indication if the value of the metric exceeds a configurable threshold value. Further, a selection of one or more thresholds for the value of the metric may be applied and an indication displayed indicating if the threshold(s) have been exceeded.
Owner:SPLUNK INC

Linguistic user interface

A system for retrieval of text includes a processor which identifies grammar rules associated with text fragments of a text string that is retrieved from an associated storage medium, and retrieves text strings from the storage medium which satisfy the grammar rules. A display displays retrieved text strings. A user input device in communication with the processor enables a user to select text fragments of the displayed text strings for generating a query. The processor identifies grammar rules associated with the user-selected text fragments and retrieves text strings from the storage medium which satisfy the grammar rules.
Owner:MAJANDRO LLC

Method and system for indexing and searching timed media information based upon relevance intervals

A method and system for indexing, searching, and retrieving information from timed media files based upon relevance intervals. The method and system for indexing, searching, and retrieving this information is based upon relevance intervals so that a portion of a timed media file is returned, which is selected specifically to be relevant to the given information representations, thereby eliminating the need for a manual determination of the relevance and avoiding missing relevant portions. The timed media includes streaming audio, streaming video, timed HTML, animations such as vector-based graphics, slide shows, other timed media, and combinations thereof.
Owner:COMCAST CABLE COMM MANAGEMENT LLC

Methods and systems to efficiently find similar and near-duplicate emails and files

A set of trigrams can be generated for each document in a plurality of documents processed by an e-discovery system. Each trigram in the set of trigrams for a given document is a sequence of three terms in the given document. A set of trigrams for each similar document is then determined based on the set of trigrams for the original document. To facilitate identification of the similar documents, a full text index is then generated for the plurality of documents and the set of trigrams for each document is indexed into the full text index, as individual terms. Queries can be generated into the full text index based on trigrams of a document to determine other similar or near-duplicate documents. After a set of potentially similar documents is identified, a separate distance criterion can be applied to evaluate the level of similarity between the two documents in an efficient way.
Owner:VERITAS TECH
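
The trigram approach in the abstract above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the in-memory dictionary standing in for the full text index, and the choice of Jaccard similarity with a 0.5 threshold as the "separate distance criterion" are all assumptions made for illustration.

```python
from collections import defaultdict

def word_trigrams(text):
    """Return the set of three-term sequences in a document."""
    terms = text.lower().split()
    return {" ".join(terms[i:i + 3]) for i in range(len(terms) - 2)}

def build_index(docs):
    """Index every trigram as an individual term: trigram -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tri in word_trigrams(text):
            index[tri].add(doc_id)
    return index

def similar_documents(doc_id, docs, index, threshold=0.5):
    """Query the index with a document's trigrams, then apply a distance check."""
    query = word_trigrams(docs[doc_id])
    candidates = set()
    for tri in query:
        candidates |= index.get(tri, set())
    candidates.discard(doc_id)
    results = []
    for other in sorted(candidates):
        other_tris = word_trigrams(docs[other])
        # Assumed distance criterion: Jaccard similarity of the trigram sets.
        jaccard = len(query & other_tris) / len(query | other_tris)
        if jaccard >= threshold:
            results.append((other, jaccard))
    return results

docs = {
    "a": "please review the attached quarterly report before friday",
    "b": "please review the attached quarterly report before monday",
    "c": "lunch is at noon in the main cafeteria",
}
index = build_index(docs)
near_duplicates = similar_documents("a", docs, index)
```

Only documents sharing at least one trigram with the query document become candidates, so the more expensive pairwise comparison runs on a small subset rather than on the whole collection.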

Drawing Device for Relationship Diagram of Documents Arranging the Documents in Chronological Order

A document correlation diagram drawing device includes extracting means (20, 30) for extracting content data and time data of document elements (E) each including one or more documents, dendrogram drawing means (50) for drawing a dendrogram showing a correlation between documents on the basis of the content data of the document elements, clustering means (70) for cutting the dendrogram in accordance with a predetermined rule and extracting clusters, and intra-cluster arranging means (90) for determining an intra-cluster arrangement of the document elements belonging to each cluster on the basis of the time data of the document elements. Accordingly, a dendrogram adequately showing the chronological development in each field can be automatically drawn.
Owner:INTPROP BANK CORP (JP)

System for discrete parallel processing of queries and updates

A data driven discrete parallel processing computing system for searches with a key-ordered list of data objects distributed over a plurality of servers. The invention is a data-driven architecture for distributed segmented databases consisting of lists of objects. The database is divided into segments based on content and distributed over a multiplicity of servers. Updates and queries are data driven and determine the segment and server to which they must be directed avoiding broadcasting. This is effective for systems such as search engines. Each object in the list of data objects must have a key on which the objects can be sorted relative to each other. Each segment is self-contained and doesn't rely on a schema. Multiple simultaneous queries and simultaneous updates and queries on different segments on different servers result in parallel processing on the database taken as a whole.
Owner:MEC MANAGEMENT LLC
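
As a rough illustration of the data-driven routing described above, the sketch below splits a key-ordered collection into segments and sends each update or query to the single segment that can hold its key, with no broadcast. The `SegmentedStore` class, its boundary layout, and the in-process dictionaries standing in for separate servers are hypothetical choices, not details from the patent.

```python
import bisect

class SegmentedStore:
    """Key-ordered objects split into segments; each segment stands in for a server."""

    def __init__(self, boundaries):
        # Assumed layout: boundaries[i] is the smallest key held by segment i + 1.
        self.boundaries = sorted(boundaries)
        self.segments = [dict() for _ in range(len(self.boundaries) + 1)]

    def segment_for(self, key):
        """The key itself determines the target segment, so nothing is broadcast."""
        return bisect.bisect_right(self.boundaries, key)

    def update(self, key, value):
        self.segments[self.segment_for(key)][key] = value

    def query(self, key):
        return self.segments[self.segment_for(key)].get(key)

store = SegmentedStore(["h", "q"])
store.update("apple", 1)
store.update("mango", 2)
store.update("zebra", 3)
```

Because every operation touches exactly one segment, operations on different segments are independent and could run in parallel on different servers.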

Systems and methods for performing background queries from content and activity

Active | US7225187B2 | Relieving user | Richer and more effective | Data processing applications | Text database indexing | Relevant information | Query formulation
Most information retrieval systems start with a user's explicit query. Systems and methods are provided that perform implicit or background queries to one or more information sources based on the ongoing activities of users. The methods provide users with the results of such automated contextualized searches in an unobtrusive manner. In one aspect, implicit queries are run when users are reading, working on or composing an application. Queries can be automatically generated by analyzing an application, and results can be presented in a variety of peripheral display configurations, including a small pane adjacent to a current window to provide peripheral awareness of related information that is automatically determined from existing user context and / or related content from the application. The invention includes methods for building models that predict the value of different queries, and of the results generated by such queries, based on logged data, and for using such models to control query formulation and to mediate decisions about displaying the results of implicit queries.
Owner:MICROSOFT TECH LICENSING LLC

Method, apparatus, and computer program product for indexing, synchronizing and searching digital data

A system, method and computer program product provide a search module for searching digital data. The search module operates, according to an embodiment, by indexing stored data without interrupting use of the stored data, synchronizing the indexed data with data stored subsequent to the indexing step, searching at least one of the synchronized data and the indexed data, and outputting results of the searching step.
Owner:GOOGLE LLC

Phrase-based searching in an information retrieval system

An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. Related phrases and phrase extensions are also identified. Phrases in a query are identified and used to retrieve and rank documents. Phrases are also used to cluster documents in the search results, create document descriptions, and eliminate duplicate documents from the search results, and from the index.
Owner:GOOGLE LLC

Personalizing anchor text scores in a search engine

A search engine identifies a list of documents from a set of documents in a database in response to a set of query terms. For each document in the list, the search engine determines an information retrieval score based on its content and the query terms, and also identifies a set of source documents that have links to the document and that also have anchor text satisfying a predefined requirement with respect to the query terms. The search engine calculates a personalized page importance score for each of the identified source documents according to a set of user-specific parameters and accumulates the personalized page importance scores to produce a personalized anchor text score for the document. The personalized anchor text score is then combined with the document's information retrieval score to generate a personalized ranking for the document. The documents are ordered according to their respective personalized rankings.
Owner:GOOGLE LLC
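
The scoring flow in the abstract above can be sketched as follows. Several details are assumptions made only for illustration: the "predefined requirement" is taken to mean that the anchor text contains every query term, the personalized page importance scores are supplied as a ready-made dictionary, and the two scores are combined by simple addition.

```python
def personalized_anchor_score(doc_id, query_terms, links, personalized_importance):
    """Accumulate personalized importance of sources whose anchor text matches the query."""
    terms = {t.lower() for t in query_terms}
    score = 0.0
    for source, target, anchor in links:
        # Assumed requirement: the anchor text contains all query terms.
        if target == doc_id and terms <= set(anchor.lower().split()):
            score += personalized_importance.get(source, 0.0)
    return score

def rank(ir_scores, query_terms, links, personalized_importance):
    """Order documents by IR score plus personalized anchor text score (assumed sum)."""
    combined = {
        doc: ir + personalized_anchor_score(doc, query_terms, links, personalized_importance)
        for doc, ir in ir_scores.items()
    }
    return sorted(combined, key=lambda d: combined[d], reverse=True)

# Hypothetical data: (source document, target document, anchor text).
links = [
    ("s1", "x", "text indexing guide"),
    ("s2", "y", "home page"),
]
ir_scores = {"x": 1.0, "y": 1.2}
personalized_importance = {"s1": 0.5, "s2": 0.9}
ordering = rank(ir_scores, ["indexing"], links, personalized_importance)
```

Here document "x" overtakes "y" despite a lower content score, because a source the user weights at 0.5 links to it with matching anchor text.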

Snapshot indexing

Managing backup data comprises accessing a snapshot of a data set, wherein the data set includes at least one object and the snapshot includes a replica of the data set, and adding to an index associated with the snapshot, with respect to each of one or more objects included in the snapshot, index data indicating at least where the object is located within the snapshot.
Owner:EMC IP HLDG CO LLC

Method, system, and computer program product for searching for, navigating among, and ranking of documents in a personal web

Inactive | US20060059144A1 | Least labor-intensive | Scale be relatively low | Data processing applications | Text database indexing | Hyperlink | Ranking
A method, apparatus, and computer program product for a personal search engine that includes a hybrid web composed of: the similarity web, and directed hyperlinks. Components include a parser (extracting words from documents); a text relevance analyzer; a link analysis method; the similarity web; a similarity analyzer; and hyperlinks. Other components include a navigation window; and FQSs. The combination of all the above may be incorporated into a working personal search engine.
Owner:TELENOR AS

Full text search of schematized data

Full text searching may be made available for resources stored in a database according to a database schema. A method for conducting a search on structured data using a text search engine includes the steps of: modeling a resource stored in a relational data store as a web page; providing a locator to the resource; and providing the resource in a consumable format to the text search engine. The method may include the additional steps of: receiving a search on the resource; converting the search into a converted query consumable by the search engine; and providing the converted query to the search engine.
Owner:MICROSOFT TECH LICENSING LLC

Method and system for providing a search index for an electronic messaging system based on message threads

When a message having at least one attachment is obtained for indexing, it is indexed as N+1 separate documents, where N is the number of attached documents. If the message is part of a message thread, then information regarding the last message in the thread is retrieved, and search index attachment meta data for the last message is extracted. A unique identifier is computed for the newly obtained attachments, and used to search for matches in the attachments for the last message in the thread. If there is a match, then the newly obtained attachment is not indexed, but the unique identifier of the previously indexed matching attachment is added to a body index document for the new message. A unique identifier associated with the new message body is also added to a list of parent identifiers associated with the attachment. If a search is subsequently issued that matches the contents of the attachment, all documents whose parent identifiers are listed in the attachment document meta data will be returned as matches. If an attachment is obtained for a message that is not part of a previous message thread, or if a newly obtained attachment is not a match with any previously obtained attachment within the message thread to which it belongs, then the attachment is indexed into the search index, and its unique identifier is included in the index document for the newly obtained message body.
Owner:TWITTER INC
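
The attachment de-duplication described above can be sketched in a few lines of Python. This is a simplified illustration, not the patented system: the `ThreadIndex` class is hypothetical, a SHA-256 digest is assumed as the "unique identifier", and substring matching stands in for a real search index.

```python
import hashlib

class ThreadIndex:
    def __init__(self):
        self.bodies = {}          # message id -> {"text", "attachment_ids"}
        self.attachments = {}     # attachment id -> {"content", "parents"}
        self.last_in_thread = {}  # thread id -> id of the last message seen

    def add_message(self, msg_id, thread_id, body, attachments):
        """Index a message body; return ids of attachments newly indexed by this call."""
        prev_id = self.last_in_thread.get(thread_id)
        prev_atts = set(self.bodies[prev_id]["attachment_ids"]) if prev_id is not None else set()
        att_ids, newly_indexed = [], []
        for content in attachments:
            att_id = hashlib.sha256(content).hexdigest()  # assumed unique identifier
            if att_id not in prev_atts:
                # No match in the last thread message: index the attachment itself.
                # setdefault keeps existing parents if the same content was seen elsewhere.
                self.attachments.setdefault(att_id, {"content": content, "parents": []})
                newly_indexed.append(att_id)
            # In both cases, record the new message body as a parent of the attachment.
            self.attachments[att_id]["parents"].append(msg_id)
            att_ids.append(att_id)
        self.bodies[msg_id] = {"text": body, "attachment_ids": att_ids}
        self.last_in_thread[thread_id] = msg_id
        return newly_indexed

    def search(self, term):
        """Return messages whose body matches, plus parents of matching attachments."""
        hits = {m for m, doc in self.bodies.items() if term in doc["text"]}
        for att in self.attachments.values():
            if term.encode() in att["content"]:
                hits.update(att["parents"])
        return sorted(hits)

idx = ThreadIndex()
first = idx.add_message("m1", "t1", "see attached", [b"budget figures 2024"])
reply = idx.add_message("m2", "t1", "thanks, forwarding", [b"budget figures 2024"])
```

The reply carries the same attachment, so nothing new is indexed for it, yet a search that matches the attachment text returns both messages through the parent list.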

Method and system for smart search engine and other applications

The present invention provides a new method for indexing given text objects using a text parsing module and word indexing databases. According to this method, each word is assigned a first index code according to the word's meaning, a second index code according to the word's syntactic category, and a third index code according to the word's syntactical role. The word indices are arranged in a hierarchical order based on syntactical relations between the text words. At the last stage, differentiating symbols, which represent the hierarchical order of the indices, are assigned between adjacent word indices. The indexing process may be implemented as an automatic computerized program or as a wizard application enabling human intervention in the indexing process. The indexing method can be utilized for enabling text search utilities based on matching between the query indices and source text indices.
Owner:GOVRIN OMRI +1

System for indexing textual and non-textual files

In a system for indexing computer files or records, a data storage device stores the computer files or records, wherein each of the computer files or records is identifiable by one or more attributes, a first collection of information including a series of the attributes, and a second collection of information including entries for each of the computer files or records that is to be indexed. Linking means then link the information with attributes and entries to identify the presence or absence of one of the attributes in each of the computer files or records being indexed.
Owner:CHEO MENG SOON

Providing answers to questions including assembling answers from multiple document segments

A method, system and computer program product for generating answers to questions. In one embodiment, the method comprises receiving an input query, identifying a plurality of candidate answers to the query; and for at least one of these candidate answers, identifying at least one proof of the answer. This proof includes a series of premises, and a multitude of documents are identified that include references to the premises. A set of these documents is selected that include references to all of the premises. This set of documents is used to generate one or more scores for the one of the candidate answers. A defined procedure is applied to the candidate answers to determine a ranking for the answers, and this includes using the one or more scores for the at least one of the candidate answers in the defined procedure to determine the ranking for this one candidate answer.
Owner:IBM CORP

Method, apparatus and computer program for managing the processing of extracted data

Data is processed at a central data processor using multiple processing steps. The data is processed for the extraction of entities. Relationships between the extracted entities are also extracted. A system map is built using one or more factors derived from the extracted entities and relationships and organized by influence relationships. Each factor is associated with one or more options.
Owner:PRECIPIA SYST

Methods and systems for assisting information processing by using storage system

In a networked information system, a portion of the information processing is offloaded from servers to a storage system to reduce network traffic and conserve server resources. The information system includes a storage system storing files or objects and having a function which automatically extracts portions of text from the files and transmits the extracted text to the servers. The text extraction is responsive to file requests from the servers. The extracted text and files are stored on the storage system, decreasing the need to send entire files across the network. Thus, by transmitting smaller extracted text data instead of entire files over the network, network performance can be increased through the reduction of traffic. Additionally, the processing strain on physical resources of the servers can be reduced by extracting the text at the storage system rather than at the servers.
Owner:HITACHI LTD

Prioritized merging for full-text index on relational store

A full-text search index system and method is generated by creating instances of a database index from an in-memory inverted list of keywords associated with a text identifier and the occurrences of the keyword in the text. Instances of the index are placed in a priority queue. A merge scheduling process determines when a merge should be initiated, selects instances of the index to be merged and selects a type of merge to perform.
Owner:MICROSOFT TECH LICENSING LLC
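
A minimal sketch of the flow described above follows: in-memory inverted lists are flushed into index instances, the instances sit in a priority queue, and a scheduler decides when to merge. The size-based priority, the `max_instances` trigger, and the single merge strategy (always merge the two smallest instances) are assumptions for illustration; the abstract does not specify them, and the selection among merge types is not modeled here.

```python
import heapq
from collections import defaultdict
from itertools import count

class MergeScheduler:
    def __init__(self, max_instances=2):
        self.max_instances = max_instances  # assumed trigger for starting a merge
        self.queue = []                     # heap of (size, sequence, instance)
        self._seq = count()                 # tie-breaker so instances are never compared

    def _push(self, instance):
        size = sum(len(postings) for postings in instance.values())
        heapq.heappush(self.queue, (size, next(self._seq), instance))

    def flush(self, inverted_list):
        """Turn an in-memory inverted list (keyword -> postings) into an index instance."""
        self._push({kw: sorted(postings) for kw, postings in inverted_list.items()})
        self.maybe_merge()

    def maybe_merge(self):
        """Merge the two smallest instances while there are too many instances."""
        while len(self.queue) > self.max_instances:
            _, _, a = heapq.heappop(self.queue)
            _, _, b = heapq.heappop(self.queue)
            merged = defaultdict(set)
            for instance in (a, b):
                for kw, postings in instance.items():
                    merged[kw].update(postings)
            self._push({kw: sorted(p) for kw, p in merged.items()})

# Postings are (text identifier, occurrence position) pairs.
sched = MergeScheduler(max_instances=2)
sched.flush({"index": [(1, 0)], "merge": [(1, 3)]})
sched.flush({"index": [(2, 5)]})
sched.flush({"queue": [(3, 1)]})
```

Merging small instances first keeps each merge cheap while bounding the number of instances a query has to consult.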

Method and apparatus for automated tag generation for digital content

A method and apparatus for automatically generating tags for digital content are provided. The method is adapted to be run on a computer, which is an example of the type of apparatus which may generate the tags. The generated tags describe the digital content, and may be used as topics for the content to organize, retrieve, and process the content. The tag generation begins by accessing content from a content collection unit and a candidate tag database unit, which are then processed using techniques from computational linguistics in a multi-pass process that generates sets of tags, then refines and normalizes them. Finally, scores are generated and stored along with the tags.
Owner:FEDERATED MEDIA PUBLISHING

Systems and methods for indexing content for fast and scalable retrieval

Inactive | US20050120004A1 | Fast and efficient and scalable retrieval | Fast and efficient generation | Data processing applications | Digital data processing details | Paper document | Reverse index
Systems and methods for query processing and indexing of documents in connection with a content store in a computing system are provided. In various embodiments, an indexing model is provided that is optimized for fast, efficient and scalable retrieval of documents satisfying a query, including the mixed use of forward and inverted indexing representations, including algorithms for achieving a balance between the two representations. When processing queries, fast and efficient generation of reverse chronologically ordered posting lists is enabled for efficient execution of logical operators on query result sets. A term expand index is also provided wherein the overall terms included in the term expand index are decomposed into a plurality of lexicon files, which are combined when convenient for fast, scalable efficiency when performing queries of the content in the content store.
Owner:R2 SOLUTIONS

Polyarchical data indexing and automatically generated hierarchical data indexing paths

Data indexing using polyarchical indexing codes and automatically generated expansion paths. For a piece of data, an indexing code is received relating to a particular categorization or other indexing parameter. Based upon the indexing code, one or more expansion sets of codes are retrieved and applied to the piece of data. The expansion sets of codes may include indexing codes that relate to hierarchical levels of indexing. The expansion sets of codes may also include different expansion paths through the hierarchical levels of indexing. The polyarchical codes may include multiple cross-categorization of the data across the same or different levels of categories. They may also include multiple expansion paths in different directions across hierarchical levels of categories or indexing.
Owner:DOW JONES REUTERS BUSINESS INTERACTIVE

Methods and systems for implementing approximate string matching within a database

A computer-based method for character string matching of a candidate character string with a plurality of character string records stored in a database is described. The method includes a) identifying a set of reference character strings in the database, the reference character strings identified utilizing an optimization search for a set of dissimilar character strings, b) generating an n-gram representation for one of the reference character strings in the set of reference character strings, c) generating an n-gram representation for the candidate character string, d) determining a similarity between the n-gram representations, e) repeating steps b) and d) for the remaining reference character strings in the set of identified reference character strings, and f) indexing the candidate character string within the database based on the determined similarities between the n-gram representation of the candidate character string and the reference character strings in the identified set.
Owner:MASTERCARD INT INC
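
Steps b) through f) above can be illustrated with a short Python sketch. It assumes character bigrams as the n-gram representation and Jaccard similarity as the similarity measure, neither of which is stated in the abstract, and it takes the reference strings as given rather than performing the optimization search of step a). All names are hypothetical.

```python
def ngrams(s, n=2):
    """Character n-gram set of a string (assumed representation)."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity(a, b):
    """Jaccard similarity between the n-gram sets of two strings (assumed measure)."""
    ga, gb = ngrams(a), ngrams(b)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)

def similarity_key(candidate, references):
    """Index key for a candidate: its similarity to each reference string."""
    return tuple(round(similarity(candidate, ref), 3) for ref in references)

# Hypothetical reference strings; the patent selects these to be mutually dissimilar.
references = ["main street", "oak avenue"]
key = similarity_key("main stret", references)
```

Strings that are close to each other end up with nearby keys, so a lookup can be narrowed to records with similar keys before any detailed comparison.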

Indexing systems and methods

Described herein are systems and methods for indexing documents in a quasi real-time manner. The method can include the steps of indexing documents and storing document information in a database, registering with an operating system for notification of changes to the documents, and responding to received notification of changes by updating the database to reflect the addition, modification, renaming and / or deletion of documents. Unlike traditional document systems, the document index described herein can be updated without rescanning all the indexed documents.
Owner:COPERNIC TECH
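
The update path described above can be sketched as an index that reacts to individual change events instead of rescanning every document. The sketch assumes the operating system's notifications have already been translated into simple `(kind, path, ...)` calls; the `DocumentIndex` class and the event names are hypothetical, and no real file-watching API is used.

```python
class DocumentIndex:
    def __init__(self):
        self.docs = {}  # path -> set of terms in the document

    def on_change(self, kind, path, text=None, new_path=None):
        """Apply one change notification to the index without touching other documents."""
        if kind in ("added", "modified"):
            self.docs[path] = set(text.lower().split())
        elif kind == "renamed":
            self.docs[new_path] = self.docs.pop(path)
        elif kind == "deleted":
            self.docs.pop(path, None)
        else:
            raise ValueError(f"unknown change kind: {kind}")

    def search(self, term):
        return sorted(p for p, terms in self.docs.items() if term.lower() in terms)

index = DocumentIndex()
index.on_change("added", "notes.txt", text="quarterly indexing plan")
index.on_change("added", "todo.txt", text="buy milk")
index.on_change("modified", "todo.txt", text="review indexing plan")
index.on_change("renamed", "notes.txt", new_path="plan.txt")
index.on_change("deleted", "todo.txt")
```

Each event costs work proportional to the one document it concerns, which is what allows the index to stay current in near real time.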