Creation, use and training of computer-based discovery avatars

A computer-based discovery avatar technology, applied in the field of data management, discovery, and organization, that addresses the increasing complexity of organizing, searching and discovering data elements within large data repositories, and the failure of traditional search techniques to cull wanted data from them.

Inactive Publication Date: 2012-11-29
GCP IP HLDG I
Cites: 0; Cited by: 66

AI Technical Summary

Benefits of technology

"The patent describes a method for creating and using computer-based avatars to assist human analysts in conducting analysis and exploration of large data sets. The avatars are created by using machine learning processes to analyze data and identify relevant information. The avatars can be used in a wide range of applications, such as national security, enterprise management, and forensic analysis. The method involves extracting data features from the source data, scoring them based on their relevance to a particular topic, and optimizing the scoring based on a comparison of the data features. The avatars can also be trained to improve their performance over time. Overall, the method provides a way to automate the analysis of data and improve the efficiency of human analysts."
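The extract, score, and optimize steps in the summary can be illustrated with a minimal sketch. The summary does not name a feature set or a learning rule, so everything here is an assumption: bag-of-words counts stand in for the data features, a weighted sum stands in for the relevance score, and a perceptron-style update stands in for the optimization. The function names and the labeled examples are hypothetical.

```python
import re
from collections import Counter

def extract_features(text):
    """Bag-of-words term counts (an assumed, deliberately simple feature set)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def score(weights, features):
    """Relevance score: weighted sum of feature counts."""
    return sum(weights.get(term, 0.0) * count for term, count in features.items())

def train(labeled_docs, epochs=10, lr=1.0):
    """Perceptron-style training (an assumption, not the patent's stated method):
    adjust term weights whenever a labeled document is misclassified."""
    weights = {}
    for _ in range(epochs):
        for text, relevant in labeled_docs:
            feats = extract_features(text)
            predicted = score(weights, feats) > 0
            if predicted != relevant:
                step = lr if relevant else -lr
                for term, count in feats.items():
                    weights[term] = weights.get(term, 0.0) + step * count
    return weights

# Hypothetical analyst judgments: True means relevant to the topic of interest.
LABELED = [
    ("solid state drive firmware", True),
    ("flash storage controller", True),
    ("state of mind testimony", False),
    ("witness testimony transcript", False),
]

weights = train(LABELED)

# Rank unseen documents by their learned relevance score, highest first.
unseen = ["testimony about state of mind", "state of the art flash storage"]
ranked = sorted(unseen, key=lambda t: score(weights, extract_features(t)), reverse=True)
print(ranked)
```

In this toy run the ambiguous term "state" ends up carrying no weight, because it appears in both relevant and irrelevant examples, while the surrounding terms decide the ranking.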

Problems solved by technology

With the rapid increase in data creation and the capability to cheaply and reliably store vast volumes of data has come an increasing complexity in organizing, searching and discovering data elements within large data repositories.
One result is that traditional techniques for searching data for needed elements, such as keyword searching, Boolean operators, and enhanced search, are insufficient to cull wanted data from large data repositories because even a small mismatch between, for example, a keyword and data included in a document may result in the document being omitted from the search results.
Similarly, the presence of a keyword in too many documents within a data stream may result in over-inclusive searching, producing search results that are too voluminous for a human to review in an acceptable amount of time.
Further, a keyword match may lack intelligence and produce data query results that combine documents simply on the basis of sharing a word (e.g., “state”), even though that keyword has substantively different meanings in the documents (e.g., “solid state” and “state of mind”).
Also, individuals may have a strong intuitive sense of what information is valuable within a set of results, but may not be able to develop keywords that properly reflect that intuition.
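The two keyword-search failure modes described above (a small mismatch that drops a document, and a shared word that lumps unrelated documents together) can be reproduced with a toy example. This is only an illustrative sketch: the corpus is invented, and `keyword_search` is a hypothetical exact-match function, not any particular search engine's behavior.

```python
import re

# Hypothetical toy corpus; the documents are invented for illustration.
DOCS = {
    "d1": "The solid state drive failed during the test.",
    "d2": "Witnesses described the suspect's state of mind.",
    "d3": "Solid-state storage devices are replacing disks.",
}

def tokens(text):
    """Lowercase alphabetic tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())

def keyword_search(query, docs):
    """Return ids of documents that contain every query keyword exactly."""
    terms = tokens(query)
    return sorted(doc_id for doc_id, text in docs.items()
                  if all(term in tokens(text) for term in terms))

# Over-inclusive: "state" matches all three documents despite different meanings.
print(keyword_search("state", DOCS))

# Under-inclusive: the plural "drives" misses the document that says "drive".
print(keyword_search("drives", DOCS))
```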


Examples

Embodiment Construction

[0036] Referring to FIG. 1, in embodiments of the present invention, a computer-based discovery avatar may be created based at least in part on starting with a data ingestion 101 or entry phase in which a set of data are selected to be used for creating and training a discovery avatar. In embodiments, data ingestion 101 may be performed using a web crawler or any search engine combined with a data storage system. An example paradigm may include a combination such as, but not limited to, a web search software tool such as the open source tool Nutch® provided by Apache® and a search server, such as the Solr search tool provided by Apache, which is based on the Lucene Java search library. Such a paradigm may use a distributed storage and computation tool such as the open source Hadoop™ framework from Apache™. In various embodiments, a wide variety of tools known to those of ordinary skill in the art may be used to extract, transform, load and store data from disparate sources into one o...
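As a rough illustration of the extract, transform, load and store flow in the ingestion phase, the sketch below uses only the Python standard library. It does not call Nutch, Solr, Lucene, or Hadoop; `InMemoryStore`, `transform`, `ingest`, and the sample sources are all hypothetical stand-ins for the crawler, search server, and storage layer named in the text.

```python
import re
from collections import defaultdict

class InMemoryStore:
    """Hypothetical stand-in for the search server and storage layer."""

    def __init__(self):
        self.documents = {}
        self.index = defaultdict(set)  # term -> ids of documents containing it

    def load(self, doc_id, text):
        """Store a document and add its terms to the inverted index."""
        self.documents[doc_id] = text
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.index[term].add(doc_id)

    def lookup(self, term):
        """Return sorted ids of documents containing the term."""
        return sorted(self.index.get(term.lower(), set()))

def transform(raw):
    """Normalize whitespace in a raw record."""
    return " ".join(raw.split())

def ingest(sources, store):
    """Extract records from each source, transform them, and load them."""
    count = 0
    for source_name, records in sources.items():
        for i, raw in enumerate(records):
            store.load(f"{source_name}:{i}", transform(raw))
            count += 1
    return count

# Hypothetical disparate sources; a real deployment would crawl or query them.
SOURCES = {
    "crawl": ["  Solid state   drives\n", "State of mind"],
    "archive": ["Quarterly storage report"],
}

store = InMemoryStore()
ingested = ingest(SOURCES, store)
print(ingested, store.lookup("state"))
```

Keeping the store behind a small `load`/`lookup` interface is a design choice of this sketch: it lets the in-memory version be swapped for a real search server without changing `ingest`.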

Abstract

In embodiments of the present invention, improved capabilities are described for developing, training, validating and deploying discovery avatars embodying mathematical models that may be used for document and data discovery and deployed within large data repositories.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of the following provisional application, which is hereby incorporated by reference in its entirety:

[0002] App. No. 61/491,140 filed on May 27, 2011, and entitled “CURIOSITY ENGINE FOR CONTENT DISCOVERY.”

BACKGROUND

[0003] 1. Field

[0004] The invention is related to data management, discovery, and organization within voluminous data repositories.

[0005] 2. Description of the Related Art

[0006] With the rapid increase in data creation and the capability to cheaply and reliably store vast volumes of data has come an increasing complexity in organizing, searching and discovering data elements within large data repositories. One result is that traditional techniques for searching data for needed elements, such as keyword searching, Boolean operators, and enhanced search are insufficient to cull wanted data from large data repositories because even a small mismatch between, for example, a keyword and data included in ...

Application Information

Patent Type & Authority: Applications (United States)
IPC(8): G06F15/18, G06F17/30, G06N20/00
CPC: G06N99/005, G06N20/00
Inventor: DOLAN, BRIAN
Owner: GCP IP HLDG I