A grid-based approach for enterprise-scale data mining is described that leverages database technology for I/O parallelism and on-demand compute servers for compute parallelism in the statistical computations. By enterprise-scale, we mean the highly automated use of
data mining in vertical business applications, where the data is stored on one or more
relational database systems, and where a distributed architecture comprising high-performance compute servers or a network of low-cost commodity processors is used to improve application performance, provide better-quality data mining models, and manage the overall workload. The approach relies on an algorithmic
decomposition of the data mining kernel on the data and compute grids, which provides a simple way to exploit the parallelism on the respective grids while minimizing the data transfer between them. The overall approach is compatible with existing standards for data mining task specification and results reporting in databases, and hence applications using these standards-based interfaces do not require any modification to realize the benefits of this grid-based approach.
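
The decomposition described above can be illustrated with a minimal sketch. It is not the implementation described here; it assumes a naive Bayes classifier as the data mining kernel, uses in-memory lists as stand-ins for database partitions, and uses a thread pool as a stand-in for the data grid. All function names (`partition_stats`, `merge_stats`, `fit_naive_bayes`, `train`, `predict`) are hypothetical. The point it shows is that each partition is reduced to small count tables where the data resides, and only those tables are transferred to the compute side for model building.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_stats(rows):
    """Data-side step (assumed): reduce one partition to small count tables."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in rows:
        class_counts[label] += 1
        for j, value in enumerate(features):
            feature_counts[(label, j, value)] += 1
    return class_counts, feature_counts


def merge_stats(parts):
    """Combine per-partition counts; only these aggregates cross grids."""
    total_class, total_feature = Counter(), Counter()
    for class_counts, feature_counts in parts:
        total_class.update(class_counts)
        total_feature.update(feature_counts)
    return total_class, total_feature


def fit_naive_bayes(class_counts, feature_counts, n_values, alpha=1.0):
    """Compute-side step (assumed): build the model from merged counts only.

    n_values[j] is the number of distinct values of feature j, and alpha
    is the Laplace smoothing constant.
    """
    n = sum(class_counts.values())
    log_prior = {c: math.log(k / n) for c, k in class_counts.items()}

    def log_likelihood(label, j, value):
        num = feature_counts[(label, j, value)] + alpha
        den = class_counts[label] + alpha * n_values[j]
        return math.log(num / den)

    return log_prior, log_likelihood


def train(partitions, n_values):
    """Run the data-side step on every partition in parallel, then fit."""
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(partition_stats, partitions))
    class_counts, feature_counts = merge_stats(parts)
    return fit_naive_bayes(class_counts, feature_counts, n_values)


def predict(model, features):
    """Return the class with the highest posterior log-score."""
    log_prior, log_likelihood = model
    scores = {
        c: lp + sum(log_likelihood(c, j, v) for j, v in enumerate(features))
        for c, lp in log_prior.items()
    }
    return max(scores, key=scores.get)
```

In this sketch the volume of data moved between the two sides depends on the number of distinct (class, feature, value) combinations rather than on the number of rows, which is the property the decomposition relies on. In a database setting, the per-partition step would plausibly be expressed as parallel aggregation queries, though that mapping is an assumption of this illustration.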