An analytical computing environment for large data sets comprises a
software platform for
data management. The platform provides various
automation and self-service features to enable those users to rapidly provision and manage an agile analytics environment. The platform leverages a
metadata repository, which tracks and manages all aspects of the data lifecycle. The repository maintains various types of platform
metadata including, for example, status information (load dates, quality exceptions, access rights, etc.), definitions (business meaning, technical formats, etc.), lineage (data sources and processes creating a
data set, etc.), and user data (user rights,
access history, user comments, etc.). Within the platform, the
metadata is integrated with all platform services, such as load
processing, quality controls and
system use. As the
system is used, the metadata gets richer and more valuable, supporting additional
automation and quality controls.