An embodiment relates to automatic summarization of raw digital audio data (12), and more specifically to identifying pure music and vocal music (40,60) in digital audio data by extracting distinctive features from music frames (73,74,75,76), designing a classifier and determining its classification parameters (20) using an adaptive learning/training algorithm (36), and classifying the music as pure music or vocal music according to the classifier. For pure music, temporal, spectral and cepstral features are calculated to characterise the musical content, and an adaptive clustering method is used to structure the musical content according to the calculated features. The summary (22,24,26,48,52,70,72) is created according to the clustering result and domain-based music knowledge (50,150). For vocal music, voice-related features are extracted and used to structure the musical content, and the music summary is similarly created from the structured content and heuristic rules related to music genres.
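The feature-extraction and clustering steps for pure music might be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed method: it computes only two temporal features (zero-crossing rate and RMS energy) and omits the spectral and cepstral features, it substitutes plain k-means with a deterministic farthest-first initialisation for the adaptive clustering method, and it uses "longest run of the most common cluster" as a hypothetical stand-in for the domain-based music knowledge. The function names `frame_signal`, `temporal_features`, `kmeans` and `longest_run` are illustrative and do not appear in the source.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into fixed-length frames, advancing by `hop`."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def temporal_features(frame):
    """Return (zero-crossing rate, RMS energy) for one frame."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return (zcr, rms)


def kmeans(points, k, iters=20):
    """Plain k-means (an assumed stand-in for the adaptive clustering).

    Centroids are seeded farthest-first so the result is deterministic.
    Features are not normalised here; a fuller version probably should.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return labels


def longest_run(labels):
    """Return (start, end) frame indices, end exclusive, of the longest
    contiguous run of the most common cluster label (hypothetical rule)."""
    dominant = max(set(labels), key=labels.count)
    best, start = (0, 0), None
    for i, lab in enumerate(labels + [None]):  # None closes a trailing run
        if lab == dominant and start is None:
            start = i
        elif lab != dominant and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best
```

In use, the frames of a recording would be passed through `temporal_features`, the resulting feature tuples clustered with `kmeans`, and the frame range returned by `longest_run` converted back to sample positions to cut the summary excerpt.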