A perceptual audio coder is disclosed for encoding audio signals, such as speech or music, with different spectral and temporal resolutions for redundancy reduction and irrelevancy reduction. The
audio signal is initially spectrally shaped using a pre-filter. The pre-filter output samples are thereafter quantized and coded to minimize the
mean square error (MSE) across the spectrum. The disclosed
perceptual audio coder can use fixed quantizer step-sizes, since
spectral shaping is performed by the pre-filter prior to quantization and coding. The disclosed pre-filter and post-filter support the appropriate frequency-dependent temporal and
spectral resolution for irrelevancy reduction. A filter
structure based on a frequency-warping technique is used that allows
filter design based on a non-linear frequency scale. The characteristics of the pre-filter may be adapted to the masked thresholds, using techniques known from
speech coding, where linear-predictive coefficient (LPC) filter parameters are used to model the
spectral envelope of the speech
signal. Likewise, the filter coefficients may be efficiently transmitted to the decoder for use by the post-filter using well-established techniques from
speech coding, such as an LSP (line spectral pairs) representation, temporal interpolation, or
vector quantization.
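
By way of illustration only, the pre-filter, fixed step-size quantizer, and post-filter chain described above may be sketched as follows. The sketch is not the disclosed implementation. It assumes that the masked threshold is available as a power spectrum sampled on a uniform frequency grid, fits an all-pole (LPC) model to that spectrum using the Levinson-Durbin recursion, and uses ordinary unit delays rather than the frequency-warped filter structure. The names lpc_from_power_spectrum, pre_filter, post_filter, and code_roundtrip are hypothetical.

```python
import numpy as np


def lpc_from_power_spectrum(power, order):
    """Fit an all-pole model gain**2 / |A(e^jw)|**2 to a power spectrum.

    `power` holds positive samples on the rfft grid (0 .. pi inclusive).
    Returns the coefficients a (with a[0] == 1) and the model gain.
    """
    # Autocorrelation is the inverse transform of the power spectrum.
    r = np.fft.irfft(power)[: order + 1]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    # Levinson-Durbin recursion.
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, np.sqrt(err)


def pre_filter(x, a, gain):
    """Encoder side: FIR filter A(z) / gain, the inverse of the threshold shape."""
    return np.convolve(x, a)[: len(x)] / gain


def post_filter(y, a, gain):
    """Decoder side: all-pole filter gain / A(z), which restores the shape."""
    out = np.zeros(len(y))
    for n in range(len(y)):
        acc = gain * y[n]
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out


def code_roundtrip(x, a, gain, step):
    """Pre-filter, quantize with one fixed step size, then post-filter."""
    y = pre_filter(x, a, gain)
    q = step * np.round(y / step)  # uniform quantizer, same step everywhere
    return post_filter(q, a, gain)
```

In this sketch the quantizer adds approximately white noise to the pre-filtered samples, and the post-filter shapes that noise with the all-pole response fitted to the masked threshold, which is why a single fixed step size can suffice. A caller would compute a and gain once per block from the threshold spectrum and pass the same values to both the encoder and decoder sides; how those values are transmitted (for example, as LSPs) is outside the scope of the sketch.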