A method for video compression through image processing and object detection, to be carried out by an electronic processing unit on either images or a digital video stream of images, the images being defined by a single frame or by sequences of frames of said video stream, with the aim of enhancing and then isolating the frequency domain signals representing a content to be identified, and of decreasing or ignoring the frequency domain noise with respect to said content within the images or the video stream, the method comprising the steps of:
obtaining a digital image or a sequence of digital images from either a corresponding single frame or a corresponding sequence of frames of said video stream, all the digital images being defined in a spatial domain;
selecting one or more pairs of sparse zones, each pair covering at least a portion of said single frame or at least two frames of said sequence of frames, each pair of sparse zones generating a selected feature, and each zone being defined by two sequences of spatial data;
transforming the selected features into frequency domain data by combining, for each zone, said two sequences of spatial data through a 2D variation of an L-transformation, while varying the transfer function, shape and direction of the frequency domain data for each zone, thus generating a normalized complex vector for each of said selected features;
combining all said normalized complex vectors to define a model of the content to be identified; and
inputting said model, obtained from said selected features, into a classifier, thereby obtaining object detection or visual saliency data for use in video compression.
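The claimed pipeline (zone pairs, a 2D L-transformation per zone, one normalized complex vector per feature, a combined model, a classifier) can be illustrated with a minimal sketch. The claim does not define the L-transformation, so this sketch assumes a discrete Laplace-style sum evaluated at a complex point s = sigma + i*omega, made two-dimensional by applying it separably to the zone's two sequences. It further assumes that sigma stands in for the transfer-function shape and (omega_x, omega_y) for the direction. All names (`l_transform_2d`, `feature_vector`, `build_model`, `classify`) are hypothetical, and the thresholded similarity "classifier" is a placeholder, not the classifier of the claimed method.

```python
import cmath
import math
from typing import List, Sequence, Tuple

# Assumption: a zone is two sequences of spatial data (e.g. pixel intensities
# sampled along two directions); a selected feature is a pair of zones.
Zone = Tuple[Sequence[float], Sequence[float]]
# Assumption: (sigma, omega_x, omega_y) parameterize shape and direction.
Params = Tuple[float, float, float]


def l_transform_1d(seq: Sequence[float], s: complex) -> complex:
    """Hypothetical discrete Laplace-style sum: sum_n x[n] * exp(-s * n)."""
    return sum(x * cmath.exp(-s * n) for n, x in enumerate(seq))


def l_transform_2d(zone: Zone, p: Params) -> complex:
    """Hypothetical separable 2D variant: the transform of the outer product
    of the zone's two sequences, which factorizes into two 1D sums."""
    sigma, wx, wy = p
    a, b = zone
    return l_transform_1d(a, complex(sigma, wx)) * l_transform_1d(b, complex(sigma, wy))


def feature_vector(pair: Tuple[Zone, Zone], params: List[Params]) -> List[complex]:
    """Unit-norm complex vector for one pair of sparse zones, with one entry
    per parameter setting (relative response of zone A versus zone B)."""
    za, zb = pair
    raw = [l_transform_2d(za, p) * l_transform_2d(zb, p).conjugate() for p in params]
    norm = math.sqrt(sum(abs(v) ** 2 for v in raw))
    return [v / norm for v in raw] if norm > 0 else [0j] * len(raw)


def build_model(pairs: List[Tuple[Zone, Zone]], params: List[Params]) -> List[complex]:
    """Model of the content: concatenation of all normalized feature vectors."""
    return [v for pair in pairs for v in feature_vector(pair, params)]


def classify(model: List[complex], reference: List[complex], threshold: float = 0.9) -> bool:
    """Placeholder classifier: normalized Hermitian similarity to a reference
    model, compared against a threshold."""
    dot = sum(m * r.conjugate() for m, r in zip(model, reference))
    n = math.sqrt(sum(abs(m) ** 2 for m in model) * sum(abs(r) ** 2 for r in reference))
    return n > 0 and abs(dot) / n >= threshold


if __name__ == "__main__":
    params = [(0.1, 0.5, 0.0), (0.1, 0.0, 0.5), (0.05, 0.3, 0.3)]
    zone_a = ([10.0, 12.0, 40.0, 42.0], [11.0, 13.0, 39.0, 41.0])
    zone_b = ([30.0, 31.0, 5.0, 4.0], [29.0, 30.0, 6.0, 5.0])
    reference = build_model([(zone_a, zone_b)], params)
    print(classify(build_model([(zone_a, zone_b)], params), reference))
```

In this sketch, the separable form keeps each zone's cost linear in the length of its two sequences. Sweeping several (sigma, omega_x, omega_y) settings is one possible reading of "varying the transfer function, shape and direction". Any real implementation would replace both the transform and the placeholder classifier with the ones the method actually specifies.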