A scalable video encoder has one or more encoding modes in which at least some, and possibly all, of the motion information used during motion-based predictive encoding of a video stream is excluded from the resulting encoded video bitstream. A corresponding video decoder is capable of performing its own motion computation to generate its own version of that motion information and uses it to perform motion-based predictive decoding of the bitstream, producing a decoded video stream.
All motion computation, whether at the encoder or at the decoder, is preferably performed on decoded data.
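Because both sides run the same deterministic routine on the same decoded (reconstructed) frames, they arrive at identical motion information without any motion vectors being transmitted. As an illustration only, a simple full-search block matcher could serve as that routine; the block size, search range, and SAD criterion below are assumptions, not requirements of the source:

```python
import numpy as np

def block_motion_search(ref, cur, block=16, search=8):
    """Exhaustive block-matching motion estimation.

    Run identically at the encoder and the decoder on *decoded* frames,
    so the resulting motion vectors match exactly on both sides and never
    need to be written to the bitstream.
    """
    h, w = cur.shape
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = cur[by:by + block, bx:bx + block].astype(np.int32)
            best, best_sad = (0, 0), None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cand = ref[y:y + block, x:x + block].astype(np.int32)
                    sad = np.abs(target - cand).sum()  # sum of absolute differences
                    if best_sad is None or sad < best_sad:
                        best_sad, best = sad, (dy, dx)
            vectors[(by, bx)] = best
    return vectors
```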
For example, frames may be encoded as H, L, or B frames, where H frames are intra-coded at full resolution and L frames are intra-coded at low resolution. The motion information is generated by applying motion computation to decoded L and H frames and is used to generate synthesized L frames.
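A synthesized L frame can then be read as a motion-compensated prediction built from decoded data. The sketch below is a minimal illustration that reuses the hypothetical block_motion_search above; how the low-resolution L frame is brought to full resolution and which decoded frame serves as the reference are implementation choices, not details given in the source:

```python
import numpy as np

def motion_compensate(ref, vectors, block=16):
    """Form a predicted (synthesized) frame by copying each block of `ref`
    displaced by its motion vector. `vectors` maps (by, bx) -> (dy, dx),
    e.g. as produced by the block_motion_search sketch above."""
    pred = np.zeros_like(ref)
    for (by, bx), (dy, dx) in vectors.items():
        pred[by:by + block, bx:bx + block] = ref[by + dy:by + dy + block,
                                                 bx + dx:bx + dx + block]
    return pred

# Both sides work only from decoded frames, so the prediction is reproducible:
#   vectors = block_motion_search(decoded_H, upsampled_decoded_L)  # names illustrative
#   synth_L = motion_compensate(decoded_H, vectors)
```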
L-frame residual errors are generated by performing inter-frame differencing between the synthesized and original L frames and are encoded into the bitstream. In addition, synthesized B frames are generated by tweening between the decoded H and L frames, and B-frame residual errors are generated by performing inter-frame differencing between the synthesized B frames and, depending on the implementation, either the original B frames or sub-sampled B frames. These B-frame residual errors are also encoded into the bitstream. Because the decoder can perform its own motion computation, motion-based predictive encoding can be used to generate an encoded bitstream without expending any bits on explicitly encoded motion information.
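The tweening step can be read as temporal interpolation between the two decoded anchor frames, with residuals formed by differencing against the synthesized frames. In this minimal sketch a plain linear blend stands in for whatever motion-guided interpolation an actual implementation would use, and the variable names are illustrative:

```python
import numpy as np

def tween(anchor_a, anchor_b, t):
    """Synthesize an intermediate B frame at relative time t in [0, 1] by
    blending two decoded anchor frames. A linear blend is only a stand-in;
    a real implementation would typically interpolate along the computed
    motion field."""
    a = anchor_a.astype(np.float32)
    b = anchor_b.astype(np.float32)
    blended = (1.0 - t) * a + t * b
    return blended.round().astype(anchor_a.dtype)

# Only residuals (never motion vectors) are written to the bitstream:
#   residual_L = original_L.astype(np.int32) - synth_L
#   synth_B    = tween(decoded_H, decoded_L, t=0.5)
#   residual_B = original_B.astype(np.int32) - synth_B   # or a sub-sampled
#                                                        # original B frame
```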