A representation of an audiovisual asset is obtained and processed to obtain at least one normal playback video elementary stream, optionally at least one audio elementary stream, and at least one trick mode video elementary stream. A transport stream is formed from the at least one normal playback video elementary stream, the at least one audio elementary stream (where present), and the at least one trick mode video elementary stream, such that the at least one trick mode video elementary stream is encapsulated in the transport stream. Streaming of the transport stream within a network is facilitated, for example to a VOD server for de-encapsulation and, in some approaches, to a set-top box.
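The encapsulation and de-encapsulation steps can be illustrated with a minimal sketch. It assumes an MPEG-2 style transport stream of 188-byte packets in which each elementary stream is carried on its own packet identifier (PID); the text above does not name a container format, so this is an assumption. The PID values and the function names `packetize`, `mux`, and `demux` are hypothetical, and the sketch deliberately omits PES packetization, PAT/PMT tables, and timing information that a real multiplexer would need.

```python
from itertools import zip_longest

TS_PACKET_SIZE = 188
TS_PAYLOAD_SIZE = 184  # 188 bytes minus the 4-byte header

# Hypothetical PID assignments, chosen only for illustration.
PID_NORMAL_VIDEO = 0x100
PID_AUDIO = 0x101
PID_TRICK_VIDEO = 0x102


def packetize(pid: int, data: bytes) -> list:
    """Split one elementary stream into 188-byte transport packets."""
    packets = []
    for counter, start in enumerate(range(0, len(data), TS_PAYLOAD_SIZE)):
        payload = data[start:start + TS_PAYLOAD_SIZE]
        stuffing = TS_PAYLOAD_SIZE - len(payload)
        if stuffing == 0:
            afc, adaptation = 0b01, b""       # payload only
        elif stuffing == 1:
            afc, adaptation = 0b11, b"\x00"   # zero-length adaptation field
        else:
            # length byte, empty flags byte, then 0xFF stuffing bytes
            adaptation = bytes([stuffing - 1, 0x00]) + b"\xff" * (stuffing - 2)
            afc = 0b11
        header = bytes([
            0x47,                             # sync byte
            (pid >> 8) & 0x1F,                # upper 5 bits of the 13-bit PID
            pid & 0xFF,                       # lower 8 bits of the PID
            (afc << 4) | (counter & 0x0F),    # 4-bit continuity counter
        ])
        packets.append(header + adaptation + payload)
    return packets


def mux(streams: dict) -> bytes:
    """Interleave the packets of all streams round-robin into one stream."""
    per_stream = [packetize(pid, data) for pid, data in streams.items()]
    out = bytearray()
    for group in zip_longest(*per_stream):
        for packet in group:
            if packet is not None:
                out += packet
    return bytes(out)


def demux(ts: bytes, pid: int) -> bytes:
    """De-encapsulate: recover the payload bytes carried on one PID."""
    out = bytearray()
    for start in range(0, len(ts), TS_PACKET_SIZE):
        packet = ts[start:start + TS_PACKET_SIZE]
        if packet[0] != 0x47:
            raise ValueError("lost sync")
        packet_pid = ((packet[1] & 0x1F) << 8) | packet[2]
        afc = (packet[3] >> 4) & 0x3
        offset = 4
        if afc & 0b10:                        # skip the adaptation field
            offset += 1 + packet[4]
        if packet_pid == pid and afc & 0b01:
            out += packet[offset:]
    return bytes(out)
```

In this sketch a receiver such as the VOD server would call `demux(ts, PID_TRICK_VIDEO)` to pull the trick mode stream back out, while the normal playback video and audio remain available on their own PIDs.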