An automatic, multi-stage rich media content creation system includes a network-based server executing an operating system, a text editing program, typically an XML editor, and an authoring or batch-processing program for combining Rich Media in a Multimedia Vehicle Repository (MVR) file. Video, stills, panoramas, sound, film, etc. are combined as raw Rich Media from one or more sources and transmitted to the server in a framework over a digital network. The raw Rich Media is incorporated into the framework as a series of related frames and is stored on a storage device, typically a disk at the network-based server. A creator using standard graphical or text editing tools has access to the raw media assets on the disk for preparing a textual specification, typically XML, in an electronic template of the desired Rich Media content. The creator transmits the template and raw media assets to the server, where the batch-processing program combines the raw media assets and the XML textual specification in the template into a composed MVR file. The composed MVR file and textual specification may be returned to the creator for further editing of the specification, if necessary, and/or stored on the server disk for access by other creators. The template-based, composed or edited MVR file of Rich Media content and its related textual specification can be transmitted to other servers on the network for automatic, multi-stage creation of Rich Media content by several user groups. One user group can create a template-based, composed or edited MVR file, which another group accesses via the network to inject other content into the template, revising the XML textual specification to create another embodiment of the Rich Media content.
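
The composition and injection stages described above can be illustrated with a minimal sketch. It is not the batch-processing program itself: the XML element and attribute names (`presentation`, `frame`, `asset`), the function names `compose` and `inject`, and the use of a ZIP archive as a stand-in for the MVR file are all assumptions made for illustration, since the layout of the MVR file is not described here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical template: the text only says the specification is "typically XML".
TEMPLATE = """<presentation title="Demo">
  <frame id="1" asset="intro.mp4"/>
  <frame id="2" asset="pano.jpg"/>
</presentation>"""


def compose(template_xml: str, raw_assets: dict[str, bytes]) -> bytes:
    """Combine the XML specification and the assets it references into one file.

    A ZIP archive stands in for the MVR file (an assumption, not the real format).
    """
    root = ET.fromstring(template_xml)
    # Collect referenced asset names in document order, without duplicates.
    wanted = list(dict.fromkeys(f.get("asset") for f in root.iter("frame")))
    missing = [name for name in wanted if name not in raw_assets]
    if missing:
        raise KeyError(f"assets referenced by the template but not stored: {missing}")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("spec.xml", template_xml)  # keep the specification alongside the media
        for name in wanted:
            archive.writestr(f"assets/{name}", raw_assets[name])
    return buffer.getvalue()


def inject(template_xml: str, frame_id: str, new_asset: str) -> str:
    """Later-stage edit: another group swaps the asset of one frame in the template."""
    root = ET.fromstring(template_xml)
    for frame in root.iter("frame"):
        if frame.get("id") == frame_id:
            frame.set("asset", new_asset)
    return ET.tostring(root, encoding="unicode")
```

In this sketch a second user group would call `inject` on the stored specification and then `compose` again, which yields a different composed file from the same template without touching the raw assets on the server disk.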