Encoding versus Packaging – Streaming media terminology explained

What is the difference between Encoding and Packaging?

Short answer:  It’s the difference between building toys, and gift-wrapping them.

The terminology used in media streaming is hard to master.  There are a lot of moving parts, and the concepts are all linked to one another.  In this post I review some of the most common terms and try to draw analogies that are easy to grasp.  We’ll go from content creation to content consumption.

Production of digital media is reasonably well understood: you have contributions from your creative department, or you have purchased or licensed content.  At some point, these go through analog-to-digital conversion and are typically stored as very-high-resolution, very-low-compression (very-high-quality) files.  These are often referred to as Mezzanine or Source files, and they form your input streams for encoding.  These are your ideas and your raw materials; you’ll use them to make toys for kids to play with.

Encoding (transcoding) is the process of rendering a frame of the input stream onto a bitmap buffer, analyzing it (along with the frames just before and just after), and then re-compressing it into the target encoding profile.  Encoding is hard work: it requires lots of memory and CPU cycles.  It uses codecs to build the frames of video or audio.
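The decode–analyze–re-compress loop above can be sketched as follows. This is a minimal illustration, not a real encoder: `decode_frame`, `analyze`, and `encode_frame` are hypothetical stand-ins for actual codec calls.

```python
def decode_frame(frame):
    # Render the compressed frame into a (pretend) bitmap buffer.
    return {"pixels": frame}

def analyze(frames, i):
    # Look at the frame plus its immediate neighbors, as described above.
    return frames[max(0, i - 1):i + 2]

def encode_frame(bitmap, context, profile):
    # Re-compress into the target profile (here: just tag the data).
    return (profile, bitmap["pixels"])

def transcode(frames, profile):
    out = []
    for i, frame in enumerate(frames):
        bitmap = decode_frame(frame)
        context = analyze(frames, i)
        out.append(encode_frame(bitmap, context, profile))
    return out

print(transcode(["f0", "f1", "f2"], "720p/1.5Mbps"))
```

Note that every output frame requires a full decode and re-encode, which is why this step dominates CPU time.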

I refer to Encoding as ‘building toys’ in a ‘toy factory’.  You need specialized machines for this type of work.  It is CPU intensive to decompress, analyze and compress video.  In this analogy, the toys are made of codecs.


An encoding profile, in the context of adaptive bitrate streaming, is a set of target resolutions and bitrates.  Think 1080p/3Mbps, 720p/1.5Mbps, 480p/800kbps, 240p/400kbps, plus at least one audio stream.  With an encoding profile, you are choosing how many sizes of the same toy you need.  The output of the encoding process is often called a digital intermediate: it is not the source/mezzanine, but neither is it ready to stream.
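The ladder above can be written down as a simple data structure. The field names and the 128kbps audio entry are illustrative assumptions; the resolutions and bitrates mirror the text.

```python
# A hypothetical encoding profile: one entry per target rendition.
encoding_profile = [
    {"name": "1080p", "width": 1920, "height": 1080, "video_kbps": 3000},
    {"name": "720p",  "width": 1280, "height": 720,  "video_kbps": 1500},
    {"name": "480p",  "width": 854,  "height": 480,  "video_kbps": 800},
    {"name": "240p",  "width": 426,  "height": 240,  "video_kbps": 400},
]
audio_streams = [{"codec": "aac", "kbps": 128}]  # at least one audio stream

# The encoder produces one digital intermediate per rendition.
for rendition in encoding_profile:
    print(f"encode {rendition['name']} at {rendition['video_kbps']} kbps")
```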


Packaging (trans-muxing) is the process of taking the frames of video and describing their sequence, presentation order, and timing.  Specific container formats are used for this; these could be ISO 14496 MP4 containers and their derivatives (Smooth, HDS, CSF), or the MPEG-2 Transport Stream (TS).  Typically an adaptive bitrate asset will have a group of streams which are then referenced in a manifest that describes additional metadata for the streams themselves.  The manifest describes how to request media for each stream and can take different forms depending on the protocol (Smooth Manifests, HDS Manifests, DASH Media Presentation Description manifests, HLS Manifests).  The product of the Packaging process is a ready-to-stream protocol; more sophisticated servers can work directly from digital intermediates to create protocols on demand.
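As a concrete example, here is a sketch that builds an HLS-style master manifest referencing one variant per rendition. The `#EXT-X-STREAM-INF` tags follow the HLS specification; the playlist file names are hypothetical.

```python
# (name, width, height, bandwidth in bits per second)
renditions = [
    ("1080p", 1920, 1080, 3_000_000),
    ("720p",  1280, 720,  1_500_000),
    ("480p",  854,  480,  800_000),
    ("240p",  426,  240,  400_000),
]

lines = ["#EXTM3U"]
for name, w, h, bps in renditions:
    # Each variant stream is advertised with its bandwidth and resolution...
    lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bps},RESOLUTION={w}x{h}")
    # ...followed by the URI of its variant playlist.
    lines.append(f"{name}/playlist.m3u8")
manifest = "\n".join(lines)
print(manifest)
```

The client picks a variant from this list based on measured throughput, which is the heart of adaptive bitrate streaming.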

I often describe Packaging as ‘gift wrapping’: all you need is gift-wrapping skills and a choice of what color wrapping paper (protocol) to use for your target client framework.  This operation is typically bandwidth limited, as the CPU has very little to do when simply re-wrapping the media (it mostly just copies the compressed video frames around in memory).


A client framework will use the protocol semantics to first read a manifest, then request media, and finally feed frames of video into a decoding pipeline.  In this analogy, imagine that the client framework is your child that doesn’t care much for the toy, but only likes unwrapping a particular color of paper; if the wrapping is not right, it gets all upset and refuses to let anyone play with the toy.
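The three client-side steps above (read manifest, request media, feed the decoder) can be sketched like this. `fetch`, `parse_manifest`, and `decode_segment` are hypothetical stubs, not a real player API; a real framework would do HTTP requests and protocol-specific parsing here.

```python
def fetch(url):
    # A real client would do an HTTP GET here.
    return f"<data for {url}>"

def parse_manifest(data):
    # A real client would parse HLS/DASH/Smooth here; stubbed segment list.
    return ["seg0.ts", "seg1.ts", "seg2.ts"]

def decode_segment(data):
    # Hand the compressed frames to the device's decoding pipeline.
    return f"decoded {data}"

def play(manifest_url):
    segments = parse_manifest(fetch(manifest_url))       # 1. read the manifest
    return [decode_segment(fetch(s)) for s in segments]  # 2. request media, 3. decode

print(play("master.m3u8"))
```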


A decoding pipeline on a device is capable of reading frames of video that have been encoded with a particular codec and decompressing them back into an image for you to view.  This is your other child, the one who doesn’t care about the wrapping but only likes a few kinds of toys and refuses to play with any other types (not all devices can decode all codecs).  This child gets particularly upset if you tell it you have one kind of toy but substitute in another (pipeline initialization and data consistency are important).

Have fun playing with media!


This entry was posted in Windows Azure Media Services.