Today, we got this question on our forums:
Does anyone have any information on the new ‘on the fly converting’ for media services?
I am trying to allow multiple users (hundreds) the ability to upload video files at the same time. Obviously, currently the queuing process takes too long. Would like to know if this new way might help.
Things tend to get pushed down quickly in the forums, and since I own this feature, I thought I would cover the answer in a blog post. But first things first, read this post about architecting a system for user-contributed content.
Yes, that feature is shipping in the coming days. We refer to it as dynamic packaging; you could also call it just-in-time packaging, dynamic muxing, etc.
Specifically, it will offer the ability to:
Transmux from a single source format into two different streaming formats.
What it will NOT do:
Take a single-bitrate source file and produce multi-bitrate streams. That requires re-encoding, which is very CPU-intensive. See this post for the difference between muxing and encoding.
Supported input formats:
- MP4 and Smooth Streaming, primarily with the AAC and H.264 codecs (you can use VC-1, but it will not remux to HLS).
Supported output formats:
- Smooth Streaming
- HLS v4
Supported scenarios:
- A single MP4 to single-bitrate Smooth and single-bitrate HLS.
- Many closed-GOP, GOP-aligned MP4s to multi-bitrate Smooth or multi-bitrate HLS.
- Multi-bitrate Smooth to HLS and, of course, Smooth.
Unsupported scenarios at this time:
- Encrypted content as source files: neither Storage Encryption nor Common Encryption.
Pictures always help:
Cost and setup:
At this time, dynamic packaging will only be offered if you reserve origin capacity in the management portal. That is, you need to enable and buy an origin reserved unit (~$200/mo, charged in 24-hour increments; this is a blog, pricing changes, so check the management portal for details).
If you are not using a reserved origin to stream, then you are sharing a small pool of origin servers with all the media services users of a datacenter. We call this the preview pool, and we are not updating it at this time to support dynamic packaging. There is no Service Level Agreement for the preview pool; if someone's video goes viral, you'll be competing with them for bandwidth. If you're serious about streaming, get a reserved unit; that's what they are for.
To reserve on-demand streaming capacity:
Go to the scale page in the management portal and move the slider to 1. If you are provisioned with a reserved unit, you’re all set.
It takes a few minutes to spin up an origin reserved unit for you. If it's been a few hours and you don't see confirmation in the 'scale' page of the management UI, then there was no capacity left in our reservation system for that datacenter. This triggers an internal request for increased capacity, but that capacity needs to be provisioned, which takes a while. We all like to think that 'the cloud' is infinitely elastic; yes, in theory, but in practice racks of servers are powered down when unused, and there is an army of guys around the world with box-cutters and screwdrivers racking servers. There are well over 10,000 subscribers to media services; if everyone asks for 5 reserved units, we'll be keeping those guys busy.
Back to the original question:
The original poster was looking for the quickest path to putting user-contributed content back out there. That works if the content is 'just right'. But it tends to go sideways quickly: all your users are capturing video in different ways, using different codecs and container formats (mov != mp4 except in special cases). By encoding each of these, even to a very simple (quicker) single-bitrate encoding profile, you create uniformity in the content that you are streaming. Without uniformity, you will get random failures that are hard to trace: "this video plays on iOS, but not Android", "my videos won't stream at all"; and you'll spend all sorts of hours tracking these down. Trust me, I do this for customers all week long — we are taking on this burden so you don't have to. Skipping the encode step seems like the quickest path, but it will bring you pain. Pay for encoding reserved units ($99/mo) and you will be able to manage your queue, and, mostly, you won't sit in line with everyone else in the datacenter.
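To make the "encode everything first" advice concrete, here is a sketch of submitting an encoding job with the WAMS .NET SDK. The asset name, job name, and preset string are assumptions for illustration; check the MSDN task-preset documentation for the exact preset names available, and note this code talks to a live service, so it won't run standalone.

```csharp
// Assumes _context is an initialized CloudMediaContext and the user's
// upload has already been ingested as an asset named "user-upload".
IAsset inputAsset = _context.Assets.Where(a => a.Name == "user-upload").First();

IJob job = _context.Jobs.Create("Normalize user upload");

// Pick the latest version of the Windows Azure Media Encoder processor.
IMediaProcessor encoder = _context.MediaProcessors
    .Where(p => p.Name == "Windows Azure Media Encoder")
    .ToList()
    .OrderBy(p => new Version(p.Version))
    .Last();

// A simple single-bitrate profile is enough to create uniformity;
// "H264 Broadband 720p" is an assumed preset name.
ITask task = job.Tasks.AddNew(
    "Encode to single-bitrate MP4",
    encoder,
    "H264 Broadband 720p",
    TaskOptions.None);

task.InputAssets.Add(inputAsset);
task.OutputAssets.AddNew("user-upload-mp4", AssetCreationOptions.None);

job.Submit();
```

With an encoding reserved unit, jobs like this run from your own queue instead of the shared datacenter queue.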
Dynamic Packaging Server Manifests
When you encode to multi-bitrate MP4 with the Windows Azure Media Encoder, the system will produce a server manifest for you. If you are familiar with the smooth streaming format, this is the .ism file that essentially says: this file is a video track, so is this file, and this file, and this one is an audio track. When dynamically packaging from MP4, you need to explicitly tell the server that these input files are MP4s. This is done with some metadata in the <head/> section:
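For a single MP4, a minimal server manifest sketch looks like this (the file name is hypothetical); the `formats` meta tag in `<head/>` is what tells the origin the sources are MP4:

```xml
<?xml version="1.0" encoding="utf-8"?>
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <!-- Tells the origin server the source files are MP4 -->
    <meta name="formats" content="mp4" />
  </head>
  <body>
    <switch>
      <video src="yourFile.mp4" />
      <audio src="yourFile.mp4" />
    </switch>
  </body>
</smil>
```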
In the case of several MP4 files, each muxed with the same audio track, it would look like this:
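A sketch of that multi-bitrate case (file names are hypothetical): each MP4 is listed as a video track, and the audio track is taken from one of them:

```xml
<?xml version="1.0" encoding="utf-8"?>
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <meta name="formats" content="mp4" />
  </head>
  <body>
    <switch>
      <!-- One <video> entry per bitrate; all muxed with the same audio -->
      <video src="yourFile_400kbps.mp4" />
      <video src="yourFile_900kbps.mp4" />
      <video src="yourFile_1500kbps.mp4" />
      <!-- The shared audio track, referenced from one of the files -->
      <audio src="yourFile_400kbps.mp4" />
    </switch>
  </body>
</smil>
```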
But is it going to work?
The original poster did not want to wait in a queue before he started streaming. The trouble is, he will only find out about problems once he's actually streaming. Unless he sets up a quality check of some sort, he will only find out because his users tell him — or when they abandon his site altogether, which, unfortunately, is more typical of internet users: "if it doesn't work, I have better things to do with my time".
To mitigate this risk, we have set up an MP4 Preprocessor task within the existing Windows Azure Media Packager. It analyses the input files and checks that they are streamable to Smooth and HLS. You can make the task fail if the input cannot be streamed to your desired format. I'd rather not maintain code snippets for it here in my blog, but it's quite easy to use if you've built any sort of encoding/packaging workflow against WAMS. It is documented on MSDN here in the Dynamic Packaging section.
If you’re not going to encode with WAMS, you should at least check your files. The actual runtime for checking an asset is less than a minute for an asset as large as a few Gigs (set-up time to run the tasks varies proportionally to asset size); you do need a slot in the queue, however. The idea is to fail early, before streams are out there on your web property.
So how do I stream?
Just create a locator and append your .ism file and manifest type.
Create the locator:
var accessPolicy = _context.AccessPolicies.Create(assetName, TimeSpan.FromDays(365), AccessPermissions.Read);
var locator = _context.Locators.CreateLocator(LocatorType.OnDemandOrigin, asset, accessPolicy);
Then for smooth:
UriBuilder ub = new UriBuilder(locator.Path);
ub.Path += "/yourFile.ism/manifest";
Uri smoothUri = ub.Uri;
or for HLS:
UriBuilder ub = new UriBuilder(locator.Path);
ub.Path += "/yourFile.ism/manifest(format=m3u8-aapl)";
Uri hlsUri = ub.Uri;
That’s it, it just works. If you’re interested in the techy details, read on.
At runtime, the server gets the manifest request, peeks into the .ism file, and finds the files you have listed. It then opens each one, looks up the video frame information, finds synchronization points across tracks, and builds the manifest response accordingly. When it gets fragment requests, it goes back into the video file for the requested quality level, finds the required video frames, and builds the response from the raw frames into Smooth or HLS, as per the request.