For the last two years, I have been working to extend the capabilities of the Origin Server for Windows Azure Media Services. In January of 2013, Windows Azure Media Services (WAMS) became generally available as a Platform as a Service (PaaS) on Windows Azure, Microsoft’s cloud computing platform. Just over a year later, it would stream the largest live sporting event in history – millions of concurrent users and over 10,000 hours of unique content over the 16-day Sochi Winter Olympic Games. This article describes the capabilities of Microsoft’s origin server, as well as the evolution of the product over the last two years to meet the demands of today’s streaming media space.
The starting point: IIS-MS 4.1
As a free extension to IIS Servers, IIS-MS 4.1 offered several market-leading features upon its release. These included Live Smooth Streaming ingest, with archiving to Smooth Streaming and HLS v3, and the re-streaming of these live archives. It offered AES envelope encryption of HLS v3 live streams and archives, as well as simple VOD Smooth Streaming for both clear and PlayReady encrypted content.
IIS-MS 4.1 was a product which worked very well in conjunction with Expression Encoder and Transform Manager to allow the authoring of professional-grade adaptive-streaming workflows. These products together have prepared and streamed a significant proportion of internet video solutions in the last few years.
[IIS-MS 4.1 Origin Server Capabilities]
The capital and operational costs of creating and hosting libraries of video content are significant and ever-expanding for content owners. There was a need for a highly scalable solution that would meet the needs of complex content-preparation workflows, as well as the demands of the fractured mobile client space. Content owners, all the while, needed a Service Level Agreement backed by the world’s largest software company.
Along came the Cloud: IIS-MS 5.0 launches in WAMS
In 2012, the origin server was re-architected to meet the existing and expected needs of the exponentially growing media streaming space. It was adapted to source from Azure Storage and extended to allow just-in-time transmuxing capabilities for VOD content. This allowed the server to source from Smooth Streaming and create HLS v4 on a per-request basis. In addition, the server was extended to source from more standard formats: it could now use GOP-aligned multi-bitrate ISO-MP4 file-sets.
Windows Azure Media Services added scalable Origin Services to the Media Processing Services that had been in beta since May 2012. The origin services team undertook the 2012 London Summer Olympic Games for NBC in the USA and Deltatre in other countries, thereby testing the server at scale.
In January 2013, Windows Azure Media Services (WAMS) became generally available, with the capabilities of Expression Encoder, Transform Manager and IIS-Media Services all extended and wrapped in a scalable Platform as a Service. WAMS offered the first Service Level Agreement in the cloud media-processing and streaming space, covering both encoding and streaming.
The origin server now offered the following storage-to-streaming format capabilities:
[IIS-MS 5.0 Capabilities at WAMS GA, Jan 2013]
Keeping up with demands: New formats and partnerships
Early 2013 proved to be a very busy time. Microsoft had been working with various standardization bodies for several years to promote interoperability of streaming media formats. With 2013 came the first specifications for MPEG DASH, Dynamic Adaptive Streaming over HTTP. A new specification that prescribes the format used to advertise adaptive bitrate media affects both Servers and Clients: at first there are no Servers to create streams for Clients, and no Clients to test streams from Servers. The Server team took on this challenge and implemented the MPEG DASH Live Profile for its Media Presentation Description (MPD) manifest format, and simultaneously jumped in with the Common Streaming Format (CSF) for media packaging, derived from the latest ISO-MP4 specs. This was demonstrated in time for the National Association of Broadcasters Show, and paved the way for the development of open-source DASH Live Profile client implementations by members of the DASH Industry Forum, as well as DASH player frameworks from the Windows Azure Media Services client team. Our implementation also offered muxing from Smooth+PlayReady to DASH/CSF+CENC, which was the first publicly available Common Streaming Format with Common Encryption. As the specifications were still in flux, this format was not announced as generally available; however, anyone wishing to consume our DASH reference streams and develop clients against them could do so.
Not only was there significant work in the standardization domain, but also in the commercial domain. After the success of the 2012 London Summer Olympics, WAMS was selected by NBC to partner with Adobe and Akamai in delivering the 2014 Sochi Winter Olympic Games. In addition, Deltatre had expanded its worldwide Olympics streaming footprint to 22 countries for Sochi. All six datacenters in which WAMS was deployed were to be used. With the NBC partnership came a requirement to implement HTTP Dynamic Streaming (HDS) to target Adobe Primetime Flash clients, as well as SCTE-35 advertisement signaling in both HDS and HLS.
As of summer 2013, the origin server now offered:
[IIS-MS 6.0 Live Capabilities, August 2013]
The learnings from the 2012 Olympics were being applied to make Live Streaming generally available for WAMS customers after the 2014 Olympics. The goal was not simply to meet the needs of NBC and Deltatre, but to build a Live Streaming Service as an integral part of WAMS which could then be leveraged by any WAMS user.
While IIS-MS 4.1’s Live ingest was designed as a single instance which both archived and re-streamed, this did not meet the needs of a cloud server deployment, nor the growing scale of the Olympics. The ingest “Channel” was re-architected to provide multi-machine redundancy and failover. Its archiving format was overhauled to take advantage of the distributed nature of Azure Storage, thereby increasing archiving speed, throughput and reliability. The ingest and archiving components were broken away from the origin services, and two server types now work in concert to deliver live streams, increasing resilience while separating concerns. This offers diverse topology options, such as providing a known amount of Origin streaming SLA per Live channel, and the separation of Live workloads from VOD workloads without breaking CDN caches. This truly made the solution viable as a production service for both Live and VOD workloads in the cloud.
[Diagram of Live Channel, Azure Storage and Origin server]
One of the problems with traditional Origin servers is the Live-to-VOD transition: there is usually one URL for the Live stream and another for VOD. With WAMS, there is only a single URL for the Asset. Whether the Asset is still Live, or the live stream has ended and it is now VOD content, the URL does not change. This greatly simplifies the content distribution workflow. Each Olympic broadcaster had a choice of over a dozen event feeds, static feeds of the torch, stadium, city views, etc., and the feeds they were producing onsite themselves: on-location, interviews, news desk, etc. Each event feed or production feed had several programs throughout the day, with differing start and stop times. The ability to pre-define the Live and VOD event URLs and traffic them to content management systems greatly simplified the broadcasters’ workflow and improved their ability to begin events on time and with confidence. A clever use of encoder automation allowed broadcasters to use their normal program scheduling and management systems to trigger the start, go-live and stop of each program through the signals normally embedded in the transport stream.
In addition to a single URL for Live and VOD, the Live event had access to the full DVR window for that program. Both of these are the result of decoupling the stream archiving from the origin services. The origin is able to read from either the archiving servers or the archives they are actively writing to Azure Storage. This allows a seamless transition once the event is over, and full access to all parts of the archive. Users could just as easily be watching the live edge of the stream, or be seeking or re-starting from the beginning of the event.
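To make the decoupling concrete, here is a minimal Python sketch of how a single Asset URL could be served from two back ends. It is an illustration under stated assumptions, not the origin server's actual code: the `Channel` and `Archive` classes, their fields, and `read_fragment` are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical live ingest server holding only the newest fragments."""
    recent: dict = field(default_factory=dict)   # fragment start time -> bytes
    running: bool = True


@dataclass
class Archive:
    """Hypothetical stand-in for the archive being written to blob storage."""
    stored: dict = field(default_factory=dict)   # fragment start time -> bytes


def read_fragment(start_time, channel, archive):
    """Return (data, source) for one fragment of an Asset, live or VOD.

    The caller uses the same lookup whether the event is still running
    or has ended, which is what lets the Asset URL stay unchanged.
    """
    # Live edge: the newest fragments are served straight from the channel.
    if channel.running and start_time in channel.recent:
        return channel.recent[start_time], "channel"
    # DVR window, or a finished event: read from the archive in storage.
    if start_time in archive.stored:
        return archive.stored[start_time], "archive"
    raise KeyError(f"fragment {start_time} is not available yet")
```

While the event runs, a viewer at the live edge is served from the channel and a viewer who rewinds is served from storage. Once the channel stops, every request falls through to the archive without the client noticing a change.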
Optimizing the Origin for Live Sources
Cloud storage services are considerably different from local disks. They are HTTP endpoints, not NTFS, FAT32, SMB or NAS/SAN systems. With either Azure Storage or Amazon S3 come the issues associated with an HTTP storage endpoint. Looking through the Amazon docs, you’ll quickly see that for any heavy throughput you’ll be advised to use their CloudFront system, which brings the files to local disks on edge servers. Azure Storage is the same: it has massive scale, redundancy and security, but also limits on the concurrency and throughput it can sustain while retaining the low, consistent latency required for live streaming. For example, a 4-second delay in providing the live manifest to an iOS device will stall playback, and when coupled with CDN caching tiers, the effect is amplified. There is no time for missed requests, nor any margin of error for origin server overload.
To ensure smooth and consistent delivery of media, several improvements were made to the origin server. We added request aggregation: requests which require new data are queued with any other requests that need the same data; these are then spooled out when the data is fetched (from either the Channel or Azure Storage). Origin request aggregation greatly reduces the transactions per second on the source content. This frees up Azure Networking and Storage to meet the needs of other loads, while providing the best streaming-request handling.
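The idea behind request aggregation can be sketched in a few lines of Python. This is a simplified illustration of the general technique (sometimes called request coalescing), not the origin server's implementation; the `RequestAggregator` class and the `fetch` callable are hypothetical names used only for this example.

```python
import threading
from concurrent.futures import Future


class RequestAggregator:
    """Coalesce concurrent requests for the same source data into one fetch."""

    def __init__(self, fetch):
        self._fetch = fetch            # hypothetical callable: key -> bytes
        self._lock = threading.Lock()
        self._in_flight = {}           # key -> Future shared by all waiters

    def get(self, key):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                # First request for this key: it will perform the fetch.
                future = Future()
                self._in_flight[key] = future
        if leader:
            try:
                future.set_result(self._fetch(key))
            except Exception as exc:
                # Waiters see the same failure as the leader.
                future.set_exception(exc)
            finally:
                with self._lock:
                    self._in_flight.pop(key, None)
        # Followers block here until the leader's fetch completes.
        return future.result()
```

However many requests arrive for a fragment while its fetch is in flight, the source sees one transaction. This sketch deliberately keeps nothing once the fetch completes, so a later request fetches again; retaining the data is a separate concern.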
Once the source data is fetched, we cache the input source data for use by any subsequent requests: since we are simultaneously streaming Smooth, HLS and HDS, all protocols can re-use the source content.
Due to the high concurrency and wide distribution of the streams, some broadcasters had over a dozen parent CDN caches. This meant that we were seeing multiple, often concurrent, requests for each bitrate, of each protocol, plus manifests. Since the origin dynamically produces the responses based on the requests, we added output caching. If you’ve set up traditional streaming topologies, it is likely that you’ve incorporated ARR caches to serve this purpose, which is what we did for the 2012 and previous Olympics, but no more. For Live and ‘Viral Video’ scenarios, the WAMS origin servers can now easily handle the maximum bandwidth of their outbound network cards, with comparatively little internal traffic and CPU load.
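The two caching layers described above, one for fetched source data and one for packaged output, can be illustrated with a short Python sketch. It is a toy model under stated assumptions: `LruCache`, `CachingOrigin`, `fetch_source` and `package` are hypothetical names, the cache sizes are arbitrary, and the real server's eviction and expiry policies are not described in this article.

```python
from collections import OrderedDict


class LruCache:
    """Small least-recently-used cache; sizes here are illustrative only."""

    def __init__(self, max_entries):
        self._max_entries = max_entries
        self._entries = OrderedDict()

    def get_or_create(self, key, create):
        if key in self._entries:
            self._entries.move_to_end(key)      # mark as recently used
            return self._entries[key]
        value = create()
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)   # evict the oldest entry
        return value


class CachingOrigin:
    """Toy origin with a source cache shared by all protocols and an
    output cache keyed by the full request."""

    def __init__(self, fetch_source, package):
        self._fetch_source = fetch_source   # hypothetical: (bitrate, fragment) -> bytes
        self._package = package             # hypothetical: (protocol, bytes) -> bytes
        self._source_cache = LruCache(max_entries=256)
        self._output_cache = LruCache(max_entries=1024)

    def handle(self, protocol, bitrate, fragment):
        source_key = (bitrate, fragment)
        output_key = (protocol, bitrate, fragment)

        def build_response():
            # Smooth, HLS and HDS requests all reuse the same source bytes.
            source = self._source_cache.get_or_create(
                source_key, lambda: self._fetch_source(bitrate, fragment))
            return self._package(protocol, source)

        # Repeated requests from parent CDN caches skip packaging entirely.
        return self._output_cache.get_or_create(output_key, build_response)
```

In this model, one fragment requested over three protocols costs one source fetch and three packaging operations, and any repeat of those requests costs neither.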
A Gold Medal Performance
The 2014 Olympics were a monumental success on all fronts. They deserve a much more detailed account than can be given here; suffice it to say that they broke a number of Live streaming records for scale, reliability, concurrency, automation, protocol diversity, device diversity and content delivery. All of this came only one year after Windows Azure Media Services made its official debut.
Since the Olympics, we have been building the business model to go to market with Live Streaming, as well as allowing select partners to start ramping up using Live Services.
Massive Scalability: Enterprise workloads for streaming media
Not only was 2013 a banner year for Media Services, but the Xbox One launched in November, bringing with it the Xbox Game DVR service. The backend for Game Clips is Windows Azure Media Services. Users have generated millions of clips to date and stream terabytes of content daily – including during the Olympics.
At the recent SharePoint Conference 2014, it was announced that Windows Azure Media Services would be the backend for a new enterprise video streaming service for Office 365: Office Video will provide securely encrypted video at the enterprise scale.
While Xbox Game DVR demonstrates massive scale, Office 365 requires new investments in encryption and security. O365 is one of the main drivers behind just-in-time encryption and sourcing from storage-encrypted media – a requirement shared by Hollywood production studios: that content shall never be unencrypted while at rest.
As we complete the O365 requirements in the first half of 2014, the server capability begins to resemble:
[IIS-MS 6.0, March 2014]
Looking to the future: What’s next?
By popular demand, we have recently added the option to stream to HLS v3, for backward compatibility with older Android devices, connected TVs and set-top boxes.
As of this writing, in mid-March 2014, we are still completing the Storage Decryption work to complement AES Envelope Encryption. We are also closing the gap on dynamically muxing Smooth+PlayReady sources to HLS+PlayReady. The goal is to favor Dynamic Packaging and Dynamic Encryption over static packaging and encryption: sourcing from standard MP4s or Smooth, either clear or storage-encrypted, we will be able to support protocols with various media encryption options.
How you can contribute: Feedback
We are nothing if we cannot meet the needs of our users. I invite you to leave feedback and feature requests on:
You can also reach us on our Forums on MSDN and StackOverflow: