Server-side Dynamic Content Insertion: the journey continues…

M2A Media was honoured to speak at Demuxed this year, arguably the highlight of the annual video engineer conference calendar.

Our Technical Architect, Nigel Harniman, took on the task of composing and delivering a talk covering M2A’s very latest work in delivering server-side Dynamic Content Insertion into live streams across the three main streaming formats – with emphasis on MSS and single Period DASH.

In this latest article Nigel explains M2A’s latest advances in regional and targeted content insertion and looks to the future evolution of this particular feature of our live streaming product, M2A LIVE.   

What exactly is content insertion?

First off, let’s establish what is meant by ‘content insertion’, why is it required, and how has it been traditionally done?

A broadcaster will acquire content rights on a global basis. For example, it acquires content that can be delivered into multiple regions, e.g. the UK, Italy, Colombia, the USA and Japan.

The rights require the broadcaster to insert obligated content, which is different in each region. In addition, it will want to insert promotional items into the stream that are relevant to each region.

The broadcaster may also opt to carry paid-for content, such as commercials or sponsorship idents, which may differ per region.

Traditionally, this is implemented by taking the common world feed from the MCR into separate TX playout systems for regional content insertion and then delivering through replicated live streaming workflows per region.

Why is DCI required?


Live Streaming Context

The live streaming architecture we typically deploy for customers includes cloud contribution, encoding, packaging and distribution components, deployed in a multi-region resilient manner.

Our orchestration provisions the resources based on the event schedule and, for some of our customers operating multiple concurrent events, this happens several thousand times per month.


Zooming out we can see that for a customer with 50 concurrent channels (TX chains) and a single output, the resources required look like this:

If we scale in the traditional approach, eight regional variations look like this:

Other Options?

We looked at existing technologies to insert content – typically these are used for advertising.

Client-side ad insertion relies on code – generally a library or SDK – running with the player on the device. While these are widely supported on desktop and mobile devices, they are less commonly used on televisions and set-top boxes, which often lack the processing power to implement the multiple players required for a seamless experience. In any case, client-side development and testing can be resource intensive.

In addition, it is quite easy to block ads at either the device or the premises edge (i.e., on the home gateway device), so this is not a reliable solution for inserting legally obligated content.

Server-side insertion provides a more robust solution. However, the commonly available platforms only support HLS and multi-period DASH – Smooth is definitely not an option. These platforms are typically aimed at targeted ad insertion and thus manage a session per viewer, and the costs reflect this: if you have one million viewers across 10 regions, managing one million unique sessions to serve only 10 variants becomes uneconomic.

Our Approach

Our approach is to allow content to be inserted post packager in an efficient manner.

The packager output is segmented. We ensure that segment durations are chosen to align audio samples and video frames on the segment boundary. For 25/50 fps this equates to a multiple of 1.92 seconds, and for 30/60 fps a multiple of 3.2 seconds.
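The alignment can be checked arithmetically. Assuming 48 kHz AAC audio with 1024-sample frames (a common configuration, though not stated above), a segment duration is usable when it contains a whole number of both video frames and audio frames:

```python
from fractions import Fraction

# Assumed audio configuration: 48 kHz AAC, 1024 samples per frame.
AUDIO_FRAME = Fraction(1024, 48000)  # seconds per audio frame

def boundary_aligned(seg_dur: str, fps: int) -> bool:
    """True if a segment of this duration ends exactly on both a
    video frame boundary and an audio frame boundary."""
    dur = Fraction(seg_dur)
    return (dur * fps).denominator == 1 and (dur / AUDIO_FRAME).denominator == 1

print(boundary_aligned("1.92", 25), boundary_aligned("3.2", 30))  # True True
print(boundary_aligned("2.00", 25))  # False: 93.75 audio frames per segment
```

A round 2-second segment fails because it holds 93.75 AAC frames, which is why the slightly odd-looking 1.92 s and 3.2 s multiples are chosen.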

A dynamic playlist is created per regional variant for each event. This is supplied from the customer’s scheduling system. It contains the list of breaks and the unique assets for each break.
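The playlist schema itself is customer-specific; purely as an illustration (all field names and values here are hypothetical), a per-variant playlist might carry the break identifiers and the ordered assets that fulfil each break:

```json
{
  "event": "evt-1234",
  "variant": "uk",
  "breaks": [
    {
      "id": "break-001",
      "duration": 9.6,
      "assets": ["asset-1", "asset-2"]
    }
  ]
}
```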

Separately, each asset is delivered via a dedicated VOD workflow that ensures it is conditioned to match the live stream – i.e., same segment size, frame rate, bit rate and profile.

Separate URLs are provided for each region and we use a query parameter for the identification.

Segments are then dynamically and selectively replaced, based on the SCTE-35 markers embedded within the stream, with the relevant segments from the VOD archive.

In addition, the SCTE markers are selectively included in the output based on the playlist metadata, allowing downstream placement opportunities, e.g., for targeted ad insertion.

Architecture Overview

Packaging Enhancements – HLS

HLS is the easiest stream format to support. It allows for manifest manipulation: replacing segment URLs with references to VOD segments.

The manifest manipulator is driven by the presence of the CUE-OUT, CUE-IN and DATERANGE tags. DATERANGE is required for the ID of the ad break, which is looked up against the playlist; the URLs for the VOD items are then inserted.

In this example a 9.6 second break is fulfilled with 2 segments of asset 1 and 3 segments of asset 2. 

Note the presence of the DISCONTINUITY tag to signal to the player that the media timestamps and encoding profile may change.

For this break, the CUE tags are also written into the manifest for downstream placement opportunities.
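To make this concrete, a simplified manifest excerpt for such a 9.6-second break could look like the following. This is entirely illustrative – the URLs, dates and attribute values are ours, not real M2A output:

```
#EXTINF:1.92,
live/segment_00041.ts
#EXT-X-DATERANGE:ID="break-001",START-DATE="2022-10-12T20:00:00Z",DURATION=9.6
#EXT-X-CUE-OUT:9.6
#EXT-X-DISCONTINUITY
#EXTINF:1.92,
vod/asset1/segment_00000.ts
#EXTINF:1.92,
vod/asset1/segment_00001.ts
#EXTINF:1.92,
vod/asset2/segment_00000.ts
#EXTINF:1.92,
vod/asset2/segment_00001.ts
#EXTINF:1.92,
vod/asset2/segment_00002.ts
#EXT-X-DISCONTINUITY
#EXT-X-CUE-IN
#EXTINF:1.92,
live/segment_00047.ts
```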

Caching is vital to ensure manipulated manifests are requested as few times as possible.


Packaging Enhancements – Smooth & DASH

Multi-period DASH can allow a similar manifest-manipulation approach, but to support single-period DASH and Smooth streams we need a different technique.

Given that Smooth manifests are only obtained once at start-up, and single-period DASH does not facilitate changing segment templates, our solution relies on dynamically replacing the segment contents.

The segment rewriter’s role is to take each DASH and Smooth segment request and either pass through the underlying live segment or rewrite the required VOD segment to match. To ensure the player is unaware of the change, the mp4 boxes need to match exactly and be part of the contiguous stream.

The segment rewriter simultaneously obtains the original live media segment and the VOD segment. It compares the encode profile – bit rate, frame rate, encoding parameters – and if they are not the same, it returns the underlying media. This protects us from an unexpected misconfiguration.

If the media configuration matches, it then goes on to rewrite the mp4 boxes to match, retaining the decode and presentation timestamps.
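A sketch of that decision flow is below. The function and field names are hypothetical, segments are modelled as dicts, and the real mp4 box rewrite is reduced to a placeholder that carries the live decode timestamp across:

```python
from dataclasses import dataclass

@dataclass
class Profile:
    bitrate: int
    framerate: str
    codec: str

def timestamps_of(seg: dict) -> int:
    # The real system would parse the tfdt box; segments are
    # modelled as dicts here for illustration.
    return seg["decode_time"]

def rewrite_boxes(vod_seg: dict, decode_time: int) -> dict:
    # Placeholder for the real mp4 box rewrite: carry the live
    # stream's decode timestamp into the VOD segment.
    return {**vod_seg, "decode_time": decode_time}

def fulfil_segment(live_seg: dict, vod_seg, live_profile: Profile, vod_profile) -> dict:
    """Serve either the live segment or a timestamp-rewritten VOD segment.

    Falls back to the live media whenever no replacement is due or the
    VOD encode profile does not match exactly (misconfiguration guard)."""
    if vod_seg is None or vod_profile != live_profile:
        return live_seg
    return rewrite_boxes(vod_seg, timestamps_of(live_seg))
```

The guard-then-rewrite ordering matters: a profile mismatch degrades gracefully to the live feed rather than serving a segment the player cannot decode seamlessly.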

The segment rewriter needs to be performant, and thus it relies on the announcement of segment timestamps for replacement, and of the target contents, from the break watcher. This out-of-band process monitors for SCTE markers in the HLS stream.

As with HLS, caching is vital to ensure manipulated segments are requested as few times as possible.


Signalling and Encoding Challenges

The architecture appears simple, but we encountered a few road bumps along the way.

The SCTE insertion system uses SCTE-104, which is converted to SCTE-35 as part of the TS encoding. SCTE-104 only supports a duration accuracy of 100 ms. We found that a required duration of 34.56 seconds appeared with a duration of 34.5 seconds in the TS stream, and because this doesn’t align with a frame boundary at 25 fps (40 ms frames), the encoder would then shift the duration to 34.52 seconds. We needed accurate durations so as not to shorten or lengthen breaks by a segment. The solution was to analyse the pattern of breaks that would not be precisely signalled and apply a correction based on a lookup table. Fortunately, a modulus approach could be used, and only three out of five possible values needed correction.
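One way to realise such a lookup is sketched below. The exact rounding behaviour of the encoder is our assumption (truncation to 100 ms, then round-half-up to the nearest frame), but it reproduces the 34.56 s → 34.5 s → 34.52 s example above, and the pattern repeats every five segments because 5 × 1.92 s = 9.6 s is a whole multiple of both 100 ms and 40 ms:

```python
SEG_MS = 1920    # 1.92 s segment duration in milliseconds (25/50 fps case)
FRAME_MS = 40    # frame duration at 25 fps

def signalled(intended_ms: int) -> int:
    """Model of the lossy signalling path (assumed behaviour): SCTE-104
    truncates the duration to 100 ms, then the encoder moves it to the
    nearest frame boundary."""
    truncated = intended_ms // 100 * 100
    return (truncated + FRAME_MS // 2) // FRAME_MS * FRAME_MS

# Residue -> correction table; five residues cover every break length,
# and only three of them carry a non-zero correction.
CORRECTION = {}
for k in range(1, 6):
    intended = k * SEG_MS
    CORRECTION[signalled(intended) % (5 * SEG_MS)] = intended - signalled(intended)

def corrected(observed_ms: int) -> int:
    """Recover the intended break duration from the signalled one."""
    return observed_ms + CORRECTION[observed_ms % (5 * SEG_MS)]
```

Under this model, `corrected(34520)` returns 34560, restoring the intended 34.56-second break.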

The next major hurdle is that the SCTE-35 splice times will not naturally align with the segment cadence. By default, an encoder will condition the stream by inserting I-frames at the splice point and altering the segment cadence – typically extending or shortening the segments around the splice point.

However, if the encoder does this, we no longer have the fixed segment durations that the system is constrained to work with.

We requested a feature from AWS Elemental to suppress segment conditioning. This has been carried over to AWS Elemental MediaLive as the “SCTE_35_WITHOUT_SEGMENTATION” setting.

Now we have the problem that the timestamps in the SCTE markers don’t align with the splice points we have used. Where the markers are passed downstream their timestamps need to be updated with the timestamp of the segment boundary.

Output markers need to be re-written with the timestamps of the segment boundary on which the splice occurs.
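A minimal sketch of that rewrite, assuming 90 kHz MPEG ticks and assuming the splice is snapped to the *next* segment boundary (since a swap cannot begin mid-segment):

```python
SEG_TICKS = 172_800   # 1.92 s segment duration in 90 kHz MPEG ticks

def splice_boundary(splice_pts: int) -> int:
    """Snap a SCTE-35 splice time to the segment boundary where the
    replacement actually starts; markers passed downstream carry this
    adjusted timestamp instead of the original splice time."""
    return -(-splice_pts // SEG_TICKS) * SEG_TICKS   # ceiling division
```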



In manipulating manifests, it is important to ensure they remain specification compliant.

We discovered that once the DVR window starts rolling it is difficult to know whether you are in a break when processing the manifest. We use media sequence numbers to manage the state, and we needed to ensure that the advertised MEDIA-SEQUENCE tag matched the implied sequence number of the first segment.

We had issues when running multiple origins for resilience, in that the manifests could be out of step: subsequent manifest requests could appear to change history, e.g., breaks no longer being present. This occurred when there were issues in the bootstrap process that indexed all the content for a playlist, so we moved that to a single process whose consistent output is replicated to each origin.

As the DVR window grows, performance needs to be maintained. We optimise a lot at event startup to ensure fast lookups.

We also found that downstream systems needed both the ASCII and binary marker representations to be rewritten.

Impacts on VOD

As mentioned earlier, it is vital for Smooth and DASH that the VOD assets are encoded with identical parameters, such as frame rate, bit rate, codec profile and segment duration.

They need to have a length that is an exact multiple of the segment duration and be encrypted with the same key.

If any of the live stream aspects change, such as frame rate, then a duplicate set of assets is required.
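A conditioning check along these lines (the field names are hypothetical) can gate assets into the VOD archive before an event:

```python
from fractions import Fraction

def conditioned_for(asset: dict, live: dict) -> bool:
    """True if a VOD asset can be spliced into the live stream:
    identical encode parameters, the same encryption key, and a
    duration that is an exact multiple of the live segment duration."""
    for key in ("frame_rate", "bit_rate", "codec_profile",
                "segment_duration", "drm_key_id"):
        if asset[key] != live[key]:
            return False
    ratio = Fraction(asset["duration"]) / Fraction(live["segment_duration"])
    return ratio.denominator == 1
```

Exact rational arithmetic avoids the floating-point noise that would otherwise make a 9.6 s asset look like a non-integer number of 1.92 s segments.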

Stakeholder Buy-in

Before embarking on the technology implementation we ensured there was stakeholder buy-in to the limitations, such as:

  • The solution can only be segment accurate, not frame accurate.
  • Break lengths have to be a multiple of the segment duration – not arbitrary.
  • Assets have to be a multiple of the segment duration and prepared in advance.
  • Only integer frame rates such as 25/50 fps and 30/60 fps can be supported.

Break structure is baked in at event startup. Due to the optimisations that run on startup, it is not possible to reorganise the breaks during an event. Note, we did implement an emergency asset pull for compliance reasons – but we simply replace the asset with the exact duration of filler media.

Inserted assets only have a single language. This is then replicated into any alternate language tracks. In theory, we could implement multi-track audio assets, however the mapping and validation becomes significantly more complicated.

Best Practice

It’s important to highlight some of the best practices we learnt along the way.

  • Audit the SCTE markers in the inbound transport stream. There will be queries, ad breaks may not fire, and having a record of what was on the input is highly valuable.
  • Even if not required for revenue management, providing an as-run report for each event will aid compliance activities. It also serves as a first source of query handling – such as did asset x playout in region y.
  • Keep a copy of the manifests at event end; they are vital for issue diagnosis.
  • For HLS and DASH the temporal nature of the manifests makes it hard to diagnose transient issues; build a manifest harvester and validation tool, e.g., check that media sequence numbers roll correctly, that there are no spurious discontinuity tags, and that ad breaks do not flash in and out from one request to the next.
  • Cache, cache, cache! This is computationally expensive; reduce the number of times you calculate the same thing, and monitor response time and server load to ensure you have the horsepower required to meet SLAs.
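As a flavour of what such a validation tool checks, the sketch below tests one property between successive manifest fetches: the media sequence number must advance by exactly the number of segments that dropped off the front of the DVR window, and the overlapping segments must be identical. (Manifests are modelled as plain dicts; a real tool would parse m3u8 and also check discontinuity tags and break flapping.)

```python
def sequence_consistent(prev: dict, curr: dict) -> bool:
    """Each manifest is modelled as {"media_sequence": int, "segments": [uri, ...]}.

    Returns False if the advertised MEDIA-SEQUENCE jump disagrees with
    the segment list, or if a previously served segment has changed."""
    advanced = curr["media_sequence"] - prev["media_sequence"]
    if advanced < 0 or advanced > len(prev["segments"]):
        return False
    # Segments still shared by both manifests must be identical:
    overlap = prev["segments"][advanced:]
    return curr["segments"][:len(overlap)] == overlap
```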

Remaining Challenges

There are some challenges that remain. For example, it is difficult to experiment with different stream settings as the VOD needs to perfectly align.

Origins, too, have become very large. To run beyond the current 8-11 regional variations will require sharding region sets across different sets of origins. This will involve some form of path-based routing at the cache layer.

To support B2B syndication it will be necessary to re-construct Transport Streams from ABR output – not an insignificant task.

Changes in ABR protocols, such as low latency, will require re-engineering.

Future Evolution

Looking to the future, there are new capabilities in encoders that allow switching between live and VOD assets. We are exploring moving the splicing from the packager domain to the encoder domain. This will benefit us by removing the segment constraint, thus:

  • allowing frame accuracy
  • allowing greater frame rate support, including fractional frame rates such as 29.97 fps
  • allowing easier delivery of transport stream or other outputs for B2B syndication

In addition, the load becomes more predictable, as it is not based on user demand and cache efficiency, but based solely on the number of regions.

There are some remaining challenges:

  • Should we use in-band encoder features to trigger playlist insertion, or react to markers with out-of-band logic and API triggering?
  • How do we suppress or forward markers for downstream opportunities?
  • Is additional metadata required in the playlists or in-stream?
  • The approach increases encoder and packager costs.

We are also closely following the SCTE standards work and the adoption of time_signal() as a replacement for splice_insert(), as this allows different placement-opportunity types to be expressed, which different aspects of the workflow can act upon. For example, it supports both provider and distributor advertisement and placement opportunities.

As you can imagine, the above journey has afforded us essential learnings that we are incorporating into the future evolution of the product. Nigel Harniman’s Demuxed talk is now available to view via YouTube. And if you would like to learn more about our work on server-side Dynamic Content Insertion, please contact us on