Live video streaming with Amazon CloudFront and Peacock

Thank you for joining us. This is session NET328, Live video streaming with Amazon CloudFront and Peacock. My name is Tal Shalom. I'm a principal product manager for Amazon CloudFront.

With me on stage today, I'm excited to introduce Simon Rice, Senior Vice President of Solution Architecture at NBCUniversal. Simon will introduce himself in the second part of the session and will share with us how he and his team architected the Peacock platform for live streaming delivery at scale.

We have a packed agenda for you in this session. First, we're going to focus on the challenges with delivering a live event at scale and the factors that impact the quality of experience for your viewers. Then Simon will share the Peacock platform architecture and cover areas of innovation, including monetization and delivery optimization. Finally, I will share best practices for delivering live events at scale with Amazon CloudFront. Let's get going.

When sports fans watch a game, hear the host say "we are live here at the arena," and see the live icon on their player or application, they expect an experience as close as possible to being at the stadium or arena in person. That means the lowest latency behind live, the highest quality possible, and no interruptions throughout the event.

This is true for all types of live streaming events, like the Olympics or the Super Bowl. For many years, those events were broadcast over the air or delivered via satellite or your cable provider to your TV set. In the last 10 years, content providers started to deliver those live events online, over the internet, to multiple devices. This is called over the top, or OTT; that's the term coined for this kind of delivery. But in the early days there was a difference in latency, and it created a spoiler effect: you're watching a game and suddenly you hear your neighbor cheering, and you're not sure what's going on. Is it a birthday or something? Then 30 seconds later you see the touchdown and think, oh, that's what it was. This reduces the quality of experience for your viewers.

Nowadays, technologies have improved and latency is much lower. We're talking about a few seconds behind live, sometimes even faster than broadcast. For both broadcast and OTT, the event starts at the venue, at the arena, where the video is captured by the camera. But the processing, production, and delivery differ between the two transport paths. The main difference in the outcome is the experience: with broadcast there is a single stream sent to millions of viewers, and it's the same experience for everyone, while over the top you have a personalized experience.

So each stream goes to an individual viewer. That means you can watch on your device of choice, on the go, or on your gaming console. This flexibility adds a lot of challenges when you need to manage all those streams and maintain the highest quality possible for your viewers.

Let's take a moment to understand the factors that impact quality of experience. We mentioned viewers expect the lowest latency behind live, the highest video quality, and the highest audio fidelity. Whether it's HD, UHD, 4K, or 8K, whatever their device or network is capable of, they want the highest quality possible and no interruptions throughout the entire event.

Our customers measure this quality of experience with different telemetry from the client side. They look at video start failures, meaning a viewer clicks play and the video doesn't start. They look at interruptions and video stops during the event, which is video playback failure. We keep it short: VST and VPF. Then there is any image distortion from jitter or loss on the network, which can cause quality degradation and looks a little fuzzy on a big screen, and any audio/video out of sync. All of this reduces the quality of experience for viewers and at times causes them to just change the channel or move to a different program.

So what are some of the reasons for this? One is how you set up your stream: for example, what bitrate per resolution you want to deliver and how you control that. It should match the throughput that your viewer, their device, and their network can consume. If it's not matching, the stream will either drop in resolution or cause errors on the network. Those errors can cause the buffer on the client side to underrun, lose video segments, and create interruptions in the video stream.

Most of our customers tell us that they don't always have control over the client, but they have more control on the network side. When you're selecting a content delivery network, that choice can make a major difference in the quality of experience you deliver to your viewers.

The AWS network is built on fully redundant 400 GbE parallel network links with private capacity between our regions. The Amazon CloudFront global edge locations connect to that backbone with over 600 points of presence around the world. This network extends into internet service providers with over 500 embedded POPs. Those embedded POPs are a caching tier deployed inside the internet service provider, allowing lower latency by serving closer to viewers. Simon and I will talk more about that and its benefits shortly. We're constantly expanding our network to new locations where our customers see more demand from their clients.

On top of that network, we have a comprehensive set of services and capabilities that our customers use to build different media workflows for different use cases. For example, during a sports league season you have multiple games, some overlapping at the same time, and you need to manage that capacity. Other events, such as esports, could be 3-to-5-minute events, but you'll have a spike of audience arriving in a matter of seconds, and we're talking millions of viewers coming in within seconds. So the characteristics are different, but it's still a live event.

Our customers use AWS Elemental Live encoders at the venue to send the live feed up to the AWS cloud, and then use AWS Elemental MediaLive and AWS Elemental MediaPackage to process, package, and prepare those streams. We'll talk about how we connect those to CloudFront; CloudFront is used to deliver those segments to millions of viewers at low latency.

Some customers use edge compute to customize their delivery or to shift functionality from clients to the edge. For example, if you want to serve different content based on location, you can take some query strings and use them to steer the request to different locations, as in the sketch below.
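
As an illustration of that pattern, here is a minimal sketch of a Lambda@Edge origin-request handler in Python that reads a hypothetical `region` query string and re-points the request at a different origin. The hostnames and the query parameter name are assumptions for illustration, not a real configuration.

```python
# Minimal sketch of a Lambda@Edge origin-request handler (Python runtime).
# It reads a hypothetical "region" query string and re-points the request
# at a different custom origin. Hostnames below are placeholders.
from urllib.parse import parse_qs

# Hypothetical mapping of region hint -> origin hostname
ORIGIN_BY_REGION = {
    "eu": "origin-eu.example.com",
    "us": "origin-us.example.com",
}
DEFAULT_ORIGIN = "origin-us.example.com"

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    params = parse_qs(request.get("querystring", ""))
    region = params.get("region", [""])[0]
    target = ORIGIN_BY_REGION.get(region, DEFAULT_ORIGIN)

    # Point the custom origin (and the Host header) at the selected backend.
    request["origin"] = {
        "custom": {
            "domainName": target,
            "port": 443,
            "protocol": "https",
            "path": "",
            "sslProtocols": ["TLSv1.2"],
            "readTimeout": 30,
            "keepaliveTimeout": 5,
            "customHeaders": {},
        }
    }
    request["headers"]["host"] = [{"key": "Host", "value": target}]
    return request
```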

Customers also use security features to secure their origin with access control, so viewers, or anyone else, can't bypass CloudFront and access the origin directly. They also use AWS WAF (web application firewall) at the front to push the attack surface all the way out to the edge and protect against unauthorized users, with mitigation at that level and different rules for access lists, whether allow lists or block lists, at the edge.

The idea here is that these services run on the same backbone, communicating in milliseconds between them, with delivery through CloudFront. We look at this as a holistic end-to-end solution. On top of that, you have observability tools so you can see glass to glass, from the encoder at the venue all the way to delivery. Later, we'll talk about how we combine client-side metrics with CDN metrics, so we have a full end-to-end view and can correlate what's happening at the client side with what's happening at the CDN side.

Now I'll hand it over to Simon to share how Peacock uses those services to deliver live events successfully. Thank you, Tal.

Any Peacock subscribers here today? Quite a few of you. Thankfully, there are quite a few more of you out there as well, but thank you very much. And did anyone subscribe to Peacock purely for sports? Was that what dragged you in? Quite a few people? Yeah, it's a common story. So I'm hopefully going to enlighten you on how we serve live sports on Peacock today.

By way of introduction, I'm Simon Rice. I head up the solution architecture function for Peacock, and my team is effectively responsible for the solution design of the front-end applications and the back-end services that support those applications, all the way through design, delivery, and then running and operating those services. We call ourselves the Global Streaming Technology team. Why the Global Streaming Technology team and not just Peacock? Well, we have a number of businesses around the world. We serve Peacock here domestically in the US, obviously, as well as other NBC properties like Vudu and Fandango movie tickets. In Europe, we have a service called SkyShowtime, which is live with ViacomCBS across 22 countries. And we are launching a new service called Showmax across 44 African countries in a joint venture with MultiChoice.

Really, our plan with those businesses is to allow them to expand internationally with ease and to engage new, large, diverse audiences with premium content on any device, at any time, anywhere. Our mission across all of those brands and global businesses is, first of all, to allow effortless discovery, to quickly guide customers to the content that they love; to allow fandoms, where you can more deeply and authentically engage with the brands, teams, and content you're attracted to; to provide consistently fresh content, so you can always find something new and interesting on the platforms; and to deliver delightful experiences, with beautiful interfaces, intuitive design, easy ways to interact, and real moments of delight.

So how do we do that? We provide what we refer to as the global streaming platform, and at a high level we're going to peel back the onion of this platform. We provide a number of services. There are services to prepare, encrypt, manage, and schedule video assets ready for playback. There are authentication, authorization, and commerce services to allow customers to select from a variety of subscription options and then get to the content that's right for them. We have content discovery and personalization to allow users to search for content and to allow us to recommend content we think users will enjoy. And we have adtech services that provide advertising opportunities as well as shoppable moments and commerce opportunities within the environment. All of that is underpinned by data and analytics capabilities and cloud infrastructure that both scales up and down automatically and allows us to scale manually in anticipation of large-scale events. When you have a live sporting event, each part of that platform is going to be utilized and will see different types of load at certain times of the event. So what we do is create per-minute breakdowns by what we call customer journeys, so that based on broadcast figures, past events, and future predictions, we can predict down to the minute how many sign-ins, how many sign-ups, and how many video start requests we're going to have over the course of the event, and how users will transition from the event into other types of content afterwards. That really gives us an idea of how this platform is going to be impacted by various types of live sporting events.

So this isn't our first rodeo. It's also not our first Super Bowl, nor our first Olympics.

We've already hosted a number of very large-scale events. We had a weekend last year where we hosted both the Winter Olympics and the Super Bowl in a single weekend. We've hosted both the men's and the women's FIFA World Cups.

But we're not just focused on big tentpole sporting events. We also provide premium live content every day. We had a weekend recently where we hosted around 50 live events over the course of the weekend. During that time, we had up to seven simultaneous live streams.

So really, live streaming is part of our DNA and what we do. From a technology point of view, that means we have to provide a platform that delivers performance, security, and reliability, all of that at pretty significant scale, as we grow across multiple global businesses.

When you look at live sports specifically, there are a few things that are really top of mind for us. One, from an application point of view, we want to make sure that it performs well and that application load times are some of the best in the market. Then we want to make sure the platform is reliable and drive down video playback failures to the point where they're practically nonexistent.

We're then doing a dance where we want to balance the lowest latency we can possibly get. As Tal said, you don't need your buddies texting you on WhatsApp, celebrating a goal 30 seconds before you see it on your screen. So we're aiming for the lowest latency we can possibly get to, but at the same time, the highest quality we can possibly get to.

So we're hosting games in 4K ultra high definition. We're using HDR color spaces. We're running at 60 frames per second. We really want it to be an immersive sporting experience that's as good as if not better than broadcast.

Then we're also looking at minimizing buffering to a degree where it's nonexistent or not noticeable. And last but not least, we want to gather as much telemetry data from the client devices as we can, so that if we do see issues or problems, we can deal with them in real time, hopefully without the customer even needing to restart their device.

When you look at the difference, as Tal said, between broadcast and live over-the-top streaming, there are some things in common and some unique challenges in the OTT space. In common, we still have to encode, package, and push the video through a distribution pipeline. On the broadcast side, that then runs down a fiber optic cable to a customer's home and appears on their TV.

In the OTT world, our strategy to mirror that is to go through CDNs that are much, much closer to the customer, and then through their internet service provider to whatever device they're using at the time. They may move between CDNs, they may move between ISPs, they may be on the train watching a soccer game for all we know.

So we have to cater for that. And obviously there's no guarantee that those CDNs or those ISPs are going to be used solely for Peacock. People are going to be watching Netflix on them, watching YouTube, taking those dreaded Zoom calls from home.

So we're going to see contention in the CDNs, we could see contention in the ISPs, and that creates a bit of unpredictability that we need to plan around. What it means is that we could have tens of millions of concurrent viewers, and when you do the math on that, we're talking about hundreds of terabits per second of throughput that we need.

So we're in a unique situation where we not only need to consider multi-CDN for failover and reliability, but we also need multi-CDN purely for throughput. There's just a ton of data pushing through these pipes.

So how do we choose a CDN provider, and how do we decide which ones to use and where? We look for a number of things. The first and most obvious one is proof of capacity. We need to prove that they have the capacity we need, we have to be able to reserve that capacity, and we have to be able to guarantee headroom within that capacity. That's probably priority number one. We then look at the geographical reach of each of those CDN providers.

Not all CDN providers are created equal, and not all have the same coverage. If you know the telecoms world, we go right down to the ASN level, almost to the zip code, to look at coverage across our locations and make sure we have a very good idea of which CDNs map to which ISPs map to which ASNs. We create this geographic picture of where we're going to use the different CDNs within our portfolio.

Next up is ease of API integration, particularly if CDNs offer things like edge compute or origin shields to protect us. And last but not least are commercial and financial considerations. Obviously there are cost considerations, and those may dictate how much traffic we send to each CDN. Through all of those, we build up a plan that covers things like distribution of traffic, which locations will use which CDN, and which CDN we would fail over to if we saw issues or congestion. That builds our business case.

And then we end up with this CDN decision solution that gathers a whole bunch of video telemetry from the client devices. We look at video start and stop events. We look at whether we're scaling the bitrates of the streams up or down. We look at things like ad insertion events and whether we're buffering, and we pull all of that into a back-end system that I'm going to talk about in a minute.

But we combine that with things like reliability statistics from the CDNs; we have historical data about CDN reliability. All of that allows us to create manifest files that we pass to the clients at playback time, and those manifest files tell the clients, at that point in time and for the location they're in, which is the preferred CDN that's going to give them the best stability, the best throughput, and the best reliability for their live streaming.
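
To make the idea concrete, here is a minimal, illustrative sketch, not Peacock's actual implementation, of ranking candidate CDNs from recent client telemetry and picking the one to advertise to a player. The metric names, weights, and thresholds are assumptions.

```python
# Illustrative only: rank candidate CDNs for a viewer's location from
# recent client telemetry and pick the one to advertise in the manifest.
# Metric names, weights, and numbers are assumptions, not Peacock's values.
from dataclasses import dataclass

@dataclass
class CdnStats:
    name: str
    rebuffer_ratio: float          # fraction of playback time spent rebuffering
    video_failures: float          # failures per 1,000 playback attempts
    median_throughput_mbps: float

def pick_preferred_cdn(stats_for_location: list[CdnStats]) -> str:
    """Lower score is better: penalize rebuffering and failures, reward throughput."""
    def score(s: CdnStats) -> float:
        return (s.rebuffer_ratio * 100.0) + (s.video_failures * 2.0) - (s.median_throughput_mbps * 0.1)
    return min(stats_for_location, key=score).name

# Example: telemetry aggregated for one ASN over the last minute
recent = [
    CdnStats("cdn-a", rebuffer_ratio=0.004, video_failures=0.3, median_throughput_mbps=42.0),
    CdnStats("cdn-b", rebuffer_ratio=0.012, video_failures=1.1, median_throughput_mbps=55.0),
]
print(pick_preferred_cdn(recent))  # -> "cdn-a" with these illustrative numbers
```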

So obviously this CDN decision solution is mission-critical to us, and anything that's mission-critical to us means five nines of availability, which really means active-active multi-region in our scenario. So we run this across at least two AWS regions, and it is active-active. As the clients, and when I say clients, I mean playback devices, send telemetry data in, they send it to one cloud region and can easily fail over to a second in the unlikely event that the primary cloud region fails.

Depending on where you are in the country, you'll be sending to a specific cloud region. We then load that telemetry data into high-performance Kafka queues for ingest, and any mission-critical telemetry is replicated in real time between the East and West AWS regions here in the US.

We then put that telemetry data through a Databricks ETL, again a high-performance, event-driven ETL, and that data is then loaded into an Apache Druid database. The reason we're using Apache Druid is that it is very high performance, it's in-memory, and it allows us to make CDN decisions in milliseconds.

So if we're in a scenario where a group of customers in a particular location are seeing buffering challenges or video failures, within milliseconds we'll know about it. We can query that Druid database, find the cause of the problem, and automatically switch them to a different CDN, normally without them even having to restart their stream.
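
As a rough sketch of what such a lookup might look like, not Peacock's actual queries, Druid exposes a SQL endpoint over HTTP. The broker URL, table name, and column names below are hypothetical.

```python
# Rough sketch: query an Apache Druid SQL endpoint for per-CDN failure rates
# in one ASN over the last minute. The broker URL, table name, and column
# names are hypothetical.
import requests

DRUID_SQL_URL = "http://druid-broker.internal:8082/druid/v2/sql"

query = """
SELECT cdn,
       SUM(video_play_failures) * 1.0 / SUM(playback_attempts) AS failure_rate
FROM player_telemetry
WHERE asn = 7922
  AND __time >= CURRENT_TIMESTAMP - INTERVAL '1' MINUTE
GROUP BY cdn
ORDER BY failure_rate ASC
"""

resp = requests.post(DRUID_SQL_URL, json={"query": query}, timeout=0.5)
resp.raise_for_status()
for row in resp.json():
    print(row["cdn"], row["failure_rate"])
```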

So it really allows us to be very, very responsive in switching customers between CDNs to provide the best performance possible. Then, in the back end, we take all of that telemetry data and summarize it in a data warehouse, so that both our technical and business leadership can make informed decisions about future events, ensure we are covered, and do better iteratively as we continuously run these events.

Now, obviously, none of this is possible without testing, and testing is incredibly important to us; we're very intentional about it. We take a deliberate approach: we start by looking at specific components, specific microservices, running them at huge scale. Then we look at the integration points between those microservices. And finally, we run the entire end-to-end pipeline with customer journeys at massive scale, at peak, to ensure these things run as we would expect.

And we have a very rigorous chaos engineering philosophy as well, so we test failure scenarios at scale. Last but definitely not least, we partner with AWS. We make use of the Infrastructure Event Management service, so we know that the infrastructure and the services will be there when we need them, in every aspect of AWS.

And we make use of the Media Event Management services that Tal is going to talk about later, to look at the video-specific services and ensure they are available and will scale to the degree that we need.

So as well as the actual sports streams themselves, we also want to provide monetization opportunities and advertising within those streams. But we want to do that in a very personalized, innovative way.

When you look at some of the innovations we've released over the last few years, we've got things like Pod Bounce, where you can have high-impact, brand-level awareness within sporting events for specific brands that we think are relevant.

We've also got Highlight Ads. Imagine you're a fan of the Real Housewives, which I imagine everybody is; you're watching it and you desperately want to buy that Gucci handbag you saw that's really crying out to you.

We can actually create highlights around the TV show to pull back and create a commerce experience, so you can jump in and buy items, or see more about items, that you saw within the show. So we're looking at shoppable moments within our streams.

And then, last but not least, we've got some super-innovative capabilities like In-Scene Ads, where we can actually swap out a Coke can for a Pepsi can, or find billboards in the back of shots and replace what's on those billboards, to advertise and create product placement within episodic content as well.

So these are really innovative new advertising and commerce capabilities.

But how do we deliver them, from a technical point of view, at scale? All of this is enabled via a single flexible architecture that we refer to as the ad transcode pipeline. Effectively, we take advertising mezzanine files from FreeWheel, a Comcast company that effectively acts as an ad marketplace. We then use a pipeline to trigger AWS Elemental MediaConvert, with the right video parameters, to encode the ad segments.

Those ad segments then get put into an S3 bucket and cached on our CDN, and we have a high-performance database where we index those ad segments and keep a record of them. Now, when the client applications, which run the custom player we've developed, start a video stream, they make a call out to what we call a video ads module. The video ads module determines whether the customer should see ads; we have a tier where customers see no ads at all. If a customer should see ads, what is their personalization profile, and what type of ads should we serve them?

So by then we know whether a customer should see ads, what type of ads we should serve them, and where the ads are, and we have manifest files describing how to serve those ads. So when the player, if you're familiar with video, hits an ad marker, what we call a SCTE-35 marker, we trigger AWS Elemental MediaTailor, which inserts those ad segments from the server side. It pulls them from the CDN; if they're not on the CDN, it pulls them from the S3 origins. It takes the ads and inserts them into the video stream. That's true of live streaming, and it's true of video on demand.
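
For context on how server-side ad insertion is typically wired up, here is a sketch, not Peacock's configuration, of registering an AWS Elemental MediaTailor playback configuration with boto3. The names and URLs are placeholders.

```python
# Sketch: register a MediaTailor playback configuration that stitches ads
# into a stream at SCTE-35 markers. Names and URLs are placeholders, not
# Peacock's configuration.
import boto3

mediatailor = boto3.client("mediatailor", region_name="us-east-1")

response = mediatailor.put_playback_configuration(
    Name="live-sports-ssai-example",
    # Ad decision server (ADS) that returns the personalized ad pods.
    AdDecisionServerUrl="https://ads.example.com/vast?session=[session.id]",
    # Origin serving the content manifests (e.g. a MediaPackage endpoint).
    VideoContentSourceUrl="https://origin.example.com/out/v1/channel-1",
    # Slate shown if an ad pod cannot be filled.
    SlateAdUrl="https://assets.example.com/slate.mp4",
    CdnConfiguration={
        # Serve stitched ad segments through the CDN instead of S3 directly.
        "AdSegmentUrlPrefix": "https://dxxxxxxxx.cloudfront.net/ads",
    },
)
print(response["PlaybackConfigurationArn"])
```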

So really, we use that same framework whether it's product placement within a scene, a Highlight Ad that frames the video, or traditional advertising in an ad break; we use the same flexible, cloud-based architecture to deliver all of it.

Now, I remember a while ago, I believe it was Andy Jassy who coined the phrase, "there is no compression algorithm for experience," and we feel the same way about live sports. We've been doing live sports for a very long time now; we were one of the first streaming services in the world to offer it, and we've delivered events at a scale that's never been seen before in the industry. I think what we've learned along the way is that you really need strategic partners to work with to plan and deliver these events.

So bringing in partners like AWS very early, ensuring that you have the infrastructure, ensuring that you can scale, and actually testing and kicking the tires on the architecture is so, so important. Most importantly, though, no great event happens without a great team behind it. You really need people you can rely on, who know how to do this, who have done it time and time again and will deliver on it time and time again, both internally and within your key partners. We're so proud of the strength we've built within live event streaming, and we're really, really excited to bring you new events.

We've got the 2024 Paris Summer Olympics. We've got exclusive NFL Wild Card games coming up that, for the first time ever, will be exclusive to streaming services and can't be viewed anywhere else. So we really feel like this is just the tip of the iceberg for live sports streaming, and I'm super excited to see where we can take it from here. But I'm now going to hand back over to Tal to talk more about how we deliver this specifically with CloudFront.

Thank you, Simon. It's great to see what you and the team at Peacock have built so far to meet your customers' demand. I can't wait to see what comes next. Let's dive deeper now and talk about tuning considerations for high quality of experience. We'll talk about caching and its configuration, how caching can impact your latency and your scale. We'll talk about resiliency, the resilient delivery of your content from the origin, and how to protect the origin from high spikes.

And we'll talk about how to tune for low latency, which has to do with segment length and the balance between segment length and the quality of experience we're trying to achieve. We'll talk about end-to-end metrics and the correlation between the client side and the CDN side. We'll see how you can focus on a certain area, an internet service provider or an ASN (autonomous system), and see what the experience of your viewers is in that specific region.

And finally, I'll talk about our Media Event Management team, who can help you deliver and drive through a full end-to-end event lifecycle. The growing number of live events delivered over the top drives up the peak-to-average capacity demand. This creates load on the peering points between internet service providers and their connections to CloudFront or other CDNs.

To that point, CloudFront has a multi-tier caching approach, and one of the tiers we added inside the internet service providers is our embedded POPs. This caching tier is deployed in the service provider's network, closer to viewers, and reduces the load on the transit or peering connection between those internet service providers and CloudFront.

Those internet service providers do not need to scale up their peering links when they have live events; they can absorb those peaks without spending on upgrades beyond what their average delivery requires. With over 500 embedded POPs, CloudFront can deliver in many more areas, and we allow you, based on a DNS request or by integrating an SDK into your platform, to route your traffic to a specific embedded POP within our network.

Our customers tell us that they see an improvement in first-byte latency and last-byte latency for streams that go through those embedded POPs. Actually, Simon, can you share with us how Peacock is leveraging our embedded POPs?

Yeah, absolutely. So there are multiple ways to leverage embedded POPs. You've probably heard of last-mile challenges, last-mile congestion, last-mile latency. That's typically between the CDN and the customer's home, within the ISP's network. And that's a real problem, especially because, as I mentioned earlier, we're not only running infrastructure within the United States, we're also building a streaming service in Africa.

We've got Eastern European countries where the infrastructure may be less reliable, and that congestion between the CDN and the customer can be a real challenge that really impacts the customer experience. So consider, as Tal mentioned earlier and as I went into, that our players are constantly generating events around the telemetry of the video.

We work with four-second segments in most cases. So every four seconds, the video player feeds back to the back end: I'm running at this bitrate, I'm seeing this much buffering, I'm seeing this much throughput, and so on, sending all of that telemetry data back.

And at the same time, now that we're working with CloudFront embedded POPs, we're actually evaluating: are there any benefits to moving a customer from a traditional point of presence to an embedded POP, and if so, what might that benefit be? If we detect that there is a benefit in doing so, it's going to provide a better, more consistent customer experience with lower latency.

We simply issue an HTTP 302 redirect and send the clients from the traditional CloudFront POP to the embedded POP, typically without even needing to restart the player and without even a buffering event; the customer seamlessly moves over to that embedded POP. And the great thing is, when you look at the telemetry being gathered, embedded POPs have a different tagging mechanism, so they appear differently within our logging.
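
As a generic illustration of that redirect pattern, and not the actual Peacock or CloudFront embedded-POP mechanism, an edge function can answer a viewer request with a 302 pointing at an alternate delivery hostname. Here is a minimal Python sketch; the hostname and the `steer` query flag are purely hypothetical.

```python
# Generic illustration: a Lambda@Edge viewer-request handler that answers
# with an HTTP 302 pointing the player at an alternate delivery hostname.
# The hostname and the "steer" query flag are hypothetical.
def handler(event, context):
    request = event["Records"][0]["cf"]["request"]

    # Only steer requests that were flagged (by logic elsewhere) to move.
    if "steer=embedded" not in request.get("querystring", ""):
        return request  # pass through on the regular POP/origin path

    redirect_url = f"https://embedded-pop.example.com{request['uri']}"
    return {
        "status": "302",
        "statusDescription": "Found",
        "headers": {
            "location": [{"key": "Location", "value": redirect_url}],
            "cache-control": [{"key": "Cache-Control", "value": "no-store"}],
        },
    }
```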

So we can very clearly see which customers and which transactions were moved over from traditional POPs to embedded POPs, what the degree of benefit was from moving, and in which locations. Therefore, we can plan and provide feedback to AWS about where we're seeing tremendous benefits from this type of strategy and where we would like more assistance in the future.

So it really provides us with a great degree of flexibility and a better customer experience, and the visibility and observability we have within that framework is tremendous for us.

Thanks, Simon. We're looking forward to continuing to expand into new regions. For the audience, as Simon mentioned, there are three ways to utilize those embedded POPs. One, you don't need to do anything: we'll route some of your traffic through those embedded POPs. Or you can use redirects on a DNS basis.

That includes an SDK option: you can use the SDK in your platform to send the proper URL. So we give you those controls, and if you just want to use it as-is, it will still work for you.

Now, the other two tiers of caching that we spoke about are within our 600-plus points of presence, which we call edge locations, and the regional edge caches, which aggregate requests from multiple points of presence within a region before the request goes to the origin. That improves the caching layer.

Now, in terms of caching, there are two areas we need to take care of: how we maintain the scale for millions of requests, and how we still maintain consistency while segments are being generated as the live video continues and the manifest is being refreshed.

So let's first talk about the connection and the caching settings between CloudFront and the client itself. There are three elements of caching we want to address: the manifest, the segments, and negative caching, which is the caching of errors. We recommend a different caching configuration for each of those elements. The manifest is a file that holds the indexes of the segments, and in live streaming it refreshes consistently and frequently. The segments can have different caching.

Basically, once the video has played past a segment, you don't need it anymore, unless your player has the option to go back and watch the program or the live show in retrospect. So you might want to keep segments for 24 hours or more, depending on your application needs. But for the manifest, we need to look at what the baseline for the caching should be.

The baseline is the segment length. Simon mentioned four seconds before, so I'll take a four-second segment as the example. Typically, clients will store two to three segments in their buffer in order to start the stream, and you want to maintain a continuous flow of those segments into the client so the buffer stays full at all times.

Now, I mentioned that the manifest holds the indexes of the new segments that are coming, so you need to refresh that manifest to know what's coming next. But you don't want to send all those requests back to the origin all the time, and you have enough time to put some caching in place so you can serve at scale from all those points of presence.

So the idea is to set the manifest TTL to about half your segment length, no more than half and not much less. That is long enough to use the scale of CloudFront, but gives enough time for recovery if there is any issue in receiving the segment.
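
As a sketch of that guidance with the four-second-segment example, and with illustrative values rather than a prescriptive configuration, you could express it as two CloudFront cache policies: one for manifests with a roughly two-second TTL, and one for segments with a long TTL.

```python
# Sketch: two CloudFront cache policies for the four-second-segment example.
# Manifests get a ~2 s TTL (about half the segment length); segments get a
# long TTL (24 h here for catch-up/rewind). Values are illustrative.
import boto3

cloudfront = boto3.client("cloudfront")

manifest_policy = cloudfront.create_cache_policy(
    CachePolicyConfig={
        "Name": "live-manifest-example",
        "MinTTL": 2,
        "DefaultTTL": 2,
        "MaxTTL": 2,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "HeadersConfig": {"HeaderBehavior": "none"},
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": {"QueryStringBehavior": "none"},
        },
    }
)

segment_policy = cloudfront.create_cache_policy(
    CachePolicyConfig={
        "Name": "live-segment-example",
        "MinTTL": 1,
        "DefaultTTL": 86400,
        "MaxTTL": 86400,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": False,
            "HeadersConfig": {"HeaderBehavior": "none"},
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": {"QueryStringBehavior": "none"},
        },
    }
)
print(manifest_policy["CachePolicy"]["Id"], segment_policy["CachePolicy"]["Id"])
```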

Now, negative caching. Let's say we have a problem in a certain region. You can get a spike of requests because of that problem, and that can also overload the origin. If you don't have any negative caching, it can overload the origin and actually create an even worse effect on the other viewers who are still getting the stream.

So you want to set up, say, one second of caching for your negative caching. Viewers who are getting the errors will consistently get that same error from cache, but then have enough time to recover once they make the request after that second. That gives you about three seconds on your timeline, which is still under one segment of four seconds.
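
In CloudFront, that error caching maps to the custom error response settings on the distribution. A sketch of just that fragment, assuming a one-second error-caching TTL for origin error codes, looks like this:

```python
# Sketch: the CustomErrorResponses fragment of a CloudFront DistributionConfig
# that caches origin errors for one second ("negative caching"). This would be
# applied as part of a full distribution config update, omitted here.
custom_error_responses = {
    "Quantity": 2,
    "Items": [
        {"ErrorCode": 503, "ErrorCachingMinTTL": 1},  # serve the cached error for 1 s
        {"ErrorCode": 504, "ErrorCachingMinTTL": 1},
    ],
}
```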

So that sorts out your caching configuration toward the client. Now let's look between CloudFront and the origin, where there are other aspects to consider.

First of all, you want high availability, so you can fail over in case of an error. The recommendation is to have at least one more origin, typically in a different region, so you can fail over. Then you want to set up your origin failover group policy in CloudFront to fail over on errors between the primary and the secondary.

But how do you fail over, and how long do you wait before failing over? CloudFront allows you to tune all of those settings so that you wait long enough, but not too long, before trying to fail over.

If we look at all of that, you want to wait about two seconds until you receive a segment or the manifest back from your origin. That sets a maximum of four seconds, which is still a single segment.

How many connection attempts should you allow? If you try twice on the same origin, you double that to four seconds, because each wait is two seconds, and now you're beyond that single-segment budget of four seconds. So you want a single attempt and then fail over to your secondary origin. That keeps you to around four seconds, and even if it stretches to five or six seconds, it's still enough time to recover when you flip over to your backup, your secondary origin. A sketch of those settings follows below.
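
Put together, those numbers map roughly onto CloudFront's per-origin settings and an origin group. The following is a sketch with placeholder IDs, domain names, and timeouts chosen to match the four-second segment example, not an exact prescription.

```python
# Sketch: origin and origin-group fragments of a CloudFront DistributionConfig
# for fast failover between two origins in different regions. IDs, domains,
# and exact timeouts are illustrative.
origins = {
    "Quantity": 2,
    "Items": [
        {
            "Id": "primary-origin",
            "DomainName": "live-origin-us-east-1.example.com",
            "ConnectionAttempts": 1,     # single attempt, no retry on the primary
            "ConnectionTimeout": 2,      # wait ~2 s before giving up on the connection
            "CustomOriginConfig": {
                "HTTPPort": 80,
                "HTTPSPort": 443,
                "OriginProtocolPolicy": "https-only",
                "OriginReadTimeout": 4,  # cap the wait for a segment/manifest response
                "OriginKeepaliveTimeout": 5,
            },
        },
        {
            "Id": "secondary-origin",
            "DomainName": "live-origin-us-west-2.example.com",
            "ConnectionAttempts": 1,
            "ConnectionTimeout": 2,
            "CustomOriginConfig": {
                "HTTPPort": 80,
                "HTTPSPort": 443,
                "OriginProtocolPolicy": "https-only",
                "OriginReadTimeout": 4,
                "OriginKeepaliveTimeout": 5,
            },
        },
    ],
}

origin_groups = {
    "Quantity": 1,
    "Items": [
        {
            "Id": "live-failover-group",
            "FailoverCriteria": {
                # Fail over to the secondary on these origin responses.
                "StatusCodes": {"Quantity": 3, "Items": [500, 502, 504]},
            },
            "Members": {
                "Quantity": 2,
                "Items": [
                    {"OriginId": "primary-origin"},
                    {"OriginId": "secondary-origin"},
                ],
            },
        }
    ],
}
```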

Having this type of setting allows you to increase the reliability of your delivery, have an end-to-end failover strategy, and still sustain a high request rate from your viewers. And when I say high request rate, we still have multiple regions sending those requests to your origin.

For that, we have another layer of caching called Origin Shield. You can think of Origin Shield as the regional edge cache for the regional edge caches. It's another aggregated caching point, which helps increase your cache hit ratio: the requests served from cache compared to those that go back to the origin.

With this aggregation, you assign a specific regional edge cache as your Origin Shield, and as a best practice you want to place it in the same region as your origin. Now, if you have a primary and a secondary origin, you can set two different Origin Shields in different regions; they will fail over automatically together with your origins, because each is assigned to an origin. That gives you high availability and increases the probability of a high quality of experience for your viewers.

If you're using multi-CDN and you send requests from other CDNs to CloudFront, the recommendation again is to enable Origin Shield, because now you have more requests coming from those other CDNs and you want to protect the origin from any spike that comes from those third-party CDNs. So that's another option.
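
Enabling Origin Shield is a per-origin setting. As a small sketch with placeholder names, pinning it to the origin's own region looks like this:

```python
# Sketch: enabling Origin Shield on an origin, pinned to the same AWS region
# as that origin. Domain name and region are placeholders.
origin_with_shield = {
    "Id": "primary-origin",
    "DomainName": "live-origin-us-east-1.example.com",
    "OriginShield": {
        "Enabled": True,
        "OriginShieldRegion": "us-east-1",  # best practice: match the origin's region
    },
    "CustomOriginConfig": {
        "HTTPPort": 80,
        "HTTPSPort": 443,
        "OriginProtocolPolicy": "https-only",
    },
}
```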

Another case I want to talk about is low latency. Protocols like HLS (HTTP Live Streaming) and DASH have improved and added options for low latency. In this example, we have AWS Elemental Live as the encoder, AWS Elemental MediaLive for video processing, and AWS Elemental MediaPackage packaging those segments.

Last May, we announced support for low-latency HLS with Elemental MediaPackage, with CloudFront fronting Elemental MediaPackage as the origin and serving that low-latency stream. As you can see, this is a manifest I took from a stream: every segment, in this case a six-second segment, is broken down into parts. So now you have six parts per segment, each one second long.

What's happening now is that more parts of each segment are being sent, and you need to manage the caching we spoke about before, and the relationship from CloudFront to the origin, to maintain smooth delivery. So now we want to drop the caching down to one second, because you have parts of one second. This allows playback to start much faster, because the player doesn't wait for the full six-second segment; it receives those one-second parts sooner.

But you need to make sure, if there is any issue, that you balance how many parts you break the segment into. It might be faster, but it might cause a higher request rate, which at some point can cause errors if there is any issue on the network.

So you always need to balance how you treat that, all the way from the client to your origin. There are a lot of components to this, and there are different use cases, as I mentioned: some have a spike of viewers and need a quick first load, some have transitions between ads and the primary stream. How do we monitor all of that?

We want end-to-end observability. Earlier in the session, I mentioned the factors impacting viewer quality of experience. In live streaming, we have a short time to detect and mitigate any issue that happens during the event and to recover from it.

In September 2020, the CTA WAVE group released a standard called Common Media Client Data, the CMCD specification. Today, many open-source clients, such as ExoPlayer on Android, Video.js, and hls.js, support this standard. If your application is built on top of those frameworks, you can enable CMCD. CMCD sends telemetry from the client, such as video start time, rebuffering rate, throughput capabilities, and many other parameters, with every request for an object.

On the CDN side, we collect that data and stream it into your logs. You can send those parameters either via query string or via headers, and you can put them together in a log file with CloudFront real-time logs. The real-time logs are delivered within a matter of seconds, so you have full end-to-end stats from CloudFront, the telemetry for every request, alongside the statistics from the client.

Now you can correlate what's happening in the delivery: how long it took CloudFront to receive the segment from the origin, and then how long it took the player. You can figure out where the issue was, if there was one. Was it on a certain ASN? Was it at the last mile? Was it at one POP, which you would see as issues coming from multiple clients receiving the stream from that POP? That gives you a clear view of your viewers' performance, and from that you can build up quality of experience for your viewers.
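
As a small sketch of that correlation step, assuming you have configured CloudFront real-time logs to include the query-string field and that players attach CMCD as a query parameter, you can pull CMCD keys such as buffer length (`bl`) and measured throughput (`mtp`) out of each record and place them next to CloudFront's own timing fields. The field order in the example record is an assumption; it depends on which fields you selected for the log configuration.

```python
# Small sketch: extract CMCD keys (buffer length, measured throughput, session id)
# from the query-string field of a CloudFront real-time log record, so they can
# be joined with CloudFront's own timing fields. Field order below is an
# assumption -- it depends on which fields you selected for the log config.
from urllib.parse import unquote

def parse_cmcd(querystring: str) -> dict:
    """Parse the CMCD=... query parameter into a dict of key -> value."""
    cmcd = {}
    for param in querystring.split("&"):
        if not param.startswith("CMCD="):
            continue
        payload = unquote(param[len("CMCD="):])
        for item in payload.split(","):
            key, _, value = item.partition("=")
            cmcd[key] = value.strip('"') if value else True  # bare keys are boolean flags
    return cmcd

# Example record: timestamp, time-to-first-byte, edge location, query string
# (a tab-separated subset; the real field list is whatever you configured).
record = "1701300000.123\t0.042\tIAD89-P1\tCMCD=bl%3D12000%2Cmtp%3D25400%2Csid%3D%22abc123%22"
ts, ttfb, pop, qs = record.split("\t")
cmcd = parse_cmcd(qs)
print(pop, "ttfb:", ttfb, "buffer(ms):", cmcd.get("bl"), "throughput(kbps):", cmcd.get("mtp"))
```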

We created a solution for CMCD; there's a QR code you can use to download it. It will build a dashboard for you: you select the CloudFront distribution, and it will start monitoring those CMCD parameters. This is one dashboard screen out of it, but there's also a troubleshooting screen where you can see a trace of a single viewer throughout the entire show or event and what happened along the way.

This is available for download with all the monitoring your team needs. Getting your clients set up, deciding which advertising to add, as Simon mentioned, creating all of that: there is a lot to do when you plan for a high-scale event with millions of viewers. For that, we introduced our Media Event Management team.

This team will work with you from the beginning: where you need to plan your capacity, where you want to send the stream, security aspects, risk factors. They will work with you on all those aspects from discovery all the way to the day of the event, being with you, and then doing a retrospective.

This team looks at the media side, the media processing, and the delivery itself through CloudFront. They have a red button to reach the engineers on those teams right away if there is any issue. They have eyes on glass, they look at the performance with you, the customer, and they can be with you on a bridge.

And they provide regular updates. Their planning phase includes an operational readiness report, where you'll see red and green status until everything comes up green and you're ready to go. Looking deeper into what this team can do, there are three phases: pre-event, during the event, and post-event. In the pre-event phase, there is discovery and workflow review that they will work through with you.

We've seen customers with special needs, for example, edge compute functions needed for content steering, and this team helped them scale: if they do need that compute, can it scale to the number of requests per second? They help with testing and planning, and with how to ramp up. This team worked with Peacock, as Simon mentioned, to prepare for different events.

While the event is ongoing, they are with you on the bridge. They run their own live call where they can watch for anomalies or anything that is developing, get in touch with the relevant engineering teams to ask what's going on if they see an issue, and then help steer traffic and offload from one region to another in order to maintain that high quality of experience for your viewers.

And post-event work is really important when you have consecutive events, for example a series or a season, where you might have events day after day. If something happened one day, you don't want it to happen the next day. So this team can work with you and help alleviate those issues.

The idea is to make your event successful with this team. To summarize, here is a checklist for delivering a live event at scale successfully. We spoke about planning, planning, planning: plan for capacity, plan for the devices your viewers are going to watch on, plan your segment length and optimize your caching, plan for high availability and custom functionality at the edge, and optimize based on the format you're using, whether HLS, DASH, or low latency. Then secure your delivery.

Make sure you protect those rights; content providers pay a lot to acquire sports rights. You want to protect the content and prevent unauthorized access to it. Then set up your end-to-end observability so you can detect, mitigate, and recover from any issue. And it's still a business, you want to make money from it, so whether it's subscription-based or ad-based, make sure all those ads get in right on time for the event.

And then you get the business outcome you planned for. Simon, a last word? Yeah. So, like we said, running a live event is not easy, but my advice is, as Tal said, planning is so critical and testing is so critical. AWS, as a partner to us, offers a number of services to help us plan, to help us execute testing, and even to help us innovate in this space, and many of those services are complementary.

So my biggest piece of advice is just to engage with your AWS team, and engage early. When we were building out Africa, Elemental MediaTailor, which we use for our ad insertion, wasn't even an available service in Cape Town; it was a brand-new region. But by giving AWS enough notice, they were able to build and deploy that service, ready for us to launch on it.

So having those conversations early, testing those capabilities early, and making the most of the expertise within AWS has been really instrumental in helping us deliver massive live sporting events at unprecedented scale. I just want to thank AWS for that, and tell you all to go watch live sports on Peacock. Thank you very much.

Thank you, Simon, for sharing with us today, and thank you all. I'm not sure if this is the last session for you today, but I appreciate your time and staying with us. Don't forget to complete the survey, and Simon and I will be next to the stage to take questions. Thank you.

thank you.
