The power of cloud network innovation

Please welcome Vice President, Amazon Elastic Compute Cloud, David Brown.

Hey everyone, welcome to re:Invent 2023 and welcome to the Networking Innovation talk. I'm really excited to be here, and it's great to see all of you as well.

Today, we have an action packed talk. We're gonna be covering a lot of details around recent innovations in the networking space. But before we get started, let's ask the question, why does networking matter?

And to answer that, I want to go all the way back to around 1440 in Germany, where Johannes Gutenberg invented the movable type printing press. His invention drastically reduced the cost and time of printing. But as one historian, Ada Palmer, quipped about Gutenberg: "Congratulations, you printed 200 copies of the Bible. There are about three people in the town that can read. What are you going to do with the other 197 copies?" And Gutenberg actually died penniless; his presses were impounded by creditors.

Well, some of the other printers in Germany at the time fled to greener pastures and eventually ended up in Venice, which was the central shipping hub for the Mediterranean in the 15th century. And there you could print 200 copies of any book and sell five of them to each passing ship, and these ships' captains were happy to take the books and distribute them around the world.

And so while Gutenberg's printing press was revolutionary, it was this network of vessels that carried these ideas around the world. Even if you have the greatest product in the world, without connectivity it can be very difficult to succeed. And so at its core, networking is really about connections. Whether through the ships that traversed the globe or the undersea fiber cables over which AWS carries your data, networking drives collaboration and the exchange of ideas, and it creates communities that transcend geographies.

Now, when we started building the AWS cloud, we wanted a network that would ensure that your traffic goes from point A to point B and you'd never have to worry about performance, reliability, or your data being exposed to security threats. Basically, we never wanted you to have to think about the network. And that meant keeping everything on a network that AWS had total control over. This is the AWS backbone, and we control every inch of optical fiber on that backbone and every connection that runs through it. It connects all of our regions, Availability Zones, and edge locations, and we are constantly expanding it because the demand for networking capacity is constantly growing.

Fox Sports recently changed their live stream of the NFL Super Bowl to 4K, and just moving to 4K doubled the bit rate needed to carry that workload. We've also seen enterprises around the world generating 74 zettabytes of enterprise data every single year. I not only had to check the spelling of zettabytes, I also had to look it up to see that it's actually one trillion gigabytes of data. That spelling is correct, by the way.

And finally, as we look at new use cases like driverless cars that are emerging very quickly, the connectivity that's going to be required to carry this data to the cloud is constantly expanding. These are just three examples, and I'm sure many of you are experiencing the same increases in demand for networking capacity.

And so, looking at what our capacity on the backbone can handle: in 2019, I stood here and was very pleased to tell the folks at re:Invent that we'd doubled our network capacity over the few previous years. Well, in the last three years alone, we've doubled our total network capacity on AWS once again. This rate of growth is just phenomenal.

We've also increased our regions. As you can see, we launched our first region, us-east-1 in Virginia, in 2006, and our second region, eu-west-1 in Ireland, about a year later. Since then, we've launched 32 additional regions around the world, and we have more on the way in New Zealand, Canada, Thailand, and Malaysia. You can see the rate at which we launch new regions in those five-year increments has also increased a lot. But why do regions matter?

Well, firstly, it gives you more choice. You want to be able to locate your applications and data in certain countries, and you want to put your regions closer to customers. We're also able to provide you with increased availability. So the number of regions is an important part of this, but it's really how those regions are built that's critically important, and even more than 15 years into AWS, we still do it differently from every other cloud provider.

So let's take a closer look at what an AWS region looks like. Inside each region, we build multiple Availability Zones, or AZs; I'm sure you're familiar with those. An AZ is a collection of data centers. In some cases it can be one or two; in many cases it can be tens, even hundreds of data centers. They are all placed far enough apart that they are outside of each other's blast radius in case of major disruptions, but close enough together that we can still keep network latencies manageable. Each region consists of at least three distinct Availability Zones, and in some cases we have as many as six. We also have transit centers, which connect our regions to the internet, and we bring massive amounts of internet capacity into each of our regions.

And so these transit centers connect our backbone network to our data centers and provide redundant connections for the resilience that we need. This design is what differentiates our regions from other cloud providers and gives you the confidence to run even your most critical workloads on AWS.

Now, some customers have workloads that are very latency sensitive, and the closer we can get your applications to your end users, the better that network is. We haven't been able to solve the problem of the speed of light yet, and so we had to make the network cable shorter. We faced this problem in 2019, when we launched our first AWS Local Zone, and the target market we were going after was the media and entertainment industry in Los Angeles. There, they shoot movies and shows during the day or at night, that footage is immediately uploaded to AWS, and all of the graphics work is done using GPUs on AWS. And you can imagine that when you're doing colorization of footage or animation, latency really matters.

And so today, we have over 35 Local Zones around the world, with another 19 Local Zones on their way. If you look at the Los Angeles use case and just how much of an impact that had: latency from Los Angeles to our Oregon region is about 25 milliseconds of round-trip time. That's probably just a little too much if you're spending all of your day editing in Photoshop or some animation application. With the Local Zone we were able to bring to Los Angeles, customers now see latencies as low as 1 to 2 milliseconds, which is completely game-changing for those use cases. And this is why customers like Netflix use AWS in Los Angeles for all of their editing of video footage.

We have another really interesting customer, and it's really proof of how network latency can inspire innovation and change the game. You may have seen this on Jimmy Kimmel recently: Proto Hologram is a lifelike 3D display, 7 feet high, that gives the impression of a 3D person really being there. We worked with Proto Hologram to help them bring this to life. Rendering the level of detail that Proto needs means a lot of data flowing between locations, and when Proto tested the system on Local Zones, they saw latency drop for users in the metro area they were testing. Low latency is critical for making a hologram of, say, someone on the other side of the country feel like they are really there in front of you. It's amazing technology, and it's been a lot of fun to work with them.

Now, when some customers want to connect to AWS, a decrease in latency is not the only part of the equation. You also have to think about availability: how available is the network? When you connect to AWS over the internet, that comes with some level of unpredictability, no matter how hard we all collectively work to keep the internet up and highly available. So many customers choose to bypass the internet entirely and use our service called AWS Direct Connect. This way, you can connect via physical fiber, very similar to MPLS-type networks, to routers that we've placed in colocation facilities around the world. In many cases, you may be in a colocation facility with Direct Connect, and it's simply an optical fiber connection into AWS. We've opened 20 new locations this year, bringing the total number of Direct Connect locations around the world to 130, all supporting speeds of up to 100 gigabits per second.

This year marks 15 years of innovation with Amazon CloudFront, our content delivery network. This is not just about getting your data to AWS; this is about serving your data to the end customer and really improving that experience. Five years ago, when CloudFront turned 10, we celebrated 150 points of presence, which we thought was impressive at the time. Today, just five years later, CloudFront has over 600 points of presence and 13 regional edge caches, and it handles over three trillion requests every single day.

Some of you might be familiar with the NFL's Thursday Night Football on Amazon Prime. As an example, during one of the football games this year, data delivery from CloudFront actually peaked at 120 terabits per second. Now, a game that I'm a little more familiar with is cricket. No matter what happened to the South Africans in the recent tournament, the platform played a pivotal role in this year's broadcast of the ICC Cricket World Cup on Disney+ and its Indian streaming service Hotstar. In fact, Hotstar set a new record for concurrent live viewership, with 59 million people watching India versus Australia in the final of the World Cup. And this speaks to the scale that we're able to achieve with our global network and services like CloudFront today.

Now, if we look at our modern ability to transfer data across the world instantly, we've come a long way since 1837 and the invention of the telegraph. Early systems of long-distance communication passed information from tower to tower over long distances using, literally, telescopes and indicators that could be manipulated to form letters. As you can imagine, this was probably not that reliable, and it also had a significant security concern: somebody on another hilltop with their own telescope could read your message. That all changed, though, with the invention of the electromagnet, which enabled the electric telegraph and ushered in a new era of reliable communications. Ideas could now be transferred reliably over long distances, including over the first transatlantic link, which was put into place in 1865. Electromagnetism went on to enable a huge number of innovations, such as MRIs and semiconductors, and is still used broadly today. Like electromagnetism, the Nitro System was key to unlocking numerous EC2 innovations.

And we speak about the Nitro System in almost every one of our re:Invent presentations because it's been so fundamental to how we've built AWS, with an enormous impact on the scalability, performance, and availability of the instances and the network that we're able to provide to you. When we designed EC2 back in 2006, we put it together in pretty much the way you would design any system: we used a software-based hypervisor that was generally available at the time, and we built all of the distributed systems around it. But we quickly realized that if we were going to be able to host the world's most critical workloads and meet what enterprise customers were going to expect, we needed to completely rethink the way we did virtualization.

And so we literally redesigned virtualization from scratch. With Nitro, we envisioned a way to move the overhead off the host machine to recoup all the resources while improving security. And so today, almost 11 years after our first Nitro System instance, we're still the only cloud provider that uses no part of the host machine's resources for our own workloads. Everything from network processing through storage, security, and all the internal APIs needed by the hypervisor runs on our own hardware and doesn't actually use the Intel, AMD, or Graviton CPU that may be in that machine.

This comes with significant benefits. Firstly, had we stayed on software-based virtualization, we would probably be limited to about 100 gigabits per second, and you wouldn't be able to exceed that. The growth of networking throughput on EC2 over the years has been amazing. I remember being incredibly amazed back in 2008 when we launched our C1 instance with one gigabit of networking bandwidth. Then we launched our CC1 instance two years later with 10 gigabits, 10 times more than the first one we had. C5 came in 2017, and you can see it was 25 gigabits. We hit 100 gigabits in 2019. And today, with our C7gn instances, we offer up to 200 gigabits of bandwidth, double the bandwidth of the previous generation. That's continued to increase, and the Nitro System has really helped us do that.

There's one workload, though, which you may have heard of, that has changed a lot recently: AI/ML. In 2020, we released our P4d instance, powered by NVIDIA's A100 GPU, with 400 gigabits of networking throughput. And again, we thought that was a lot at the time, but we were proved wrong by the incredible rise of generative AI, which has reshaped the technical landscape as we know it over the last 12 to 18 months and continues to drive the need for higher throughput and lower latencies.

So let's look at how this has progressed. We launched our first Trainium instance, Trn1, about two years ago, and it provides 800 gigabits per second on a single instance. Then we launched Trn1n and increased the throughput to 1,600 gigabits per second. And recently we launched our P5 instance using NVIDIA's H100 GPU, the very latest technology, providing 3,200 gigabits per second on a single EC2 instance. The growth has been amazing, but providing that connectivity at the instance level is just one thing. We also have to think about what that means at a broader networking level. And this is one part of the move to generative AI that I haven't heard a lot of discussion about, so I thought today we'd give you some insight into what goes into designing data center networks to carry the level of traffic that's needed for generative AI.

If you go all the way back to 2012, models at the time used a maximum of two GPUs and about 10 gigabits of network traffic to handle about 60 million parameters. Today, models like Anthropic's Claude 2 require over 10,000 GPUs and petabit-scale networking at latencies that once again challenge the speed of light. And so when we started to see more and more of these workloads, we realized that we needed to completely rethink the way we did networking in the cloud.

And we've spoken about this before, so you're probably familiar with it. In our data centers, we use switches deployed in a Clos architecture, a typical non-blocking, multistage networking architecture that is often used in large-scale networks. It is capable of supporting aggregate throughput of up to 50 petabits per second in a single topology. Our initial ML instances launched in isolated pockets within this network: we had one large Clos network with isolated pockets of connectivity. This design worked well at first, but then we started to run into some challenges, and a few things happened.

So firstly, we had the risk of traffic congestion: as the use of bandwidth within the AI/ML part of the network increased, we may have impacted other workloads. We also saw inefficient performance; we weren't getting the performance we needed, and we knew that we'd have to change if we were going to keep up with the incredible scale that AI and ML were bringing.

And so to overcome these issues, we decided to physically isolate this capacity and reduce the risk to the rest of our network. So what did we do? We created another network; I'm sure many of you have done that. This became known as AWS Ultra Cluster 1.0: a multi-tier, non-blocking, interconnected network supporting petabit-scale networking and, at the time, up to 4,000 NVIDIA GPUs. Back in 2020, with our P4d instances, we were the first to utilize Ultra Clusters, delivering 400 gigabits per second of networking between these instances.

But again, we knew we needed to do more. With the unprecedented growth of machine learning workloads, we quickly saw ourselves going from a room full of P5 racks to multiple buildings across multiple campuses, all running these P5 racks. And now, with a single instance needing to deliver 3.2 terabits of throughput, even Ultra Cluster 1.0 wasn't going to be enough, really only two years after we released it.

And so we redesigned how Ultra Clusters worked and launched Ultra Cluster 2.0. It's a flatter and wider network fabric optimized specifically for P5 and future ML accelerators, and it allows us to reduce latency by up to 16%, with 10 times more bandwidth, supporting up to 64,000 GPUs in a single cluster. It makes use of an AWS-developed routing protocol: we defined a brand-new routing protocol to detect and prevent mis-wirings and to allow for sub-second convergence across campuses. It also provides improved availability, which is critically important when you're running very large jobs across tens of thousands of GPUs.

I'm also very happy to announce the availability of a new feature called the Amazon EC2 Instance Topology API. The Instance Topology API will provide you with detailed information about where your instances are located. As we take a closer look, it becomes critically important to know how many network hops there are between two instances within an Ultra Cluster. In this example, you can see job one is located across three instances, but in some cases there are two or three network hops between those instances. It would be much more efficient if you could locate those instances on physical servers with only a single hop between all of them. And this is what the EC2 Instance Topology API allows you to do: it provides detailed information on exactly where your instances are relative to other instances and allows you to optimize your workloads for network performance.
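As a sketch of how you might consume this information, the snippet below calls the EC2 `DescribeInstanceTopology` API via boto3 and compares the `NetworkNodes` lists of two instances; the more leading nodes two instances share, the fewer hops between them. The node IDs, region, and helper names here are illustrative assumptions, not from the talk.

```python
# Hedged sketch: inspecting EC2 instance placement with the Instance Topology API.
try:
    import boto3  # third-party; needed only for the live API call
except ImportError:
    boto3 = None

def shared_prefix_len(nodes_a, nodes_b):
    """How many network nodes, from the top tier down, two instances share.
    More shared nodes means fewer network hops between the instances."""
    n = 0
    for a, b in zip(nodes_a, nodes_b):
        if a != b:
            break
        n += 1
    return n

def instance_topologies(instance_ids, region="us-east-1"):
    """Fetch the NetworkNodes list for each instance (live call,
    requires AWS credentials)."""
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instance_topology(InstanceIds=instance_ids)
    return {i["InstanceId"]: i["NetworkNodes"] for i in resp["Instances"]}

# Illustrative node lists: a and b sit under the same bottom-tier node,
# while c shares only the top-tier node with them.
a = ["nn-1111", "nn-2222", "nn-3333"]
b = ["nn-1111", "nn-2222", "nn-3333"]
c = ["nn-1111", "nn-9999", "nn-8888"]
assert shared_prefix_len(a, b) == 3  # closest possible placement
assert shared_prefix_len(a, c) == 1  # more hops between a and c
```

You could use a scheduler hook like this to prefer placing chatty instances with the longest shared node prefix.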

We've also redesigned the way that routing works. Traditional TCP routing sends the packets of a flow across the network over a single set of routers picked for that flow. If the network connection to your instance is congested, or multiple flows hash to a single router along the way, you can see network congestion, and that can cause impact. As we started to look at these very large distributed workloads, we realized we needed to find a better approach.

And so we invented Scalable Reliable Datagram (SRD) routing, which we've been using for a few years now. It's actually what underpins Elastic Fabric Adapter, so it is optimized for performance by sending data over as many network paths as possible, avoiding overloading any one path. And it runs directly on Nitro.
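The difference between classic per-flow hashing and SRD-style multipath spraying can be illustrated with a toy simulation. This is an intuition aid only, not AWS's actual implementation; the hash choice and path model are assumptions.

```python
# Toy model: ECMP pins a flow to one path, SRD-style spraying spreads it out.
import hashlib

def ecmp_path(flow_tuple, n_paths):
    """Flow hashing: every packet of a flow lands on the same path, so an
    unlucky hash can pile several flows onto one congested router."""
    digest = hashlib.sha256(repr(flow_tuple).encode()).digest()
    return digest[0] % n_paths

def srd_style_paths(n_packets, n_paths):
    """SRD-style spraying (simplified): packets of a single flow are spread
    across many paths to even out load; the receiver handles reordering."""
    return [i % n_paths for i in range(n_packets)]

flow = ("10.0.0.1", "10.0.0.2", 443, 50512)
paths_used_by_ecmp = {ecmp_path(flow, 8) for _ in range(100)}
assert len(paths_used_by_ecmp) == 1            # whole flow pinned to one path
assert len(set(srd_style_paths(100, 8))) == 8  # spraying touches every path
```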

It was first supported by Elastic Fabric Adapter for high performance computing in the HPC space, and now we've brought it to the ML space as well. Initially, it was only available on our largest and .metal EC2 instances.

Today, I'm super happy to announce that SRD is now available on all Nitro instances, with 85 instance types now supporting SRD, all using our ENA Express functionality. You can turn on ENA Express by simply checking one box in the console, and it comes at no extra charge.

And when you enable that, your per-flow limits go from 5 gigabits per second up to 25 gigabits per second, and you also see P99 network latencies reduced by up to 50%, so much more efficient routing. There's nothing to change in your application, by the way; your application still speaks whatever networking protocol you were using. Now, network performance and speed and latency all really matter in networking.
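Besides the console checkbox, ENA Express can also be enabled per network interface through the EC2 API. The sketch below uses boto3's `modify_network_interface_attribute`; the interface ID and region are placeholders, and a live call requires AWS credentials.

```python
# Hedged sketch: enabling ENA Express (SRD) on an elastic network interface.
try:
    import boto3  # third-party; needed only for the live call
except ImportError:
    boto3 = None

def ena_srd_spec(udp=True):
    """Build the EnaSrdSpecification payload that turns on ENA Express,
    optionally for UDP traffic as well."""
    return {
        "EnaSrdEnabled": True,
        "EnaSrdUdpSpecification": {"EnaSrdUdpEnabled": udp},
    }

def enable_ena_express(eni_id, region="us-east-1", udp=True):
    """Apply the spec to one network interface (live call)."""
    ec2 = boto3.client("ec2", region_name=region)
    ec2.modify_network_interface_attribute(
        NetworkInterfaceId=eni_id,
        EnaSrdSpecification=ena_srd_spec(udp),
    )

spec = ena_srd_spec(udp=False)
assert spec["EnaSrdEnabled"] is True
assert spec["EnaSrdUdpSpecification"]["EnaSrdUdpEnabled"] is False
```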

But what is also incredibly important are the tools that we give you to build networking topologies on AWS. Amazon VPC is one of those tools that I'm sure every person in this room who has done anything with networking on AWS uses, and with a lot of success. I was part of the team that designed Amazon Virtual Private Cloud.

And I remember a lot of those early discussions. When we launched EC2 back in the day, we had one big flat network; there was no private networking. Everybody used security groups to control access to their instances, and it worked relatively well. But we knew that we needed private networking if we were going to bring large enterprise workloads to AWS.

And that's true of almost all the workloads we have on AWS today. I'll tell you, there were some heated debates back then. Do we introduce the complexity of route tables? Do we need subnets? And you can see where we landed: bringing a lot of the traditional networking constructs to AWS.

And one of the things we were absolutely sure of at the time was that nobody would need more than one VPC. Well, we were proved wrong on that assumption; you all proved us wrong. Many of you are running thousands of VPCs, and they've become an incredibly useful construct as you think about creating new network topologies, creating isolation, and supporting developer applications and sandboxes. It's a basic unit of development on AWS.

Well, the next problem became: how do we allow you to communicate between VPCs? Since we thought you would only ever need one, we didn't spend too much time thinking about inter-VPC communication. In about 2013, we launched VPC peering, which allowed you to start peering VPCs. This worked well for a period of time, and then scale caught up with us again.

As soon as you got to about 100 VPCs, it became impossible to manage this peering mesh, and we heard pretty loudly from a lot of customers that we needed a better way. And so in 2018, we launched Transit Gateway, and Transit Gateway has really revolutionized the way we do connectivity between VPCs, across our AWS regions, and also from your on-premises networks to AWS.
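As a rough sketch of the hub-and-spoke pattern Transit Gateway enables, the snippet below attaches several spoke VPCs to one gateway via boto3. All IDs are placeholders, and a live run requires AWS credentials and an existing Transit Gateway.

```python
# Hedged sketch: hub-and-spoke attachments to a single Transit Gateway.
try:
    import boto3  # third-party; needed only for the live calls
except ImportError:
    boto3 = None

def attachment_requests(tgw_id, vpc_subnets):
    """Build one VPC attachment request per spoke VPC.
    `vpc_subnets` maps a VPC ID to the subnet IDs used for the attachment."""
    return [
        {"TransitGatewayId": tgw_id, "VpcId": vpc_id, "SubnetIds": subnets}
        for vpc_id, subnets in vpc_subnets.items()
    ]

def attach_spokes(tgw_id, vpc_subnets, region="us-east-1"):
    """Create the attachments (live calls)."""
    ec2 = boto3.client("ec2", region_name=region)
    for req in attachment_requests(tgw_id, vpc_subnets):
        ec2.create_transit_gateway_vpc_attachment(**req)

reqs = attachment_requests(
    "tgw-0abc", {"vpc-1": ["subnet-a"], "vpc-2": ["subnet-b"]}
)
assert len(reqs) == 2
assert all(r["TransitGatewayId"] == "tgw-0abc" for r in reqs)
```

With this shape, adding a hundredth spoke is one more dictionary entry rather than ninety-nine new peering connections.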

It also gives you the ability to connect, as I said, across regions and on premises and also has direct integration with VPN and Direct Connect. Now, I was very keen to spend a lot of time talking about the benefits of Transit Gateway in AWS, but I thought it would be a lot better to have a customer to come on stage and talk to us about their journey and what they've done in building an AWS network for one of the world's most critical workloads.

So please join me in welcoming Capital One's SVP of Cloud and Networking. Will Myer.

Good afternoon, everyone. It is awesome to see all of you. My name is Will, and I lead cloud engineering and network engineering at Capital One. If you're not familiar with Capital One, we are a big financial services company. We are a bank; we will give you a credit card, a car loan, or a lot else. We also run entirely on the public cloud. So I want to talk to you a little bit about our journey on AWS, and then I'm going to share a few of my favorite quotes that I think sum up what we've learned about AWS networking along the way.

I'll start with a little bit of history here. Apologies. One slide. So we were a disruptor in the credit business really a long time before the cloud was a thing. But we really understood the power of tech. We understood how to pair it with risk management, really excellent risk management so that we could be ambitious about how we use tech.

And so when we saw the cloud, we saw this huge opportunity to do more for our customers by just getting out of the business of managing infrastructure. In 2015, some of my colleagues from Capital One stood on stage here at this event and talked about how we were going to move all of our workloads to AWS. And we did that: over the next few years, we moved thousands of workloads and applications to two AWS regions.

We worked across five divisions. We gave tens of millions of customers a pretty great experience along the way. And in 2020 we closed our last data center. Now, many of you have been through big enterprise cloud migrations. Some of you are probably doing them now and you know that there's a big difference between just getting to the cloud and really thriving on the cloud once you're there.

And so for the last couple of years, we have been really focused on getting as much value as we can out of that investment and out of that position: we've been improving our cost efficiency, increasing our pace of delivery, and improving our ability to access innovation from the cloud. And today we run a pretty big environment. On a given day, we have probably 100,000 EC2 instances, a couple of hundred thousand Lambda functions deployed (more, depending on how you count), and petabytes of data running around the network.

So it is a big environment, and we spend a lot of time managing it, and I think we've learned a couple of things. The first thing we've learned, and I'll give you a classic quote here from Arthur C. Clarke: programmable infrastructure actually is kind of magic. The simple fact that the network, like the rest of the data center, is virtual and dynamic and managed with code has really been transformative for us. Since we moved to the cloud, our teams deliver 10 times more often.

Our storage and compute costs on a unit basis are dramatically lower, and we are doing so much more with our data than we ever could before. Think about all the things that we now basically just take for granted: in a matter of minutes, you can spin up a service that's doing thousands of transactions per second, fully automated scaling, load balancing across geos. This is really nontrivial networking that we're all just doing every day. And by the way, most of us are doing it with APIs and infrastructure as code, not with click-ops and Slack messages.

It's making us more resilient. At Capital One, we run all of our applications active-active across multiple AZs, and many are active-active across regions. It is amazing what you can do with AWS load balancing and Route 53, as long as you have your health checks set up properly. And since moving to the cloud, despite a huge increase in the amount of change that we push out, we experience incidents about three times less often. And I know many of you have seen something similar.

I think all of this is really just the power of automation, right? AWS being able to offer multiple layers of load balancing, fully integrated with DNS and instance management, is a really powerful example. At Capital One, we also run one common set of deployment and build tools across the entire company. I will say that that is a journey that's not for the faint of heart, but now that we're there, it's letting us do things from automating consistent scaling configurations to easy rollbacks.
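The kind of health-checked Route 53 failover Will alludes to can be sketched as a pair of record sets like the ones below. All names and IDs are illustrative assumptions, not Capital One's configuration; the dict shape follows Route 53's `ChangeResourceRecordSets` resource record set format.

```python
# Hedged sketch: a Route 53 failover pair (health-checked primary + secondary).
def failover_record(name, role, target, health_check_id=None):
    """Build one half of a failover pair. `role` is PRIMARY or SECONDARY;
    Route 53 serves the secondary when the primary's health check fails."""
    record = {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,
        "TTL": 60,
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return record

primary = failover_record(
    "app.example.com", "PRIMARY", "use1-alb.example.com", "hc-123"
)
secondary = failover_record(
    "app.example.com", "SECONDARY", "usw2-alb.example.com"
)
assert primary["Failover"] == "PRIMARY" and "HealthCheckId" in primary
assert secondary["Failover"] == "SECONDARY"
```

Submitting both records in one change batch gives the automatic regional failover described above, provided the primary's health check is meaningful.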

We also talk a lot at Capital One about serverless. We believe that for a huge number of use cases, not all, up the stack, serverless is the way to go. We have over half of our applications built with Lambda or running on Fargate. So lots of great stuff happening, but it's not all roses.

The second thing we've learned is that complexity really is waiting for you around every corner. You know, Tom Peters writes business books, so he was not talking about cloud networking when he said, "If you're not confused, you're not paying attention." But I think he could have been, because working against that complexity requires a lot more expertise and a lot more time than you might think.

That starts the first time you spin up an EC2 instance. I'll give a couple of examples; a common one is what Dave was just talking about. When we started on the cloud, we started with a pretty basic account and VPC setup: dozens of accounts for our different divisions (dev, test, production, et cetera) and a handful of VPCs. Over time, we started to deploy more and more applications into those accounts, with more and more AWS services and more and more internal services that everyone else needs to talk to.

And before we know it, we have a huge amount of peering complexity in our environment, and we actually start to hit route limits as well. So we have to start pulling those accounts apart, moving applications into their own much smaller accounts, and going to a hub-and-spoke model for all the VPCs. Generally, that's worked well. Today, we interconnect almost 4,000 VPCs through Transit Gateways in our regions. We peer the regions, and by the way, we actually cut the peering a couple of times a year to prove our ability to handle a full regional failure and run exclusively in one region.

I'll give another example. You know, it's 2023, but surprise: being good at IP address management actually still is hard and still matters, a bunch. I mentioned our shared build tools. Well, underneath those deployment pipelines we have a ton of agents that are doing those builds; those builds are running tests, those tests are doing all kinds of networking, and over time, with more and more applications, more and more tests, and more and more builds, we end up eating up way more of our IP space than we ever imagined.

And we could triage that for a little while just by being smarter about how we allocated IP ranges to different teams, but really the only scalable solution was to put all of those agents behind private NAT gateways. It's a pretty simple use case, and kind of an obvious solution, actually. But I think it's a good reminder that when you're spinning up this infrastructure constantly, these kinds of small things really add up fast.
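A private NAT gateway of the kind described here can be created with a single EC2 API call. The sketch below is a hedged illustration with placeholder IDs; a live call requires AWS credentials and an existing subnet.

```python
# Hedged sketch: a *private* NAT gateway lets many build agents in a large
# non-routable range share a small pool of routable private addresses.
try:
    import boto3  # third-party; needed only for the live call
except ImportError:
    boto3 = None

def private_nat_params(subnet_id):
    """Parameters for a private NAT gateway: no Elastic IP is allocated;
    traffic is translated to the gateway's private IP address."""
    return {"SubnetId": subnet_id, "ConnectivityType": "private"}

def create_private_nat(subnet_id, region="us-east-1"):
    """Create the gateway (live call); returns the API response."""
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_nat_gateway(**private_nat_params(subnet_id))

params = private_nat_params("subnet-0123456789abcdef0")
assert params["ConnectivityType"] == "private"
```

The agents' route tables then point at the gateway, so only the gateway's subnet needs space in the routable plan.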

I will say, challenges aside, I think the future is still incredibly bright. AWS is moving really quickly with new services, but also strengthening existing ones, which we love even more. Tim Berners-Lee said this; he was talking about the web, and he said it almost 15 years ago. And right now, I know we're all talking about AI all the time, but I think it's still true about the cloud, and about networking on the cloud, that the future is still so much bigger than the past. And we're excited about it.

One of the biggest places we see outcomes is in security. We talk a lot about dynamic, policy-driven network boundaries and context-aware access. You don't implement that stuff with static partitioning and segmentation; you implement it with software, with the kind of higher-layer routing primitives that we have on the cloud.

I'll touch on SD-WAN. Like many of you, we have a big SD-WAN deployment; it connects our sites and our retail locations. When you start in the cloud, the cloud is kind of its own network environment, right? It's the place where all of your cloud workloads are. Then over time you start to move some of your appliances to cloud services, and maybe you get more resilience and more observability. But the cloud isn't just a place to run virtualized legacy network hardware, right?

What we actually want is for the cloud to be our backbone. We want to use Transit Gateway and PrivateLink to keep connecting our workloads with our sites and our partners. We are rethinking employee access for the primarily distributed world that we all live in now. And we're hopeful that with AWS Cloud WAN, we can actually get to a fully cloud-managed WAN over time.

We don't just want network infrastructure that runs on the cloud. We want a truly cloud-native network, with all the benefits of scalability, programmability, and resilience that we get everywhere else on the cloud.

I'll leave you just by saying that for Capital One, AWS has really been a huge part of our networking story, not just our cloud story. We have learned a ton, we have benefited a ton, and we're excited to keep building together. Back to you, Dave. Thanks for your time.

Thanks. Well, it's been an amazing journey partnering with Capital One as we've helped them move their business into AWS and build the network. I was very excited to hear Will mention how Capital One is expanding their global footprint and beginning to roll out their AWS Cloud WAN infrastructure.

You know, for many, many years, we were really just networking in the region. But we've also found that customers just like Capital One have asked us to solve broader networking problems, all the way back to on-prem. And as the cloud became such a central part of your network, connecting to on-premises data centers and colocation facilities has really created a lot of networking complexity.

Now, you could solve this yourself. You could use Transit Gateways, you could use inter-region peering. But we knew there was a way that we could simplify a lot of that complexity. And so we introduced AWS Cloud WAN. With Cloud WAN, you can use AWS as your middle-mile network backbone, connecting your VPCs and on-premises locations globally.

Cloud WAN unifies them into one network that fulfills the traditional role of your WAN, but it's elastic, scalable, and flexible in the way you'd normally expect from AWS. Another unique capability of Cloud WAN is that you can easily create network segments. In this example, you can isolate sensitive traffic, such as the finance data or the sales data, from the general network.

Controlling all of this is also incredibly simple. We wanted to make sure that you can automate every single part of your global network deployment, run your own enterprise-grade global WAN, and manage it all in one place through simple policies. Writing a simple policy and deploying it allows you to control the full network.
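As a sketch of what "a simple policy" can look like, here is a hypothetical, simplified policy document in the spirit of Cloud WAN's declarative JSON policies. The field names and segment names are illustrative only, not the exact service schema:

```python
import json

# Hypothetical, simplified network policy: isolated finance and sales
# segments, a shared general segment, and a tag-based attachment rule.
policy = {
    "segments": [
        {"name": "finance", "isolate-attachments": True},
        {"name": "sales",   "isolate-attachments": True},
        {"name": "general", "isolate-attachments": False},
    ],
    "attachment-policies": [
        # Route attachments tagged "env=finance" into the finance segment.
        {"rule": {"tag": "env", "equals": "finance"}, "segment": "finance"},
    ],
}

print(json.dumps(policy, indent=2))
```

The appeal of the model is that the whole global network is described in one declarative document, so deploying a policy change is what reconfigures the network.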

One customer that's made extensive use of Cloud WAN to modernize their network is Sonos, a developer and manufacturer of one of my favorite audio products. Cloud WAN helped streamline the operations of their network, reducing the time that tasks take from weeks to just a few hours while increasing throughput across their backbone by 3x. Pretty amazing. They also use Cloud WAN to bridge connectivity between their existing SD-WAN networks and AWS. This is really a great success story. But speaking of SD-WAN, we're continuously looking for ways to simplify the integration of SD-WAN back to AWS.

And today I'm happy to announce AWS Cloud WAN Tunnel-less Connect. Tunnel-less Connect allows you to integrate your SD-WAN appliances into your Cloud WAN deployment without specialized tunnel protocols. Really a lot simpler. What that means is we're removing the GRE overhead completely. AWS Cloud WAN Tunnel-less Connect increases your bandwidth by up to 5x, because you no longer need to tunnel, and provides up to 100 gigabits per second per availability zone.

You get native BGP support between SD-WAN appliances, which makes it even easier to use. And I know networking professionals love BGP. It also integrates with your existing Cloud WAN appliances and partners. The partner ecosystem in this space is so important. We started this journey innovating and collaborating with Cisco, and then had the pleasure of working with the other leading industry partners that you see on the slide.

We're very excited to see how this new feature helps you take your wide area network to the next level. Now, we saw how expanding your AWS infrastructure with more VPCs brings a challenge: managing IP addresses effectively. Our next guest has spent a large part of her career worrying about IP addresses and has actually become a leading world expert on the topic. And so I'm very excited to welcome AWS's own Technical Business Developer for IP Address Strategy.

Tina:

Hi there. We see so many of you deploying global networks, with many VPCs connected to those global networks. All of those VPCs need to have IP addresses assigned, and in some cases they may need to be unique, potentially across thousands of deployments. Many of you start with spreadsheets, and you quickly realize that manual tracking can be overwhelming. Doing it manually just is not practical.

So to help with that, two years ago we introduced VPC IP Address Manager, or IPAM. IPAM simplifies IP tasks such as assigning, tracking, and monitoring IP addresses at any scale, with real-time visibility. But that's not all. We've been working hard on IPAM, and today I'm pleased to announce a few enhancements.

First, enhanced automation. IPAM can now assign IPs to VPC subnets in minutes, eliminating the hassle of manual subnet management.
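The bookkeeping this automates can be pictured as handing out non-overlapping CIDRs from a top-level pool. A minimal sketch, assuming a hypothetical 10.64.0.0/16 pool and one /24 per VPC (the pool and VPC names are made up for illustration):

```python
from ipaddress import ip_network

# Hand out non-overlapping /24s from a top-level pool, the invariant a
# manual spreadsheet struggles to maintain at scale.
pool = ip_network("10.64.0.0/16")
free = pool.subnets(new_prefix=24)   # generator of candidate /24s

allocations = {}                     # VPC name -> assigned CIDR
for vpc_name in ["prod-vpc", "staging-vpc", "build-agents-vpc"]:
    allocations[vpc_name] = next(free)

for name, cidr in allocations.items():
    print(f"{name}: {cidr}")
# prod-vpc: 10.64.0.0/24
# staging-vpc: 10.64.1.0/24
# build-agents-vpc: 10.64.2.0/24

# No two allocations overlap.
assert all(not a.overlaps(b)
           for a in allocations.values() for b in allocations.values()
           if a is not b)
```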

Second, bring your own Autonomous System Number (ASN), so you can move to AWS and use the same ASN your customers know and you have always used on premises.

Lastly, I'm very excited to tell you there's now a Free Tier, to make IPAM more accessible to everyone. It also gives you insights into public IP usage, helping you with cost optimization and the efficiency of your network.

Speaking of IP addresses, let's talk about IPv6. We see IPv6 as the future; it's a huge leap beyond the limits of IPv4. At AWS, we've focused on two key angles: first, enabling IPv6 connectivity for internet apps, and second, empowering you to overcome IPv4 address exhaustion when deploying massive workloads.

IPv6 is more than changing the address schemes. It unlocks the next era of hyperscale possibilities. We've been working hard on v6 and I'd like to call out three of our new IPv6 capabilities we've launched just this year.

First, managing IPv6 just got more flexible with IPv6 contiguous blocks. A large contiguous block makes address allocation much simpler than managing many small blocks manually.

Second, Gateway Load Balancer support allows you to have IPv6 flows end-to-end using the firewall of your choice.

And third, Global Accelerator has extended IPv6 support to ALB, NLB, and EC2 endpoints. With 35 services now supporting IPv6 and adoption growing year over year, we're so excited to help you during this transition.

Thank you for your time.

Thanks, Tina. It's amazing the things that happen at scale. Who ever thought we'd be worrying so much about the management of IP addresses? But it became a real problem. Just incredible to see the level of innovation around something like that.

Now, these IPv6 insights got me thinking about where the internet began. It's about 50 years ago now. In 1969, the first two computers were connected between UCLA and Stanford creating the first segment of what would eventually become the internet as we know it today. And this is also when the networking protocols like TCP/IP were crafted.

And during the first year, only four computers joined the network. But fast forward a little to 1975, and nearly 50 computers had joined the network in the US, with several others abroad connecting via satellite links. It slowly grew, and by the late 1980s it had surpassed 100,000 computers. And as it evolved into the internet, traffic routing became more and more of a challenge.

And to address this, the first physical load balancers emerged in the late 1990s and swiftly became crucial for sustaining the fast growing internet, right?

I'll confess that when I joined AWS in 2007, I was much more of a distributed systems developer and not much of a networking person at all. It was in early 2013, when I was asked to lead the Elastic Load Balancing team, that my networking journey at AWS really began. These were the early days of load balancing in the cloud, with limited features and very few load balancers doing more than 100 gigabits per second.

And today, Elastic Load Balancing handles over 300 terabits per second at peak on a daily basis. Load balancing is really about availability. It allows you to spend less time optimizing applications, for example ensuring that garbage collection doesn't impact your application performance, because it papers over a lot of those issues.

So I wanted to spend a little bit of time talking about Elastic Load Balancing and some of the incredible features that we've added and one or two new features we will be adding today.

Historically, we gave you the ability to disable cross-zone load balancing to isolate your workloads and improve uptime. You know, even though we spend all this time designing AZs correctly, they aren't foolproof. From time to time you can have AZ failures, and cross-zone dependencies can still take you down. And so we enabled intra-AZ traffic isolation and failover away from impaired AZs, which reduces the blast radius. That launched earlier this year, and it improves overall availability automatically.

Our next quest was to further improve application availability. And so I'm happy to announce that we recently launched automatic anomaly detection with weighted target groups. This capability allows you to detect and mitigate failures for Application Load Balancer targets.

So let's take a closer look at how it works. With anomaly detection, ALB identifies underperforming targets without configuration or any health checks, reducing the traffic that we route to them so they can ultimately recover.

In this example, in steady state, you normally have traffic distributed evenly across all of your backend targets. Anomaly detection uses insights we have gained from years of load balancing experience, along with machine learning algorithms, to continuously monitor for underperforming targets. It's also able to distinguish ambiguous situations: is it the backend application, is it the network between the load balancer and the target, or is it the load balancer itself?

And when it does detect an anomaly, it will tell you via CloudWatch, but it will also automatically weight out the failed node and distribute traffic accordingly. Doing this accurately, accounting for any underlying issues, is really, really hard. But when you get it right, it significantly improves application availability. And we're very excited to see how this helps you further improve your application availability with ALB.
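The "weight out" behavior can be sketched as weighted routing in which an anomalous target's weight is dialed down rather than zeroed, so it keeps receiving a trickle of traffic and can recover. The target IDs, error rates, and threshold below are made up for illustration and are not the service's actual algorithm:

```python
import random

# Observed per-target error rates (hypothetical values).
targets = {"i-aaa": 0.01, "i-bbb": 0.02, "i-ccc": 0.40}

def routing_weights(error_rates, anomaly_threshold=0.10):
    # Anomalous targets keep 10% of their normal weight instead of
    # being removed outright, so they can demonstrate recovery.
    return {t: (0.1 if err > anomaly_threshold else 1.0)
            for t, err in error_rates.items()}

weights = routing_weights(targets)
print(weights)  # {'i-aaa': 1.0, 'i-bbb': 1.0, 'i-ccc': 0.1}

# Distribute 10,000 hypothetical requests according to the weights;
# the anomalous target receives only a small share of traffic.
random.seed(0)
picks = random.choices(list(weights), weights=list(weights.values()), k=10_000)
print(f"share to i-ccc: {picks.count('i-ccc') / len(picks):.1%}")
```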

Besides routing, load balancers are also the first line of defense for your application; strengthening their security instantly protects everything behind them. Now, encryption with TLS has been a key feature since all the way back in 2011. Do any of you remember Heartbleed in 2014? It really highlighted some of the challenges we had with TLS at the time, just generally across the industry, as well as some of the risks in the libraries that were available for SSL processing.

Coming out of Heartbleed, we actually ended up writing our own TLS implementation, which we ultimately open sourced, called s2n. s2n is now broadly used across Amazon and by all of our Application Load Balancers. It improves speed and simplicity, and it also avoids rarely used options within the TLS state machine, such as the one that led to the Heartbleed issue in April 2014.

Now, there's one feature that customers have asked for since I joined the team in 2013. We worked really hard to try to address it, but it was very risky, because it involved parts of the TLS state machine that had been known for vulnerabilities for many, many years.

Well, recently we implemented all of that functionality in s2n, and today I'm very pleased to announce the availability of mutual TLS (mTLS) authentication for Application Load Balancer. This is a fully managed, certificate-based identity system. It includes client certificate authentication powered by AWS Certificate Manager or the third-party certificate authorities that we've integrated with.
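The core of what mTLS adds, requiring the client to present a certificate the server trusts, can be sketched with Python's standard-library ssl module. The file paths in the comments are hypothetical, and with ALB the service manages all of this for you:

```python
import ssl

# Sketch of server-side client-certificate verification, the handshake
# behavior that distinguishes mutual TLS from ordinary one-way TLS.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.verify_mode = ssl.CERT_REQUIRED          # reject clients with no valid cert
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# In a real deployment you would also load the server identity and the
# CA that issues client certificates (paths are hypothetical):
# ctx.load_cert_chain("server.pem", "server.key")
# ctx.load_verify_locations("client_ca.pem")

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```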

And most importantly, this is built on an extremely secure library, avoiding pains like we had with the Heartbleed event. We're excited to bring this critical feature to you and make deploying mTLS within your environment very, very simple. Excited to see what you do with it.

Well, new features are good, but we've also focused on providing you with a better user experience. For example, we prioritized CloudFormation support so that you can use ALB within your infrastructure as code. We've invested in providing timely updates to the ALB controller, so that Kubernetes users on EKS can get new capabilities as soon as they launch on ALB. And lastly, we're investing in making your transition from IPv4 to IPv6 as seamless as possible, with all load balancers supporting dual-stack configuration and end-to-end IPv6 communication.

Today, we're also looking for new ways to reinvent networking as we know it. For years, we've dealt with IP addresses and route tables and subnets. But what if we could create a way for developers to build applications without ever having to see an IP address?

Last year, you may remember, we announced VPC Lattice for service-to-service communications. Lattice provides a simple, intuitive developer experience where teams just register services and define access policies, with no more network complexity. It really takes the concept of mesh networking to the next level.

And I'm also happy to let you know that earlier this year, we made Lattice generally available. Since launch, we've focused on security features, support for shared VPCs, and propagation of authentication identities. And the feedback that you've been giving us has allowed us to innovate really quickly in this area.

When you use Lattice, you start by creating a service network; that's simply declaring which of your VPCs should be able to use Lattice. Then you can create and enforce authorization policies to have control and visibility across all of your developers. And finally, you share the service network with developers who have applications and services that they want to deploy, and those can run across any AWS accounts or any shared VPCs. Your development teams can then register their services with the service network and they are done, all without even having to think about a traditional networking construct.

And importantly, as a network administrator, you still have all the control, all the visibility, and all the monitoring that you have with normal AWS networking. We're very excited to see where VPC Lattice is going.

One customer that recently deployed Lattice is Block, which chose VPC Lattice as the next-generation architecture to simplify service-to-service communications between all of their subsidiaries, such as Square, Tidal, and Afterpay. They said that it "helps us to maintain better security practices at scale without requiring our developers to completely refactor their applications."

Well, one more topic. We spoke about networking being about connections, but ensuring that data is protected during transmission has been another challenge over the centuries. We all know that the Greeks used the scytale, wooden rods and leather straps for secret messages. Not exactly sure how that worked. The Romans used substitution ciphers, shifting alphabetical letters, which is still a fun way of teaching cryptography today. Eventually, more advanced key-based ciphers emerged, including Enigma, which Alan Turing helped crack starting in 1939.

Then, after centuries of diplomatic and military use, cryptography emerged in IT around the 1970s to protect customer data, and it really paved the way for business and consumer adoption. More advanced public-private key exchange algorithms followed and now power most of the online transactions you do today.

At AWS, security is job zero, and our goal is to give you the most secure cloud environment that we possibly can. Our security building blocks provide a strong foundation for network security. And we've built the largest network of security partners, from companies you already know and trust.

I want to give you a little look behind the scenes at some of the things we're doing across the AWS network to mitigate threats. Our scale gives us broad visibility into threats to cloud security, and knowledge that we can continually reinvest into our infrastructure.

In fact, every day and every hour, we detect and successfully thwart cyberattacks. In recent years, the threats have grown more complex as cybercrime has continued to escalate. And today I can give you an inside look at a service that we built internally, called Project Mad Pot.

Mad Pot deploys tens of thousands of decoys across our infrastructure to lure threats and gain intelligence. It observes over 100 million interactions daily, flagging about 500,000 as malicious. That's incredible real-time insight into what's happening on the internet on any given day. It then analyzes these patterns, identifies the attacks, and automatically blocks identified threats by sharing that information with AWS services such as GuardDuty, WAF, and Shield. And it alerts companies when their infrastructure is being abused, to disrupt the broader attack.

Of course, attackers keep evolving with new techniques but Mad Pot evolves faster. Mad Pot doesn't just gather the intelligence, it disrupts threats to maintain business as usual. These insights also improve preventative services like WAF, Shield, and Network Firewall, as well as the detection within GuardDuty and Security Hub.

And Mad Pot rapidly shares this data with the community too, taking down botnets, command-and-control centers, and DDoS attacks. To give you some idea of just how successful Mad Pot has been: in Q1 of 2023 alone, Mad Pot blocked over 1.3 million DDoS attacks, shared intelligence on nearly 1,000 command-and-control servers with hosting providers to have them taken down, and dismantled 230,000 layer 7 DDoS sources.

Mad Pot enables real-time threat disruption at tremendous scale, keeping AWS, your workloads, and the larger internet a little safer.

Well, while Mad Pot helps with perimeter detection, modern environments also require protection within the network itself. This is why we built AWS Network Firewall, a cloud-native service that scales on demand, unlike traditional firewalls. Today it's become a key building block for customers securing their deployments.

And I'd like to share a few of the recent updates we've made to AWS Network Firewall. First, we added the ability to decrypt and inspect TLS connections while keeping data secure. This lets you monitor inbound threats from the internet and other VPCs while still using TLS encryption.

And second, we integrated the Network Firewall with resource tagging to help you implement microsegmentation policies within your environment. And this integration automatically creates rules based on EC2 instances or network interface tags, enabling communication between approved groups while blocking all other traffic.
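The rule-generation idea behind that integration can be sketched simply: derive allow rules from tags so members of the same group can talk, and let everything else fall through to an implicit deny. The instance IDs and tag values here are hypothetical, and this is an illustration of the concept rather than the service's actual rule format:

```python
# Tag-based microsegmentation sketch: instance -> group tag (hypothetical).
instances = {
    "i-web1": "web",
    "i-web2": "web",
    "i-db1":  "db",
}

def generate_rules(instance_tags):
    """Allow traffic only between distinct members of the same group."""
    rules = []
    for src, src_group in instance_tags.items():
        for dst, dst_group in instance_tags.items():
            if src != dst and src_group == dst_group:
                rules.append(("allow", src, dst))
    return rules  # anything not listed falls through to an implicit deny

for rule in generate_rules(instances):
    print(rule)
# ('allow', 'i-web1', 'i-web2')
# ('allow', 'i-web2', 'i-web1')
```

The payoff is that adding a newly tagged instance updates the effective policy automatically, with no hand-written rules per instance.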

And finally, we introduced multiple-admin support via Firewall Manager to simplify complex policy management tasks, making it easier for you to manage at scale. AthenaHealth deployed Network Firewall across more than 120 accounts in just five days, and they actually reduced their cost of firewall inspection by 95%. Just an incredible outcome.

We also know that some of you want to run the firewalls that you may have used on premises. So with Gateway Load Balancer, we now support bringing your own firewall to the cloud without giving up any of the scalability or availability benefits that AWS offers. Gateway Load Balancer provides the native foundation for Network Firewall itself, but it also lets you run the partner appliances that you already trust and use on premises.

And whether you go fully native with our own firewall, bring your own, or use both, our goal is to give you flexibility and choice when securing your environment. In fact, this is actually the largest segment of the AWS Marketplace today, and all these providers and many more are available for you to leverage best-of-breed network solutions from all of these partners.

Another fast-growing area is zero trust. As we all moved to work from home during COVID, we saw a massive change: at AWS, we no longer have to VPN into our network, because everything is zero trust. The idea of zero trust is that every application and every user is authenticated, rather than having a trusted network that you VPN into.

And so in a zero trust model, each and every access request to applications, data, and resources is evaluated using identity, device posture, and other factors. These types of controls augment the network connectivity security that you may already have.

Last year at re:Invent, we announced AWS Verified Access to move beyond trusting networks and IP addresses, and to simplify the task of deploying zero trust networking within your organization. Verified Access evaluates requests against policies, with factors including user identity, device posture, and context. For example, you can require MFA authentication along with confirmed device compliance before allowing access to your application. No more relying on IP addresses alone, and it's incredibly simple to deploy.

Since we launched, we've also been focusing on a number of new features based on the feedback we've received from customers. For example, we have new integrations with trusted device partners like Jamf for device posture checks in access policies. We also have native integration with AWS SSO, and we offload policy enforcement to the edge.

The policy assistant also lets you simulate and troubleshoot policies before deploying, minimizing any errors you may have in your policies. And we use the Cedar policy language to provide human-readable yet powerful policy definitions. We're very excited about this service, and as many of you go on a zero trust journey, we think it's going to be an important part of that solution.
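The real service expresses these rules in the Cedar policy language; as a hedged illustration of the kind of decision such a policy encodes (the field names and rules here are entirely made up), a tiny Python sketch:

```python
# Sketch of a zero-trust access decision: every request is evaluated on
# identity, MFA, and device posture, with deny as the default.
def allow_access(request):
    return (
        request.get("user_authenticated", False)
        and request.get("mfa_passed", False)
        and request.get("device_compliant", False)
    )

print(allow_access({"user_authenticated": True,
                    "mfa_passed": True,
                    "device_compliant": True}))   # True

print(allow_access({"user_authenticated": True,
                    "mfa_passed": False,          # no MFA -> denied
                    "device_compliant": True}))   # False
```

Note there is no notion of a trusted source IP anywhere in the decision; that is the shift away from network-location-based access.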

Just like with firewalls, we also know that there are a bunch of partners that you trust, and we've been able to bring them to AWS Verified Access. It allows you to use the identity providers and endpoint device management services that you already use today. And that means you don't need to rebuild any of these systems from the ground up; you can just get up and running a whole lot faster.

Avalon recently deployed zero trust and used AWS Verified Access to quickly get it done, and they're now providing access to corporate applications in minutes, without using VPNs. So we just love the fact that we're able to simplify this journey for our customers.

Well, today we started our journey with some German printers and the ships of Venice, which showed us how networking is all about the connections that drive the exchange of ideas globally and reliably. We highlighted the impact of load balancing on the modern internet, and then looked back at the evolution of cryptography over the centuries as the foundation for secure communications.

All of these innovations contributed significantly to our modern world, and network technology has been an essential part of what we have today. This is why AWS is constantly investing to provide you with the highly scalable infrastructure that you need to continue this rapid pace of innovation.

So what will you build?

Thank you.
