Coinbase: Building an ultra-low-latency crypto exchange on AWS

"To f si 309 coinbase building an ultra low latency crypto exchange on aws.

So before we get started, I'll just ask for a quick show of hands: how many of you have purchased or sold a stock or a cryptocurrency on a website or an app? So, obviously, almost everyone.

My name is Joshua Smith. I'm a Senior Solutions Architect here at AWS. I'm going to be joined by Kevin Arthur and Yucong Sun from Coinbase, and we're going to go into the incredible story of high performance engineering that goes into making that happen and how they built this institutional exchange from the ground up.

First, I'll introduce some of the concepts of exchanges. I'll talk about the latency sensitivity levels we're going to be working through today and some AWS services that customers can use to reach those latency levels. Then I'll pass it over to Kevin from Coinbase, who will go into the cluster design and the flow of operations through the system for quick and reliable orders. Yucong from Coinbase will go deeper into running this in production at scale and achieving sub-millisecond latency on AWS, and not just latency but also security, resiliency, and deployment methods. Then we'll close with the future of the international exchange, low latency optimizations on AWS, and lastly some lessons learned from Coinbase.

So let's get started.

So quite simply, an exchange is a marketplace that facilitates safe and secure transactions between traders. As I mentioned earlier, when you buy a stock or cryptocurrency, you typically go to a portal or an app, and that is your broker. The exchange takes the millions of orders submitted by those brokers on behalf of different marketplace participants and looks for a match between the buy and the sell. Once the exchange finds a match for the broker, the transaction is completed by the exchange or broker, and the completed transaction is then returned to you.
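To make the matching idea concrete, here is a minimal, hypothetical price-time priority matcher in Python. This is not Coinbase's engine; the `Order` fields and the single-symbol book are assumptions for illustration only.

```python
# Minimal price-time priority matching sketch (illustrative only, not Coinbase's engine).
from dataclasses import dataclass

@dataclass
class Order:
    order_id: str
    side: str      # "buy" or "sell"
    price: float   # limit price
    qty: int

class SimpleBook:
    def __init__(self):
        self.bids: list[Order] = []  # resting buys, kept best (highest) price first
        self.asks: list[Order] = []  # resting sells, kept best (lowest) price first

    def submit(self, order: Order) -> list[tuple]:
        """Match an incoming limit order against the book; rest any remainder."""
        opposite = self.asks if order.side == "buy" else self.bids
        trades = []
        while order.qty > 0 and opposite:
            best = opposite[0]
            crosses = order.price >= best.price if order.side == "buy" else order.price <= best.price
            if not crosses:
                break
            fill = min(order.qty, best.qty)
            trades.append((order.order_id, best.order_id, best.price, fill))
            order.qty -= fill
            best.qty -= fill
            if best.qty == 0:
                opposite.pop(0)
        if order.qty > 0:  # rest the remainder on its own side of the book
            same = self.bids if order.side == "buy" else self.asks
            same.append(order)
            same.sort(key=(lambda o: -o.price) if order.side == "buy" else (lambda o: o.price))
        return trades

book = SimpleBook()
book.submit(Order("a1", "sell", 100.0, 5))
print(book.submit(Order("b1", "buy", 101.0, 3)))  # -> [('b1', 'a1', 100.0, 3)]
```

Python's stable sort preserves arrival order within a price level, which is what gives the "time" part of price-time priority in this toy example.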

I bring this up because it's important to talk about why latency matters. We want two things from latency: we want low latency, but we also want incredibly predictable latency. Low latency ensures that marketplace participants are able to compete fairly and ensures an efficient market. Less time between seeing a price and acting on it means that traders are able to execute quickly and fairly, and they're more likely to continue to use the exchange. We want stable latency because traders need to know how long until a trade will execute and how long they have to cancel an order, to get in and get out. They're unlikely to continue using an exchange if it's unpredictable. And this image illustrates fairness: it's very important that every participant gets the same low level of latency, without any head starts or unfair advantages for specific clients or traders.

In terms of latency goals, we can really separate this into a few different buckets. We have less latency sensitive applications, typically several hundred milliseconds all the way up to several minutes, even 10 or 15 minutes. This is fine for reporting applications, some types of risk evaluations, basically anything asynchronous that's run daily or hourly. For this bucket, the time equivalent we're looking at is about the time it takes light to travel from the sun to the earth. That's kind of a silly metric, but it's important because the speed of light is the maximum speed of data through fiber optic cables, and I'm going to come back to that when we go into ultra low latency.

For low latency applications, we're looking at a couple of hundred milliseconds down to a few milliseconds, which is typically the retail benchmark for screen trading, so human in the loop: you see the trade, you act on it. The time equivalent here is light traveling across the earth. But when we get into ultra low latency, we're talking about sub-millisecond latency, triple digit and sometimes even double digit microseconds. This is necessary for successful institutional high frequency trading, and it's the time equivalent of light traveling just 100 to 150 miles, a few hundred microseconds.

For a reference of how fast this is: the blink of an eye is about 300 milliseconds, so this is about 1,000 times as fast.

It's not just latency; we also need to address throughput for both steady state and spiky workloads, because our workload depends on market behavior, outside circumstances that we cannot control. For reliability, we're all familiar with applications that have 24/7 requirements: streaming video, industrial, health care. But here, those 24/7 operations have to hit a specific p99 target through sustained, spiky workloads. We also need agility for easy and fast deployments to keep the exchange up to date; we constantly have to keep improving and adding features without affecting that performance and reliability. And then lastly, as I mentioned earlier with fairness, we can't have unfair market access where one client is faster than another. This could erode trust in the exchange, and it could even mean regulatory risk.

So we'll start by going over some of the key AWS services and capabilities that our customers use to optimize latency and design these high performance, secure applications in the cloud, and we'll introduce some of the key services that Coinbase is going to go deeper into.

The first thing when we talk about latency is actually processing the data: where is the order matching and all of these systems happening? We generally need very high single-threaded CPU performance, and AWS offers a broad range of options. Any instance family containing a lowercase z indicates our highest level of CPU performance. The options from there vary depending on what you want to pair it with: we have networking optimized instances, memory optimized instances, instance store volumes with NVMe disks, and newer combinations like our X2iezn instances that are both memory intensive and networking optimized.

For Coinbase's use case, Coinbase chose to pair CPU performance with large local instance store volumes. EC2 z1d instances give them a cost effective instance that focuses on CPU frequency, offers solid memory bandwidth, and provides up to 1,800 gigabytes of NVMe storage. This allows them to buffer locally on disk quickly in the processing pipeline and then batch and replicate writes asynchronously without affecting the hot path of the application.

For networking, when we talk to financial services customers, we often hear about multicast. Transit Gateway provides multicast, but we also talked about distances and latency, and the Transit Gateway becomes a hop in our network, which we don't want. For less latency sensitive use cases, like those market research tools, this might be acceptable. But for lower latency, we want to keep as much traffic point to point as possible.

For lower latency VPC to VPC, we use VPC peering as the lowest latency logical connectivity solution. To go even further into deeper optimizations, we really need to start to optimize the operating system, the drivers, the runtime, and the application, and Coinbase is going to go deep into that later today. But one thing I really want to level with you on here is that the key aspect of improving network performance is decreasing the physical distance between two points.

So how can we accomplish this in the cloud? Let's say we have a cluster of instances, a leader and several followers, inside an availability zone. One of them has been elected leader and is sending out requests to its followers. We need this to be extremely low latency. Even though they're in the same availability zone, Amazon natively spreads workloads across different networking subsystems, server racks, and in some cases even data centers within an availability zone. So maybe we get lucky, like in this example, and our leader and one of our followers end up on the same server rack or the same networking subsystem and hit our latency target, but that may not be the case for the others, and now we're running into issues. In terms of cluster management itself, maybe we run into consensus problems, maybe we have leader election failures. But if we zoom out to the functionality of the exchange itself, maybe we start to fail to match orders and we impact our customers.

What we can do is use an AWS cluster placement group. This tells EC2 that we want the instances to be started up close together inside the AZ. It packs them together and enables workloads to achieve our low latency network performance for highly coupled node-to-node communication.
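As a rough sketch of what that looks like against the EC2 API, the boto3 calls below create a cluster placement group and launch z1d instances into it. The group name, counts, AMI ID, and subnet are placeholders, not Coinbase's actual configuration.

```python
# Sketch: create a cluster placement group and launch z1d instances into it (boto3).
# The AMI, subnet, and sizes are placeholders; adjust for your own account and region.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Cluster strategy packs instances close together within a single AZ.
ec2.create_placement_group(GroupName="exchange-cluster", Strategy="cluster")

resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",          # placeholder AMI
    InstanceType="z1d.12xlarge",               # high-frequency CPUs plus local NVMe
    MinCount=3,
    MaxCount=3,
    SubnetId="subnet-0123456789abcdef0",       # placeholder subnet in one AZ
    Placement={"GroupName": "exchange-cluster"},
)
print([i["InstanceId"] for i in resp["Instances"]])
```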

Now that we've addressed some of these networking aspects, let's talk about ingress security. AWS Shield is a managed DDoS protection service that can automatically scrub bad traffic at specific layers, protecting against SYN floods, UDP floods, and other types of reflection attacks. Shield Advanced adds enhanced visibility into DDoS events, a Shield Response Team that can be contacted for escalation during an attack, and cost protection for impacted services. But most importantly for our latency story today, automatic mitigations are applied inline to protect AWS services, and this means no latency impact. It's DDoS protection without affecting the client's connectivity by adding unnecessary latency, the way something like a firewall would.

And then lastly, we need a place to store our data. Amazon Aurora is a global scale relational database with full MySQL and PostgreSQL compatibility, and we mentioned earlier how key predictable, reliable latency is. Amazon Aurora separates compute and storage by using a distributed storage layer that spans multiple AZs. As a result, storage scales independently from compute: it automatically scales with the amount of data without requiring adjustment or replacement of the writer or reader nodes, and without adding latency.
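For reference, here is a hedged boto3 sketch of provisioning an Aurora PostgreSQL cluster and a writer instance. The identifiers, instance class, and credential handling are placeholders; a real deployment would typically do this through infrastructure as code and a secrets store rather than inline values.

```python
# Sketch: provision an Aurora PostgreSQL cluster plus one writer instance (boto3).
# Identifiers, instance class, and the password are placeholders for illustration only.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_cluster(
    DBClusterIdentifier="exchange-orders",
    Engine="aurora-postgresql",
    MasterUsername="exchange_admin",
    MasterUserPassword="change-me-immediately",  # in practice, source this from a secrets manager
)

rds.create_db_instance(
    DBInstanceIdentifier="exchange-orders-writer",
    DBClusterIdentifier="exchange-orders",
    Engine="aurora-postgresql",
    DBInstanceClass="db.r6g.2xlarge",
)
```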

So with that, you now have an introductory overview of AWS services for low latency. I'm going to turn it over to Kevin Arthur from Coinbase, and he's going to take us into how they were able to leverage these to build the Coinbase International Exchange.

Hello, everyone. Thanks, Joshua. I'm going to give a quick overview of the Coinbase International Exchange. It was launched in May 2023 and operates 24/7. It's regulated by the Bermuda Monetary Authority and has traded more than $15 billion in volume so far. Its core systems are built to handle 100K messages per second. The trading systems are part of the core functionality of the exchange, as are the gateways, and this is all built on the Aeron messaging framework. The trading systems are ultra low latency, single threaded, and deterministic. They are clustered for resilience, meaning they replicate their state across different machines for resiliency. The gateways are connected to via FIX, REST, and WebSockets. The Aeron messaging framework is low latency, UDP based, fast and reliable, and it offers replication and archiving.

The basic architecture of the system is comprised of client gateways and two types of Raft clusters, which I'll explain shortly. The order gateways accept orders from clients, and the market data gateways send out the state of the order book to all participants. The order management system (OMS) handles risk and portfolio balance, and the matching engine cluster handles active orders and matching orders to make trades. The OMS and matching engine operate using Raft consensus. A Raft cluster consists of a single leader with an even number of followers. All cluster members run the same code on the same input and produce the same output.

The leader sends each message it receives to the followers and waits for a majority of the cluster to acknowledge it. All data that the clusters need comes from the cluster's input, and the cluster doesn't make remote calls during message processing. This allows us to reconstruct the state of the system at any point in time on any machine in the system, and we rely on this to reduce the amount of traffic we have to send over the network. Because Raft clusters have to make network hops to ensure the resiliency of the system, they add latency to the overall request timeline and can lead to a higher response time.
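To illustrate the idea (this is not Coinbase's Aeron or Raft code), here is a minimal Python sketch of a leader that replicates each input to followers, waits for a majority acknowledgment, and only then applies the message to a deterministic state machine. The message shape and the balance example are invented for illustration.

```python
# Illustrative sketch of the replicate-then-apply flow described above.
# Not Raft or Aeron Cluster; it only shows majority-ack before applying a message
# to a deterministic state machine, where the same inputs always yield the same state.

class DeterministicStateMachine:
    def __init__(self):
        self.log = []       # applied inputs, in order
        self.balances = {}  # example state derived purely from the inputs

    def apply(self, msg: dict) -> dict:
        self.log.append(msg)
        acct = msg["account"]
        self.balances[acct] = self.balances.get(acct, 0) + msg["delta"]
        return {"account": acct, "balance": self.balances[acct]}

class Follower:
    def __init__(self):
        self.state = DeterministicStateMachine()

    def replicate(self, msg: dict) -> bool:
        self.state.apply(msg)                 # followers apply the same input identically
        return True                           # acknowledgment back to the leader

class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.state = DeterministicStateMachine()

    def handle(self, msg: dict):
        acks = 1                              # the leader itself counts toward the majority
        for f in self.followers:
            if f.replicate(msg):              # in reality this is a network hop: send + wait for ack
                acks += 1
        if acks * 2 > len(self.followers) + 1:
            return self.state.apply(msg)      # only apply once a majority has the message
        raise RuntimeError("no quorum; message not applied")

leader = Leader([Follower(), Follower()])
print(leader.handle({"account": "alice", "delta": 100}))
```

Because every replica applies the same input stream, any replica's state can be reconstructed later by replaying the input, which is the property Kevin describes above.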

The cluster placement groups that we use minimize this impact and allow the shortest possible network transit for each individual request. We've managed to keep our full internal latency p99 below one millisecond. I'm going to walk you through a round-trip latency for our system. An order first enters our system through a client gateway, like FIX or REST. After basic validation, it gets forwarded to the leader of the order management system.

The order management system cluster sends the message to its followers. Once the followers acknowledge the message, the OMS processes it and performs risk and portfolio checks. After it passes validation, it gets sent to the matching engine. The matching engine sends the message to its followers, who acknowledge it, and the matching engine checks the order books and finds a match if possible. The matching engine then sends the result of the matching, such as a match or a new resting order, back to the OMS.

The OMS gets consensus again, processes the results from the matching engine, and updates its own state. Finally, the OMS sends the results back to the gateway, which sends them back to the client. The full order round trip is lengthened by the cluster process, but the resiliency it adds is critical. Even with the additional network hops, we're able to achieve sub-millisecond latency; outliers are under a millisecond, and most of this is dominated by the network. There are 10 hops in the round trip in total, and about 80% of that is network. The processing time is in the single-digit microseconds, and that includes both the OMS and the trading system.
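As a simple illustration of how a latency budget like that can be checked, the snippet below times each order round trip and reports p50/p99 in microseconds. The `send_order` function and the sleep are stand-ins, not Coinbase's tooling.

```python
# Sketch: measure order round-trip latency and report percentiles in microseconds.
# `send_order` is a hypothetical stand-in for a real gateway round trip.
import statistics
import time

def send_order(i: int) -> None:
    time.sleep(0.0003)   # pretend round trip of roughly 300 microseconds

samples_us = []
for i in range(1000):
    start = time.perf_counter_ns()
    send_order(i)
    samples_us.append((time.perf_counter_ns() - start) / 1_000)

cuts = statistics.quantiles(samples_us, n=100)   # 99 cut points; index 98 is the 99th percentile
print(f"p50={statistics.median(samples_us):.0f}us  p99={cuts[98]:.0f}us")
```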

Next, I want to talk about the development experience on AWS. Using AWS, we're able to provide codified environments for all of our developers. Our developers are able to allocate and create new environments based on what they're testing that day or based on feature branches, for example. These stacks can have the same layout as production and are automatically scaled down every night when we don't need them. We have a central orchestrator that controls services and versions, and the replicated system must maintain the same behavior between versions in order for the rolling deploy to work. When we move from version to version, the replicated state machine has to keep track of which version it is supposed to be running, even as it transitions to the new version.
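Here is a hedged sketch of what that version tracking can look like: each node knows what its binary supports and what the cluster has agreed to run, and new logic only switches on once the whole cluster is ready. The version numbers and the "self-trade prevention" feature name are invented for illustration, not Coinbase's actual mechanism.

```python
# Sketch: version-gating new behavior in a replicated state machine during a rolling upgrade.
# Version numbers and feature names are illustrative only.

SELF_TRADE_PREVENTION_VERSION = 7   # hypothetical feature introduced in version 7

class VersionedStateMachine:
    def __init__(self, binary_version: int):
        self.binary_version = binary_version   # what this node's code supports
        self.active_version = None             # what the cluster has agreed to run

    def on_cluster_version(self, version: int) -> None:
        # The cluster only raises the active version once every member has been
        # rolled to a binary that supports it; until then, old behavior stays on.
        self.active_version = min(version, self.binary_version)

    def apply(self, order: dict) -> dict:
        if self.active_version is not None and self.active_version >= SELF_TRADE_PREVENTION_VERSION:
            return self.apply_with_self_trade_prevention(order)
        return self.apply_legacy(order)

    def apply_legacy(self, order: dict) -> dict:
        return {"order": order, "stp": False}

    def apply_with_self_trade_prevention(self, order: dict) -> dict:
        return {"order": order, "stp": True}

node = VersionedStateMachine(binary_version=7)
node.on_cluster_version(6)   # mid-rollout: keep old behavior so all replicas stay identical
print(node.apply({"id": "o-1"}))
```

The point of the gate is determinism: every replica, old or new binary, must produce identical output for the same input until the whole cluster flips the feature on together.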

Blue/green deployments are used for the gateways. They are essential to keep customers connected for as long as possible while we move from one gateway version to the next. I want to talk a little bit about our rolling deploys now. At steady state, our system operates normally, but during deploys we roll each service by creating new services and then shutting down the old ones once the new ones are healthy.

For the clusters, we actually don't use blue/green deploys; we roll them one by one. We shut down the member that's running the old version and bring it up with the new version while the cluster keeps running. The cluster has to maintain the same behavior, in a deterministic fashion, between versions, so this means we have to guard new behaviors as we move from version to version and then enable them later. Our full system deployment looks like this. Next, I want to hand it off to Yucong, who will talk about building a cloud native exchange on AWS.

Alright, thank you, Kevin, for the detailed explanation of the design of the international exchange. Next, I want to take you all through a journey of how we actually launched the Coinbase International Exchange. First, we'll talk about the environments that we had to build for it. Then we'll talk about the concept of cloud native design and why it was a fundamental factor in the success of the Coinbase International Exchange. Then we'll talk about the production environment specifically, because obviously a lot of people are interested in what production looks like, and then we'll touch on the data pipeline a little bit.

When we first started thinking about developing the Coinbase International Exchange, this is usually what pops into mind. Obviously, you're going to have a dev environment and you're going to have your production environment. The dev environment runs untested software, the production environment runs tested software, and somehow you make the software bug free and get it into production successfully. But the first issue we ran into was actually a very interesting one: after COVID, we switched to all-remote development, and coordinating between all the different developers is very hard. So we actually had to create a personal dev environment for every single developer, so that they can work on their features independently without worrying about integration or interference with each other.

So there you have it: a personal environment that's easy to create, easy to destroy, and used for daily development. Next, you have your typical UAT environment. That's integration testing that runs nightly, making sure that the features from every developer can work together successfully. Another thing to test here is forward and backward compatibility. Like Kevin said, we only ever have one production environment. If we screw up there, nothing will save us: people are going to lose money, and a lot of bad things will happen. So we have to make sure that everything we deploy into production has perfect forward and backward compatibility. And we also test our rolling upgrade mechanism and all kinds of other things there.

The next one is sandbox. Sandbox is a concept in a lot of financial services: it's publicly accessible, because for a marketplace to function, it's not enough for your service to have a feature implemented. Your market makers, your applications, your clients often need to use those features extensively beforehand to be confident they can put real money on top of them. That's why we have to build a sandbox environment that is publicly accessible, has the latest software features, and obviously trades with paper money.

Finally, we land on the production environment, which is where all your money and all your important features need to be. You have to use the best machines and the best configuration, and it needs to be highly secure. It's like a unicorn environment that you have to maintain and be very careful operating. That's why we realized that not only do you need a production environment, what you really need is another kind of production environment, one that is not real production but what we call shadow production.

This shadow production environment runs a newer version of the software in the background, but it gets the same input as the production environment. You basically mirror your traffic into this new environment first and verify that all the new behaviors work correctly over there. You observe for a while before you actually switch over. You don't see this talked about a lot, but it's a real mechanism to safeguard the production environment.

So to sum it up, creating the Coinbase International Exchange is not just two environments, it's more like 20 environments. Sooner or later you're going to run into a complexity explosion, meaning all those environments run on different accounts, different hardware, different regions. How are you going to manage them all? That's why we had to adopt the cloud native software design principle from the beginning. It's really the key factor in the success of the Coinbase International Exchange.

So let me explain how we apply the cloud native design principle to our specific scenario. We break down our software into three layers: what we call the application layer, the orchestration layer, and the infrastructure layer. If I put it into a graph, this is what we see on the right side. It's a little bit complex, so I will walk you through it from the bottom up.

First, we look at the infrastructure layer. The infrastructure layer worries about actually provisioning AWS resources. The key thing here, which I think most people are already familiar with, is that you want fast, repeatable provisioning, not only for provisioning but for deprovisioning as well. You also want a template that applies to different accounts, different regions, different hardware. This is where you use a technique like infrastructure as code to be able to provision on demand. But also, remember I said that different environments have different requirements. That's where you put different features into your template and only turn them on where they actually matter, as in the sketch below.
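A small, hypothetical sketch of that idea: one environment template with per-environment knobs, where the expensive features (cluster placement groups, capacity reservations, large instance types) are only enabled for the environments that need them. Names, sizes, and flags are invented; a real setup would live in IaC tooling such as CDK or Terraform.

```python
# Sketch: one environment template, different knobs per environment.
# Names, instance sizes, and flags are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class EnvSpec:
    name: str
    instance_type: str
    cluster_placement_group: bool   # expensive locality guarantee, prod-style environments only
    capacity_reservation: bool      # pay for guaranteed capacity only where it matters
    scale_down_nightly: bool        # personal/UAT stacks get torn down when idle

ENVIRONMENTS = {
    "personal": EnvSpec("personal", "m5.large",     False, False, True),
    "uat":      EnvSpec("uat",      "m5.2xlarge",   False, False, True),
    "sandbox":  EnvSpec("sandbox",  "z1d.2xlarge",  False, False, False),
    "shadow":   EnvSpec("shadow",   "z1d.12xlarge", True,  False, False),
    "prod":     EnvSpec("prod",     "z1d.12xlarge", True,  True,  False),
}

def provision(env_name: str) -> EnvSpec:
    spec = ENVIRONMENTS[env_name]
    print(f"provisioning {spec.name}: {spec.instance_type}, "
          f"placement_group={spec.cluster_placement_group}, reservation={spec.capacity_reservation}")
    return spec

provision("personal")
provision("prod")
```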

Moving up a layer, once you have the infrastructure layer, what you do is abstract your compute resources, your databases, all your AWS resources, into compute pools. Having this layer allows us to have a single orchestrator that oversees the scheduling and runtime of the actual applications onto different compute pools. What this means is that instead of having the application developer worry about "I need to run in this account, I need to run on this version of the infrastructure," they just pick a pool name, and the orchestrator hides all of that behind the scenes: it picks which machine you're actually going to run on and which network you're going to be in.

That's one direction, from the application side down to the infrastructure side. It also means the infrastructure presents a uniform interface to the application: instead of hard coding IPs, you use a mechanism like service discovery, and you use similar mechanisms to model your operations. We're going to see that in a little bit as well; there's a small sketch of the idea below.
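Here is a hedged sketch of that orchestration-layer idea: applications ask for a pool by name and resolve peers through a registry rather than hard-coded IPs. The pool names, hosts, and in-memory registry are invented for illustration; in practice the registry could be backed by something like AWS Cloud Map or the orchestrator's own store.

```python
# Sketch: pool-based scheduling plus name-based service discovery instead of hard-coded IPs.
# Pool names, hosts, and the in-memory registry are illustrative only.

COMPUTE_POOLS = {
    "hot-path": ["10.0.1.10", "10.0.1.11", "10.0.1.12"],  # instances inside the placement group
    "aux":      ["10.0.2.20", "10.0.2.21"],               # everything latency-insensitive
}

SERVICE_REGISTRY: dict[str, list[str]] = {}   # service name -> endpoints

def schedule(service: str, pool: str) -> str:
    """Orchestrator picks a host from the requested pool and registers the endpoint."""
    hosts = COMPUTE_POOLS[pool]
    host = hosts[hash(service) % len(hosts)]
    SERVICE_REGISTRY.setdefault(service, []).append(f"{host}:9000")
    return host

def discover(service: str) -> list[str]:
    """Applications resolve peers by name; they never see accounts, subnets, or raw pools."""
    return SERVICE_REGISTRY.get(service, [])

schedule("matching-engine", "hot-path")
schedule("market-data-gateway", "hot-path")
schedule("reporting", "aux")
print(discover("matching-engine"))
```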

Finally, we move to the top of the stack, which is the application layer. In the modern way of designing applications, you want your application to be composable. What that means is that you should be able to test your business logic without relying on the whole infrastructure being set up: if you need some piece of data, that data can either be on your own machine, in a database, or abstracted away behind some other API interface.

Having this abstraction keeps your application business logic stateless, meaning you can run multiple copies if you want to scale horizontally, but it also means you can test your business logic without waiting for all the other pieces to come together. A minimal sketch of this kind of abstraction follows.
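This sketch assumes a hypothetical `OrderStore` interface: the business logic depends only on the interface, so tests and personal environments can use an in-memory store while production wires in a database-backed one. The interface and method names are invented for illustration.

```python
# Sketch: business logic written against an interface so it can run without real infrastructure.
# `OrderStore` and its methods are hypothetical names used only for illustration.
from typing import Protocol

class OrderStore(Protocol):
    def save(self, order_id: str, order: dict) -> None: ...
    def load(self, order_id: str) -> dict | None: ...

class InMemoryOrderStore:
    """Used in unit tests and personal environments; no database required."""
    def __init__(self):
        self._orders: dict[str, dict] = {}
    def save(self, order_id: str, order: dict) -> None:
        self._orders[order_id] = order
    def load(self, order_id: str) -> dict | None:
        return self._orders.get(order_id)

def cancel_order(store: OrderStore, order_id: str) -> bool:
    """Pure business logic: behaves the same against any OrderStore implementation."""
    order = store.load(order_id)
    if order is None or order.get("status") == "filled":
        return False
    order["status"] = "cancelled"
    store.save(order_id, order)
    return True

store = InMemoryOrderStore()
store.save("o-1", {"status": "open"})
print(cancel_order(store, "o-1"))   # True
```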

OK. So coming back to cloud native software design, I think the usual journey for a company applying cloud native design is from the bottom up, right? That's also the order I walked you through.

"But in our case, I think we have the benefit of burning the rope from both ends. Basically, our application because we're built on the AWS platform is enhanced, modular, and actually forces us to pick a modular design all the way to the infrastructure. Also, because of the specialty of our environment, we use a lot of really expensive features and you obviously cannot use them for every single development. So it forces us to basically think of this problem from the beginning to the end.

So again, I think applying this cloud native software design principle was really critical to the success of the Coinbase International Exchange.

Okay, so now switching gears a little bit, I know you are probably all curious about what it actually takes to run a high profile, high stakes environment such as the Coinbase International Exchange production environment.

Designing a production environment often feels like threading a needle. You have to make tradeoffs along the way, and as the panel here shows, you're going to have to make tradeoffs between different aspects.

One principle I always hold to my heart is that if you lose something in one aspect, you have to gain it back in another. So today I'm going to talk about three different aspects - performance, reliability and security - and see how we make the tradeoffs between them.

First, we look at performance. I'll talk about capacity planning, network setup, and OS tuning. The reason I bring up capacity planning is that a high performance system often requires a specialized setup, and that specialized setup has implications for a bunch of other decisions.

For example, as Kevin said earlier, our processing time is heavily dominated by the networking latency. So in order to have a high performance environment, you're gonna have to deploy all your services in a single AZ in the cluster placement group. There's no way around it.

But like I said, if you lose something there, you have to gain it back in a different way. That also means the cluster placement group is not free - you cannot just use that and forget all about the network latency.

First, it must be able to fit all your hot path components. So like Kevin explained, the round trip from the order entry all the way to the engine and back involves 10 network hops. So all those components have to live together inside one cluster placement group and need to be available whenever you deploy. That's the first thing.

Second, if you need to do a blue/green deployment, that basically means you need to double your capacity requirements. And then during the deployment, you're going to need to have like 2x the capacity available so you can actually deploy the new software, wait for it to be healthy, and then switch over. Even for cluster components, if you do a rolling upgrade, that just means you need to have N spares available for you to upgrade.

So all of this plays together: the cluster placement group, the hardware, the region availability. It's really an NP-hard problem to solve, and we couldn't do it without a lot of help from the AWS team. That also informed our decision to use z1d instead of other instance families, which may not be in the region we needed or have the availability that we requested.

But having z1d also brings questions - do you have to worry about NUMA, do you have to worry about affinity? All of that fits together. That's what it actually took to inform our plan, to pick the location that we picked, and to design the whole system around it.

One callout I want to have here is AWS has a really good feature called Capacity Reservations. A lot of people don't really use it but in our case, that's critical to our success to be able to actually deploy the system on a regular basis. We don't want to create a production environment that takes years to upgrade, so we're able to do it very frequently.
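Here is a hedged boto3 sketch of an On-Demand Capacity Reservation like the one described. The AZ, instance type, and count are placeholders, not Coinbase's real numbers; such a reservation can also be associated with a cluster placement group if needed.

```python
# Sketch: reserve z1d capacity ahead of a deployment so spare instances are guaranteed (boto3).
# AZ, instance type, and counts are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

reservation = ec2.create_capacity_reservation(
    InstanceType="z1d.12xlarge",
    InstancePlatform="Linux/UNIX",
    AvailabilityZone="us-east-1a",   # same AZ as the cluster placement group
    InstanceCount=4,                 # headroom for blue/green or rolling-upgrade spares
    EndDateType="unlimited",         # keep the reservation until explicitly cancelled
)
print(reservation["CapacityReservation"]["CapacityReservationId"])
```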

Okay, so we talked about compute. Now let's focus on the network architecture a little bit. On the bottom here you're gonna see a diagram showing how our production VPC is designed.

I'll walk you through it - on the left side is the internet, which is where users send in traffic through the public load balancer. One thing to call out is that our design deliberately isolates the production VPC from other services, because in order to improve the security posture of your service, you have to control access in and out.

So we run our hot trading path in the cluster placement group, but we have many other services that are not latency sensitive yet still need to live in the production VPC. Those we run outside of the cluster placement group.

And one of the design goals from working with the top notch security team at Coinbase is that we aim for no internet access by default in our production VPC. There are a lot of benefits if you don't allow any internet access - it becomes very hard for somebody to exfiltrate data out of your system even if it were under attack.

So the best way to achieve that is we basically route all egress through a firewall VPC. But also that firewall will introduce latency. So you gain some there but lose some.

Another good AWS feature in order to have no default internet access is you have to use AWS service endpoints, and then use AWS PrivateLink to talk to services in other VPCs. So all in all, I think running on AWS allows us to have access to all these awesome features so that we can build our system into a secure design, on par with what we can get in a data center design.
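As a hedged example of the "no internet by default" building blocks, the boto3 call below creates an interface VPC endpoint so instances reach an AWS service over PrivateLink instead of an internet gateway. The VPC, subnet, security group IDs, and the chosen service are placeholders.

```python
# Sketch: add an interface VPC endpoint so traffic to an AWS service stays on PrivateLink (boto3).
# The VPC, subnet, and security group IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

endpoint = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.secretsmanager",  # example AWS service endpoint
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,   # resolve the normal service DNS name to the private endpoint
)
print(endpoint["VpcEndpoint"]["VpcEndpointId"])
```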

This is really good - we're gonna come back to this diagram to talk about the security features we put in here, but this is the general concept of our production network.

Next, I'll touch a little bit on hardware tuning. If you watch this presentation and think "Oh I can do sub 1ms too" - you can, but there's a lot of things involved to actually get to that stage even if you manage to reduce the network latency.

First of all, running on z1d - the reason we picked z1d is it's not the shiniest instance, but it has nice features. We don't have to worry about NUMA, meaning you can actually use all the cores without worrying about differences between cores.

You do have to turn off hyperthreading, make sure you have more L2 cache available. Then one of the other features is local NVMe disks. Like Kevin explained before, our system is designed as a Raft cluster that has to write to disk to make sure the message was persisted before processing it. So you can imagine without fast disks, everything will grind to a halt.

Besides that, another important part of tuning the production system is - I just have this slide to show you that basically anything you can find on the internet about how to tweak Linux for low latency, you can still do it on EC2.

The concept is simple - try to isolate your hot threads from your cold threads. You really do have to do all of this to get the latency low enough; it improves p99 latency by an order of magnitude or more, so it's really necessary.

It's not free - it's not that you take z1d and that's it, low latency. You have to account for this in the whole software design: it needs a way to identify your hot threads and something in the background to keep other threads off those cores.

So we really put a lot of work into this to be able to achieve the latency numbers.
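To make the hot/cold thread idea concrete, here is a minimal Linux-only Python sketch that pins a hot thread to a dedicated core and keeps housekeeping threads on the others. The core numbers are arbitrary and assume cores have already been reserved at the OS level (for example via isolcpus or cpusets); real tuning also covers IRQ affinity and more.

```python
# Sketch: pin a hot thread to a dedicated core and keep housekeeping threads off it (Linux only).
# Core IDs are arbitrary examples; real setups reserve cores via isolcpus/cpusets and tune IRQs too.
import os
import threading

HOT_CORE = {2}             # core reserved for the latency-critical loop
COLD_CORES = {0, 1}        # everything else runs here

def hot_loop():
    os.sched_setaffinity(0, HOT_CORE)       # pid 0 = the calling thread
    for _ in range(10_000_000):             # placeholder for the busy-polling hot path
        pass

def background_work():
    os.sched_setaffinity(0, COLD_CORES)     # keep logging, metrics, archiving off the hot core
    pass                                    # placeholder for housekeeping work

t_hot = threading.Thread(target=hot_loop)
t_cold = threading.Thread(target=background_work)
t_hot.start(); t_cold.start()
t_hot.join(); t_cold.join()
```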

Okay, so we saw the performance aspect, but now let's talk about reliability.

I know when we talk about single AZ placement groups, a lot of bells start ringing, at least for people like me! You lose some reliability guarantee, so you have to gain some back.

I want to talk about the things we think are most critical for us. The first one is we choose to build a real-time monitoring and alerting pipeline. Instead of exposing metrics to some third party script, we found it's really critical to have our services directly push to a central location in our software stack to concentrate all the data together.

On one hand our system has very few nodes. On the other hand, a lot can happen in a second or minute - by the time your monitoring system realizes there's a problem, it's too late. You typically cannot build a control loop based on non-real-time monitoring data.

But if you don't do this, then the typical solution is "Oh you have to wait" when doing a blue/green deployment. You add a wait here, a wait there, and sooner or later you end up with a 4 hour deployment. And that's not good for anybody.

So I think real-time monitoring and alerting is critical. The concept is simple, there are lots of off-the-shelf solutions, but this is pretty critical for our success.
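A hypothetical sketch of the push model described above: each service sends small metric datagrams to a central aggregator, which keeps a short rolling window and can alert within seconds. The port, payload format, and threshold are made up, and a real deployment would use an off-the-shelf stack.

```python
# Sketch: services push metrics to a central aggregator over UDP; the aggregator keeps a
# short rolling window so alerts can fire within seconds. Port, format, and threshold are made up.
import json
import socket
import time
from collections import deque

AGGREGATOR = ("127.0.0.1", 9125)

def push_metric(name: str, value: float) -> None:
    """Called from each service: fire-and-forget datagram, nothing blocks the hot path."""
    payload = json.dumps({"name": name, "value": value, "ts": time.time()}).encode()
    socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(payload, AGGREGATOR)

def run_aggregator(window_seconds: float = 10.0, p99_alert_us: float = 1000.0) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(AGGREGATOR)
    window: deque[tuple[float, float]] = deque()     # (timestamp, latency in microseconds)
    while True:
        metric = json.loads(sock.recv(65535))
        now = time.time()
        window.append((metric["ts"], metric["value"]))
        while window and now - window[0][0] > window_seconds:
            window.popleft()                          # drop samples older than the window
        values = sorted(v for _, v in window)
        p99 = values[int(0.99 * (len(values) - 1))]
        if p99 > p99_alert_us:
            print(f"ALERT: rolling p99 {p99:.0f}us exceeds {p99_alert_us:.0f}us")
```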

Second, like Kevin mentioned, we decided to adopt an Infrastructure-as-Code concept from the beginning. On the right side I have a small graph showing what that really means - it means that when a developer says they're going to upgrade production, you don't just say yes or no. You actually have to see what they are changing on the production system.

In this case, we have a job called MockTraders that is a load generator in dev. So the developer decides to add another environment, and you can review this when reviewing the IaC they submitted. This allows the people involved to understand whether the operation is safe or not, and make the right decision. It gives you auditability.

Another aspect of IaC is you can enforce best practices, what we call SOPs. We have the SOP enforced at the cluster level - every time you do a cluster upgrade, you have to follow the steps, one line at a time. It's always happened this way, no way around it.

When you build a system that supports blue/green deployments, you get the graph in the middle that's automatically enforced every time. There's no patching, no hard-coding, you have to do it this way.

The real trick is to make it as painless as possible so developers don't feel it's a burden, otherwise they'll find a way around it.

As I said, you lose some in one aspect, you have to gain it back in another.

So you were probably thinking: single AZ, how are you going to fail over? Aren't you going to lose data? Yes, that is effectively a business risk that we decided to take on in order to have this best possible setup.

Our disaster recovery is very simple. We have to stay in one AZ, but we don't need to have a single point of failure in that AZ. An AZ failure is different from a single machine outage.

So first, we need to cross that off the list, we need to make sure our system doesn't have a single point of failure.

Second, we designed a replication strategy that replicates between AZ one and AZ two. The reason I say it's really special is that instead of replicating the egress of the system, which is what you usually do, meaning replicating the database and replicating the logs, we only replicate the ingress of the system. Like Kevin said, our whole trading system is deterministic, so as long as you have the same input, you will have the same output; you just have to rerun it in a separate process if you decide to fail over.

So we have a typical setup where region A has two AZs, you replicate between them, and then you have an offsite backup with a different replication stream. Amazon Aurora gave us some help around that, because its storage is available across AZs within the region, so we don't have to worry too much about the data inside the database. For the rest of the system, we only replicate the ingress. OK, so for performance and reliability, we gained some there, but we also definitely lost some there as well.

For security, I just want to point out that because we adopted operations as code, all privileged access to the production system is done automatically by services, not humans. So you don't really need day-to-day human privileged access into the system. What you really need is an escape hatch, meaning when you really do need to log into a server to debug something or fix something on the spot, that is your safe emergency access mechanism.

To set this up, we built on top of AWS's existing SSO infrastructure. Whenever you try to access a machine, you send a request to the SSO component, and this component not only enforces that you have MFA, it enforces that you have two simultaneous MFA factors available. That means if one of our MFA providers is compromised, you're still not able to get production access to the system. That's a really good design.

Second, you typically get a signed certificate back, which only has a limited expiration period, and you use that to jump through a bastion host to your production system. Again, we were able to reduce access to an emergency-only protocol because we have operations as code implemented in the system.

This SSO can also be used to protect access to our AWS console and our own management portals, and it's really a good pattern to have. So, coming back to the network architecture.

We gained a few things when we designed the production network with no internet access by default, but in order to have visibility and observability into that system, we had to use a few tricks that AWS was able to offer us. First, for DDoS and other denial of service scenarios, we rely on AWS Shield. Not only does Shield protect us from traffic coming over the internet, it also covers scenarios where the traffic originates within the region.

Second, on our VPC we were able to set up observability directly at the VPC level to see what traffic actually goes out, and there's a whole pipeline that automatically alerts us if somebody is sending a lot of data out. Then, when the trading system talks to auxiliary services, we enforce AWS IAM policies.

That way, only the trading system is able to talk to them. For example, if you want to transfer money, only one piece of the system can request that transfer, and only another piece of the system can actually execute that transfer. All the privileges are separated into different components; you never have a single root system that can do it all, so you never have one single point of failure.

When you actually talk across the VPC to another service, we use mTLS; we use service authentication between our trading system and the rest of the services that we have.

So overall, the fundamental goal here is to make sure the production environment is really secure by default. It doesn't rely on the security of the software design; it adds another layer to prevent disastrous scenarios such as data exfiltration. OK?

Finally, I want to talk a little bit about the data pipeline. Running the exchange at high speed generates a lot of data, and in a typical data center scenario that's really hard to deal with, because you don't get access to amazing services such as Amazon Aurora that can scale up to hundreds of terabytes without you having to worry about it.

In an exchange, there are two types of data that are usually generated. The first is what we call needle-in-the-haystack data. This is where a lot of events are generated by the system and you need to look them up quickly. The user wants to know, what's the status of my last order? They want to quickly list the history of their orders. For those types of lookups you need an index, and what's better than a PostgreSQL database that can scale almost indefinitely, right?

That's Amazon Aurora; it's surprisingly good for that. The other type of data generated in the system is what you see if you open Google Finance or any finance application: a candle graph that shows how many trades happened in the last hour and what the open, high, and low were. That type of data we usually call streaming analytics, meaning the data needs to arrive at the aggregator within a certain time period.

And the aggregator just keeps producing this type of analysis regardless of whether the data has actually arrived or not. For this type of scenario, we had to build a different kind of pipeline that relies on Amazon MSK: all our systems stage their data into MSK, and we have another component that pulls from MSK, consumes it, reads it into our own time series database, and then does the computation in real time.

So this data is available within milliseconds, or even less, after it's generated, and all of it is archived into S3, so we never have to worry about losing the data. This is what's presented in the real-time dashboards that you see.
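To make the streaming analytics piece concrete, here is a small sketch that rolls a stream of trade events into one-minute OHLC candles. In the real pipeline the events would be consumed from an MSK (Kafka) topic and the output would land in the time series store; the event shape used here is invented for illustration.

```python
# Sketch: aggregate a stream of trade events into 1-minute OHLC candles.
# In production the events would come from MSK (Kafka); here the event shape
# ({"ts", "price", "qty"}) and the plain iterable are illustrative.
from typing import Iterable, Iterator

def one_minute_candles(trades: Iterable[dict]) -> Iterator[dict]:
    candle = None
    for t in trades:                                  # trades assumed ordered by timestamp
        bucket = int(t["ts"] // 60) * 60              # start of the minute this trade falls in
        if candle is None or candle["start"] != bucket:
            if candle is not None:
                yield candle                          # emit the finished minute
            candle = {"start": bucket, "open": t["price"], "high": t["price"],
                      "low": t["price"], "close": t["price"], "volume": 0.0}
        candle["high"] = max(candle["high"], t["price"])
        candle["low"] = min(candle["low"], t["price"])
        candle["close"] = t["price"]
        candle["volume"] += t["qty"]
    if candle is not None:
        yield candle

trades = [
    {"ts": 0,  "price": 100.0, "qty": 1.0},
    {"ts": 30, "price": 101.5, "qty": 0.5},
    {"ts": 65, "price": 99.0,  "qty": 2.0},
]
for c in one_minute_candles(trades):
    print(c)
```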

OK. So I think I will invite Kevin and Joshua back on stage, and we'll talk a little bit about what's next for the Coinbase International Exchange.

I'll go ahead and take this one. Thanks, Yucong. So we talked a lot today about latency within the application and between the client gateway and the internal systems. But what about client ingress through the load balancer to the gateway?

Coinbase today is using connectivity over the internet into a Network Load Balancer, which is a common pattern, but some of our customers have specifically asked about clients that exist within AWS and are looking to optimize for that.

So now, if we shift things a bit, our clients are inside the AWS cloud. We have several different configurations for increasing the reliability of the latency in this situation, and we covered two of them today: AWS PrivateLink and shared cluster placement groups.

With PrivateLink, clients traverse the AWS backbone instead of the public internet, reducing jitter and creating more reliable latency. But for even lower latency, cluster placement groups now have the ability to be shared across different accounts: the trading firm or client can coexist inside the same cluster placement group as the gateway, and they peer their VPC with the exchange VPC. This drastically reduces the physical distance between the two; it makes the latency lower and much more predictable.

For future improvements, we're looking at various things we can do to make the networking faster and reduce latency. For example, we're looking at kernel bypass, which Aeron natively supports, and we're looking at enhanced network adapters like the Elastic Network Adapter (ENA), ENA Express, and the Elastic Fabric Adapter (EFA).

We're also looking at cloud native services like the AWS CDK, Amazon Managed Service for Prometheus, Amazon Managed Grafana, and Amazon Timestream.

Cool. I just want to take the final part of the talk to do a little bit of reflection. When I look back at the year that we had, I think we couldn't really have done it without the really powerful collaboration between the Coinbase team and the AWS support team that we have.

As you can see in our talk, they helped us learn about the different features that we can use as well as all the best practices we can follow on the cloud. So in a sense, the Coinbase International Exchange was born on the AWS cloud, and it's going to survive on the AWS cloud.

Secondly, the cloud native design really rings a bell for me, because designing our software to be cloud native from the beginning really gave us separation of concerns. The application developers don't have to worry too much about the infrastructure, and the infrastructure engineers don't need to be the limiting factor on what the software engineers can do.

Having the ability to do operations as code allowed us to make safe, in-place upgrades to the production environment regularly, instead of doing it once every couple of months. And having the ability to do repeatable provisioning and deprovisioning really allows us to spend money in the critical places and survive as a new business.

All in all, I think developing on the AWS cloud really gave us a huge benefit in developer velocity. Having unlimited personal environments is like a dream come true, and having shadow production, I think, is really the key to being able to test a lot of features underneath the big market without disrupting it.

So I just want to take this time to express appreciation to the AWS team for inviting us to give this talk, and I hope this was useful for you. Thank you.

So thank you so much, everyone. That's our talk. And for all of you, we are an extremely data driven company, and your feedback could not mean more, so please fill out the session survey in the mobile app. Our contact information is up there if you want to reach us; we'll be around, and if there are any additional questions, we'll be hanging around in the lounge.

But yeah, thank you for your time, everyone. Enjoy the rest of re:Invent!
