Hyperscaling databases on Amazon Aurora

All right, good morning and welcome to Wednesday at re:Invent. I hope everybody's having a good time this week so far.

So how many of you work for very fast-growing companies that are rapidly scaling? And for those of you who don't yet, how many of you are thinking about launching that next startup app that's going to take the world by storm? And you're worried about how you're going to grow, right?

So that's what we want to talk about here today: how you handle some of those challenges of hyperscale with very rapid growth. When you have a successful high-growth application, your application traffic and your database load can double month over month, sometimes even week over week, and you struggle just to keep up with that.

Sometimes the application tier is a little easier to scale, but people struggle with how to scale their database. The team of database engineers that I lead within the Aurora service team has worked with hundreds of AWS customers on this exact problem: how do you scale your database, and how do you handle this rapid growth?

I want to share some of the techniques that our customers have used to successfully handle this, so that you can take those techniques back to your business and apply them yourselves.

We're going to talk first about handling rapid growth: when you first launch, how do you deal with that first surge of traffic from your customers? Then optimizing your solution and your database workload, dealing with microservices and scaling out your database to work with your microservice architecture, and then scaling out with sharding, both building it yourself and with the Amazon Aurora Limitless Database that we announced this week.

So just to set the stage, Amazon Aurora is a relational database purpose-built for the cloud. It's a fully managed service, highly secure, designed for high availability and durability, with great performance and scalability.

Today, we're going to focus mostly on the scalability aspect. Aurora is a great choice of relational database for hyperscale needs, primarily because of its architecture.

So just a quick bit about the Aurora architecture: the key thing we've done is separate the compute layer from the storage layer so you can scale them independently. The Aurora distributed storage layer stores six copies of your data across three Availability Zones for great durability. It also makes it easy to scale.

Aurora scales in 10-gigabyte segments up to a maximum size of 128 terabytes. Your Aurora cluster has a primary writer instance that you can scale from a very small size up to a very large one, and it processes all of the write transactions in your cluster.

You can also scale out with additional Aurora read replicas, up to 15 in your cluster, that allow you to scale out the read traffic for your application.

When we talk about hyperscale, the primary use cases we see with our customers are rapid-growth apps such as gaming, messaging, and social media. In these situations, if you find a fit with your customers, if customers get excited about your apps, you can see millions of customers adopt your app in just a few weeks, and dealing with that growth is pretty challenging.

We also see digital businesses use Aurora: businesses such as delivery services, e-commerce, and payment processing. These companies can serve tens to hundreds of millions of customers around the globe, and they need database platforms that can scale with them.

We also see software-as-a-service applications: customers who are either building new cloud-native applications or traditional software vendors who are migrating their on-premises solutions into cloud-based managed services for their customers. These applications again scale to serve millions of customers, and they're mission-critical applications, so they need high availability and they need to continue to scale and drive great performance.

So today, for the purposes of this session, we're going to have everybody here build a brand new e-commerce app, because we're here talking about databases.

We're going to start with the data model. We're going to think about your items, the products that you sell; you're going to have lots of details about those. Then there are the customers who buy your products. And of course, you want your customers to have lots and lots of orders, and you need details about each of your customers: where they live, their addresses, where you ship to, et cetera.

Now, your engineering team is using modern techniques. They're going to build a microservice-based architecture, and they've identified three core services to start with: the profile service, the item service, and the order service. So, fairly simple to start off with. For your e-commerce app, you decide to deploy on AWS, first of all because it's a great cloud platform, but second of all because it's easy to scale and easy to manage.

You're choosing to use containers for your application tier and you're going to deploy on one of our container services and you pick Amazon Aurora as your database back end.

So the great news is, after a couple of months of hard work, you're ready to go. You're launching your app and, very exciting, users are actually signing up. Your usage is starting to go up and you're getting orders. So this is a great thing. It's also a little bit of a scary situation, because your utilization is growing and you need to deal with it, you need to handle it.

So what do you do? The first step is scaling your application, and this is relatively simple in AWS: with containers, you can establish auto scaling rules, your containers automatically scale, and you can serve additional traffic. Start off by adding Elastic Load Balancing to make sure that your traffic is evenly distributed across your containers, and so that, from an availability standpoint, if individual containers go down, your users don't notice anything.

So it's fairly simple on the database side too: the first thing that you probably do is just scale up. When you first built your application, you picked a small Aurora instance to start with, something like a t4g.medium. It's pretty cost-effective and it's simple to get started with, but you can very easily scale up your Aurora instances to some of the larger sizes we support.

So think instances like the r7g.16xlarge, the x2g.16xlarge, and the r6i.32xlarge. These instances support up to a terabyte of RAM, so you can fit a very large data set in memory. One of the keys to performance with a relational database is keeping your working set in memory: anything you process in memory is going to be a lot faster than reading data off of disk.

So simply scaling up, getting more memory and more CPU to go along with it to process more queries, is the first step to handling this additional load. And in addition to more memory, those additional cores and the additional network bandwidth let you drive higher write and read throughput across your application.
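As a rough sketch of what that scale-up looks like in practice, here's how you might resize the writer with the AWS SDK for Python (boto3); the instance identifier is a placeholder, and you'd pick the target class based on your workload:

```python
import boto3

rds = boto3.client("rds")

# Scale the writer from a small starter class to a much larger one.
# "ecommerce-writer" is a hypothetical instance identifier.
rds.modify_db_instance(
    DBInstanceIdentifier="ecommerce-writer",
    DBInstanceClass="db.r7g.16xlarge",
    ApplyImmediately=True,  # apply now rather than in the next maintenance window
)

# Wait until the modification completes and the instance is available again.
rds.get_waiter("db_instance_available").wait(
    DBInstanceIdentifier="ecommerce-writer")
```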

So scaling up is the first step. The second thing you want to think about as your application grows is high availability. Your initial Aurora cluster has a single instance; you can add a second instance, an Aurora reader, in a second Availability Zone, so that in case of a failure, in case of any problem, your application will continue to work using the second instance. With Amazon Aurora, failover times are typically under 30 seconds, minimizing the downtime for your application.

In addition to just being used for failover, that read replica can serve read traffic. When you first created your application, you used the Aurora cluster endpoint, which points to your writer node and serves your reads as well as all of your write queries. When you add readers, Aurora has a second endpoint, the reader endpoint, which points to the read replicas.

So you can have your application open up a second set of database connections, pointed at the reader endpoint, which will go against the read replicas in your cluster. Now you can start offloading some of your read queries and enable your database to continue scaling. As your app grows, as you run more and more containers, your application is opening more and more connections to the database.
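A minimal sketch of that read/write split, assuming Aurora PostgreSQL and psycopg2; the endpoints, credentials, and table names are placeholders:

```python
import psycopg2

# The cluster endpoint always points at the current writer instance.
writer = psycopg2.connect(
    host="ecommerce.cluster-abc123.us-east-1.rds.amazonaws.com",  # placeholder
    dbname="shop", user="app", password="***")

# The reader endpoint load-balances across the read replicas.
reader = psycopg2.connect(
    host="ecommerce.cluster-ro-abc123.us-east-1.rds.amazonaws.com",  # placeholder
    dbname="shop", user="app", password="***")

def create_order(customer_id):
    # Writes must always go through the writer connection.
    with writer.cursor() as cur:
        cur.execute("INSERT INTO orders (customer_id) VALUES (%s)", (customer_id,))
    writer.commit()

def list_orders(customer_id):
    # Reads can be offloaded to the replicas.
    with reader.cursor() as cur:
        cur.execute("SELECT order_id, created_at FROM orders WHERE customer_id = %s",
                    (customer_id,))
        return cur.fetchall()
```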

The overhead of opening thousands of connections to your database can actually impact the performance of your database and slow it down. So you're going to want to add a connection pooling tier to your application to handle these additional connections. The connection pool will multiplex your application connections, many hundreds or thousands of them, onto a smaller number of connections to the database, and this connection pooling tier will help you drive higher throughput on your database rather than paying the overhead of all those additional connections.

Some customers choose to manage their own connection pooling tier using products such as ProxySQL for MySQL, or PgBouncer or Pgpool for PostgreSQL. If you choose to manage your own connection pooling tier, make sure that you architect it for high availability as well as scalability.

In addition to the products that you can choose to run yourself, we offer a fully managed connection pooling service called RDS Proxy. RDS Proxy works with both Aurora MySQL and Aurora PostgreSQL as well as the RDS engines, and it automatically scales up and down based on your database workload. RDS Proxy runs across three Availability Zones for high availability, and it enables you to scale your database to larger levels.

In addition to the scalability side, RDS Proxy improves failover time. If your database has a problem and fails over from the writer instance to one of the replicas, the idle connections going through RDS Proxy will not be dropped, and your application can see downtime of as little as a couple of seconds.
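For illustration, here's roughly how you might stand up a proxy with boto3; the names, ARNs, and subnet IDs are placeholders, and many teams would do this through the console or infrastructure as code instead:

```python
import boto3

rds = boto3.client("rds")

# Create a proxy in front of the cluster; database credentials come from
# Secrets Manager, and the role grants the proxy access to that secret.
rds.create_db_proxy(
    DBProxyName="ecommerce-proxy",
    EngineFamily="POSTGRESQL",
    Auth=[{"AuthScheme": "SECRETS",
           "SecretArn": "arn:aws:secretsmanager:us-east-1:123456789012:secret:app-db-creds"}],
    RoleArn="arn:aws:iam::123456789012:role/rds-proxy-role",
    VpcSubnetIds=["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
)

# Point the proxy's default target group at the Aurora cluster.
rds.register_db_proxy_targets(
    DBProxyName="ecommerce-proxy",
    DBClusterIdentifiers=["ecommerce"],
)
```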

So RDS Proxy has a lot of benefits as you're scaling your application. Now, in addition, as your app continues to grow, you're probably going to want to add more read replicas to scale out your read traffic. You can add up to 15 read replicas to your Aurora cluster, across three Availability Zones, for more availability as well as read scaling.

If you're using a connection pooling layer like RDS Proxy, it will automatically scale your connections across the replicas and automatically load balance your queries across them. So this helps you with your scaling.

All right. So this is the first step: your application launched and you're scaling for rapid growth. Typically, customers scale up, add reader instances, and manage their connections, and this gets you through those first couple of chaotic weeks or months, to the point where you're excited about your application growing.

The next step is to think about optimizing: how do you optimize your database? There are a couple of areas to look at and think about here.

The first thing is database errors. As your app is growing, you're adding new features and your workload is growing very rapidly. Although we probably all test as much as we can, we don't find all the errors in our test environments; especially under high load, you're going to see interesting conditions that you didn't expect. You're going to find errors, you're going to find locking, you're going to find deadlocking, and you need to fix that in the database to drive higher throughput.

The second thing to think about for optimization is tuning: tuning your parameters, tuning your indexes, tuning your database queries. And then you also want to optimize cost. It's easy enough to just scale up, add more hardware, et cetera, but you want to optimize your cost so that you're growing a business and not just a database. The customers we work with that go through these tuning steps regularly see ten or even a thousand times better performance.

You'd be amazed at the number of times a fast-growing application team has said they have to move off of a relational database because it doesn't work.

Then they go through a tuning exercise, and a couple of weeks later they're processing a thousand times faster. When you get those levels of efficiency improvement, you can either continue to scale if your app is growing, or you can take some of that efficiency and optimize your costs with it, which is very nice as you're growing.

So, RDS and Aurora offer a number of tools to help you with this optimization, with finding errors and fixing them. The first one is Performance Insights. Performance Insights is one of the best tools to help you understand your database. It shows you the average active sessions across your database and lets you view weeks and months of data, so you can see the trend of how your database load is growing over time and which areas, which types of queries, are increasing.

In addition to the database load, it will show you the wait events. These are the things that your queries are actually waiting for inside of the database, so you can understand whether the bottleneck in your system is disk reads from queries that are not efficient, or whether you maybe have locking and contention inside your schema.

So using this tool, you can identify which parts of your database and which parts of your queries you need to optimize, you can drill into individual minutes. So you can look at sort of second by second load changes and see what's varying as well as look at the trend over time.

With Performance Insights, you can see the top SQL statements, so you can identify which parts of your application are driving the biggest workload, and you can look at the metrics at a per-SQL level over time. You can configure which metrics you want to see, such as the average latency of a query, the amount of disk reads that query is doing over time, or how often that query is called, and you can compare different time ranges so that you can see whether, in a particular time when you had a problem, certain queries were running more frequently.

Performance Insights also has an integrated CloudWatch metrics dashboard, so you can see the overall system performance, the overall health of your database, along with the individual per-SQL performance. This way, you can look at both the big picture of how your database is performing and drill down to the low level of how individual queries are behaving.
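Everything in those dashboards is also available programmatically. As a hedged sketch with boto3, this pulls the last hour of database load grouped by wait event; the identifier is a placeholder for your instance's DbiResourceId:

```python
import boto3
from datetime import datetime, timedelta, timezone

pi = boto3.client("pi")
now = datetime.now(timezone.utc)

# db.load.avg is the "average active sessions" metric from the console;
# grouping by db.wait_event breaks it down by what sessions are waiting on.
resp = pi.get_resource_metrics(
    ServiceType="RDS",
    Identifier="db-ABCDEFGHIJKL123456",  # placeholder DbiResourceId
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    PeriodInSeconds=60,
    MetricQueries=[{"Metric": "db.load.avg",
                    "GroupBy": {"Group": "db.wait_event"}}],
)

for series in resp["MetricList"]:
    points = series["DataPoints"]
    print(series["Key"], points[-1] if points else "no data")
```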

So last year we introduced DevOps Guru for RDS, and this is a machine-learning-based system that provides automatic analysis across your workload. DevOps Guru identifies anomalies in your databases proactively, so you can actually see these anomalies before your customers do. You can see them before they start causing real impact, and you can go in and resolve these operational issues before they result in downtime.

When you drill into one of the anomalies through DevOps Guru, you get an insight report that shows you the context of the anomaly: when did it happen, which databases were involved, how long did the anomaly last? It shows you details about the anomaly, such as which wait events were the largest contributors, what your system was actually struggling with at that point in time, and which queries were involved.

DevOps Guru also provides automated analysis and recommendations: it identifies what type of problem you were having, and it provides recommendations for how you should solve that problem. This can be very helpful, especially for a team that doesn't have experienced database administrators on it. If your software developers are great at writing application code but struggling with how to manage their database, DevOps Guru can be a great way to point them at where the problems are and give them easy ways to start fixing them.

DevOps Guru is integrated with the RDS console and can provide alerts across your entire fleet of databases. So as your application is growing, as you add new databases to your fleet, you want to be able to find anomalies, find problems, across the entire fleet. With DevOps Guru, you can be alerted ahead of time so that you can find problems wherever they are, doing the type of work that an experienced DBA would do if they had time to analyze all of your systems.

In addition to the reactive insights I was just talking about, finding those anomalies as they're occurring, DevOps Guru also provides a set of proactive insights. These are best practices and guidance that we've developed over time working with hundreds of thousands of RDS customers. These proactive insights alert you to problems that you may experience as your database continues to grow: things like primary keys that are overflowing, tables that are growing too large, indexes that might need maintenance operations, et cetera. Again, it's the kind of thing an experienced DBA could do if they had time to analyze every one of your databases and spend time optimizing them. DevOps Guru does this for you automatically in the background and presents an easy set of recommendations to follow.

In addition, you can configure all of those notifications to be delivered through CloudWatch alarms or SNS notifications, or route them through EventBridge to whatever system you're using for your alerts and alarms.

All right, the next area that's really important is cost management. Lots of customers are excited about scaling and about being able to handle more users, more traffic, more database queries, but then they get a little bit scared as their costs go up. So optimizing your cost is important for your database.

One of the things that customers notice is that they have variable and unpredictable workloads. So your customers are coming maybe shopping at night, maybe shopping on their lunch break, maybe they shop more on the weekends. You have variable workloads over time.

Aurora has two major technologies that help make it easy for you to manage these types of variable workloads. The first one is Aurora Serverless. Peter DeSantis talked about this in his keynote Monday night, and it's technology Aurora developed to automatically and instantly scale database instances on demand without restarting them.

So you can configure your Aurora Serverless instance for how small you want to let it go and how large you want to let it go, and it will automatically scale up and down based on your database workload. As new queries come in and new database connections are created, it will scale up; as your workload drops off, it will scale back down. It gives you a true pay-per-use, pay-for-what-you-use capability, which is pretty unique for databases. It's also completely automated, instant scaling, so you're not worried about your application pausing or having downtime as you try to scale up.
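A sketch of that configuration with boto3, assuming an existing cluster; the capacity bounds are illustrative and expressed in Aurora capacity units (ACUs):

```python
import boto3

rds = boto3.client("rds")

# Let Serverless v2 instances in the cluster float between 1 and 64 ACUs
# based on load; such instances use the db.serverless instance class.
rds.modify_db_cluster(
    DBClusterIdentifier="ecommerce",  # placeholder cluster name
    ServerlessV2ScalingConfiguration={
        "MinCapacity": 1.0,
        "MaxCapacity": 64.0,
    },
    ApplyImmediately=True,
)
```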

The second feature that Aurora has that makes it easy to manage your scaling is Reader Auto Scaling. You can add up to 15 replicas to your Aurora cluster, and that's great, makes it simple. But you can also configure Aurora to use Auto Scaling to add and remove reader instances based on your workload.

So as your workload increases, as more traffic comes in, Aurora will automatically add reader instances to your cluster. As that workload drops off, it will remove those reader instances to optimize your cost for you. This can be great if you have that very cyclical type of workload that changes over days or hours.
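Reader Auto Scaling is driven by Application Auto Scaling under the covers. A hedged sketch with a placeholder cluster name and an illustrative CPU target:

```python
import boto3

aas = boto3.client("application-autoscaling")

# Register the cluster's replica count as a scalable target: 1 to 15 readers.
aas.register_scalable_target(
    ServiceNamespace="rds",
    ResourceId="cluster:ecommerce",
    ScalableDimension="rds:cluster:ReadReplicaCount",
    MinCapacity=1,
    MaxCapacity=15,
)

# Add and remove readers to hold average reader CPU near 60%.
aas.put_scaling_policy(
    PolicyName="ecommerce-reader-cpu",
    ServiceNamespace="rds",
    ResourceId="cluster:ecommerce",
    ScalableDimension="rds:cluster:ReadReplicaCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "RDSReaderAverageCPUUtilization"},
    },
)
```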

So those are ways to optimize your workload and make it easier for you to manage some of your cost as your application is continuing to grow.

One of the things that many of our customers like to do is actually start splitting out their database to be more in sync with their microservice architecture. This is both for agility and flexibility, because you want your development teams to be able to build and deploy features as quickly as they can, and it also helps you from a scaling standpoint, because you can scale your database capacity across different services.

In a typical shared database pattern, you'd have multiple microservices pointing through a single endpoint to a single monolithic database. The first problem with this is just agility: your engineering teams have to coordinate with each other. If an individual development team wants to release a new feature, they need to deploy a schema change, and they have to coordinate with the other engineering teams to make sure their change isn't going to break one of the other services. And just the act of deploying it requires change management; you have to plan for that and coordinate across services. So you lose some of your agility.

Second is competition for resources. When each of your services is pointing to the same database, they're competing for CPU, memory, network bandwidth, et cetera. If a single service has a spike, maybe they're running a batch job or backfilling some data as part of a data change, that spike in usage from one service can impact the other services that are running on the same database. And then there's the scope of failure.

When all of your services are pointing to the same monolithic database, any failure that you have is going to impact your entire application, potentially all of your customers. It might be as simple as a bug: one of the services deployed a new change and didn't optimize one of their queries, or it could be some other issue that causes a problem in the database. When you have that single database, you impact everybody, and all of your customers are affected.

So we want to move away from that monolithic database and start breaking up the database into different services. The first step is to identify which service should own each of the tables. In our e-commerce example, your item service is going to own your products and your items, your profile service will own the information about customers, and your order service will own all of the information about orders.

We also need to refactor the data model a little bit. We don't want a direct link between our order and item tables; we want to store the details about the item in a new table, order details. And we want to break the direct link between customers and orders, the direct foreign key between the customer and order tables.

Now, this is good from a business standpoint. For example, by storing the item details as part of the order, if your item description or your item price changes over time, you don't have to worry about the customer seeing the wrong version of their order, because you're no longer linking back to the item table. But it also means that your services can be very specific and touch only the tables that they're supposed to be touching.

As you do this, you may also have to refactor your service boundaries, so we may introduce a new composite service that runs the workflow of some of the application logic. For example, when you're creating a new order, you need to read some of the data about the customer, which you do by calling the profile service, and then you actually have to create the order. So rather than having the order service directly touch those customer tables, you introduce a new composite service that uses the two underlying data services to do that.

OK, now that we've refactored some of the services to separate their usage of tables, we want to actually separate the database endpoints. To do this in production, you want to introduce new endpoints that your services can talk to. You can create a new RDS Proxy that is specific to one of the services, in this example the profile service, that has a different endpoint. So that service is talking to a different URL, a different connection string, but it's still talking to the same shared database. You do this for your other services as well, and now you can actually audit the usage of tables by services.

This reduces your risk, because if you did all your refactoring and thought you'd analyzed everything and modified your code, but you missed something, it's not going to break in production. You're just going through a different endpoint and still hitting your original database, and you can audit and find those problems, then go back, refactor your code again, and update the services until eventually you've got everything completely correct in your application.

Then you can actually start separating the database. To separate the database, you create a new Aurora cluster per service, and you can do this by using the Aurora fast clone technology.

So I'm going to digress for a moment here and just talk about cloning quickly. Show of hands: who here has heard of cloning? All right, quite a number of you. That's great.

So Aurora cloning takes advantage of the Aurora distributed storage layer. It lets you point a new database cluster at the same storage, but as a copy-on-write clone. This means it's very fast to create, and you don't pay for a new copy of the data.

So when you create your item database as a clone, it can happen very quickly, and then you can set up replication to get the ongoing changes from your monolithic database over to your item database. Once it's all caught up, move the endpoint to point at the new database and keep going.
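Mechanically, a clone is a copy-on-write restore of the source cluster. A sketch with boto3, where the identifiers are placeholders:

```python
import boto3

rds = boto3.client("rds")

# Clone the monolithic cluster's storage copy-on-write: fast to create, and
# you only pay for pages that diverge after the clone exists.
rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="ecommerce",
    DBClusterIdentifier="item-db",
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)

# The clone needs at least one instance before it can serve queries.
rds.create_db_instance(
    DBInstanceIdentifier="item-db-writer",
    DBClusterIdentifier="item-db",
    DBInstanceClass="db.r7g.2xlarge",
    Engine="aurora-postgresql",
)
```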

So you repeat this for each service until you've separated out all of your services into service-specific databases, and then you shrink your monolithic database down, deleting the tables that are no longer needed and keeping whatever remains.

In some cases, it might be a single service. And for many customers, they still wind up with sort of a main database or a database that contains a significant amount of their application data. But they've separated the larger transaction tables into their own databases.

So now, instead of that shared database pattern, we've got a situation where teams own their own service: they own the application code, they own the application logic, and they own all the way down to the database, which gives them the agility to deploy changes separately.

It also lets you scale each database independently. You can change instance sizes, you can configure different serverless scaling rules if you need to, you can add a different number of readers per service, and you can configure different auto scaling rules for each one of your services.

So if your order service is busier in the evenings and customers tend to sign up in the mornings, you can have different configurations per service.

Good. All right. As you do this, though, as you break out your databases into per-service databases, you're going to start encountering some DevOps challenges with your engineering teams.

They're great, they own their own service, they have more agility and flexibility, but they're going to need some education. They're going to need to learn how to tune database queries and how to manage databases.

And it's important for you to establish monitoring and operations best practices because you want to make sure that all of your databases for all of your services continue to run well.

So this is one of the key things that sort of the database team or the database owners still need to do even as individual engineering teams own their own database for their service.

All right. So you've split it out, you've gained a lot of agility, and you want to continue to scale. So you start thinking about horizontal sharding, and this is going to help you break some of the limits of what you can do in a single database.

First with database size. So today, a single Aurora cluster can scale to 128 terabytes. If you're dealing with hundreds of millions of users around the world, though, that's not enough.

Similarly, with queries, we can add read replicas, we can serve a lot of read queries. But you're going to want to scale to more writers, more write queries, more transactions.

So we start looking at sharding. With sharding in general, if you're going to build your own, you're going to split your data across multiple Aurora clusters.

Each one of those clusters is a single shard and contains part of your application data. To do this, we'll take the example of the customer table: you want to figure out how you're going to split your customer table.

You're going to create a partition key. In this example, we're using a hash of the customer ID and then mapping ranges of those hashes to individual partitions.

And then you're going to logically split those partitions into separate groups. This way you can map individual partitions to different shards, so you spread your data across those shards.
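As a minimal Python sketch of that scheme: the partition count is an illustrative choice, and it just needs to stay fixed and be comfortably larger than the number of shards you ever expect to run.

```python
import hashlib

NUM_PARTITIONS = 256  # fixed count of logical partitions (illustrative)

def partition_for(key: int) -> int:
    # Hash the key so partitions stay evenly sized even if IDs are skewed,
    # then map ranges of the hash space onto the logical partitions.
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS
```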

Each partition is owned and hosted on a single shard, and the partitions are spread across those different clusters. This is all something that customers have done for years across databases, including with Aurora. As you do this, one of the important things to think about is how you choose your partition key.

So you want to choose partition keys for your tables that allow most of your application queries to go to a single shard. So you do this for a couple of reasons.

The first is complexity. If your queries are hitting multiple shards, let's say you're selecting data and you need to join it, you have to build that logic, that result-combining capability, in your application.

You have to make sure that you get the correct results regardless of what type of SQL you generate, so you have to learn how to apply filters, aggregates, et cetera, so that your application gets the right answer for multi-shard queries.

Also from a consistency standpoint, when you build your own sharding, each one of these clusters is a separate database. And if you're doing an update across different shards, you don't have a single consistent view of all that data.

And then there's performance. The reason you're doing sharding is to be able to scale out. If your queries hit multiple different shards, they're actually using more resources than they did when you had a single database.

So if you don't choose the right partition key, you're not going to actually get the scaling as you do your sharding.

The second thing you need to think about is how you map your partitions. So your application wants to know which shard to send a query to. So you're going to create a mapping table that maps your partitions to which shard those partitions are on.

You create the mapping using the partition ID that you generated from your key. And you're able to have the logic in your application to know which shard to go to based on that partition.

So for a particular customer ID, you can calculate the hash, you can calculate which partition that hash is mapped to and then you know which shard to route the query to.
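Continuing the sketch above, the mapping plus routing logic might look like this; in production, the partition map would live in a small metadata store and be cached and refreshed by the application:

```python
# Partition -> shard mapping, loaded at startup and refreshed when it changes.
PARTITION_MAP = {p: f"shard-{p % 4 + 1}" for p in range(NUM_PARTITIONS)}  # 4 shards

# Shard name -> Aurora cluster endpoint (placeholders).
SHARD_ENDPOINTS = {
    "shard-1": "shard1.cluster-aaa.us-east-1.rds.amazonaws.com",
    "shard-2": "shard2.cluster-bbb.us-east-1.rds.amazonaws.com",
    "shard-3": "shard3.cluster-ccc.us-east-1.rds.amazonaws.com",
    "shard-4": "shard4.cluster-ddd.us-east-1.rds.amazonaws.com",
}

def endpoint_for_customer(customer_id: int) -> str:
    # Hash -> partition -> shard -> endpoint: every customer query starts here.
    shard = PARTITION_MAP[partition_for(customer_id)]
    return SHARD_ENDPOINTS[shard]
```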

You also have to think about where to put your shard mapping logic. The most common thing we see our customers do is put the sharding logic in the application layer.

They either build a custom module or they use a prebuilt library for one of their tech stacks. For example, in Python, there's the Django sharding library. If you're using Ruby on Rails, there's Active Record sharding.

These are application-layer libraries that manage the sharding logic for you. Other customers choose to deploy their own sharding tier: they may custom-build something, or they may use a product like Vitess for MySQL or the Apache ShardingSphere open source project, and deploy a separate application tier that knows how to route queries.

We've also seen customers create what we call a routing database. It's sort of a thin database, particularly for PostgreSQL using the foreign data wrapper, where the database itself knows which shard to route queries to based on your partition ID.

This way your application is still just talking to a single database endpoint, but the underlying queries are routed out to the individual shards. Whichever solution you choose, you're going to have to think about a couple of challenges.

The first one is synchronization. So as you're sharding, as your partition mapping changes, as you add new shards and move partitions, you need to make sure that the logic stays in sync, your application needs to have the same mapping as where the data actually is.

If your application gets a little bit confused, you're going to get the wrong results. The second thing to think about is if you add a new tier, if you add a sharding tier, or if you deploy one of these routing databases, you have to plan for high availability and scaling.

You don't want to introduce a new single point of failure just to be able to try to get scaling. So it's important to think about these challenges as you build your sharding logic.

So we talked about the customer ID and how you'd probably shard by customer, generate your partitions, and spread the partitions across the individual shards. You also have to think about how you're going to shard and spread your related tables.

Take something like address. Each address is owned by an individual customer, and you want the addresses for a customer on the same shard as the customer data, so that when you retrieve the address for a customer, the whole query can be processed on a single shard.

So in this case, we're going to co-locate the address table: we're going to use the same partitioning logic, the same type of partition key, and make sure that all of the addresses for a customer are located on the same shard.
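In the routing sketch from earlier, co-location just means routing a child row by its parent's key rather than by its own ID:

```python
def endpoint_for_address(owning_customer_id: int) -> str:
    # Addresses are partitioned by the customer who owns them, so a
    # customer-plus-addresses query always lands on a single shard.
    return endpoint_for_customer(owning_customer_id)
```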

When you think about the postal code table, it's not owned by a particular address or a particular customer. The postal code table is what we call a reference table, and you actually want to duplicate that table across all of your shards.

So you have the same set of postal codes everywhere. This means that when you run a query and join your address to your postal code, to look up city or state information, again that join can be processed on an individual shard.

You're not splitting the postal code table. You do have to make sure that you keep those reference tables updated consistently across all your shards so that everything maps correctly.

So when we think about access patterns for other types of tables, you want to go through and identify how your application uses them. For the order tables, sometimes you're creating an order for one specific customer; other times you're retrieving a list of orders for a customer.

In the warehouse system, when you're fulfilling an order, you want to be able to pick and pack for one specific order, or you want to update the status, your shipping tracking, et cetera. So we have two different types of access patterns for orders: sometimes we go by customer, sometimes we go by order ID.

If you were doing this in a single database, you would just create two different indexes: an index on customer ID, so all of your queries that use customer ID are fast, and a second index on order ID, so all of your queries against order ID are fast.

When you're building your own sharding solution, you can't just create an extra index. Your application needs to know which shard to go to to retrieve data before it actually accesses the index.

So a secondary index on order ID isn't the solution when you're sharding. Instead, you probably want to create what we refer to as a materialized global index.

You have your order table, and you decide that most of the time, the more frequent access pattern is by customer. So you shard the order table by customer ID using the same partitioning scheme, and you distribute it across shards the same way you distributed your customer table. But you add a new table, and this is essentially your materialized global index.

In this case, we named it order partition, and you partition that table by order ID. So now, when you spread the order partition table across your shards, it's actually spread out by order ID, and when your application wants to look up an order by order ID, it can query the order partition table, find the customer, and then query for that customer's data, which holds the order.
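A sketch of that two-step lookup, reusing the routing helpers from earlier; run_query is a hypothetical helper that connects to the given endpoint and executes one statement:

```python
def find_order(order_id: int):
    # Step 1: the order_partition "global index" table is sharded by order ID,
    # so this lookup routes directly to a single shard.
    idx_endpoint = SHARD_ENDPOINTS[PARTITION_MAP[partition_for(order_id)]]
    customer_id = run_query(  # hypothetical single-statement helper
        idx_endpoint,
        "SELECT customer_id FROM order_partition WHERE order_id = %s", (order_id,))

    # Step 2: the full order row lives on the shard that owns that customer,
    # alongside the rest of the customer's data.
    return run_query(
        endpoint_for_customer(customer_id),
        "SELECT * FROM orders WHERE order_id = %s", (order_id,))
```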

So it's not as simple as in a single database, and you have to think about your schema and how you're going to deploy it. As your application grows, as your data size grows, you're going to add new shards.

To add a new shard, you start by creating an empty database, a new Aurora cluster, then deciding which partitions you're going to move to the new shard and configuring replication to replicate that data over to it.

Once all the data is replicated and the new shard is ready to take queries, you need to update your mapping logic and make sure that your application and your data placement stay in sync, and then you're ready to remove that data from the old shard and have the partition run on just the new shard.

In addition to adding shards, you need to think about backups. Again, with a single database, backups are simple. Aurora backups are continuous, so you can do point-in-time recovery to any point in time you want, and it's a consistent backup of the entire database.

You don't have to worry about getting some transactions and not others. It's also easy to do point-in-time restores, whether you're doing that to populate a development or staging environment or as part of a recovery operation, and you can share snapshots across accounts or across Regions for your disaster recovery planning or for different developer workflows.

So single-database backups are easy. When you build your own sharding solution, you have some challenges with backups. The first is that every shard is a separate database, so the backups of those databases are actually separate backups; they're not transaction-consistent across all the shards.

And you need to think about how you would manually coordinate the backups across those different shards if you ever had to restore data. This means it's really important to think about how you're going to do disaster recovery.

If you do need to restore your entire database after a disaster, you need to make sure you're getting a consistent set of data for all your customers.

So this week, we announced the limited preview of Aurora Limitless Database, specifically for Aurora PostgreSQL, and Limitless Database is really designed to make sharding easier.

It gives you horizontally scalable writes and reads, so it enables you to continue to scale your database solution and support growing workloads. It offers declarative table sharding: rather than you trying to figure out how to do all the data mapping down at a low level, you can just define what your shard key is for a specific table.

It has integrated sharding logic and data movement. Figuring out how to map partitions to shards, figuring out how to have your application know the right shard to go to: Limitless Database takes care of that.

And Limitless Database is able to auto scale individual shards and individual routers, as well as provide high availability: if there's any issue, it automatically creates a second instance that provides that shard's capability. And it's transaction-consistent.

So within a single query (I love seeing all the cameras go up), you don't have to worry if you query or update data across multiple shards.

And for backups, again, you get that transaction-consistent view of your Limitless Database.

All right. Limitless Database introduces a new concept called a shard group. The shard group is part of your Aurora cluster, a new concept inside it, and it encapsulates all of the infrastructure used by Limitless Database: both the transaction routers and the individual data shards.

It provides a new endpoint for your applications, essentially a limitless database endpoint that your applications can use to query across your sharded tables, and it scales resources within configured limits.

So you decide how large you want your limitless tables to grow, you decide how many ACUs you want to provide across your database, and it automatically scales up and down within those limits that you configure.

There's a distributed transaction routing layer that handles all your application traffic: it provides that endpoint, processes queries, lets you create database connections, et cetera, and it scales vertically and horizontally based on load.

It uses the Aurora Serverless technology to instantly scale individual routers up and down, so if you run a big query, that router will automatically scale up, and if you have a larger load over time, it will automatically add more routers to the system.

In addition, it maintains that synchronized shard mapping logic. All of the things we talked about with keeping the application in sync with your database routing, that's what the transaction routers do.

And in addition, it aggregates results for multi-shard queries. If you run queries that go across multiple shards, the transaction routers aggregate all those results, make sure the correct SQL semantics are applied for filters, aggregates, et cetera, and give you the correct results back. And it drives distributed commits.

So if you're updating data that goes across multiple shards, the transaction routers make sure that your commit is consistent across all of your shards.

The second layer in the shard group is the data access shards; these own partitions of the individual sharded tables. As we talked about with mapping your partitions across shards, each one of the data access shards is one of those shards and owns one or more of those partitions.

They have full copies of the reference tables. So when we talked about creating tables that are duplicated across all the shards, the individual data access shards each have a full copy of those reference tables, so you can do your joins and your query processing locally on each shard.

These also scale vertically, based on the Serverless v2 technology, so that as you run individual queries or more queries on particular shards, they'll instantly and automatically scale up and down.

They also split based on load. If you have a particular set of keys that become hot and that partition needs additional resources, the Aurora Limitless table will automatically split that shard, moving partitions to new shards and giving you more capacity on that set of hot keys.

And the data access shards execute local transaction commits. Whether you're doing single-shard queries or even multi-shard queries, the local transaction commits, typical database commits, are executed on the data access shards.

And all of this is backed by the Aurora distributed storage layer. You still have six copies of your data stored across three Availability Zones, and you have that high durability. But now, instead of having that 128-terabyte limit, you can have many shards, each of which in theory could grow to 128 terabytes.

So the maximum size of your Aurora cluster is effectively limitless.

So here's an example of the declarative sharding logic.

If we take our example of the customer table, rather than having to build special logic for this, you can define and declare as part of your schema creation that you want to create tables in sharded mode.

You specify that, for the next set of tables being created, you want to shard them by customer ID, and then you run a typical CREATE TABLE statement. This is how we make it easy for you to actually create your sharded tables.

When you want to create that address table co-located, when you want addresses located on the same shard as their customer, you specify that you want the next set of tables to be co-located, and which table you want them to be co-located with, and then you run your CREATE TABLE statement.

So we've made it much simpler for you to define how you're sharding, how you're partitioning data across the shards.

And then for reference tables like postal code, it's the simplest: you declare that the next set of tables are reference tables and then create the table. Under the covers, Limitless Database automatically duplicates the data across all the shards and makes sure that as you insert or update data in the reference tables, those changes are replicated across all the shards, so each shard stays in sync.
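Putting the three table modes together, here's a sketch of what that DDL flow can look like from Python. The rds_aurora session variables follow the Limitless preview documentation and may evolve, so treat the exact names, along with the endpoint and schema, as illustrative:

```python
import psycopg2

# Connect to the shard group's limitless endpoint (placeholder host).
conn = psycopg2.connect(
    host="ecommerce-limitless.example.us-east-1.rds.amazonaws.com",
    dbname="shop", user="app", password="***")
cur = conn.cursor()

# Sharded table: declare the mode and shard key, then run a normal CREATE TABLE.
cur.execute("SET rds_aurora.limitless_create_table_mode = 'sharded'")
cur.execute("""SET rds_aurora.limitless_create_table_shard_key = '{"customer_id"}'""")
cur.execute("""CREATE TABLE customers (
                   customer_id BIGINT NOT NULL,
                   name        TEXT,
                   PRIMARY KEY (customer_id))""")

# Co-located table: same mode and shard key, plus the table to co-locate with,
# so a customer's addresses land on the same shard as the customer row.
cur.execute("SET rds_aurora.limitless_create_table_collocate_with = 'customers'")
cur.execute("""CREATE TABLE address (
                   address_id  BIGINT NOT NULL,
                   customer_id BIGINT NOT NULL,
                   line1       TEXT,
                   PRIMARY KEY (address_id, customer_id))""")

# Reference table: duplicated in full on every shard.
cur.execute("SET rds_aurora.limitless_create_table_mode = 'reference'")
cur.execute("CREATE TABLE postal_code (code TEXT PRIMARY KEY, city TEXT, state TEXT)")

conn.commit()
```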

So this is how customers are using Amazon Aurora today and how they're going to use it in the future, as you build your e-commerce app or take this back to your regular job next week.

You know, think about how you're going to handle that rapid growth: take advantage of Aurora Serverless for immediate, instant scaling; use reader instances and auto scaling; and simplify the management of your database. Optimize your queries, tune your indexing schemes, look at your parameters; again, customers that do this well can very dramatically improve their database performance.

And that helps in many ways as you're trying to scale. Think about your data model and refactoring your data model over time. What you build on day one for your app is not necessarily going to work for moving to a database per service, and it's not necessarily going to work for sharding.

You're going to need to refactor your app to make sure that your performance scales as you add additional hardware and as you move toward sharding models, and then scale up and out.

So we talked about scaling individual instances, adding read replicas, moving toward a database per service so that you can scale individual services, and moving toward sharding. And then, of course, we're excited about Limitless Database and how it's going to make it easier for customers to hyperscale.

So if you want to learn more about scaling Aurora, there are a number of other sessions this week that you can take a look at. Some of these, including the last one, on achieving scale, were yesterday; the recordings will be available for you if you didn't happen to see them.

And thank you. I want to encourage you to please fill out the survey in your mobile app. We are a data-driven company, and we like to get your feedback so that we can plan our content and improve things for every event as we go forward.
