Deep dive on MySQL databases in Amazon RDS

Hey everyone, welcome. I hope you've been having a great time for the last two days at re:Invent and learning a lot. Thank you for coming to this session. My name is Sirish Chandrasekaran. I'm the General Manager for RDS's open source databases - that's MySQL, MariaDB and PostgreSQL.

In today's session, I'll talk about a few recent innovations we've done this year to improve availability, manageability and performance across our database services. The talk is a 200 level talk, so I'm gonna assume some level of familiarity with databases and operating them. In terms of the actual content, I'll focus on the what and the how. I'll also go a little bit into the implementation details, but not as much as you would expect in a 300 level talk. But I'm gonna stick around after the session in case you have any questions if you want to pop up a level or if you want to go a couple of levels deeper.

So let's get right to it. In terms of the agenda, I'll start with a very brief overview of RDS - I assume most of you are familiar with the service, hence you're here. But just in case you use databases today but not RDS, just a quick update on what it is.

And then we'll go through four features:

  • Blue/Green deployments, which make changes to your databases faster, safer and simpler.

  • Multi-AZ with readable standbys, which increases the resilience of your databases, improves failover times, improves performance, and in some cases, lowers costs.

  • Then you have two other features - Optimized Reads and Optimized Writes - which we released just on Sunday, which double your read and write performance respectively.

So what is RDS? RDS or Amazon Relational Database Service is our collection of managed database services that makes it easy to set up, operate and scale databases in the cloud. We offer a wide choice of database engines - on the screen, you see Amazon Aurora, which is a cloud native database, there's a MySQL edition and a PostgreSQL edition. And then we also offer the three open source databases I mentioned - MySQL, MariaDB and PostgreSQL. And then there's a couple of third party proprietary databases, Oracle and SQL Server.

The way we view the value of RDS compared to say, self managing your databases on-premises or on EC2 is fundamentally - it's easier. Our databases are fully managed and that's the reason we offer the service to our customers - that's the reason our customers use us. So that's the first thing.

The second thing is databases are hard to get right. And when they go down, you're in a world of pain. So availability and durability is core to what we do and we spend a lot of our innovation in those areas.

The third one of course is performance and scale - workloads always get bigger as your business grows and you just wanna make sure that you don't have to switch platforms. So continually raising the ceiling is a big focus for us.

And the last one goes without saying - this is job number one for everyone at Amazon - security and compliance. We have to make sure that your data is safe, your environment is safe.

And as we think about our journey in RDS, it started in 2009 and we have a constant or continuous wave of innovation along those different dimensions I mentioned - increasing choice for our customers in terms of the databases we support and also raising the bar on availability, manageability, security and performance.

I'm not going to go through this in detail, but just at a high level, you'll see that in the first half of our life as RDS, a lot of our focus was on getting the basics right - adding new database engines (we started with MySQL, then we added Oracle and all the other databases) and getting HA, or high availability, right. Soon after the birth of the service, we launched Multi-AZ, which keeps your databases available even when one entire Availability Zone goes down - you can continue to operate.

And then in the second half of our life as RDS, we focused on continuing to raise the bar - further integration with other services such as S3 and Lambda. A lot of you, you know, you don't use databases in isolation, you use it as part of a whole overall stack. So things we can make simpler on that front.

And the basics still remain important - things like disaster recovery with global databases, things like simplicity and cost management through RDS Proxy.

And then a lot of the features that you see on the right side here, many of which were announced this week again are in the spirit of helping our customers make it easier to run databases in the cloud.

And all of this innovation really comes from customer feedback. Most of our roadmap is based on what we hear from our customers. And this has also helped hundreds of thousands of customers run their database workloads in RDS. And we thank those customers.

And while the real proof of the pudding is in customers using us, and we're very thankful for it, we also appreciate the recognition from Gartner, who rated us as the industry's highest-scoring solution for Database as a Service. And this rating of 95% is based on a combination of factors - there's required criteria, there's preferred criteria and optional. We met 100% of the required criteria, 94% of the preferred and 75% of the optional, giving us a rounded score of 95%.

So that's RDS in a nutshell. Let's get into the meat of the session, which is to talk about some of the new features that we have launched in the last few days and months.

I'll start with Amazon RDS Blue/Green deployments. This is currently available for RDS MySQL and RDS MariaDB, but our goal would be to offer it for other engines as well. And as I mentioned before, these four pillars that you see on the screen are a constant reminder for us when we think about launching a new feature - is it hitting one or more of these dimensions?

And personally, we love features and our customers love features where a single feature is able to hit on multiple dimensions because you're raising the bar across many things. And Blue/Green deployments does exactly that - it makes a very complex task today easier to administer. And it also improves the overall availability and durability posture for database workloads.

And the way we think about it, this will be an ongoing journey for us. There's a lot of things RDS already does where we manage it for you. There's a few things like schema design, query optimization that you still manage because it's specific to your databases - it's your secret sauce. And updates - database updates are the ones that we focused on with this new feature.

So what are we talking about here? Where would you use this feature? Your databases are not static, obviously you're adding data, deleting data, changing things. But more than that, your database environment itself is constantly changing over time. You have to upgrade your databases because let's say there's a CVE and you need to apply a security patch or there's some other important patch that you need to apply to your database. Or you decide that hey, you can actually make your database more performant by changing your schema. Or you decide hey, you're running out of headroom and you need to scale up your instance to a larger size. Or you decided after looking at some workloads that hey, you need to fine tune some of your parameters in our parameter groups. Or you know, there's an OS update that needs to happen - it's not directly for your database, but it's something that you're required to do.

So there's a large number of reasons people change databases. And the thing about changing databases is they're a crucial part of your stack. And so you have to get it right - if you get it wrong, your app goes unavailable.

And customers typically have two historical methods they have had in RDS on how to do database upgrades. And while we focus on upgrades here, you know, this also applies to any of the other changes I mentioned like schema changes.

So the first option, which many customers use - in some ways it's the simpler option - is to do an in-place database update. And this is essentially overwriting your existing database, your production database, in place. It's super simple in RDS - if you're doing a database update, it's a one-click workflow. If you use the CLI, it's one command. And that's one of the reasons many customers choose it - because, hey, it's simple.

And for the most part it works. But the key here is for the most part - this is a dangerous operation because you're making changes to your live production workload and you're effectively taking downtime. In the case of an upgrade, even for things like schema changes, you're affecting your live workload because you may be building an index, dropping an index, adding a column and these are expensive operations on a live production system.

The other thing is these can be unpredictable and this is probably the trickiest part about doing an in-place upgrade. Depending on the size of your database, the schema that you have, a database upgrade could take seconds, it could take hours. And so you're basically taking a leap into the unknown - it's simple and you're hoping you'll get to the other side. Most of the time you will, sometimes it's tricky.

So that's in-place upgrades. A large number of customers say hey, this is not for me, I want to use something called blue/green deployments. And many of you may be familiar with the term, but I'll also get into the details of it in a second.

But effectively the way to think about blue/green deployments is: your production database is, let's call it, blue. You make a copy of this, which we're gonna call green. You sync blue to green - so green is not your production database, but it's got all the data and it stays in sync.

You do whatever changes you want on green - you could change the schema, you could upgrade the database. When you're happy, you stop writes to blue and your application starts writing to green. So that's the operation - you go from blue to green and all the changes are happening in a staging environment.

So this is safer.

You get a higher degree of control. Even if it took, let's say, 12 hours to upgrade your database, you could do it on the side on a Wednesday. You could test it. You're all good. And then on Saturday or Sunday, when you have your maintenance window, you can do the cutover in seconds or minutes.

So it's faster as well. But, you know, maybe the right way to phrase the slide would have been, it's safer and faster than the alternative, but it's not safe or fast on an absolute basis. Because there are still some gotchas here.

The first thing is this is actually fairly complex to pull off. Let's assume that you have a database with five replicas hanging off of it and those five replicas have more replicas hanging off of it.

First thing is you got to make sure that everything is set up correctly. Second, you got to make sure that everything has the same configuration, right? If you've done any sort of customization outside of the default, your green environment has to mimic the blue environment.

Then, when it's actually time to do the change, you've got to do things like fencing - you've got to make sure that you don't lose any writes on either blue or green. So you've got to quiesce blue, make sure green is caught up with no errors in replication, and then you switch over, right?

But here's the tricky part: you may think green is synced with blue, but bad things can happen. Let's say that someone issued a large transaction. When you actually do the cutover, you could stop your writes on blue, but green may not have caught up - and frankly, it can take minutes or hours to catch up, depending on that one very large transaction or DDL.

And so even here, you just have to spend a lot of time, a lot of care getting it right. And some customers do this because it is worth the effort, right, if you get it right. It is safer and faster.
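The choreography just described - fence writes on blue, wait for green to fully catch up, then redirect - can be sketched as a toy loop. This is purely illustrative, not real RDS tooling; `get_replica_lag`, `fence_writes` and `redirect_app` are hypothetical hooks standing in for your own monitoring and cutover steps:

```python
import time

def manual_switchover(get_replica_lag, fence_writes, redirect_app,
                      poll_interval=1.0, max_wait=300.0):
    """Toy sketch of a hand-rolled blue/green cutover.

    get_replica_lag: returns green's replication lag in seconds
    fence_writes:    quiesces writes on blue
    redirect_app:    points the application at green
    """
    fence_writes()                    # no new writes land on blue
    waited = 0.0
    while get_replica_lag() > 0:      # wait for green to drain its lag
        if waited >= max_wait:
            # The hazard described above: one huge transaction or DDL
            # means green may take minutes or hours to catch up.
            raise TimeoutError("green never caught up; aborting cutover")
        time.sleep(poll_interval)
        waited += poll_interval
    redirect_app()                    # green is now the primary
```

The `TimeoutError` branch is exactly the trap mentioned above: if green never drains its lag, you're stuck with writes fenced until you give up and un-fence blue.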

But then it begs the obvious question, you know, can we do it for you? Can we make the right side a push button workflow? And that's effectively what we have done with blue green deployments for RDS.

We take this workflow which is very familiar to many customers and we've made it fully managed. So it's just a set of buttons you push on the console or set of CLI commands or API commands. It's safe. We add a bunch of the guard rails.

You know, we have been operating databases for coming up on 13 years now. So in some ways, we know all the ways things break, right? Because we just have a lot of experience - you know, a former leader used to say there's no compression algorithm for experience.

And so we've built in a lot of guardrails based on our learning. And because we have made this simple, because we have all the guardrails, it's also fast. It typically completes in less than a minute.

So how exactly does this work? Let's assume you've got blue - you've got a primary database with two replicas hanging off it. If you're on the RDS console, it looks something like this. You go to the front landing page, you click on databases and you'll see, hey, there's a production database - here it's called my-prod-db.

The first thing you do is you issue two commands - or really one command - which is: you create a blue/green deployment. And what this does is it creates a new resource. It's not a database, it's just an object - the thing against which you issue commands to do the switchover.

And you can do a describe to see what this looks like. But effectively what's happening under the covers is: if you use the console, you go to the database, you go to modify, there's a drop-down, you just say create green - and you get the green database on the right side, right?

When you create green, you can either just make an identical copy, or you can say: as part of creating green, also upgrade to a later version. It really depends on your use case - whether it's an upgrade, a schema change or whatever it is.

But once you've done this, you basically get a copy of your environment. A few important things here, green is not just the primary, it's also got the first level of replication. Now, Amazon RDS for MySQL supports multiple levels of replication with chain replication.

In the first version of this feature, we don't support a full chain - we support one level of replication - but there's nothing inherent about the way we built it that prevents us from offering a second level. So we replicate your entire database topology from blue to green.

The second thing is we set up the replication between blue and green and monitor the metrics to make sure everything is up to speed - not just between blue and green, but also between the green primary and its replicas.

Now, you could choose to just leave it as it is, or you could say, hey, I want to add more replicas - you can do whatever changes you want to green. And at this point, you can run your tests and make sure it's working to your satisfaction.

If it doesn't, you can just dump green - they're database instances like any other, so you can delete them. Or if you're happy with it, you can say, hey, I'm ready for a switchover.

And when it's time to do a switchover - actually, just a couple of things first. As you can see in the console here, green is just another database like blue. So what you see here is the blue icon - it's the original database.

When you create a blue/green deployment, it creates the green database and automatically names it. It's all available on the console - you don't have to do any extra work. It also copies all the configurations, as I mentioned. So as you can see in some of the details below, we've got the check boxes - for example, backups have been set up. So you don't have to think about this. Green is a mirror of blue.

Now, when you say, hey, I'm good to go, I want to make the change, then you issue a switchover command. You can optionally delete blue, but really it's just switchover. So if you think about it, the only commands you really need to know are one and three - create blue/green deployment, and switchover.
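In API terms, those two commands map to `CreateBlueGreenDeployment` and `SwitchoverBlueGreenDeployment`. Here's a rough boto3-flavored sketch - the ARN, names and version are hypothetical placeholders, and the helpers only assemble request parameters:

```python
# Sketch of the two RDS API calls behind this workflow. The ARN,
# names and version below are hypothetical placeholders; the helpers
# only assemble the request parameters.

def create_bg_params(source_arn, name, target_engine_version=None):
    """Kwargs for rds.create_blue_green_deployment(**params)."""
    params = {
        "BlueGreenDeploymentName": name,
        "Source": source_arn,          # ARN of the blue database
    }
    if target_engine_version:
        # Ask RDS to upgrade green to this version while creating it.
        params["TargetEngineVersion"] = target_engine_version
    return params

def switchover_params(deployment_id, timeout_seconds=300):
    """Kwargs for rds.switchover_blue_green_deployment(**params)."""
    return {
        "BlueGreenDeploymentIdentifier": deployment_id,
        # Abort and stay on blue if the cutover exceeds this bound.
        "SwitchoverTimeout": timeout_seconds,
    }

# Against a real account you would then do something like:
#   import boto3
#   rds = boto3.client("rds")
#   bg = rds.create_blue_green_deployment(**create_bg_params(
#       "arn:aws:rds:us-east-1:123456789012:db:my-prod-db", "demo-bg",
#       target_engine_version="8.0.31"))
#   rds.switchover_blue_green_deployment(**switchover_params(
#       bg["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"]))
```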

On the console again, it's very simple. You just go to the right place that says switch over and it happens, you don't have to think hard about it. And what happens when you switch over is that we fence the traffic, we quiesce writes on blue, we make sure green is caught up. We transfer your writes to green and then that's that.

Now, while we quiesce the writes on blue, you can still serve reads from blue. So if you have read-heavy apps, you can still keep running while we do the switchover. This whole operation happens in about a minute. Your mileage may vary in the sense that you could be a lot faster, but generally a safe estimate is you can do even major version upgrades in less than a minute.

And this is a substantial improvement over what many customers see in a self-managed environment. So I talked about safety - what do I mean by that? Again, we're constantly monitoring the health of both blue and green. We look for pathological cases like large DDLs or long-running transactions, which could make things go badly.

I mentioned fencing the writes - we block writes on blue, we allow green to catch up, and then we switch over. And this other thing, which I failed to mention earlier, is one of the most important capabilities that came out of asking customers how they would like us to design this.

This is: you can actually specify a recovery time objective, or RTO, which is the maximum downtime you're willing to tolerate when you go from blue to green. Because if you do hit a pathological case and you don't want to wait an indefinite amount of time, you can say - as low as 30 seconds - if the switchover hasn't finished in 30 seconds, just abort it and restart my writes on blue, right?

And so this is a very powerful fail-safe, and it's fully managed. You can guarantee that you're never gonna have an outage greater than a certain amount, and the rest of it we take care of for you. And at the end of it, you're left with the green environment, which is upgraded, or has the right schema, the right parameters - whatever changes you wanted to make to it.
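The RTO guarantee can be captured in a toy model. The real decision logic is internal to RDS; this sketch just shows why downtime is bounded either way:

```python
def switchover_with_rto(catchup_seconds_needed, rto_seconds):
    """Toy model of an RTO-bounded switchover.

    If green can catch up within the RTO, writes move to green;
    otherwise the switchover aborts and writes resume on blue,
    so downtime never exceeds rto_seconds.
    """
    if catchup_seconds_needed <= rto_seconds:
        return {"outcome": "switched", "downtime": catchup_seconds_needed}
    # Pathological case (huge transaction / long-running DDL): bail out.
    return {"outcome": "aborted", "downtime": rto_seconds}
```

Either branch keeps the outage at or under the RTO you chose - the only question is whether you end up on green or back on blue.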

I do want to point out, in the interest of safety, we don't actually delete blue. You have to delete the blue databases yourself. And the reason we leave it is sometimes you just want to keep it around for forensics - let's say one of the reasons you made a change was that something was not working great on blue. Or you just want it as a fail-safe - you may want to establish replication the other way, from green back to blue, if you really want.

So the feature is available on all three - Aurora MySQL, RDS MySQL and RDS MariaDB. And in some ways it's very timely, because some of these database versions are reaching community end of life. It's a great way to make sure that you can upgrade and stay current with your security changes, and with the best features and fixes that you get from these community databases, but without taking significant downtime. Just as a reminder: we've talked a little bit about upgrades, but it's not just upgrades - it's all the other kinds of changes that you want to make to your production database, but in a safe way.

So that was the first feature. The second one is more focused on performance and availability and this is multi-AZ with readable standbys. This is something we actually launched earlier this year in May. But I want to make a point of sharing this because a lot of our customers have found tremendous value from this.

This is one of the features that customers have been asking for for a while. We're very excited that we have been able to deliver it. And just recently, we expanded the number of regions in which this is available. So a lot more of you are able to use it now.

So as I mentioned, this feature is about performance. It's actually also got some benefits to availability, and we'll talk about that in a second. But this feature is part of a bigger set of changes that we've been making.

Just want you to be aware that it's not just this one feature that we've done to improve performance. We introduced Graviton instances, which have 35% better performance and 52% better price performance.

Then there's Multi-AZ with readable standbys, which I'll talk about - it doubles your write throughput, and there are a few other benefits in terms of jitter.

And the third is we now have 15 read replicas. A lot of customers said, hey, five is not enough, so we now offer 15. And because MySQL supports chained replication, each of those replicas can have replicas of its own, for a total of up to 255 replicas.

And some customers really do use hundreds of replicas. So on all these dimensions, we're focused on improving performance and scale.

So let's talk about this thing in the middle, which, as I mentioned, a lot of our customers have been asking for for a couple of years. We're very excited that we've been able to launch it this year.

So Multi-AZ - you know, this is the core of what we have done in RDS. We've had it since almost the very beginning, since 2010. And our current model - I don't have a slide on this, so I'll just walk you through it, bear with me. When you create a Multi-AZ instance, you essentially get two hosts. Transparently to you as a customer, it appears as one instance, but there are actually two different database hosts underneath.

The primary host is in one AZ, the secondary host is in a second AZ - hence the name - and we keep the primary and the secondary synchronized through block-level replication in the file system and the LVM layer.

And because we use block-level replication - actually, before I go there: we also have automation that's constantly monitoring the health of your compute and your storage on both the primary and the secondary AZ. And if there's any sort of a failure, we automate the failover.

I think many of you are familiar with RDS Multi-AZ. This has worked tremendously well for our customers, right? A lot of our customers - hundreds of thousands of them - run their workloads on RDS because we have Multi-AZ. It's tried and tested, we have ironed out the kinks over more than 10 years, and it works.

So it begs the question: if it works, why did customers ask you to do something different? Why did you do something different? And it really comes down to the fact that when we built Multi-AZ, a lot of these open source databases did not really have what I'd call high-fidelity replication technology.

And hence we used storage-level replication - it's block-level replication. And this works really well for the core use case of HA, of high availability. But it also means that you can't actually be running a second, live database on your standby. It's an idle database, right?

So in some ways, you're paying for two hosts - because we do provision two hosts on your behalf - but you only get to use one of them, right? That's one issue.

Another issue is that in the last 10 years, database replication has just come a long way, right? MySQL especially - 5.7 was a big advance over 5.6 in terms of parallel replication, and 8.0 is even better.

And what customers are now realizing is you can actually get even better performance by using database-level replication, because it's less chatty than sending every page change. You can get better performance out of using native replication in MySQL, and there are also benefits to failover.

So we said, OK, we hear you - the world has come a long way in the 12 years since we first launched Multi-AZ. We're going to automate, and make simple again, this new model of Multi-AZ. And that's what we call a Multi-AZ cluster.

And the way it works: what you have in the center here is the primary database instance, right? This is what you've provisioned. And we also provision two other databases in two other AZs. So this is a three-AZ solution, and we use database-level replication from the primary to the two secondaries.

Now, unlike Multi-AZ database instances, with Multi-AZ database clusters - right, instance versus cluster - you actually get to see all three endpoints, because these are not idle replicas. They're actually active, in the sense that they can take workload.

Now, we use native replication, and these replicas are available to you. One important thing here is we use something called semi-synchronous replication. The way it works is that when you write to your primary, we ship the logs - the redo logs and the binlogs - to both replicas.

One of them has to persist to local storage and ack back to the primary. When that's done, the primary can commit the transaction. And the thing that I missed here is an additional innovation: we made sure that Multi-AZ clusters are available on instances that have local SSDs, or local instance storage.

And the reason we did this is that when you have the local SSD, instead of always writing all the way down to EBS - which is the network storage we use - you can commit your log records to local storage, both on the primary side and on the secondary.

Now, it eventually gets persisted to EBS for durability. But for availability purposes, that allows us to get a lot of the network hops out of the commit path. And that's one of the things that drives higher performance - and because we also take our network storage out of some parts of the commit path, you're less prone to jitter.

And so typically, jitter is a combination of - you know, when you have a sophisticated stack, every hop, if there's any sort of jitter in the latency, it all starts adding up, and especially in a synchronous system it can add up, right? In this situation, we have reduced the number of network hops, both within an AZ and across AZs, because the protocol is less chatty between AZs. And because it's semi-synchronous, only one of your replicas has to ack back.

So if the other one is experiencing a moment where it's slow for some reason, your transaction doesn't slow down. So that's kind of what we've done with Multi-AZ clusters.
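A toy model of that commit rule (not the actual MySQL semisync implementation) shows why one slow replica doesn't hurt you:

```python
def semisync_commit_latency(primary_flush_ms, replica_ack_ms):
    """Toy model of a semisynchronous commit: the primary flushes its
    log locally, ships it to both replicas, and commits once the
    FASTEST replica acks - so one slow replica doesn't stall you."""
    return primary_flush_ms + min(replica_ack_ms)

def fully_sync_commit_latency(primary_flush_ms, replica_ack_ms):
    """Fully synchronous would wait for the SLOWEST replica instead."""
    return primary_flush_ms + max(replica_ack_ms)

# One replica acks in 2 ms while the other is jittering at 50 ms:
# semisync commits in 1 + 2 = 3 ms; fully sync would take 1 + 50 = 51 ms.
```

The `min` versus `max` is the whole story: with two replicas and a one-ack rule, commit latency tracks the healthy AZ, not the jittery one.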

Now, reads are still served from either cache or EBS, and so that doesn't change. Your primary can take writes and your two replicas can take reads. Your replication lag is very low - we've tried this at full throttle and it generally works well.

And so in terms of performance, I mentioned some of the reasons you see gains, right? We're using local SSD on the write path, and we use semi-synchronous replication, so you have less jitter.

What ends up happening is, in our benchmarks, you see on average a 2x increase in write throughput. And just as important, there's also a significantly lower amount of jitter - which is the p95 latency that we have measured here - compared to Multi-AZ database instances.

Now, I talked about performance. The other thing that customers really love about this feature is that you not only have a primary endpoint for your reads and writes, you also get a reader endpoint, where you don't have to decide how to load-balance your reads between the readers. You point your reads at the reader endpoint and we do the load balancing across the readers for you. So that's simpler.

And the third part - I briefly mentioned this - MySQL has come a long way in terms of replication, and there are some benefits to failover. Here's what happens when you have the Multi-AZ database instance with one standby.

So this is the model we've had since 2010, right? Let's say your primary crashes, or there's some sort of an outage, and RDS automation decides that, hey, you've got to fail over from your primary to your secondary. What's effectively happening is that your secondary is a synchronous copy of the primary, right?

So everything that was persisted to disk, or EBS, is also there in your secondary. Now, you suddenly bring up your secondary database - because remember, it was idle, you've got to bring up the host - and what it does is it effectively starts crash recovery, right?

The issue happens, we detect the failure, we start the host on the other side, and it does crash recovery. And there are really two competing things. You've got to complete crash recovery - which, you know, in MySQL means you do redo, then you do undo, and it can take time; it really depends on your write workload and when your last checkpoint was. And then you have to do DNS propagation.

Now, this picture is a little artificial, because it makes it look like your crash recovery and your DNS propagation end at exactly the same time. What it's really trying to convey is that failover takes the greater of the two times. Once you have completed both crash recovery and DNS propagation, then your old secondary, which is now your new primary, is able to take additional workloads. And this can be a long amount of time - typically 60 to 120 seconds, but it can be minutes. It really is workload dependent.

With Multi-AZ clusters, you're shipping binlogs and redo logs from the primary to the secondaries, and the secondaries - both of them, right - are constantly applying the logs, which is the only reason you're able to run reads on them.

So if something happens to your primary, you don't really have to do crash recovery on the other side. What you have to do is apply the delta - the things that were committed to local SSD but haven't actually been applied to the database yet. So it's much smaller. Now, it is workload dependent, but we're finding that on average, your failover time improves by a factor of about 2 to 4.
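Both failover models boil down to "the greater of the two times", just with a much smaller recovery term in the cluster case. A toy comparison (the numbers are illustrative, not measurements):

```python
def failover_time_instance(crash_recovery_s, dns_propagation_s):
    # Old Multi-AZ instance model: the cold standby must run full
    # crash recovery (redo + undo); recovery and DNS propagation
    # overlap, so failover takes the greater of the two.
    return max(crash_recovery_s, dns_propagation_s)

def failover_time_cluster(apply_delta_s, dns_propagation_s):
    # Multi-AZ cluster model: the standby is already applying logs,
    # so only the small un-applied delta remains before it can serve.
    return max(apply_delta_s, dns_propagation_s)

# e.g. 90s of crash recovery vs 5s of delta apply, with 30s of DNS:
# instance model -> 90s, cluster model -> 30s (a 3x improvement)
```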

So again, Multi-AZ clusters have been great for our customers: in terms of just raw performance, you can increase your write throughput by 2x; it's simpler, because you get the reader endpoint; and it's also more available.

And of course, you know, the thing is - let's say that you have a larger event, like an entire Availability Zone going down. In this case, you're still left with two nodes, so you still have coverage between one node and the second.

So that was one of the features that we launched earlier this year. As I mentioned, it's had a lot of resonance with our customers, and we're excited that it's now available in more regions.

The next two features are really around taking advantage of innovations that EC2 has done, because a lot of what RDS focuses on is making sure that we're getting the best of what AWS has to offer to our customers in a fully managed service.

So what are Optimized Writes? You know, one thing I think all of us can agree on is that you want your database to never lose writes, right?

If you think about acid properties - atomicity, consistency, isolation, and durability - you have to be durable. If something is written to your disk, it's written to your disk regardless of failure. So MySQL has a way of enforcing this.

The key concept here is that MySQL uses 16 kilobyte pages - that's a MySQL page. Historically in hardware, you've been able to write 4K blocks atomically to disk - that's historically what's been supported. So when you want to write a MySQL page, you have to write four 4K blocks, right?

Now let's say you write the first, you write the second, and then you have a failure. You have what's called a torn page - it's half on disk, half not on disk. When you come back up, you don't really know the state of truth.

So MySQL works around this through a feature called double write buffer. The way double write buffers work is you write the 16 kilobytes, you know, in 4K blocks to a double write buffer, which is an InnoDB data structure. And then you write it again to persistent storage.

Now, if there's any sort of failure, when you come back up, what you're really doing is comparing these two copies to see if they're different. If they're different, you know the double write buffer is the source of truth. If they're not, you know everything's good.
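The doublewrite mechanism described above can be sketched in a few lines of Python. To be clear, this is a toy model and not InnoDB's actual on-disk format: it writes the full 16K page to a doublewrite area first, then copies it to the data file in 4K blocks, and on recovery uses a checksum comparison to decide which copy to trust after a simulated crash.

```python
import hashlib

PAGE_SIZE = 16 * 1024
BLOCK_SIZE = 4 * 1024  # the unit the disk can write atomically

def checksum(page: bytes) -> str:
    return hashlib.sha256(page).hexdigest()

class ToyDoublewrite:
    """Toy model of InnoDB's doublewrite buffer (not the real implementation)."""

    def __init__(self):
        self.doublewrite_area = b""  # staging copy, written first
        self.data_file = b""         # final resting place of the page

    def write_page(self, page: bytes, crash_after_blocks: int = 4):
        assert len(page) == PAGE_SIZE
        # Step 1: write the full page to the doublewrite area.
        self.doublewrite_area = page
        # Step 2: copy it to the data file one 4K block at a time;
        # a crash partway through leaves a torn page in the data file.
        out = b""
        for i in range(4):
            if i >= crash_after_blocks:
                break  # simulated power loss mid-write
            out += page[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        self.data_file = out.ljust(PAGE_SIZE, b"\x00")

    def recover(self) -> bytes:
        # On restart, compare the two copies; if the data file copy is
        # torn, the doublewrite area is the source of truth.
        if checksum(self.data_file) != checksum(self.doublewrite_area):
            self.data_file = self.doublewrite_area
        return self.data_file

db = ToyDoublewrite()
page = b"\xab" * PAGE_SIZE
db.write_page(page, crash_after_blocks=2)  # crash after 8K of 16K written
print(db.data_file == page)   # False - torn page on "disk"
print(db.recover() == page)   # True - doublewrite copy restores it
```

The cost the speaker describes is visible here too: every page is written twice, once to the doublewrite area and once to the data file.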

This works well - it does the job, your data is durable. That's kind of the most important thing here. But it's got some downsides to it. It's slower because you've got to write everything twice. It's inefficient from a throughput perspective, right? And again, depending on your workload, you may have to provision more IOPS because you're rewriting everything twice, right?

So again, the question is, can we do better? And this is where EC2 announced a feature, I think just earlier this week, called torn write prevention, where effectively we can now write 16K blocks atomically to persistent storage. And this is a property of the Nitro system.

And so if you're using one of those instances - which at this point is really most of our newer instances - you should be able to take advantage of this. From our perspective in RDS for MySQL, we're able to write the 16K blocks atomically, and EC2 takes care of the rest for us.

Now, it's not quite as easy as that - we've also got to make sure that our writes are 16K aligned. So there's a little more detail than I'm letting on here. But in effect, it's a partnership between us, EC2, and EBS to make sure that we're able to get something faster for you.

So again, before - on the left side - you'd write everything twice. After, you're writing it once. A pretty basic idea, but a very powerful one.

And we ran some benchmarks. This is SysBench with MySQL 8.0 on a 16xlarge instance. SysBench mimics simpler OLTP applications - more like web workloads.

What you see here on the x-axis is the number of threads - the amount of concurrency in the system. On the y-axis, the bars are the transaction throughput per second, and the line is the improvement you see from turning off the doublewrite buffer versus having it on.

Someone asked whether this is switched on by default - it is, and I'll come to that in a second. What you see here is that the improvement percentage actually increases as you move to the right, which makes sense: the heavier and more concurrent the workload, the more impactful the double writes are.

So the heavier your workload, the more improvement you actually see. On the very right here, you see something like a 233% improvement - that's actually about 3.3x.
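The conversion from an improvement percentage to a throughput multiplier is worth spelling out, since the two are easy to mix up - a quick sanity check of the arithmetic above:

```python
def improvement_to_multiplier(improvement_pct: float) -> float:
    """A 233% improvement means throughput is (1 + 2.33) = 3.33x the baseline."""
    return 1 + improvement_pct / 100

print(improvement_to_multiplier(233))  # -> 3.33
print(improvement_to_multiplier(50))   # -> 1.5 (a 50% improvement is 1.5x)
```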

We also ran this on HammerDB using the TPC-C-like workload, because that has more complex queries - different workloads fit different benchmarks. And here as well, we saw about a 1.5x increase, which is again tremendous for a lot of our customers.

That question - why don't I get to it in a second? Hold that thought. We'll have about 10-15 minutes at the end for questions, but I'll answer it as well.

Short answer is not yet, but I'll explain. As the slide shows, it's available for MySQL 8.0 and higher. You have to use specific instance types, but the goal is to make it available for all instances.

And here's the catch, though, and that's something we'll work on fixing - it's only available for new databases. You may ask, why is that? Why can't I turn it on for my existing databases? Because I've got tens of terabytes, help me out. And this is where it comes down to alignment: we need to make sure that everything from the database through the file system through LVM all the way down to the storage is 16K aligned.

And so we're just exercising an abundance of caution to make sure that your existing databases, the way they've been written out, are all 16K aligned. Because if those 16K blocks don't align all the way through, you may have 12K in one block and 4K in the next, and then you lose the protection. So that's something we're working through.
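The alignment concern can be illustrated with a simple check - hypothetical, but it captures the invariant that has to hold: every page write must start at a 16K-aligned offset, or the write straddles two atomically-written units and the protection is lost.

```python
PAGE_SIZE = 16 * 1024

def is_page_aligned(offset: int) -> bool:
    """True if a 16K page written at this byte offset sits entirely
    inside one atomically-written 16K unit."""
    return offset % PAGE_SIZE == 0

# Aligned layout: pages at 0, 16K, 32K, ... - each write lands in one unit.
print(all(is_page_aligned(i * PAGE_SIZE) for i in range(4)))  # True

# A 4K shim anywhere in the stack (e.g. a partition or LVM header)
# misaligns every page after it: 12K in one unit, 4K in the next.
shim = 4 * 1024
print(is_page_aligned(shim + PAGE_SIZE))  # False
```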

But that's the reason it's for new databases at this moment. So how do you actually get started with it? It's pretty simple. You go to the console, you pick RDS for MySQL, and you can filter to the versions that support Optimized Writes and pick the right one.

And then you can also see which instance classes support atomic writes - I'm sorry, I'm using the old name - Optimized Writes. And that's it: you get it at no extra charge, and it's turned on by default.

Now, you could choose to turn it off - that's up to you - but by default it's on. It's the AUTO setting in the parameter groups.

The last feature is Optimized Reads. So again, let's look at how reads happen in MySQL. You've got your app, you've got your database, and currently we store all of our data in EBS. This includes both durable data and temp storage.

So MySQL uses temp storage for a variety of things. Let's say you're doing very large joins where the two tables don't fit in memory together, a very large sort, common table expressions, or a hash aggregate - for all of these, MySQL basically creates a spill space with temp tables.

These temp tables currently also sit on durable storage. But let's say MySQL crashes - when you restart, these temp tables don't survive anyway. So there's no reason to put them in persistent storage when they don't need to be persistent.
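The spill behavior can be mimicked with a toy external sort in Python - an analogy, not MySQL internals: rows that don't fit in memory are written out as sorted runs in a scratch directory, merged, and the scratch files are thrown away. Nothing about them needs to survive a restart, which is exactly why ephemeral local storage is a fine home for them.

```python
import heapq
import os
import tempfile

def external_sort(values, max_in_memory=4, tmp_dir=None):
    """Sort an iterable of ints, spilling sorted runs to temp files
    whenever more than max_in_memory values are buffered."""
    runs, buf = [], []
    with tempfile.TemporaryDirectory(dir=tmp_dir) as scratch:
        def spill():
            # Write the current buffer as a sorted run on scratch storage.
            path = os.path.join(scratch, f"run{len(runs)}.txt")
            with open(path, "w") as f:
                f.writelines(f"{v}\n" for v in sorted(buf))
            runs.append(path)
            buf.clear()

        for v in values:
            buf.append(v)
            if len(buf) >= max_in_memory:
                spill()
        if buf:
            spill()

        # K-way merge of the sorted runs; the temp files vanish on exit.
        files = [open(p) for p in runs]
        try:
            merged = [int(line) for line in heapq.merge(*files, key=int)]
        finally:
            for f in files:
                f.close()
    return merged

print(external_sort([9, 1, 7, 3, 8, 2, 6, 4, 5]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Pointing `tmp_dir` at fast local storage instead of a network volume is, in miniature, the same move Optimized Reads makes with MySQL's temp tables.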

And so what we've done here is we said, alright, we're going to move those temp tables onto local instance storage - on the compute instance where the database is hosted. And effectively you get faster reads. What we're seeing in our benchmarks is queries up to 50% faster.

So up to 2x faster in terms of elapsed time. This mainly helps complex, read-heavy analytical workloads. And an important thing here is it also reduces your cost, because you're not doing unnecessary I/O out to EBS when you could be doing it all within the compute instance.
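The jump from "50% faster" to "2x" deserves one line of arithmetic, because elapsed-time reductions convert to speedups differently than throughput improvements do:

```python
def speedup_from_time_reduction(reduction_pct: float) -> float:
    """A 50% reduction in elapsed time means 1 / (1 - 0.5) = 2x speedup."""
    return 1 / (1 - reduction_pct / 100)

print(speedup_from_time_reduction(50))  # -> 2.0
print(speedup_from_time_reduction(60))  # -> 2.5 (the best case quoted later)
```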

So again, a pretty straightforward but powerful idea. We used to store temp tables on EBS; for instance classes that do have local storage, we can now move the temp tables onto the local instance storage.

We ran benchmarks - in this case HammerDB with TPC-H, which is more of an analytics workload. TPC-H has 22 queries, and you can see the level of improvement for each of them. Lower is better for the bars, by the way, because they show elapsed time per query.

And on the right side, the lines show what the improvement is. It's up to 60%, but on average it's more like 50%.

Like I said, it's available by default. As long as the instance has local storage, you don't have to do anything to turn it on - and you can't turn it off. But if for some reason you don't want to use it, you can always move to an instance class that doesn't have any local instance storage.

And that brings us to the close of the main content here. I do want to point out two things - there's two other sessions that I'd highly recommend that all of you attend if you are able to.

The first one is a deep dive into Amazon Aurora and how we scale Aurora - we have a new version that came out earlier this year. It's been phenomenal: much faster scaling, much better performance, and you get all the properties of Aurora, including multi-AZ. That talk is happening on the 30th.

And the other one, which is tomorrow I believe, is Trusted Language Extensions for Postgres. Postgres has a concept called extensions which essentially let you introduce new data types and do a bunch of interesting things with Postgres.

Historically, we've certified every extension, but now it's more of a bring-your-own-extension model, with some caveats, where we make sure it's safe and that your database will still perform well.

Like I said, I'm here to answer any questions. Thank you again, and please make sure to fill out the survey. I'm around if you have any questions for me.
