AI-powered scaling and optimization for Amazon Redshift Serverless

Hey, welcome everyone. I hope you're having a great time at AWS re:Invent, and thank you for coming after Adam's keynote.

Can you please raise your hands if you are an Amazon Redshift or Amazon Redshift Serverless user? Oh, full house. Look at it. I'm really glad. Thank you.

How many of you, can you raise your hands, have experienced pain points related to scaling when your workloads are varying, or when some large jobs come in? A few of you. All right, a couple of you, actually more than a couple.

I do have a solution here: AI-powered scaling and optimization. We will talk about it, we will do a deep dive, and we will go into some demos so you can see it in action.

I am Ashish Agarwal, a product manager with Amazon Redshift. I, along with my colleague Tim Kraska, who is a professor at MIT, will do a deep dive into this new AI technique that we are bringing to help our customers.

So here is my agenda. I will do a quick recap of Amazon Redshift Serverless and talk about some success stories and pain points. Tim will do a deep dive into the new AI technique we are talking about. I will do four demos, we will conclude, and then we'll do a Q&A.

So what is Amazon Redshift Serverless? It's an offering of Redshift where you do not have to worry about the type of nodes or how much capacity you need. You do not manage the infrastructure; everything is taken care of by the Redshift service automatically. And the best part is that it's a pay-as-you-go model, meaning you only pay for the workload when it is active and you do not pay when it is idle. That results in cost savings.

Some key capabilities: it's automatically provisioned, it does very good auto scaling when there is a highly concurrent workload, it does ML-based workload monitoring to make decisions, and it does automated backups. There is no maintenance window; it's available to you 24/7 because patching is all automated, and you can get started very quickly.

So what does the architecture for Redshift Serverless look like? As mentioned, ML-based monitoring is done, workload management is automatic using Auto WLM, there are background optimization jobs, and there is automatic tuning such as automated vacuum and automated key creation. Patching is done automatically. And you can run your data lake queries against data on S3, you can run federated queries, and you can connect the way you're used to: JDBC, the Data API, or your favorite BI tool.

So your experience is very seamless. A lot of the burden is taken on by Redshift automatically, and customers have experienced the benefits of it when they are using Serverless.
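
As a concrete illustration of the Data API route mentioned above, here is a minimal sketch using boto3's redshift-data client; the workgroup name, database, and SQL text are placeholders, not values from the talk.

```python
import boto3
import time

# Minimal sketch of querying a Redshift Serverless workgroup via the Data API.
# Workgroup/database names and the SQL text are placeholders.
client = boto3.client("redshift-data")

resp = client.execute_statement(
    WorkgroupName="my-serverless-workgroup",  # serverless endpoints use WorkgroupName
    Database="dev",
    Sql="SELECT COUNT(*) FROM sales;",
)

# The Data API is asynchronous: poll until the statement finishes, then fetch results.
while True:
    desc = client.describe_statement(Id=resp["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

if desc["Status"] == "FINISHED" and desc.get("HasResultSet"):
    result = client.get_statement_result(Id=resp["Id"])
    print(result["Records"])
```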

So let's talk about it. I'll take a few examples here. One customer, a mobile game operator and creator using Redshift Serverless, was able to onboard 10x more of their analysts because of the auto scaling capability. They experienced better performance and cost savings of over 20% because Serverless decides when to scale. When there was a highly concurrent workload, it scaled very well, so they did not have to provision or worry about capacity up front.

Similarly, there was another customer, Mosaic, an AI and workload management provider. They used the auto scaling capability of Amazon Redshift Serverless and were able to manage their data pipelines and data lake queries better, resulting in both performance and cost savings.

So this is all great. But at the same time, we listened to our customers, and they gave us feedback that their workloads are more varying in nature. What I mean by that is they have steady-state workloads, and they have ETL queries which require more memory, whereas the steady work requires less memory. On the other hand, they have data scientists and data analysts who can send big jumbo queries with huge memory requirements, and those workloads can possibly time out or cause disturbances to the existing workload.

So they were happy with the existing scaling technology, but they still wanted a more responsive technology for variable workloads, one that scales even better. We listened to our customers here, and that's why we went and built this AI-driven scaling and optimization technology.

I would like to invite my colleague Tim to talk about this new technique and do a deep dive. Thank you.

Thanks. So my name is Tim Kraska. I'm a director of applied science and also, when I'm not working for Amazon, a professor at MIT. I'm going to talk a little more deeply about what we built and why we built it.

For that, I would like to use an analogy, and that is shopping carts. You have a bunch of stuff in there and you eventually need to check it out, which is like the processing of a shopping cart. So you have different work items in your basket and you want to check them out.

So the question is, how do you scale it? The typical way is that you have a range of counters which can work in parallel. So you have a certain capacity in your retail business, in your store, and we normally refer to that as the base capacity, or base RPU here. You set it up; it's essentially the number of counters you have, and it determines how fast you can do parallel processing and other things.

But what if you actually run out of capacity? One thing we can do, which is a little bit harder for retailers, is very easily add another warehouse, like another store of the same size. We can even add a third one and increase the capacity further, on the fly.

A little bit more formally, how we do it today is that we have a component called Auto WLM, which looks at the number of queries coming in and makes a decision based on that. We also have a prediction model which already considers, for example, the demands of a single query, and we use that information for decision making. And then we have this base capacity which you have to set up front. It's essentially a knob you can tune to determine how fast things will be, like how many counters your warehouse has.

So currently, how the scaling mechanism works is that queries come in and we essentially look at how many queries are running concurrently at the same time. If the load increases, we add another base capacity to it, like another warehouse of the same size. So this is great, and our customers already love it.
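
A rough, purely illustrative sketch of that concurrency-threshold behavior; the threshold value, capacity unit, and function name are made up for illustration and are not Redshift's actual internals.

```python
# Purely illustrative sketch of concurrency-based burst scaling as described above.
# The threshold and capacity values are hypothetical, not Redshift internals.
BASE_CAPACITY_RPU = 32
CONCURRENCY_THRESHOLD = 50  # queries running/queued before another burst cluster is added

def clusters_needed(concurrent_queries: int) -> list[int]:
    """One base cluster, plus a same-size burst cluster per threshold crossed."""
    bursts = max(0, concurrent_queries - 1) // CONCURRENCY_THRESHOLD
    return [BASE_CAPACITY_RPU] * (1 + bursts)

print(clusters_needed(30))   # [32]          -> base capacity only
print(clusters_needed(120))  # [32, 32, 32]  -> two extra burst clusters, same size as the base
```

The key point the talk makes is visible here: every additional cluster is the *same* size as the base capacity, regardless of what the incoming queries actually need.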

So we have our shopping carts with different work items, we have a certain capacity in our store, and we can even scale up the number of warehouses. Everything is good until you see this very large shopping cart coming up. And we all know, if you have been to a retail store at some point, there's somebody coming up with a cart like that and everything queues up behind it. And if you have several of them, and I'm talking about the really big stores, I probably cannot name them but you probably know which ones I mean, it can really slow everything down.

More formally: a large query comes in and the base capacity is not enough. You need to do more, and you want to increase it on demand, beyond the number of counters you would normally have, to process the query.

Another issue we have observed is when the workload pattern changes. Let's assume you have your cluster set up and you size it perfectly for, let's say, short running queries, but you also want to process some ETL queries at the same time. If the base capacity you have is perfectly sized for the short running dashboarding queries but not for the ETL ones, your ETL workload might take too long. If you size it correctly for the ETL workload, you overpay the moment you only have dashboarding queries running.

The current recommendation in that scenario is that you split the workloads out: you create one Redshift cluster, or data warehouse, just for the dashboarding queries, and another one, let's say, for the ETL queries. You separate them manually and then connect them through data sharing so that they work over the same data. Unfortunately, this doesn't solve the full problem.
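
For reference, that manual split typically looks something like the following. This is a minimal sketch using the Data API; workgroup names, the namespace GUIDs, and the schema and database names are placeholders.

```python
import boto3

# Minimal sketch of the manual data sharing setup described above.
# Workgroup names, namespace GUIDs, and object names are placeholders.
client = boto3.client("redshift-data")

def run(workgroup: str, sql: str) -> None:
    client.execute_statement(WorkgroupName=workgroup, Database="dev", Sql=sql)

# On the producer (e.g. the ETL warehouse): share the schema that holds the tables.
run("etl-workgroup", "CREATE DATASHARE sales_share;")
run("etl-workgroup", "ALTER DATASHARE sales_share ADD SCHEMA public;")
run("etl-workgroup", "ALTER DATASHARE sales_share ADD ALL TABLES IN SCHEMA public;")
run("etl-workgroup",
    "GRANT USAGE ON DATASHARE sales_share TO NAMESPACE '<consumer-namespace-guid>';")

# On the consumer (e.g. the dashboarding warehouse): expose the share as a database.
run("dashboard-workgroup",
    "CREATE DATABASE sales_shared FROM DATASHARE sales_share "
    "OF NAMESPACE '<producer-namespace-guid>';")
```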

First of all, it's a manual process; you need to constantly monitor which workloads belong where and how to size it. That's a problem. The second one is that things might change. Let's assume you perfectly sized it for the workloads, but your data size slowly grows, as it normally does. Queries slowly become a little bit slower, until they actually violate the SLOs, the service level objectives, you have in mind.

So how do people deal with it today? They continuously monitor what the data warehouse is doing and whether it's still functioning correctly, they apply a whole range of optimizations like materialized views, sort keys, and other things, and they create manual separations of workloads just to deal with that issue.

We want to make it easier, and that is where the next generation of AI optimization and scaling comes in. We want this part to scale on the real demand of your workload, not just based on concurrency.

So it's a new intelligent scaling mechanism. How do we do that? The first thing is really this very large query coming in; we want to solve that use case. If a large query comes in, we analyze it on the fly, and then, based on the demand of that query, we potentially allocate more resources beyond the base capacity you selected.

In fact, I'm using a single query just as an example, because we really do it over the current workload that is coming in. It's not just one query; it's the current situation of your workload. We analyze every single individual query, but we also consider the current set, what's going on right now. So we don't just make individual decisions; it's based on what's currently running on your cluster at the same time.

So that's the new on-demand scaling mechanism, which constantly monitors the real demand coming from your workload. How do we do that? On one hand, we improved our query prediction models. We can now predict much more precisely how much memory a query needs, what its demands are on IO, and what the predicted latency will be if you start running it.

In addition to that, we built new scaling prediction models, which give us an estimate of how a query would perform if we scaled it up to a different number of nodes or machines. So if we increase the capacity, we get an estimate of how the query behaves on that capacity.

We take this information into account and then make a decision based on it. Unfortunately, these models take time, and our most advanced models might actually take longer than executing the query itself. Particularly if you have a short running dashboarding query, doing the inference to figure out how the query might scale can take longer than just running the query.

So we need a way to deal with that. Luckily, there's one thing we observed in our fleet: a lot of queries actually repeat. In fact, we found that over 80% of our queries have been seen before. So the best predictor we can build is: if we see a query that has run before, use the last runtime we observed for it; that's a pretty good predictor.

Unfortunately, things might change in between. Maybe the data size increased, maybe there is a different predicate, so it's not exactly the same query but maybe generated from the same template; there are certain variants.

So instead of caching on the query text, we actually cache on the feature vector of a query. A feature vector, you can think of it as an embedding: a bunch of properties we have about the query, including the data properties. It's a typical machine learning embedding of the information a query has after it was compiled and optimized.

If we find a cached value for the feature vector, we return it, and it's super fast; it takes essentially no time. If we don't find it in the cache, we go to a local, small model which makes a quick prediction, and the main goal of that model is to figure out whether a query is short or long running.

If it's short running, we're fine, because we probably just want to schedule it and not spend much more time on it. If it's long running, then we go through another stage of predictions, which gives us the scaling properties and other things. That might take longer, but then we have all the information we need to make the right scaling decisions.
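
A highly simplified sketch of that tiered decision flow; the feature extraction, the cache, the models, and the threshold below are invented stand-ins for illustration and are not Redshift's actual implementation.

```python
# Highly simplified, illustrative sketch of the tiered prediction flow described above.
# The feature extraction, models, and threshold are hypothetical stand-ins.
from dataclasses import dataclass

SHORT_RUNNING_THRESHOLD_S = 10.0  # assumed cut-off between "short" and "long" queries

@dataclass(frozen=True)
class Prediction:
    runtime_s: float
    memory_gb: float
    scaling_curve: tuple = ()  # (capacity_rpus, predicted_runtime_s) pairs, if computed

# Cache keyed on the query's feature vector rather than its text, so near-identical
# queries (same template, different predicate) can still hit.
feature_cache: dict = {}

def extract_feature_vector(compiled_plan: dict) -> tuple:
    # Stand-in: a real system would embed plan operators, cardinalities, data properties.
    return tuple(sorted(compiled_plan.items()))

def small_local_model(features: tuple) -> Prediction:
    # Stand-in for the cheap model that only decides short vs. long running.
    return Prediction(runtime_s=1.0, memory_gb=1.0)

def large_scaling_model(features: tuple) -> Prediction:
    # Stand-in for the expensive model that also predicts behavior at other capacities.
    return Prediction(runtime_s=600.0, memory_gb=512.0,
                      scaling_curve=((32, 600.0), (64, 310.0), (128, 170.0)))

def predict(compiled_plan: dict) -> Prediction:
    features = extract_feature_vector(compiled_plan)
    if features in feature_cache:              # 1. cache hit: essentially free
        return feature_cache[features]
    quick = small_local_model(features)        # 2. quick short-vs-long prediction
    if quick.runtime_s < SHORT_RUNNING_THRESHOLD_S:
        return quick                           # short query: just schedule it
    detailed = large_scaling_model(features)   # 3. full scaling predictions (slower)
    feature_cache[features] = detailed
    return detailed
```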

So let's assume this all works now and we implemented it; it actually does. The question now is: how do we know how aggressively we should scale? For that, let's talk a little bit about the scaling properties of typical queries.

For the moment, assume you have a query that currently takes eight minutes to run and costs you $1 a minute, and assume this query occupies the entire cluster. Normally we would talk about a whole workload here,

but just for simplicity we use a single query. If I can run this query on twice the amount of resources, I would pay $2 a minute. But if the query behaves perfectly and runs twice as fast, then my total cost is still $8: I pay $2 per minute, but it only runs for four minutes. So I get better performance and the cost stays the same.

We talk in that case about linear scaling; that's the property we'd like to have. In some cases we even get into a regime of super linear scaling, which sometimes happens: you double the amount of resources and the query actually goes more than 2x faster. This particularly happens if the query was previously constrained by, say, memory: the entire intermediate working set didn't fit into memory and you started spilling to disk, writing data out and bringing it back in as you do the processing. The moment you give it enough memory, suddenly everything works faster, and overall it runs in less than half the time despite the 2x additional resources you throw at it, so you even save money.

The guarantee we give with the next generation of AI scaling is: if scaling is either linear or super linear, so it doesn't increase the cost for you, we always do it for you, essentially for free. Whenever we can detect that scaling doesn't increase the cost and just gives you better performance, we do it. That's a new thing which wasn't possible before.

However, there are also cases where this is not true. Let's assume we doubled the resources already: the query now runs in four minutes and costs us $2 a minute. If we double the capacity again, we have to pay $4 a minute, and the query runs in three minutes, not twice as fast, a bit slower than that. So our total cost increases from $8 to $12, but at the same time we do get a speedup, just not 2x. So the question is: is this worthwhile or not?

And there's actually a theory behind that, which is called Amdahl's law. It says that every application, every program you have, can be divided into a parallelizable part and a serial part; you can only parallelize the parallelizable part, and because of that you get this diminishing returns effect.
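
As a small illustration of that diminishing-returns effect, here is Amdahl's law applied to the eight-minute, $1-per-minute example from above. The 90% parallelizable fraction is an assumed value chosen purely for illustration.

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the parallelizable
# fraction of the work and n is the resource multiple. The p = 0.9 value and the
# $1-per-minute base rate are illustrative assumptions, not Redshift figures.
def amdahl_speedup(p: float, n: float) -> float:
    return 1.0 / ((1.0 - p) + p / n)

BASE_RUNTIME_MIN = 8.0   # the example query: 8 minutes on the base capacity
BASE_RATE_PER_MIN = 1.0  # $1 per minute on the base capacity
P = 0.9                  # assumed parallelizable fraction

for n in (1, 2, 4, 8, 16):
    runtime = BASE_RUNTIME_MIN / amdahl_speedup(P, n)
    cost = runtime * BASE_RATE_PER_MIN * n  # n times the rate, 1/speedup the time
    print(f"{n:>2}x resources: {runtime:4.1f} min, total ${cost:5.2f}")
# The runtime keeps shrinking but flattens out, while the total cost keeps climbing --
# exactly the trade-off the price-performance slider lets you choose on.
```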

So eventually everything flattens out, and there's a theoretical limit on how much you can scale something. So how do we deal with that? This is where the user comes in. We need your feedback, because without any additional information we cannot know how much you prefer performance over cost.

So this is why we're introducing a new slider. There is no base RPU anymore; you don't pick a base cluster size. Instead, you say where you want to be on the price-performance spectrum. If you pick a point on the far left, you optimize for cost: we automatically pick for you the optimal data warehouse size where everything scales linearly, or super linearly, and you get the best price performance possible for your workload. This is not necessarily the smallest capacity available; it's the best price performance we can do for you, and it might actually be a little bit larger because we want to avoid the spilling effects I talked about before. The more you move the slider to the right, the more you prefer performance over cost; there is a certain overhead you are willing to take, but you get better performance for it.

In addition to the slider, we are also introducing additional cost controls, because we know you care about things like the cost never going above a certain value; you can specify that too. But the core idea is that instead of thinking about the hardware you need to allocate, you now have the slider: you say where on the cost-performance spectrum you would like to be, and we do everything else automatically for you.
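
Programmatically, the setup Ashish demos later maps to roughly the following boto3 calls. Note that the exact name and shape of the price-performance target parameter below is an assumption on my part (verify against the current redshift-serverless API reference), while max capacity and usage limits are existing cost controls. Changing the slider later, as shown in the demo, would presumably go through update_workgroup in the same way.

```python
import boto3

rs = boto3.client("redshift-serverless")

# Namespace = storage, workgroup = compute (names here are placeholders).
rs.create_namespace(namespaceName="my-ai-namespace")

# ASSUMPTION: the price-performance slider surfaces as a 'pricePerformanceTarget'
# parameter (low level = optimize for cost, high level = optimize for performance).
# Verify the exact field name and shape against the current API documentation.
rs.create_workgroup(
    workgroupName="my-ai-driven-workgroup",
    namespaceName="my-ai-namespace",
    pricePerformanceTarget={"status": "ENABLED", "level": 25},  # leaning toward cost
    maxCapacity=512,  # hard ceiling in RPUs, one of the additional cost controls
)

# A further cost control: cap serverless compute usage (in RPU-hours) per period.
wg = rs.get_workgroup(workgroupName="my-ai-driven-workgroup")["workgroup"]
rs.create_usage_limit(
    resourceArn=wg["workgroupArn"],
    usageType="serverless-compute",
    amount=500,          # RPU-hours per period
    period="monthly",
    breachAction="log",  # log when the limit is breached; stricter actions exist
)
```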

Moving on. We already talked about this on-demand scaling capability: a query comes in, we do all these predictions, and we scale accordingly. Unfortunately, in some cases this is not enough, and this is particularly true for short running queries, because it takes us a little while to add additional capacity. If everything were immediate, on-demand scaling would be the only thing you need, because you could react instantly. But unfortunately there is physics: it simply takes some time to load data from disk, it takes some time to allocate enough resources, and in the end electrons only travel at something like 1% of the speed of light. So we need to work around that.

And this is why we're also introducing a forecasting mechanism, which looks at how your workload behaves over time and, based on that, also makes allocation decisions, particularly for the short running queries. We don't want the short running queries to be impacted by bringing on capacity on demand when it's already too late and we would delay them; we want to be a little more proactive about them.

The nice thing about this forecasting mechanism is that it also allows us to do other things, because the moment we have a forecaster that can predict a little bit into the future what to expect next, we can make optimizations based on that. A whole range of our autonomics efforts are in that space: for example, using this forecasting to decide which materialized views might be beneficial for you, which can again lower cost if done right, or which sort keys you can use to skip over larger amounts of data, or, a new thing we are introducing, a completely new sort key order which can help to further improve the cost.

This also launched; it's a new way to sort data called multidimensional data layouts. In contrast to picking a normal sort key or a hierarchical sort key, which normally the user has to do, multidimensional data layouts organize the data in a multidimensional space. You can think of it as observing which predicates are most common and then, based on those predicates, grouping the data and placing it into blocks on storage so that you can skip over large amounts of data you don't need for a particular query. It's a new technique; as far as I know it wasn't done before in this sense. And we see huge speedups: up to 74% runtime reduction compared to no sort key, or 40% compared to the best manually picked one, in some of our experiments.

So now, putting it all together: here I have a simple setup with a base workload of short running dashboarding queries and then a period of some long running queries. Let's look first at how Redshift behaves today, before this new AI-based scaling mechanism was introduced. We have the short running queries running with a base capacity of 32 RPUs. At some point, in yellow, the large queries start and create this queuing effect; a burst cluster is created for that, and because things queue up and the concurrency level goes up, we create another burst cluster of the same size, and in this case even a third one, and then eventually everything shuts down again. So this is our current behavior: it creates the different clusters based on that concurrency threshold.

Now, with the new scaling mechanism, we are smarter about it. We have the same workload in the beginning, but then we detect that the long running queries coming in need much more capacity, and we allocate a larger capacity on demand for them. We even find that this larger capacity, in this case 256 RPUs, is able to handle all the dashboarding queries as well, so we shut down the original smaller cluster completely. Then, at some point when the large queries are done, we scale back down again. So it's a very different scaling behavior.

As a result, the average latency over both short and long running queries is reduced by a factor of 10x in this experiment. The latency of the long running queries improves even more, and we also improve the latency of the short running queries because they're no longer impacted by the long running ones. We have a slight cost increase here because the slider position wasn't set to pure cost savings; it was the more balanced one, so we favor performance over cost. But overall we give you 10x better price performance for this particular workload.

So with that, I will hand it back to Ashish, who will give you a live demo so you can see it in action. Thank you, Professor Tim. Now that Tim has taught us physics and Amdahl's law and how the auto scaling works, let's actually see his principles in action here.

OK. So in the first demo I'm going to show you how easy it is to create this new serverless workgroup with the new AI-driven scaling and optimization technique. I have pre-recorded it in the interest of time and so as not to rely on the Wi-Fi, and the video may have been fast forwarded. The feature is in preview. I give the workgroup, which is where you create your compute, a name: my AI-driven workgroup. Then you see the price-performance target; this is where we put the control in your hands. You decide: if your workload is more cost sensitive, you choose optimize for cost. Then we do next and give it a namespace, which is basically the storage where you store your data.

So I'm giving it a name and selecting my IAM role for the user, then I say next and review. As you can see, I'm sensitive to cost, so I selected optimize for cost, and I'll just submit it, and the workgroup starts getting created. It's creating the workgroup right now; at the bottom, in the list, you can see that the workgroup is being created. Give it a few seconds, it will be created, and you can review it; you see that it does get created with optimize for cost.

So any workload it runs will keep cost in mind. You are welcome to change this later on: if you decide your workload needs more performance, you can change the slider quickly and save it, and all the optimizations that occur henceforth will be for performance. So that was my first demo.

And you can see that it is now optimized for performance. So within a few seconds you can create your serverless endpoint, and you can change your settings later on if you need to. That was the creation part.

Now let's talk about some real-world scenarios. Here I have three specific demos. The first one is a long running query: a data analyst or a data scientist has written some complex query that was never seen before, and it's long running. I'll show you how this new AI scaling and optimization technique responds to it: how it scales, what capacity it uses, and what the cost is for us.

The second one is when you have a high volume of ingestion coming in through your COPY commands, with data files of various sizes. I'll show you a scenario where I copy three times more data, and how serverless with the new technology scales up.

And the next one is that we downscale as well. Remember that. In this third scenario, I will show you how we downscale automatically because you have decided that you are sensitive to cost, so the warehouse scales down automatically.

So let's talk about the scenarios. I have two workgroups available to me. The first one is the normal serverless workgroup with the base RPU that you specify; it has a 32 base RPU capacity. The second workgroup is the one with the new AI-driven scaling and optimization technique. And I have this big giant query. Again, I have fast forwarded the video in the interest of time. It's running against a sales catalog with over a billion rows; you can see that the query is complex and the data set is large. This query is coming in for the first time, and my slider setting is optimized for performance.

So what I do is run this query against the regular serverless data warehouse with a base RPU of 32, as well as against the new AI-driven serverless workgroup. Let's see the capacity used, the performance, the elapsed time for this query, and the cost for this query.

So it's running against the regular one right now, and I also ran it against the next-gen serverless workgroup. It returned the results quicker on the next-gen serverless workgroup.

So now I'm checking the time it took in the regular one, and also the capacity and the charge seconds used. It's a 32 RPU compute with about 56,000 charge seconds used for this query. Let's do the same against the new serverless workgroup we created, and you can see the duration is just 170 seconds for the same query. With the new AI-driven technique it is just 170 seconds, and you can see that the charge seconds also went down here.

I'm summarizing this in my Excel spreadsheet here. The query run time with the regular base workgroup was 2,008 seconds and the cost was $5.93. With the new next-gen AI-optimized instance, the run time dropped to 170 seconds, and not only that, the cost also went down. So there is a big price-performance benefit, and this is exactly what Tim was explaining: by assigning more resources, you actually saved on cost. That's the benefit of using this new next-gen AI-optimized instance, and it solves our pain point for the long running workload that comes in. So that's my first demo.
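
If you want to reproduce that comparison yourself, the elapsed time and the charged compute come from the SYS_QUERY_HISTORY and SYS_SERVERLESS_USAGE system views. A rough sketch follows; the column names are per the Redshift documentation but worth verifying on your version, and the workgroup names and time window are placeholders.

```python
import boto3

# Rough sketch of pulling the numbers behind the spreadsheet comparison.
# Workgroup names and the time window are placeholders; run the same SQL
# against each workgroup and compare.
client = boto3.client("redshift-data")

ELAPSED_SQL = """
    SELECT query_id, elapsed_time / 1000000.0 AS elapsed_seconds
    FROM sys_query_history
    WHERE start_time > DATEADD(day, -1, GETDATE())
    ORDER BY elapsed_seconds DESC
    LIMIT 10;
"""

USAGE_SQL = """
    SELECT SUM(charged_seconds) AS charged_seconds
    FROM sys_serverless_usage
    WHERE start_time > DATEADD(day, -1, GETDATE());
"""

for workgroup in ("regular-workgroup", "ai-driven-workgroup"):
    for sql in (ELAPSED_SQL, USAGE_SQL):
        client.execute_statement(WorkgroupName=workgroup, Database="dev", Sql=sql)
        # ...then poll with describe_statement / get_statement_result as shown earlier.
```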

Now the second one: the scenario where we are ingesting more data, where the data files behind the COPY have more data. You can quickly see that I'm going to run two COPY commands; my files are on S3, one with 720 million rows and the other with 2.1 billion rows. And I'm going to run this against the regular serverless, today's serverless, as well as against the new AI-driven serverless workgroup.

So let's do that. You see, I'm doing a COPY command. I have my files on S3, one with 720 million records and the second with about 2.159 billion records. Let's run this against both the regular workgroup and the next-gen workgroup and see how it performs; we'll check the capacity, the cost, and the elapsed time used, and we'll compare those again in the Excel spreadsheet.
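
The two load statements might look roughly like this; the table names, S3 paths, file format, and IAM role ARN are placeholders, not the ones from the demo.

```python
# Rough sketch of the two ingestion statements being compared.
# Table names, S3 locations, file format, and the IAM role ARN are placeholders.
COPY_SMALL = """
    COPY sales_720m
    FROM 's3://my-demo-bucket/sales_720m/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
    FORMAT AS PARQUET;
"""

COPY_LARGE = """
    COPY sales_2_1b
    FROM 's3://my-demo-bucket/sales_2_1b/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
    FORMAT AS PARQUET;
"""
# Run each against both workgroups (for example via the Data API as shown earlier),
# then compare elapsed time and charged seconds from the system views.
```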

So let's run it. I'm running it against the regular workgroup right now, and I'm also running it with the next-gen one; it's running. Now you can see that the first command has loaded 720 million rows and the second COPY command has loaded 2.1 billion. Let's validate for the new workgroup: both have been loaded successfully there too.

Now let's check the elapsed time using SYS_QUERY_HISTORY. You can see 174 seconds for the first one and about 560 seconds for the second one in the regular workgroup. Now let's do the same in the new serverless workgroup and see what happens there.

Charge seconds: let's see the total here. Let's run the query again. So the charge seconds are 12,704. Now let's see the times again, this time for the new serverless workgroup.

You can see the first 720 million rows got loaded in 179 seconds and the second set got loaded in 122 seconds. So you loaded three times more data, but it got loaded faster. And let's see the cost: the charge seconds for the first one were 1,920 and for the second one 3,456. So it did cost you more to load the data faster.

My setting there was optimized for performance, so the workgroup scaled, the compute scaled here, and I will summarize it for a quick understanding.

The 720 million rows took 2.9 minutes and cost about 58 cents, whereas in the new workgroup it cost pretty much the same, in the same ballpark. But look at the three times more data: in the regular workgroup it took nine minutes 33 seconds and cost approximately $1.30. Now look at the numbers in the new AI-driven workgroup: for the 2.15 billion rows it took roughly two minutes. The cost was more, but you get a big boost in performance.

So if you are sensitive to your loading workload and want to load it as fast as possible, this is a great solution with optimize for performance. That was my second scenario.

Now let's talk about the third scenario. Again, this is a pre-recorded one. I ran the workload on the 11th of November. My goal there was to save on cost, and we were seeing this workload for the first time. That's the input: I wanted that control, we had a budget, so I wanted to save on cost.

So what we will see is: the workload ran for the first time. What capacity was used, how much time did it take, and what was the cost? Then the workload came in again the next day, automatically, and we measure the impact of this new AI technology we are talking about.

So let's do that. You see, on the 11th of November I ran this workload. These are roughly a little over 1,900 queries, 1,977 queries, and they took 1.39 seconds on average. This was on the 11th of November. Now let's see what capacity was used.

The capacity used was 128 RPUs. RPU is the Redshift Processing Unit; one RPU gives you 16 GB of memory. Now let's see the charges on the 11th of November: around 728,576 charge seconds, because we charge on a per-second basis. And let's do the same thing for the 12th.

The average time is 1.9 seconds for pretty much the same workload, so the per-query time went from 1.39 seconds to 1.9 seconds. But let's review the capacity: it is now 64, where earlier, remember, it was 128; it is 64 the next day, on November 12th. And let's see the charge seconds: roughly 594,000 now, compared with about 728,000 earlier. Let me summarize it.

On average, I ran over 1,900 queries. The average elapsed time was 1.39 seconds, the compute used was 128 RPUs, and it cost me approximately $75. But because of the AI scaling and optimizations, the compute was downsized automatically; you did not have to take any action. My cost became approximately $61.85, and that resulted in savings for you.
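
For reference, charge seconds translate to dollars as RPU-seconds multiplied by the per-RPU-hour rate. The $0.375 per RPU-hour figure below is the published us-east-1 Redshift Serverless list price at the time of writing and is used here only as an illustrative assumption; check current pricing for your region.

```python
# Converting Redshift Serverless charge seconds (RPU-seconds) into dollars.
# The $0.375 per RPU-hour rate is an illustrative assumption (us-east-1 list price
# at the time of writing); check current pricing for your region.
PRICE_PER_RPU_HOUR = 0.375

def cost_usd(charged_seconds: int) -> float:
    return charged_seconds / 3600 * PRICE_PER_RPU_HOUR

print(cost_usd(728_576))  # ~ $75.9 -> consistent with the first day's ~$75
print(cost_usd(593_760))  # ~ $61.9 -> roughly the downsized second day
```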

So that's the benefit of this new AI-driven technology: you get it automatically, and it's completely hands off. Just tell us what your workload is and what your limits are, give us that input, and we will take care of it automatically.

So I want to conclude with the benefits of using this new AI-driven scaling and optimization technique. You can get up to 10x price-performance benefits from it. The performance optimizations are tailored specifically to your workload. It eliminates the manual effort we all spend time on. And fourth, it avoids the performance cliffs and timeouts you may experience with your long running workloads.

So that's my conclusion. I encourage everyone to try out Redshift Serverless; $300 of free serverless credits are available to you if you are a new user, and I highly encourage you to try out this new technology. You will be very pleased with it. A blog is available that describes this feature in more detail. And with that, thank you for coming and listening to us.
