Amazon Q generative SQL in Amazon Redshift Query Editor

Hello, everyone. How many of you are using AI? I see only a few hands, but actually 100% of you are using AI. Whether you are using a mobile phone, driving a Tesla, talking to Alexa, or shopping on Amazon.com, you are using AI.

So let me ask a different question: how many of you are building applications with AI? Yeah, I see a few faces. All right. My name is Debu Panda. I'm a senior manager of product management with Amazon Redshift, and with me I have Murali Narayanaswamy, a principal ML scientist working in the Redshift team. Murali and I have worked together on many innovations in Redshift, and we are going to go over this new feature.

We'll cover how Amazon Q generative SQL is going to make your life better and make you more productive by letting you author your queries in natural language.

OK. So there is always the question: what is generative AI? Generative AI allows you to create content, ideas, conversations, images, videos, and music using foundation models. You have probably heard of ChatGPT; that is an application of a foundation model. Generative AI is powered by large language models, which are trained on vast amounts of data and help you create content. So why do customers want to use AI?

When we talk to customers like you, they want to use AI to improve their customer service or enhance their customer experiences, improve employee productivity, and optimize their business processes. And there are so many use cases where you can apply AI with your analytics platform or your data warehouse to gain insight from your data.

Amazon has been leading innovations in machine learning and AI; it is in Amazon's DNA. I don't need to read this slide; you all know these things.

When we talk to customers, we see different personas working with Redshift. We have developers and data engineers who want to build analytics applications and use AI/ML within those applications. We have data scientists and data analysts who want to improve their productivity, get better insight from their data using machine learning models, and get those insights faster.

Then we have administrators, who want to make the management of their data warehouse, serverless or provisioned, simpler by using machine learning. So I'm going to give a recap of the different ML innovations we have been building into Redshift over the past several years.

In 2019 we introduced Auto WLM and auto scaling, which are driven by machine learning. Then in 2020 we introduced Redshift ML (Murali and I worked together on that project), which allows you to create machine learning models using SQL. You do not have to be a machine learning specialist to create a model with Redshift ML; you can just use a SQL statement, and I'm going to show some examples of how. Then in 2021 we introduced Redshift Serverless, which also leverages a lot of machine learning capability.

Last year we introduced AutoMV and auto refresh. AutoMVs work like this: when your users are running repetitive queries, say 300 people running the same query joining five different tables, AutoMV automatically creates materialized views for you by looking at your query patterns.

Then this year at re:Invent we introduced two features. One is the next generation of AI-driven scaling and optimizations, which allows your serverless workgroup, your serverless endpoint, to size dynamically based on your workload, giving you better performance and reduced cost. The other is generative AI support, a couple of announcements: Amazon Q generative SQL, and the ability to invoke SageMaker JumpStart foundation models, large language models, directly from SQL.

I'm going to go into a little more detail on those. First, AI for administrators. As you know, we have been building a lot of autonomics into the data warehouse. We have automatic sort keys and distribution keys, so you do not have to worry about creating those. We have auto vacuum, auto analyze, and smart defaults. Then we introduced automatic materialized views, AutoMV.

So we have been doing a lot of automation, and all of it is driven internally by machine learning models. We also have Redshift Advisor, which looks at your query patterns and your clusters and makes recommendations: whether you should pause your cluster, or add a sort key or distribution key, and so on.

All those things are driven internally by machine learning models. Then, as I outlined earlier, we introduced a feature earlier this week, in preview, and you can sign up for the preview yourself if you are using Serverless. It builds more intelligence into the Redshift Serverless infrastructure. If you have used Serverless before, you remember that you always had to specify a base capacity in RPUs.

So let's say you start with a base of 32 RPUs. Based on your load it may auto scale up, but if there is no user activity on your serverless endpoint, it pauses, and the next time you run your queries it starts again at 32.

What this feature gives you is truly dynamic compute: based on your load, it automatically changes your base capacity. There is no need for you to worry about setting the right base RPU. It always optimizes based on your workload and your query patterns, and improves your query performance.

There is no manual action required, except that during the preview you have to sign up to use the feature. So those are a few of the things we have done for administrators; now let's look at developers.

AWS provides a full stack of services. SageMaker provides a complete set of services, whether that's building a no-code ML model using SageMaker Canvas or anything else. I do not have time to cover all of it, but AWS provides a complete stack for building and operating ML models at scale.

It also provides all the ML infrastructure; we even have dedicated chips, AWS Inferentia and Trainium, built specifically for machine learning. And then we introduced Amazon Bedrock, which is our platform for building generative AI applications.

If you look at this slide, what it shows is that AWS provides a complete foundation for data and AI. There are different data sources: data coming from IoT devices, apps, logs, or third parties. We have streaming ingestion products like Kinesis Data Streams or MSK, and you can also land this data in your OLTP databases.

From there, you can zero-ETL that data directly into Redshift, and we already have support for streaming ingestion, which brings your data in so you can use it right as it is being created at the source. Then, on the right, there are different products to consume that data, whether people are using tools like QuickSight, or, if they want to build machine learning models, the different ML platforms that we have.

Coming to developers: what we heard from many customers is that they have different kinds of data in their Redshift tables and they want to use models directly on that data. For example, say you have a table in Redshift that captures customer feedback, and you want to find out what the customer sentiment was from the last event, or from whatever survey comments keep coming in.

What customers were doing is exporting the data and running it through an external machine learning platform, or a product like Amazon Comprehend, to find the sentiment. What we now allow customers to do is use Redshift ML to create a machine learning model pointing to a JumpStart LLM endpoint. It's a SQL statement, a CREATE MODEL pointing at the LLM; it returns a SUPER data type, and your users can then run that SQL directly in Redshift.

That means when you run the prediction, the function you named, say my_llm_model, is available to your users as a user-defined function, and they can call it directly from their dashboards or any other query. So you can easily find, for example, customer sentiment by directly calling a foundation model from SQL.
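For reference, here is a minimal sketch of what that looks like, based on the documented bring-your-own-model form of CREATE MODEL; the endpoint, function, table, and column names are hypothetical, and the LLM preview may accept additional options beyond this:

    -- Sketch: register a SageMaker endpoint (here, a hypothetical JumpStart
    -- LLM endpoint) as a Redshift ML model; LLM responses come back as SUPER.
    CREATE MODEL my_llm_model
    FUNCTION my_llm_model_fn (varchar)
    RETURNS super
    SAGEMAKER 'jumpstart-sentiment-endpoint'   -- hypothetical endpoint name
    IAM_ROLE default;

    -- The function is then just another UDF your analysts can call:
    SELECT feedback_text,
           my_llm_model_fn(feedback_text) AS sentiment
    FROM customer_feedback;                    -- hypothetical table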

When we talk to customers about the challenges their data analysts face, the first thing we hear is that in some organizations there is a lot of churn among data analysts: analysts come, work for three or four months, and then leave, and a new person comes in. And in many organizations the schemas are complex, with many different kinds of schemas, maybe for a completely different kind of business or industry.

So it takes a lot of time to onboard; even though they are experts in SQL, it takes time to understand the complex schema. Then what I hear from customers is that writing the same repetitive SQL becomes boring, and when they run into issues, debugging becomes problematic. Also, many of these data analysts want to use machine learning to get better insight from their data.

They are not able to, because they don't know Python or R or any other machine learning technology. So a couple of years back, almost three years now, we introduced Redshift ML, which allows customers to train models using SQL.

Assume you know nothing about machine learning, but you want to use it for prediction: whether this customer is going to churn or not, how much my revenue is going to be over the next three months, or what the lifetime value of this customer is going to be.

For these kinds of problems, many people use Excel, and that kind of analysis takes a long time. With Redshift ML, you just use a CREATE MODEL command, specify the training data you want to train on, and tell it which column you want to predict; that's the target column. Then you run it, and the model actually runs multiple experiments.

It finds the right problem type, whether it's a regression or a classification problem, and once it finds the model that best suits your use case, the model gets installed as a user-defined function. After that, your users can run it and get predictions quickly.
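A minimal sketch of that flow, assuming a hypothetical customer_activity table with a churned column as the training data:

    -- Train a model entirely in SQL; Redshift ML runs the experiments
    -- (problem-type detection, candidate models) behind the scenes.
    CREATE MODEL customer_churn_model
    FROM (SELECT age, plan_type, monthly_spend, churned    -- hypothetical columns
          FROM customer_activity
          WHERE signup_date < '2023-01-01')
    TARGET churned                                         -- the column to predict
    FUNCTION predict_customer_churn
    IAM_ROLE default
    SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');          -- hypothetical bucket

    -- Once training finishes, the model is installed as a UDF:
    SELECT customer_id,
           predict_customer_churn(age, plan_type, monthly_spend) AS will_churn
    FROM customer_activity
    WHERE signup_date >= '2023-01-01';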

We have thousands of customers using Redshift ML, making 80 billion plus predictions per week. And we have two customers, Jobcase and MDRx, using it in very interesting ways: Jobcase makes more than 1 billion predictions every day for job recommendations to the job seekers on its site, and MDRx uses Redshift ML for predictions about medical therapy conditions.

Another thing we hear from customers is that their users need a good tool for data analysis. We have a tool called Amazon Redshift Query Editor, a web-based tool where users can log in directly using their single sign-on, run their queries using SQL, and do their data analysis. It also provides the ability to create notebooks with your SQL and markdown, do visualizations, and share them with other users.

It does version control automatically. I'll just do a quick demo of the query editor. If you see here, I am connecting to a Redshift cluster. I have multiple clusters; you can enable or disable access and limit the different clusters based on your own policies. Once I connect to a cluster, I can automatically browse my AWS Glue Data Catalog or my different databases.

Here I have multiple databases, some of them data sharing databases, and I can expand and look at my catalog: different schemas, tables, user-defined functions, stored procedures, views, all the different objects. I can select a specific table and run queries, and the editor provides syntax error checking.

It provides auto-completion, which I'm going to show quickly. But a more interesting thing is that many customers want to give their data analysts their own personal space to create tables and load data. The editor actually gives you the ability to create tables and load data visually, directly.

If you see here, there are visual wizards to create databases or schemas, or to load data. You can load data either from your desktop or from your S3 bucket. I'm going to quickly demo how to load data from your desktop and use the auto-inferencing capability to create the table schema as well as load the data.

I will select one CSV file, the account history file, and then select which database I want to load into. These are the different parameters: my delimiter is a comma. Then I select that I want to create a new table for this one.

So I want to create a new table in this database.

I'm going to specify a schema; I want to use the demo schema, and give a table name. As you see, it automatically read the table header and selected the different data types. If you want to change a specific data type, you can change it right there, and then go from there.

This is a precursor to the generative SQL part, because the Q capability is built inside the query editor. So create the table, and then you can just load the data after that. This is how easy it is for you to create a table, load data, and then run queries.
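Under the hood, the wizard is generating ordinary SQL. A sketch of roughly what that amounts to (the inferred columns and staging bucket are hypothetical; desktop uploads are staged through S3 first):

    -- Roughly what the load-data wizard produces from the CSV header:
    CREATE TABLE demo.account_history (
        account_id  INTEGER,            -- hypothetical inferred columns
        event_date  DATE,
        balance     DECIMAL(12,2)
    );

    COPY demo.account_history
    FROM 's3://my-staging-bucket/account_history.csv'   -- hypothetical bucket
    IAM_ROLE default
    CSV
    IGNOREHEADER 1;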

The other interesting capability we have is something called SQL notebooks. I'm going to show you a SQL notebook that I have already created, but you can create your own notebooks and organize them in folders, like Google Drive folders. Let me open my sales notebook, which right now has a couple of cells.

A notebook can have multiple cells. One kind is the annotation, or markdown, cell; you can use standard markdown to create a header or a description for yourself. So this is a header for my notebook, and then you can have multiple SQL statements or multiple markdown cells.

If you see, it loaded a chart, because I had already created a bar chart for this specific SQL. You can run all the SQL together or run only one cell. I'll just run this SQL and show you how easy it is to create a chart from here.

If you see this data, I can export it as CSV or JSON if I want to share it with someone, or click on Chart. We provide different kinds of visualizations, and you can set your table order, column order, all that kind of thing.

Now I have my notebook and I want to share it with another user. I can easily do that by clicking on Share, and all of this is automatically versioned. But if you want to create a specific version, you can save a version of the notebook.

With that, the next thing is: what is generative SQL? Generative SQL, as I've been telling you, improves your productivity by allowing you to express, or author, your queries in natural language, plain English.

And these results are actually personalized. It's not just any SQL that it generates; it's personalized to your schema and your context, and it's based on your query history. It is also conversational: it doesn't just generate one answer at random; you can keep working with it to improve your query.

For example, you ask a question like "what was my top venue?" and it is going to give you SQL. Maybe when it generated the SQL it considered total revenue for identifying the top venue; you can say "actually, I meant by ticket sales," and it will change your query to rank by ticket sales instead.
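On the TICKIT sample database, which the demo uses later, that refined answer might come back as something like the following; this is illustrative only, and the exact SQL Q generates will vary:

    -- Illustrative: 'top venue by ticket sales' against the TICKIT schema.
    SELECT v.venuename,
           SUM(s.qtysold) AS tickets_sold
    FROM sales s
    JOIN event e ON s.eventid = e.eventid
    JOIN venue v ON e.venueid = v.venueid
    GROUP BY v.venuename
    ORDER BY tickets_sold DESC
    LIMIT 1;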

And it is accurate, because it uses a lot of your context: which database you are using, and column definitions and table definitions if you have them. And if you have created foreign key relationships, then even though Redshift doesn't enforce foreign keys, we actually use them to create the join conditions.
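In Redshift those constraints are informational only, but declaring them gives the service a join hint. For example, on the TICKIT tables (if the constraint isn't already declared):

    -- Informational constraint: Redshift does not enforce it, but generative
    -- SQL can use it to infer the sales-to-event join condition.
    ALTER TABLE sales
    ADD CONSTRAINT fk_sales_event
    FOREIGN KEY (eventid) REFERENCES event (eventid);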

And when it generates the SQL, as Murali is going to demonstrate, it is not limited to one table; it can join three or four tables to generate your SQL. And it is secure: we do not bypass security, either for generating the SQL or for running it.

When you get the SQL and you run it, all the Redshift security, row-level security (RLS), column-level security, dynamic data masking, everything is enforced. The other thing is that we never send your data or your table definitions to the foundation model for training; that context is used only at inference time, via retrieval-augmented generation (RAG).
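For instance, a row-level security policy like this sketch (on the TICKIT category table, with a hypothetical analyst_role) keeps applying to any SQL that Q generates:

    -- Sketch: this role only ever sees concert rows, whether the query was
    -- hand-written or generated by Q.
    CREATE RLS POLICY policy_concerts
    WITH (catgroup VARCHAR(10))
    USING (catgroup = 'Concerts');

    ATTACH RLS POLICY policy_concerts ON category TO ROLE analyst_role;
    ALTER TABLE category ROW LEVEL SECURITY ON;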

Murali is going to go into more detail on how these things work. There are multiple benefits to Q generative SQL. The first is that it increases productivity. If you are lazy (I'm always lazy) and you want to write SQL, you can just ask in English and it will generate it. If you have to join three tables, aliasing all the different tables, and you are not writing SQL every day, like me, it actually makes things much easier.

So it is going to improve your productivity. The second thing is that it helps you get started fast. Say a new data analyst comes in who doesn't understand the schema that well; because Q can leverage the schema as well as the previous queries, the query history, it can help them get started fast.

It also provides secure access to your data; there is no bypassing of any security. And as you run more and more SQL, it improves its accuracy based on your data and your feedback on the recommendations. When you get a recommendation, you do not have to accept it; you can say "give me more." And if you reject the first recommendation, the next time it generates suggestions it is not going to offer that one, because you rejected it.

So you are giving direct feedback to generative SQL. Now I invite Murali to come up, go over more details of how these things work, and demo the generative SQL capability. Thanks, Murali.

Yeah, thanks. I definitely cannot match Debu's energy, so I'm going to try to at least go fast. Basically, as Debu pointed out, the goal of what we're doing here with Q is to give you personalized SQL recommendations that are context-aware and specific to your database, while being safe and secure at the same time.

What we mean particularly by secure is that none of your data, content, tables, et cetera, is used to train any foundation model. So all the data governance concerns, you don't have to worry about them if you're using Q. But at the same time, we actually want to make these things useful.

So what I'll try to talk about is how we do that while staying secure. To recap what Debu was talking about: the big problem we heard from customers while building this is that once the number of tables in a database grows beyond, say, a few tens, not to mention hundreds, it's very hard for anyone to remember exactly what the join key was, how to join these two tables, what the filter conditions should be, what makes sense. This is really hard, and it gets even harder when you hire someone new, plonk them in front of Redshift, and say, "tell me the sales for the last month." This is a huge time sink for our customers, and we want to make it faster.

The second thing we want to let customers do is create more and more complex SQL, faster. As Debu said, he doesn't write much SQL anymore; neither do I. I really struggle to remember how to build these large compositional queries, where you take a query, put it inside another one, and build up a bigger and bigger answer. The conversational capabilities of Q, which I'll demonstrate, make this much easier.

So we keep talking about this thing called personalized SQL. What exactly do I mean? I'll show you three or four examples before getting into the actual demos.

The first kind of example: these are some queries from a standard text-to-SQL benchmark called Spider, and another one called BIRD; you can look them up, but these are standard SQL questions. Suppose you had a question like this, "count the number of schools in Alameda County," and you just gave an LLM this exact question. You would get what I would call the natural output: a query that kind of makes sense and has all the right pieces in it. But if you go back to that benchmark and look at the tables in there, this query does not execute, because the model does not know the actual schema, the table names and column names, that this database has.

If you do this in Q instead, once you have loaded up this data set, you get a somewhat more complicated query, but the interesting thing is that it knew the column was NumTstTakr rather than num_test_takers (which arguably would have been a better name), it knew which tables to look into, and so on. This is what we mean by personalized to your database tables and schema. Q does this. Let's go to another example.
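Schematically, the difference looks like this; both queries are illustrative, written from memory of the benchmark's abbreviated names:

    -- "Natural" output: plausible, but the guessed names don't exist.
    SELECT COUNT(*) FROM schools WHERE county = 'Alameda';

    -- Schema-aware output: uses the database's real, abbreviated names
    -- (e.g. columns like NumTstTakr rather than num_test_takers).
    SELECT COUNT(*) FROM satscores WHERE cname = 'Alameda';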

Here is another question, from a different database in the same benchmark. You ask this question, and this time I've given it the schema, and it produces a very good query: it's fairly complicated and makes almost perfect sense. But you run it, and the answer turns out to be zero. Why is this? Well, if you look at the historical queries the customer has run, you figure out that for country they are not actually storing the name of the country but a code. The way you figure this out, if you don't look at the data, which we do not, is to look at at least some of the past queries, infer the pattern that country codes are being used, and try that. This is what we mean by personalized to your data: without training on your data, and without looking at your data.

The third kind of example: you ask another question and you get some natural output. This could work, but when you try to run it, you get an error, and for the SQL experts here, the error is that there is an ambiguous term in the SELECT. What you can do is just give this error back to Q if you want, and we do this automatically sometimes as well. Once you do, you get an answer that works. As a result of both you talking back to Q and us doing this automatically in the background, the results get personalized to your data warehouse, which is Redshift; it starts understanding Redshift syntax and so on.

Now we get to the recorded demos. This was launched yesterday, so you can go and actually try it out, but I have learned, to my regret, never to do a live demo, so I'm not going to do one. This is really easy to enable, but the enablement has to be done by the administrator; once that's done, it's available to all users in the query editor that Debu showed.

How do you do it? You open the settings via the gear icon at the bottom left, select "enable generative SQL," and save. Once you have done that, the purple button is available and you have a chat panel on the side where you can start asking questions.

The other nice thing about the query editor is that we also have the ability to load up a few test databases, so I'm going to load one of them, called tickit.

And I'm going to run the following examples on it, so it's something you can also test out later. Once you load it, you can start trying out this personalized SQL. But just to show you again what I mean by personalized, I'm actually going to run the first query as an example that doesn't work.

Suppose I have this whole thing set up: I have my generative SQL panel on the side, and we have some quick example questions. But what I do is connect to a different database, called dev. This database does not have any of these tables. So when you try to generate a query here, you're going to get one that, look, is a reasonable query; you can insert it into your notebook and execute it, but it's not going to work, because these tables don't exist in dev; we put them somewhere else.

I did want to show this, because it's a very common mistake. We don't look across your databases; we keep a very careful eye on all the permission boundaries. So if you are connected to a specific database, your queries only work on that database.

So what did we actually want? We wanted the query on the left. I go back, check that I'm connected to the right cluster and the right database, and see that I have tickit there with all the right tables. Now I ask any one of these questions; these are ones we have selected that work on tickit, but you would of course try different ones on your own database.

And you get a query. The nice thing about the integration is that there's a button, "add to notebook," that just puts it into the notebook. Then you run the query, and this time you get an answer that's reasonable. And as Debu showed, you can always chart these answers, bar charts, whatever you want.

The next thing we said about Q is that it's conversational, and the whole thing is personalized to your conversation. What does that mean? At the end of the previous question we had a query, and now we can dive deeper just by asking. So you can ask Q, "which are the top three instead?" Note that you don't have to repeat your previous question or the previous query; all of that stays in the conversation context that Q's algorithms use.

Now we get a query that gives you the top three, but you can go further. This is where the compositional stuff comes in: you want to ask a query that uses this bigger query as a subquery inside it. And to be honest, the query that's about to come out is beyond my own SQL understanding.

It is a fairly large query, and what it's done is put the SELECT inside as a subquery and done all the right things. This would take me 15 minutes with a lot of searching online, and even then I may not have got it right. This is the kind of thing Q is supposed to help speed up. You run it, and unsurprisingly, most of the popular venues are actually in New York. What's interesting is that the query has all the table names, the schema names, everything in there.
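The shape of that compositional query is roughly the following; this is illustrative, and the generated SQL will differ in its details:

    -- Illustrative: nest the 'top venues by tickets sold' result to count
    -- how many of the top ten venues fall in each city.
    WITH venue_sales AS (
        SELECT e.venueid,
               SUM(s.qtysold) AS tickets_sold
        FROM sales s
        JOIN event e ON s.eventid = e.eventid
        GROUP BY e.venueid
    ),
    ranked AS (
        SELECT venueid,
               RANK() OVER (ORDER BY tickets_sold DESC) AS sales_rank
        FROM venue_sales
    )
    SELECT v.venuecity,
           COUNT(*) AS top_venues_in_city
    FROM ranked r
    JOIN venue v ON v.venueid = r.venueid
    WHERE r.sales_rank <= 10
    GROUP BY v.venuecity
    ORDER BY top_venues_in_city DESC;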

The other thing the conversational capability allows you to do is personalize to your data domain. If there is some business knowledge that only you have, and Q gives you an answer you don't like, you can ask it for more details, or ask it to fix the query, and so on.

For example, you can ask, "what's the total sales in dollars?" So far the customer hasn't even had to know the schema; everything has worked in plain English. But suppose you get a query and you don't like something about it: you don't think that pricepaid is actually in dollars. You know this, but Q does not.

You can just say so, and the interesting thing is that this works with spelling errors; it works with lots of mistakes. If you say, "no, this is not the one," it will give you a different query, this time computing quantity sold times the price of the ticket, which is also a reasonable approximation for dollars. Again, as Debu said, this is joining across many tables, finding the right join conditions, and in the end it still gives you a query that works.
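On TICKIT, that refinement might look something like this (illustrative):

    -- Illustrative: approximate dollar sales as quantity sold times the
    -- listing's per-ticket price, instead of using the pricepaid column.
    SELECT SUM(s.qtysold * l.priceperticket) AS total_sales_dollars
    FROM sales s
    JOIN listing l ON s.listid = l.listid;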

This again would have taken a person a reasonable amount of time. Now, one thing I do want to note, which will be important later: I'm logged in as user1 right now. Later, I'll show you how, by sharing the query history, you don't have to tell Q the same thing again when you log in as user2.

Think of user2 as a new user; more on that soon. Here is another way Q is personalized to your database: suppose you write a query, or you have a query from a slightly different SQL dialect, so it may have worked on a non-Redshift system. You know what you want to do: add row numbers based on quantity sold, descending. You start writing your query, and you get an error. To be honest, this error is reasonably hard to understand; I have no clue why this is the error, because it's not actually the mistake.

What you can do is copy the error down and ask Q, "how do I fix this query?" It will go away and give you a query that actually works on your schema; note that it took sales and made it tickit.sales, because you need to be connected to the right schema for these kinds of queries to execute.

It does all this, and since it's my demo, it works: you get row numbers added appropriately. This is the kind of thing Q does to increase productivity for our customers.
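The fixed query would look roughly like this (illustrative):

    -- Illustrative: Redshift window-function form of 'add row numbers by
    -- quantity sold, descending', qualified to the tickit schema.
    SELECT salesid,
           qtysold,
           ROW_NUMBER() OVER (ORDER BY qtysold DESC) AS row_num
    FROM tickit.sales;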

The other thing, in addition to the security of your data and the personalization, is that we want to make sure everything here is safe. What this means is, suppose you go in and ask an unreasonable question, say, "delete table sales." Q is still going to give you an answer, because honestly it's an easy query. But what we do make sure is that we always warn you that somewhere in this query you are modifying tables or something else in your database, so confirm that you really want to do this.

Since my demo needs to continue, I'm not going to delete the sales table. But the other thing I want to point out is that we always, always respect all the permissions of the user logged into the query editor. Let me take a simpler example, one that is not going to delete anything.

Here we are logged in as user2, a regular user, not the administrator, and user2 asks for a query to select five random rows and insert them back. This is a reasonable thing someone might do. Again, we raise the warning first, but maybe the user wants to go ahead anyway, and they try to run the query.

However, if you remember, at the beginning it was the administrator who created those tables, the tickit schema. user2 does not have those permissions, so Redshift will immediately prevent it. None of the protections of Redshift go away; everything is as secure and safe as it was before.

Now I'll go through a couple of tips on how to make this more and more useful as you deploy it across your organization. The first thing we allow you to do is share query history with a new user. Suppose a new user joins your company; they obviously have no query history, so Q cannot always figure out which are the right tables and how to join them. It's still quite powerful, but it's not as good.

There is a command for this, a standard Redshift command, it's in the documentation, that allows a user to see the query history of other users, and as a result Q can use that history. Again, Q never breaks any permission boundary of Redshift.
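The talk doesn't name the command, but the standard Redshift setting that lets one user see other users' queries in the system tables is SYSLOG ACCESS; this sketch assumes that is the mechanism meant here:

    -- Assumption: run by an administrator, so user2 can see all rows in the
    -- system tables and views, including other users' query history.
    ALTER USER user2 SYSLOG ACCESS UNRESTRICTED;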

Once you do this, I log back in as user2. The query editor makes this quite easy: you just change the database user name and password, and now I'm logged in as user2.

I do want to say that none of this demo has been sped up, and neither has this part; I know I have an extra minute here. Now user2 can go and ask what's going to be a fairly complicated question.

What's interesting is that Q will remember that last time, as user1, you told it not to use pricepaid, because it has access to that user's queries. And now it gives user2 a query that does not use pricepaid but actually multiplies the numbers. Now, sharing query history is a double-edged sword.

If you share a query history full of wrong queries, Q is going to get confused. So this needs to be done carefully, and we're going to provide more options for you to selectively add queries that have been vetted in certain ways to the history and improve performance.

Again, this query worked perfectly well. It's a fairly complicated query, and it knew from the query history that states were stored as "TN," not the whole name "Tennessee."

The other kind of tip is that none of this is perfect; you do get queries that don't work. Here I made some changes so that the first query fails (this query usually works). So we ask the question and get an answer.

Again, it's an almost perfect query in this case, but we add it to the notebook, run it, and get an error. Maybe I don't want to go searching for how to debug this error (it's actually not a very complicated one); instead, you paste it back and just prompt Q with it. Now it gives you a new query that tries to fix that error, and you have the whole history in the notebook's markdown of how you ended up with this query.

You execute it, and now you get the right answer that you should have got in the first place. So, to wrap up with some tips and tricks: whenever you can, be as specific as you can. If you know the table and column names, and they are badly named, use the bad names; it's fine. That's better than making Q guess.

It's always better, if you already know which part of the database, which schema, you're actually interested in, to add it to the search path. This is a standard Redshift SQL command, and it allows the generated queries to work even when the schema names aren't spelled out.
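A minimal example, using the schema names from the demo:

    -- Unqualified table names now resolve against these schemas, in order.
    SET search_path TO demo, tickit;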

And the most important thing: be ready to iterate, because that just makes you much, much faster. Ask follow-up questions, build up queries slowly, one by one, feed errors back and ask why they didn't work, and things get better.

For the last bit, I'll hand it back to Debu to wrap up, and then we'll take questions. Thanks, Murali. Did you like the demo? You can try this yourself; it's in preview, and there is zero cost during the preview, so you don't incur anything. You can run as much as you want with your own data, and that's what we want: try it with your own data and your own schemas. And give us feedback when something doesn't work; that will help us improve.

It is available for preview in two regions: us-east-1, the N. Virginia region, and us-west-2, the Oregon region. And if you haven't tried Redshift Serverless, you can try that too; we give you $300 in credits for trying out Serverless.

So try it yourself, and if you like, try the two together: Q generative SQL is available for both your provisioned clusters and Serverless; you just have to use the query editor. Thanks, everyone, for coming. We have more resources available, as you can see; there are books available, including the Redshift definitive guide, and I'll add a shameless plug for my own book, Serverless Machine Learning with Amazon Redshift ML.

So anyway, thanks, everyone.
