Powering self-service & near real-time analytics with Amazon Redshift

How's everybody doing? Alright. You're in analytics session ANT211. Today we are going to talk about self-service analytics with Amazon Redshift. My name is Naresh Chainani, I'm the engineering director for Redshift, and I'm excited to have co-presenting with me Kay Kim, who is a senior director at FanDuel. She's going to talk about some of the challenges that FanDuel had and how they resolved them using Redshift.

Also co-presenting is Debu Panda, who is a senior manager. Today he's going to be wearing the analyst hat and has a couple of exciting demos to show you.

So here's a rough flow for today's session:

  • We are going to talk about why self-service analytics matters.
  • Then we'll cover some of the requirements around self-service analytics and what we at Redshift have been doing across multiple dimensions: compute, data, and consumption.
  • At that point I'll segue to Kay, who's going to walk you through FanDuel's journey with Redshift and the challenges they were trying to address.
  • And finally we'll go to Debu, who's going to demo the consumption layer and show how ML and analytics come together in a powerful way.

Most of us have had this experience. You walk into a grocery store, you pick what you need, and you pay for what you need. At that point you don't worry about the logistics: how did this gallon of milk get here, what is the supply chain behind it? There is machinery in the background that worries about all those details.

We want to do the exact same thing with data. Your business analysts and data users focus on their needs and business outcomes, and they don't have to worry about infrastructure: when do I scale, when do I spin up compute, is my access secure, is my data encrypted? All of that is something we don't want our customers to worry about, and that's the philosophy we have taken here.

So there are different kinds of personas that deal with analytics. Let's look at a few of them:

  • First we have data engineers. Their job is to prepare and cleanse data and make it consumable, and they serve a large community of users. This includes data scientists, application developers, and analysts running ad hoc queries.

  • Their job is to make sure all this data is ready and available for consumption by business leaders, who can then look at these insights and predict and influence outcomes for the business.

At AWS, we have seen a significant shift here. I've been in databases for over two decades. When I started out, it used to be weekly ingestion into your analytics system; then it evolved into daily, then a few times a day, and now we are at minutes and even seconds. So near real-time analytics changes the recency of the data, and you see this more and more with interactive use cases and gen AI applications.

So next, let's look at self-service analytics through multiple lenses. First we'll start with self-service for the compute layer, then the data layer, and finally Debu is going to close with self-service for the consumption layer.

Let's start with compute. What our users expect from the compute layer is high performance, and the ability to get started within seconds so they can start querying their data. Gone are the days where you have to talk to a few teams to set up access to your data and it takes weeks. So: being able to quickly spin up compute with high performance, being able to query that data, and not having to worry about migration. Migration is a much-hated word, right? It's cumbersome. How can I take my Tableau application, for example, point it at my data source in Redshift, and just get started? And then, important for any enterprise, staying within budget and avoiding cost surprises, which always make the finance teams quite unhappy.

So 18 months ago we introduced Redshift Serverless, and it is one of our fastest-growing services in Redshift. The reason for that is it's very simple to use, and it addresses each of the four requirements our customers have been asking for. Just like my grocery store example, there is a lot happening automatically behind the scenes: scaling, compute being provisioned, snapshots being taken. Data security is a day-zero requirement at AWS, so data is encrypted at rest and in motion. All of that is happening in the background and managed by AWS.

If you haven't tried Redshift Serverless, I highly encourage you to; it's super easy to get started. Maybe a quick show of hands: how many people have tried Redshift Serverless? OK, a few of you. We do have a $300 credit so you can quickly get started and kick the tires, and we'd love to hear your feedback.

Next, let's talk about self-service for the data, or ingestion, layer. Here on the left-hand side you see a bunch of data sources, and on the right-hand side you see different consumption patterns. I'm going to spend a few minutes breaking down this slide and talking about some guiding principles that we use.

One is that at AWS, when it comes to analytics, our goal is to avoid data movement as much as possible, and you will hear me talk about this. Those are the principles behind things like data sharing and federation. But there are times when you actually need to move the data, especially when data moves from a transactional system like Aurora or from a streaming system into your analytics system.

What I expect a lot of you have today is pipelines, and teams that maintain these data movement pipelines. Now, there's a problem with these pipelines: when they work, that's great, but unfortunately they have a tendency to break late on Friday evenings when everybody's gone, and then it becomes a problem. When did it fail? Where do you resume it from? What if our customers did not have to worry about that? What if we took on that responsibility?

So what we have done is simplify these data movement pipelines, and I'm going to introduce a few different examples. First up is auto-copy from Amazon S3. This is a feature that is currently in preview. It's as simple as pointing Redshift at an S3 bucket; Redshift monitors that bucket, and as data lands it is efficiently ingested. So you no longer have to maintain stored procedures or pipelines that copy this data every 10 minutes or every few hours.
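As a rough illustration, here is a minimal sketch of what a copy job for auto-copy might look like under the preview syntax; the table, bucket, and IAM role names are placeholders.

```sql
-- Create a copy job once; Redshift then watches the prefix and ingests new files as they land.
-- Table, bucket, and IAM role below are illustrative placeholders.
COPY sales_events
FROM 's3://my-ingest-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT CSV
IGNOREHEADER 1
JOB CREATE sales_autocopy_job
AUTO ON;
```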

Another example: we talked about how data recency is becoming more and more important. With Redshift streaming ingestion, Redshift is able to read data directly from Kinesis and Kafka sources. Think of IoT data, log data, fraud analytics: this is where you don't want to know what happened two minutes ago, you want to know what happened right now. With Redshift streaming ingestion you get very high throughput and low latency, and we have customers reporting latencies of under 10 seconds.

The way this is achieved: without Redshift streaming ingestion, your data would be staged onto S3 from your streaming source, batched, and then the batch ingested. We have simplified that.

Then, you have probably heard us talk about zero-ETL. This is a theme where we take data from different source systems, and one example is Aurora MySQL; this is something we launched as generally available two weeks ago. It's a popular way to get transactional data into Redshift, again with low latency and high throughput, and it gets rid of this data movement pipeline.

You're probably starting to see a pattern here. This week you are going to hear more announcements around the zero-ETL theme, so please stay tuned for those.
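For context, here is a hedged sketch of the Redshift side of a zero-ETL integration (the integration itself is created in the console or CLI); the database names and the integration id are placeholders.

```sql
-- List integrations visible to this warehouse and pick up the integration id.
SELECT integration_id FROM svv_integration;

-- Create a local database backed by the zero-ETL integration (id is a placeholder).
CREATE DATABASE aurora_zeroetl FROM INTEGRATION '<integration-id>';

-- Replicated tables are then queryable like any other Redshift tables.
SELECT COUNT(*) FROM aurora_zeroetl.ordersdb.orders;
```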

So now let's switch to the right-hand side, where we have the different consumption layers. Here your users in different roles, admins, developers, are accessing this data. It's super important that data access follows rules of governance. Security has to be simple by design so you can audit it and reason about it. Using things like row-level security and data masking, you can make sure your users only access the data they are allowed to.
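As an illustration, here is a minimal sketch of a row-level security policy and a dynamic data masking policy; the table, column, and role names are hypothetical.

```sql
-- Row-level security: the analyst_eu role can only see rows where region = 'EU'.
CREATE RLS POLICY policy_eu_only
WITH (region VARCHAR(32))
USING (region = 'EU');

ATTACH RLS POLICY policy_eu_only ON sales.orders TO ROLE analyst_eu;
ALTER TABLE sales.orders ROW LEVEL SECURITY ON;

-- Dynamic data masking: show analysts only the last four digits of a card number.
CREATE MASKING POLICY mask_credit_card
WITH (credit_card VARCHAR(32))
USING ('XXXX-XXXX-XXXX-' || SUBSTRING(credit_card, 13, 4));

ATTACH MASKING POLICY mask_credit_card
ON sales.payments(credit_card)
TO ROLE analyst_eu;
```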

In the middle of this picture, what you see is the compute layer, and this is actually multi-cluster compute: a mix of serverless and provisioned. And there is an important point here: there is one copy of the data. In this case it's either in S3 in an open file format like Iceberg, or it is in Redshift managed storage.

These different compute layers are accessing transactionally consistent data, again a single copy. The best part is that these clusters can be in the same account, in different accounts, or even in different regions. Why might different regions be interesting? We have some very large enterprises where data in the EU, for example, has to stay in the EU. This is where you can have a compute cluster in Germany and another in North America and still query one copy of the data. Very powerful.
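Here is a minimal sketch of how a producer warehouse shares data with a consumer warehouse through data sharing; the schema, share, and namespace values are placeholders.

```sql
-- On the producer (ETL) warehouse: create and populate a datashare.
CREATE DATASHARE sales_share;
ALTER DATASHARE sales_share ADD SCHEMA sales;
ALTER DATASHARE sales_share ADD ALL TABLES IN SCHEMA sales;
GRANT USAGE ON DATASHARE sales_share TO NAMESPACE '<consumer-namespace-guid>';

-- On the consumer warehouse: surface the share as a database and query it, with no data copy.
CREATE DATABASE sales_db FROM DATASHARE sales_share OF NAMESPACE '<producer-namespace-guid>';
SELECT COUNT(*) FROM sales_db.sales.orders;
```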

So we talked a lot about simplifying analytics and access to data. It's important to have a strong, solid foundation for self-service analytics. You may solve today's needs, but you also have to address the needs that keep growing: more users, more teams, more data. That's a consistent pattern we see. There are two popular architectures that we see with multi-cluster.

One is what we call hub and spoke. There is a central ETL cluster that typically tends to be provisioned and always on, where you're writing your streaming or transactional data. And then you have a bunch of spokes, or consumers, that are accessing that data, again with no data copies.

In this picture you see a dashboarding cluster and a data science cluster; each of those teams pays for the compute they use, and these tend to be either serverless or provisioned. For a workload that is spiky or sporadic in nature, you just want to use serverless. So your dashboard spins up at 9am and maybe by 11am the activity starts winding down, so you don't have to pay for compute at that point.

I would say about seven out of ten customers I speak with use some flavor of hub and spoke. There is another architecture, used by more advanced customers, which is data mesh. Think of large enterprises where different teams own different parts of the data.

For example, your finance team may have finance data which they want to share with marketing to do some analysis. Marketing may build some models and then want to evaluate the financial impact, in which case they may share those insights back with the finance team.

So in this case your data consumers can also be data producers for other teams and use cases. One great example of this data mesh architecture is the work that the FanDuel team has done, and I would like to welcome Kay to come share more details about it. Thank you, Naresh.

Hello everyone. My name is Kay Kim. I'm a senior director at FanDuel, with the data engineering team. How many of you have heard of FanDuel? Well, quite a lot of sports fans. Good.

For those of you who are not so familiar with FanDuel: FanDuel is America's leading sportsbook, and we're the premier mobile sports betting operator. Our company's mission is to make sports even more exciting. So what does data mean for FanDuel?

We're a data-driven organization, like many of your companies. We process internal and external data into a centralized data warehouse, and every single business unit leverages data and has full self-service analytics capability.

Today what I want to talk about is the risk and trading team, in the center. The risk and trading team are our data quants. They analyze customer betting behaviors and make predictions, and they generate key metrics for the business, things like estimated revenue, risk profiling, customer segmentation, and so on.

At FanDuel we jokingly say we only have two seasons: NFL season, and non-NFL season, which is preparing for football season. Between last year's kickoff and this year's Super Bowl, the data volume and the jobs we were executing grew five times bigger.

And we started seeing early symptoms of negative impact, because we had a single data warehouse architecture. As you can see here, query efficiency was going below 50%. What that means is that when a query is submitted, it is actually using only 48% of its time to execute and return the result; 52% of the time it is doing nothing but waiting in the queue.

So we started seeing these early negative signs. The business was growing bigger, which was good, but we could not scale at the speed of the business growth. Going back to the risk and trading team: they are our critical stakeholders as far as self-service analytics goes.

We started seeing challenges with their customer insights and personalization jobs. Because of the nature of their work, going through very granular transaction records, they easily need billions of records to process. We started seeing their job taking way too long: seven hours, that's a quarter of a day.

And while their job was running for seven hours, all the key metrics the business was expecting were blocked. That translates into roughly $50,000 of business value lost a day. And because it was a single-cluster architecture, while that resource-intensive job was running, no other queries could go through until it was done.

That means hundreds of analyst jobs that were supposed to deliver on time were not getting done at all. So as you can see, the customer satisfaction level was going down pretty badly.

So the data engineering team was given a mission to solve this problem immediately. First and foremost, we had to recover the run time, but we could not simply add more nodes: we had to solve this problem at the same or similar TCO.

This is when we started looking into some of the improved Redshift features that AWS has launched over the last couple of years. The serverless option is one we started looking into: how do we run a compute-intensive job at the minimal possible cost?

The second thing: our data warehouse had close to 10,000 job executions a day, so data was constantly coming in and being updated. We had to make that fresh data available for users to access in real time. We could not add more ETL to copy the data over to different clusters or destinations, because that would make data latency an issue.

The data had to be accessed in real time, so we started looking into data sharing. And because of the uniqueness of what the risk and trading team does, generating key metrics that have to be shared across the business,

we even started looking into reverse data sharing as well. And for the contention issues, we had to unblock the concurrent self-service analytics they were generating at the same time, so instead of a single-cluster architecture we started looking into a multi-cluster architecture.

The solution we came up with was to break the single data warehouse architecture down into a multi-cluster architecture. First of all, we separated out the main ETL cluster as a producer, and then we laid out multiple consumer clusters using data sharing capabilities.

As you can see, we added an analytics consumer cluster, we gave the risk and trading team a dedicated consumer cluster, and we added some other consumer clusters as well. Some consumer clusters we converted to serverless, for example risk and trading, because we knew they were reading billions of records.

It's a very resource-intensive job, but they only need maybe two to three hours a day of compute processing. So we added serverless as an option so we could keep the cost as low as possible.

And then we also added reverse data sharing, so that the risk and trading consumer cluster can be a producer at the same time, and its data can be shared back across the other consumer clusters as well.

The results we have seen are pretty good. We were able to recover the job run time to within the SLA; in fact, their jobs now run with three times better SLA. We were able to increase our query efficiency from 48% to 73%. And using serverless, we were able to reduce the cost per day for those jobs to ten times lower than what it was.

All of that adds up to a $15 million revenue upside per year. I wish it was per month, but we're not there yet. So it's a $15 million revenue upside a year that we got out of this, and satisfaction is, of course, trending very nicely too.

This was data engineering collaborating with the business stakeholders, so I just want to highlight that this wasn't just one team's effort; it was a multi-team effort to put this through.

So what are we thinking as a next step? A lot of these jobs are created by data scientists and data quants, and we definitely have room to revisit them, perhaps by adopting ML technology.

So we're evaluating AWS and Redshift ML options for specific insights they are generating, things like personalized betting options and real-time risk profiling, and seeing whether those are a better fit for an ML approach.

And as much as we were able to drive self-service analytics in the BI area, we're hoping we can drive further self-service in the ML and AI space as well. That wraps up my presentation. I want to bring Debu onto the stage, and Debu is going to talk about the kind of AWS ML investments being made. Thank you.

Thanks, Kay. Football, Redshift Serverless, and Redshift data sharing. What an interesting story. I love it.

So customers like Kay, and many of you, tell us you want to democratize access to your data and analytics for your end users. Based on all the discussions we have, what customers want is to provide a nice SQL editor to their users, who can easily build SQL queries for ad hoc analytics and do visualization for quick insights.

The other thing we hear is that many customers have started to use machine learning. How many of you are using machine learning? Yeah, there are many. So what we did with Redshift is add a feature called Redshift ML, about a couple of years back, that allows SQL users, data analysts, and database developers to build machine learning models.

It works for a variety of use cases, whether that's product recommendation, customer churn prediction, or revenue forecasting, just by using SQL. You don't have to know Python or R or any machine learning tools; you just have to be a SQL expert. You run your CREATE MODEL command in SQL and provide the training data as a SQL query or a table name.

Then you tell Redshift ML which column you want to predict, that is, the target column. What Redshift does is work with SageMaker Autopilot to automatically train your model, compile it with SageMaker Neo, and deploy it into Redshift, whether it's a provisioned cluster or serverless, as a user-defined function.
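To make that flow concrete, here is a minimal hedged sketch of a churn model built this way; the table, column, bucket, and IAM role names are all hypothetical.

```sql
-- Train a model from a SQL query; Redshift ML hands the data to SageMaker Autopilot,
-- which picks the algorithm, trains, and deploys the result as a SQL function.
CREATE MODEL customer_churn_model
FROM (SELECT age, tenure_days, monthly_spend, churned
      FROM analytics.customer_activity)
TARGET churned
FUNCTION fn_predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftMLRole'
SETTINGS (S3_BUCKET 'my-redshift-ml-bucket');

-- Once training finishes, predictions are just another SQL expression.
SELECT customer_id,
       fn_predict_churn(age, tenure_days, monthly_spend) AS will_churn
FROM analytics.customer_activity;
```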

Then you can do your predictions using SQL. So we brought machine learning to your SQL and your database. There are several customers doing this, like Jobcase; Jobcase is a job site that does more than a billion predictions per day with Redshift ML, and collectively Redshift ML customers are doing 80 billion predictions every week.

Jobcase, being a job site, uses it for job recommendations to their users. Besides that, Magellan Rx has built machine learning capability and made it available to their specialists, helping them identify whether a customer is going to churn from a specific prescription, or whether that prescription is going to work for the customer.

So there are a lot of different use cases. Customers are also using it for revenue prediction or customer lifetime value prediction. And how many of you are willing to invest in gen AI? There are many.

So we actually announced a feature yesterday with Redshift ML that allows you to invoke large language models directly from your SQL. Let's say you have an LLM in SageMaker JumpStart: you can now invoke it directly from SQL by using our bring-your-own-model capability, using the data you have in Redshift.
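As a hedged illustration of the bring-your-own-model pattern in general (not necessarily the exact LLM syntax just announced), a remote SageMaker endpoint can be exposed as a SQL function; the endpoint, function, and table names below are hypothetical.

```sql
-- Wrap an existing SageMaker endpoint as a Redshift SQL function (bring your own model).
CREATE MODEL remote_sentiment_model
FUNCTION fn_sentiment (VARCHAR)
RETURNS VARCHAR
SAGEMAKER 'my-jumpstart-llm-endpoint'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftMLRole';

-- Score free-text customer comments that already live in Redshift.
SELECT comment_id, fn_sentiment(comment_text) AS sentiment
FROM support.customer_comments
LIMIT 10;
```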

For example, we published a blog yesterday that shows how you can do sentiment analysis if you have customer comments stored in Redshift along with your other data. The other thing I hear from customers is that they want a tool for their business analysts and data analysts.

So we have a tool called Redshift Query Editor V2, and it is integrated with single sign-on. If you enable it for your users, they can go directly to the query editor using their Okta, Azure AD, or other identity provider credentials.

We announced a feature yesterday with AWS IAM Identity Center: users can connect to Redshift directly without having to re-authenticate, and run queries with their own credentials. That means if someone is a member of an analyst group in your Azure AD, they get the corresponding privileges directly on Redshift through that same Identity Center identity and role.

We also have the ability to visualize your data and collaborate with other users. It automatically versions your SQL queries; we hear from customers that their data analysts don't use GitHub or GitLab, but they still want a way to version control their SQL queries, and the query editor does that for them automatically.

Next, I'll show you a quick demo of the query editor. As you can see here, you can have either a black background or a white background; I like the black background. So let's connect to a cluster. You can just click on a cluster.

"Uh you know, because I'm using federation capabilities is automatically connected. And as you can see all these databases, you can browse even the AWS Glue catalog can be, you know, uh browsed automatically. But let me, you know, look at my uh schema here, the dab uh you know schema under the dab database. I see all my different schemas. They are including my data sharing schemas and, and, and the uh external schemas uh like the the Spectrum schemas or the data lake schemas.

I can expand a schema and look at the tables, views, UDFs, and stored procedures. I can see the table details and, like in any other tool, just select a table and run queries directly from there. A more interesting feature: many customers let their data analysts load data and create their own tables in their own space. So the query editor lets you visually create your databases, tables, and schemas, and also load your data.

What I'll do is create a table directly, using the auto schema inference capability, from a CSV file. I'll select a CSV file for my account history table. Once the file is selected, you'll see that the columns are automatically inferred from it. I select the format and the delimiter for loading my data, then I go next and select a new table, because I want to create a new table. You can see all my columns; these actually come from the column headers in the CSV file. I can put the table in a specific schema and give it a name, and if I want to change any of the column definitions, I can go and change them.

For example, I'm changing this column from varchar to, let's say, smallint or double precision, and then I just create my table. If I want to load my data, I just click load data and it loads the data into my table. So it makes it very easy for data analysts who are creating their own tables to load data, and we also provide the ability to load data from S3.

Next I'll use another innovative feature we have called SQL notebooks. As you can see here, I have different notebooks as well as folders; you can create your own folders and organize your notebooks. The notebooks capability is for SQL users; it's not a Python notebook, it's for any SQL user. There are only two cell types: annotations, meaning you can add different kinds of headers using Markdown, and SQL.

For example, I've given this one the header "sales report," and then I have different cells here. One cell I ran earlier, and you can see its data in a visual format. I have another cell; I can run all the cells, or I can run only a specific cell. So I run the cell, and after it runs I see the data. You can download the data as CSV or export it in CSV format, or, what I want to do here, a quick visualization.

So I click on the chart, and it lets me change between different kinds of graphical formats; these are the different charting mechanisms we have. I just selected a bar chart; I can select the value and the column name, give it a header, set the X and Y axes, all those capabilities. And you can share the notebook; as you can see, this notebook is already shared, but you can share it with your team members.

So that's how easy it is for your analysts to run queries and use it for their day-to-day analysis. With that, let's go to the next demo. Naresh gave a quick intro of streaming ingestion: it allows you to bring data from streaming engines like Kafka or Kinesis. In this demo we are bringing in credit card transactions as they happen at the point of sale, through a Kinesis stream, and then we have a machine learning model created with Redshift ML that looks at the data and predicts whether each credit card transaction is fraudulent or not.

OK. And you can visualize this data either using the query editor, or you can get the data using any JDBC/ODBC tool. We also have an API called the Data API; we have more than 11,000 customers using the Data API, running more than 25 million queries every day, for everything from event-driven applications to ad hoc analysis applications.

So let me do a quick demo of this specific use case. Let's get started. First I go to the Kinesis console; I have already created a Kinesis data stream for customer payment transactions. Let me expand this and see whether data is actually coming into the stream. If you look here, there are GetRecords calls and a lot of data is already coming into the stream, which means the stream is ready to be consumed by Redshift.

You must be wondering where I get the point-of-sale data. I do not have real point-of-sale data, so what I have here is a Lambda job populating the data. Let me quickly show you that Lambda job. I want to run some tests and bring in the data; actually, I deliberately selected the wrong stream just to show that this is a real demo, not a canned one. So I select the right stream and start it, and now the Lambda job has started sending the test data.

Now I'm back in the query editor, and I have two different SQL notebooks. I'm going to go to the first notebook, which ingests the data from the Kinesis stream. The first thing I do is create an external schema that reads the Kinesis stream; you can see here it's basically CREATE EXTERNAL SCHEMA ... FROM KINESIS, and I'm using the default IAM role. This is already created, so I'm not going to run it, but I'll run the next cell just to show that the stream is visible: the credit card transaction stream shows up as part of that external schema.

Then, in order to read the data and materialize it into Redshift, I have to create a materialized view on top of that stream. You can see here I'm creating a materialized view that reads the data from the stream. Once the materialized view is created, I want to see whether the data is coming in just by querying it.
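A hedged sketch of what those two statements typically look like; the schema, view, and stream names are illustrative, not the demo's exact ones.

```sql
-- Map Kinesis as an external schema (uses the warehouse's default IAM role).
CREATE EXTERNAL SCHEMA kinesis_schema
FROM KINESIS
IAM_ROLE default;

-- Materialize the stream; each refresh pulls new records directly from Kinesis.
CREATE MATERIALIZED VIEW mv_cc_transactions AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       shard_id,
       sequence_number,
       JSON_PARSE(kinesis_data) AS payload
FROM kinesis_schema."customer-payment-transaction-stream";

-- Check that records are flowing in.
SELECT COUNT(*) FROM mv_cc_transactions;
```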

So I run the query against it. Actually, I didn't select the SQL properly the first time, but then you can see the actual data coming in: all my credit card transactions are right here in Redshift. What I'm going to do now is create a machine learning model, and you'll see how easy that is.

I'm going to switch to my next notebook. Now that the transactions are there, in this notebook you can see the CREATE MODEL command. The SELECT in the FROM clause is the training data I'm providing to the model. Notice I don't specify anything about the problem type, whether it's classification or anything else; that is determined automatically for me. I just specify which column I'm trying to predict.

The target here is transaction fraud: whether the transaction is a fraud or not. Then I provide a name for the prediction function; that's the function called fn_customer_cc_fd. And I specify an S3 bucket where the training data is prepared, because Redshift ML automatically exports and preprocesses the data and makes it available to SageMaker for training. Training takes a little while, maybe 90 minutes to three hours depending on the size of the data, and then the model is trained.

Here I use the SHOW MODEL command to show the status of the model. As you can see, the model is ready and trained, and you can see the query I used for the training data. If you look at the model type, it used XGBoost, but I had not specified the model type anywhere. And if you're a SageMaker expert and want to dig deeper in the SageMaker console, the SageMaker Autopilot job name is shown there as well.

Now that the model is ready, I want to check how accurate it is as of today. So I run this SQL to check the accuracy of the model, and I can see that the accuracy is 0.92, which is pretty good.
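As a hedged sketch of what those checks might look like; the model name, prediction function signature, and feature columns below are hypothetical stand-ins for the demo's actual ones.

```sql
-- Inspect training status, model type (e.g. XGBoost), and the Autopilot job name.
SHOW MODEL customer_cc_fraud_model;

-- Rough accuracy check: compare predictions against labeled data.
SELECT AVG(CASE WHEN fn_customer_cc_fd(tx_amount, tx_hour, merchant_category) = tx_fraud
                THEN 1.0 ELSE 0.0 END) AS accuracy
FROM fraud_labeled_data;
```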

So let's now use this model to see whether it can predict whether a transaction is fraudulent or not. What I'm doing here is creating a view, and in that view I'm using the prediction function. You'll get this PowerPoint with the demo embedded, so you can see it there: I'm using the same prediction function, together with the actual data from the materialized view.

I'm creating this view over that data. You can see the function I'm using, fn_customer_cc_fd, and I'm passing the columns from my materialized view into that function. Then I create the view; this view I have actually already created as well.
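A minimal sketch of the kind of view being described, assuming a flattened version of the streaming materialized view; every object and column name here is illustrative.

```sql
-- Score each streaming transaction with the Redshift ML prediction function.
CREATE OR REPLACE VIEW cc_transactions_scored AS
SELECT t.transaction_id,
       t.approximate_arrival_timestamp,
       fn_customer_cc_fd(t.tx_amount, t.tx_hour, t.merchant_category) AS is_fraud_predicted
FROM cc_transactions_flattened t;  -- e.g. a view that unpacks the streaming payload

-- Surface the most recent transactions flagged as fraudulent.
SELECT *
FROM cc_transactions_scored
WHERE is_fraud_predicted = 1
ORDER BY approximate_arrival_timestamp DESC
LIMIT 20;
```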

So I run a query against that specific view to see whether the transactions are fraudulent or not. As you can see, a lot of the arriving transactions are flagged as fraudulent; again, this data is synthetic, and in a real-life environment probably only a few would be fraudulent, but we created a lot of fraud for this use case. So that was the demo: real-time data is coming into a Kinesis stream, and we are using Redshift ML to do real-time predictions. This is how powerful the combination of AWS analytics and Redshift is.

With that: you can try Redshift Serverless today; we provide a $300 credit to try Redshift. And Redshift ML is available to every customer wherever SageMaker and Redshift are available (unless your organization has blocked it for you). And there are a lot of resources that we have.

We have a lot of content; I even authored a book, and I'm pointing you to a recent blog on Redshift ML that we published. In this presentation, we showed you how AWS enables you to do self-service and near real-time analytics by bringing in your data in real time, whether from a streaming source, from S3, or from your operational database, and then provides different tools to consume that data. And Kay showed you how FanDuel transformed their data warehouse with a multi-cluster architecture. Thanks, everyone.

Uh and uh please complete the session survey."

余额充值