Toyota drives innovation with global data mesh

All right. So today's topic is using generative AI to deliver self-service data products.

Quite often, the challenge for IT teams - which I assume is what the majority of you here are - is being constrained by business users coming to you on a daily basis: can you get me this data? Can you get me that data? And so on.

We have been working on self-service for quite a while, and not many people have actually cracked the code on it, because self-service is hard to get right.

So my topic here is how to use the latest tools, like generative AI, to simplify the delivery of data products and improve self-service.

So, just a brief introduction of Denodo: Denodo is a logical data management company. We integrate the data, manage the data, and deliver the data in a logical fashion, using a technology called data virtualization.

How many of you have heard about data virtualization? Ok. Again, the same couple of people.

Alright, let's jump right in. When I go into Google these days and search for, let's say, any term - in this case, I searched for logical data management - before, it would just give me a list of web pages I would have to go visit to find the information myself.

But now it actually uses generative AI itself to understand what is in those pages and then provides that information right on the same screen, without me having to click down to the next level to learn about it. If all I need is that high-level summary, that's fine. If I need more information, then I can click on the first, second, third link, and so on.

How useful would it be in our daily business if asking questions were this simple? For example, I want to understand who the most profitable customers are. Usually this kind of question gets answered in the form of reports, dashboards, and so on.

So as a business user, you would actually go and talk to somebody: hey, can you get me the data that shows me the most profitable customers in the last six months, in the last year, and so on? And you basically wait for some time for them to give the data back to you.

Now, if I were empowered to ask a question like this, just the way I did it in Google, I would actually go and type in "who are the most profitable customers?" and, in the background, it does the work against the database or the data warehouse, gets the results, and delivers them back to me.

That would be really cool. I don't have to go bother someone; I get the data immediately. And at the same time, the IT team that is in charge of managing all the data infrastructure can just focus on their own tasks. They don't have to stop that work to get the data to me.

So what we have done in Denodo, in our data fabric product, is integrate generative AI to enable exactly this: business users like me can ask questions in natural language and get the response right away, without having to wait for somebody.

In many cases, if you are working against a data warehouse, you would have to issue a SQL query to get the information. And as a business user, I might not know SQL, so I might have to bug somebody to do that for me.

So in this case, the prompt is, for example: I need information on an actor and all the films that this actor has appeared in. And when a prompt like that comes in, the tool generates the SQL behind it.

Now, even when it generates the SQL, I may not be able to read and understand the SQL myself. So it also provides a description of what that SQL code means. That way, I can be assured that it is translating my natural language question into a proper SQL query that will get me the correct results.

Now, if I'm not happy with it, I can go back and change it - add some more words to make it more accurate. And finally, it executes against the data warehouse, the database, or multiple systems, and brings back the results.

In some cases, it might have to join data across systems to deliver the results. And then it presents that in this particular user interface, where I can see the information that I asked for.
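
To make that flow concrete, here is a minimal sketch of what a natural-language prompt like that might be turned into. The actor/film tables follow the common Sakila-style sample schema; the schema, the SQL, and the description are illustrative assumptions, not Denodo's actual output.

```python
# Illustrative sketch: the kind of SQL a prompt such as "show me each actor
# and all the films they have appeared in" might be translated into.
# Table names (actor, film_actor, film) follow the common Sakila sample
# schema and are assumptions for this example.
import sqlite3

GENERATED_SQL = """
SELECT a.first_name, a.last_name, f.title
FROM actor AS a
JOIN film_actor AS fa ON fa.actor_id = a.actor_id
JOIN film AS f ON f.film_id = fa.film_id
ORDER BY a.last_name, f.title;
"""

# The plain-English explanation shown next to the SQL, so a business user
# can verify the intent before running it:
DESCRIPTION = ("Lists every actor together with the titles of the films "
               "they appear in, joining actor to film through film_actor.")

def run(conn: sqlite3.Connection):
    # Execute the generated SQL and hand the rows back to the UI layer.
    return conn.execute(GENERATED_SQL).fetchall()
```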

So this is the power of generative AI: delivering the information I need right away, without me having to go talk to somebody. That's the self-service we're talking about.

Now, let's try to understand the underlying platform and how exactly it does that. But before I go into that, this is a Gartner hype cycle - some of you might have seen it; there are many different hype cycles, and this one is specifically for data management. You can see it starts in the bottom left corner and goes all the way to the top right.

Generative AI for data management is just at the beginning - it's the latest thing to get onto the hype cycle. There is a lot of noise about it, and it hasn't even peaked yet. So in the next year or two, you're going to see it come to the top of the hype cycle and then ride down the curve as it becomes more mature.

But the data fabric - the underlying technology that enables this self-service capability and the use of generative AI - is a more mature technology. It has been on the cycle for the last two to five years, and within the next five years it might reach the plateau of productivity. It is a more mature technology that is already available to you.

So what is a data fabric? How many of you are familiar with the term data fabric? Ok, alright, good. Gartner promotes the term data fabric quite a bit, and I would say it's new terminology for an old concept.

Many of you might be familiar with this particular diagram, right? You have multiple sources, you have multiple consumers, and the consumers want data from multiple places. You basically start writing data pipelines that crisscross all over the place, join the information, and deliver it.

There was one customer, a banking customer, that runs about one million jobs in the night. One million jobs - assuming a failure rate of even 1%, how much time are you going to spend the next morning debugging where the problems are so that you can fix them? It is pretty cumbersome to go with this particular architecture.

A more simplified approach is to avoid the clutter and have a hub-and-spoke model, in which an enterprise data layer abstracts all the underlying complexity. In this case, that middle layer is the logical data fabric.

So the logical data fabric connects to all the sources underneath, integrates the data, joins it, and provides a unified view of that information to the consumers at the top. And it does that without having to store the information the way traditional data management does.

Many of you are probably taking all this information, in order to unify it, and putting it into a data lake or a data warehouse, right? There are purposes for which you would use a data warehouse and purposes for which you would use a data lake. But you should not be using those repositories as a container to throw all the enterprise's data into, just because somebody is asking for unified information.

There are better ways to get it. This logical approach allows you to keep the sources for the purposes they are intended for, but to deliver the unified information it provides a virtual view of that data. It doesn't need to move the data, it doesn't need to replicate the data, and so it gets the data to you much faster. You accomplish the same thing.
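
As a sketch of what such a virtual view might look like - generic ANSI-style SQL with made-up source and table names, not Denodo's actual DDL:

```python
# Hypothetical illustration of a federated "virtual view": the view is just
# a definition stored in the fabric layer; no rows are copied anywhere.
# The source names (crm, warehouse) and tables are invented for the sketch.
UNIFIED_CUSTOMER_VIEW = """
CREATE VIEW unified_customer AS
SELECT c.customer_id,
       c.name,
       c.segment,          -- lives in the CRM system
       s.total_revenue     -- lives in the data warehouse
FROM crm.customers AS c
JOIN warehouse.sales_summary AS s
  ON s.customer_id = c.customer_id;
"""
# At query time, the fabric splits any query on unified_customer into
# sub-queries against the CRM and the warehouse, then joins the results.
```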

For example, you can have a data catalog where a business user can see the information - that's the one I just showed you. You could be a business analyst using Tableau or Power BI to do your analysis, and you can get the same information.

You could be a data scientist who primarily uses R and Python; you might be going against a data lake, but now you can ask this logical data fabric layer and it will provide the same data for you. It can even expose the data products as APIs in a data marketplace, so you can provide them to your developers to build applications, or even expose them to your partners outside the organization.
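
To give a feel for what consuming such a data product might look like, here is a minimal sketch; the endpoint URL, parameters, and JSON shape are hypothetical, not Denodo's actual REST interface.

```python
import requests

# Hypothetical data-product endpoint exposed through the fabric's marketplace.
BASE_URL = "https://fabric.example.com/api/data-products"

def top_customers(token: str, months: int = 6):
    # The same "most profitable customers" question, served as a governed,
    # reusable data product instead of an ad-hoc extract.
    resp = requests.get(
        f"{BASE_URL}/profitable-customers",
        params={"window_months": months},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. [{"customer_id": ..., "profit": ...}, ...]
```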

So they can actually work with these data products. It's a much cleaner layer: it avoids the complexity that was in the previous slide, it gets the data much faster, it simplifies your architecture, and it future-proofs it against any changes underneath. And that's very powerful, because I, as a business user using Tableau or Power BI, don't need to worry about where the data comes from.

And you, as the IT team, can figure out how you are going to migrate this to AWS or any other system at your own pace, without disrupting me getting my data.

So let's look at three ways the data fabric uses AI to simplify the delivery of data products and enable self-service.

The first one is using AI for learning and recommendations. As you work through the data, it learns who is using what, what they are accessing, and when. It learns from all of this and provides recommendations - just like Netflix: you log in and it suggests movies based on what you have watched before. Or YouTube.

So it is similar to that: it provides customized recommendations based on your usage of this particular data.
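
As a toy illustration of the idea - not Denodo's actual algorithm - here is a minimal usage-based recommender that counts which datasets are accessed together by the same users and suggests the most common companions of the one you already use:

```python
from collections import Counter
from itertools import combinations

# Usage log of (user, dataset) access events - hypothetical sample data.
events = [("ana", "sales"), ("ana", "customers"),
          ("bo", "sales"), ("bo", "customers"), ("bo", "returns"),
          ("cy", "sales"), ("cy", "returns")]

# Group the datasets each user touched.
by_user = {}
for user, ds in events:
    by_user.setdefault(user, set()).add(ds)

# Count how often two datasets are used by the same person.
co_use = Counter()
for datasets in by_user.values():
    for a, b in combinations(sorted(datasets), 2):
        co_use[(a, b)] += 1

def recommend(dataset: str):
    # Datasets most often used together with `dataset`, best first.
    scored = [(a if b == dataset else b, n)
              for (a, b), n in co_use.items() if dataset in (a, b)]
    return sorted(scored, key=lambda t: -t[1])

print(recommend("sales"))  # [('customers', 2), ('returns', 2)]
```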

Now, one thing that we have added here is the ability to support large language models. Large language models are thirsty for information - they want a lot of data for the model to actually work.

Quite often, the data scientists, or whoever works on these models, spend an enormous amount of time accessing the data and preparing the data so that they can run the models.

But with this logical data fabric, you can get the data easily, so you can focus on preparing the models, fine-tuning the models, and then delivering the results and the insights.

You don't have to worry about where to go get the data, how to get the data, and all that. It simplifies getting the data, formatting it in a unified, normalized manner, and then providing it.
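
For instance, here is a minimal sketch of pulling training data through one fabric endpoint instead of from each source separately - assuming the fabric exposes a standard ODBC endpoint; the DSN name and the view are illustrative assumptions:

```python
import pandas as pd
import pyodbc  # assumes the fabric layer exposes a standard ODBC endpoint

def load_training_frame() -> pd.DataFrame:
    # One connection, one unified view - the fabric resolves which
    # underlying systems actually hold the rows.
    conn = pyodbc.connect("DSN=logical_fabric")  # hypothetical DSN
    return pd.read_sql(
        "SELECT customer_id, segment, total_revenue FROM unified_customer",
        conn,
    )

# Downstream, this frame feeds feature preparation or fine-tuning as usual.
```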

So that is a pretty big use case.

The second one is the ability to accelerate queries. Quite often I get the question: ok, so if I'm running a chart in Tableau or Power BI, and it's real-time access going against these databases, and I'm using it on a constant basis - how is the performance?

I can tell you that the performance is pretty fast, because quite often the misconception is that if I'm running a query, it goes against a database that has two billion rows and it's going to lift all of that data.

No - if I ask the question "who are my most profitable customers?", and my most profitable customers number 100,000, or 10,000, whatever it is, that's the number of rows I need returned from the query. It's not the two billion.

So what the logical data fabric does is push the query down into the sources, bring back just the results, and combine them. That's why it is very fast.
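
A sketch of what that pushdown amounts to - the table names are invented, and this shows the generic federated-query technique rather than Denodo's exact query plan:

```python
# What the user asks the fabric (one logical query over a virtual view):
LOGICAL_QUERY = """
SELECT customer_id, SUM(profit) AS total_profit
FROM unified_orders        -- virtual view over a 2-billion-row source
GROUP BY customer_id
ORDER BY total_profit DESC
LIMIT 100;
"""

# What the fabric ships to the source database: the aggregation is pushed
# down, so only ~100 aggregated rows travel back over the network,
# not 2 billion raw rows.
PUSHED_DOWN_QUERY = """
SELECT customer_id, SUM(profit) AS total_profit
FROM orders
GROUP BY customer_id
ORDER BY total_profit DESC
LIMIT 100;
"""
```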

And quite often, when you run a query, you're not accessing every record, like every transaction in a store - you're asking for summary information.

In this case, if you're asking for the most profitable customers, that's summary information. If you're trying to understand the products in a particular geography, that's summary information.

So the logical data fabric allows you to build summary tables. It even recommends them: if you build this particular summary table, it will be much faster. The next time you issue the query, it will go against the summary table, which is much faster than going against the large database and computing the results from scratch.
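
As a generic illustration of the summary-table idea (the names and syntax are illustrative, not Denodo's):

```python
# Build the aggregate once...
CREATE_SUMMARY = """
CREATE TABLE profit_by_customer AS
SELECT customer_id, SUM(profit) AS total_profit
FROM orders
GROUP BY customer_id;
"""

# ...then later queries that only need the aggregate are answered from the
# small summary table instead of rescanning the huge fact table.
TOP_CUSTOMERS = """
SELECT customer_id, total_profit
FROM profit_by_customer
ORDER BY total_profit DESC
LIMIT 100;
"""
```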

So it provides recommendations for accelerating the queries and getting the results back fast.

And the last one that we have added is FinOps. Quite often, I hear many customers challenged with escalating costs in the cloud.

To use an analogy: it's like getting an electricity bill at the end of the month - that's when you find out how much you actually used in the past month, and it's too late to act on it, right?

Maybe you'll start turning off some of the lights for the next month. But you do not know until after it is done.

What we actually do with Denodo: we are the central engine that all the queries pass through, so we can understand how much consumption is actually going on underneath in those data sources.

So we have built some dashboards - which we can demo at our booth - that let you understand right away, in real time, how much cost you're incurring based on the queries your teams are running. Now, you are not the ones running the queries; they are being run by the business teams. But you are ultimately managing the cost.

So here you can see what the costs are and which team is consuming the most. Maybe there's one query they could optimize - you can go back and tell them to change it, or you can restrict it. There are many things you can do.

So we have built dashboards that show you, in real time, the cost of running these queries in and out of your cloud infrastructure.
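
A toy sketch of the underlying idea - attributing scanned bytes per team to a dollar figure. The log format and the price are invented for illustration:

```python
from collections import defaultdict

# Hypothetical query-log records captured by the central engine.
query_log = [
    {"team": "marketing", "bytes_scanned": 120e9},
    {"team": "finance",   "bytes_scanned": 40e9},
    {"team": "marketing", "bytes_scanned": 75e9},
]

PRICE_PER_TB = 5.00  # assumed on-demand scan price, USD per terabyte

cost_by_team = defaultdict(float)
for q in query_log:
    cost_by_team[q["team"]] += q["bytes_scanned"] / 1e12 * PRICE_PER_TB

for team, cost in sorted(cost_by_team.items(), key=lambda t: -t[1]):
    print(f"{team:10s} ${cost:,.2f}")  # marketing $0.98, finance $0.20
```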

So this is a diagram that shows Denodo working in the AWS ecosystem. We work with many of the AWS services - Redshift, Aurora, Athena, S3, and so on.

We have direct connectors to those, to be able to bring the data in from them. And we combine that with your other, non-AWS sources.

How many of you have more than just AWS - AWS plus other things? How many of you have that? Ok, that's pretty much the common scenario, right? Nobody is going to have everything in one particular cloud service provider.

So it provides that hybrid capability of being able to get the data from S3, combine it with other sources, and provide that unified information to the consumers that you see on the left side - whether you are doing data governance with Collibra, or using your analytical tools like Tableau, and so on.

We also provide a data catalog for your business users, so they can use it right away without having to use any of the other tools. And that's where I initially started, by showing the generative AI capability of asking a natural language query.

So we are big in AWS. We have hundreds of customers using Denodo with AWS services and technologies.

So here is an example of a customer: LeasePlan. They lease cars. They're more well known in Europe than here in the US, but they do have operations in the US.

They use Denodo as the logical data fabric. Here you can see it going against multiple AWS services, in the third column, and they also use Snowflake as the data warehouse.

And they have other technologies for data science, like SageMaker for the AI side, and so on. Denodo is the intermediate layer, or abstraction layer, that hides the complexity of everything on the left side.

And it simplifies the data access on the right side: the consumers on the right get the data from Denodo, which manages bringing the data together, all in a virtual fashion, without having to replicate it into yet another repository.

So they get the data much faster, and they won an award for that particular use case last year.

So, once more for the few people who hadn't heard about Denodo: Denodo is a leader in data management. We use a logical approach to integrating, managing, and delivering the data, and that logical approach is called data virtualization, which some of you have heard about.

We are a mid-size company: about 800 employees spread across 20 to 25 countries, over 1,000 customers, and 250-plus partners.

And we are a recognized leader by the analysts: Gartner has named us a leader in data integration three times in a row, and we have also been a three-time winner of the Customers' Choice award from Gartner Peer Insights.

So we are at the booth right behind here, in aisle five. I encourage you to stop by to take a look at the demo I showed the screenshots of - the generative AI capability and also the FinOps dashboards. They are pretty cool. And you can try Denodo.

So we have it available for free. You go to the AWS Marketplace and search for Denodo; we have four different subscriptions available for you to try and use.

You can also go to the Denodo website and register to get access. And we also have many different pieces of collateral that explain all of this.

So you can go to denodo.com, and in the resources section just type "AWS" - you'll get multiple different resources that you can use.

So that's the end of my presentation. I'll hang out here in case you guys have any questions. Ok?
