Data patterns: Get the big picture for data applications

Hello and welcome everyone.

Day one. Are you all excited? Ok, nice. Let me start with a question here. How many data engineers and data architects are in the room? Ok. How many of you have hands-on experience with big data applications? Awesome. So this session is going to be very helpful for people who deal with data on a daily basis.

Today's focus is going to be on data patterns. Before we dive deep into the session, let me introduce myself. My name is Elena Nari Karl. I'm a senior partner solutions architect. Throughout my career, I have dealt only with data and databases. I work with most of the relational and non-relational databases, and I'm also a subject matter expert in DynamoDB. I have worked with many customers in architecting their big data applications. Currently, I'm focusing on generative AI applications, working with healthcare partners and customers to increase their adoption of AWS vector databases.

I have two co-presenters: my colleague Resh, who is a principal solutions architect, and AWS Ambassador Matthew Houghton, who is a data architect at CDL.

This is the agenda today. It's packed with valuable insights and discussions. We'll start with the modern data strategy and understand its importance in today's landscape. Then we'll look at various data patterns and check how to implement effective data patterns based on your use case. We'll also address some of the customer challenges in the data journey and look into top partner programs. And at the end, in the call to action segment, we'll provide some of the steps and resources that you can use to implement a modern data strategy effectively.

Data has become an integral part of today's information age, right? 90% of the data that exists today was created in the last two years. We are seeing data grow into the zettabytes. It contains value, and it also needs management. So do you think all organizations are using this data? Only one third of organizations are using this data, and the rest are struggling. The organizations that are making use of this data improve their business and work on innovations. How do such organizations make use of data and make data-driven decisions? They also address customer challenges and do better data management. This is possible by applying a modern data strategy. A modern data strategy gives you the best practices to manage data. Let's see the end-to-end lifecycle of a modern data strategy.

On the left, you are seeing the data sources: the more data sources there are, the more data will be produced. A modern data strategy uses flexible and cost-effective storage to store the data. It uses purpose-built databases, analytics, data lakes and machine learning to derive valuable insights. It also uses governance to create metadata for your data and to apply that governance wherever the data is stored. At the end, the data reaches end users like people, applications and devices, who derive valuable insights. Later, the same data is sent back to the data sources to improve the customer experience.

We have touched upon modern data strategy. Let's check: what are data patterns? Data patterns refer to the underlying structures and relationships that exist within data sets in the modern data strategy. Analyzing these patterns is vital to understanding your data and making data-driven decisions. How do we do such analysis? How do we migrate your workload to AWS to do such analysis? We'll go through various data patterns. But do we have enough materials or resources, like runbooks and technical references, to understand these data patterns? That's what we are going to discuss today.

Let's take pattern one here. How do you migrate transactional and analytical workloads to AWS? This is the most frequently asked pattern, I would say. The simple answer could be "use migration tools," but that's just one part of it, right? There are so many steps involved. But before we check them, let's see which industries are using this pattern. This pattern is mostly industry agnostic, and the use cases range from ERP applications to CRM. The customer motivation here is the perceived low risk of this approach to cloud adoption. They all want to migrate to the cloud first and then think about modernization. Their aim is probably to shut down the data center and move their applications to the cloud as soon as possible.

And the advantage here: no significant architectural changes are needed. All the refactoring or modernization phases come later. You can also reuse the skills your teams have on premises. For example, if I'm an Oracle DBA or a SQL DBA, I can use the same skill set even after migrating my workload to AWS. The technologies used are Amazon RDS, Amazon Redshift, and Amazon QuickSight for visualization. They also use some of our migration tools like AWS DMS, which is our Database Migration Service. Before we check the runbooks and the architectures, let's understand the challenges they face with migrations. As you are seeing on the screen here, 94% of organizations say they could be making more informed decisions, and 80% of data migration projects either fail or run over time.

There is a famous quote, right? Go for the cause, not for the consequences. So let's understand the causes of these challenges. When the number of databases or the volume of data is high, it's very tough to run the migration process, right? Most of the architects and data engineers here will agree with me. And cost - cost is a crucial factor. There may be costs involved with the data transfer and also some licensing costs, and there are hidden costs that we may not know about until we do the migration. Building and deploying migration scripts is also a manual effort. And if you are trying to have minimum downtime or zero downtime for your applications, it is a very tedious process, and you need to include rollback strategies throughout the migration. Meeting industry regulatory and compliance standards is a very tedious process too.

So how can we help our organizations or our customers when they want to migrate to AWS? That's where we would like to provide some of the runbooks and architecture references, so the customer or the organization can move to the cloud effectively. Take an example here: if you're trying to move an Oracle transactional data workload, this is the sequence of steps that will be followed while you are migrating to AWS. This is a runbook that was created by seeing the same pattern again and again with the same process. Step one starts from functional discovery and then goes to a database assessment tool. Later, we use an OLA, an Optimization and Licensing Assessment, to understand the licensing requirements and cost. And if you want to upskill your database skills, you can go through a database immersion day. Later you are going to test your architecture and then assess it by going through the Well-Architected lens. That is where you will check how secure my application is, and how available and reliable it will be. And then later you will be migrating the data and the applications.

The same sequence of steps will be followed on your analytical side as well. You start with an SCT assessment - SCT is the Schema Conversion Tool. This is where you will understand: okay, I'm going to migrate my workload from Oracle to Redshift, so how is my schema conversion going to look? Then we go through the same sequence of steps, from immersion days onwards. And then, as we have seen on the database side, it's also going to go through the Well-Architected Analytics Lens. And at the end, you will be using IEM guidance, which is a process called Infrastructure Event Management. So this is the runbook; you can consider this runbook if you are trying to migrate your workload to AWS. And this is the first pattern I'd like to discuss.

We have seen the runbook here. But what will the underlying architecture be? As you see, the application is running queries against the traditional databases. We have Amazon RDS and we have Amazon Aurora. And if you want to improve the latencies - if you want to jump from millisecond to microsecond - you can use our caching service called Amazon ElastiCache. Later, you can use some of the migration services like AWS DMS and AWS Glue, and also zero-ETL. If you think, I want to do analytical processing, you can use our most popular analytical service, Amazon Redshift. And at the end, if you want to have a BI dashboard, you can use Amazon QuickSight.

We have shown all the components here. But do we need all of these services to be part of the migration? No. As we mentioned, if you want to improve latency from millisecond to microsecond, you can use the caching services. At the same time, if you think, all my tables are in Amazon Aurora and I want to move them to Amazon Redshift, you can use the zero-ETL feature alone - you don't have to use the DMS or Glue services at all. The materialized view will be created directly in Amazon Redshift as you populate your data in Amazon Aurora. So these are the advantages, but I would like to talk about a practical migration that has actually happened.
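As a rough illustration of the zero-ETL option just mentioned, the sketch below creates an Aurora-to-Redshift integration with boto3. The ARNs, names, and region are placeholders I've made up for illustration, not values from the talk.

```python
import boto3

# Minimal sketch: create a zero-ETL integration from an Aurora cluster into a
# Redshift namespace. All ARNs, names, and the region are illustrative placeholders.
rds = boto3.client("rds", region_name="us-east-1")

integration = rds.create_integration(
    SourceArn="arn:aws:rds:us-east-1:123456789012:cluster:orders-aurora-cluster",
    TargetArn="arn:aws:redshift-serverless:us-east-1:123456789012:namespace/analytics-ns",
    IntegrationName="orders-zero-etl",
)

# The integration starts in a "creating" state; once active, data written to the
# Aurora cluster is replicated into Redshift without a separate DMS or Glue job.
```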

To talk about that, I would like to call up our AWS Ambassador, Matt. Matt, the stage is yours.

Thanks, Lenin. Good morning, Las Vegas. So we've seen the reference architecture here that includes the key AWS services to cover both OLTP and OLAP migrations. But it's important to note that you should pick the services that match the architectural characteristics of the workload that you're actually migrating. To help us think about this, I'm going to cover a specific case study of a migration CDL needed to complete. We had 600 databases that needed to be migrated from on premise to AWS. At the same time, we also wanted to complete a database engine swap from Oracle to PostgreSQL, and all of this needed to be executed with minimal downtime.

Has anybody here used change data capture software before? Yeah, a few. So change data capture, or CDC for short, was really critical for our migration project. CDC allows you to capture transactions from a source database and then replay them into a target database.

The database engine that you are replicating to can also be different from the source. So the architecture that we had for our database migration was firstly an application running in a data center outside of AWS, and this was connected to an Oracle database and running a normal production workload.

We then introduced Change Data Capture software, which read the database transaction log and replayed those transactions into Amazon RDS for PostgreSQL. We were able to keep the two databases in sync whilst maintaining the production workload on-premise.
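CDL used its own CDC tooling, but as a rough, comparable sketch, a full-load-plus-CDC task can be expressed with AWS DMS as below; all ARNs, names, and the schema are placeholders I've invented, not the actual migration configuration.

```python
import boto3
import json

# Sketch only: a DMS task that does an initial full load of an Oracle schema and
# then keeps replaying transaction-log changes into RDS PostgreSQL (CDC).
dms = boto3.client("dms", region_name="eu-west-1")

table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-app-schema",
            "object-locator": {"schema-name": "APP_SCHEMA", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-postgres-cdc",
    SourceEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:oracle-src",
    TargetEndpointArn="arn:aws:dms:eu-west-1:123456789012:endpoint:postgres-tgt",
    ReplicationInstanceArn="arn:aws:dms:eu-west-1:123456789012:rep:migration-instance",
    MigrationType="full-load-and-cdc",  # full copy first, then ongoing replication
    TableMappings=json.dumps(table_mappings),
)
```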

The application that we were migrating, we transformed into being container based, and we built this in AWS running on the Amazon Elastic Container Service. Then when we were ready to switch over, we stopped the application on-premise, we ensured that the databases were in sync and then we started up the application stack that we'd built in AWS. And this ensured that we had a really quick switch over of the production workload to the AWS cloud.

The migration process is something that we had to repeat a lot, so automation for us was key. The Change Data Capture tasks were defined in Terraform templates and then that made each migration configuration driven. We could just find and replace client specific items like schema names.
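To illustrate what configuration-driven means here, the toy sketch below stamps out one migration task definition per client from a single template. The field names are invented for illustration and are not CDL's actual Terraform variables.

```python
from string import Template

# One template, many clients: only the client-specific items (here, the schema
# name) change between migrations.
task_template = Template(
    '{"task_id": "${client}-oracle-to-postgres", "source_schema": "${schema}"}'
)

clients = [
    {"client": "client_a", "schema": "CLIENT_A"},
    {"client": "client_b", "schema": "CLIENT_B"},
]

for c in clients:
    print(task_template.substitute(**c))
```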

So at the start of the migration, we locked the Oracle database accounts, and if we needed to roll back, it was just a case of unlocking those database accounts and switching the application endpoint back. By defining your migration as infrastructure as code like this, you get a number of benefits.

First of all, you reduce risk. Automation removes that human element - I'm really not my best doing a manual deployment late at night, for example. At some point, I'm gonna make a mistake. Also from a testing point of view, it's really important that the environments that you're progressing your application stack through are consistent throughout, and automation gives you that.

But you also get speed of deployment, so you can roll out changes quickly through multiple environments and for multiple customers without having to scale up your team.

So with the application running in AWS, we continue to use Change Data Capture to replay the transactions from a PostgreSQL database that was optimized for online transaction processing into another RDS instance that we'd optimized for our analytics workloads. And we coupled this with additional data storage in S3 and used AWS Glue for ETL and QuickSight for visual analytics.
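As a loose sketch of that Glue ETL step (the catalog, table, and bucket names here are placeholders, not CDL's), a job might read a cataloged table from the analytics database and land curated Parquet in S3 for QuickSight and downstream feeds:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table that has been cataloged over the analytics PostgreSQL instance.
policies = glue_context.create_dynamic_frame.from_catalog(
    database="analytics_catalog", table_name="public_policies"
)

# Write curated Parquet to S3 for QuickSight dashboards and downstream data feeds.
glue_context.write_dynamic_frame.from_options(
    frame=policies,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-data/policies/"},
    format="parquet",
)

job.commit()
```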

It was in this phase of the migration we hit a problem that we had to deal with outside of the automation that we'd built into the migration process itself. So during the analysis of database specific functions as part of our migration, we found an Oracle feature in our analytics solution that had no equivalent in PostgreSQL, and this was fast refresh materialized views.

The analytics solution took about 700 database tables that had been optimized for online transaction processing and then denormalized that data into about 100 materialized views optimized for analytics workloads. We had to build out a fast refresh materialized view capability from scratch and we worked alongside experts from AWS's database freedom team to deliver this.

We've open sourced that project and we continue to maintain it and it's available free to all on AWS's GitHub repo. So this is the architecture for fast refresh views in PostgreSQL using database triggers. We capture transactions taking place in the underlying tables that make up our materialized views and these transactions are stored in additional tables shown here as the materialized view logs.

When we want to update a materialized view instead of running a piece of SQL to completely refresh all of the data in that view, we're able to just bring it up to date by processing only the changes that have taken place since the last refresh. And by using this fast refresh technology, we can achieve a latency as low as about one minute from the original transaction taking place.
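Here is a heavily simplified sketch of that idea (not the open-sourced module itself): a trigger writes changes into a materialized view log, and the refresh re-derives only the affected rows. The orders and mv_order_summary tables and their columns are invented for illustration and assumed to already exist.

```python
import psycopg2

# Conceptual sketch of trigger-based fast refresh in PostgreSQL.
capture_changes = """
CREATE TABLE IF NOT EXISTS mv_log_orders (
    log_id     bigserial PRIMARY KEY,
    order_id   bigint NOT NULL,
    dml_type   char(1) NOT NULL,          -- 'I', 'U' or 'D'
    changed_at timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_orders_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO mv_log_orders (order_id, dml_type) VALUES (OLD.order_id, 'D');
    ELSE
        INSERT INTO mv_log_orders (order_id, dml_type) VALUES (NEW.order_id, left(TG_OP, 1));
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_orders_mv_log ON orders;
CREATE TRIGGER trg_orders_mv_log
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION log_orders_change();
"""

fast_refresh = """
-- Apply only the changes captured since the last refresh,
-- instead of rebuilding the whole view.
DELETE FROM mv_order_summary s
 USING (SELECT DISTINCT order_id FROM mv_log_orders) c
 WHERE s.order_id = c.order_id;

INSERT INTO mv_order_summary (order_id, total_amount)
SELECT o.order_id, SUM(o.amount)
  FROM orders o
  JOIN (SELECT DISTINCT order_id FROM mv_log_orders) c USING (order_id)
 GROUP BY o.order_id;

TRUNCATE mv_log_orders;   -- changes applied, clear the log
"""

with psycopg2.connect("dbname=analytics") as conn, conn.cursor() as cur:
    cur.execute(capture_changes)
    cur.execute(fast_refresh)
```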

So today, we're fully running in AWS, but we've done further optimizations to the architecture since the migration. So we have our Java based applications running in ECS, and within RDS we've taken advantage of multi-AZ and read replicas. The ability to configure additional resilience and support additional read capacity has been very useful to us within RDS.

We have our fast refresh module installed and then we have Lake Formation which is providing a central catalog of data sources held in RDS and S3, and this allows us to use fine grained access controls right down to a column level.
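A minimal sketch of that column-level control, using Lake Formation's grant API; the role ARN, database, table, and column names are placeholders, not CDL's actual catalog.

```python
import boto3

# Grant an analyst role SELECT on only a subset of columns of a cataloged table;
# the remaining (sensitive) columns stay hidden from that principal.
lakeformation = boto3.client("lakeformation", region_name="eu-west-1")

lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst-role"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "insurance_catalog",
            "Name": "policies",
            "ColumnNames": ["policy_id", "start_date", "premium"],
        }
    },
    Permissions=["SELECT"],
)
```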

Amazon QuickSight is being used for our data visualization and QuickSight is able to talk to both RDS and S3, so we were able to take advantage of that additional data. We're using AWS Glue, we can run further ETL and we do this to provide curated data marts and data feeds to customers and third party systems.

And we also use AWS Transfer Family to allow us to easily move data between us and our customers. By combining just a few AWS services like this together, the overall architecture becomes really flexible and it allows us to cover use cases such as reporting, data feeds, embedded analytics, self service dashboards, and generative AI.

So this brings us back to the topic of modernization of analytics which Lenin is gonna cover in more detail.

Thank you, Matt. Let's understand the term modernization from an AWS perspective. Modernization is basically breaking down a monolithic application into a microservices architecture. By doing that, you can accelerate your business innovation and also address the technical challenges your customers face.

What kind of industry is looking for this modernization pattern? It's mostly industry agnostic. In terms of use cases, these customers have migrated from on-prem to AWS and now want to use cloud native components. The customer motivation here is to build a decoupled architecture so they can increase agility and innovation. An advantage here is that they get to use the best tool for the right use case.

They're also going to leverage purpose-built database solutions. The technologies used range from Amazon Aurora to DynamoDB on the database side, and also include analytics, visualization, and data integration services.

Okay, what are the challenges in modernization? Picking the right tool for modernization is a tedious process, and building up the skill set for the new technologies takes a lot of manual effort as well.

And data ingestion - if you are going to deal with a high volume of data, ingestion is also going to be tedious. Data management - most of the applications looking for this modernization need to be highly scalable and have a lot of data complexity.

People mindset - "I was working with the old technologies, I'm more comfortable with that". People are always hesitant to move to newer technologies.

Integrating with the new technologies is also a tedious process in this modernization.

So when an organization faces this kind of struggle, how do we help them? Why don't we provide some technical guidance that will help them from scratch right through to the launch of their applications? Let's get into that here.

The runbook for this pattern provides a structured approach to guide the modernization process with a data lake and ML. In this phase, you are going to find the right tool for the right job, starting from functional discovery.

Take the same example here of a transactional data workload moving from on-prem to AWS. It starts with functional discovery and then goes to the data solution advisor. This is where you decide: do I need to stick to a relational database, or shall I explore NoSQL technologies? The rule of thumb here is: if you know that you have predefined access patterns, you can use NoSQL technologies; if you think your application has complex data and joins involved, you can stick to a relational database.

Then you will do your validation, which is your architectural validation. After this third phase, you will check the sizing and the costing, and whether you want any partner support to implement it, and then later you will move to the Well-Architected lens.

As we have seen in the migration phase, this lens will check how secure your application will be, and how available and reliable it will be. Beyond what we saw in the migration pattern, this particular pattern has an application refactoring phase, where you will refactor the application and look at how you can modernize it better if new use cases come up.

And at the end, you will reach the AI and ML side. As you've seen here, there is an optional phase: you can also build your own data lake and decide whether you want to use some of our ML capabilities.

What does the underlying architecture look like for this modernization data pattern? As you've seen, the application itself is broken into a number of microservices, right? It has relational and purpose-built databases, and it also has streaming solutions involved. Then it has an integration layer, which will include AWS DMS and AWS Glue, and it will also include some big data applications like Amazon EMR.

And if you want to do the analytical part of it, you can also use some of the analytical services like Athena and Amazon Redshift. At the end, we'll build a visualization dashboard using Amazon QuickSight.

I just want to go through the same flow with an example here. Suppose you are working with a customer like a financial institution or an insurance company; they would like to migrate and modernize their applications for mobile banking and many other use cases.

At that point, you don't have to use all the databases or all the analytical services we have shown here. You can use only selected databases. For example, if they are already running DynamoDB and they want to improve caching and latencies, they can use Amazon ElastiCache.

Along with that, I have also worked with real estate applications like Zillow - if they want to run queries against their database, most of which are going to be location based, they can use the geo-supported query capabilities that Amazon Neptune provides.

At the same time, if there are use cases where we need to do fraud detection, you can use Amazon Neptune. And if you think, okay, now I have all my services, databases and analytics, and I want to explore some ML capabilities - you can create a data lake with all the data in S3 and use services like Amazon SageMaker and Athena.

Nowadays, generative AI is gaining a lot of importance. So if you want to use a database that has vector support, you can land all your data in S3, as you're seeing on the screen here, and then use a vector database like OpenSearch, which has vector capability, to do the analysis.
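A minimal sketch of what that vector capability looks like in OpenSearch, using its k-NN index type. The endpoint, credentials, index name, and tiny 3-dimensional vectors are placeholders; real embeddings would come from an embedding model and have hundreds of dimensions. Other vector-capable stores mentioned in this talk, such as Aurora, could play the same role.

```python
from opensearchpy import OpenSearch

# Placeholder endpoint and credentials for an OpenSearch domain.
client = OpenSearch(
    hosts=[{"host": "search-example-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),
    use_ssl=True,
)

# Create an index that can store vectors alongside the source text.
client.indices.create(
    index="documents",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "knn_vector", "dimension": 3},
            }
        },
    },
)

# Index one document with its (placeholder) embedding.
client.index(
    index="documents",
    body={"text": "Quarterly revenue grew 12%.", "embedding": [0.1, 0.7, 0.2]},
    refresh=True,
)

# k-NN search: return the documents whose embeddings are closest to the query vector.
results = client.search(
    index="documents",
    body={"size": 3, "query": {"knn": {"embedding": {"vector": [0.1, 0.6, 0.3], "k": 3}}}},
)
print(results["hits"]["hits"])
```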

Let's get into the next pattern here. Many industries are asking for this pattern: how do you decentralize organizational data on AWS? There must be some disadvantages with the centralized approach, right? So let's discuss that first.

As you see on the screen here, most of the domain data is pushed to one central team. Creating data expertise in one central team is going to be a challenge. At the same time, if you want to scale for new consumers, that's not easy with the centralized approach.

Only the domain data has been pushed to the central team, so only a few data experts have been given access to work with the data. And if tomorrow you want to build a data driven organization, that's a challenge, because only those few experts have access to it.

And data governance - there are many cases where you have weak data governance. So it becomes a security issue, and the data is not shared across the organization. You also may not have the visibility to see how the data is being accessed, which creates a lack of data audit.

What is the solution here? Data mesh addresses the challenges of scaling and growing data complexity in modern enterprises. So here are the key aspects of the data mesh concept. Let's go over these keywords one by one:

Decentralized approach - as we have seen earlier, with the centralized approach only one database team has access. In the data mesh, the concept is to decentralize it.

Domain ownership - so you are going to provide data and domain ownership so that the end to end data lifecycle is managed by the same team.

Data is treated as a product - when you treat data as a product, you can share that product with many parts of the organization, and you have a self-service data platform.

And finally, you have federated governance. Okay - I'm going to decentralize all the data, and that is what's going to give me a path to innovation, but won't that bring chaos? So the secret sauce for success here is to use a federated approach.

The federated approach basically strikes a balance between decentralized data sources and centralized governance. What industries are using this particular pattern? They are mostly agnostic, as we have seen with migration and modernization. The use case here is large scale analytics using cloud native components.

The customer motivation is that they want to reduce the data silos they had earlier. An advantage is that it increases scaling on both the consumer side and the producer side. The technologies used are Amazon S3, Amazon EMR and IAM, with QuickSight for visualization and data integration services like AWS Glue. GoDaddy is one of the largest web hosting companies.

They were facing a lot of challenges with the centralized approach, so they wanted to build an architecture that would avoid those challenges. Let's see how they built this architecture.

First, they register the data set with the central organization, which is the federated governance account. They create a catalog there, and the catalog populates the metadata for that data set. Later, the same catalog is shared back to the producer side, where data curation, transformation and enrichment take place, and the producer - the business owner - grants permissions so that the same data catalog can be shared with the consumer side.

Once it reaches the consumer side, the consumer account will also give permissions to some of the consumer personnel so that they can run queries against this particular data catalog, using services like Amazon Redshift or Amazon Athena.

By using this approach, you can scale the producers and, at the same time, scale the consumers as well. Tomorrow, if GoDaddy gets some newer use cases, they simply need to share this particular data catalog with that particular team. So the data is treated as a shared product and, at the same time, everything is maintained and accessed through the data governance account.
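A rough sketch of that catalog-sharing step: from the central governance account, grant a consumer account access to a cataloged table, with the grant option so the consumer account can pass access on to its own analysts. The account ID, database, and table names are placeholders, not GoDaddy's.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Share one cataloged table with a consumer account; that account can then
# grant it onward to its own users (data treated as a shared product).
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "111122223333"},  # consumer account ID
    Resource={
        "Table": {
            "DatabaseName": "marketing_domain",
            "Name": "campaign_events",
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
    PermissionsWithGrantOption=["SELECT", "DESCRIBE"],
)
```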

We have seen how we can use a data mesh. Let's understand how you can govern your data better, and also look at some of the other data patterns that are emerging nowadays. To talk about that, let me call up my colleague Resh.

Hey, thanks, Lenin. Hello everyone. It's great to be here. So far we have seen how to migrate, how to modernize and how to decentralize the data. The next pattern is: how are we going to enforce data governance across the organization?

Let's start with the definition of data governance. You may have your own definition, but in reality, data governance is an approach to ensure your organization's data is in the right condition to meet your business objectives and business operations. And in order to get your data into the right condition, your business and IT have to have a partnership.

So where does this data governance pattern fit? It is industry agnostic, and the use case is organization-wide data management. One of the motivations is to have centralized governance control and to share and discover the data at scale. Some of the technologies we use to implement data governance are Amazon DataZone and AWS Lake Formation.

For context, 85% of companies want to be data driven, but only 35% are able to achieve that. If you dig deep into the reasons, it's either a lack of data strategy or the implementation of poor data governance. Historically, data governance has been seen as a lockdown measure, where you lock your data in silos and you're not able to innovate. But in reality, if you have proper data governance, you should be able to move your data freely across the organization.

Since your data is spread across different teams and silos, it is very hard to discover the data, and you may have different mechanisms for access control - it could be a manual way of requesting and allowing access - and you won't be able to scale across the organization with that approach. Having poor data governance also means you have an increased security risk because of a lack of audit controls.

If there is a breach, it would be hard to mitigate because you don't have an audit control mechanism there. And poor data quality leads to inconsistent decision making across your organization.

So how do you build data governance across the org? Let's take a look at a sample runbook we have developed for a data governance use case. On the left hand side, just like any other use case, we start with the business initiative. In this case, let's say you want to build a supply chain solution: what are the applications and analytics use cases that are going to support supply chain? It could be a supplier scorecard or inventory.

Then in the next phase, we ask: what data would I need to build this use case? You don't need all the data at this point, only the data you need for this particular business initiative. And you need to figure out who your producers and consumers are for this business initiative.

Next comes the crucial part, which is data governance and data management. This is where you ensure you have your data quality, data lifecycle, master data, data catalog, data lineage and security in place. Then comes the next phase, which is architecture, technology, tooling and partners.

This is the phase where we figure out what sort of technology I would need to implement this use case - it could be a data lake or a data warehouse. Then comes the operating model, and this is where you have to have a partnership between your teams and processes to achieve your near-term and long-term goals, before you release the data to be consumed by the consumers.

So here is a sample reference architecture. On the left hand side, you have two producer domains - assume they are marketing and sales - and on the right hand side, you have the consumer; call them operations. Let's say the operations team wants to build or innovate a product based on the data that has been produced by the producers.

For producers, we have multiple services to support the ingestion framework. As you see on the screen, you have Amazon Kinesis, Amazon Redshift and AWS Glue. At the bottom comes the main part, which is data management. This is the stage where you make sure your data lifecycle is set, from inception to the point where the data becomes obsolete. Then comes data lineage, where you track your data and the changes to it over time. Then comes data quality, which is where you ensure quality is achieved so that your consumers can consume the data with trust. And then comes data classification.

This is where you classify the data, so that when a business case arises, you know whom to share the data with and you stay compliant with security measures. And the main aspect of this reference architecture is the piece at the center, which is federated data governance, and you have two services there: AWS Lake Formation and Amazon DataZone.

Amazon DataZone, for example, has four main components: you have the portal, you have the catalog, you have projects, and you have the governance layer. Amazon DataZone also lets you grant access to AWS Lake Formation tables and views so that your data stays at the source, but those assets can still be published into DataZone, and your consumers on the right hand side can consume the data through DataZone.

And you have Amazon Athena, where you can access the data, and you have the Redshift query editor to consume the data. And if, let's say, you want to build a machine learning model using the data you have just consumed, you have Amazon SageMaker, which provides ML governance - for example, you have model cards and you have a unified dashboard where you can track what's happening.

So now you have all the data you need. But how can you improve the customer experience and generate value for your customers using the data that you have? This is where artificial intelligence and machine learning come into play. Traditional AI and machine learning covers things like NLP, forecasting, or some kind of processing, whereas with the recent trend of generative AI applications, we are able to generate images, text, videos and so on.

And if you think about building an effective generative AI application, the key differentiator here is your data. If you think about it, every organization gets hold of the very same foundation models; what makes you unique is your data. So for building an effective application, your data is going to be the key.

So let's see where generative AI fits. Like every other use case, this is industry agnostic, and most of the use cases we have seen are building chatbots, language translation, data augmentation and so on. One of the main motivations is automation and efficiency, and a lot of research and analysis departments rely on generative AI technologies. We do have a lot of technologies here, but I want to highlight some of the technologies we've been seeing with customers, starting with the vector data stores.

We have Amazon Aurora, OpenSearch and Kendra. Then we also have Amazon SageMaker and Amazon Bedrock to support these generative AI applications. A study shows that by 2025, 10% of the data we consume will be produced by generative AI applications. Currently we stand at about 1% of data produced by generative AI, so in the next two years, a lot of the applications and the data that we consume will be produced by generative AI. But generative AI is evolving.

So are the challenges. Generative AI applications are able to produce highly synthetic, fabricated data. This means they can produce misinformation and disinformation - you can spread fake news, for example. And if your input or the model's training data is biased, then the model you build with that biased data is going to be biased as well, and therefore the output is going to be biased.

And we have seen in the news recently that a lot of music videos have been produced by generative AI. So who is going to own the music? Who is going to own the intellectual property rights? These are some of the challenges that come with the generative AI pattern.

So like I said, in order to build an effective generative AI application, your data is going to be the key. You have four emerging patterns where you can use your data to customize a generative AI model, and they broadly fall under three categories: in-context learning, model fine tuning, and training your own model.

With contextual prompt engineering, what we're going to do is build or design a prompt that takes implicit information, cues and data as arguments. With retrieval augmented generation, or RAG as we may call it, you can retrieve and generate data at the same time. And with model fine tuning, you're going to take a pre-trained model, use your existing data, and fine tune that model.

And the last one is training your own model, where you use your domain specific data to build your own model. So let's take a look at each pattern in depth.

So, contextual prompt engineering - who needs it? This is for organizations that don't want to spend time on coding or resources, but at the same time want to leverage their data assets. You have two types of prompting with contextual prompts: zero-shot prompting and few-shot prompting.

Zero-shot prompting, for example, can give you output without any training needed, whereas few-shot prompting can give you more useful output but needs minimal guidance in the form of a few input-output pairs.

If we take zero-shot prompting, in this example let's say the user asks: I want to write a bio for my LinkedIn. What's going to happen here is, based on the user profile data, we retrieve a template and then we add additional augmentation, again based on the user profile data. So what we have effectively done is contextualize the prompt based on the user's profile data, and we were able to achieve a customized response for the user in this case.
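A minimal sketch of that contextualized prompt, using Amazon Bedrock's Converse API. The model ID, region, and profile fields are illustrative assumptions, not details from the talk; the same pattern extends to few-shot prompting by prepending a couple of example inputs and outputs to the prompt.

```python
import boto3

# Sketch: contextualize a prompt with cues taken from the user's profile data,
# then ask a Bedrock-hosted model for a customized response.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

user_profile = {
    "name": "Jane Doe",
    "role": "Data Engineer",
    "skills": ["PostgreSQL", "AWS Glue", "Amazon Redshift"],
    "years_experience": 7,
}

prompt = (
    "Write a short LinkedIn bio.\n"
    f"Name: {user_profile['name']}\n"
    f"Role: {user_profile['role']}\n"
    f"Skills: {', '.join(user_profile['skills'])}\n"
    f"Experience: {user_profile['years_experience']} years"
)

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model choice
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```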

The next pattern we see here is retrieval augmented generation. Like I said earlier, RAG is a pattern where you can retrieve and generate information at the same time. In this example, let's say you as a company want to create a finance chatbot, where you want the chatbot to act as a finance recommender.

I assume you have all the user information - their pay slips, banking information, spending patterns, investments, stocks and so on. On the right hand side, just like any other organization, you have all that data in the data lake. You're going to take that data and preprocess it, but rather than feeding that information into the LLM directly, what you want to do is create embeddings and store those embeddings in a vector database.
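A minimal sketch of that "create embeddings" step, assuming Amazon Bedrock's Titan text embedding model; the model ID, the sample documents, and the chunking are placeholders for illustration.

```python
import json

import boto3

# Sketch: turn preprocessed text chunks into embeddings with a Bedrock embedding model.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

documents = [
    "Monthly pay slip: net salary 4,200 USD, deposited on the 28th.",
    "Spending pattern: ~35% of income goes to rent, ~10% to dining out.",
]

embeddings = []
for doc in documents:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",  # placeholder embedding model
        body=json.dumps({"inputText": doc}),
    )
    vector = json.loads(resp["body"].read())["embedding"]
    embeddings.append({"text": doc, "embedding": vector})

# Each {text, embedding} pair would then be indexed into a vector store (for
# example the OpenSearch k-NN index sketched earlier), so the chatbot can
# retrieve the most relevant chunks at question time and pass them to the LLM.
```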
