Mike: Hello, welcome everybody. Thank you for joining us for this session on generative AI. I know that there are no other sessions about generative AI at re:Invent this year. So I'm grateful for you joining us in this one.
In this session, we're talking about architectures and applications in depth. Hello, my name is Mike Chambers. I'm a Developer Advocate for Amazon Web Services and pretty much specializing in generative AI.
Tiffany: And I'm Tiffany. Also a Developer Advocate at AWS, specializing in AI.
Mike: Awesome. Shall we get into the agenda of what we're gonna cover today? Oh no, we both got clickers. What could possibly go wrong? Sorry.
So for the agenda today, we have:
- Generative AI - a quick recap. So hopefully you have some understanding of what generative AI is.
- Then we will go through the questions: what is retrieval augmented generation? What are agents?
- We'll go through security audits and compliance for applications with generative AI.
- And a little bit of predictions for the future of generative AI in the tech world.
Tiffany: Yeah, it's predictions where we hallucinate.
Mike: Exactly. Alright, so we have a poll question for you. The question is: are you using generative AI in your workloads or workflow today? Strangely enough, I am using it, because I put presentations like this together. But yeah, absolutely, I think generative AI has had a huge impact on my life and the life of my family as well. So yeah, using CodeWhisperer, for example, to help you write code. What about yourself?
Tiffany: Well, I mean, even in your workflow, maybe you guys also use ChatGPT. We're in the wrong room. Why are you even here? Well, we're here to talk about it, I guess, right? Or maybe trying to understand it, and we've got some more questions coming up which I think will go into this in a little bit more depth. I don't know if you find this interesting, but I think that polling was kind of interesting. And we also welcome the other rooms that are joining us via the stream, because other people up and down the Strip are watching this. So yeah.
Mike: Okay. So let's dig into this in a little bit more detail. So I guess we start off here. We wanted to start off with a quick recap of generative AI. I think we're making an assumption at the moment in this session that everybody has some kind of familiarity with generative AI. Let's go for the really simple quick start.
So, generative AI: it's machine learning, it's artificial intelligence that generates content. A lot of the machine learning that we dealt with back in the old days, like last year, was all about making predictions about what data was. So, here's a picture of a dog, here's a picture of a cat, great stuff like that. But in this case, we're able to type prompts into models that can do these generative tasks. And I'm sure we've all had a go with putting in a prompt and asking for a picture to be generated - a golden retriever wearing glasses and a hat, as a portrait painting. I did type that and it did make that. Or going to chat bots, typing in prompts and getting answers back.
So in this case, this is one of the open-source large language models; we asked it "what is generative AI?" and it came back with its answer there. So that's generative technology. Maybe we're all on the same page and we've all had a go with this stuff.
And at the center of all of this is the foundation model. Certainly at the center of the applications that we build as builders and developers on top of this technology there is a foundation model, and these foundation models are machine learning models that have been trained with a transformer architecture.
We talked a little bit more about this in a 200-level session that we had just before this, this morning. We're not doing it again, but it will be up online at some point, so look for BOA210 if you want to have a look at that once it's up. But the transformer architecture was really the pivotal moment: using this technology, we're able to train really big models.
So we're able to take data that we have. And a lot of the time in these presentations, we're probably going to end up speaking about large language models, LLMs, because that's really where, at the moment, the majority of the enterprise, business, social and real value is. It's great that I can go to a prompt and get a picture of a golden retriever wearing a hat, but there are so many more things that we can do with this technology, and large language models can do those things.
So we take all of our data and train our foundation models with compute. I've said GPU here; this could be other kinds of PUs, but we're talking about the types of hardware which are optimized for training transformer architectures. And we're probably aware of this: these are not small things. This isn't a rainy-weekend project where you say, do you know what, I'm going to make myself an LLM. That's not really how these work, not the big ones anyway.
And so there is a lot of data and a lot of compute power required to generate these models. And this has really been one of the transformative things and why we're talking about generative AI now, as opposed to a couple of years ago, when far fewer of us were talking about it. And it's because of this gradual change in size, this growth in models.
It's almost not worth trying to keep this slide up to date. The point of it is that models used to be really small. Back in the old days, and the very, very old days, we dealt with tiny little models with a few hundred parameters, then some millions of parameters. But that point on the graph in the 2016-2017 frame, that 2017 point, is when transformers emerged, and when we discovered this ability to make much larger models. And that's really been the key to this: as the models have got bigger, their emergent properties have been discovered.
We've made these big models and said, oh my goodness, they can do all of these things. And so we are finding that models are growing - not exponentially in the strict mathematical sense, but colloquially exponentially - and getting much, much bigger. What's going to happen in the future is, in reality, a bit of a guessing game for everybody.
What's happening is that this is fanning out, right? So, models are getting much, much bigger, and there is a push to get more and more capabilities from the models by making them much bigger. But we're also finding that we can train smaller models which do other tasks quite well. I know there are new versions of Llama 2 and Falcon since this slide was put together, but there are those smaller models too.
So we're getting a lot of capability, a lot of spread of models generally. Especially compared to the old days, they're all much, much bigger than we've dealt with before. And as we said at the beginning, we're talking largely about large language models here. They are, of course, a type of foundation model, but they're text based. They're the ones that we're gonna be using for our chat bots and for making text completions.
But there's a lot more underlying this. And I think this is one of the things that, for me anyway, just absolutely blows me away. I mean, even when we were talking about this this morning, it still just amazes me that this technology works.
But really, all these things are doing is next-word prediction - next-token prediction, I guess we would say; we understand that they work with tokens, not words - but that's all they're doing. We're sending these prompts in, the models are figuring out what the next word might be, and then by cycling through that again and again and again, we get some kind of output we can use. And that is the root of all of these amazing emergent properties that we've found.
We enter a prompt and the LLM adds the completion. It's such a simple concept - very complicated from a mathematical perspective - but just by predicting the next word, we're able to create all of this quite astonishing capability.
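To make that "predict, append, repeat" loop concrete, here is a deliberately toy sketch. The lookup-table "model" is made up purely for illustration; real LLMs predict over tokens with a neural network, but the control flow is the same.

```python
# A toy "language model": a table of which word most often follows which.
# This stands in for the neural network purely to show the generation loop.
NEXT_WORD = {
    "the": "sky", "sky": "is", "is": "blue", "blue": "<end>",
}

def generate(prompt: str, max_new_words: int = 10) -> str:
    words = prompt.lower().split()
    for _ in range(max_new_words):
        next_word = NEXT_WORD.get(words[-1], "<end>")  # predict the next word
        if next_word == "<end>":                       # stop when there is nothing more to add
            break
        words.append(next_word)                        # append it and go around again
    return " ".join(words)

print(generate("The sky"))  # -> "the sky is blue"
```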
So, the large language models that we produce: one of the things which sets them apart from models that we've worked with in the past, apart from size alone, is that in the past, with natural language processing, we would create models to do specific jobs. So we would create models to do text generation, we would create models to do text summarization, et cetera, et cetera - translation. I love translation. We could translate from English to French, but you can also translate from natural language to a computer language, so from natural language to Python, which I find super useful. And maybe some of you have used that.
Sentiment analysis, et cetera. What we're finding is that we can use the large language models as general-purpose models: they can do all of these things, and a lot more. We say "more..." - that's important, we'll be coming back to that. But they can do all of those things; we don't necessarily have to take a single-purpose model. And we'll see what the future brings - there will be specialized models, et cetera - but these large language models are super, super capable.
We're not gonna throw quiz questions at you - well, not quiz questions, survey questions - the entire time that we're on stage here. But we've got another question for you today, and of all the questions I've probably asked any audience, I'm very interested in this one: what is stopping you from building with generative AI today? There are multiple options that will come up on the screen when we get there. I have an idea, but I don't know.
I don't want to predict anything. What's stopping you from working with AI? For us, not much. We are working with AI; if we couldn't work with AI, we wouldn't be working at all. That's right. That's true. Yes. But we'll take... All right. Ah, interesting. Oh, interesting. Oh, it's changing, and it's live - this is good. Privacy and security. Privacy and security. OK. More votes coming in.
So, there's a whole cohort of people where there's nothing stopping you, and I guess the devil's in the detail there - I'd just like to have so much more interactivity. There's nothing stopping you and you are doing it, or there's nothing stopping you but still you're not doing it? I don't know. But privacy and security - that's cool. We'll definitely be talking about that in this session.
Tiffany: And so for the next slide: one of the things that could prevent you from using foundation models - and some of you gave that answer to the question - is hallucination. Some of you might have experienced this with LLMs: sometimes you ask a question and, because the model really wants to give you an answer, it gives you something that is not accurate - something that is a probable answer but is not factually right.
And if you use LLMs for your business use case, this is something you don't want: answers that are wrong. You want this to be accurate all the time. And so you need more control over your LLMs, you need to customize them - they have to be tailored for your business - to make sure that the answers the AI is giving you are accurate.
And the other limitation is that once you've trained your foundation models, their knowledge is limited to the point in time when you trained them. So let's say you trained your LLM back in 2021: then it will only know about things up until that point.
Mike: Yeah. Well, it will only know the things that you've given it as a data set for training, up until that date. Everything that has happened after that, it would not know factually. I mean, it would be able to give you answers that are probabilistically right, but the facts are not there.
Tiffany: Yeah, and I think a really interesting way to put this is that large language models at that point have a good sense of the world. So they know that up is up, down is down, the sky is blue, but they don't know about you, your business, your social enterprise, whatever it is you're doing. They don't have those facts.
Mike: Exactly.
"So one of the things that you could do to gain more control over your AI is using retrieval of multi generation and agents. We're gonna see um in a minute what this is, this actually means. So we're going to show you the general idea of what a retrieval augmented generation is.
So basically you have your LLM and you give it a query - your prompt, the question you ask your LLM. And what you want to do is give it more data. You want to inject data that will help your LLM understand your question better and make sure that it has all the facts it needs so that the answer is accurate.
And the source for the data could be different things. It could be a vector database, it could be an API that helps the agent retrieve information and then put it in the prompt. So basically, it's not very complicated: what happens here is that you just augment your prompt with your initial query plus the data that is relevant to the question.
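Sketched in code, that augmentation step is about this simple. The helper name and the prompt wording here are made up for illustration; the point is only the shape: retrieve relevant facts, then paste them into the prompt next to the original question.

```python
def build_augmented_prompt(query: str, knowledge_base) -> str:
    # Step 1: retrieve the documents most relevant to the user's question
    # (from a vector database, an API, a SQL table... the source doesn't matter).
    docs = knowledge_base.retrieve_relevant_docs(query, top_k=3)  # hypothetical helper

    # Step 2: augment the original query with that data.
    context = "\n".join(docs)
    return (
        "Using only the facts below, answer the question.\n\n"
        f"Facts:\n{context}\n\n"
        f"Question: {query}\n"
    )
```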
And I think a really good call-out here is what we're not doing. We're not going immediately to the step of fine tuning, we're not immediately going to creating our own foundation model or doing continuation training - all things which you absolutely can do. But the first things that you should look at are prompting, prompt engineering, and this retrieval augmented generation, which is not as complicated as I think most people make out.
Yeah, exactly, because you might have heard of fine tuning, but that requires a lot of compute power, you need data, it takes time and it's costly. When you do retrieval augmented generation, you only need to update the data - the source of the data that you send to your LLM - and this way your data stays fresh. So your foundation model has been trained up until 2021, but then in the data in the vector database or the API, you give it more information that is up to date, and this is how it's able to generate a more accurate answer, too.
So, the mental model we have: we know that those concepts can be complicated and we like to use metaphors to help you wrap your head around what RAG is. An analogy that we found with Mike - don't blame me, I know - is the wizard analogy. Basically, the foundation model here is the wizard student.
So at the beginning, you have a wizard student; it doesn't know exactly how to cast spells, it doesn't know much about magic. And what you can do is send the wizard to school, so they get pre-training. This is where they learn about the world of magic, general knowledge about everything that is magical, but they're still unable to cast specific spells.
Right after school, if you want to specialize in, let's say, a domain of magic - there's a troll in the dungeon, yeah, exactly - you would not, for example, be able to deal with the troll. What you would need at that moment is a book of spells with you. And that book of spells is the data you need to be able to deal with the troll, or answer a question, or specialize in a certain area of magic. Like we said before, maybe you want to specialize in transformation - you want to transform into an animal - or you want to specialize in one of the other areas of magic.
So that's basically what it is: the student is your LLM and the book of spells is your data, your vector database. And, oh yeah, that's the fine tuning part. So, what might happen if the wizard goes and looks at the spells and doesn't understand the terminology of those new spells? Then you can send the student back to school to refine the training, to refine its general knowledge. And that's basically how you can customize your LLM.
And now, going back to a little bit more code, because we know it's a 300-level session, we're going to show you how this works. The wizards are coming back, by the way, so if you want to take pictures and say "I was at a 300-level talk and they were showing me pictures of wizards", you'll have your opportunity to do that, alongside the actual architecture.
So, on to this demonstration. What I've got here is some code. When we've been talking with people and exploring retrieval augmented generation, there has been some confusion - to be completely honest with you - about the relationship between the vector database and vectorized information on the one hand, and the actual large language model itself on the other.
And in this demonstration, I'm going to show you some code. It's not necessarily the prettiest code in the world, and I'm not suggesting you use this in production either, by the way. But I'm going to step through an example of retrieval augmented generation so we can actually see it, because very often you'll be implementing this with a library such as LangChain, or with products and services that do all the undifferentiated heavy lifting for you. This is a very hands-on, no-libraries - well, some libraries, but no RAG frameworks - piece of work, so we can see exactly how it's working and how the large language model and the vector database are separate.
Yeah, just to make sure: don't do this in production. This is just to explain the concepts to you; you then have tools to, you know, manage everything for you. So this is a very hands-on example of it - but don't do this. They said don't do this. OK. So it's an example, and I'm wanting to illustrate the point in Python.
So at the beginning of this notebook - I've got a notebook of code - I'm importing some libraries, and we'll obviously step through and see where these libraries are being used. The key thing here is that we're using FAISS as the open-source in-memory vector database from Facebook - or Meta. That's just being used for convenience's sake; it's used a lot in examples for this type of thing.
And then we're using boto3, the SDK for AWS, and Jinja2 to do some templating. So the first thing that I'm going to do is create myself a Bedrock client, because I'm actually using Amazon Bedrock behind the scenes to do this. I'm going to be using a couple of models from Amazon Bedrock: the embeddings model and then one of the large language models as well. But you could use any embeddings model, any large language model, and experiment around in a similar kind of way. Getting myself this client means that I can move forward.
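For reference, that setup amounts to roughly this. The region is just an example, and `faiss-cpu` is the usual pip package name for FAISS; the rest is standard boto3 usage.

```python
import json

import boto3                   # AWS SDK for Python
import faiss                   # open-source in-memory vector index from Meta (pip install faiss-cpu)
import numpy as np
from jinja2 import Template    # templating, used later to assemble the prompt

# One client talks to the Bedrock runtime API, which is what we call to invoke
# models - the embeddings model and the text-generation model alike.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
```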
So the next thing I'm going to do is load up my documents. Last night I quickly thought, do you know what, I feel like we should make this about spells. So we've got a whole bunch of spells here - there's a special secret one in there somewhere; we'll find it in a moment. These are basically just text strings, and they represent documents, essentially - the documents that you might have in your document store. Small and simple, something that we can just run here today. Remind me as I scroll through this to run these cells, because otherwise we'll have a bad day. OK, I've run everything so far. That's cool.
So the next thing we're going to do is vectorize all of that information. What that does for us is take all of those text documents - spells in this case - and convert them into a vector space. We use an embeddings model to do that, and it basically takes the text string and converts it into an embedding of a thousand-and-something dimensions. And it would do that whatever the input: if it said "hello world", it would be that big; if it was an entire 10-page article, it would also be that big. So there's this nuance and careful crafting, when you're doing this in production, about how large a text string you want to put into each vector. But for now we're going to put the entire spell in.
And so this is just a simple function that I've got which is going to help me do that. It's going to take the text - the spell, in this case - and run it through the Titan Embeddings model. These are just some keyword arguments here which help me call the model from Amazon Bedrock. Amazon Bedrock has a fairly standard interface, an SDK-level interface, and so this construct here will look pretty similar each time we're calling a model - whether we're generating an image with Stable Diffusion, with my dog and a hat and whatever, whether we're doing text generation, or whether we're doing this embedding.
So we take the bedrock-runtime client and call invoke_model; this line is exactly the same for any of the generations or any of the embeddings that we're doing, just passing in those keyword arguments. So we've got our function set up, ready to go, and it's essentially going to return the embeddings for us, just extracting them out of the response.
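A sketch of that embedding helper. The model ID and the request/response field names here reflect my reading of the Titan Embeddings interface, so treat them as assumptions to verify against the Bedrock documentation.

```python
def embed_text(text: str) -> list[float]:
    # Request/response shape for the Titan Embeddings model; field names are my
    # reading of the model's interface - verify against the Bedrock docs.
    kwargs = {
        "modelId": "amazon.titan-embed-text-v1",
        "contentType": "application/json",
        "accept": "application/json",
        "body": json.dumps({"inputText": text}),
    }
    response = bedrock_runtime.invoke_model(**kwargs)   # the same call works for any Bedrock model
    body = json.loads(response["body"].read())
    return body["embedding"]                            # fixed-length vector, whatever the input length
```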
And so with this section here - which is very long, lots of blank lines - we can now go and do that for all of our spells. All I'm basically doing here is creating a NumPy array, grabbing all of the embeddings that get generated and putting them into that NumPy array. So let's run that. And any suggestions for code improvements, please feel free to let me know. Not now - don't shout out.
So, that took a little while to run - you've probably noticed that - and that's because it was running through each of those spells, sending it off to the model, getting the embeddings and bringing them back. So creating embeddings is not going to be instant, but it's going to be pretty fast, and it's going to be much faster than trying to fine tune a model or do continuation training on a model, especially with that limited size of data set.
So we've got that done now, and I should have my spell embeddings all set up. Let's risk everything by writing more code and literally just print out the spell embeddings. So, yeah, you can see: numeric data, the embedding space for those different spells. It's a one-way process - we can convert text into embeddings, but we can't bring it back again - so we're using this, or we will use this, to find the data in our database.
The original documents essentially need to be stored somewhere, which they are, of course, further up in the notebook: we have the original list. We still need to have that, because we're going to use this as an index and we're going to get our data back from the original list. So we've got our embeddings; what I need to do now is go and put them inside the actual vector database itself.
By doing that, I then have the capability to perform queries on it. So this is a simple line: we're going to create a magic bookshelf. I haven't shown you all of this code, have I? This is called the magic bookshelf, and this is the index - essentially, you would call this "index" if you were writing proper code.
With that index - now we've got that, and I can just run this. Oh, sorry, that's creating the index; now we're going to populate the index with all of our data and just output the length of it. So we can see we've got 21 spells - documents, pieces of text, vectors - inside of our vector database. All right.
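Continuing the sketch: building the NumPy array of embeddings and the FAISS "magic bookshelf" index looks something like this. The spell strings are placeholders standing in for the real list in the notebook, and `embed_text` is the helper sketched above.

```python
# Embed every spell and stack the vectors into one float32 NumPy array.
spells = [
    "To become a fish, puff out your cheeks and say bloop bloop bloop, I'm a fish.",  # placeholder documents
    "To light a candle from afar, snap your fingers twice and whisper 'lumen'.",
    # ... the rest of the 21 spells ...
]
spell_embeddings = np.array([embed_text(s) for s in spells], dtype="float32")

# The "magic bookshelf": a flat index using L2 (Euclidean) distance.
dimension = spell_embeddings.shape[1]          # a thousand-and-something, set by the embeddings model
magic_bookshelf = faiss.IndexFlatL2(dimension)
magic_bookshelf.add(spell_embeddings)          # row position here maps back to position in `spells`
print(magic_bookshelf.ntotal)                  # number of vectors stored
```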
So up until this point, we've done nothing with large language models. We've only dealt with embeddings models, and we're going to carry on like that just for a moment. So we want to ask a magical question, and I'm wanting to know this: how can I become a fish? Seemed like a good idea at the time. So "how can I become a fish?" - that's the question I'm going to ask my spells, and I know that the answer is in there. Obviously, in a larger system you'd be able to ask more nuanced and interesting questions than that, but: how can I become a fish?
The first thing that we need to do with that question is turn the question itself into embeddings, because what we want to do is get the database to do a similarity search: take all of the vectors it's got, which are the representations of the spells, and find the ones which are closest - in Euclidean distance - to the vector of our question.
So I'm gonna embed my question, and just for the sake of being able to see what that looks like, let's take a quick look. So there it is, there's my embedded question - very long, a thousand-and-something numbers in size. How big that vector is depends on the embeddings model that's being used.
So I've got that now, and now I can go and query my index. I'm going to say k of four, so I'm looking for the four nearest facts - spells, documents, whatever they are - the four nearest snippets that are closest to the question I have. So I do that here: I have k of four, I have my embedded query in there, and I'm searching the index that I have, so I can just press run on that.
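The cell being run is essentially this, carrying on from the index above:

```python
question = "How can I become a fish?"
query_vector = np.array([embed_text(question)], dtype="float32")  # shape (1, dimension)

k = 4                                                # the four nearest spells
distances, indices = magic_bookshelf.search(query_vector, k)
print(indices)    # index positions into the original `spells` list
print(distances)  # L2 distances: smaller means more similar
```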
"And it's pretty quick. It's in memory. It's obviously also very small. This isn't enterprise size and I get a couple of things back. So I get this first list of integer values. So these are the index positions of the facts of spells inside of our vector database, which most correlate to this.
Now, clearly, I'm setting myself up for success: I am asking something specific about a specific spell we've got, so I am expecting the thing at index position 11 to be correct. And I'm printing out the distances here as well - the distances between the vectors. So inside the vector database we've got a big cloud of vectors, which are our spells, and we've got our query sitting in the middle of it: what are the distances to the four closest? That is what those values are.
So we could do something more sophisticated with those if we wanted to. OK, so that's basically it - obviously there's more, but that's basically it. We've now performed a query of our vector database: we embedded all of our documents, we've performed a query, and we've got some answers back, the most likely answers clustered there. No large language models have been used at this point at all. We've used an embeddings model, but that's all.
Where does the large language model come in? Well, it only comes in if you want it to in your application - and we do want it to, so let's carry on. We're now going to construct a prompt, and we're going to prompt our large language model to help us answer the question. I'm using Jinja2 here as a templating language to put together a prompt which contains our information - a really crucial pattern here - as we run this.
So let's run this first of all. We've now loaded the string - that's basically it - and I'm saying, you know, given the spells provided in the spells tags, find the answer to the question written in the question tags. There are lots of different kinds of prompt engineering best practices; you can read a lot from the different model providers about what works really well. This is sort of taking a play from Anthropic and the way that you work with Claude, but we're actually going to end up using the Amazon Titan model.
And you can see in here, with the Jinja2 templating, we're going to put all the spells in here, and the question will be put in here. Let's go and fill that out. I'm literally just using Jinja2 - if you're not familiar with it, it's essentially a templating library that you can use quite easily in Python, and it's basically just going to smush the data together with our template.
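Assembling the prompt amounts to something like this; the exact prompt wording on screen differs, but the shape is the same, and only retrieved text goes in - no embeddings by this point.

```python
prompt_template = Template(
    "Given the spells provided in the <spells> tags, answer the question "
    "in the <question> tags.\n"
    "<spells>\n"
    "{% for spell in spells %}{{ spell }}\n{% endfor %}"
    "</spells>\n"
    "<question>{{ question }}</question>\n"
)

# Pull the matching documents back out of the original list by index position,
# then render them into the template as plain text.
retrieved = [spells[i] for i in indices[0]]
prompt = prompt_template.render(spells=retrieved, question=question)
print(prompt)
```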
So now let's go and print out the actual prompt so we can see what it is. I think I might need to just put that into there; let's run that. And so this is now my prompt - the text prompt that I'm going to send to my large language model. Key concept here: the spells, the facts, the things, have been entered into this as text. We are no longer talking about embeddings.
So the embeddings model that was used to vectorize our data was only there to help us do that similarity search, and then we took the text that we got back from that similarity search and put it, as text, into this. The large language model has its own embeddings. Well, just to make sure that this is clear: before having RAG, we would just send the question, like "how can I transform into a fish?", to the LLM.
Now, everything that Mike has just shown you is how he retrieved relevant data from the vector database to augment this prompt. And this is basically prompt engineering: he is putting more information into the query to the LLM to make the LLM able to answer the question accurately.
Absolutely. It's, again, that slide that you looked at before. And so, yeah, now we're going to go ahead and put this into a large language model. The large language model also has embeddings built into it, inside the multi-headed self-attention and all that kind of stuff, completely separate from the embeddings model that's used to get the data into the database.
So we have a prompt. It's just a text string - that's the secret of prompt engineering, it's just text strings - and we're going to now send that in to the model. So this is now something more specific to Amazon Bedrock: I've got my keyword arguments that I'm going to set up again here, and in a similar way to when I called the embeddings model, this is now how I call this particular model.
So this is Titan Text Express v1 - a model that I think became generally available yesterday - so it's now available in Amazon Bedrock. You can enable the model access; if you're not sure how to do that, come see me afterwards and I can help you. And we've got all our keyword arguments that we can send in, with max tokens and all that kind of stuff.
Interestingly, here I've turned the temperature down to zero. If you're familiar with temperature, it's about how creative you want the answer to be, and in this case, because we're dealing with specific data which is very likely to actually be in the prompt, we don't need it to be very creative. We want it to be quite factual; we want to avoid hallucination if we can. And I'm not going to tell you that this architecture can eliminate hallucination, but it will definitely help move away from it.
So, I think I ran that already, but let's run it. We've got our keyword arguments there, and I can just run it through this section of code here. It's the sort of boilerplate streaming-body response handling you'll recognize if you're familiar with using the boto3 SDK. We basically call invoke_model again - exactly the same line as before, if you remember - we get back our body response and we load the generation out of that. Press play. Pray to the demo gods.
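That invocation cell is roughly the following; as before, the model ID and the request/response field names are my reading of the Titan Text interface, so check them against the docs.

```python
kwargs = {
    "modelId": "amazon.titan-text-express-v1",
    "contentType": "application/json",
    "accept": "application/json",
    "body": json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "maxTokenCount": 512,
            "temperature": 0,      # low creativity: stick to the facts already in the prompt
        },
    }),
}

response = bedrock_runtime.invoke_model(**kwargs)   # exactly the same call as for the embeddings
body = json.loads(response["body"].read())          # boto3 hands back a streaming body
print(body["results"][0]["outputText"])             # the model's completion
```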
Yes, I'm running this live. They said you should just record a video of doing it, just in case it doesn't work, but I have faith - and you obviously have faith too. So, to become a fish - are you ready for this? - to become a fish, you need to puff out your cheeks and say "bloop, bloop, bloop, I'm a fish". I was hoping you'd do that.
I have another question which I'm gonna ask as well. So here's the idea - oh, you saw the special spell that we had in the middle of here? Let's go and grab this one: how can I get started with any AWS service quickly? So if I just scroll back up to this - where is it set? Here is our "ask a magical question" - asking about the magic of AWS.
So I'm gonna set my question, and I'm just going to run these cells again, shift-enter all the way down. Here is my new set of potential answers, with the thing at index position number 10 looking like it's likely. Let's redo our prompt - and maybe we'll skip past that, otherwise you'll see the answer. It's a very simple demo, as I said, but I really just wanna highlight how all this works.
And if I wanted to get started with AWS services quickly, I open up the console and use Amazon Q. Have you had a go with Amazon Q? Anyone? OK, a little bit - have a go with Amazon Q. Adam talked about it in the keynote yesterday. OK, so thanks for watching that.
This was basically to highlight, hopefully, how this works underneath the surface - how retrieval augmented generation works - and I think that's really important. There are different ways that you can do this very easily, but that's what's happening under the surface, and I think having that intuitive mental model will really help you to debug the applications that you're writing, even if you write them in other ways. Seeing all that helps you understand what's happening.
But now I'm going to talk about agents. OK. So now, with agents, we finally get the technical architecture complete. Agents can basically do actions for you: if you give APIs to agents, they will be able to perform tasks for you. So it's not just that you have your foundation model and you ask it a question - now, suddenly, with agents, you're giving your foundation models arms and hands.
So basically they can, for example - if you ask them for an item to buy on a retail site, whatever retail company you can think of - you can ask, search for this item for me on this website, and then you can basically say, buy it for me, and the agent will be able to do the action for you, whatever action you want it to do. You said a good example earlier: retrieve the time. Yes, a super simple one, but very useful.
If you want to be able to retrieve the time, then an agent would be the way to do it, because you would give the API to the agent and the agent would do the action for you. Ask an LLM what the time is - some of them will tell you what the time is, but of course they don't actually know. And maybe the model needs the time to do other stuff, other tasks, after that.
So this is a way to make your LLMs more capable, to enhance their capabilities. So, basically, going back to a more wizard-y - yeah, a more serious architecture: the wizard is the foundation model, and in your example, in the code, it was Titan. The vector database replaces the book of spells - in your example it was the list of spells that you vectorized and put in a vector database.
And then we have the agents - and, oh yeah, the pre-training and fine tuning; that could be another model. But basically, how those pieces interact together is that you send a query to an agent, and the agent will interrogate the foundation model and ask, basically, do you need more information to answer that question? If the foundation model doesn't have enough information, it will say: yes, retrieve some information for me.
The agent will then retrieve data from the vector database and then, as we showed with the code, enhance the prompt with the query and the necessary data to answer the question.
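A deliberately simplified sketch of that orchestration loop, just to pin down the flow. Every call here (`needs_more_information`, `retrieve`, `answer`, the prompt helper) is a hypothetical placeholder; real agent frameworks, including Bedrock agents, implement this plumbing for you.

```python
def handle_query(query: str, llm, vector_db) -> str:
    # 1. Ask the foundation model whether it can answer from what it already knows.
    if llm.needs_more_information(query):                       # hypothetical call
        # 2. If not, retrieve relevant documents (or call an API, look up the time, ...).
        documents = vector_db.retrieve(query)                   # hypothetical call
        # 3. Augment the prompt with the retrieved data, exactly as in the RAG demo.
        prompt = build_augmented_prompt_from(query, documents)  # hypothetical helper
    else:
        prompt = query
    # 4. Let the model produce the final answer (or the next action to execute).
    return llm.answer(prompt)                                   # hypothetical call
```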
And now we can finally talk about Amazon Bedrock, because before that it was all theory - we were talking about general architecture. What Amazon Bedrock does for you is that it is a fully managed service that creates all of this for you behind the scenes. You don't need to vectorize your data, you don't need to create a vector database, you don't need to create agents: everything is taken care of for you. So, basically, here's how it shows up in the console: you go to Amazon Bedrock, then you can find your foundation model.
So here you can choose between Jurassic, Titan, Claude, Command, Llama and Stable Diffusion. If you want to know exactly what each foundation model does, you have a description of what they are good at. Basically, if your use case is multilingual - you want to speak with different languages and answer in different languages - then you would pick the model described as multilingual. If you just want to do text generation, then probably Claude is the best option for you. If you want to generate images, then Stable Diffusion is your pick.
Then, depending on your use case, this is how you would choose your foundation model. Next we have knowledge bases: this is where you put your data. What it is in practice is whatever data you have - it could be HTML pages, it could be text, it could be JSON, whatever you have that you want to use to enhance the knowledge of your LLM. You put it in an S3 bucket, you give it the URL of the S3 bucket here, and behind the scenes Bedrock is going to do the embeddings and put everything in a vector database for you.
And then you add the agents. If you want to give more capability to your LLMs to perform other tasks, then here are all the APIs that you can give it, so your agents can perform the tasks for you. Awesome stuff.
So now I'll put my pocket protector in and my tie on and we'll talk about security, audit and compliance. And of course, that was something that was super important to you when we talked to you earlier on and did that poll.
And so I'm gonna jump into another architecture diagram at this point, and it sort of covers what we did before, but there are no wizards this time, sorry. On the left-hand side we have the apps that we're building - and I'll pick up on this point as well: we just talked about Amazon Bedrock and how it can do all of those things, if you've already been experimenting around prior to the release of those services in the keynote yesterday.
We also have integration with things like LangChain. So if you've been using the open-source LangChain project - which is an amazing project, very, very active, busy, a little bit complicated - you can build those applications, and there is an Amazon Bedrock LLM integration that you can use with LangChain as well.
So let's imagine for a moment that that's the kind of thing we're doing here. We've got an app - our apps are applications running code - which is interacting with a large language model, and also interacting with data sources. A very typical kind of thing that you might do with LangChain.
And if we look at a typical flow - thinking with our security hats on now about where connections are being made and where data, specifically, is flowing: where is the data? Someone asks our app a question that relates to data that sits in the data sources that we have.
And so our app will take that question - likely, if this is how it's been architected, and this is common - form a prompt around it, do some prompt engineering with a template like we've seen, and send it to the large language model. Basically: I have some natural language, I'm a Python application, I don't know what to do with that; large language model, please help me out, tell me what's happening.
And the large language model, depending on how it's been prompted, can respond back - and it might respond back, say, with a SQL query. I really want to make sure we understand this: RAG doesn't have to be a vector database, it's just what everybody likes to talk about. You can do RAG with a SQL database, with a CSV file, a text file, whatever. So it could return: this is how to get the data that will answer that question, this is how to work with your SQL database.
And then the application will say: excellent, thank you, I know what to do with a SQL query. It runs that over the data source and gets some information back - a number, or a table of data, or whatever it is. It could then send that data back to the large language model so that the large language model can process it in context and say, OK, based on the question that was asked, you've now run that query, thank you, now we've got this data - and send a natural language response back to the application so that the application can do what it needs to do next. Maybe that's responding to a user inside of a chat session, whatever you might be wanting to do.
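Written out by hand, that round trip looks roughly like this. The `describe_schema` and `ask_llm` helpers are hypothetical stand-ins for whatever framework or SDK calls you actually use (LangChain's SQL chain does essentially this for you); sqlite3 here is just the simplest local database to illustrate with.

```python
import sqlite3

def answer_with_sql(question: str, db_path: str) -> str:
    conn = sqlite3.connect(db_path)

    # 1. App -> LLM: "here is the schema, here is the question, write me a SQL query".
    schema = describe_schema(conn)                 # hypothetical helper: table definitions + sample rows
    sql = ask_llm(                                 # hypothetical LLM call (e.g. via Bedrock)
        f"Schema:\n{schema}\n\nWrite one SQL query that answers: {question}"
    )

    # 2. App -> database: run the query the model wrote. Note what is flowing where:
    #    the schema and sample rows went to the model, and the query results will too.
    rows = conn.execute(sql).fetchall()

    # 3. App -> LLM again: turn the raw rows back into a natural-language answer.
    return ask_llm(
        f"Question: {question}\nQuery results: {rows}\nAnswer in plain language."
    )
```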
So that's a pretty typical flow, especially if you're using something like LangChain; a lot of this will happen behind the scenes if you're using something like Amazon Bedrock with agents. What's really important to consider is what's happening in the middle: all of this data flowing to and fro. I sort of talked about it at a high level just then, but I want you to consider what data is included, from what data sources, inside these transactions which are happening with the large language model.
And in actual fact - and again, I mention LangChain, and I do like LangChain, I think it's really cool - it's a little bit difficult sometimes to see what's happening in the back end. So if you do peer into the logs as you're working with this - and I'm specifically talking about an experimental SQL chain that I've been working with - you can see what it does. When it sets up that initial prompt, the very first one that sends a query over to the large language model, it will say: you're a large language model, you don't know anything about the answer to this question, but you can get the answer from this database I have access to. And it does say this in natural language - it's freaky how we program these days.
But it also says: I have access to this database, this is what this database looks like, here is the table - the show-table construction, depending on what kind of database it is - and here are the first three or four or ten rows of data from that database, all of it. So the large language model has the context and can understand how to perform the query, so it can create a syntactically correct SQL query. It's the only way it can do it.
So let's all just be clear about what information is traveling from the application to the large language model: sensitive data, potentially customer records, not necessarily filtered. And that's fine, as long as we know that's happening and as long as we know we're keeping it secure. So the security perimeter that we have around a large language model is important: where it is, how it's operated, and whether it's in a trusted zone within our architectural makeup.
And the idea here is that we really want to wrap all of this - not just the application and the data sources, like we've always done forever - in security, audit and compliance; the large language model is absolutely part of this too. It's not just providing interesting chat responses: it actually sees - and the system that it runs on sees - sometimes, that sensitive data. Which is fine, as long as we know that and we're putting the appropriate security controls around it.
So in response to that, I mean, you need to look at the security of the architectures you're putting together. If you're hosting your own model, then you can look at which server that's on. If you're connecting with services, you need to know where those services are and make sure you've got the necessary compliance in place that meets your requirements.
Inside of Amazon Bedrock, there are capabilities to help us with this. Part of this is the logging and auditing of the models: you can turn on logging for all of the invocations of, and responses from, these foundation models. We can turn on S3 logging, CloudWatch logging, or both.
And so we can end up with all of that data saved, which is maybe something that's really useful for compliance. And I've got to tell you, from a debug perspective it's amazing. If you're working with LangChain and you're a little bit frustrated by not being able to see exactly what's going on, use it with Amazon Bedrock, hook this up, and you'll see everything which is going on - all of the prompts and the templates and the combinations of things get put into there. So that's super useful, and it combines with your existing security posture and your existing logging solutions.
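If you'd rather script that than click it in the console, something along these lines should work with the Bedrock control-plane client. The operation and field names are my best reading of the API, so verify them against the boto3 documentation; the bucket, prefix, log group and role ARN are placeholders.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")   # control plane, not bedrock-runtime

# Operation and field names as I understand the API - check against the boto3 docs.
bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "s3Config": {
            "bucketName": "my-bedrock-invocation-logs",        # placeholder bucket
            "keyPrefix": "bedrock/",
        },
        "cloudWatchConfig": {
            "logGroupName": "/bedrock/invocations",            # placeholder log group
            "roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",  # placeholder role
        },
        "textDataDeliveryEnabled": True,                       # log prompts and completions
    },
)
```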
It either goes into S3 - you can take it out and put it somewhere else - or it goes into CloudWatch, and again you can take it out and put it somewhere else, or keep it there, or do whatever you want with it. So that's the logging and audit side. Then, from the perspective of privacy, and more to the point, if you're working with compliant workloads:
If you are working with stuff that is sensitive to you and you're highly sensitive about where it goes, or if you're working in a regulated industry where you can't send data over the public internet - even if you wanted to, even if you had all of the SSL layers and everything and you were pretty comfortable about it - maybe you still can't.
And so if that's the case, then we have this option for you as well. Let's just be clear about it: when you're connecting to AWS services like S3 and DynamoDB and all those things, the default position is that you're connecting to a public endpoint, right? It's still authenticated, it's still secure, you've still got encryption, and it's fine. But there are these architectures that you may be familiar with, where you have an application inside a private VPC which doesn't have any access to the internet; you are able to use these gateway endpoints so that you can get through to those services.
So with S3 - the easy one; it's free as well - you can have an S3 endpoint inside of your VPC so that your traffic goes directly to it. And you can do exactly the same thing, or a very similar thing, with Amazon Bedrock as well. It's called PrivateLink, and it's available for other services as well as Amazon Bedrock.
So you can have your application sat inside of your private VPC. You can have then no internet gateway there. So it's got no access to the internet because it's your regulated workspace and you can have a private link endpoint that allows you to go directly to the large language model inside of Amazon Bedrock.
And then that connection is not going over the public internet, and it opens up the possibility for you to host regulated workloads. And of course, IAM - Identity and Access Management - is there for you to define your security perimeters.
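For reference, creating that PrivateLink (interface) endpoint can be scripted roughly as below. The `create_vpc_endpoint` call is standard boto3; the Bedrock runtime service name follows the usual naming convention as I understand it, so confirm it for your region, and the VPC, subnet and security group IDs are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Interface (PrivateLink) endpoint so traffic to the Bedrock runtime stays inside the VPC.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",                       # placeholder VPC
    ServiceName="com.amazonaws.us-east-1.bedrock-runtime",  # assumed service name - verify for your region
    SubnetIds=["subnet-0123456789abcdef0"],              # placeholder private subnet
    SecurityGroupIds=["sg-0123456789abcdef0"],           # placeholder security group
    PrivateDnsEnabled=True,                              # resolve the normal endpoint name privately
)
```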
So we basically have this architecture, with the enormous power of an enormous large language model in there, and you are completely in control of the security picture.
I spoke for a long time. So, a last poll for you: what do you want to explore next in the realm of generative AI? And then we're gonna talk about predictions, about what's coming up. Super interested to see what's gonna happen here. You can select more than one, so maybe you wanna select all of them, or none of them; we'll see what the results are.
And we're not gonna have really any time for questions - I'm very conscious of your time and some people will need to get to the next session - but I'm more than happy, and Tiffany, I'm sure I can speak on your behalf - yes, you can speak on my behalf - we will both be here afterwards, so we can take questions and talk to you in a moment.
How are we doing with those poll results? Oh, it's very balanced. Yeah, it's surprisingly balanced: building agents, using RAG with agents. I'm not surprised to see that building with agents is sort of the leading one, if I'm honest. I think that's probably one of the most exciting things that's going to be happening - giving capabilities to these models - over the next, particularly the next, year.
Maybe we can move to predictions. But if I press this button, nothing happens. Can you press a button? Thank you. All right. Where to focus? Where to focus? I think my wizarding skills maybe. No, let's go.
So yeah, as we said - and you answered that in the poll as well - customizing solutions with agents and RAG, obviously. But also, and it was your concern at the beginning, security; this is actually a big topic. So: focus on governance and security. I think so.
Yeah, I think we've experimented a lot with large language models and generative AI, especially this year, and I think that we should continue to do so, because we're still unlocking the capabilities and we're still figuring out how it works for us.
But it's now kind of time to switch to production mode, and I think a lot of what we're talking about here fits into that space. Models will get more sophisticated and add modalities, for sure. I mean, we've seen the exponential curve of LLMs becoming more capable of answering questions, and for sure we're gonna see breakthroughs in the next few years, even months.
Generative AI will unlock projects that were not previously feasible. Yes - I mean, we were talking about this yesterday. We were. And I think this is something where I took a little bit of a lead from someone else; I'm not going to pretend this is necessarily my own thought.
Andrew Ng, who is a very famous luminary in the machine learning world, has quite an interesting talk on YouTube where he talks about what's currently happening in the space of AI and generative AI. And I think one of the messages from that is about the capabilities of these models - we talked about it before, how they're capable of doing so many things; there is not just one thing that they're trained to do.
So we are now in a position where we can start to look around our enterprise, social enterprise, whatever projects we're working on, and find those data sets that actually have value in them but where it wasn't feasible to get that value out before. Now we have these pre-trained models - someone has done a lot of heavy lifting for us to make them - and I think we're now able to do things which weren't feasible before: economically feasible, and, technically, it's much easier now.
Absolutely - to do those things. All right, that's a wrap. Thank you all very much.