Building applications with generative AI and live data

Hello, everybody. How's it going? Oh, hi, surprise. Welcome to Building Applications with Generative AI and Live Data. My name is Memo Doring. Hello. And my name's Mike Chambers, Developer Advocate at AWS. I think we had an alternative name for this as well. Yes, it was "it seemed like a good idea at the time." Maybe you'll see why. And then we built it, and it was maybe less of a good idea. But as you can see, we've got a little demo up here for you all.

But if you go to the Expo, or if you've been at the Expo, you could have played with this as well. We wanted to go into a little more detail, because it's a bit of a black box. That's the inspiration behind this: something that was actually running and that people could play with, so we could go a couple of steps beyond a proof of concept.

So. Oh, straight into the demo. Absolutely. OK. So we have some code to show you running, and I have some buttons to press. If I just press this button here, hopefully you'll see my screen. I've got VS Code, and I'm just going to press run on this. We're not stepping through the code; I don't want to do that to you.

So if I press play here, hopefully everything will work for us. If you were in earlier, you might have seen some bits and pieces working. Who was in earlier and saw some bits and pieces working? Are they holding up their hands? Memo, you were here, right? And it was OK? All right.

Can you talk for just a second? Because for some reason that's not working, I'll tell you a little bit about the demo and what we're trying to do. What we have here is just a little armature with a regular webcam and a regular wooden toy. What we wanted to do is take live data in the form of images and use generative AI to generate an output.

We also wanted to do what's usually referred to as chaining: combining multiple large language models, or foundation models, to produce a more complex output. The way it works is that we take a picture of a character. If you go down to the Expo, you'll find multiple characters. We only stole one for our session, so we only have the blue elephant, but you could pick different ones.

And you're good? Yeah, well, I hope so. So, what we're looking at here is the app, the one I was just trying to press play on. And like Memo was saying, we've got a couple of things: we've got this camera hooked up, and we've also got a big button, because we're trying to make an interactive demo, something hands-on that can excite people without having to bring them to a keyboard to type things in.

So with this setup here, if I just press the button... it works, of course it works. And now we can see a camera feed of the elephant sat underneath here. So what happens? How far did you get through the explanation of what happens?

I'll talk about the experience of what happens here. When I press the button, a picture of the character, in this case the blue elephant, gets taken. It gets looked at by Amazon Rekognition and then speeds through. What you're seeing here is happening in real time, and it is now inside Amazon Bedrock: the idea that we're looking at an elephant and that we're at re:Invent 2023.

And so Amazon Bedrock, using the Claude v2 model, is in the back end currently constructing a story, hopefully about the blue elephant, or not, as the case may be. We have an issue here where we've trained a model specifically for the show floor, because the lighting on the show floor is a particular way.

And so it's definitely not a blue elephant, all the good stuff. You might recognize that trademark character up there. Not a blue elephant. I can talk about why that's happened: basically, this lighting setup and what we have here is not working with our custom-trained Rekognition model.

And so, unfortunately, it didn't detect the blue elephant. What's happened is that because we haven't given Claude a character, it's hallucinated a character for us instead. These are all the kinds of challenges that we've had to face, and hence "it felt like a good idea at the time."

And so it constructs a story for us. And you're going to try doing it again? Why not. So part of the idea was that we wanted a single button press to be the interaction, something customers could play with: no typing, no prior knowledge required. And then for people who were more interested and wanted to know more about it, we would have this session.

But if you were just asking, "what is this LLM thing? Show me a practical application of it," we could do that. On top of the data coming from the camera, we're also taking a feed of headlines from re:Invent, so everything should be re:Invent or AWS themed.

So you'll see here, this one, it's called it an AWS mascot this time. I think you can scroll up there. So it didn't do a trademark character, thankfully. Now there's an AWS mascot that appears to be a blue iguana, so there you go.

Like Mike said, this is to be expected, because we have a model that was trained, and we'll go into detail in a little bit on how we did it, but it's trained for the show floor. This is completely different lighting, a different surface; everything is different. So it's having a harder time finding the blue elephant.

And for us, we look at this and we think "blue elephant," but a model would think "toy," "kid," et cetera. It's not a real blue elephant, right? If we had a real blue elephant, (a) it would be pretty fear-inducing, and (b) the model would behave better when we were trying to take a picture of it.

Yes. So let me put us back to slides. I'm feeling slides might be a bit more deterministic than the example. Yeah. Slide generation is definitely not deterministic.

We also have live polling for all of you. So if you're interested, you can pull out your phone and use that QR code to do the polling. As results come in, we'll get a signal from the back, they'll let us know, and we'll start discussing them a little bit. They're all multiple choice; no wrong answers.

We just want to gauge your opinion on a few different things so we can talk a little bit more about them. Absolutely. So take this: if everybody's got it, we can switch the screen. We've got the screen there, and I can see the answers.

Are you using generative AI in your workloads or workflow today? So about a third of you are using gen AI in your workloads today. I don't know why I'm looking up there when I've got it down here. And it's dropping, so it's even less than a third.

I think that's to be expected; it's a new technology. What we wanted to talk about is a little about the architecture and the technology that we used, in an AWS context, to build this. I don't think anyone is about to become a millionaire off stories made for a blue elephant made out of wood, but I've been wrong before. It wouldn't be the first time.

Yeah. The results from that poll are something I think I've seen throughout re:Invent as well. Obviously a big focus of re:Invent this year, if you haven't noticed, has been generative AI, and I think a lot of us have used the tools, played around with chatbots, and experienced the kinds of things that large language models can do.

And now we're in a mode of: how can we actually, practically apply this in an application stack, in a business context? So whilst we're talking about toys, there's a lot of that in the undercurrent of what we're talking about today. Mike and I are silly, so we wanted to do something funny, but we did want to go into detail on how we built this, and there's a method to the madness.

So you might think this little armature is a bit of overkill, but something we wanted to make sure of is that, with people walking through the Expo hall at all times, we didn't capture someone's face by mistake or anything like that. We didn't want to share or store any sensitive data.

So we mounted it in a way where you'd have to pick it up and point it at your face, and the one that's on the Expo floor actually has a little board at the bottom, so even if you tried to point it at your face, you couldn't.

But there were also architectural decisions that we made based on that. My original idea was: we'll take a picture, we'll put it in an S3 bucket, that'll trigger an event, and off we go to the races. But that means I'm storing images, and if someone somehow manages to throw their driver's license in there, now I have sensitive data.

So Mike came up with the brilliant idea of having a little lightweight client. What the client does is make an API call to Rekognition, through API Gateway into Rekognition, but we pass the image as a byte-encoded string. So we never store the image, and we don't log that byte-encoded string anywhere in the process.

So it's not in CloudTrail, it's not in CloudWatch Logs or anything like that; the image doesn't really exist anywhere in the system. Then we use Rekognition, and we get a JSON object back saying "blue elephant," when it works properly.
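
As a rough illustration of that client, here's a minimal sketch under a few assumptions: the API Gateway URL and the payload field names are placeholders rather than the demo's actual code, and the endpoint is assumed to pass the decoded bytes straight through to Rekognition without persisting them.

```python
import base64
import json
import urllib.request

# Hypothetical API Gateway endpoint that fronts the Rekognition call.
API_ENDPOINT = "https://example.execute-api.us-east-1.amazonaws.com/prod/detect"

def send_frame(image_path: str) -> dict:
    """Read a captured frame, base64-encode it, and POST it; nothing is written to S3."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    payload = json.dumps({"image": image_b64}).encode("utf-8")  # field name is an assumption
    request = urllib.request.Request(
        API_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Expecting something like {"label": "BlueElephant", "confidence": 97.4} back.
        return json.loads(response.read())

if __name__ == "__main__":
    print(send_frame("frame.jpg"))
```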

And then we orchestrate everything using Step Functions. So instead of using a framework or anything like that, we're doing everything inside our AWS account, using AWS services to manage the whole thing. We're using Step Functions, which calls Lambda functions to invoke Amazon Bedrock, and that's where the multiple LLMs come in.

As Mike mentioned, all the text generation, the story generation and the descriptions for the images, is being handled by Claude v2, and all the image generation is being handled by Stable Diffusion SDXL 0.8.
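
To make the chaining concrete, here's a minimal sketch (not the project's actual Lambda code) of how the two Bedrock models could be invoked with boto3; the prompt strings, parameter values, and the SDXL 0.8 model ID are illustrative and worth checking against the current Bedrock model list.

```python
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")  # Bedrock runtime client

def generate_story_text(prompt: str) -> str:
    """Call Claude v2 for the story text; parameter values are illustrative defaults."""
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",  # Claude v2 uses this conversational format
        "max_tokens_to_sample": 1000,
        "temperature": 1.0,
    })
    response = bedrock.invoke_model(
        modelId="anthropic.claude-v2", body=body, contentType="application/json"
    )
    return json.loads(response["body"].read())["completion"]

def generate_image(image_prompt: str) -> bytes:
    """Call Stable Diffusion SDXL for one story illustration; returns raw image bytes."""
    body = json.dumps({"text_prompts": [{"text": image_prompt}], "cfg_scale": 10, "steps": 30})
    response = bedrock.invoke_model(
        modelId="stability.stable-diffusion-xl-v0", body=body, contentType="application/json"
    )
    artifact = json.loads(response["body"].read())["artifacts"][0]
    return base64.b64decode(artifact["base64"])
```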

Once the images are generated, we put them in S3 buckets, and we have a Lambda function to help us stitch everything together. What you were seeing in the presentation was an actual web page being rendered with the images; that's why we could control it like slides, and it refreshes itself.

And we can present that on a different monitor. All of this is also controlled by the state machine, and using Lambda functions we can return it back. We couldn't figure out how to show it up here.

But we also have an output window where we can see all the debug stages. It says: OK, you're at this point, this is the ARN for the function that got called, we generated the text, et cetera. So we can see everything, and we have pretty good observability into the system as the request flows through it.

That's right. So this is the technical setup that we have on the show floor. If you want to go and take a look, you can play around with more than just the blue elephant; we've got all the other animals too. And as Memo was saying before, the model there is a Rekognition Custom Labels model that has been trained specifically on the show floor. I actually created it when... I live in Australia, and that's not particularly important, but I wasn't here. I created it at home and trained the model up, and it was working fine, until we got to the show floor.

We had the same problem there. It was actually really easy to retrain a custom model with Rekognition Custom Labels overnight and just bring it in and get the activation working. It's been working flawlessly all week, apart from in this room, so I'm sorry about that. And I think the way you train Custom Labels is also interesting, because it didn't work at first. We were on the show floor on Sunday night, before it opened, doing some testing, and it broke, and Mike said, "oh, I need to get more data to train a new model." And I was like, how hard is that? Are we going to be clicking through a bunch of pictures with our phones? And you came up with what I felt was a pretty innovative solution.

Using the same camera, we just took video and moved the animals around in the live space, and then Mike chopped up that video. The fact that it was a little grainy and sometimes out of focus was actually good for us, because customers are going to come up and they're not necessarily going to put the elephant dead smack in the middle. It might even be a little bit out of the frame; they might put a couple of things around it, we had a glass here, or something like that. Shadows were a thing as well, with people standing close to it. And I think for this room, probably the biggest change is the base: it's black instead of white, and the lights are coming from that direction, so the lighting is actually much better in here. We should have had the whole thing in here all week. So yeah, working with live data.

Yes, I think it's OK. So this is where we can talk a little bit more about the practicalities of working with large language models and live data. We'll pause the blue elephant there for a moment. What we're doing here, really, is using retrieval augmented generation, and that might be a stretch for some of you; I can see some smiles in the audience. But one of the real key things, and one of the main messages I wanted to bring to re:Invent this year, is that retrieval augmented generation is super, super powerful. It's super useful, and it's really at the forefront of where we're currently adopting large language models in a business context, outside of research. It helps us get more accurate results out of what is very much a non-deterministic thing, a large language model.

So the way that we work with large language models, as you might be familiar with, is that we prompt them. We write some text ("prompt" is just a fancy way of saying text) and we send that into the large language model. The large language model then takes that prompt and creates a response for us, a completion. It does that by running the prompt through the model, getting the next token, or next word, adding that on, and going through and through and through. All of this amazing technology we're working with is predicated on the idea of just generating the next word.

That's fantastic, but we've seen, and we acknowledge and work with, hallucinations. Sometimes, as we've seen today, when the model isn't given input, or when it's allowed to be too creative, it can hallucinate and come out with something that's not necessarily what we wanted. There are a number of different methods we can put in place to mitigate this. Earlier this year, or maybe late last year, we would have been looking at training our own models, doing fine-tuning, getting a massive dataset and pushing it all in. Then a lot of us took a step back and said, well, how about we just put the data inside the prompt? And this is in-context learning.

So we take data that wasn't necessarily available to the large language model at the point of training and put it in the prompt. Then the large language model can see it at the time of inference, at the time we're making the generation, and the data that goes in there becomes more prominent, more at the front of mind. I'm anthropomorphizing large language models, please bear with me. It's further forward in its context window, and the data is there. And you used a few technical terms there.

So let me just go over them. For those of us who aren't math wizards like Mike: something that's deterministic always gives you the same result for the same input. Any time you ask me if I want coffee, the answer is going to be yes, no matter the time of day; that's deterministic. If you ask me if I want pizza, that might change if I just ate or if I'm about to go to bed, et cetera; that's not deterministic.

Large language models in general are non-deterministic. You can make them closer to deterministic, or even deterministic depending on the model, by how you configure them. But that's something that's different when you're working with an LLM compared to a typical API. If I make a call to an API and ask how I get from latitude/longitude X, Y to latitude/longitude A, B, it will most likely always give me the same route. But if it starts taking something context-aware like traffic into account, it might give me different routes, and that's non-deterministic.

So it's giving me different outputs depending on the context, and with LLMs, with foundation models, that is very common. And then context is what we call the amount of data a foundation model can take in. That's also going to vary model by model, along with the types of data you can share. If you were at the keynote: with models from Anthropic, you can give it a large book as part of the context and it'll take it all into account along with your prompt.

Absolutely. So the amount of data that you can take from your business system, or from wherever, can actually be quite sizable. Often with prompts, especially when we start playing with large language models, we type in a sentence or two and ask, will it do this thing? But you can actually put in quite a lot of data. And on the point of non-determinism, we'll look at that a little bit in the demo as well. Someone once said to me, it feels like large language models have a random number generator in them. They do; that's what they have. They have this random number generator to help make them feel more creative, more like humans, and sometimes we need to turn that down and dial it back. In terms, then, of where you can get the data from...

Well, it's wherever you've got that data. With retrieval augmented generation, quite often you'll hear conversations, probably especially this week, from people who are really excited about this technology, including me, about how you can get it from vector databases. And you absolutely can. But a lot of us don't have a vector database currently, so it's important to remember that we can also get data from other systems. You can get it from a SQL database, you can get it from an API that you're connected to, goodness knows, anything. You can get it from a CSV file; you can get it from a piece of Python code that we put together for a demo like this. It's all similar kinds of stuff. If you can search a system, get some data back, and put it into your prompt, that's really what retrieval augmented generation is all about.
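
As a small sketch of that idea, here's retrieval augmented generation with nothing fancier than a CSV file as the retriever; the file name, column name, and prompt wording are assumptions for illustration, not the demo's code.

```python
import csv
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def retrieve_headlines(path: str = "reinvent_headlines.csv", limit: int = 5) -> list[str]:
    """The 'retrieval' step: any searchable source works; here it's just a CSV of headlines."""
    with open(path, newline="") as f:
        return [row["headline"] for row in csv.DictReader(f)][:limit]

def answer_with_context(question: str) -> str:
    """The 'augmented generation' step: stuff the retrieved text into the prompt."""
    context = "\n".join(f"- {h}" for h in retrieve_headlines())
    prompt = (
        "\n\nHuman: Use only the headlines below to answer the question.\n"
        f"Headlines:\n{context}\n\nQuestion: {question}\n\nAssistant:"
    )
    body = json.dumps({"prompt": prompt, "max_tokens_to_sample": 300, "temperature": 0.2})
    response = bedrock.invoke_model(
        modelId="anthropic.claude-v2", body=body, contentType="application/json"
    )
    return json.loads(response["body"].read())["completion"]
```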

Yeah, and you can even get it from a picture of a blue elephant, which is the route we chose to take. So this is a quick screenshot of the model working with Custom Labels. We knew what the toys we had were, so that's what we trained it on, and it was very successful at finding them within certain parameters, which were the show floor.

That's right. We optimized for where we believed most customers would interact with it, and we weren't too afraid to fail here in front of all of our 300 closest friends when we were talking about this. So this is Amazon Rekognition, spelt with a "k" if you're looking for it. It's an image recognition and object detection service.

Out of the box, there are pre-trained models that can make detections on a number of different things. You can take a standard image of a crowd of people, or a city street, or whatever, and if you look in the console, there are actually some demos in there, so it's very interactive, and it will recognize common objects.

So people and cars and trains and, I don't know, sunglasses and masks, that kind of thing. But if you want more specificity, or if you're working with something like this, a wooden toy elephant, that's where we had to train our own model. This is more traditional machine learning, not necessarily on the generative AI side, but if you abstract it all out, we just send an image up and we get a JSON object back that says there's an elephant here, or there's the AWS mascot, or whatever that monster was that you saw generated.
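For reference, the server-side call is roughly this shape; the project version ARN below is a placeholder, and passing raw image bytes is what lets the pipeline avoid storing the picture anywhere.

```python
import boto3

rekognition = boto3.client("rekognition")

# Placeholder ARN for a trained Rekognition Custom Labels model version.
MODEL_ARN = "arn:aws:rekognition:us-east-1:123456789012:project/toys/version/toys.2023/1234567890"

def detect_toy(image_bytes: bytes, min_confidence: float = 80.0) -> str | None:
    """Send the raw image bytes to the custom model and return the top label name, if any."""
    response = rekognition.detect_custom_labels(
        ProjectVersionArn=MODEL_ARN,
        Image={"Bytes": image_bytes},      # bytes go straight in; nothing is written to S3
        MinConfidence=min_confidence,
    )
    labels = response["CustomLabels"]
    return labels[0]["Name"] if labels else None  # e.g. "BlueElephant"
```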

Yeah. And Rekognition Custom Labels has a user interface to help you generate your dataset. So you upload images, and as Memo said, I took a video on the show floor where we moved the elephant in the room (we hadn't said that one before), we moved all the different characters around, and then we took screenshots from that video, and I was able to draw bounding boxes around them and create the dataset like you see there.

Yeah, cool. We have another question, one more poll: what live data could you unlock with generative AI? And I hope that you can now see this in a frame that's a little different, maybe, from some of the other presentations. You can connect to any data source you've got; as long as you can get some kind of textual output from it, you can use it with in-context learning, which becomes retrieval augmented generation. Your live data can now be used. And something that, the more we worked on this and the more I've done with gen AI, I think is important for all of us: there's a whole new vocabulary that comes with these technologies, but it doesn't mean they're necessarily harder than what we were doing before, or completely separate. There are ways to just fold them in.

So you'll hear things like zero-shot or one-shot, et cetera, but they're just strategies around prompting, for example, that are the state of the art today; we didn't even know about them maybe 12 months ago. The really exciting thing right now is that lots of us are very much in an experimental mode, so let's just continue to push forward on what data we can put in there.

So, customer databases... the biggest one we have here is analytics and reporting, which is super interesting. That's interesting because, of course, a lot of the people who are making analytics and reporting services are using these services to augment what they have, including things like QuickSight: putting your data into QuickSight and using gen AI as part of the product. But of course, we can also build these solutions ourselves. Awesome. And the wiki was a good one. All right, you're going to press this button, and I'll press this button.

So we thought we'd do a quick demonstration of some of the different components of what we're doing. Oh, the demos, not yet. Yep, it feels like we practiced this, right? A big component of what we did is around text, and a lot of what you'll see around foundation models is around text generation. So we wanted to split out the different pieces as we use them.

So we're using a large language model from our partner Anthropic, through a service called Amazon Bedrock. In case you're not familiar with Bedrock, what it allows you to do is make API calls into a large language model. You don't have to stand up your own infrastructure, and you don't have to go and negotiate a deal with a vendor for an LLM; you can use it through an API, within your AWS console.

We're using Claude v2, which is a text generation LLM. The people at Anthropic use something called constitutional AI: they've included big datasets around the alignment they want from the model as they're training it, so it returns results that are positive and ethical in how they're approached.

And 100,000 tokens is the context window. I think that's outdated now, because the newer versions of Claude, I think 2.1, have 200k tokens. As an oversimplification, you can think of a token as a word; a really long word like "architectural" might actually be three tokens, but if you just want to ballpark it, a token is roughly a word. So you can pass this model 100k tokens, your prompt included, and say, give me a response.

So it could be a whole sales report or something like that, and you can say, "Hey, give me the data from this report in a different format," or "extract a certain piece of data from it." And you're just working with text and prompts. What you send, and what actually gets generated, looks like this: just a regular JSON object. This is the request.

Yeah. So this is the prompt, and there are a few parameters there that you'll notice, and you'll hear them around the industry: temperature, top_k, top_p, and a stop sequence. Temperature is literally how spicy you want your answer. The higher the temperature, the spicier it gets, and by spicier I mean it's going to be less strict about what the most probable next token is.

So if the temperature is low and I say, "the brown fox jumps over," it'll most likely go "the lazy dog." But if I turn up the temperature, I could say "the brown fox jumps over" and it could go "the jet airplane." Yeah, the jet airplane, or the circus on the moon, right? It just gets very creative, and you can play around with those; it's not all or nothing.

Some models, when the temperature is very low, are actually deterministic, so the same prompt will give you the same response. That doesn't hold true for every model, but it's one way you can tweak them. Another way is top_k and top_p, which are both related: it's still about how heavily the sampling is weighted towards the most probable next tokens, just two different ways to slice the data. You can get a little more variance in there without moving the temperature at all, and you can play with all of these just like you would with any API, or like being in Photoshop and adjusting the gamma and the contrast of an image; you can use these the same way.
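
Just to make the intuition concrete, here's a tiny self-contained sketch, not anything from the demo code, of how temperature, top_k, and top_p shape next-token sampling over a made-up probability distribution.

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0,
                      top_k: int = 50, top_p: float = 1.0) -> str:
    """Toy next-token sampler: temperature rescales the distribution, top_k/top_p trim it."""
    # Temperature: divide logits before the softmax; low temperature sharpens, high flattens.
    scaled = {tok: logit / max(temperature, 1e-6) for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    total = sum(math.exp(v - peak) for v in scaled.values())
    ranked = sorted(((tok, math.exp(v - peak) / total) for tok, v in scaled.items()),
                    key=lambda kv: kv[1], reverse=True)

    ranked = ranked[:top_k]                     # top_k: keep only the k most probable tokens
    kept, cumulative = [], 0.0
    for tok, p in ranked:                       # top_p: smallest set covering p of the mass
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    tokens, weights = zip(*kept)
    return random.choices(tokens, weights=weights, k=1)[0]

# "The brown fox jumps over the ..." with made-up scores for the next word:
logits = {"lazy": 4.0, "quick": 2.0, "jet": 0.5, "circus": 0.1}
print(sample_next_token(logits, temperature=0.1))   # almost always "lazy"
print(sample_next_token(logits, temperature=2.0))   # much more likely to surprise you
```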

So this is the code that we've been showing today: an example of how I put together the prompt, and then an example of the response, truncated because I needed to make sure we could get it on the slide.

On the left-hand side we've got the prompt. Now, this prompt is written in the format used with Anthropic and the Claude models, which is a conversational, chat-style format. So we actually start the prompt off with "Human:", and then later on, further down (we can't see it because we couldn't fit it on), we've got "Assistant:", and it's at that point that we're asking the model to come back with the generation.

But the thing I wanted to show you here, and the thing that's really useful for the architecture we put together, is that in the middle of that prompt we've got a section where we specify some JSON output, a JSON template. And just in case anybody hasn't tried this: we are working with text prompts, we're talking to an LLM in natural language. These aren't comments; this is actually how we're getting the system to work. So halfway down here it says, "and here is how to provide the output, in this JSON format; include at least three items." That's three pages of the story in one story body.

And so then we've got that. Now, for anybody noticing the curly braces with "raw" written in them: that's because we're using Jinja to do some templating, and the JSON structure would otherwise mess that up. So this is basically saying, ignore templating for this part.

But it's giving that JSON structure of what we want, with the title, the body, and the image prompts inside. Then on the right-hand side, again truncated (these are all great elephant jokes, by the way), we've got the truncated output.

It's the output showing us what it actually looks like. So we do have the blue elephant in this particular example, "Ellie Goes to re:Invent," and you can see the JSON structure that's been output. We've been specific with Claude that we want JSON output so that we can work with it programmatically in the subsequent parts. And I think that's a very important distinction, because we wanted to chain things together, and we wanted this to be an intermediate step in the program. JSON was a good fit for our format, but we could have asked for anything: XML, something formatted as a table to throw on a website, whatever we wanted. In our case, it was just the middle step into something else, so we wanted JSON so we could use dot notation or whatever and pull elements out of it in Step Functions. Awesome.
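
Here's a rough sketch of that pattern; the template text and JSON field names are illustrative rather than the exact prompt on the slide, but it shows the Jinja raw-block trick and parsing the completion so downstream steps can address individual pages.

```python
import json

from jinja2 import Template

# Illustrative Jinja template: {% raw %} keeps Jinja from touching the literal JSON braces.
STORY_TEMPLATE = Template(
    "\n\nHuman: Write a short children's story about {{ character }} at {{ event }}.\n"
    "Here is how to provide the output, in this JSON format; include at least three items in \"body\":\n"
    "{% raw %}{\"title\": \"...\", \"body\": [{\"text\": \"...\", \"image_prompt\": \"...\"}]}{% endraw %}\n"
    "\n\nAssistant:"
)

def build_prompt(character: str, event: str) -> str:
    """Render the prompt that asks Claude for a JSON-formatted story."""
    return STORY_TEMPLATE.render(character=character, event=event)

def parse_story(completion: str) -> dict:
    """Claude's completion should be the JSON document we asked for; parse it so the rest
    of the pipeline (Step Functions, image generation) can pull out title, pages, and prompts."""
    return json.loads(completion)

if __name__ == "__main__":
    print(build_prompt("a blue elephant", "AWS re:Invent 2023"))
```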

So I can give a bit of a demonstration of using Claude. If I can just press down here and everything works for us... I thought I'd show you briefly, using Claude v2, some of the inference parameters, the parameters that we saw in the payload before, like temperature. And we're using it inside Amazon Bedrock, which of course is what we're using in the back end for our project as well.

Amazon Bedrock, if you've not seen it, just a quick recap: it's a collection of foundation models, text generation models but also image generation models, which we can use inside our applications, and it makes them accessible via an API endpoint, much in the same way as you might find with lots of different AWS services.

If you've not used it before and you're going into the console page for the first time, just scroll down to the bottom and have a look at Model access. This has been an extremely exciting place to look over the last couple of days, as more models have been released. You do have to come down here and select the models you want access to, and in this particular account that's all of them for me, because I'm super excited to use all of them.

So we can see that we've got models from different providers like AI21 Labs; we've got our own as well, Amazon's own models, but that's not all. We have many models from different providers: there are the Anthropic models with Claude and Claude Instant, we've got Cohere, we've got Meta with the Llama models, and that's just exploded recently as we've added more models. And we've also got Stability AI with Stable Diffusion.

So SDXL 0.8 is the version we're using for this particular demo. If I just scroll up here and look at the menu on the left-hand side, we've actually got playgrounds. And whilst this is great for demonstrations like this, it really is the beginning of the developer journey for building with foundation models and large language models. Experimentation is required: you need to experiment with the different models you have available, and you should also experiment with different prompting and prompting structures. It's much easier to do that in a playground like this than it is to write your own code, deploy something, test it, no, OK, roll it back, do something else, do something else.

So this text playground is not just about demonstrating what these things are capable of; it's the beginning of the development journey. If I just collapse that down to make it a bit bigger for us, we can go ahead and select a model, and I'm going to select the one we're using for this particular demo, which is Claude v2. We've got all of the different attributes of the models available in here, along with all the documentation as well.

Let's go ahead and use actual Claude v2, which is the one we were using, and click Apply. And so we end up in this text playground. There is also a chat playground and an image playground, but this text one is about the rawest you can get, and I think it's probably the most useful if you're right at the stage where you want to start building your first application.

So we can type a prompt in here, and we can also change some of these configuration options over on the side. And making generations with large language models live on stage in Vegas is an amazingly awesome experience, and you should really try it.

Let's see what this does. OK. So we could say, and there's some reason for this, but let's say "write a summary", you're going to freak out here, "write a summary about Las Vegas," and we're just going to put that in. Now, what's going to happen when I press Run is that it's actually going to change the format. If you remember before, when we were looking at the prompt template, we had the "Human:" part and then "Assistant:", and that's actually the format you need to prompt Claude with, otherwise it will return an error. The console here, the playground, knows that.

So when I press Run, it will reformat it for us that way, and those two line breaks at the top, incidentally, are intentional as well. And so then it's coming back with its generation... now, it's stopped there. It stopped after "the city's tolerance is for numerous", and maybe it was good that it stopped, but it stopped there, and you'll notice it's not generating any more text.

So now's the time for us to go to the side, stop reading the details, I don't know what it says, and have a look at the configuration. On the side here we've got the things we were looking at before in the payload we were sending: temperature, top_p, top_k, and we've also got maximum length. Those are the ones I want to focus in on.

Temperature, as Memo was saying, is all about how creative you want it to be. I mentioned before about large language models having random number generators in them; this is essentially telling the large language model how much we want that randomness to actually influence the output. We can go into this in a lot more depth; I'm happy to stand at the side of the stage later if anyone wants to come and ask questions about the details of how this works. But really, turning this down is going to make it more deterministic, and turning it up will make it less so. When writing a story like we're doing here, we'll turn it all the way up, see what on earth can happen, and have all kinds of fun.

If you're trying to do something like sentiment analysis, where you want a single-word output, you might want to turn this down. If you're generating code, because that's the application you're writing, you might want to turn down the creativity there too. And I can show it working, kind of, in a minute, hopefully. We've also got maximum length.

So you'll notice that we've cut off halfway through the generation, and that's because this maximum length is set to 300. I could set it much higher, and we could delete this and try again, and in that instance it should probably go through to the end. It's not setting the number of tokens that we will generate; it's setting the maximum number of tokens we might generate. The tokens are mostly just words, but there are some really special ones in there as well: the model may generate an end-of-sequence token, basically saying "I've done what I think you wanted, so I won't go any further." That's why it's maximum tokens, not an absolute total.
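
One practical way to tell those two cases apart in code: a small sketch, assuming the Claude v2 response shape on Bedrock where the parsed body carries a completion and a stop_reason field (the values shown are made up for illustration).

```python
def check_truncation(result: dict) -> str:
    """Inspect a parsed Claude v2 response body from Bedrock.

    stop_reason is expected to be "stop_sequence" when the model decided it was done
    and "max_tokens" when the max_tokens_to_sample limit cut the generation short.
    """
    if result.get("stop_reason") == "max_tokens":
        print("Generation was truncated; consider raising max_tokens_to_sample.")
    return result["completion"]

# Example shape of a parsed response (values are illustrative):
example = {"completion": " Las Vegas is a city in Nevada...", "stop_reason": "max_tokens"}
print(check_truncation(example))
```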

Let's ramp that all the way down to something quite small. That can be quite useful from a cost perspective, because with most large language models, and in Amazon Bedrock too, you pay for inference based on tokens: tokens in, tokens out. There are provisioned throughput options as well, but basically it's tokens in, tokens out.

So let's change this to "write a one-word summary about Las Vegas." I've done this before; it's OK, the word's usually OK. Let's see what happens. Let's turn the temperature all the way up and press Run, and hopefully, yeah, we get one word back, which is useful. And it's "entertaining." Fantastic.

So let's go ahead and run that again; I'm just going to run it a few times. So it says "entertaining." Oh, it says "entertaining" again; we're looking for it to say something a bit different.

It's still "entertaining," and hopefully the live demos that we're putting on here are entertaining too. Let's make sure that's there. We are rolling a dice, and I guess in the grand scheme of things it's coming back to the same thing. The idea here is... oh, there we go: "fun." Yay. Let's go for one more. Is it going to say "entertaining"? Who thinks it's going to say "entertaining" when I press Run again? We've got at least, yeah, we've got several hands. OK, let's see what we can do. I don't want to disappoint you, but at the same time I do. Hey, "entertaining." OK, so the idea is that it's making a random roll and it's really trying to come up with something different.

I'm just going to tell you, and it probably will work: if I turn the temperature right down, it's likely to come up with the same thing every single time. I'm tempted not to press Run to see if that's actually what happens. I mean, when you're demoing something that's actually got something random in the middle of it, what are you supposed to do?

So, "fun." That's good, that's fine. One more time: who thinks it's going to say "fun"? It should say "fun." Positive people, thank you so much. All right, let's press this... yay. OK, I'm not going to press it again because I'm on stage, but it is actually a bit tricky to test this temperature thing at this kind of scale. If you're generating much more text, then you do see some differences. So I just wanted to put that in the demonstration, and hey, the demo is fun, right? And I think it's important to highlight that.

What Bedrock allows you to do is test all the models this quickly. Mike didn't have to do anything special; we didn't have to spin up an EC2 instance or anything like that. You can just decide which model you want to test, type in, and start doing your prompt engineering. A lot of the work is exactly that: how do you write the prompt that's ideal for the output you want? That's where you say, "OK, I want the output as a JSON object, here's the format," et cetera, and using the right data in the context window helps. But once you're fairly confident you've found the right LLM and you have the right prompts, then you can start building.

So you don't have to spin up a whole infrastructure, a whole app, to do it; you can start playing around with it today. And the pricing model is per token. Each model is a little bit different, just like different APIs: the values for temperature, for top_p, for top_k, et cetera, are going to be different for each model, and how the prompts need to be formatted, what's going to work better, et cetera, is going to be per model.

So we do encourage you to play around with as many as you can, and that's the whole point of Bedrock. In some cases, Bedrock is the easiest, or the only, way to gain access to these models, so there's a big advantage there as well.

LangChain overlays on top of it. It's very, very popular, it's very handy, I like it, but it's not a requirement. With the industry moving so quickly, it almost appears like everything has to use LangChain. That's absolutely not necessarily the case, and absolutely not in this case: we didn't use LangChain for this. I just wanted to highlight the elephant in the room, highlight the obvious thing. Lots of people are using it, and it's definitely something you can use if you're looking to orchestrate your application.

Now, we didn't do that. What we did instead was use AWS Step Functions, as you may have seen or remember from the diagram we saw before. We use Step Functions to coordinate the different tasks we wanted: if we want to go away and get some live data, we can do that, bring back the string, bring it into the Step Functions state, and then pass it into other parts of the architecture.

AWS Step Functions, if you're not familiar with it, is basically a state machine that you can use to tie together different services and Lambda functions. It's very, very extendable, so you can basically build a serverless application using service endpoints and Lambda functions. You don't have to deploy your own infrastructure, it will maintain state for you, and you can break your application apart into discrete components.

And this is a screenshot of the console after this has been deployed; it's this actual project, so it gives you an idea of what it looks like. On the left-hand side is the definition. It's a little bit difficult to get into initially; there is a visual editor inside the console you can use as well, but if you're a bit old school like me, you do it this way. You put together this JSON structure and basically specify: I want to start off here, I want to call a Lambda function, and then I want to call these other parts.

Now, one of the really interesting things, I think, about this architecture is that, as you may recall, we get Claude v2 to output a JSON structure, and it's a very specific JSON structure. What we do is pass that actual structure, the output of Claude v2, into the state machine, into Step Functions. Step Functions reads it and splits up the jobs to do. If you remember back to our sample output, we had multiple pages, and each page had text and an image prompt. That was a set, standard format, and you can actually have a variable number of pages, though so far in experimentation it's always been three. The Step Function splits apart that JSON structure and processes the pages in parallel, because one of the key things it does is generate the images for us.

So it sends the prompts off to Stable Diffusion running inside Amazon Bedrock, and it happens in parallel. You can see "Iterate over body" here; that's the part of the Step Functions code that runs in parallel and will scale out to as many images as we have. So you can parallel-process and create as many images as you want on the fly. You'll notice that when we made the elephant-in-Las-Vegas story in the demo, it took a couple of seconds, so we don't want to have to wait for each one; we process all of them in parallel, then bring it all back together and create one HTML story to share with people on the show floor.
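
A minimal sketch of that shape, written here as a Python dict in Amazon States Language terms rather than the project's actual definition; the state names, Lambda ARNs, and JSON paths are placeholders.

```python
import json

# Illustrative Amazon States Language definition: a Map state fans out over the
# "body" array from Claude's JSON output and generates one image per page in parallel.
STATE_MACHINE_DEFINITION = {
    "StartAt": "GenerateStory",
    "States": {
        "GenerateStory": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:generate-story",  # placeholder
            "ResultPath": "$.story",
            "Next": "IterateOverBody",
        },
        "IterateOverBody": {
            "Type": "Map",
            "ItemsPath": "$.story.body",           # one item per story page
            "MaxConcurrency": 0,                    # 0 = no concurrency limit
            "Iterator": {
                "StartAt": "GenerateImage",
                "States": {
                    "GenerateImage": {
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:us-east-1:123456789012:function:generate-image",  # placeholder
                        "End": True,
                    }
                },
            },
            "ResultPath": "$.images",
            "Next": "AssembleHtml",
        },
        "AssembleHtml": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:assemble-html",  # placeholder
            "End": True,
        },
    },
}

print(json.dumps(STATE_MACHINE_DEFINITION, indent=2))
```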

I think the timing, how long generation takes, is something that, as developers, we all have to get used to. Many of us are used to working with an API, or using DynamoDB to get data, and it's very, very quick, milliseconds. When we're generating things, they can take multiple seconds or longer. So how are you going to manage that from the application side? How are you signaling to the customer, "hey, I'm working on this, I'm actively working on this, the site hasn't crashed, the app hasn't crashed"? All of those different things. There are ways to do it properly, but it does require some forethought, and it might be different from what you're doing today.

Absolutely, and come and talk to us at the end; we've definitely got strategies for that as well. So here, revisiting for a moment the architecture that we had at the beginning: there is a subtle difference, a small change. One of the changes we were able to incorporate, because it was just a matter of editing an architecture slide, was the first step, where we had Step Functions and you see it forking into two Bedrock logos for story generation and image generation.

In the middle, we had Lambda functions: we had to write Lambda functions that Step Functions would trigger, and those would call the Bedrock service. As of yesterday, that is no longer necessary; you can call Bedrock directly from Step Functions. And I don't know how many of you here are developers, but there's nothing most developers like more than deleting and throwing code away. So we no longer need Lambda functions acting just as glue, moving things around; we can call the service directly. That makes it easier for us, because there's one less piece to debug, maintain, and grow. So that was a great announcement during the week.
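
As a rough sketch of what that direct integration can look like, again as a Python dict: the optimized Step Functions task type for Bedrock replaces the glue Lambda, though the exact parameter names here should be confirmed against the Step Functions documentation.

```python
# Illustrative replacement for the "GenerateStory" Lambda task from the earlier sketch:
# the optimized Step Functions integration calls Bedrock's InvokeModel directly.
GENERATE_STORY_DIRECT = {
    "Type": "Task",
    "Resource": "arn:aws:states:::bedrock:invokeModel",
    "Parameters": {
        "ModelId": "anthropic.claude-v2",
        "Body": {
            "prompt.$": "$.prompt",              # prompt assembled earlier in the state machine
            "max_tokens_to_sample": 1000,
        },
    },
    "ResultPath": "$.story",
    "Next": "IterateOverBody",
}
```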

And like that one, there are many, many more, and we could take this demo and add them. Unfortunately it's hard during the week, because we're also doing a bunch of other stuff, but we wanted to call out the fact that there's a whole slew of things coming along, and we're iterating on the demo in multiple dimensions, I think.

And the next one is a lot of what Mike did on his own, at the opposite end of the spectrum from the cloud. There you go. Right. Yes. So basically, I have a lot of things to do on the plane on the way home, is what you're saying. We're refactoring code, that's fine. Yeah. So we had a lot of fun putting this together, and hopefully it's been interesting for you to take a look at it as well.

Yes, we iterated through all kinds of different designs. We had grand plans, we had lots of things made, but we tried to simplify it down so that we could actually have something that worked. Yeah, I'm not sure how accessible we were there, but on the show floor it works, people, it does. So we went from a very simple "Mike made it over a weekend with wood" design, and then, like Mike said, we had a very ambitious plan with CNC-cut pieces that we were shipping, but those weren't as structurally sound as they needed to be. So we went for a simplified design that was very robust, and that's how we ended up with what you see today. But it was very fun to work on, and to show something that was relevant.

Something that we just skimmed right over is that we're also grabbing headlines from re:Invent, and that's why a lot of the stories generated on the show floor will say, "oh, the giraffe was at AWS re:Invent 2023," or there will be service names and things like that around it. We wanted to merge those headlines with the image, and that's how we could work with live data. Those headlines could be coming from an API, they could come from a data stream, they could come from whatever you want, and that's how you can combine it. It doesn't have to be static, or typed in, or anything like that; it can fit into a more event-driven architecture as well.

Absolutely. So yeah, you just get your live data, data that the model has never seen before, and put it into the context. It's as simple as that. And then we call it retrieval augmented generation, and there are lots of directions we can go off in from there too.

So, one more poll, on the theme of wanting to inspire you; we want to hear from you as well. What else would you want us to explore? It can inform us, and Mike and I can take our blue elephant demo and take it in that direction for next year. What do you want to explore next? Not blue elephants? Not blue elephants, OK. I'm going to answer this as well. Let's do it weird. I think you can, and I think we've got some results.

So using RAG, retrieval augmented generation; fine-tuning is up there as well; and, oh, live data. So this is coming in as we're watching. OK. Yeah. So there are a couple there around fine-tuning and getting started, I think. Mike, can you throw PartyRock on there? If you have not seen this, you need to see it: a new product that we announced recently called PartyRock, which can also help you get started on your journey with generative AI. It's very, very friendly; no coding necessary.

What you can do is build small experiences that will help you play around with prompt engineering, and there's a way to share those with your friends or with the internet in general. I think Mike just grabbed a random one from there: it's a podcast generator, and you type in what the topic of the podcast is. And it's going to generate... none of this was planned... it's going to generate the podcast description.

Yeah, we just got up here; we really don't know who was going to deliver this talk, we stole their demo. It's generating the podcast name, the podcast description, and then it's going to generate a backstory, co-hosts, images, names. So you can just add some data there and it's going to generate all of this. But if you wanted to build your own, whatever it is, like "make me cover art for my band" or "help me write titles for my blog," you can do that today, just drag and drop.

If you go into the settings of each one of those little boxes, you can choose which LLM you want to work with, you can type the prompt, and you can do all the prompt engineering from there. No AWS account is required, so you can play with it today; anyone can do it. It's really just so you can start exploring, getting familiar with the LLMs, and doing prompt engineering, and once you're ready to take the next step forward, there's Bedrock, there's SageMaker, and there's a bunch of other services as well.

We're almost out of time, but the most important thing I wanted to say is thank you. It's Thursday, and you're here with us really late; I think we're the last thing between most of you and a beer or a nap, so I really, really appreciate the time. Mike and I were saying it is so, so humbling seeing so many people here to listen to us talk, so I really wanted to say thank you for that. Thank you so much. Thank you for being here. Any questions, come and talk to us.
