Building a cloud-backed generative AI game in 60 minutes

Everyone, thanks for coming. I think generative AI may have been the reason you came, but thank you for coming.

Today we're gonna talk about building a generative AI game in 60 minutes. So we're gonna look at what generative AI is - I'm sure a lot of people at the conference this week are gonna talk about it - but just to make sure we're on the same page before we jump in.

We're then gonna describe the game that we built in 60 minutes, and then the steps involved in building this game - and really in building generative AI applications in general.

We're gonna talk about LLM selection - so what large language models we want to use - through to how we're gonna host those large language models.

Then we're gonna touch on prompt engineering - more of an art than a science, but we'll go through some of the steps - and then how we integrate that into an application, putting it all together into how to build a generative AI app.

My name's Ben Ellerby, I'm the CTO of AIS and also an AWS Serverless Hero. So I spend a lot of time speaking at conferences, I run the Serverless User Group in London and also a couple of open source blogs. You can find all of that on the AIS website.

AIS is a consultancy that I run. So I help a lot of enterprises to adopt modern cloud and modern AI and also startups to build with the best of cloud. If you're interested in any of that, I'm around. But this talk is not about me.

So, generative AI. Generative AI is a type of artificial intelligence that can generate new content in many modalities - that can be text, images, or really any type of data - basing that on patterns from existing data and generating new content with the same characteristics.

If we think of different use cases, it can help you write an email to justify a work reason to go to Vegas. I advise this for next year. Very useful. It can help you generate artwork for a podcast. We run the Gen AI Day podcast. All the artwork is generated by AI. It can summarize long documents that you don't want to read.

We have an open source project called Quivr on GitHub where you can ask questions of a large document. And it can generate futuristic headshots - on the Gen AI Days website, if you have a look at it, all the team members including myself have very futuristic headshots made using generative AI.

But if we categorize them into more business based use cases, we have text generation, image generation, search, text summarization and of course conversational chatbots. And we will not be talking about chatbots today. I think we need a break from chatbots.

So text summarization, let's build something now. Text summarization can be a little bit repetitive. So we're gonna make it a bit interesting. We're gonna do emoji summarization.

So our aim today is to build a game called Emoji Us - Emojis Worth the Words. This is not a real app or a good business idea, but this is a demonstration of how we can build a game using generative AI.

The concept's quite simple. It's an app that allows you to guess a famous book or film from a set of emojis. So we get six emojis for the book or the film, and then we have to guess the work of art.

For instance, we've got six emojis here - obviously Harry Potter and the Chamber of Secrets. We then have to guess what that book or film is. We submit our guess, or the user submits their guess, and then they get a score.

The interesting part is not just the emoji generation but also the scoring. We're not doing any exact text-based scoring; we're basing it on semantic similarity. So let's say I put in Harry Potter and the Philosopher's Stone - which is called something different over here, but in the UK it's the title of the first book - instead of Harry Potter and the Chamber of Secrets. It's still very similar semantically, but they are different works. So we're gonna get a score, but it's not 100% - it's something around 70%.

So to build this, we're gonna look at how we select a model, a large language model, how we think about hosting that model, prompt engineering again, more of an art than a science, and then how we integrate that into a fully serverless application.

So model selection, we need a model that's gonna take our famous book or famous movie and generate a six emoji summary of it. And this is where the large language model comes in.

Now, large language models - we've heard a lot about them in the press recently - are very large AI models that are pre-trained on huge amounts of data. They start to extract meaning and patterns from that data and can then be used to generate new content in different modalities.

When we're selecting a large language model, it's quite an overwhelming choice, and there are some really popular ones you see in the news. But I sort of use these six criteria:

  • The medium - are we doing text to text? Are we doing text to image? Are we doing image to text? That's the first area we need to think about.

  • Fine tuning - this is do we need to further train the model on a smaller set of data? And different models allow different types of fine tuning and have different efficiencies of that fine tuning.

  • Task complexity - our task is actually quite simple. Some tasks can be a lot more advanced.

  • Provisioned throughput - I'll talk about this a bit more on the hosting side, but what kind of traffic are we expecting? Is this something we're doing as a one-off, or are there 100,000 requests per second? It's gonna change the sort of model we're going to use.

  • Tokens per call - this is really how much text do we need to use during the generation of our content. And also when it comes to search applications, how much text are we searching?

  • And then finally, and most importantly for a free game, cost - this is gonna be our main driver for selecting a model today, because cost is the key constraint on how we scale the application.

Model hosting - I couldn't really be at re:Invent and not talk about Bedrock. But Bedrock, I really think does change the game when it comes to building with large language models. And the main reason for me is that it's fully serverless.

I'm a big fan of shifting complexity, shifting undifferentiated heavy lifting and really being able just to call a large language model API and not think about how it's running, not think about how it's hosted, but instead have that handled by the cloud provider.

It also fits into an ecosystem of other cloud services. Large language models don't exist in a vacuum. We still need to be able to integrate with our application databases, with our data lakes, with our user authentication and actually build a fully fledged application like we're going to see today.

So Bedrock really gives us that. It gives us a fully serverless way to interact. We can invoke the model and get our response.

Model selection - it lets us choose proprietary models, but it also lets us choose from a large set of open source models. It's very simple to invoke: we give a model ID, the prompt, and some configuration parameters we're going to look at in a little bit. And then that cloud ecosystem - as I said, it interacts directly with the ecosystem of services we're already familiar with in the cloud.

Do please have a look at Bedrock. It's really interesting to play around with. Matt Carey from my team at AIS wrote a really great article where he deep dives into how to play around with Bedrock. You can find that on the Gen AI Days blog.

But if we have a look at sort of the landing page, you can see some usual suspects when it comes to large language models. And that's a key thing - you have direct access to those models without any setup, without any complexity. You can actually see different use cases for those models.

So we can see Claude is really good at text generation. We can see another model that's particularly good at something else. And finally we get to the playground. So when it comes to developer experience with these models, we can actually play around with the model before we've written a single line of code, before we've provisioned anything, before we've written any infrastructure code. We can start to play around and see - can it generate emoji summaries from a particular book or film?

So we know how to select our model - we've not chosen our model yet, we're gonna see that in a little bit. And we know how to host our model: we're gonna host it fully serverless in Bedrock.

Next we have to write our prompts. Prompt engineering - probably the biggest buzzword of Gen AI this year. But prompt engineering is really the art of creating natural language instructions to guide a generative model in creating accurate and desired outputs.

Two words that I think are important there:

  • Art - it's not exactly a science yet. There is still some playing around with it. There are some steps we can follow and there's starting to be more testing we can do.

  • And it is really guiding an output. It's still not deterministic. We're still guiding towards a particular output.

In the simplest approach to prompt engineering, we have zero shot prompts. This is a super simple task description: "Tell me about the best car to buy." We're not giving any examples and the output is actually gonna be quite unpredictable. Unpredictable is a negative way of saying it - it's gonna be very creative, but we don't necessarily want it to be very creative when we're trying to integrate this into our application.

So we can do advanced zero shot prompts where we're giving a lot more information about the sort of decision criteria we want to use, how we want the output to be given and formatted. So it's more comprehensive, still no examples, and it's gonna be less creative. But again, it's gonna be a bit more predictable.

Then we move on to single shot prompts. We're gonna give a task description - "Summarize the sentiment of the sentence" - and then we're gonna give it an example. And this ability to give an example, and then an incomplete example where we give the input but not the output, has shown really good results in generating a lot more predictable output.

So we've given it an input example, "I love going to the mall", with the sentiment being happy. We've then given it another input, "I enjoy going to the park" - not quite as strong a word - and the model's gonna more reliably generate the desired output of happy.

Again we can then have few shot prompts. So we're still giving a task description, but we're giving multiple examples. And again, that incomplete example where we're expecting a result. So this is still very simple, multiple examples, but it's getting a lot more predictable.
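As an illustration of that layout - the wording here is mine, and the middle example is invented rather than one from the talk - a few shot prompt for the sentiment task might look something like this:

    Summarize the sentiment of the sentence.

    Sentence: I love going to the mall.
    Sentiment: Happy

    Sentence: I hate waiting in traffic.
    Sentiment: Angry

    Sentence: I enjoy going to the park.
    Sentiment: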

If we go back to our problem, the problem we're trying to solve, we've got the input "Harry Potter and the Chamber of Secrets" and we want to get a six emoji output summarizing the plot line.

A zero shot prompt for this: "Write me the plot of Harry Potter and the Chamber of Secrets as a set of emojis." It's really simple. We're not giving any examples, so it's gonna be quite unpredictable. And I don't know about your code, but getting our code to then use this output and display it to the user is not the easiest thing in the world, because we're getting some free text back, and that free text could change when the model changes, or change depending on the inputs. And then we're getting a random number of emojis summarizing the plot.

If you do look at the emojis, it's quite a good plot summary if you've read the book, but it's a random amount of output, which isn't what we want for the application we're trying to build.

We then move on to our advanced zero shot prompts. So we're giving it sort of a setup: telling it it's a helpful assistant, telling it it's gonna help us build a game, giving it the output format that we want, and then giving it the input that we want the answer for.
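To make that concrete, a prompt along these lines - a sketch, not the exact prompt we used - would be an advanced zero shot prompt for the game:

    You are a helpful assistant that is helping to build a guessing game.
    Given the title of a book or film, summarize its plot as exactly six emojis.
    Respond with only the six emojis and nothing else.

    Title: Harry Potter and the Chamber of Secrets
    Emojis: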

We're still giving it no examples, but it will start to perform quite well. Let's take a quick break from definitions of the different types of prompt engineering.

And I know this isn't exactly the best format for Q&A, but does anyone know what book or film this is? Yeah, Lord of the Rings - I'm not gonna say which book, the whole work. Anyone got this one? Lion King. And then the hardest one of these examples, anyone got this one? The Matrix. That's quite good for this crowd.

So after a short break, we went with an advanced zero shot prompt and actually the results are really quite good. It was generating the format that we wanted. It was very clear, it was giving six emojis for each and it was giving the right emojis for the book or the film.

There's a little bit more work we need to do on prompt engineering - generating that input prompt is important. But then there are also some inference parameters that we can set, and Bedrock makes this quite easy because it structures these inference parameters in a similar way for all the models that we might use.

Now, not all models support all inference parameters, and some of the ranges are different. But largely our interaction model is the same regardless of which model we're using.

The three categories:

  • Randomness and diversity - this is sort of the level of creativity, the scope of thought, how crazy it can go.

  • Length - fairly standard: how we can control the length of the response outputs and have stop sequences.

  • Repetition - how we can penalize and avoid repetition in the output.

If we look at randomness and diversity, the one that's talked about the most is temperature. And temperature actually was the most useful when building this game. Obviously we built this game in 60 minutes, it was quite fast. But temperature really allowed us to get the results we wanted.

There's also top k and top p, which control the selection of candidate words based on the probability distribution. They are still useful, but they require a little bit more experimenting as you're going through, and not all models support access to those in the same way they do for temperature.

We can see an example here. So we said: give us the film as a set of six emojis - again, this is Lord of the Rings. At very low temperature we can see a wizard told two people to walk to a mountain. And these are real examples, I didn't make these up. We ran the temperature up to sort of midway and the wizard tells them to take a ring to a mountain and then two people are walking home. And then at really high temperature we have the concept of a king that the wizard then fights, and there's a sword, an explosion or a star, and a ring.

And we set the temperature at about 80% in the actual example that we've shown already. We can see how temperature, even in this really trivial example, starts to change things. When you have long text outputs or image generation, temperature can have a huge impact on what we get out the other side.

Length - fairly simple. We've got response length, and we can also have length penalties as a different way to restrict the output. And then stop sequences, if there are certain things where we want to stop or halt generation of the output, which can be good for safeguarding outputs in our application.

In terms of what we used - I'm giving you a spoiler of what model we used - we used Amazon Titan through Bedrock. These are the parameters that we used, and you can see the ranges and the defaults. So it's not a huge amount to think about when you come to inference parameters, but they have a huge impact on the quality of the outputs that you get, and they are something you should spend time experimenting with.

And actually we've started to do some end-to-end testing with automatic generation of different configuration parameters to see what outputs we get.

So we've talked about selecting a model. We've talked about hosting it in a fully serverless way. We've talked about the art of prompt engineering and how inference parameters can be useful. But now we're gonna put all that together, see what model we're gonna use, and see what code it takes to get this to work.

Before we get to the code, we can talk briefly about the architecture. I mentioned before I'm a big fan of serverless. I'm a big fan of shifting undifferentiated heavy lifting, reducing total cost of ownership, and enabling developer experimentation.

Our front end is fully serverless. So we've got CloudFront as a CDN which is handling the caching of our front end resources. Then we've built a React application that's hosted in S3.

We then have the guessing API. So we're using API Gateway as a fully serverless API gateway - it was the REST version of API Gateway - and then Lambda, which is kind of the workhorse of serverless, the function as a service, which is going to run code in response to events - the events here being the API Gateway events - and then interact with Bedrock through the API in a fully serverless way.

The fun part, the emoji engine - always a microservice I've wanted to make. This is what's gonna generate the emojis and store them, associated with the works of art. And it actually comes into the scoring a bit later.

Every month we're gonna use an EventBridge scheduler - we could use other services - to trigger that Lambda function. That Lambda function then calls the APIs of IMDb and Goodreads. Then, getting those works of art back, we're gonna batch them, because there's a max output length that we can actually have.

So we send them - in sets of 50, I think it was - into Bedrock to get our emoji summaries out, and then write them back to DynamoDB, which is a fully serverless key-value database. So that when a user loads the page, they can call the guessing API, and the guessing API can then go to DynamoDB and get the first question we want the user to guess.
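As a rough sketch of what that scheduled function could look like - the helper functions, module, and table name here are hypothetical stand-ins rather than our actual code:

    import boto3

    # Hypothetical helpers standing in for the real calls to the IMDb and
    # Goodreads APIs and the Bedrock prompt we built during prompt engineering.
    from emoji_engine import fetch_popular_titles, generate_emoji_summaries

    BATCH_SIZE = 50  # we batched titles because of the max output length
    table = boto3.resource("dynamodb").Table("EmojiUsQuestions")  # hypothetical table name

    def handler(event, context):
        """Runs monthly, triggered by the EventBridge schedule."""
        titles = fetch_popular_titles()
        for i in range(0, len(titles), BATCH_SIZE):
            batch = titles[i:i + BATCH_SIZE]
            summaries = generate_emoji_summaries(batch)  # one Bedrock call per batch
            for title, emojis in zip(batch, summaries):
                table.put_item(Item={"title": title, "emojis": emojis})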

Now of course we could make this more advanced, with user accounts, persistent scoring, the ability to share, and also making sure you don't get asked the same question twice. It's just a toy example to show how we can play through this.

I say a toy example, it distracted the office for like a month. But a toy example.

We've already deep dived into that. When it comes to the emoji API, as I said, we're invoking a model, and that model is in Bedrock. When it comes to the code to do this, it's fairly simple using the AWS SDK. And this is the actual code we used for the examples we've seen earlier and for the game that we built.

But we can see we've got our input text - that's that long prompt that we made during the prompt engineering phase. We've got our text generation config - these are our inference parameters. So we can see our temperature is set at 0.7, and we also set a top p of 0.4 and a max token count of 100.

We then send that and we get the response. You have to decode it - don't forget to decode it. Once we have the response, we can use it in our application, or in our case write it to DynamoDB ready to be used.
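As a minimal sketch of that call using boto3 - this assumes the Amazon Titan Text model and is not the verbatim code from the slide:

    import json
    import boto3

    bedrock = boto3.client("bedrock-runtime")

    # The long prompt we built during the prompt engineering phase.
    prompt = "Write me the plot of Harry Potter and the Chamber of Secrets as a set of six emojis."

    body = json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {
            "temperature": 0.7,
            "topP": 0.4,
            "maxTokenCount": 100,
        },
    })

    # Assumed model ID for Amazon Titan Text.
    response = bedrock.invoke_model(modelId="amazon.titan-text-express-v1", body=body)

    # Don't forget to decode the response body before using it.
    result = json.loads(response["body"].read())
    emoji_summary = result["results"][0]["outputText"]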

I'm sure a lot of you have heard about LangChain. It's probably one of the other more hot topics in the Gen AI space this year. LangChain is a framework that's designed to simplify interaction with large language models and it creates a consistent interface.

Again I said earlier Bedrock creates some consistency across interaction with different models. LangChain helps us to do that in the application code. It also brings chaining as you might expect to enable us to have quite complex workflows.

We don't have complex workflows in this application. But in any case, using LangChain is very familiar to us from how we're interacting with large language models in other applications, and it actually makes the code a little bit cleaner.

If I change to the next slide - if this works - we can see our application is now using LangChain as a library, and LangChain is then handling the interaction with Bedrock to send our request and get our responses. And we can see the code is fairly similar. It's a little bit shorter - you don't have to do that decode step, which I always forget, so that's useful. And this is much more extensible if you want to start chaining more complex workflows.

At the end, you can see we're still using the Titan model. We're selecting our region, we're giving the temperature and the max tokens. If you look very closely, there's a typo - that's 1000 max tokens, not 100 as in the previous example. And then we're giving our prompts from the templates and getting our response back.
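As a sketch of the LangChain version - the Bedrock integration has moved around between LangChain versions, so treat the import paths and parameter names here as assumptions rather than the exact code on the slide:

    from langchain.llms import Bedrock
    from langchain.prompts import PromptTemplate

    llm = Bedrock(
        model_id="amazon.titan-text-express-v1",  # assumed model ID
        region_name="us-east-1",                  # assumed region
        model_kwargs={"temperature": 0.7, "maxTokenCount": 1000},
    )

    template = PromptTemplate.from_template(
        "Write me the plot of {title} as a set of six emojis."
    )

    # LangChain handles the Bedrock request and response, including the decode step.
    emoji_summary = llm(template.format(title="Harry Potter and the Chamber of Secrets"))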

Scoring is the more interesting part - or for me, it was the interesting part - because we can take advantage of something called embeddings models.

Now, beforehand we were using the large language model for its normal use case of generating new content, but we can also use it to generate embeddings. Embeddings are vectors representing the semantics of a particular piece of text, say, in a high-dimensional space - a space in many dimensions.

So "Harry Potter and the Chamber of Secrets", we can put that through an embeddings model. And we actually get a vector representing semantic meanings of that work of art.

We can have "Harry Potter and the Philosopher's Stone" as the guess the user has made, we put that through Bedrock and we're getting another vector. I'm putting dots there to show there are more, these are very very big vectors, it's a very very big space because they're vectors.

We can actually quite easily do a similarity score using cosine similarity. I don't know if you guys remember trigonometry from high school - took me a minute. But we can calculate the distance between our vectors, even in high-dimensional spaces.
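For reference - this is just the standard definition, not something from the slides - the cosine similarity between two embedding vectors a and b is:

    \text{similarity}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}

and the cosine distance is one minus that similarity.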

So the code is actually really simple. We can create our embeddings - you can see we're calling the embeddings model here and calling embed documents. We're giving it the original input, that was Harry Potter and the Chamber of Secrets, and then we're giving it the guess the user submitted.

We're then calculating the cosine similarity between those vector representations. There's some redundant code in here if you're looking closely because I was playing around with different vector distance algorithms. If anyone wants to compare vector distance algorithms I'm available afterwards.

But we're selecting the distance, and then we actually convert it back to a similarity in the last line of code by doing one minus the distance - again, a small detail. But the key point is we can generate vectors that represent the semantics of a particular film or book and the semantics of the particular guess, and actually calculate the distance between them.
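Here's a cleaned-up sketch of that scoring step, without the redundant experiments mentioned above - the LangChain import path and embeddings model ID are assumptions:

    import numpy as np
    from langchain.embeddings import BedrockEmbeddings

    embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")  # assumed model ID

    # Embed the original work of art and the user's guess.
    answer_vec, guess_vec = embeddings.embed_documents([
        "Harry Potter and the Chamber of Secrets",
        "Harry Potter and the Philosopher's Stone",
    ])

    a, b = np.array(answer_vec), np.array(guess_vec)
    distance = 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine distance

    # Convert the distance back into a similarity score - the "one minus" in the last line.
    score = 1 - distance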

So we can see in our previous example, "Harry Potter and the Chamber of Secrets" versus "Harry Potter and the Philosopher's Stone": calculate the cosine similarity score and we're at 92% - fairly similar, as we might expect.

So, putting it all together: when it comes to building generative AI applications, we need to select the model, and there are different criteria we can use to do that. Titan performed really well for this, but other models are available.

The model hosting - Bedrock really changes the game because we have a fully serverless interaction mode with our large language model. We have some consistency across the models we might use. We also have that larger ecosystem of cloud services. So our LLM doesn't exist in a vacuum, we can access our data lakes, we can trigger Lambda and we can use IAM to control all of that. So it actually helps us to build fully fledged applications around our LLMs quite rapidly.

Prompt engineering - more of an art than a science, at least for now. We saw a few techniques and how some very basic techniques can actually get you a very reliable output in the end.

And we also saw how inference parameters are key in making sure we control the diversity of the output, and also how stop sequences can be useful in safeguarding.

Finally, application integration - we've seen how you can pull together some, you know, fairly standard database services around Bedrock and build quite a creative user experience very rapidly.

I was hoping to go to the next slide with the QR code, but we had a DNS issue, so that will be coming out next week. I'll send it out - if you follow me on Twitter or LinkedIn, you'll see the link - and you can play around with the application. It's a bit addictive but quite fun.

We went through quite quickly, so we do have some time left for Q&A. If anyone has any questions, raise your hand and I think someone will run a microphone to you.
