Hi, everyone. How are you doing today? I hope you're enjoying re:Invent. Welcome to API210, Elevating Workflows with AWS Step Functions and Generative AI. My name is Uma Ramadoss. I'm a specialist solutions architect on serverless. I've been with AWS for a little over three years. I specialize in serverless application integration and workflow management.
I'm here to talk about some new and exciting features of Step Functions along with the product manager of Step Functions. I'm a principal product manager on AWS Step Functions. Also super excited to be here. I've been with Step Functions a little over three years and really excited to talk to you about how we can elevate workflows using generative AI capabilities.
So we have an exciting agenda for you today. We're going to kick off exploring the problem landscape and sort of addressing some of the challenges that you have when you start thinking about how to take advantage of generative AI in your application and to do that, what we're actually going to walk you through is how to build an application that uses generative AI or more or less take advantage of these generative AI capabilities as part of an existing application. And we'll talk a little bit about how you can accelerate that by taking advantage of some of our newly released features.
So on Sunday, we launched optimized integrations with Amazon Bedrock, and we also announced the launch of HTTP endpoints. And then lastly, we'll leave you with a few resources that you can use to get started, implement what we talked about today on your own after the session, and continue the learning beyond today.
So when you come to re:Invent sessions, you should always have a goal that you walk away having learned something new or something that you can apply. And today, our goal is to address, first off, how you can easily integrate gen AI into existing applications using less code. The second, how you can take advantage of some of those new features I just mentioned to accelerate your use of generative AI in your applications. And the last, how can you build resiliency from day one?
So to start off, let's cover the landscape a little bit here. Generative AI has been taking the world by storm, and we get firsthand knowledge through consumer-facing applications like ChatGPT, which give us real-world experience of how generative AI capabilities and machine learning models have been evolving over time. Generative AI is a type of AI that creates content; it generates new things, like its name implies. That means things like images; it can participate in conversations, it can create stories, it can create audio and video and music. And like all AI, generative AI is not magic. It's powered by machine learning models, and these machine learning models are pre-trained on vast amounts of data. This is called a foundation model, or you might hear it referred to as an FM, and you might hear us use that term throughout this session as well. These foundation models have been doing some pretty remarkable things over the last several months, and this is applicable across industries.
So what you might have seen is in things like life sciences, it's actually helped to accelerate drug discovery as researchers can now better understand protein synthesis. In financial services, it can create highly tailored investment strategies depending on the customer's risk tolerance or their financial goals. In healthcare, physicians and clinicians are using generative AI to enhance images and scans to aid in better diagnosis. In retail, we can see this used with the creation of product descriptions and product listings simply based off the product data. It can also help you to learn new concepts very quickly.
So for those of you in the room with children or kids, you may also have experienced this firsthand, where the foundation model is actually helping act as the tutor that's always available. Hopefully, it's also not the author of their latest writing assignment, so you have to be careful. And the larger point here is that it is really applicable across industries, and this is because of the vast amounts of data that these foundation models are trained on. It's generally applicable, which means that you, as part of your business, can start to apply generative AI depending upon your own use cases.
So now your challenge becomes, how do I take advantage of generative AI while continuing to take advantage of the existing services and applications I already have. And to do that, you need to coordinate these services, they need to be able to communicate to each other in a reliable and an understandable way. And depending upon your business process, these can exhibit different patterns.
So sometimes things can happen sequentially, where one thing happens and then another. For example, let's say I have a couple of Lambda functions, Lambda 1 and Lambda 2. It's easy enough for me to say Lambda 1, call Lambda 2. But if I need Lambda 1 to call Lambda 2 and then a service, and then maybe a foundation model, that's a lot harder to do, and there's no easy recovery mechanism. So if something goes wrong, reprocessing those previously executed steps is actually quite difficult.
So maybe instead we have some sort of database to manage state. We use that as our state persistence, and that's better, but we still have no elegant way of handling errors or coordinating all these services. And not every business process is sequential. Sometimes things happen in parallel, or sometimes the input into your process actually dictates the flow of that process later on, or a step depends upon the output of a previous step in your workflow. And as you build these out and these workflows become more and more successful, more people in your organization are going to want to take advantage of them. So you have more people using your workflow, which means that you need to be able to handle errors as they occur. That might be retrying a call, but it also might be taking that error and choosing to go down a different path.
Now what happens if we want to introduce a human as part of our process? Although many processes are automated, there's still a need to interject a human for review. So for example, let's take our tutor example. For using a foundation model as a tutor, it might not be the end of the world, although still not great, if the foundation model provides an incorrect response. That's a very different conversation if you're doing something like legal or financial document processing; those impacts can be a lot more detrimental or impactful on your business. So you might actually want to introduce a human to do things like spot check and review the accuracy of what the foundation model is providing back to you.
Another example of this would be something like a credit application. So for example, I would like a higher credit limit on my credit card. In most instances, that might be automated. But perhaps as a customer, I request a limit that exceeds the predefined threshold. In those use cases, maybe I actually want to send that credit limit increase to a human to review. They can review the application and then determine, yes, I want to accept it, or no, I'm going to reject that credit limit increase.
Now all these sequencing and error handling and processing challenges continue to apply as you start to use foundation models in your application. So what do we do here? Well, we build a generative AI application. Well, not really, but sort of. What I mean by this is we are actually going to build a generative AI application, and in doing so, we're going to address each one of these challenges.
So we're going to use a real life use case. We're going to address things like sequencing. We're going to talk about error handling. We're going to talk about state persistence and visibility and how to introduce humans as part of your workflow into that loop. And in doing so you'll start to understand how these things can come together.
What are we going to add to this existing application? Well, first, the application that we're looking at using is a video processing application called Serverless Video. You may have already heard of it; if not, no worries, Uma's going to talk a little bit about that shortly. The capabilities that we're going to add are, first, we're going to take these videos that are created and uploaded and we're going to create a title and a description based on the video, so creation of new content. We're going to send that to a human to see whether what the foundation model generated is correct and whether we want to use it. And based on that response, we're also then going to call a different foundation model to generate an avatar, or an image, for that content.
So what's involved in this application, or this process? Well, first off at a high level, we need to take that video content and transcribe it into text. Then we need to send that text to a foundation model that can review it and create a title and a description based on the text. We need to send it to a human somehow, through some sort of notification system, for the human to say, yes, I want to move forward with this. And after all that is done, we also need to take that and send it to another foundation model to create an image, or an avatar, for us.
Now, there's a lot of different ways we can do this. For example, we could start by thinking, let's use Lambda and we'll do this synchronously. But all of a sudden we face these challenges when we need to introduce a new service or a new foundation model. So that's not great. So maybe we say, OK, we're going to pivot and we're going to go serverless, so we're going to choose to use Lambda asynchronously. Now that's a better design, but we still have challenges around error handling and the coordination aspect.
So maybe we choose to use a database. We're going to use something like Amazon S3 or DynamoDB, and we choose to manage our state that way. But now what happens is each one of those steps is making frequent calls to that database. So we're either checking the status of what previously happened or we're updating our own status ourselves. Each one of these steps, the creation of the text, the title, the description, the human review, all of those are constant calls to this database, which over time can become complex and costly. It's also challenging from a data consistency perspective, because things are always changing. There's error handling you need to account for; you need to figure out what happens if something goes wrong and what to do. You need to figure out how to manage the sequencing of all those systems, how to do the notification to a human and get that response back, while then moving forward to the next foundation model. So there's a lot of things to be aware of and coordinate.
Now, we could do this in code, and it'd be challenging. Or, as we've chosen to do, we can use AWS Step Functions. AWS Step Functions is a workflow service that you use to create workflows, where the output of one step becomes the input into your next step. You can configure those steps using things like parallel states or map states, which provide dynamic parallelism. You can introduce loops, you can do conditional branching and logic, you can introduce wait states. And it provides this visual experience called Workflow Studio, where you can drag the services on the left into the canvas, and then you can use the panel on the right to configure those services accordingly. So you can set your parameters, your authentication, etc.
AWS Step Functions is serverless. So you only pay for what you use. It scales to zero and it's fully managed. Now, if you don't want to use the visual build experience and you have a preference for code, you can do so. So Step Functions uses the Amazon States Language or ASL and that is our domain specific language.
"Uh our domains being workflows and you can uh write that it's json based. You can define your workflow or your state machine is another term for that declarative. And then you can take that file, you can introduce it as part of your C I CD pipelines. You can upload it to a repository, you can run pull requests. So it really does provide this robust way to address a lot of the challenges we've initially looked at.
But one of the things that customers tell us they really like the most is actually its direct integrations. It integrates with over 220 AWS services, and that allows you to really choose the service that's best suited for your use case. You can swap those in and out as your use case and your business evolve, or add new services as you need them.
It does this by using two different approaches for integrations. The first is that Step Functions directly integrates with the AWS SDK. As that implies, it gives you access to over 10,000 API actions natively within Step Functions. That means all that error handling, all that visibility that we just looked at is available directly with those integrations.
It also has optimized integrations. The difference with an optimized integration is that we've created some additional capabilities to better work with specific APIs. An example of this would be Lambda: if you're using the Lambda Invoke API with an optimized integration, the escaped JSON output of that API is transformed into a JSON object, and that allows for better processing in subsequent steps.
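To make that concrete, here is a minimal sketch of the optimized Lambda task; the function name is a placeholder. The optimized lambda:invoke resource returns the function's Payload as a parsed JSON object, whereas the SDK-style resource (arn:aws:states:::aws-sdk:lambda:invoke) would hand you that payload as an escaped string you'd have to unpack yourself.

{
  "Invoke My Function": {
    "Type": "Task",
    "Comment": "Optimized integration: the escaped JSON Payload in the Lambda response is returned as a parsed JSON object",
    "Resource": "arn:aws:states:::lambda:invoke",
    "Parameters": {
      "FunctionName": "my-function",
      "Payload.$": "$"
    },
    "End": true
  }
}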
Another example of these optimized integrations is actually the call patterns that we support. By default, all API actions support the request-response pattern, where you make a call to a service and we do not block the workflow; the workflow continues on once you've connected to that service. This is asynchronous, and it allows you to be very efficient in terms of your processing, but that doesn't always meet every need.
Sometimes you actually need the response from that API before progressing to the next steps of your workflow. For that use case, optimized integrations support a .sync, or run-a-job, pattern. It's called .sync because you append .sync to the end of the API action; that's a synchronous response, and we will actually wait for the response from the API before moving to the next step in your workflow.
The third integration pattern that we provide as part of the optimized integrations is the callback pattern, or waitForTaskToken. You probably guessed it's called waitForTaskToken because you append .waitForTaskToken to the end of the API action. This allows you to introduce a human into the loop; Uma will talk about this a little bit later, and we'll actually use this pattern. It provides a token to the service that you invoke, which goes away and does some work, and eventually that token is returned back to the workflow, at which time it tells the workflow, please continue on, or, you know what, this wasn't successful, let's fail the workflow as a result.
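As a rough sketch of how those three patterns show up in a state machine definition, the call pattern is selected by the suffix on the resource ARN; the services, topic, queue URL, and job names here are just placeholders.

{
  "Comment": "Illustrative only: request-response, run-a-job (.sync), and callback (.waitForTaskToken) patterns",
  "StartAt": "Request Response",
  "States": {
    "Request Response": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": { "TopicArn": "arn:aws:sns:us-east-1:111122223333:MyTopic", "Message.$": "$" },
      "Next": "Run A Job"
    },
    "Run A Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::batch:submitJob.sync",
      "Parameters": {
        "JobName": "my-job",
        "JobDefinition": "my-job-definition",
        "JobQueue": "my-job-queue"
      },
      "Next": "Wait For Callback"
    },
    "Wait For Callback": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/111122223333/approval-queue",
        "MessageBody": {
          "TaskToken.$": "$$.Task.Token",
          "Input.$": "$"
        }
      },
      "End": true
    }
  }
}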
So now that we have a bit of understanding of the capabilities of Step Functions, Uma is going to explain a little bit more about this existing application. As Tanya mentioned, we took Serverless Video for our use case. You might have heard about Serverless Video in other serverless sessions at re:Invent. It is a live video streaming application built with a serverless architecture. You can not only watch live broadcasts but also watch them, or play them, on demand.
We're gonna add some generative AI capabilities to the on-demand videos. We are going to allow the video authors to add an AI-generated title, description, and avatar to the video. So as a first step, we need to convert the speech in the video to text. We can use Amazon Transcribe, an automatic speech recognition service that makes it easy for developers to add speech-to-text capabilities to their applications.
Amazon Transcribe offers several APIs, and we can invoke those APIs directly from Step Functions. We're gonna use the StartTranscriptionJob API, which is going to take the media file from an Amazon S3 bucket, convert the speech in that media file to text, and output it into an Amazon S3 bucket. As I said earlier, you can invoke StartTranscriptionJob directly from Step Functions. You can configure retries, you can handle the errors, and if there are failures, you can also send them to a DLQ. This direct integration is really, really powerful.
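Here is a minimal sketch of that task with a retry and a catch that routes failures to an SQS dead-letter queue; the bucket name, queue URL, and retry policy are placeholders you would adjust for your own setup.

{
  "Comment": "Minimal sketch of a transcription step with retry and a DLQ catch",
  "StartAt": "Start Transcription Job",
  "States": {
    "Start Transcription Job": {
      "Type": "Task",
      "Resource": "arn:aws:states:::aws-sdk:transcribe:startTranscriptionJob",
      "Parameters": {
        "TranscriptionJobName.$": "$.videoId",
        "IdentifyLanguage": true,
        "Media": { "MediaFileUri.$": "$.mediaFileUri" },
        "OutputBucketName": "my-transcript-bucket"
      },
      "Retry": [
        { "ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 5, "MaxAttempts": 3, "BackoffRate": 2 }
      ],
      "Catch": [
        { "ErrorEquals": ["States.ALL"], "Next": "Send To DLQ" }
      ],
      "End": true
    },
    "Send To DLQ": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/111122223333/transcribe-dlq",
        "MessageBody.$": "$"
      },
      "End": true
    }
  }
}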
I'm gonna explain in detail why this direct integration is powerful, with an example. Consider this classic example of querying data from a database. Our data exists in Amazon DynamoDB. I'm gonna write a small piece of code in AWS Lambda. The first thing I'm gonna do is import the libraries that are required to call the DynamoDB API. Then I configure the table name and the keys to fetch the item. Then I might write a simple function which calls the DynamoDB API and wrap it up with a try-catch. The last thing I would do is call that function within a Lambda handler, and I do some JSON transformation from object to string.
If you look at the code, I do not have retries; it has only simple error handling. I do not have logging. Those are essential things, but even without them, it is already 20 lines of code, and a bug can arise from any line of code, because it is my code, and code is a liability.
If I implement the same in a workflow, all I have to do is add one single state, DynamoDB GetItem, without worrying about what library I'm gonna use. I can then configure the payload, the table name, and the keys. I can retry if the DynamoDB GetItem fails, and if it keeps failing, I can send it to a dead-letter queue like Amazon SQS. This implementation is as scalable as the previous implementation, and it is also observable.
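A sketch of that single state, assuming a hypothetical Videos table keyed on videoId; on repeated failure, the catch can route to an SQS send, like the Send To DLQ state in the earlier transcription sketch.

{
  "DynamoDB GetItem": {
    "Type": "Task",
    "Resource": "arn:aws:states:::dynamodb:getItem",
    "Parameters": {
      "TableName": "Videos",
      "Key": { "videoId": { "S.$": "$.videoId" } }
    },
    "Retry": [
      { "ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 2, "MaxAttempts": 3, "BackoffRate": 2 }
    ],
    "Catch": [
      { "ErrorEquals": ["States.ALL"], "Next": "Send To DLQ" }
    ],
    "End": true
  }
}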
If I come back to this workflow after three months of working on a completely different project, I can still understand what is happening in the workflow. I do not have to be a programmer to understand what is happening in the workflow.
Similarly, during development and after deployment into production, you must really know what is happening in your workflow when things go wrong. To inspect for errors and to troubleshoot issues, Step Functions offers complete visibility into every single step of the workflow. You can understand the status of the workflow, the status of every single step, the input to the step, the output of the step, and errors if there are any, without writing a single line of logging code. And this is very, very critical when things fail, and things often fail.
So with that kind of direct integration, we have got the transcribed text in Amazon S3 using StartTranscriptionJob. Our requirement is to create a title and description for the video, and we want to create multiple, because we all love options and we want to choose the best. I'll talk about why we are creating multiple in a minute.
So we need a foundation model that can comprehend and understand the text that was generated by Amazon Transcribe and generate the title and description. Amazon Bedrock is a fully managed service that offers a choice of foundation models. It is the easiest way to build and scale generative AI applications, so we can use an Amazon Bedrock foundation model to create our title and description.
The best thing about Amazon Bedrock is that it is very, very simple, it's serverless, and it offers a very simple API to invoke. So I set out to write this code, of course, in a Lambda function. The first thing I did is grab the converted text from S3; that's a few lines of code, which I wrapped in a function. Then I create a prompt. A prompt is something that you ask the foundation model, a request to a foundation model, and it will differ from one foundation model to another.
So I create this request and then call the Bedrock runtime API to invoke a foundation model and generate the title and description. If the foundation model returns a larger payload, I do have to write some code to store that payload into an Amazon S3 bucket. As earlier, I do not have logging. And what happens if Bedrock fails? I do not have any error handling or any retries, and if I include those, that's again 25 lines of code. As I said earlier, code is a liability.
So we thought about how we can make this integration simpler and easier for you. And with that on sunday, we announced the introduction of two optimized integrations from AWS Step Functions to Amazon Bedrock to help you take advantage of generative AI capabilities and accelerate your own development.
So for these two optimized integrations, the first one is with InvokeModel. The InvokeModel API for Bedrock is how you actually call and interact with those foundation models. The optimized integration for InvokeModel provides the ability for you to directly invoke a model, and you can provide a prompt, the prompt being the request or the question you have for the foundation model. We've also added some neat capabilities related to S3.
If the payload is too large, for instance, you may not be able to provide that back to Step Functions; Step Functions has a payload size limit of 256 kilobytes. But you might still want to get some of those responses back, and for those use cases, you can write that response directly to S3. What about if you actually want to provide a request or a prompt to Bedrock that exceeds the 256-kilobyte payload limit? We've also added the ability for you to use S3 as an input instead of putting it directly in the request.
So you can refer to an S3 bucket and then write to an S3 bucket as well, and I'll show you that in a second. The second optimized integration is CreateModelCustomizationJob. This is a synchronous API taking advantage of that .sync integration from the call patterns that we reviewed earlier, and it allows you to fine-tune models and then immediately receive the response back when it is complete, directly in your workflow, without the need to write polling code. Is my fine-tuning job done yet? Is it done yet? Is it done yet? That's all taken care of for you as part of the workflow with the .sync integration.
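We don't use it in this talk, but as a rough sketch, a fine-tuning step with the run-a-job pattern might look like this; the base model, custom model name, role ARN, and bucket URIs are all placeholders.

{
  "Fine-Tune Model": {
    "Type": "Task",
    "Comment": "Run-a-job pattern: the workflow pauses here until the customization job finishes",
    "Resource": "arn:aws:states:::bedrock:createModelCustomizationJob.sync",
    "Parameters": {
      "BaseModelIdentifier": "amazon.titan-text-express-v1",
      "CustomModelName": "my-tuned-model",
      "JobName": "my-tuning-job",
      "RoleArn": "arn:aws:iam::111122223333:role/BedrockCustomizationRole",
      "TrainingDataConfig": { "S3Uri": "s3://my-training-bucket/data.jsonl" },
      "OutputDataConfig": { "S3Uri": "s3://my-output-bucket/" }
    },
    "End": true
  }
}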
Now, our particular use case needs to focus on InvokeModel, so we're going to take a little bit of a look at that. As Uma mentioned, we're using this directly as part of our workflow. It provides that native error handling, so you can retry automatically. It's got that visual experience, so you continue to drag and drop the Bedrock InvokeModel API into your canvas. You get that same familiar experience with the configuration panel on the right, where you're able to define the name of that step particular to your workflow, and you can choose the foundation model that you want to access. Those are the ones that are provided by Amazon Bedrock, depending upon the region in which you're operating.
And then when we look at this from an Amazon States Language perspective, we can see immediately that we have a type of Task, and a Task is really how you get work done in a workflow. There's a resource there that's referencing the Bedrock resource that we want to call with the InvokeModel API. We'll see the model ID there; in this example, the model ID is referring to Meta Llama, but this might be a Titan model, it could be Stable Diffusion, whatever foundation models are provided by Bedrock. You can simply specify that in the model ID. As part of the body, you'll see the prompt. And specifically in this example, you'll notice that I'm not actually writing a question or a specific ask to the foundation model, but instead referring to dynamic input, meaning that you can provide the prompt that you'd like to use at runtime as part of the input into your workflow, and this API will look for whatever was provided at runtime and use that as the prompt.
You'll also see some inference parameters provided; these are really dependent upon the model that you want to use, so they will vary from model to model. For example, with the Llama model, what you'll see is temperature, which describes the variability in answers that you'll get from that foundation model: zero being more consistent, one being more variability in the responses that you'll get. You'll also see things like top_p, which is a probability, a sum of probabilities controlling the variability of responses you can get; you might see things like top_k as well, related to the probability. max_gen_len also varies from model to model. You might be familiar with the term tokens, and this is tokens; tokens does not mean characters, it does not mean words, but a good estimate is around six characters per token as a ballpark range. That allows you to reduce or limit the response that you actually get back from that foundation model.
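Putting those pieces together, a sketch of the InvokeModel task might look like this; the model ID, the body fields (which follow the Llama 2 request format), and the retry policy are illustrative, and the prompt is pulled from the workflow input at runtime.

{
  "Generate Title And Description": {
    "Type": "Task",
    "Comment": "Optimized Bedrock integration; prompt is provided dynamically in the workflow input",
    "Resource": "arn:aws:states:::bedrock:invokeModel",
    "Parameters": {
      "ModelId": "meta.llama2-13b-chat-v1",
      "Body": {
        "prompt.$": "$.prompt",
        "temperature": 0.5,
        "top_p": 0.9,
        "max_gen_len": 512
      }
    },
    "Retry": [
      { "ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 5, "MaxAttempts": 3, "BackoffRate": 2 }
    ],
    "End": true
  }
}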
But Tanya, you talked about S3; I don't see that here. Well, what you will see, as you continue to move forward, is that you can actually use the input, where you define an S3 bucket that holds the prompt, or the parameters, or the request that you have for the model. You can pass that forward into the workflow, or you can also choose to write the response to a different S3 bucket, as in the sketch below.
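A sketch of that variant, with placeholder bucket URIs, where the request body is read from S3 and the model response is written back to S3.

{
  "Generate From S3 Prompt": {
    "Type": "Task",
    "Comment": "Read the request body from S3 and write the model response to S3",
    "Resource": "arn:aws:states:::bedrock:invokeModel",
    "Parameters": {
      "ModelId": "meta.llama2-13b-chat-v1",
      "Input": { "S3Uri": "s3://my-prompt-bucket/prompt.json" },
      "Output": { "S3Uri": "s3://my-response-bucket/response.json" }
    },
    "End": true
  }
}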
So it's simply a matter of providing the bucket references as part of the ASL. Now we have the title and description generated by Amazon Bedrock. Recall that we wanted to create multiple titles and descriptions, so we're gonna use another foundation model to create another title and description. We could use Amazon SageMaker JumpStart, which also offers a choice of foundation models, including open source models, and we can invoke Amazon SageMaker JumpStart directly from Step Functions. But we chose to use a foundation model that is outside AWS, a model that is publicly available, which you can access through a public API. Accessing a public API is not something new for us; we've always done it. But when you think about it, there is a lot going on there. You first need to figure out what kind of authorization it is using. Is it using simple basic authentication? Is it using an API key, or is it using OAuth authentication? Regardless, you need to store the credentials somewhere in a secrets store and retrieve them when you call the API.
Similarly, if it is OAuth, you need to call a token API, get the token, and you may have to store the token somewhere for a brief period, depending on your requirements. Depending on the type of request and response formats accepted by the API, you may have to do some I/O handling as well. Last but not least, you have to have retries, and not just retries, you need to have graceful retries so that you don't overwhelm the API.
So your simple problem of accessing a public API might look something like this when you actually implement it. You might have a Lambda function that gets the secrets from AWS Secrets Manager, then calls the token API, gets a token, stores the token in Amazon DynamoDB, and manages its expiration. Then it has to grab the converted text, the transcribed text, from the Amazon S3 bucket, create a prompt, and then call the public API, or the foundation model, to get the title and description. And of course, retries, error handling, logging, all of those things. There is a lot of undifferentiated work going on here that does not add any differentiated value to the business, even though it is important for any production-ready code.
So again, we thought about how to make this integration much simpler and easier for you. And so on Sunday, we also announced the introduction of public HTTP API support natively within Step Functions. This now allows you to call virtually any public HTTP API. So think of things like Salesforce for sales data insights, you can call Slack for notifications, you can call GitHub for collaboration, you can call Stripe for payment processing. And in our use case, we can call Hugging Face to take advantage of their Inference API.
So as part of this, again, you get that visual experience where you can simply drag and drop this public API task into your workflow and start to configure it. You set a name for it, and you can choose the endpoint that you need to invoke, along with the method. You have the ability to set your authentication using a connection. A connection, which we'll talk about shortly, is a resource from Amazon EventBridge that's used for API destinations. For those unfamiliar with API destinations, they allow you to send events to SaaS partners or public endpoints, and the connection handles the authentication for you. So that is how, as part of the HTTP endpoint support, we are handling authentication.
When you think about some of the features that this provides, or the benefits that this offers you, notable in this direct integration is the ability to handle errors. With public HTTP APIs, you may receive a status code that you actually want to do something with, so you can handle those differently; a 429 or a 400 might actually mean something different for you than for someone else, so you can choose to handle those errors directly as part of your workflow. Authorization is supported: we support OAuth, Basic, and API key using the connection resource available within Amazon EventBridge. So you create a connection, you specify the authentication and authorization details, and then you refer to that connection ARN, or that connection resource, as part of your workflow; we'll look at that in a second. We've also provided the ability for you to transform data directly as part of this integration.
Step Functions being JSON-based, you may also need to connect to APIs that don't deal with JSON. One of the more common formats that we discovered after speaking with customers was form URL-encoded data. So you can actually choose to transform JSON to form URL-encoded directly as part of your workflow and send that to the provider, the endpoint, that you need to use; you can specify the content type to do so. And then lastly, to help you get started even faster, we've also introduced a new TestState API. This allows you to invoke a single state of your workflow and validate that the authentication works as expected and that the input/output processing or any data transformations are applied. With respect to the HTTP endpoints, you can also view the raw request and the response, so you get an idea of whether there are unexpected headers, or whether there's something there that I need to adjust my workflow to handle.
So when we take a look at the Amazon States Language for HTTP endpoints, again, you'll see a familiar approach with respect to what we did on Bedrock. You'll see it's a Task, because we're doing some work, but instead of the Bedrock InvokeModel, you'll see an http invoke; that's the reference indicating we're going to invoke a public endpoint. You'll see the endpoint specified, along with the method that you want to use. We'll see the connection ARN that I've referenced; that's the connection that you create in Amazon EventBridge, referenced as part of your workflow. And this is really great because you don't input your secrets as part of your state machine definition, so you don't want those secrets to actually leak in logging, as part of your execution history, or to anyone that doesn't have permission to view those secrets. By using the connection ARN, your secrets are secret, they're safe, they're secure. The Amazon EventBridge connection actually uses Secrets Manager under the hood, and that's free for you to use as part of this feature.
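Here is a rough sketch of what that task definition might look like; the endpoint URL, connection ARN, request body fields, and retry policy are placeholders for whatever API you're calling.

{
  "Call Hugging Face Inference API": {
    "Type": "Task",
    "Resource": "arn:aws:states:::http:invoke",
    "Parameters": {
      "ApiEndpoint": "https://api-inference.huggingface.co/models/my-model",
      "Method": "POST",
      "Authentication": {
        "ConnectionArn": "arn:aws:events:us-east-1:111122223333:connection/hugging-face/abcd1234"
      },
      "RequestBody": {
        "inputs.$": "$.prompt"
      }
    },
    "Retry": [
      { "ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 10, "MaxAttempts": 3, "BackoffRate": 2 }
    ],
    "End": true
  }
}

The retry here is broad and illustrative; as mentioned above, you can also catch status-code errors and treat a 429 differently from a 400 in your own error handling.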
We also provide the ability for you to provide optional parameters, such as query parameters and additional headers, and to modify the request body. So when we're looking at how to generate a title and a description from multiple different foundation models, we use a parallel state. That means we invoke two different tasks in parallel, one being the Bedrock InvokeModel and one being Hugging Face via its Inference API, using our new public endpoint support. As you progress through the tasks in a parallel state, both branches must complete successfully in order to move on to the next state. If one of those branches fails, whether Bedrock or the Hugging Face API, the parallel state fails. A rough sketch of that parallel state follows.
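This sketch abbreviates the two earlier task examples into one parallel state; ARNs and names are placeholders, and the Next state is the human-review step we'll come to shortly.

{
  "Generate Titles In Parallel": {
    "Type": "Parallel",
    "Comment": "Both branches must succeed for the parallel state to succeed",
    "Branches": [
      {
        "StartAt": "Bedrock Invoke Model",
        "States": {
          "Bedrock Invoke Model": {
            "Type": "Task",
            "Resource": "arn:aws:states:::bedrock:invokeModel",
            "Parameters": { "ModelId": "meta.llama2-13b-chat-v1", "Body": { "prompt.$": "$.prompt" } },
            "End": true
          }
        }
      },
      {
        "StartAt": "Hugging Face Inference API",
        "States": {
          "Hugging Face Inference API": {
            "Type": "Task",
            "Resource": "arn:aws:states:::http:invoke",
            "Parameters": {
              "ApiEndpoint": "https://api-inference.huggingface.co/models/my-model",
              "Method": "POST",
              "Authentication": { "ConnectionArn": "arn:aws:events:us-east-1:111122223333:connection/hugging-face/abcd1234" },
              "RequestBody": { "inputs.$": "$.prompt" }
            },
            "End": true
          }
        }
      }
    ],
    "Next": "Wait For User Feedback"
  }
}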
So what happens if something goes wrong? How do we deal with that? Well, normally we get an issue, it pops up, we fix the issue, and then we recover quickly. But in a parallel state, we have the challenge that something might have completed successfully while the other branch fails. Previously, this is how we would handle failures. In the example, we'd have a Transcribe step, so the video was transcribed into text; that was a successful state. We moved into this in-progress state where both Bedrock and Hugging Face are working on providing that title and description. Bedrock seemed to finish faster, great, we're off to the races, and now we move to, let's say, a transform step doing some manipulation. All of a sudden Hugging Face has also completed the title and description generation, and it moves to its transform step. This is fantastic, we're moving well, until we're not: something failed in the transform step. And based on what you knew about our parallel state, you would actually have to rerun this workflow from the very beginning. That means calling Transcribe, calling Bedrock, calling Hugging Face, running that transform step again. So that's costly both in terms of the services that you may be choosing to call as part of your workflow, and also in terms of time. You might not have the time to wait, or the services that you're calling might actually take a longer time to process that information. You also then need to deal with duplication.
So how do we handle that in the context of recovering quickly? Well, a couple of weeks ago, we announced redrive, which provides the ability for you to recover from a point of failure faster. So when we take that example on the left, where a parallel state failed, with redrive, which is available both in Workflow Studio and programmatically as an API, we simply redrive those failed steps while preserving anything that completed successfully. So now our transform two step is actually redriven. Our transform one step, which was previously aborted because it didn't complete, is also redriven. We end up with success. There's no need to rerun the Bedrock API call or the transcription job from earlier, and ultimately that workflow completes successfully.
So this allows you to recover a lot faster. Now, this is all part of the same workflow execution, so you can easily see what happened. When you're looking at an execution history, which is provided by Step Functions, it gives you that visibility as to what happened in every step of my workflow: what state happened, when a task was invoked, when it completed, where it failed. We introduce a new redriven event, so an execution-redriven event will be added to your execution history, indicating that you have chosen to redrive that execution right there. You'll then see all the subsequent events that took place as part of that same workflow execution. If it fails again and you need to redrive again,
no problem, you can continue to redrive over and over again, as long as you don't run into the 25,000-event execution history limit or exceed the one-year maximum execution duration. You can also filter down to the specific logs and look at only the failed tasks, and you can dive deeper into CloudWatch Logs if you need to. So it really provides full visibility, and it allows you not only to use retries for transient errors, or more momentary blips, but also to address harder failures that might actually require some intervention, someone to go fix something, or a longer period of time where your retry strategy might have been exhausted.
So redrive from failure really provides the ability for you to recover faster from failure.
Well, we now have two different titles and descriptions, one from Amazon Bedrock, another from Hugging Face. We want to send these two titles and descriptions to the video author and ask them to choose one. This kind of human in the loop, or human approval, is a common business use case, for example, an order processing application or a payment processing application, where you might have an approver to approve the order amount or a payment amount if the amount goes beyond a certain threshold.
Human in the loop is also an important process in machine learning and artificial intelligence. The requirement that we have is very simple. It is just one such use case. But the number of applications that you can build using this Step Functions human in the loop process can be many more.
So in order to explain how the human-in-the-loop process works, I'm gonna explain how the integration of Step Functions with other services works. You can call an API of a service from the Step Functions workflow as a request-response: you call the API and get back the response immediately, synchronously. You can also call an API of a service and wait for the task behind the API to complete. For example, submitting a batch job: a batch job typically takes a longer time, so the Step Functions workflow task will wait for the batch job to complete, and when it is completed, it will move to the next step.
You can also call an API of a service with a special token and wait for that special token to be returned. And this is called wait for callback integration.
So with the wait-for-callback integration, we're going to implement the human approval process. In your workflow, the task will vend out a token, and we'll give that token to an API of a service. The API can go ahead and call a human, through an email for example, or it could be a long-running process; it might execute a long-running process that's in your legacy application. The task that vended out the token will wait. In parallel, the human will go complete the process, or your legacy application will complete the process. When they are complete, it sends the token back to the Step Functions API along with the response.
Now, when the Step Functions workflow receives the token, it understands which execution and which workflow task generated the token, and it will resume that task. When the task is resumed, it moves to the next step.
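As a sketch, the task that vends the token might look like this, assuming a hypothetical notify-video-author Lambda function that pushes the titles and the token out to the UI; whoever holds the token later returns it with the SendTaskSuccess or SendTaskFailure API.

{
  "Wait For User Feedback": {
    "Type": "Task",
    "Comment": "Callback pattern: the token travels to the UI and is returned later via SendTaskSuccess",
    "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
    "Parameters": {
      "FunctionName": "notify-video-author",
      "Payload": {
        "taskToken.$": "$$.Task.Token",
        "titles.$": "$.titles"
      }
    },
    "HeartbeatSeconds": 3600,
    "Next": "Accept Or Regenerate"
  }
}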
So in our requirement, we have got two different titles and descriptions from two different foundation models. We're gonna send those to the video author through WebSocket. WebSocket is a two-way communication channel between client and server. The author is gonna use a web interface, or user interface, and will receive the titles and descriptions through WebSocket; they review the titles and descriptions and accept one of them. Once they accept it, they send the response, along with the special token, back to Step Functions through an API.
So when Step Functions receives it, we can have a simple choice state, a native choice state in Step Functions, to decide whether we want to go to the next step or we want to regenerate. That's another option: the author can choose to go with one of those titles and descriptions, or regenerate. A sketch of that choice follows.
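The field names here depend on what your UI sends back along with the token, so treat them as placeholders.

{
  "Accept Or Regenerate": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.feedback.action",
        "StringEquals": "regenerate",
        "Next": "Generate Titles In Parallel"
      }
    ],
    "Default": "Generate Avatar"
  }
}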
Our next requirement is to create an avatar for the video. Creating an avatar means you have a text, you give the text to a foundation model, and it generates an image. So we need a text-to-image generation model. Amazon Bedrock also offers text-to-image generation models, so we're gonna use Amazon Bedrock again.
So the author chose one of those titles and descriptions. We are going to use that same title and send it to a foundation model. We create the prompt, ask it to generate the avatar, and it will generate the avatar.
So what we just did is take the response from a previous task and use it to create a prompt for the task that we have now. We also decomposed the generation of the title, description, and avatar into multiple steps. This is called prompt chaining. Prompt chaining is a technique of wiring multiple prompts and prompt responses in a sequence of steps to achieve a business process.
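A sketch of that chaining, where the prompt for the text-to-image model is built from the title chosen in the previous step; the model ID and body fields follow the Amazon Titan Image Generator request format, and the feedback field names are placeholders.

{
  "Generate Avatar": {
    "Type": "Task",
    "Comment": "Prompt chaining: the image prompt is assembled from the previously chosen title",
    "Resource": "arn:aws:states:::bedrock:invokeModel",
    "Parameters": {
      "ModelId": "amazon.titan-image-generator-v1",
      "Body": {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {
          "text.$": "States.Format('Create an avatar image for a video titled: {}', $.feedback.chosenTitle)"
        }
      }
    },
    "End": true
  }
}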
Step Functions is a natural fit for prompt chaining use cases. We've seen how Step Functions simplifies the integrations with foundation models. We also saw how it takes away the undifferentiated heavy lifting, such as retries, error handling, request and response transformation, and testing a single step. It also offers you multiple ways to wire the prompts: you can wire them sequentially, in parallel, or iteratively, and you can have loops. Above all, Step Functions is serverless, and you do not have to worry about backward compatibility.
So our little walkthrough of our use case ended up with an architecture. In this architecture, we have a user interface, and the user interface calls an API that is exposed through Amazon API Gateway. We chose Amazon API Gateway for many reasons, including sophisticated authorization, rate control, and caching.
Then API Gateway sends the request directly to Amazon SQS. We could have sent it directly to AWS Lambda, but we wanted to use SQS as a buffer. The reason is, what if your API or your UI grows in popularity? When you have higher volumes of requests coming in, you might inundate your model with too many requests, so we wanted to use Amazon SQS as a buffer. AWS Lambda integrates with Amazon SQS natively through event source mapping, so we use a Lambda event source mapping to consume messages from Amazon SQS, and then we invoke the workflow.
The workflow handles the entire process. Within the workflow, we create the multiple titles and descriptions. After we create them, we send them through a WebSocket communication to the UI, and we use AWS IoT Core for the WebSocket communication.
Now what happens is, because this is a task token integration, the task goes into a wait state. On the other side, the user on the UI receives the titles and descriptions, chooses one of them, and sends back the response. When the response is sent back, the Step Functions workflow resumes that task.
Then after it resumes the task, we've got the chosen title, and we use that title to generate the avatar. Then we send the presigned URL of the avatar back to the user, and this can be displayed on the UI.
So I'm gonna show a simple demo of what we just did. I've got a short video of Jeff Bezos and Andy Jassy. As I said earlier, we are generating a title, description, and avatar. The UI sends an API request that goes through API Gateway to the Step Functions workflow. In the workflow, you can see it is already running, and I can go to the visual experience. In that visual experience, I see the visual view of a particular execution, and that execution shows all the different states and the state transitions as well.
So we are first doing a transcription, as we talked through; the transcription job generates a text. Since the transcription job is an asynchronous call, we do a little wait loop to check if the transcription job is complete. Once it is complete, we read that transcribed data from S3, and once we read it, we send that data to the parallel state. In that parallel state, we are using the direct integration to Amazon Bedrock and also a public API. Once those two integrations are complete, it will move to the next step; because it is a parallel state, those two branches have to complete before moving to the next step.
So after the title and description are generated by both of the foundation models, we send that title and description back to the user. As I said earlier, we are going to use the callback integration. The step "wait for user feedback" uses the callback integration, so it composes the response from both foundation models and also attaches a task token. In the input tab, you will see the task token here; it is a really long token, a unique token.
And so now the token goes to the UI along with the titles and descriptions. If we go to the UI, you'll be able to see the responses from both models; I've used an open model here. I select one of those titles and descriptions, and it comes back; immediately, you can see the Step Functions workflow moving to the next step. Then we use that title to create an avatar and send a presigned URL back to the UI, and you can see the avatar now shows up in the UI.
So we set out initially to answer a few questions for you, and we showed you how you can use Step Functions to build any application, not just generative AI applications, with minimal code. We showed you some of the features of Step Functions: the visual authoring experience, native integrations, and human in the loop. We also showed you how you can build generative AI applications, or how you can accelerate generative AI applications, through some of the native integrations Step Functions offers: direct Bedrock integration, direct HTTP API integration, and TestState.
And we also showed you how you can build resilient workflows from the start through some of the native features for handling retries, redrive, and complete observability into what is happening.
Here is a QR code for resources. This will take you to AWS Samples, and it has a collection of resources: a couple of blogs to learn more about the Bedrock integration and HTTP integration, the GitHub sample repo for the demo that I showed you, and also a GitHub repo for more prompt chaining use cases. If you want to learn more about Step Functions, there are plenty of sessions; we have a few here to continue your AWS re:Invent learning. You can learn at your own pace with AWS Skill Builder, and you can also demonstrate your knowledge and earn an AWS badge.
Thank you for joining us. Enjoy the rest of re:Invent.