Using AI and serverless to automate video production

I'm Marcia Villalba, and today we are going to talk about using AI and serverless to automate video production. But I will not be talking only about video production; I will be talking about other things with AI and serverless as well.

We have quite a lot of demos ahead of us. But let's go back one year.

I'm a developer. I've been coding for 20 years; even if I look very young, I started very early. And I never thought in my career that I would become unnecessary or that I would need to open a bakery.

But last year, when the whole generative AI thing started to explode, articles like this started to appear in my feed, and these are the nice ones; there were others that basically said that developers would not exist anymore. I started to get worried, I don't know about you, and I started to chat with my colleagues and my friends in the industry, trying to figure out what kind of skills we need as developers in order to not be replaced by an AI.

Because at the end of the day, now that a year has gone by, we know that there's still a lot of work for us, but there are still many skills we need to learn. So I grouped these skills into two important categories.

The first one is developer productivity tools. Since prehistoric times, when our ancestors started to code, we have been improving those developer productivity tools, starting with the programming languages: we began with machine code, and now we have very abstract languages that make our lives easy. We have libraries that help us. We have good IDEs that help us.

And also, since last year, well, this year for general availability, we have CodeWhisperer, an AI coding companion. I got early access to this service and I was a little bit like, well, how will it help me, I'm such an expert coder. But the help it provided was truly dramatic for me.

I'm the type of developer that goes through the to-do list. I'm in the flow, everything is good, so fast, so good, until I don't know the answer and I get stuck. Then I go to the browser and start searching how to use this library. Maybe I find the solution right away, but most of the time you don't find the exact solution for your problem. So you try things out, you get a little bit frustrated, then you end up on Twitter and YouTube watching kittens jumping around and breaking things in people's houses. And two hours later, you come back to your code without a solution.

And the cool thing when I started using CodeWhisperer in my IDE is that whenever I got stuck, this companion kind of realized that I needed some help and started suggesting things to me, and usually the things it suggested were pretty good. It didn't let me go out of the IDE, and it helped me finish my task and get unstuck.

So in this way, I became way more productive, even though I've been coding for over 20 years. This tool is something that, if you can, you should definitely try out.

Also, now, if you are a Mac user, we have CodeWhisperer for the command line. I don't know if you use the AWS CLI a lot. I use it a lot, but I usually use only three or four commands; whenever I need to use something else, I need to go to the documentation, read it, and check how to use those parameters.

Now, with CodeWhisperer for the command line, it just suggests everything right there, and I don't need to leave the terminal. So these tools really, really help out and improve developer productivity.

The second category of tools is the one that is core for developers: building applications. Now our stakeholders want to build applications with AI in mind, applications that can recognize things in images, understand and manipulate language, create text, images, music, whatever you can imagine, make recommendations, answer questions in natural language, create avatars and synthetic voices. Whatever you can imagine, they are asking for it.

But I'm a developer. I love building applications. I don't want to become a data scientist, I don't want to start building models, and maybe neither do you; you just want to use the tools that are available to start building these applications.

The good news is that the AI landscape is pretty rich, and as developers, we can basically use those services and APIs that are on the internet, in the cloud, provided by different startups, right away in our applications.

If you are AWS users, you might have already used some of our core AI services, like Transcribe, which transcribes audio into text; Translate, which translates from one language to another; Rekognition, which recognizes things in images and videos; Polly, which can read text out loud; Comprehend, which can detect sentiment and entities in text; Textract, which can extract text from images; and we have many more services in our portfolio.

Some of you are lucky enough to work in organizations where you have a data science team that can provide you with models and endpoints for the specific models you need to solve your problems. They might use SageMaker to build those models, or they might use SageMaker JumpStart to deploy open-source models and start using them, and they give you an endpoint.

Or maybe you are already using Bedrock with the foundation models, and your team is already making sure that those models are enriched with your data.

And finally, there is a broad ecosystem of third-party AI services. Many startups are providing APIs, so you don't need to build them and you can start embracing AI in your applications.

At the end of the day, as a developer, everything is an endpoint. So we should not be afraid to start embracing AI, because we have been calling endpoints since the beginning of our careers. That's how we started building: let's call a ticketing endpoint, let's call a booking service, let's call this registration thing, let's use this API for this and that. So now we have AI endpoints.

Some of these endpoints are a little tricky, because they require some kind of engineering on the request that we are sending to these APIs. For example, if you are using GPT, you usually need to craft a prompt, and this takes a little bit of feeling around, figuring out what the right prompt is to get the right results. And because we are talking about generative AI, sometimes we need to play a little bit with these prompts in order to get something that is consistent enough.

And also, if we want to use these results in our application, we might need them in JSON with some particular attributes that can help us. So prompt engineering is something you might want to learn.

Also, when we talk about prompts, sometimes we need to do something called prompt chaining or prompt splitting. We need to grab a prompt, split it into many smaller ones, run all these little prompts in parallel and merge the results, or run one prompt over a little piece, feed that result into the next prompt, and chain these prompts together to get a result.
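To make the splitting idea concrete, here is a minimal TypeScript sketch. The `invokeModel` parameter is a hypothetical helper standing in for whatever model API you call (Bedrock, OpenAI, or a startup's endpoint); the chunk size and the prompts are illustrative only, not anything from the talk.

```typescript
// Prompt splitting: break a long transcript into chunks, summarize each
// chunk in parallel, then merge the partial results with a final prompt.
async function summarizeLongTranscript(
  transcript: string,
  invokeModel: (prompt: string) => Promise<string>, // hypothetical model caller
): Promise<string> {
  const chunkSize = 4000; // placeholder: keep each piece within the model's context
  const chunks: string[] = [];
  for (let i = 0; i < transcript.length; i += chunkSize) {
    chunks.push(transcript.slice(i, i + chunkSize));
  }

  // Run the small prompts in parallel and collect the partial summaries.
  const partials = await Promise.all(
    chunks.map((chunk) =>
      invokeModel(`Summarize this transcript fragment:\n${chunk}`),
    ),
  );

  // Prompt chaining: feed the merged partial results into the next prompt.
  return invokeModel(
    `Combine these partial summaries into one video description:\n${partials.join("\n")}`,
  );
}
```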

And when we are working with all these things, what we basically need to do is orchestrate and choreograph the calling of these endpoints, either to do the prompt chaining or to call the different AI services that we want to use.

So in this talk, I will not be talking about how to call endpoints. You are all developers, you know how to do that; it's something we have been doing since forever. I'm not an expert on prompt engineering, so I will not help you with that either. But I hope you leave this session feeling very, very confident about how you can orchestrate and choreograph your AI services to build really cool applications. That's what you should learn by the end of the session: how you can embrace the many AWS services and architecture patterns that can help you out.

So my name is Marcia Villalba. I'm a Developer Advocate in the AWS Serverless team. I've been doing serverless since its early beginnings in 2016, and I'm the host of a YouTube channel called FooBar Serverless. A lot of the links I will leave you after this presentation are hands-on tutorials on how to do everything, so I will not show you how to deploy or how to build; that's all in videos. Later, when you have time, you can sit down and get the code for everything.

If you have any questions, you want to learn more, you want to deep dive, reach out to me on socials.

So this presentation is broken into three demos. The first demo is a bedtime story application; it doesn't have anything to do with video, but I really like the simplicity of the solution. This is a solution that my colleague Dave Boyne built. He has two little kids. I don't know if you have little kids; I have one, and she always wants exactly the same thing, and they had the same problem every night.

We go to sleep, and she wants to read exactly the same book. As a parent, it's very boring: you know it by heart, and it's always the worst book. My kid likes to read the history of farts, a very educational book, but I'm sick of it.

So Dave created this app because he had exactly the same problem. Every day at seven o'clock, a new story is generated based on characters that the kids like and scenes that the kids like, and they receive an email with the story so it can be read out loud. The story is generated with AI. It also has an image that has been generated by AI. And if Dave is tired, there is also a play button that will read the story aloud. Great.

This is not the story that Dave will read to his kids. This is a story I generated with my colleagues: Dave, Eric and James are part of my team, and we generated it while we were pair programming. I hope Dave's kids don't do pair programming yet; they are quite young. But the cool thing about this is the architecture. That's why I brought it to you, because this is an architecture that demonstrates event-driven at its finest for AI.

You can see that it's very linear; it goes from a scheduler to an email notification, and the whole architecture is driven by events. Something is changing in the system constantly, and that change is propagated forward, activating different services to act. That's event-driven architectures in a nutshell.

We have somebody that produces an event, the event goes to some kind of broker, and then somebody consumes the event. The producer and the consumer don't know each other; the broker is in the middle.

We call this choreography of events, because instead of having an ordering service that tells the notification service, "Hey, a customer has made an order, now send an email," we have an ordering service that tells the broker, "A customer has made an order," and whichever service is interested in that will react. The notification service will say, "Oh, a new order has been created, I need to send an email." The marketing service will say, "Oh, a new order has been created, I will send some spam," and things like that. That's why it's choreography: the services are listening and dancing to the sound of the events.

So let's go back to the architecture and analyze bit by bit how this happens. This is the first bit of the architecture: we have the scheduler and we have a Lambda function. The producer of the event is time, seven o'clock every day; the broker is the scheduler; and the consumer is the Lambda function.

The Lambda function will grab the characters and the scenes, call Bedrock, and generate a new story that will be stored in the stories table. I don't know if you're familiar with EventBridge Scheduler. This is a service we announced last year, and it's how you can schedule tasks in AWS.

You can schedule millions of tasks in your AWS account. You can schedule one-time tasks, like today at three o'clock, or you can schedule recurring tasks: every five minutes, every day at three o'clock, or every Monday at two o'clock. And you can even add time zones to it. The start of my work day in Finland, in my time zone, is not the same as the start of the day here in Vegas. These are the kinds of things you can configure, and it's really, really cool.

But the coolest part, in my opinion, is that you can reach over 200 AWS services. So it's not only triggering a Lambda function; you can start a container task, send an email, start a Glue job, or whatever you need.

So let's look at an example with Scheduler that has nothing to do with AI, but I think it's an example most of you might be familiar with: canceling a subscription.

I'm the type of person that will watch one season of something: I register for the streaming service and cancel it right away, because I have already paid hundreds of euros subscribing to streaming services that I never watch. So I learned my lesson.

So I go to the streaming service, cancel the subscription, and I still have one month of valid subscription even though it's canceled. If this service was using Scheduler, what would happen is that a Lambda function would create a new schedule that says, "Hey, in one month, remove this person from the paying customer list."

The month comes, that schedule gets triggered, and exactly that happens. This is a very common problem that many organizations have, and before, they had to resort to very complex solutions in order to address it. Now, with Scheduler, it's as simple as it gets.
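As a sketch of what that Lambda function might do, here is a one-time schedule created with the AWS SDK for JavaScript. The names, ARNs, customer ID, and date are placeholders I made up for illustration, not anything from the talk.

```typescript
import { SchedulerClient, CreateScheduleCommand } from "@aws-sdk/client-scheduler";

const scheduler = new SchedulerClient({});

// One-time schedule: in one month, invoke the Lambda function that removes
// this customer from the paying-customer list.
await scheduler.send(
  new CreateScheduleCommand({
    Name: "cancel-subscription-cust-1234",          // placeholder name
    ScheduleExpression: "at(2024-01-15T00:00:00)",  // one-time "at" expression
    ScheduleExpressionTimezone: "Europe/Helsinki",  // schedules can be timezone-aware
    FlexibleTimeWindow: { Mode: "OFF" },
    Target: {
      Arn: "arn:aws:lambda:us-east-1:123456789012:function:remove-paying-customer",
      RoleArn: "arn:aws:iam::123456789012:role/scheduler-invoke-role",
      Input: JSON.stringify({ customerId: "cust-1234" }),
    },
  }),
);
```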

So now we have our stories in our stories table, which is still a static table: it's a DynamoDB table, and the data just sits there. We need to capture the change that happens in the database. For that we use the pattern called Change Data Capture, which allows us to capture events for the things that happen in a database.

Because we are using DynamoDB, we can enable DynamoDB Streams, so that every time there is a change in our database, an event is emitted. So now we have events flowing in our system.

The thing is that we want those events to go to an event bus so we can manipulate them. The problem is that DynamoDB Streams events and EventBridge events don't talk to each other directly: integration problems. Have you had those?

But the cool thing is that we have a new integration service called EventBridge Pipes that solves that point-to-point messaging, that integration between an event-producing system and an event-consuming system.

So now you can connect DynamoDB to an EventBridge bus super simply, or you can connect Kinesis to EventBridge, or whatever you have in mind. And the cool thing is that whenever you connect these producers and consumers together, you can filter and transform the events.

My favorite example here is when we connect one Kinesis stream to another, because usually, when these data streams come in from our client SDKs, the events are all weird: the client SDK might have changed, the schema might not be standardized, we might have a lot of noise, analytics events, backend events, all kinds of events mixed together. Then I can apply a Pipe, filter out the events I want, transform them all to be standard and looking perfect, and send them to another Kinesis data stream.

The events from that second Kinesis data stream will be processed by my downstream services, and those services will need to do less work filtering and enhancing the events.
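Back in the bedtime-story architecture, a pipe like this connects the DynamoDB stream of the stories table to an event bus. Here is a minimal sketch with the AWS SDK; the names, ARNs, and filter pattern are illustrative, not the actual demo's configuration.

```typescript
import { PipesClient, CreatePipeCommand } from "@aws-sdk/client-pipes";

const pipes = new PipesClient({});

// Connect the DynamoDB stream (source) to an EventBridge bus (target),
// keeping only INSERT events so downstream rules see "new story" only.
await pipes.send(
  new CreatePipeCommand({
    Name: "stories-to-event-bus",                          // placeholder
    RoleArn: "arn:aws:iam::123456789012:role/pipes-role",  // placeholder
    Source:
      "arn:aws:dynamodb:us-east-1:123456789012:table/stories/stream/2023-11-01T00:00:00.000",
    SourceParameters: {
      DynamoDBStreamParameters: { StartingPosition: "LATEST" },
      FilterCriteria: {
        Filters: [{ Pattern: JSON.stringify({ eventName: ["INSERT"] }) }],
      },
    },
    Target: "arn:aws:events:us-east-1:123456789012:event-bus/stories-bus",
    TargetParameters: {
      EventBridgeEventBusParameters: {
        DetailType: "StoryCreated",
        Source: "stories.pipe",
      },
    },
  }),
);
```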

So now we have our events in EventBridge, and what we want to do is fan out these events to different places. We want to send the notification, we want to generate an image using Stable Diffusion with Bedrock, and we want to generate audio using Polly.

To fan out events, the best way to do it with EventBridge is using an event bus. An event bus allows us to get events from different event sources: AWS services, for free (we'll see in later demos how we can use S3 as an event source), custom applications like the one we have here that sends a "new story has been created" event, or different serverless applications. All of them can send events into different event buses.

Event buses are like little highways where those events travel. If an event is in the default event bus, it cannot be in a custom event bus, so you need to choose where your events are going.

And inside those event buses we have rules, and those rules are the ones directing the events to different places. So we will have a rule that says: all the events coming from S3, from this particular bucket, when a new file is created, will trigger this Lambda function. Or: all the events for a new story created in DynamoDB will go to this SNS topic so a message can be sent.

And then we have targets for EventBridge. We have over 20 different targets in AWS, plus API destinations, which allow you to send events to third-party services. So if you are building, I don't know, a fulfillment service and you're using a third-party delivery service, you can add that as a target and send them an event: "Hey, here's a new order. You might need to fulfill it."

So how do rules work? They are very simple. An event is sent into a bus, and then the event gets matched against all the rules you have defined on the bus. The event can match rule one, rule two, all rules, whatever. And when the rules match, the event gets forwarded and fanned out to all the different targets.

So it can be forwarded to no one, as if that event never happened, nobody listened to it. Or it can be forwarded to many different places; it can be fanned out. And this is how we solve the problem.
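A minimal sketch of one such rule with the AWS SDK: it matches the "new story" events coming off the pipe sketched earlier and fans them out to two targets. Rule name, bus name, and target ARNs are hypothetical.

```typescript
import {
  EventBridgeClient,
  PutRuleCommand,
  PutTargetsCommand,
} from "@aws-sdk/client-eventbridge";

const events = new EventBridgeClient({});

// The rule matches events on the bus by source and detail-type.
await events.send(
  new PutRuleCommand({
    Name: "on-story-created",
    EventBusName: "stories-bus", // placeholder bus name
    EventPattern: JSON.stringify({
      source: ["stories.pipe"],
      "detail-type": ["StoryCreated"],
    }),
  }),
);

// Fan the matched event out to more than one target.
await events.send(
  new PutTargetsCommand({
    Rule: "on-story-created",
    EventBusName: "stories-bus",
    Targets: [
      { Id: "generate-image", Arn: "arn:aws:lambda:us-east-1:123456789012:function:generate-image" },
      { Id: "notify", Arn: "arn:aws:sns:us-east-1:123456789012:story-notifications" },
    ],
  }),
);
```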

So now we have a full architecture all driven by events and it's generating the story, it's generating the email, it's creating the image and it's doing the voiceover. So this is the first example I want to show you.

Now I want to jump into the core of this talk that is automating video production. Now this is a problem that happened to me last year. And when I started building this, I never thought I would be giving a talk on it because it's a problem I have every day.

I'm a content creator, my team are content creators, and we have a lot of content in English. But I also have an amazing community in Spanish that needs good content and deserves good content. And I cannot clone myself to create good-quality content in Spanish all the time.

So I was watching the Discovery Channel in Latin America, and I saw these videos that we get that are dubbed, and they are obviously dubbed. And I was thinking: can I do the same for my content? Can I dub it? Can I use AI to do that? That's how the inspiration came to build this demo. But it's not just a demo; I actually use it every day for my content.

I upload a video, it gets dubbed automatically, and I get an email. Over the course of development, I realized that AI is not that smart, and I need some validations in the process to make sure the quality of the resulting videos is really good, because the audience deserves good quality.

So even though everything is automated, I still need manual inputs in the process to make sure that the results are good. So I will show you now the demo and then we will go into how I built it.

Now, I'm already starting to add the validation to the process, because I realized it was important. An event happens, then a notification is sent. I upload the validated transcription, and a Lambda function runs: it does the translation, uploads it to S3, and sends me an email. I validate the translation, and then I have a couple more Lambda functions, as you saw. But instead of using state machines, I was chaining Lambda functions together; I was building this little Lambda chain.

And again, my internal developer advocate was pulling my hair: this is wrong. We tell customers that they should build Lambda functions that do one thing only. And these functions are a little hard to test, because they are basically calling an API, and I need to console.log all the time to see what is going on. I needed to find a better way, because at the end of the day, what I was doing in these particular functions was task coordination.

I was using the functions to sequence tasks and to retry tasks. By the time I was developing this, there was no Bedrock, so I was using OpenAI, and OpenAI was sometimes not very responsive: I needed to retry, I needed to catch those errors and do something with them. Sometimes I wanted to map over an array and apply the same function to each item, or I wanted a choice, a switch, or I wanted to run tasks in parallel. At the end of the day, I was building an orchestration pattern. I was building a little controller that was managing state and sending it to the different endpoints, the different APIs.

And I realized that what I really needed was a state machine. A state machine is a collection of discrete computational steps, things that do one thing and one thing only. We have a starting state and an end state. Every state gets an input, does something, and produces an output, and the components are very decoupled, so they are very easy to replace. The transitions between states are based on the outputs and the rules that I define. Our brains, as developers and as humans, naturally go to state machines, because these flows of actions are something very simple for us to reason about.

If you are building state machines in AWS, I recommend you try Step Functions, our managed state machine service. You can do it visually or you can do it as code; I will give you the code for everything. You can drop in different boxes to call APIs inside AWS or HTTP endpoints from your different services. You can add operations like flows, choices, parallelization, try/catch, retries, error handling. And the good thing, if you do it in Workflow Studio for Step Functions, is that you can build the state machine visually but also get the whole definition as code and keep it in your GitHub repo, so you can iterate on it. Step Functions integrates with over 220 different AWS services, and starting this week it also integrates with HTTP endpoints, so basically you can integrate with anything nowadays.

So now we can break this little Lambda chain into a state machine that solves exactly the same problem. And I can reuse some components, like my URL-signing function that signs an S3 object so I can get the link in my email, and I can also reuse my SNS topic. I can build these state machines, and you will see how similar all the state machines look, because at the end of the day, that's the simplicity.
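To make this concrete, here is a minimal sketch of what one of these small workflows can look like in Amazon States Language, written as a TypeScript object so it can be fed to CDK or the CreateStateMachine API. The function name, topic ARN, and state names are placeholders, not the talk's actual definition; the point is that sequencing, retries, and error handling move out of Lambda code and into the workflow.

```typescript
const generateAssetsWorkflow = {
  StartAt: "CallModel",
  States: {
    CallModel: {
      Type: "Task",
      Resource: "arn:aws:states:::lambda:invoke",
      Parameters: {
        FunctionName: "generate-video-description", // placeholder
        "Payload.$": "$",
      },
      // Retry the flaky AI call here instead of hand-rolling retries in code.
      Retry: [
        { ErrorEquals: ["States.TaskFailed"], IntervalSeconds: 5, MaxAttempts: 3, BackoffRate: 2 },
      ],
      Catch: [{ ErrorEquals: ["States.ALL"], Next: "NotifyFailure" }],
      Next: "NotifyMe",
    },
    NotifyMe: {
      Type: "Task",
      Resource: "arn:aws:states:::sns:publish",
      Parameters: {
        TopicArn: "arn:aws:sns:us-east-1:123456789012:dubbing-notifications", // placeholder
        // Intrinsic function turning the Lambda result into a string message.
        "Message.$": "States.JsonToString($.Payload)",
      },
      End: true,
    },
    NotifyFailure: { Type: "Fail", Error: "GenerationFailed" },
  },
};
```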
We want to have simple patterns that can be replicated over and over. By the end of this exercise, I end up with three Lambda functions that do very simple things, but things that are very critical to my solution, very specific, things I could not replace with anything else. Well, we will see that we can now replace one of them. The first is the dubbing of the video, which uses a Lambda layer with a library called FFmpeg. Then there is the get-signed-URL function, which does some magic and gives me a signed URL. And then there is the function calling Bedrock to generate the video description, which produces all this JSON with all this code. Everything else can be transformed from code to workflow: I can grab code that I had in my Lambda function and transform it into an API call from my workflow. And the cool thing is that I can add all the retries and the error handling in the state machine.

My favorite part is that now I don't need to add console logs everywhere, because whenever you run a state machine, you automatically see the input and the output of every API call. You know exactly what was sent and what was returned. You can see the errors, deep dive into every error, understand how long everything took, and really improve your application by working from the state machine.

And if you want to avoid Lambda functions, another really cool feature that Step Functions has is intrinsic functions, which allow you to do basic data manipulations: JSON, arrays, strings, math operations, encoding and decoding, and many other things. So now, if you need to generate an identifier for some data you want to store in DynamoDB, you don't need to call a Lambda function to generate that identifier; you can do it from your state machine definition.

It looks something like this. I use these intrinsic functions in many places. You can see there, in the Parameters attribute, I'm building a new URI, and I'm formatting that URI with two dynamic parameters, the region and the payload. It's as simple as that: I just format it and I get the right URI. And I can do more complex things: I can nest up to 10 intrinsic functions together to build very interesting results. Here you can see that I'm building an AWS CLI command so I can copy and paste it from my email into my CLI; I am that level of lazy. We basically grab a URI, split it by the slash, get the third item from the result, and paste it inside my AWS CLI command. So that command is perfectly built, and this is three levels of nesting, but you can go up to 10 levels.

So let's look at the architecture of the solution now, and you will see how simple it is. I like simple things. We have a state machine, the first one, that gets triggered when an event happens, puts something in S3, and sends me an email. This first state machine calls Transcribe. Transcribe is an asynchronous call, so here we need to do a little waiting for the asynchronous work to happen: we start a transcription job and then ask if the transcription job is done. It doesn't matter how long the video is, because a state machine can run for up to one year, and I'm not planning to record one year's worth of video, so we are fine here. Then it transcribes, puts the file in S3, and again does exactly what we were doing before: it signs the URL and sends me an email.
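Here is a hedged sketch of that waiting loop in Amazon States Language (again as a TypeScript object), including a States.Format intrinsic to build the job name. The bucket, field names, wait time, and topic ARN are placeholders, not the talk's actual workflow.

```typescript
// Polling an asynchronous Transcribe job from the state machine itself:
// start the job, wait, check the status, and loop until it completes.
const transcribeWorkflow = {
  StartAt: "StartTranscriptionJob",
  States: {
    StartTranscriptionJob: {
      Type: "Task",
      Resource: "arn:aws:states:::aws-sdk:transcribe:startTranscriptionJob",
      Parameters: {
        // Intrinsic function: build the job name from the incoming video id.
        "TranscriptionJobName.$": "States.Format('dub-{}', $.videoId)",
        Media: { "MediaFileUri.$": "$.videoUri" },
        IdentifyLanguage: true,
        OutputBucketName: "dubbing-transcripts", // placeholder bucket
      },
      ResultPath: null, // keep the original input for the next states
      Next: "Wait",
    },
    Wait: { Type: "Wait", Seconds: 30, Next: "GetTranscriptionJob" },
    GetTranscriptionJob: {
      Type: "Task",
      Resource: "arn:aws:states:::aws-sdk:transcribe:getTranscriptionJob",
      Parameters: { "TranscriptionJobName.$": "States.Format('dub-{}', $.videoId)" },
      ResultPath: "$.status",
      Next: "IsDone",
    },
    IsDone: {
      Type: "Choice",
      Choices: [
        {
          Variable: "$.status.TranscriptionJob.TranscriptionJobStatus",
          StringEquals: "COMPLETED",
          Next: "NotifyMe",
        },
        {
          Variable: "$.status.TranscriptionJob.TranscriptionJobStatus",
          StringEquals: "FAILED",
          Next: "Failed",
        },
      ],
      Default: "Wait", // still running: loop back and wait again
    },
    NotifyMe: {
      Type: "Task",
      Resource: "arn:aws:states:::sns:publish",
      Parameters: {
        TopicArn: "arn:aws:sns:us-east-1:123456789012:dubbing-notifications",
        Message: "Transcription ready for validation",
      },
      End: true,
    },
    Failed: { Type: "Fail", Error: "TranscriptionFailed" },
  },
};
```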
The second state machine, translation, works exactly the same way: it's triggered by an event, puts something in S3, and sends me an email. The state machine looks very simple, but because Translate is a synchronous call for a small amount of text, I can basically send the text and receive the result right away, store it in S3, sign the URL, and send myself an email. You see the drill here.

Another state machine, the one dubbing the video, gets triggered by an event in S3, puts something in S3, and sends me an email. Now we are using Polly, and Polly is another asynchronous call: we call Polly and wait for Polly to execute. Again, this is independent of any function, so it can last up to one year; Polly can take its time to process and create that audio file. When the audio file is ready, we invoke the Lambda function that stitches the new audio file in Spanish together with the old video, and it sends me an email.

The fourth state machine generates the results with generative AI: the description, titles, and tags. A new event calls the state machine, it puts something in S3, and it sends me an email. Here we are just calling a Lambda function to do the generation.

Now you might be wondering why I have multiple state machines instead of one big one. And I wondered about this question a lot. I was in Alicante when I was thinking about this, I'm that crazy; I was on the beach thinking, why have so many state machines? What is the reason my developer advocate heart made me build it this way? I will explain my thought process.

This is the result: all the state machines are invoked with an event. You saw the simplicity of that. Let me show you how you can write the code for invoking a state machine with an event. I talked about EventBridge and how you can trigger things with rules. Here we are triggering the state machine from a rule when a new object is created in a specific bucket. This is all the code I need to trigger the state machines; the only thing that changes for each state machine is the name of the bucket. Then I can copy and paste it. I mentioned I'm lazy.

Let's look now at what happens if I have one big state machine. For the simplicity of this demo, I'm making each state machine a nested state machine inside a big one, so there is not a lot of mess. This big state machine gets triggered with an event every time we upload the original video. To solve this problem, you need to learn a pattern called the "call me maybe" pattern; my colleague Ben Smith named it that. It pauses a workflow for up to one year until a task token is returned. So we can have a task that runs with "wait for task token": this pauses the state machine, for up to one year, and generates a task token, and later we can restart the state machine by returning that task token. This is a pattern you need to know in order to see how we can have one big state machine.

So now let's go to another architecture diagram. Here, for simplicity, I have only two steps in my process, the transcribe and the translate, because in the middle there is a manual approval process and I want to show you how the flow goes. We have the new video in English uploaded, which triggers the state machine; up to there, we are the same. We start the transcription, store the result in S3, and send an email. Now we say "wait for task token" in this step, we get a task token, and we store it in DynamoDB. Now the state machine pauses.
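A hedged sketch of that pause step: a task using the `.waitForTaskToken` integration that hands the token to a Lambda function (which can stash it in DynamoDB) and then blocks until someone sends it back. The function name, payload fields, and next-state name are placeholders.

```typescript
// "Call me maybe": the execution pauses here, for up to a year, until
// SendTaskSuccess is called with the token passed in the payload.
const waitForValidation = {
  Type: "Task",
  Resource: "arn:aws:states:::lambda:invoke.waitForTaskToken",
  Parameters: {
    FunctionName: "store-task-token", // placeholder: writes the token to DynamoDB
    Payload: {
      "taskToken.$": "$$.Task.Token", // the context object holds the token
      "videoId.$": "$.videoId",
      step: "transcription-validation",
    },
  },
  Next: "Translate", // the next nested step, not shown here
};
```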
Later, Marcia comes in, validates the transcription, and uploads that file to S3. That triggers a rule that invokes a Lambda function, which fetches the task token from DynamoDB and restarts the state machine, and the translation runs. And now we need to do exactly the same whole shebang for the translation, for the next step. You can see that there are a lot of things happening here, and simplicity is always king.
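And a sketch of the Lambda on the other side, triggered by the rule when the validated file lands in S3: it looks the token up (assumed here to be in a DynamoDB table) and resumes the paused execution. The table name, key layout, and event fields are placeholders.

```typescript
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";
import { SFNClient, SendTaskSuccessCommand } from "@aws-sdk/client-sfn";

const dynamo = new DynamoDBClient({});
const sfn = new SFNClient({});

// Invoked by the EventBridge rule for "Object Created" in the validation bucket.
export const handler = async (event: { detail: { object: { key: string } } }) => {
  const videoId = event.detail.object.key.split("/")[0]; // placeholder key layout

  const item = await dynamo.send(
    new GetItemCommand({
      TableName: "task-tokens", // placeholder table
      Key: { videoId: { S: videoId } },
    }),
  );

  // Returning the token resumes the state machine right where it paused.
  await sfn.send(
    new SendTaskSuccessCommand({
      taskToken: item.Item!.taskToken.S!,
      // This output becomes the result of the paused task in the workflow.
      output: JSON.stringify({ validatedTranscriptKey: event.detail.object.key }),
    }),
  );
};
```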

We want to go as simple as possible in order to minimize problems. Because now, if I want to add a new step, I need to build all these things on the side of my state machine, I need to test them, I need to make sure that everything is working, and it's hard. We don't want to have hard things, because we need to maintain them.

Maintenance: as Bernard was saying today, building the code is this much, and maintaining and operating it is this much. So we want to keep things as simple as possible. Adding a new step in my simplified process is as simple as having a state machine react to an existing event. This is the simplicity we want; we don't want to create a lot of cognitive load for our developers.

So here you have seen how orchestration and choreography work together: how we can use events to connect microservices and tasks together, how we can use workflows to coordinate tasks with strict ordering, which is something we need whenever we want to do things in a certain order, and how simple it is to add new functionality with events.

Now let's go to the generative AI aspect of this talk. I mentioned Bedrock a million times during this presentation, and you might all be familiar with it, because nobody looked at me like, what is she talking about? So that's good. But Bedrock for me was a revelation, because again, I'm not a machine learning expert, I'm a developer, and it opened the door for me to build generative AI applications through an API. This slide is already outdated, because we have announced so many things since.

I got access to a lot of foundation models just through an API: the Amazon ones and the ones built by our partners and startups, and I could pick the right one to solve my problem. I was using Stable Diffusion to generate the image, and I was using Jurassic-2 to create all the assets for my videos, and things like that. So you can pick the right model for the right problem. And if you have a team that can fine-tune a model, it's as simple as pointing Bedrock at an S3 bucket with your data and letting Bedrock do the work of fine-tuning, so it starts giving results that are specific to your use case.

So let's go to our state machine. I'll show you that we have a state machine that uses a Lambda function. I'm sorry to tell you that this slide is already outdated, because on Monday or Sunday we released the connection from Step Functions to Bedrock, so you don't need a Lambda function to do this anymore. This happens every re:Invent: half of my slides go out of date very fast. But that's good, because now you can basically drop in the Bedrock component pointing to the model and boom, you are ready to go. But I didn't want to remove this slide, because there is also one interesting thing that happened when I was building this Lambda function.

If you want to call Bedrock from Lambda, the only thing you need to do is grant permissions. Grant permissions on the right action, so you can be specific and say bedrock:InvokeModel, not like I'm doing here with a star, and grant permissions on the right resource, which is the specific model you are using. So you give your function permissions for the right API on the right model, and then you can write the handler of this function.

I will tell you a secret, and it will not be a secret anymore because we are being recorded, so it will not stay between us: this function was written by CodeWhisperer. When Bedrock was announced, for the first few weeks, or I would say days, it was only available with Python. I'm a Node developer, and I was like, I need to try this, I need to change my demo right away. So I went to CodeWhisperer and said, hey, can you help me? I need to write some JSON, receive something, and write a function. I started giving prompts, and CodeWhisperer basically helped me build this thing without me knowing Python, and I was blown away. So I use it a lot.

The first bit here is the prompt. The prompt is something I iterated on a little bit, and it grabs things from the event that the function receives: basically it receives the transcription and the translated file, and then it creates the asset. So the prompt is there. Then we have these four parameters, and if you're a developer like me, you will be looking at those four parameters wondering what they mean. I will tell you something: Bedrock is great because you can go to the Bedrock console, to the playground, write your prompt, and fine-tune it there, so you don't need to iterate a million times in code. You can change your inference configuration, and the console gives you more information about each of the configuration options. For example, an important one for me is the maximum completion length, which is the number of tokens the model will return; if I have a very long file, maybe I want a longer result. Then you can make sure your call is right, select the API request view, and it will return a JSON with the four parameters configured exactly as you need them.

One thing to note is that every model has a different configuration, so it's important to always check that the configuration you have for one model is valid for the next one, because it often won't be; always check this API request. That's how I created mine. Then you call the model. In this case, we are using Bedrock InvokeModel to invoke the model and get a response. You can also do response streaming, not from Step Functions, but if you are using Lambda you can stream the response back to your client so it can be more dynamic; there are demos in the resources. Then you get the response and send it back to the state machine.
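As a rough illustration, here is a minimal Lambda handler calling Bedrock InvokeModel, written in TypeScript rather than the Python version from the talk, and assuming the Jurassic-2 model mentioned earlier. The prompt, the fields in `event`, and the parameter values are illustrative; the body shape is model-specific, which is exactly why copying it from the playground's API request view is useful.

```typescript
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({});

export const handler = async (event: { transcript: string; translation: string }) => {
  // Prompt built from the data the state machine passes in.
  const prompt =
    `Write a YouTube description, title and tags for this video.\n\n` +
    `Transcript:\n${event.transcript}\n\nSpanish translation:\n${event.translation}`;

  const response = await client.send(
    new InvokeModelCommand({
      modelId: "ai21.j2-ultra-v1",
      contentType: "application/json",
      accept: "application/json",
      // Model-specific body: these four parameters come from the playground.
      body: JSON.stringify({
        prompt,
        maxTokens: 1024,
        temperature: 0.7,
        topP: 0.9,
      }),
    }),
  );

  // The response body is also model-specific JSON.
  const result = JSON.parse(new TextDecoder().decode(response.body));
  return { assets: result };
};
```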

If you want to see this in action, we have Serverless Video. Serverless Video is a demo that my team built in order to live stream video. Right now we are live streaming from re:Invent. It's totally built with serverless architectures, so if you open it, you will see people broadcasting right now, and you can also watch the on-demand videos. There is a lot of generative AI going on in Serverless Video, and it's a cool architecture built on plugins.

So this is the architecture, and now that you have learned about events, you may be a little more familiar with that core that connects our microservices: all the microservices are connected through events. And the cool part for this presentation is the plugins down there. We have a group of amazing solutions architects, and we asked them to build plugins for this application. All of them came up with different solutions that reacted to different events, which we could later integrate into our solution. They built plugins mostly using AI.

The first plugin somebody built transcribed the on-demand video when it was processed, and you can see that the solution they built is very similar to the one I had already implemented. The second plugin somebody built used Bedrock to generate titles, tags, and so on. By now, we have a really long list of plugins.

If you go to Serverless Video, you can click on any of the videos there to watch on demand, and you can click a button that says "how this video was processed". There we explain how we do it, and you will see a lot of state machines. For example, one really good use case: for video processing, if you have a longer video, you cannot use a Lambda function, because Lambda functions time out at 15 minutes. So if somebody decides to stream for 45 minutes or one hour, instead of processing the video with Lambda, we have a state machine that says: oh, this is a longer video, this needs to go to ECS; if it's a shorter video, it goes to Lambda. And it's the same container image, because that's the amazing thing about these two: they work very well together. Then, after the video is processed and converted into MP4, we can apply all the plugins that the amazing solutions architects built for us.
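A hedged sketch of that routing idea: a Choice state that sends short recordings to a Lambda task and longer ones to an ECS task. The duration threshold, field names, cluster and function names are placeholders, not the Serverless Video team's actual definition, and the ECS network configuration is omitted for brevity.

```typescript
const routeByDuration = {
  StartAt: "CheckDuration",
  States: {
    CheckDuration: {
      Type: "Choice",
      Choices: [
        {
          // Short enough to stay well under the 15-minute Lambda limit.
          Variable: "$.durationSeconds",
          NumericLessThan: 600,
          Next: "ProcessWithLambda",
        },
      ],
      Default: "ProcessWithEcs",
    },
    ProcessWithLambda: {
      Type: "Task",
      Resource: "arn:aws:states:::lambda:invoke",
      Parameters: {
        FunctionName: "process-video", // placeholder function name
        "Payload.$": "$",
      },
      End: true,
    },
    ProcessWithEcs: {
      Type: "Task",
      // .sync makes Step Functions wait until the ECS task finishes.
      Resource: "arn:aws:states:::ecs:runTask.sync",
      Parameters: {
        Cluster: "video-processing",     // placeholder cluster name
        TaskDefinition: "process-video", // same container image as the Lambda
        LaunchType: "FARGATE",
      },
      End: true,
    },
  },
};
```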

So we moderate the content using Rekognition: if somebody shows something inappropriate, violence or nudity or something that is not right, we will not allow that video to be played on demand. Luckily, our testers are quite nice and that has never happened. Then we transcribe those videos, because you saw the power of transcription: with the text of the video you can do so many things. Then we generate the titles, then we generate the tags, then we build a leaderboard, and all of this happens inside a state machine.

If you want to learn more, we have these builder sessions going on. I hope there is one today where we go into more detail on Powertools for AWS Lambda for video streaming applications, so you can check it out.

So let's wrap up. Don't panic: everything is an endpoint. There's still a lot of work for us, and more than ever you need to call these endpoints from your applications; that's what we have been doing forever. Use AI services, embrace them. We always want to build our applications and let someone else do the parts that are not our expertise, whether that's our ML teams, AWS, or startups that are dedicated to this. Embrace Bedrock and the AI services.

Orchestration is very useful for coordinating tasks, and this doesn't apply only to AI applications. But you learned today that to build an AI application that can do many things, you usually need to combine multiple services, transform the data, and stitch it into the next service in order to get a result. Step Functions is a great service: integrations with AWS services, integrations with HTTP endpoints so you can call your third parties and your startups, support for asynchronous calls and manual validations (with the same pattern I showed you, you can also integrate with legacy systems or other systems, not only with manual processes), and intrinsic functions to do simple data manipulations.

Event-driven choreography is great for reacting to events in your environment: it's seven o'clock, somebody uploaded a file, a new video is ready to be processed. EventBridge is a great service to have in your toolkit. Scheduler is very good for reacting to time. Pipes are great for integrating different services that produce events, and we use them all the time. And event buses help us fan out and create rules to move those events forward.

And as I promised, here are all the resources. You will find a deep dive on Step Functions and on EventBridge. You will find the code for the applications I showed you today. You will find the launch announcements for all the things I told you we announced this week that are relevant to this. Everything is here. And if you are not a Node developer, you might think, oh, a lot of these things are based on Node, but don't worry: check the theoretical part and then jump to Serverless Land and find the solution, the pattern, in the right infrastructure-as-code framework and the right language for you. Among our 600 patterns, you will find the exact pattern you're looking for: you want to build state machines with CDK, you want to do EventBridge with Terraform, Pulumi, or whatever, everything is there.

That's it for me. We have about ten minutes, so I can take some questions.
