Improve FMs with Amazon SageMaker human-in-the-loop capabilities

My name is Romi Datta. I am a product manager on SageMaker, and today I'm going to be talking about how to improve foundation models with human-in-the-loop capabilities, together with my colleague Amanda Lester and with Ketaki, my friend and the CTO of Crikey AI.

We thought about going through the whole agenda and introductions ourselves, but the one recurring theme we have probably all seen throughout the conference is generative AI, and foundation models are at the core of generative AI. So we said, why don't we let a foundation model do that for us?

"So, hi, I'm Romi, the product management, growth, and business operations leader for AWS human-in-the-loop AI/ML services. Today, I'm going to talk to you about some of the challenges in building, operationalizing, and using foundation models, the need for human-in-the-loop AI/ML services to solve those challenges, and how AWS can help with SageMaker Ground Truth."

"Hello, everyone. My name is Amanda Lester, Senior Business Development Lead at AWS. I'm excited to show you a live demo of the latest enhancements for generative AI use cases."

"Hello, I'm Ketaki, the Chief Technology Officer at Crikey AI. Today we'll show you how we used Amazon SageMaker Ground Truth to train our AI animation foundation model." Wasn't that cool?

For the rest of this talk, you are going to learn how these videos were generated within a matter of minutes using the Crikey.ai tool and SageMaker Ground Truth's human-in-the-loop capabilities.

So for today's conversation, we just went through the agenda. We're going to talk about human-in-the-loop for machine learning, the problems there, and how SageMaker Ground Truth can help you solve them.

Amanda is going to show you a demonstration of some of the key capabilities, then we'll hear a customer success story with Crikey, and then we'll wrap up.

When we talk about human-in-the-loop, it's not really a new thing, especially in the field of AI. Traditionally, you have needed human-in-the-loop capabilities for labeling data, specifically unstructured data, for AI. And in the world of foundation models, you need human judgment to assess the models, and you need humans in the loop for data collection and data generation for supervised fine-tuning and all the other forms of fine-tuning, including what is called instruction tuning.

Fundamentally, the problem looks somewhat like this. This is a traditional deep learning model, and it's interesting that we call it traditional today; it's probably just a few years old, but that's how fast the field of AI moves.

You would take your input data, raw data, which might be images, text, something like that. You would do some labeling, which might mean putting a bounding box around a car, a stop sign, or some other object. You would take the labeled data, train a model, deploy it, and assess how that model is performing. You would have humans readjust some of the labels, add labels, and edit labels, then fine-tune the model again, and this would go on in a loop until your model reached a production-grade state and you could deploy it for inference in production applications. That process still exists, and it continues to be time-consuming, expensive, and human-intensive, because, of course, it is human in the loop.
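To make the shape of that loop concrete, here is a minimal, purely illustrative sketch in Python; every function is a stand-in for a human or training step, not a real AWS API.

```python
# Illustrative sketch (not AWS code) of the iterative human-in-the-loop cycle:
# label -> train -> evaluate -> humans correct labels -> fine-tune again,
# until the model reaches a production-grade bar.

PRODUCTION_BAR = 0.95  # assumed accuracy target for this sketch


def human_label(items, model=None):
    """Stand-in for human annotators drawing bounding boxes or fixing labels."""
    return [{"item": i, "label": "placeholder"} for i in items]


def train(model, labeled_data):
    """Stand-in for a training or fine-tuning step."""
    return {"weights": "updated", "examples_seen": len(labeled_data)}


def evaluate(model):
    """Stand-in for measuring accuracy on a held-out set."""
    return 0.97  # pretend the latest round pushed us past the bar


raw_data = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]
model, accuracy = None, 0.0

while accuracy < PRODUCTION_BAR:
    labeled = human_label(raw_data, model)  # humans add or correct labels
    model = train(model, labeled)           # (re)train on the corrected data
    accuracy = evaluate(model)              # decide whether to loop again

print(f"Ready to deploy for inference: accuracy = {accuracy}")
```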

Now, the advent of foundation models changed some things. Foundation models differ fundamentally from traditional deep learning models. Traditional deep learning models are trained on labeled or annotated data; you train the model and use it for one particular task. Foundation models are based on the transformer architecture and are generally very large. Large language models, one of the key types of foundation models, are on the order of tens to hundreds of billions of parameters, and in some cases above a trillion; video models are probably a little smaller. These models are trained on a large corpus of unlabeled data. That data might be internet data, or it might be data from your corporate databases.

Once a model is trained on this data, it has context to answer questions about other areas or subjects. You interact with these foundation models by putting in a prompt, a text input, and the model gives you a certain kind of response for a particular task.

So that's great, right? All this unlabeled data, you just take the data and go to town with it. We're done with the whole human-in-the-loop thing, no need for labeling or annotating data. Well, no, not so fast.

It turns out that foundation models require a much more holistic set of human-in-the-loop, or HITL, capabilities.

The process of training a model on a lot of text, images, and videos, whether from the internet or from your corporate databases, is what scientists call pre-training a base model. Once you have pre-trained the base model, yes, it has a lot of context, and it will answer questions about other things with that context. But first you want to figure out whether that model is answering accurately.

So you would evaluate the model, and that evaluation may not just be about accuracy but about other dimensions, which we will talk about. After evaluating the model, you will also go through a process of fine-tuning it, and that fine-tuning, or customization, can be done in multiple steps.

In one step, you would take human-generated data for supervised fine-tuning; in scientific language, this is called demonstration data. In some cases, for language models, you would also do another kind of fine-tuning: reinforcement learning with human feedback. That means you apply inputs, rank the responses on particular dimensions like toxicity and accuracy, and take that ranked output to fine-tune the model again.

That kind of data, used for reinforcement learning with human feedback, or RLHF, is called preference data. This model customization process goes on iteratively in a loop until your model is ready to be used for your particular business use case.
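As a rough illustration of what a single preference-data record might capture, here is a hypothetical example; the field names are made up for this sketch and are not a Ground Truth output schema.

```python
# Hypothetical preference-data record for RLHF: one prompt, several model
# responses, and a human ranking (first = most preferred). Illustrative only.
preference_record = {
    "prompt": "Summarize our Q3 advertising results in two sentences.",
    "responses": [
        {"id": "r1", "text": "Q3 revenue grew 12%, driven by stronger video campaigns."},
        {"id": "r2", "text": "Revenue was up; many factors contributed, including ..."},
    ],
    "ranking": ["r1", "r2"],                 # human-preferred order
    "rated_dimensions": ["helpfulness", "toxicity", "accuracy"],
}
```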

Some of the challenges we have seen: inaccuracies, especially if you take a model off the shelf and start using it, expecting it to correctly answer questions about your customized use case, which may be in financial services or some other domain.

Another one: the model spews out toxic content, or content that magnifies stereotypes and biases. And finally, something very interesting about foundation models is that they sometimes have a tendency to hallucinate. Hallucinating means the model gives you a very confident answer to a prompt or input question, an answer that may have no connection to the question asked.

These are some of the problems we have been hearing from our customers, and the ways human-in-the-loop can address them, the human-in-the-loop services for foundation models, fall under four broad categories.

You would use demonstration data to do supervised fine-tuning of models for your particular use case. The types of demonstration data vary: for language models, it could be question-and-answer pairs generated from documents; for images and videos, like the ones you will see from Crikey later in this talk, it would be captions that explain what is going on. Then there is preference data. This is where you apply the prompts, collect multiple responses per prompt, rank the responses in order of toxicity, accuracy, helpfulness, or any other dimensions you might choose (some of our customers choose diversity), and use that information to fine-tune the model.

In some cases, you would actually use this preference-ranking data to fine-tune another, smaller model, a reward model, which in turn could do the preference ranking for the main model. That's another approach to RLHF. Next, model evaluation.

Yes, there is a fair amount of automated evaluation capability. But for a lot of subjective criteria, like tone and brand voice, you still need a human in the loop, and even for accuracy you would initially need to figure out the original ground truth, the correct answer, before you try to do some kind of automated evaluation. A subset of evaluation that deserves its own category is red teaming. This is about stress testing a model to find vulnerabilities: feeding it toxic prompts, trying to push it toward toxic or biased outputs, and seeing how it reacts. The idea is to stress test the model before you put it into the deployed business use case, so it doesn't behave that way in production.

All of these require human-in-the-loop at scale, and this is where Amazon SageMaker Ground Truth comes in. It provides you the most comprehensive set of human-in-the-loop capabilities to improve the accuracy of your models.

You would start with model evaluation. We launched this capability yesterday, both in Amazon Bedrock and in Amazon SageMaker, enabling you to run model evaluation workflows, automated as well as human, at scale. Ground Truth also provides you workflows to collect data that can help you customize your models, either with supervised fine-tuning or RLHF.

But we don't just provide you workflows. There might be times when you need to really scale up a workforce to do the data annotation, and we provide expert workforces with a range of expertise, which could be medical, legal, school teachers, and so on. We can scale up these workforces to do data annotation, generate data for you, or evaluate models at scale.

And when we provide workflows and workforces, it's an outcome-driven offering, and we provide you quality and service SLAs for those outcomes.

Some of the key areas where we have been working with our customers are around providing data for customization of their models. We have helped them, and continue to help them, customize foundation models with human-in-the-loop capabilities by creating demonstration data. Some of these customers choose to use their own workforce, and others choose to use our workforce along with our workflows.

These have been around document summarization and intuitive workflows that enable question-and-answer generation for documents, where you can write a quick Q&A and also indicate which part of the passage or document the question and answer come from. There are also intuitive capabilities for at-scale labeling of images and videos, where you can answer guided questions such as: what is the setting of this video, what is this video about? Once you fill all of that in, writing your caption for the video is very easy. And we can do this at scale, on the order of labeling hundreds of thousands of videos in a matter of weeks.

For preference data, we provide workflows to take your prompts and responses, evaluate them, and rank them, whether that is an ordinal ranking on a scale or something else.

The next thing is evaluation of foundation models. Again, we provide the capabilities to do both automated and human evaluation, both in Bedrock and in SageMaker. And if you want to bring your own model and work with us to evaluate it, we provide that capability too.

We give you insights into key metrics, robustness, toxicity, and accuracy, through automated evaluation. But we understand that human evaluation will always be a part of foundation model evaluation, at least for the immediate foreseeable future. So we provide you intuitive interfaces to prompt the foundation models at scale and evaluate the responses, with multiple options to rate a response: thumbs up or thumbs down, an ordinal ranking on a scale, and so on. Once you farm this out to your own private workforce or an external workforce, they can do the evaluations without requiring an AWS ID; they can use their own regular emails. And then you get a dashboard where you can see the results of those evaluations.
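To give a sense of the kind of signal a human evaluation like this collects, here is a hypothetical record; the fields are illustrative only, not the actual dashboard or output format.

```python
# Hypothetical shape of a single human-evaluation result. Field names and the
# model identifier are placeholders, not a Ground Truth or Bedrock schema.
evaluation_result = {
    "prompt": "Explain our refund policy to a customer in one paragraph.",
    "model": "my-candidate-model-v2",            # placeholder model identifier
    "response": "You can request a refund within 30 days ...",
    "ratings": {
        "helpfulness": 4,      # ordinal scale, e.g. 1-5
        "clarity": 5,
        "thumbs_up": True,     # simple binary signal
    },
    "evaluator": "worker@example.com",           # workers sign in with email, no AWS ID
}
```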

We can provide you workforces and workflows to conduct red-teaming analysis, and the model reports are available to you afterward in intuitive dashboards as well as in your S3 bucket for future use, because model evaluation is not a one-and-done thing; you have to do all of this periodically.

You would always compare against your past model evaluations. Even if you are using, say, model X or model Y and have decided to stick with it, there will be updates to that model, and before you start using an updated version, you would run an evaluation of the past version and the current version with your own prompt datasets and your own business use cases.

And finally, it's not just about workflows. As we mentioned, with the expert workforce, the mode of operation is that you engage with us, we assign a program manager, and it's a full-service offering.

They take the requirements from you, get sign-off on what good quality means for your use case, and then manage the workforce and deliver your data annotations to you, whether that's Q&A pairs, preference rankings, or image captions, with quality SLAs and delivery SLAs. We can deliver on a weekly basis, and for some customers we deliver on a daily basis as required. But we understand that for a lot of use cases that are proprietary, confidential, or require very custom subject matter expertise, you may want to use a privately managed workforce, and we provide you that option too. We also have over a dozen vendors we work with whom you can select for your tasks. And finally, you can always crowdsource some of your tasks if required with Amazon Mechanical Turk.

The question comes: why SageMaker Ground Truth? What's the difference? The difference fundamentally comes down to three S's that deliver you quality data at scale. The first is scale: whether it's labeling with workflows or workforces, we enable you to scale up very easily. We can scale up to thousands or tens of thousands of data annotators with different kinds of skill sets. When we talk about skill sets, these are folks who are already doing data annotation, evaluation, or question-and-answer generation for AI/ML use cases; they work with our external customers as well as internal AI/ML products like Textract or Alexa. And then, for a full-service engagement where it's workflow plus workforce, we have our science team work collaboratively with you, advise you on the right data annotation strategies, and help solve your problem, because this is all pretty new, and at this point we are all learning a lot every day from our customers and from our internal use cases.

So with that, I'm going to pass over to Amanda to walk you through a demonstration of some of these key capabilities in SageMaker Ground Truth.

All right, thanks, Romi. Today I'm going to take you through a live demo of how you can create demonstration data and preference-ranking data for a variety of use cases, including documents; text ranking, where we take prompts to a model, look at the responses, and rank those responses according to human preferences; and then I'm also going to show you how you can leverage SageMaker Ground Truth to caption images and videos.

I'm going to play the role of a data scientist at an advertising company. At my advertising company we have a lot of different types of assets that are really important for our business. I've already created a number of labeling jobs with Amazon SageMaker Ground Truth, and I'm going to show you what that looks like.

I'm going to switch over now to my screen. I'm in the AWS console, and I have already created an Amazon SageMaker Ground Truth labeling job. Within this job, I have provided a path to the Amazon S3 location where I have stored all of the images and assets that are important for my advertising business, including the documents I've already pre-populated, the images, a number of videos, and the model responses I've already put in there. I've also set the Amazon S3 output location where I would like all of the annotated information collected from our workforce to go, so that I can then use that information to fine-tune and train my model.
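For readers who prefer the SDK over the console, a labeling job like the one described here could in principle be created with the boto3 CreateLabelingJob API; the sketch below is hedged, and every ARN, bucket path, and UI template is a placeholder rather than a value from the demo.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Hedged sketch of the CreateLabelingJob call behind a console-configured job.
# All ARNs, bucket paths, and template URIs below are placeholders.
sagemaker.create_labeling_job(
    LabelingJobName="ad-assets-text-ranking",
    LabelAttributeName="ranking",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                # Manifest listing the prompts/responses, documents, images, videos
                "ManifestS3Uri": "s3://my-ad-company-bucket/input/manifest.json"
            }
        }
    },
    OutputConfig={
        # Where the human annotations land for later fine-tuning
        "S3OutputPath": "s3://my-ad-company-bucket/output/"
    },
    RoleArn="arn:aws:iam::123456789012:role/MyGroundTruthRole",
    HumanTaskConfig={
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/my-team",
        "UiConfig": {
            # Worker UI template, e.g. a generative AI text-ranking template
            "UiTemplateS3Uri": "s3://my-ad-company-bucket/templates/text-ranking.liquid.html"
        },
        "PreHumanTaskLambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:pre-task",
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:consolidate"
        },
        "TaskTitle": "Rank model responses by conciseness",
        "TaskDescription": "Rank each response from most to least concise.",
        "NumberOfHumanWorkersPerDataObject": 1,
        "TaskTimeLimitInSeconds": 600,
    },
)
```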

Within the Amazon SageMaker Ground Truth application, there's a new generative AI template that you can select, and within this template there's a new ability to do text ranking. This is for reinforcement learning from human feedback, and this is the workflow I'm going to set up to view the responses from the model and rank them accordingly. I also have a new option for question-and-answer pairs, where I can upload a number of documents and then create and generate questions and the appropriate answers I want, to teach the model how to respond appropriately.

I've already set a lot of this up, so I'm going to show you what it looks like from a labeler's perspective. I'm no longer in the AWS console; I'm now in the interface as a labeler. What I can see here is two jobs that are already set up: one for text ranking and one for questions and answers. I'm going to start working on the text ranking task, and we'll see if it loads on this screen. In terms of my advertising company, we have a number of different prompts that our sales team is going to input into the large language model we've already selected. What I want to make sure of is that the responses to these prompts aren't overly verbose; it's really important that we keep them clear for the executive audience we're going to be speaking to.

I've already generated a number of different responses from the model, and I'm going to rank them in order of preference, in terms of which response is most clear and least verbose. So I'm going to rank this one as number one because it's the most concise, this one as number two because it's very similar, and three and four I think are about the same, so I'll pick this one next because it's a little less detailed. I'm now going to submit that, and this information will then be used to fine-tune and train my model, teaching it really good examples of how to respond in terms of conciseness, and giving it examples of responses that are aligned with the preferences of our business.

The next thing I'm going to show you is an overview of how to do this with questions and answers. I've already updated the Amazon S3 bucket with a number of documents about travel and sports, and we're going to teach the model how to respond appropriately to the key information that's important in these types of documents. A lot of these foundation models are trained on open-source datasets, but they don't necessarily know about our business and the documents we have internally in these areas.

I have a document here about Fresno County. On the left-hand side, I have a number of instructions that I've given both myself and the workforce who are going to review this. I'm now going to create a number of questions around this particular document that would be most common for somebody reviewing it. The document says Fresno is the largest city in the Central Valley. So: what is the largest city in the Central Valley? The answer is Fresno, and I'm going to reference where that answer is found within this body of text, giving the model the ability to understand where within the passage the answer comes from. What's important about this workflow is that it's basically teaching the model where to find the appropriate information and providing a citation for it. That way, the next time somebody prompts the model for information about this document or about Fresno County, the model can respond with the appropriate area within the text where the answer is found. This will be really useful, and I'm now going to submit that.
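A single question-and-answer demonstration record of the kind being collected here might look roughly like the following; the field names and offsets are hypothetical, not an official Ground Truth output schema.

```python
# Hypothetical demonstration-data record for supervised fine-tuning: a question,
# its answer, and a citation back to the source passage. Illustrative only.
qa_record = {
    "document_id": "fresno-county-overview.pdf",
    "question": "What is the largest city in the Central Valley?",
    "answer": "Fresno",
    "citation": {
        # character offsets of the supporting sentence in the document text
        "start": 1042,
        "end": 1093,
        "text": "Fresno is the largest city in the Central Valley.",
    },
}
```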

The next use case I'm going to show you is around creating advertisements and campaigns for our advertising company. We run a number of different advertising campaigns, and when the advertising executives at my organization are creating these detailed advertisements using a generative model, it's really important that I give the model examples of the types of detailed imagery and information I want it to produce.

What I'm going to do here is use the tool to describe the key areas within this image that I want to call out, and provide a detailed caption of what it looks like. There is a woman in a white dress in the center of the ballroom, there are two other women in orange dresses, the floor is extremely glossy, and everyone is looking out the window. What's really important here is that when our advertising executives go in to create these campaigns and write detailed examples of what they want the generative model to create, I want to give the model really good golden-dataset examples and teach it the type of detailed information I'm looking for. Then I'm going to use this to fine-tune and train a reward model to output similar kinds of examples.

The next use case I'm going to show you is specifically for videos. In this advertising company we have video footage that we want to be able to create, so I'm going to watch this video and then describe specifically what's occurring in terms of its activity. We'll watch this really quickly. OK, it looks like in this video there is a man showing somebody how to swing a golf club, so I'm going to type out the keywords here: swinging a golf club and instructing others to do the same. Then, in terms of the primary actions occurring, it looks like he's pulling the golf club back, that's one; he's hitting and striking the golf ball, that's two; and then there's a follow-through, that's three. So there are three actions occurring in this video, and it's a valid video for our sports campaign use cases. I'm going to submit that.
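Put together, a video-captioning annotation of the sort just described might be stored as something like the record below; the field names are illustrative only, not an official Ground Truth output format.

```python
# Hypothetical video-captioning record: a rich caption plus the primary actions
# and a campaign tag, of the kind collected in this demo.
video_caption_record = {
    "video_s3_uri": "s3://my-ad-company-bucket/videos/golf_lesson_014.mp4",
    "caption": (
        "A man demonstrates how to swing a golf club while instructing "
        "another person to do the same."
    ),
    "primary_actions": [
        "pulls the golf club back",
        "strikes the golf ball",
        "follows through on the swing",
    ],
    "campaign": "sports",
    "valid_for_training": True,
}
```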

So, in terms of what I've shown you today: I've shown you how you can use Amazon SageMaker Ground Truth to create and rank responses from models to align them with human preferences, which I can then use for reinforcement learning with human feedback. I've shown you how you can leverage the question-and-answer interface to create Q&A pairs that teach the model how to answer questions in the manner that's important for your business. And I've also shown you how you can easily caption images and videos, so you can teach the model how to respond accordingly. I'm now going to turn it over to Ketaki so she can show you how her team leveraged this for their generative use cases.

Thank you, Amanda, and thank you, Romi, as well. I first want to start by introducing myself. I'm Ketaki, the Chief Technology Officer at Crikey. My sister Jhanvi, who's my co-founder, is also here. This is actually our seventh year of building our business. In short, we're an AI tooling company, direct to consumer and enterprise, and we enable people to turn text or video into 3D humanoid animation. Think about all the different ways you might use those animations; we'll cover that in today's presentation, and we'll also talk about how we trained our foundation model for text-to-animation in partnership with the Ground Truth team, and walk through how we used Amazon SageMaker Ground Truth to launch our 3D AI animation foundation models.

Cool, there we go. First I want to talk about why animation is exciting and why we chose to build our company around 3D animation. Most of us have probably seen an animated movie at some point in our lives, or at least heard of animated movies. Historically, the field of animation has been fully manually driven. If we go 100 years back to the founding of the Walt Disney Company, animations were hand drawn; we later added color, and now we have a lot of tools that support the creation of 3D animation, which is extremely complex and requires years of training. In the last five to seven years, we've actually seen a lot of use cases for 3D animation outside of entertainment, and we'll cover some of those today as well. So with this in mind, we really wanted to empower anybody to animate. Probably most of us here are not animators, but just a quick question: is anyone here an animator? Please raise your hand. I'm not an animator. That's what I thought.

So there are no animators here. It's a very niche field; very few people are trained as 3D animators. So we wanted to build a tool that would empower anyone to animate.

With our foundation model, you can generate a high-fidelity 3D animation in five minutes or less. We also have a web-based 3D no-code editor where you can composite and put together a full-fledged 3D project with avatars that speak in any language, accent, or dialect. You can also customize the look of these avatars, add customized 3D backgrounds and as many animations as you'd like, and more.

The goal is that, much like YouTube democratized the way people could upload video content to the internet, we at Crikey are interested in democratizing who can produce high-fidelity 3D content. We had three goals when we started this process and set out to build our foundation model, which is really the core of our business.

The first is that we wanted people to be able to create high-quality animations without any prior technical knowledge.

We also needed, as a small startup, to reduce the cost and time to prepare machine learning training data. This was covered by Romi and Amanda as well, but it is quite complicated to find high-quality data, clean it, label it, and then subsequently train your model.

And we were also getting a lot of inbound requests from people who wanted us to produce animated projects for them. Quite simply, we needed to meet these requests in a timely fashion and make sure we could address the customer demand.

So just to recap, there are three key challenges.

The first is that manually labeling animation data is really hard; it takes a long time. Amanda showed a demo of what this process looks like. We train our model against video data, and while one video clip can actually contain several actions, for every single action we need a label, and it can't be a short label; it has to be very descriptive, very rich. Only then will we get good results from our model.

The reason is that if you have a very short label, let's say "jumping jacks" for a video of someone doing jumping jacks, that's not going to give great results for your end consumer, because it means there's a very narrow set of keywords the customer has to type in to generate a 3D animation. So we're really trying to produce high-quality labels that are one to two sentences long and give a sense of all the different ways this 3D motion can be interpreted and described.
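As a purely hypothetical contrast between a short label and the richer, descriptive style described here (neither string is from Crikey's actual dataset):

```python
# Hypothetical short label vs. the richer, one-to-two-sentence label style
# that gives a text-to-animation model more ways to match a user's prompt.
short_label = "jumping jacks"

rich_label = (
    "A person stands upright, then repeatedly jumps while spreading their legs "
    "apart and raising both arms overhead, returning to a neutral standing pose "
    "between repetitions, as in a jumping-jack warm-up exercise."
)
```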

So it's really hard to do, because it takes time.

It's also difficult to do at scale; we had hundreds of thousands of data points that needed labeling.

Initially, our team was actually doing this ourselves. Before we met the Ground Truth team, I and a few of our other team members spent several hours a day trying to label this dataset, and we quickly realized that we would never be able to label every single motion in our dataset and do all the other tasks in our day jobs at the same time. That was when we decided to start working with the Ground Truth team.

We do more than just Ground Truth; our entire offering is built on top of AWS. We use S3, EKS, RDS, ElastiCache, EC2, Amplify, and more. I'm happy to talk more about this if anyone has questions. We love building on AWS; it's been a game changer for us, and we're able to serve customers in pretty much any geography with our consumer tool.

We also have an enterprise offering that's going live at the end of Q1, where customers can actually deploy our foundation models via SageMaker JumpStart and also deploy our no-code 3D editor and self-host it all inside their own AWS account, triggered by just a single Terraform script for a one-click deployment.
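Here is a hedged sketch of what a JumpStart-based deployment into your own account could look like using the SageMaker Python SDK; the model ID, instance type, and payload format are placeholders, not Crikey's actual listing.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Hedged sketch of deploying a JumpStart-listed model into your own account.
# The model_id is a placeholder; the real listing details are not in this talk.
model = JumpStartModel(model_id="example-crikey-text-to-animation")

# Stands up a SageMaker real-time endpoint in your account; the instance type
# is an assumption for illustration only.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# Invoke the endpoint with a text prompt (payload format is hypothetical).
response = predictor.predict({"prompt": "a person doing jumping jacks"})
print(response)
```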

And again, that's all possible because we've been able to use all of these different AWS services to run our offering.

I also wanted to talk a little bit about use cases. As we said at the beginning, historically animation was primarily used for entertainment, but in the last few years we've actually seen quite a number of use cases across gaming, healthcare, sports, learning, and the metaverse. And because our tool is so accessible and so quick at generating high-fidelity animations, we see a lot of interesting opportunities across many of these use cases.

That's a little bit quiet, but that video was generated inside our tool. It's a skeleton avatar that was customized by someone, doing an AI animation of jumping jacks, and the skeleton is speaking; this is an example of an education use case. This piece of content was created in less than a minute, so it's extremely fast to create and high fidelity as well.

Our goal in working with SageMaker Ground Truth is to really do this at scale: we're building the biggest 3D motion dataset in the world.

We've actually already achieved this goal. As of this talk, we have the biggest collection of human motions in our training dataset, all of which now have high-quality labels. We've already trained our foundation model and deployed it to our consumer offering, and we couldn't have done this so quickly without the help of the Ground Truth team, because we needed high-quality, rich, descriptive labels very quickly in order to release our model to our customers.

Just a quick overview of the solution: we had an interface built by the Ground Truth team, which Amanda showed you. We submitted our videos directly via the Ground Truth interface, the team would label them, and we would have a weekly call where we went over a subset of the labels and made sure they were of the quality we wanted.

What was great for us as a small team was really the depth of the partnership. We were able to have weekly check-ins where we could talk about what was and wasn't working with the labels. For us in particular, because we had such a diverse range of videos in our dataset, it was important to constantly be updating the bar for what counts as a good label.

I'll give an example of this. The first 10,000 videos we submitted were all sports videos, and there's a certain type of label you need to describe a sports action accurately. The second set of 10,000 videos were all dance videos, and there are so many different dance genres and ways a dance motion can be described. So the bar of quality for dance animation labeling and for sports animation labeling is actually different; there's a different set of criteria.

And because we were working with quite a big team on the Ground Truth side, we were able to give these quality-bar assessments once a week and update the guidelines as we went, and that helped us get the labeling we were looking for quite quickly.

We also offer lip-synced characters. All the videos you've been seeing are made using Crikey AI, so anyone can produce 3D talking avatars with AI animation and voice, and you can actually create custom AI brand voices from spoken audio. The voices we used today come by default with our tool, but you can upload your own voice and create a custom voice that sounds like you or like a brand spokesperson.

Because we've trained our model against such a wide variety of animations, your character can do whatever you'd like: any type of action you're thinking of, you can generate inside our tool. The character can also speak in any language, accent, or dialect, whether you're generating content for education, social media, or anything else.

That's another quick example of a piece of content made inside our tool; it took less than three minutes to generate, so it's very fast and high fidelity.

And because we now have such a large labeled animation dataset, we've actually been able to bring our AI animations to third-party tools earlier this quarter, which is very exciting.

For the first time, people using design tools who don't know anything about 3D content or how to produce 3D animation can now generate AI animations using a Crikey integration.

So it's been really exciting, and again, we couldn't have achieved this in the very short period of six months without high-fidelity labeled data, which was possible with the Ground Truth team.

Here's another quick example. That's a talking avatar speaking in Japanese, saying "I'm a travel guide." You can see the background is sort of a travel poster as well. This also took less than two minutes to generate inside our tool, and you don't have to have any technical knowledge to produce it; you can do it in two clicks.

So just to summarize where we've been and where we're going: when we started this process, we knew we wanted to democratize how people produce 3D content, in particular 3D animation, by using text and video inputs and generating high-fidelity 3D motion outputs as FBX files.

We couldn't do this on our own because we needed to label a huge training dataset of videos. In partnership with the Ground Truth team, we were able to bring our foundation model to market in just three months.

It saved us more than 1,000 hours of productivity, which as a small startup was extremely important to us, and we saved more than $200,000 in labeling fees and in our team's time, so we could focus on the other product tasks we needed to complete to get our product to market.

I just want to say thanks so much, everyone, for coming to our talk today. If you have any questions about Crikey AI or about SageMaker Ground Truth, please ask us afterwards or come up to us; we'd love to talk to you further. And I'll pass it back to Romi.

Great. I don't know about you all, but I am definitely going out there and making some animated movies. I don't know how great they will be, that depends on the creativity, but this makes it easy.

OK, so here are some next steps. You can scan this QR code, start a two-month trial, review the online resources, or get a demo.

We have some time now, about 15 minutes or so, for questions and answers, and we also request that you please fill out the session survey afterwards.

Thank you very much for your time.
