Sony Interactive Entertainment: Generative and predictive AI on AWS

Lisa Aguilar: Hello, everyone. Thank you so much. I'm Lisa Aguilar, and I'm going to be joined in just a second by Simon and Francisco from Sony Interactive Entertainment.

So a quick bit about DataRobot: DataRobot is the only pure-play, open AI lifecycle management platform for both generative and predictive AI. We've been focused on nothing but AI for the last 10 years. We have over 100,000 users who have used DataRobot for their AI initiatives, from well-known brands like Mitsubishi, which has used us to optimize their AI operations and processes, to retail organizations that you may shop at every single day, like 84.51° and Kroger, to some of the biggest banks around the world, who have standardized on the DataRobot AI platform for their governance and model compliance processes.

And it's because of our expertise both in AI and in applying it through our platform that AWS has recognized us as one of their technology partners of choice, specifically around the industry use cases that we can deliver.

So let me talk to you a little bit about the DataRobot AI platform. First and foremost, we're about helping you streamline and unify the entire lifecycle. We really don't care how many data sources you have, we don't care what API or AI tooling you want to use, and we don't really care where you want to put those AI models into production. It's really about helping you unify and standardize how you operate all of your AI assets, how you govern the work that you do, and then giving you the ability and the flexibility to build with agility.

So I'm going to give you a quick walkthrough of the DataRobot platform so you can take a look at it. I'm going to start off with our DataRobot documentation site. We've included a new generative AI experience in it. As you can see here, DataRobot is not focused on the ingredients; it's about helping you feel confident about pulling those ingredients together to give the right response to your users when they need it.

And we do that in a couple of different ways. First and foremost, we have some safety nets, what we call our gen AI guard models, that you can include with your answer. So right here we have an answer: the response has been graded as correct, along with why we believe it's correct. This signals to your user whether they should pay attention to this response or not.

Plus you get all of the citations in here. One of our customers, Keller Williams, is actually doing something very similar with the training experience for all their real estate agents. They're creating this kind of search experience, and each of these sources you can think of as a vector database for their video training or their interactive training, so agents get that same experience.

The other thing that DataRobot provides is a human feedback loop. So for this answer, you as a user can tell the generative AI solution whether you think it is correct, incorrect, or incomplete, or whether there was no answer.
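
To make the shape of that feedback concrete, here is a minimal sketch of the kind of record such a loop might capture. The field names and the FeedbackRating values are illustrative assumptions, not DataRobot's actual schema.

```python
# Illustrative only: the field names and enum values are assumptions,
# not DataRobot's actual feedback schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class FeedbackRating(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"
    INCOMPLETE = "incomplete"
    NO_ANSWER = "no_answer"


@dataclass
class AnswerFeedback:
    """One user rating tied back to the prompt/response pair that produced it."""
    prompt_id: str            # links the rating to the logged prompt and response
    rating: FeedbackRating
    user_id: str
    comment: str = ""
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# Example: a user flags a generated answer as incomplete.
feedback = AnswerFeedback(prompt_id="prompt-42",
                          rating=FeedbackRating.INCOMPLETE,
                          user_id="agent-007")
print(feedback)
```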

So how did we get here? First and foremost, it's about having a centralized view of everything that you have in production. This is the DataRobot AI console. Think of what Salesforce has done for CRM data: you have one single source of truth for all of your customer interactions.

Now, where do you go when you need to see all of your AI assets in production? It's very difficult, especially if they're built on different systems or deployed to different systems. Think of this as a 360-degree view. Again, we don't care where it was built, how it was built, or where it was deployed; it's your command and control center to give you exact visibility over service health, data drift, and everything else.

Let's dive into one of these models, say this Bedrock model right here. You're going to get a little bit of an overview on it and some basic service health information. But most importantly, again, we're focused on helping you solve the confidence problem with generative AI, not the ingredients.

How can you trust what it's going to be doing once it's in production? Right here we have a couple of custom performance metrics that you can see with DataRobot. You can see if there are guardrail violations happening, and you can take a look at your LLM costs and make sure you're keeping those under control.

You can even add things like a readability score. This is a really important score for some of our financial institutions that are looking at research papers; they want the generative answers to be really readable. Then you can take a look at everything else, and you can start to set thresholds and alerts so that you can manage these things.

So if your LLM costs are starting to get out of control, you can set those alerts and boundaries. And if you have bad actors trying to get a response you don't want them to get, you can set those thresholds as well.
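
As a rough illustration of the kind of custom metric and alert threshold being described here (this is not DataRobot's actual metrics API), the sketch below estimates per-request LLM cost from token counts and flags a day's traffic that crosses a budget or trips guardrails. The per-token prices and the threshold values are made up.

```python
# Sketch of a custom LLM cost metric with an alert threshold.
# The per-token prices and the budget threshold are illustrative assumptions.

PRICE_PER_1K_INPUT_TOKENS = 0.003    # hypothetical $/1K prompt tokens
PRICE_PER_1K_OUTPUT_TOKENS = 0.015   # hypothetical $/1K completion tokens
DAILY_COST_ALERT_USD = 250.0         # hypothetical daily budget threshold


def request_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of a single LLM call from its token counts."""
    return (prompt_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS + \
           (completion_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS


def check_daily_spend(requests: list) -> None:
    """Aggregate cost over a day's requests and flag budget or guardrail issues."""
    total = sum(request_cost_usd(r["prompt_tokens"], r["completion_tokens"]) for r in requests)
    guardrail_hits = sum(1 for r in requests if r.get("guardrail_violation"))
    print(f"daily LLM spend: ${total:.2f}, guardrail violations: {guardrail_hits}")
    if total > DAILY_COST_ALERT_USD:
        # In a real deployment this would raise an alert, not just print.
        print("ALERT: daily LLM cost threshold exceeded")


check_daily_spend([
    {"prompt_tokens": 1200, "completion_tokens": 400},
    {"prompt_tokens": 900, "completion_tokens": 350, "guardrail_violation": True},
])
```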

Now, if we go back to that human feedback loop with the correct, incorrect, or incomplete ratings, we're just going to mark this one as correct. Now, that's great. But how do you then go back and figure out what exactly went wrong so that you can fix it? This is where DataRobot, and again the console, gives you visibility.

So I'm going to give you a quick walkthrough here. These are all your correct, incorrect, incomplete, and no-answer responses over time. Now, let's say we want to focus on just the incorrect answers, because of course those are the ones we want to fix. You can pull that view out and see whether the answers are getting worse over time. Are your users telling you there's a signal that you need to go back and fix your generative AI model? With DataRobot, you can actually see that.

So how do you then take that information and figure out what part of your process you need to fix? Again, we make that very easy for you. Think of this as your generative AI lineage; there's no better way to describe it. You get a full view of the predicted score for whether something is likely to be incomplete or incorrect.

You have all of the actual context, plus you have the whole prompting strategy. Now, what about that human feedback loop, the people telling you that something is incorrect? You capture that information as they're giving it to you, so you can see whether an answer was rated correct or incorrect, match those up, and actually go take a look: do I need to update my vector database to make sure that my generative AI models are performing the way they should be and giving correct answers? This is how you get the confidence to rest assured that your generative AI solutions are providing value to your users.

Now, what about building? You heard this in the keynote: optionality is truly key. There is not going to be one winner. So with DataRobot, we have a multi-provider LLM playground. You can mix and match and choose the different types of models that you want, and you can bring your own as well. But comparing these models against each other is what's very tricky.

So what we have is a nice comparison playground. I have three different Titan models that I've built here. While those are loading up, let's look at the configuration: as you can see, I have different system prompts. You can try different system prompts, different chunking strategies, different vector databases.

For this one, I said, explain it as if you're speaking to a child. This one says, explain it as if I'm not a data scientist. And this one I just tried by itself. I have two vector databases here and none here. You can easily compare them, and once you've picked the right one, you can quickly put it into production.

DataRobot pulls all of those pieces together to make it very easy for you to register and then manage them once you do have it in production.
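
Outside of the playground UI, the same kind of side-by-side system-prompt experiment can be scripted. Here is a minimal sketch using the Bedrock Converse API to run one question through several system prompts; the model ID is an assumption (substitute any Bedrock chat model in your account that supports system prompts), and the retrieval/chunking side of the comparison is omitted.

```python
# Minimal side-by-side system-prompt comparison against Amazon Bedrock.
# Assumptions: AWS credentials are configured, and MODEL_ID is any Bedrock
# chat model available in your account that accepts a system prompt.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "amazon.titan-text-premier-v1:0"  # assumption; substitute your model

SYSTEM_PROMPTS = {
    "child": "Explain it as if you are speaking to a child.",
    "non_data_scientist": "Explain it as if I am not a data scientist.",
    "plain": "Answer the question directly.",
}

QUESTION = "What is a vector database?"

for name, system_prompt in SYSTEM_PROMPTS.items():
    response = bedrock.converse(
        modelId=MODEL_ID,
        system=[{"text": system_prompt}],
        messages=[{"role": "user", "content": [{"text": QUESTION}]}],
        inferenceConfig={"maxTokens": 300, "temperature": 0.2},
    )
    answer = response["output"]["message"]["content"][0]["text"]
    print(f"--- {name} ---\n{answer}\n")
```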

So why did we build it this way? AI is moving faster than you can apply it. If you think it's changing faster than you can keep up, it's because it is. RAG was a term that didn't even exist seven months ago, and now it's become very common. "Hallucination" is now, I think, a dictionary word of the year, and it didn't really exist in our everyday language before.

So you need an insurance policy to make sure that you're not getting into AI debt, because you have so many different LLMs, you have so many different people building, and they're building across different stacks. Also, most organizations have more than one cloud of deployment.

So you need to be able to unify and standardize how you're looking at your entire AI ecosystem across all of them. And this is where DataRobot helps give you a really strong, flexible canonical AI stack and backbone.

So think of this as your insurance policy against technical debt and infrastructure bloat, because we can handle both generative and predictive. We can help you build with agility and flexibility: you can pull in any proprietary APIs, use any open-source model, use any cloud model that you want. You have your playground and your experimentation environment to test different strategies together with your grounding data.

And most importantly, you can orchestrate all those component pieces neatly together. What's best is you can hot-swap those components without breaking your enterprise-grade production pipelines inside DataRobot. So you have the flexibility to pick and choose as you like.

We help you operate with confidence and control. I showed you just a few of the safety nets we have: that human feedback loop, those generative AI guard models, the ability to see all of those pieces together and then deep dive into each one of them to understand where you need to go investigate and adjust.

And of course, it's about being able to help you govern with full visibility and transparency. So again, like I said, we don't care where you built it, how you built it, what tools you built it with, or where you deployed it. Our governance layers really help you get full visibility of all your projects in flight, all your projects at rest (the ones getting put into production), and then all of your projects once they're in production.

All right. So with that, let me pass this on to Simon and Francisco. We'll deep dive a bit more.

Simon: Hello, thanks Lisa. So my name is Simon. I'm just going to talk a little bit about our ML pipeline journey and where DataRobot fits in.

So when we began, five or six years ago, our fraud system was entirely on premises. We had only a rules-based system; we didn't have any models at all. When we wanted to move away from that, we wanted to keep one of our founding principles: using managed services.

AWS obviously is a big part of that; we use managed services wherever we can. But DataRobot's AI platform really allows the data scientists to focus on what they do best.

The DataRobot platform automates and optimizes a lot of the things that data scientists do, and there's no more hand-coding of models by the data scientists.

We have a joke on our team that whenever a new data scientist joins, they always think they can hand-code a model that's going to be better than the DataRobot output model. So far, we've had no winners.

Everything that we built in our pipeline, which Francisco is going to show in a second, we always try to make fully automated and reusable. Not just for one use case, not just fraud (I'll talk about some of the use cases in a second), but making sure that we can reuse it for other use cases.

Some of the use cases for our predictive AI, which is our bread and butter and what we've been doing for a while: we typically do the account security work, so account takeover fraud, and we do financial fraud blocking, stopping people from using stolen credit cards on the platform. We do all of that inline and in real time, and then we've reused the same components over and over for other things.

We're now using them to do revenue forecasting for the store. We're using them to do payment optimization, increasing the acceptance rates for our credit cards. And then the generative AI use cases that we're currently looking at fall into four different categories.

Obviously, everybody's aware of the search or chat bot. The search one is interesting. As Lisa said, we're doing basically that, but we're doing it in two ways: internally and externally. Internally, we're using it to better aggregate our documentation and allow people to search it more intuitively.

Externally, we're taking all of the content that we have about all the PlayStation games and really allowing the user to search for a game in a way that's more intuitive to them, instead of the typical recommendation engine where we're trying to predict what games they might like.

We can seed the LLM with all of the data we have about all the games and then allow the customer to ask questions and really pry into why they might like this game or that game.

The other one, associative reasoning, is another interesting use case. We have chats between our customer service agents and our customers, and we can ask things like: what is a common reason that people are calling in that maybe our typical metrics aren't surfacing? Or is there something nefarious going on in that chat? Maybe they're trying to bribe the customer service agent to give them access to a PlayStation account. We can do that kind of thing with gen AI. And some of the ways we've gathered interest in these things is just running a hackathon, getting some POCs going in the company, and generating interest that way.

So regardless of whether it's a generative AI or a predictive AI project, KPIs are the most important part. Before you start the project, you've got to know what your baseline is: what is the baseline that we're trying to improve from? If you define the KPI after you've done the project, it's too late.

These are some of the KPIs that we've used internally to justify our AI projects. Obviously cost savings and business impact; revenue has been the biggest driver. But also things like employee retention, something you might not think about: if we automate some of the mundane tasks employees do, it increases retention in some of our departments. And obviously customer satisfaction, from getting the answers that customers want in a better fashion.

And then, how do we operationalize these new gen AI models? We have to think slightly differently from our current predictive models. The infrastructure is different: in our current inference system we're used to sub-10-millisecond responses, but that's not appropriate for gen AI.

So how do we make sure we can scale that infrastructure out? Then there's measuring the quality of the gen AI models: how do we know they're performing well against the KPIs that we want? Guardrails are really important, obviously, not only against unpredicted model output, but against hallucinations and wrong answers; how do we control for that? And audit and compliance, obviously.

So once you put a model into production, how do you know that (a) the model is still performing well and hasn't degraded, and (b) another model is going to be better than that model? How do you do that actual measurement?

How do we know that this one should be promoted to production? And then finally, monitoring and learning: how do we know if something's going wrong with the currently running system, and how do we get notified about it?
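
One concrete way to answer the "is the new model actually better" question is to score the current champion and the candidate challenger on the same recent holdout and compare a single agreed metric. A toy sketch follows; the synthetic data, the stand-in scikit-learn models, and the promotion margin are all illustrative assumptions.

```python
# Toy champion/challenger comparison on a shared holdout set.
# The data, models, and the 0.005 promotion margin are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.97], random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.3, random_state=0)

champion = LogisticRegression(max_iter=1000).fit(X_train, y_train)              # stand-in for the production model
challenger = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)   # stand-in for the new candidate

champion_auc = roc_auc_score(y_holdout, champion.predict_proba(X_holdout)[:, 1])
challenger_auc = roc_auc_score(y_holdout, challenger.predict_proba(X_holdout)[:, 1])

PROMOTION_MARGIN = 0.005  # require a meaningful lift before swapping models
print(f"champion AUC={champion_auc:.4f}, challenger AUC={challenger_auc:.4f}")
if challenger_auc - champion_auc > PROMOTION_MARGIN:
    print("promote challenger")
else:
    print("keep champion")
```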

So now I'm going to turn it over to Francisco to talk about some of the building blocks that we have.

Francisco: Thank you, Simon. So I would like to share how we leverage DataRobot in our ML pipeline at PlayStation. Basically, we have all sorts of data sources in the company, and the first step, of course, is to prepare our modeling dataset. Nothing special to say there. The fun actually begins with DataRobot when we upload our dataset.

Here I have a use case for fraud, so I have to specify the column I'm trying to predict. I have my label with good transactions and bad transactions. DataRobot can already identify that we're talking about a classification problem, and it shows us the distribution of the classes.

And from here, we basically have to click on the start button. We don't really click anything; we leverage DataRobot's Python API to have everything automated in our jobs. But once you click that button, DataRobot is going to start a process called Autopilot.

That process is like a tournament between models. It's going to start with dozens of model configurations and just a small portion of the available training data, and as the iterations go, it's going to keep only the best-performing models and retrain them with more and more data.
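
For reference, kicking off that same flow from the DataRobot Python client looks roughly like the sketch below. The endpoint, token, file name, and target column are placeholders, and method names vary somewhat between client versions (newer clients expose analyze_and_model in place of set_target).

```python
# Rough sketch of driving Autopilot from the DataRobot Python client.
# Endpoint, token, file name, and target column are placeholders.
import datarobot as dr

dr.Client(endpoint="https://app.datarobot.com/api/v2", token="<YOUR_API_TOKEN>")

# Upload the modeling dataset and create a project.
project = dr.Project.create(sourcedata="fraud_transactions.csv",
                            project_name="fraud-detection")

# Start Autopilot: the model "tournament" described above.
# Newer client versions name this step analyze_and_model().
project.set_target(target="is_fraud", mode=dr.AUTOPILOT_MODE.FULL_AUTO)
project.wait_for_autopilot()

# Leaderboard: models come back ranked by their validation metric.
for model in project.get_models()[:5]:
    print(model.model_type, model.metrics[project.metric]["validation"])
```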

Here I have an example of what that process looks like. On the right, we have all the models that have been trained. Just to show you what one of those models looks like: this is what DataRobot calls a model blueprint. It's not only about the model itself; it's also about the other steps a data scientist usually has to perform, like dealing with categorical variables, handling missing values, and performing hyperparameter tuning.

All the steps that really take a long time for a data scientist to complete. After Autopilot finishes, we're going to have this list with the models ranked by their performance. We can inspect the models in more detail if needed, but at some point we're going to select a model.

We leverage another feature from DataRobot which allows us to export not only the model but the whole model blueprint as a JAR, a Java application basically. And we do this for a few reasons. The first one is to perform what we call backtesting: for our use cases, we have to simulate how the model would have behaved over the past weeks or even months.

In order to cope with the scale that requires, we take that JAR and embed it in an EMR job. With the scores in place, we can compute the evaluation metrics and compare how the new models look against the existing ones. At some point, either a data scientist or our automated jobs will figure out whether a new model needs to be deployed.
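
As a sketch of what that kind of EMR backtest can look like (not the actual PlayStation job), the snippet below scores a window of historical transactions with Spark and computes AUC for the candidate model. The S3 path, label column, and the add_model_scores stand-in are hypothetical; in a real pipeline that step would wrap the exported DataRobot scoring-code JAR.

```python
# Backtesting sketch on Spark (e.g. an EMR step). Paths, the label column,
# and the scoring stand-in are hypothetical assumptions.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("fraud-backtest").getOrCreate()

# Historical transactions for the past weeks/months, with the known outcome.
history = spark.read.parquet("s3://example-bucket/transactions/last_8_weeks/")


def add_model_scores(df: DataFrame) -> DataFrame:
    """Stand-in for the real scoring step, which would invoke the exported
    scoring JAR. Here we just attach a dummy score column in [0, 1]."""
    return df.withColumn("score", F.rand(seed=42))


scored = add_model_scores(history)

# Evaluate the candidate model over the backtest window.
evaluator = BinaryClassificationEvaluator(rawPredictionCol="score",
                                          labelCol="is_fraud",
                                          metricName="areaUnderROC")
candidate_auc = evaluator.evaluate(scored)
print(f"candidate model AUC over the backtest window: {candidate_auc:.4f}")
```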

And if that's the case, we're going to take that same JAR that we used for backtesting and send it to our CI/CD pipeline to create a new version of our scoring service. When the new version is deployed, we can start sending requests from our inline decisioning service, and most of the features for the model are already in that request.

But we also have some extra features that we fetch in real time. Some of those features are looked up and some are computed on the fly in our aggregation service. With all the inputs in place, we can actually get the model scores. Based on our business thresholds, we will decide, for instance, whether a transaction should be accepted or not in the store. The decision is fed back to our data sources so we can improve future iterations of the models.
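
A stripped-down version of that inline decision step might look like the following. The threshold value, field names, and the feedback write are illustrative assumptions, not the actual PlayStation service.

```python
# Illustrative inline decision step: model score + business threshold -> decision.
# The threshold, field names, and feedback sink are assumptions.
from dataclasses import dataclass

ACCEPT_THRESHOLD = 0.85  # hypothetical business threshold on the fraud score


@dataclass
class Decision:
    transaction_id: str
    fraud_score: float
    accepted: bool


def decide(transaction_id: str, fraud_score: float) -> Decision:
    """Accept the transaction only if the fraud score is below the threshold."""
    return Decision(transaction_id=transaction_id,
                    fraud_score=fraud_score,
                    accepted=fraud_score < ACCEPT_THRESHOLD)


def record_outcome(decision: Decision) -> None:
    """Feed the decision back to the data sources so future model
    iterations can learn from it (stand-in for the real write path)."""
    print(f"logging decision: {decision}")


d = decide("txn-123", fraud_score=0.12)
record_outcome(d)
```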

That's the overview. I think Lisa has a few final comments and one closing slide.

Lisa: Great. Final takeaways: technical debt is a real threat, especially because things are changing so quickly. You really do need to look at how you get an insurance policy to make sure you're open and flexible, so you can keep pace with the rate of change. If you just look at the last six months, what is the next six months actually going to bring us? We don't know. So you have to have that agility.

The other thing is that there's a new level of preparedness you're going to have to take on when it comes to operating and observing all of your generative AI models. It's very different: data drift is easy, that's been solved; figuring out what went wrong in the answer that was provided, that's a lot more difficult. So it's a muscle you're going to have to develop.

And last, we heard about this before: the cost, ROI, and value of these use cases is really going to be very important. There is a huge cost when it comes to generative AI; it's not like predictive, which is mostly just time and labor. So making sure you're picking the right use cases and then managing those costs effectively over time is going to be key for you to scale.

If you have any more questions, we can meet you outside, but we also have booth 180, all the way in the far left corner, very easy to find. So thank you for your time.

Thank you, Lisa, Simon, and Francisco. That was very informative. Thank you, and thank you for joining.
