Democratize ML with no code/low code using Amazon SageMaker Canvas

Hello, everyone. Hope you are having a fun time at re:Invent. My name is Raj Singh and I'm the General Manager of the SageMaker Low-Code/No-Code team at AWS. I'm joined by two co-speakers: David Galilei, a Technical Leader and Senior Solutions Architect at AWS, and Ram De Vli, a Data Architect with the AI and BI Platform teams at Thomson Reuters.

Today, the three of us are going to talk to you about how you can democratize the use of machine learning using SageMaker Canvas. As part of this session, we are going to share what is new with Amazon SageMaker Canvas, we will show a couple of demos, and after that, Ram is going to talk about how Thomson Reuters is using Amazon SageMaker Canvas to democratize the use of machine learning.

Before we begin the session, I'm curious to know who we have in our audience today. Please raise your hand if you have been using and building machine learning models for several years. Only a few. Please raise your hand if you consider yourself a beginner in your ML journey. Quite a few. Please raise your hand if you are a business executive looking to roll out ML initiatives within your organization. Great, excellent. We have exciting updates to share with all of you. Let's jump right in.

At AWS, our goal is to put machine learning in the hands of everyone. The reason for this goal is that we are hearing from customers that demand for machine learning in every organization is skyrocketing, and there are not enough people who can build machine learning-powered solutions within their organizations. So they are looking for low-code and no-code solutions as productivity tools.

Let's look at the challenges they tell us about when rolling out ML initiatives in their organizations. The first one is that ML experts are oversubscribed. Business teams have many ML projects, but there are not enough ML experts within their organization. As a result, projects wait weeks, months, quarters, or even years to get prioritized, and some projects are never prioritized at all.

The second problem concerns the domain experts who understand the business problem they are looking to solve with machine learning. They may not have the technical and coding skills required for machine learning. And even if they acquire some machine learning skills, the available tools within their organizations do not foster collaboration with machine learning experts. They tell us that collaboration is critical, because they want feedback from experts before using those models or solutions in production.

That is the reason why we launched Amazon SageMaker Canvas. Amazon SageMaker Canvas is a no-code workspace for business teams to build and deploy ML and generative AI models. SageMaker Canvas provides ready-to-use models, which are pre-trained models where you bring your data and start generating predictions. You don't have to build any machine learning model to use them. If those ready-to-use models are not enough, that is when you can build your own custom models using your data.

In addition to that, SageMaker Canvas supports collaboration with machine learning experts, and there are various modes in which it does so. One such mode is that you can build a machine learning model and share it with users of SageMaker Studio. SageMaker Studio is an IDE for expert machine learning practitioners who prefer a coding environment for building machine learning-based solutions, and Canvas provides an integration with SageMaker Studio. At the same time, you can use Canvas to look at the code that Canvas has used to build the machine learning models.

Let's look at the number and the types of problems that you can solve with machine learning using SageMaker Canvas. The short answer is: a lot. Let's take a few examples. Using the generative AI capabilities of Canvas, you can do document Q&A - you can bring your own documents and start asking questions and querying them - and you can do content generation. Using tabular model capabilities, you can solve problems such as customer churn prediction and credit risk assessment. Using computer vision capabilities, you can do visual defect detection, object detection, or text detection. Using natural language processing capabilities, you can do sentiment analysis and entity extraction. And using time series analysis, you can do demand forecasting and sales forecasting. So there is a variety of problems that you can solve using Amazon SageMaker Canvas.

Let's take a look at the ready-to-use models, which are pre-trained models available in SageMaker Canvas. SageMaker Canvas offers a large selection of pre-trained models powered by AWS services. One example is Amazon Bedrock. Amazon Bedrock is a service for foundation models, and it comes integrated with SageMaker Canvas. You can come to SageMaker Canvas, click on the integration with Amazon Bedrock, and start using Amazon Bedrock models.

Similarly, we have other services such as Amazon Textract. Amazon Textract is an intelligent document processing service where you can bring your documents and extract information out of them; that is also available via SageMaker Canvas. We have natural language processing services such as Amazon Comprehend: you can bring your data and do sentiment analysis, and use the various other capabilities which Amazon Comprehend offers.

Similarly, we have computer vision services such as Amazon Rekognition. All of those services are available as pre-trained models on Canvas, the way you see them on the Canvas home screen. We'll show you this when it comes to the demo section.

Now let's look at foundation models. We launched foundation models in Canvas this year, around the October time frame, and as of today you can use Bedrock foundation models within Canvas as well as SageMaker JumpStart models. As you may be aware, SageMaker JumpStart offers publicly available models such as Falcon, Flan-T5, MPT, and Dolly v2. And we will keep adding models to the list.

Let's take a deeper look at the various capabilities here. You can use SageMaker Canvas, which is a no-code tool, to access and evaluate different foundation models, either from Bedrock or from SageMaker JumpStart. You get a graphical user interface where you can select a model, and Canvas offers you recommended prompts. You can use the recommended prompts or come up with your own prompts and start using them.

In case you are not sure which model to use, or you want to use multiple models and compare them, Canvas also enables you to compare model outputs side by side. All you have to do is select the different models and ask a question. That question is sent to all the models, we do prompt engineering behind the scenes, and Canvas gives you outputs which you can see side by side.

In addition to these capabilities, Canvas also offers a no-code RAG (retrieval-augmented generation) solution. Using the no-code RAG solution, you can extract insights from documents using generative AI. And all the comparison capability that you saw on the previous slide, you can use even for your document insight extraction needs.

You upload your documents to a vector database; as of today, Canvas supports Amazon Kendra, but we'll keep adding more vector database options. Here, all you have to do is index your documents into your vector database and point Canvas to it. And once you point Canvas to it, the foundation model capabilities of interactive usage and a conversational interface get enabled by default, just by the flip of a button.
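To make the retrieve-then-generate flow concrete, here is a minimal sketch of the RAG pattern described above, in plain Python. The keyword scorer is a toy stand-in for the index search that Amazon Kendra performs, and `build_prompt` is an illustrative helper, not a Canvas API:

```python
# Toy sketch of the retrieve-then-generate (RAG) pattern. A word-overlap
# scorer stands in for the vector/index search a service like Amazon
# Kendra performs; Canvas wires this up for you behind the scenes.
def retrieve(question, documents, top_k=1):
    """Return the documents sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question, documents):
    """Ground the model's answer in the retrieved passages."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = ["Canvas supports fine-tuning of foundation models.",
        "QuickSight is a BI dashboarding service."]
prompt = build_prompt("Which models can Canvas fine-tune?", docs)
```

The prompt built here would then be sent to the foundation model; the point is that only the retrieved passage, not the whole document store, reaches the model.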

So these are the capabilities which were there even before re:Invent. At re:Invent today, we are very excited to announce fine-tuning of foundation models within Amazon SageMaker Canvas. You can pick a model which can be fine-tuned, from Amazon Bedrock or from SageMaker JumpStart, select those base models, bring your data - which is labeled examples - and use the fine-tune button in SageMaker Canvas to fine-tune those models.

Canvas runs different fine-tuning jobs behind the scenes and gives you various model evaluation metrics. And it's not only performance metrics; it also gives you metrics for responsible AI - toxicity, bias, and things like that. These are the capabilities which help customers evaluate different fine-tuned foundation models for their needs.

You also get access to the model leaderboard. You can take a look at it, compare the different evaluation metrics, and make a decision about using the model that you have fine-tuned.

So with that, we will move to the demo to see this capability in action. And I would like to invite David to come on stage and show us the demo.

Thank you, Raj. Awesome. All right. Very happy to be here with you all to present one of the features that we launched right now at re:Invent, like a couple of hours ago. So I'm really excited to be walking you through a demo of this.

So let me just go ahead and move my mouse over to the other screen. Cool. In this demo, what we're gonna be seeing is how to use the SageMaker Canvas generative AI interface and chatbot to solve a business use case. In this case, what we would like to use it for is to get some investment recommendations.

Imagine being someone who works for an investment fund. Of course, as we will see, by default Canvas will use the knowledge of the model itself to try to generate an answer. But these results might be generic. So what do we do in that case? One of the solutions that we have is to go ahead and fine-tune the model.

Now, this is done in a no-code fashion in SageMaker Canvas. It is possible to just click a button and create a new model that we want to train. Of course, we provide a dataset that we want to use for training. In this case, I have a CSV file available locally on my own computer, which I upload here into SageMaker Canvas, but you can connect to 50-plus data sources. And of course, this has to be a dataset made of prompt-completion pairs. And when I say prompt-completion pair, I mean the input prompt that you want to give to the model and the output that you expect out of the model.
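As an illustration of the prompt-completion format just described, here is a minimal sketch that writes such a CSV. The column names and the example rows are assumptions for illustration, not Canvas requirements - you map your own headers in the Canvas UI:

```python
import csv

# Illustrative fine-tuning dataset: one input prompt per row, paired
# with the completion you expect the model to produce.
rows = [
    {"prompt": "Suggest a low-risk fund for a retiree.",
     "completion": "Consider our Stable Income Bond Fund."},
    {"prompt": "Suggest a growth fund for a young investor.",
     "completion": "Consider our Global Equity Growth Fund."},
]

with open("fine_tune_examples.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "completion"])
    writer.writeheader()
    writer.writerows(rows)
```

A real fine-tuning dataset would of course contain many more labeled examples than this.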

Now Canvas is gonna load this dataset into memory. And then I go ahead and select this dataset, and I'm all set - I don't need to do anything else, aside of course from deciding which models I want to train. It's one button away from going ahead and fine-tuning a large language model. Trust me, there is no easier way of doing this.

All right. So you go ahead and select which models you want to use, and you have a list of models available from both Amazon Bedrock and Amazon SageMaker JumpStart. Once you have chosen the ones that you fancy - in this case, Titan, Dolly, and Falcon - you go ahead and select the input and output columns that you want to use. If you want to be fancy about it, you can go ahead and select the different hyperparameters for the tuning, but this is not required. But if you are someone who is a little bit more expert, maybe you wanna control that, maybe you wanna control the data split.

Once you're all set, you're one button away from fine-tuning your model. Now Canvas will take a little bit of time. Of course, we don't have to wait two hours today just for the model to be trained; luckily, we have it already available here, and you can check the performance of this model by analyzing a perplexity curve, a loss curve, the hyperparameters used, the artifacts that were generated, and all the metrics that are associated not just with the best model, but with every model that was trained as part of the fine-tuning.
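For readers wondering how the perplexity curve and the loss curve mentioned above relate, the two are directly linked: perplexity is the exponential of the average token-level cross-entropy loss, so the two curves always move together. A small sketch:

```python
import math

def perplexity(cross_entropy_loss):
    """Perplexity is exp(average cross-entropy loss per token).
    Lower is better for both quantities."""
    return math.exp(cross_entropy_loss)

# A loss that drops from 2.0 to 1.2 over fine-tuning corresponds to
# perplexity falling from roughly 7.4 to roughly 3.3.
```

This is why a model whose loss curve flattens will show a flattening perplexity curve at the same point.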

So everything goes from a black box to a white box, because you have the ability to use all of these models: you can see their performance, but also go ahead and test these models in the same UI that you've seen before. Interestingly, because testing your own custom model requires deploying it, Canvas will deploy it for you. And if you're not using that model, it will shut it down automatically after two hours.

So let's go ahead and compare the original performance of the model with the newly fine-tuned model, with a similar question to the one that we asked before: what are the investment recommendations for this specific profile? And as you can see, after fine-tuning the model, the results here on the left are customized - they are exactly what we gave the model as part of the training dataset, and they follow the same kind of format that was available in the training dataset. While on the right side, you can see the original, generic results. Of course, those are not necessarily wrong results - they are very interesting, let's say, inputs - but maybe those are funds that we don't offer in our list of funds for this specific use case, or maybe we just wanna have a shorter, more compact answer so that we can use it downstream with other services.

All of that is really easy to achieve through fine-tuning of a model, and it's even easier when we do so without writing a single line of code. Right, Raj, back to you.

Thank you, David. All right. So you saw an example of how easy it is to fine-tune a foundation model using SageMaker Canvas. All the fine-tuned models are essentially custom models that you have built based on your dataset. Now, let's look at the custom model building capability and dig a little deeper.

Building a custom model requires you to prepare data; that is the first step. Canvas offers connectors to 50-plus data sources, because your data may reside in various different places. All you need to do is select a connector, provide the authentication information, and Canvas connects and pulls your data into the Canvas interface. Canvas offers connections with Amazon S3, Amazon Redshift, Snowflake, Salesforce, Databricks, and many other providers.

Once you pull your data into Canvas, Canvas enables you to extract data insights. These data insights are powered by machine learning, and the way they help customers is that, looking at these insights, customers can decide whether they want to modify or transform the data before building machine learning models. In addition to that, Canvas offers built-in visualizations: correlation matrices, bar charts, scatter plots, and many more.

Once you get an idea about the data you have imported, Canvas also enables you to use 300-plus built-in transformations to modify and transform the data so that you can build machine learning models. One of the transforms is a custom transform. A custom transform allows you to write a snippet of code, and you can use that snippet within Canvas either for building your data pipelines or for preparing data to build machine learning models.
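To give a feel for what such a snippet looks like, here is a small pandas sketch of the kind of code a custom transform can contain - in a Canvas custom transform, the current dataset is exposed as a dataframe (conventionally `df`) that you modify. The column names and values below are illustrative assumptions:

```python
import pandas as pd

# Sketch of a custom-transform-style snippet: the dataset arrives as a
# dataframe `df`, and the snippet mutates or reassigns it. Here we
# build a toy dataframe so the example is self-contained.
df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "income": [52000.0, None, 48000.0],
    "loan_status": ["approved", "denied", "approved"],
})

# Impute missing incomes with the median, then drop the ID column,
# which carries no predictive signal.
df["income"] = df["income"].fillna(df["income"].median())
df = df.drop(columns=["customer_id"])
```

In Canvas itself you would only write the last two lines; the tool supplies `df` from the step before.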

We are very excited to announce a visual way of preparing data in Canvas. This is a feature that we have launched at re:Invent. It gives you a visual representation of the steps that you have taken to prepare your data, whether you want to take this data and build a data pipeline or build a machine learning model out of it. We are going to see more of it as part of the demo.

Another feature that we have launched at re:Invent is preparing data using natural language. Using natural language, you can ask a variety of questions. Those questions could be as broad as "show me the data quality issues in my data," and Canvas is going to invoke the right APIs and functions to extract that information for you. You can also run different visualizations using the natural language query interface. This is all integrated in the visual DAG that we saw on the previous slide.

Great. So once the data is prepared, what next? After data is prepared, you can build and evaluate the different types of models that Canvas can generate for you. All you have to do is choose a model type. There are different model types that Canvas supports. The first is for predictive analysis; it supports problem types such as classification, regression, and time series analysis. You can also do text classification and image classification. And you can do fine-tuning of foundation models, as in the example that we saw earlier.

Once you select the problem type and point Canvas to your data, just by the click of a button you can build a variety of models. Behind the scenes, Canvas uses AutoML capabilities: it picks various algorithms for you, trains different models, and picks the best model for you. It also enables you to select or override that model selection from the AutoML process.

You also get access to the model leaderboard, in case you want to override the best selection that Canvas has made - for example, because you don't care as much about accuracy, but inference latency is more important for you. You can make that selection and override Canvas's choice.

Once the model is built, Canvas enables you to generate highly accurate model predictions. There are four patterns that Canvas supports. Within the Canvas app itself, you can do what-if analysis: you can change the values of different features and generate a prediction right there. You can also bring a batch of your data, upload it, and generate predictions right there. The second pattern is automating predictions: if you are happy with the model and you want to automate prediction generation whenever your new data dump is ready, you can do that in Canvas directly from the same interface.

The third pattern that we support is one-click model deployment. In case you want to use the model you have built in Canvas for programmatic access, to integrate with yet another application, you can do that using one-click model deployment: Canvas invokes a SageMaker hosting endpoint, and that endpoint is up and running.

In addition to that, Canvas also enables you to share these predictions with tools such as Amazon QuickSight. You can take the model which Canvas has built and transfer it over to QuickSight, so that you can do predictions in QuickSight along with dashboarding, visualizations, and your various other BI analytics needs.

Once the models are built and you want to collaborate with pro-code users who are using tools such as SageMaker Studio, Canvas also enables you to do so. It comes with an integration with SageMaker Studio: you can share model artifacts with SageMaker Studio users with just one click. You can also integrate the models that you built using SageMaker Canvas with your MLOps processes. The way you do it is: you take the model and register it with the SageMaker Model Registry, and from there your MLOps team can pick it up and integrate it with your MLOps pipelines. This is all done with a few clicks.
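For the pro-code side of this handoff, the registration Canvas performs corresponds to SageMaker's `create_model_package` API. The sketch below only builds the request payload (the group name, image URI, and S3 path are placeholders, not real resources); the actual boto3 call is shown in a comment since it requires AWS credentials:

```python
def build_model_package_request(group_name, image_uri, model_data_url):
    """Assemble a create_model_package request. The values passed in
    below are placeholders; Canvas fills in real ones when you register
    a model with one click."""
    return {
        "ModelPackageGroupName": group_name,
        "ModelApprovalStatus": "PendingManualApproval",
        "InferenceSpecification": {
            "Containers": [{"Image": image_uri,
                            "ModelDataUrl": model_data_url}],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
    }

request = build_model_package_request(
    "loan-approval-models",                      # placeholder group
    "<inference-image-uri>",                     # placeholder image
    "s3://<bucket>/canvas-model/model.tar.gz",   # placeholder artifact
)
# With credentials configured, the registration itself would be:
#   import boto3
#   boto3.client("sagemaker").create_model_package(**request)
```

The model registry entry is what lets the MLOps team pick up the artifact and container without ever opening Canvas.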

Now, we have looked at various capabilities of Canvas. We hear from many of our customers: "Hey, Canvas is a feature-rich, no-code AI application and workspace, and I don't necessarily want to give all my users permissions for all the capabilities that Canvas offers." So Canvas also enables administrators to restrict those permissions. It also allows you to set up single sign-on, so that users can use the single sign-on capabilities within the organization: with just a click of a button, they can invoke Canvas. They don't have to come to the AWS console to use Canvas.

In addition to that, we have also launched several capabilities to automate shutdown, because what Canvas is doing behind the scenes is procuring an instance for you and keeping that instance available for as long as you need it. Your data stays on that instance; it is part of your VPC, and your data does not leave. Some users have told us they would like to automate logout, especially the administrators. So we have launched idle metrics within SageMaker Canvas, which enable you to do cost management using automatic shutdown.

So with that, let's take a look at a demo of how you use the custom model capabilities in Canvas. I would like to invite David back on stage to show us the demo.

Let's go ahead and get started. Whenever we start with a use case, it always starts with the data. Whenever we want to import data, we can choose whether we want to get new data from 50-plus different sources or select an existing dataset. Here is the list of the available data sources, ranging from Salesforce to SAP to S3 to Redshift, etc. Just have fun and find some of those on your own. Or, for example, you can use the sample datasets that are already available in Canvas.

In this case, what we do is select two of the pre-existing datasets that are available in Canvas. They both concern a use case where you're trying to predict whether a specific customer, given their information and demographics, is gonna get a loan or not. So it's a very typical financial services use case: if you work for a bank, or if you have applied for a loan with a bank, you know exactly what I'm talking about.

Once the data is imported, you go ahead and prepare the data. Some transformations may include standard ones like dropping values, replacing missing values, detecting outliers, or, like in this case, joining two different datasets. And as you can see, everything in Canvas is very straightforward: everything happens in the UI, everything happens without writing a single line of code.

In this case, you can see me pulling two different sources, joining them on a specific ID, and then going ahead and creating a third dataset that comes out of the two. Now, a very good practice as you start off with a new dataset is to run what's called a data quality report. Think of it as having insights into what your dataset looks like.

According to the kind of problem that you're trying to solve, you want to extract information such as: do I have too many missing values? Do I have enough information so that I can actually predict what I'm trying to predict here? And as you can see, it's a couple of clicks away, and very soon you get that information: dataset statistics, the most important priority warnings - like, in this case, that the quick model score, which I'm gonna come back to in a second, has a very low value.

But generally, you get a lot of information about how your dataset is behaving and what would happen if you were to go ahead and train a model on that dataset. Some other information available in this dataset report includes the number of missing values (as you can see, the top priority warning), the number of duplicate rows, and any anomalous examples. Of course, you know best for your use case.

So we suggest some anomalies that we find in the data statistically, but those might not be relevant for your case. We also train a very small model behind the scenes - a quick model - which gives you baseline information about the accuracy of a model trained on this data. Note that the data quality report is always there; it's always part of your data flow.
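To illustrate what a baseline score means here: the simplest possible baseline is the accuracy of always predicting the most common class. This is not how Canvas's quick model works - that is a real, fast AutoML model - but it shows the idea of a floor that any useful model must beat:

```python
from collections import Counter

def majority_class_baseline(labels):
    """Accuracy of always predicting the most common class.
    A crude sanity-check floor, not Canvas's quick-model algorithm."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

labels = ["approved"] * 70 + ["denied"] * 30
baseline = majority_class_baseline(labels)
print(baseline)  # 0.7
```

If a trained model scores near this baseline, the features are probably not carrying much signal yet, which is exactly the kind of warning the quick model score surfaces early.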

Now, today at re:Invent, we announced Chat for Data Prep. Think of it as taking data prep to the next step. Think about how everything these days happens through a chatbot interface or through natural language - we wanted to make sure that data preparation also followed in those footsteps.

So the idea here is that you have a chat interface against which you can start asking question. "Hey, what do you think of the quality of my data?" or "Hey, what transformation should I do?" or "Please do this transformation!" Always remember to say please to your AI overlords, right?

Whenever you have a step that you like, you can just go ahead and add it, and as you can see on the right, the step gets added to the data flow. Now, if we go back to the previous visualization of the data, you will see all of the additional steps being built as you go, and all of them are recorded. They also come with code, so if you want to go ahead and change the code that was suggested, you can do that.

Here's another example: dropping two columns, the ID columns. As you can see, it's very straightforward code, nothing too fancy. But for someone who has never written a line of pandas, or of Python in general, this might be new. Or what about something a little bit more advanced, like plotting a scatter plot? I can tell you for sure how much time I've wasted in the documentation of Seaborn and Matplotlib.

Now, I don't have to do that anymore. All I have to do is ask Canvas: "Hey, can you plot a scatter plot for me?" Just tell it what you want to scatter plot in your case, send the query away, and in a matter of seconds you will get the plot.

Here I even asked it to color by loan status. This just takes a couple of seconds, and the code creates the scatter plot here. Of course, this dataset is fake, so don't look too much into the actual values that are there. But you can see that the plot was generated; you can zoom in directly in Canvas. And most importantly, you can go ahead and download this plot, so if you wanna make it part of your PowerPoint presentation, or you want to use it for some downstream use case, you can absolutely do that.

Now, let's take a step back and go again through the other data preparations that we can do. You can do data preparation automatically through the natural language interface, or you can do it manually, where you can choose among a list of 300-plus data preparation steps that are available.

In this case, I'm using it very simply to drop a couple of missing values - nothing too fancy, just to improve the performance of my model. But of course, I can go ahead and do many more complicated transformations, like replacing specific values, vectorizing specific columns, or why not one-hot encoding specific columns, for those of you who are familiar with what one-hot encoding is.
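For those who are not familiar with one-hot encoding, a minimal sketch of what such a transform produces: one binary column per category. This is an illustration of the technique in plain Python, not the code Canvas generates:

```python
def one_hot_encode(values):
    """Turn a categorical column into one binary indicator column per
    category - what a one-hot encoding transform produces."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

rows = one_hot_encode(["home", "auto", "home"])
# rows[0] -> {"is_auto": 0, "is_home": 1}
```

Models that cannot consume strings directly (most of them) need categorical columns expanded this way before training.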

Or, if you're working with time series, we also have a lot of data preparation steps which are specific to time series datasets, like resampling and imputing missing values.

The data quality report is one of the available analyses; we have a whole list of different ones, including feature correlation and multicollinearity. One that I'm particularly fond of is target leakage. Sometimes people who have a little bit less experience with machine learning, when they start doing machine learning, use their actual target as a predictive feature.

Now, if you're a data scientist, you know that this is a big mistake, but this is where the target leakage analysis can come in and explain that: for example, "Hey, maybe you're giving away your information with one of those features."
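A crude version of this idea can be sketched in a few lines: a feature whose values reproduce the label almost one-to-one on their own is suspect. This toy agreement-rate check is an illustration of the concept, not the statistic Canvas computes:

```python
def leakage_suspects(rows, target, threshold=0.95):
    """Flag features whose values map almost one-to-one onto the target.
    If knowing a single feature alone reproduces the label more than
    `threshold` of the time, it may be a leaked copy of the answer."""
    suspects = []
    features = [k for k in rows[0] if k != target]
    for feat in features:
        groups = {}
        for r in rows:
            groups.setdefault(r[feat], []).append(r[target])
        # For each feature value, count how often its most common label
        # occurs; summing gives the best possible single-feature accuracy.
        hits = sum(max(labels.count(l) for l in set(labels))
                   for labels in groups.values())
        if hits / len(rows) >= threshold:
            suspects.append(feat)
    return suspects

rows = [{"loan_granted": y, "was_approved": y, "income_band": b}
        for y, b in [("yes", "low"), ("no", "low"),
                     ("yes", "high"), ("no", "high")]]
print(leakage_suspects(rows, "loan_granted"))  # ['was_approved']
```

Here `was_approved` is just the label under another name, so it gets flagged, while `income_band` does not.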

Of course, building a data flow is not the end, right? This is part of your data preparation. Once you're done building your data flow, you can select where you want to export your data. In the case I'm showing here, I'm using a dummy bucket to export my CSV file to an Amazon S3 bucket. But you can imagine this being very useful when you want to build downstream workflows.

And the destination is also right there. Once you have set up a destination, you can create a job that will generate this transformed dataset. In this app you're gonna be using not-so-big datasets, but once you want to scale out horizontally to multiple instances, or up to bigger instances, you can process terabytes of data all at once and even schedule this data preparation.

If you're happy with the data preparation, the next step is to go ahead and create a model. It's a single button away: all you have to do is click Create Model and give a name to the model that is gonna come up. In this case, we are training a predictive analytics model, which is a tabular dataset model, but we also support time series forecasting.

We can use images and train a computer vision model, we can train an NLP model, and the process is gonna be the same for all models: select your target, configure your model, and finally just launch the training.

If you know your data science, you can go ahead and configure even which kind of algorithm specifically is gonna be trained - CatBoost, XGBoost, neural networks, and so on. You can configure the data split, you can configure how much time you want the training to run, or you can just leave everything as default. That's the beauty of AutoML: you don't have to configure anything.

Once that is done, you go ahead and train your model - either a quick model or a very performant model, which we call a Standard Build. Canvas goes behind the scenes and validates the performance of the model on the dataset: it sees if there are too many missing values, if there's something that doesn't work, if your time series are missing points. Then it goes ahead and kicks off an AutoML process.

Now, this AutoML process can take up to 45 minutes. If your dataset is smaller, it can be less; if your dataset is bigger, it can take more. Let's say on average it's half an hour to 45 minutes.

Once the model is trained, you have the accuracy metrics and information about the column impact. You have additional advanced information, like the ROC curve and a plot of the predicted values versus the actual values. You can go in and see exactly which metrics were generated during the process, and even take a look at a confusion matrix if you really want to get into the details - not just for the best model, but for every single model that was trained as part of the AutoML process.

And not just every single model that was trained, but also every single ensemble of models that was trained as part of the process. And if you specifically like one of these models - for whatever reason, maybe the inference latency is lower, maybe one of your preferred metrics is higher - you can even go ahead and choose that one as the default model.

Now, the final step is to go ahead, take this model, and use it to generate predictions. One of the nice things is that directly in the Canvas app - once again, no code needed - you can generate predictions on a batch of data. Imagine having incoming CSV files all the time: you can either predict on them manually, or schedule a job to trigger predictions whenever a CSV file is generated.

When you generate predictions, those predictions can be easily seen in Canvas and they can also be sent directly to Amazon QuickSight. So if you want to enrich an existing dashboard, a BI dashboard, you can go ahead and specify your favorite QuickSight user and just send the data back to the QuickSight user.

All right, so let's take a deeper look at all the different predictions and metrics that are available. As you can see here, we are running through all the different content once again.

Another option that we have here on the table is to use the model directly in the Canvas app to generate predictions on a one-by-one basis. In fact, we can go ahead and change each of these values and see by changing each of these values, what it does in terms of generating the single prediction.

So as you can see here, I'm changing each of these values one by one to see: what if I change this value, what if I change that value, how does that impact my prediction? And as you can see from this example, this slightly changes the confidence for the specific value we're looking into.

So in this case, it still gets the same prediction, but the confidence score of that prediction changes as we change one of the values. Once we are fine with those predictions, we can go ahead and deploy our model using one of four different patterns.

The first one is what you're seeing right now: deploying the model to a SageMaker endpoint. When we deploy to a SageMaker endpoint, we choose which model we want to use, which instance type we want to use, and how many instances we want to use.

And as we can see after a little bit, we have our model which is available here in service and we have our deployment URL which our developers can go ahead and use to generate predictions.

If we want to see what kind of code can be used to call this endpoint, sample code is also available inside SageMaker Canvas. You can also go ahead and update the configuration associated with that endpoint in case you want to change it.
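To make the calling pattern concrete, here is a minimal sketch of invoking a deployed endpoint with boto3's `sagemaker-runtime` client. The endpoint name and the feature values are hypothetical placeholders, and the actual sample code shown inside Canvas may differ.

```python
# Minimal sketch of calling a Canvas-deployed SageMaker real-time endpoint.
# The endpoint name and feature rows below are hypothetical placeholders.

def build_csv_payload(rows):
    """Serialize feature rows into the text/csv body the endpoint expects."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)

def invoke_canvas_endpoint(endpoint_name, rows, region="us-east-1"):
    """Send rows to the endpoint and return the raw prediction body."""
    import boto3  # requires AWS credentials at call time

    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=build_csv_payload(rows),
    )
    return response["Body"].read().decode("utf-8")

# Build a payload for a hypothetical model with three input features:
payload = build_csv_payload([[34, "yes", 120.5], [52, "no", 80.0]])
```

A developer would then call something like `invoke_canvas_endpoint("my-canvas-endpoint", rows)` from their own application code.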

Now, other possibilities in terms of production patterns include adding the model to the model registry. This is really interesting because it allows you to add a model to a model registry and provide all the information associated with that model.

So that an MLOps engineer can just pick up the artifacts, pick up the container associated with the training of the model, and pick up basically all the information associated with the training process, and use those as part of an MLOps pipeline.

Now, this is great because if you already have an MLOps pipeline set up in your own account, then this sits right inside it, because the model registry acts as a buffer between the training process that happens in Canvas and the deployment process that happens with the MLOps pipeline.
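As a rough illustration of that hand-off, the sketch below assembles the kind of `CreateModelPackage` request an MLOps engineer might submit with boto3's `sagemaker` client to register a Canvas-trained model. Every name, image URI, and S3 path here is a hypothetical placeholder, not what Canvas generates.

```python
# Hedged sketch: registering a trained model in the SageMaker Model Registry
# so a downstream MLOps pipeline can pick it up. All identifiers are
# hypothetical placeholders.

def build_model_package_request(group_name, image_uri, model_data_url):
    """Assemble the request body for sagemaker.create_model_package()."""
    return {
        "ModelPackageGroupName": group_name,
        "ModelPackageDescription": "Candidate exported from SageMaker Canvas",
        # Keep the package pending so a human (or pipeline) approves it later.
        "ModelApprovalStatus": "PendingManualApproval",
        "InferenceSpecification": {
            "Containers": [
                {"Image": image_uri, "ModelDataUrl": model_data_url},
            ],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
    }

request = build_model_package_request(
    group_name="canvas-call-volume-models",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/inference:latest",
    model_data_url="s3://example-bucket/canvas/model.tar.gz",
)
# boto3.client("sagemaker").create_model_package(**request) would register it;
# flipping the approval status later can trigger the downstream pipeline.
```

The pending-approval status is the buffer mentioned above: training finishes in Canvas, and deployment only proceeds once the package is approved.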

We already discussed sending predictions to QuickSight, but we can also send models directly to QuickSight.

Another option that we have is to view the notebook associated with the AutoML process. As we can see here in the demo, you get to see the full notebook that was used during candidate generation, with all the code that we run behind the scenes to get you to those AutoML candidate definitions.

You can see exactly which hyperparameters are being used and which models are about to be trained. And that's basically it: you have white-box access to all of the magic that happens in AutoML, down to the code, if you want to take it to that point.
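For context, the same kind of AutoML run can also be launched programmatically through the SageMaker Autopilot API. The sketch below assembles a `create_auto_ml_job` request with boto3; the bucket, role ARN, and target column are hypothetical placeholders, and this is an illustration of the API shape rather than what Canvas does internally.

```python
# Hedged sketch of launching a comparable AutoML job via the SageMaker
# Autopilot API. Bucket, role ARN, and column names are hypothetical.

def build_automl_job_request(job_name, bucket, target_column, role_arn):
    """Assemble the request body for sagemaker.create_auto_ml_job()."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": f"s3://{bucket}/training/",
                    }
                },
                # The column the model should learn to predict.
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "ProblemType": "BinaryClassification",
        "AutoMLJobObjective": {"MetricName": "F1"},
        "RoleArn": role_arn,
    }

request = build_automl_job_request(
    job_name="canvas-style-automl-demo",
    bucket="example-bucket",
    target_column="churned",
    role_arn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)
# boto3.client("sagemaker").create_auto_ml_job(**request) would start the job,
# and the generated candidate-definition notebooks land under the output path.
```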

Alright, with that, we have covered basically everything that there is to know in terms of custom model training for Amazon SageMaker Canvas. I hope that the demo was nice, that you had fun and that you learned something new.

And one of the things that I like the most about Canvas is how it makes it possible for everyone and literally everyone to start getting into machine learning, even the people that don't have any machine learning knowledge.

I was happy to see that a lot of hands were raised earlier when we asked how many people don't have any machine learning knowledge. One great example of that will actually come in a second. In fact, I want to invite Radev to the stage to talk a little bit about how Thomson Reuters uses SageMaker Canvas, for events and even in their daily production systems.

Thank you, Rajneesh. Thank you, David. My name is Radev. I am a data architect on the AI/ML platform for Thomson Reuters.

For a brief bit of context: Thomson Reuters is a content-driven technology company that develops products serving media, legal, tax, and compliance professionals. Its history with AI goes back to the early 1990s, when it was amongst the earliest companies to develop a commercial AI application, Westlaw Natural Language, a legal research application incorporating natural language processing for search.

Since then, Thomson Reuters has increasingly leveraged AI capabilities to improve its product offerings through techniques like named entity recognition and resolution, classification, and recommendation within its products.

With this organic growth of AI technologies, it became imperative that we build a standardized process to enable the use of best practices across business functions, and to allow different personas across the enterprise to participate.

And as you can see, over this period we have been doing several things. Lately we have been acquiring companies, so we have to absorb their processes and converge on a standard way of doing things, alongside all of the advancements that have happened over this past year.

We have seen an increase of interest in developing solutions among citizen developers. We have enabled access through the AI platform to a variety of low-code/no-code AWS services such as SageMaker Canvas, Amazon Kendra, and SageMaker JumpStart, empowering users to experiment and build solutions with these tools.

Within only a few clicks, and while securely using Thomson Reuters data, the platform abstracts away the complexities of security and infrastructure provisioning so that the user can focus on the data science problem.

As you can see on this diagram, the AI platform that was initially tailored to data scientists is now empowering AI novices as well. Its services span the whole model life cycle: from secure access to data, to annotating and labeling the data, to secure access to AWS services and experimentation tools, to registering models in a central model registry (which allows for governance of the models), and then deploying into a production environment and monitoring it for observability.

A key to its design is the flexibility it offers through its various microservices, enabling it to swiftly integrate the latest generative AI frameworks and to separate customer data from operational data. It enables interaction across accounts, which provides various teams with the separation they need while encouraging collaboration and reusability of models.

Through its central governance solution, the team is continuously evolving the AI platform to empower low-code/no-code generative AI through generative AI chains. The AI platform, as you can see, is broadly categorized into two phases: the development phase and the operations (production) phase.

The development phase comprises the data service, the annotation service, and the workspace service. The operations phase consists of the deployment service and the monitoring service. The bridge between them is the model registry, which enables models to go through governance, risk assessment, and everything else.

This is an example of how a user goes about creating a secure space within an AWS account to start building AI solutions. One only needs to fill out a few key elements, and the platform takes care of provisioning a secure environment in which the user can then get access to data and spin up EMR clusters, EC2 instances, or SageMaker Studio or SageMaker Canvas instances, just through a few button clicks.

Users can also collaborate with other scientists or other users in the same space. This architecture allows the AI platform to standardize approaches in securing access to the services as well as to the data.

Over this past year, we have been conducting multiple hackathons across the enterprise, with each hackathon comprising real problems being addressed within the organization. The purpose of the hackathons was to foster innovation and allow users to explore AI solutions for these problems, while at the same time introducing the AI platform to users across the enterprise.

In May of this year, we conducted a hackathon specifically focused on the use of SageMaker Canvas. We posed three different problems: a call center forecasting problem, to manage resources during peak seasons (a time series forecasting problem); a conversion prediction problem, to predict the likelihood of purchasing a subscription after a trial, where consumers subscribe for a trial and we want to see how likely they are to convert to an actual subscription (a binary classification problem); and finally, a customer profiling problem, to uncover customer activities, insights, and patterns (a clustering problem).

So different personas participated in these hackathons, ranging from subject matter experts, to data scientists and researchers, to data analysts, to MLOps engineers, to software engineers with no AI/ML background.

The feedback we received was very positive and indicated to us that the services we are exposing through the AI platform were indeed welcome and needed. As you can see, some users who had never used the platform started using SageMaker Canvas, and it became their go-to tool from there on.

So looking deeper into the use case that actually won the hackathon: the team proposed a solution, developed in Canvas, for managing customer call center resources during peak seasons. The challenge was to provide a more cost-effective and optimal solution than what humans were able to produce.

The team comprised a storage engineer, a business analyst, and two software engineers, all of whom had no prior experience with AI techniques.

The problem started with accessing data in a particular source in Snowflake. As you saw earlier, Snowflake is one of the data sources Canvas can connect to, so they connected Canvas to Snowflake, got access to the data they were supposed to use for the hackathon, and started analyzing it.

They then proceeded to perform feature engineering (not that they knew it was feature engineering, but they did it), identifying the relevant features and the patterns that impacted the call volume.

Using the custom model training that David just demonstrated, they created a custom prediction model. The AI platform's integration with the governance solution was also leveraged to get the model registered.

As you noticed in David's demo, the user can register the model in the model registry and then approve it to trigger downstream workflows. Even though that part was not fully leveraged within the hackathon, the model was indeed registered in our central model registry, which proved to us that this is a viable route for users to take when using Canvas.

Once they finished training and registering the model, they deployed it and generated some predictions, and those predictions ended up winning them the hackathon.

The process enabled the team to learn about solutions AWS provides for addressing AI problems, solutions they were not aware of before. They learned how to implement an ML model and understood how to process and treat the data. They came to the realization that it is relatively simple to train a machine learning model and that the process was indeed novice-friendly.

Over the course of this year, with the multiple hackathons and everything else, it has been great working with the AWS team and getting their support. We were able to provide feedback to them on the features we need from an enterprise perspective: how we secure and standardize certain processes at the enterprise level. We are looking forward to the same level of collaboration in the years to come.

So that's it. I'll hand it over to Rajneesh. Thank you.

Thank you, Radev. This is such a fascinating story: folks who are not even machine learning experts won a machine learning hackathon. That is the power of tools such as SageMaker Canvas.

So if you want to see the same kind of results within your own organizations using SageMaker Canvas, you have access to a variety of resources.

The first one is a course on Coursera, which is there to help customers understand how to translate their business problem into a machine learning problem. That is feedback we hear from customers: domain experts are familiar with the business problem they want to solve, but how do they translate that business problem into a machine learning problem? That is what pushed us to create the course, and it is available to all of you.

The second reference is a hands-on, self-paced lab. There are a bunch of different examples available. As we talked about, sample datasets come packaged with Canvas, and using those sample datasets you can build a variety of different machine learning models and get a feel for the application.

With that, we would like to close the session today, but we will leave you with a couple of sessions we have planned for tomorrow, in case you want to dive deeper into various generative AI capabilities, including fine-tuning of foundation models and preparing data using natural language.

I would encourage you to attend a session which is called AI-339. It is available in Caesars Forum on November 30th which is tomorrow from 2:30pm to 3:30pm.

There is another session in case you want to see a variety of different demos and other examples: a chalk talk for SageMaker Canvas, also tomorrow, from 4pm to 5pm at Mandalay Bay.

So if you have peers who are looking for similar capabilities and you found today's session helpful, please let them know so that they can also benefit from the sessions we have planned for all of you.

So with that, I would like to thank all of you for your time today. Thanks for listening, and I look forward to all of you using SageMaker Canvas.
