Scale complete ML development with Amazon SageMaker Studio

Hello everyone. Thank you for coming on a Thursday afternoon to Mandalay Bay. My name is Sumit Harker. I'm a senior manager of product management for Amazon SageMaker at AWS. And today I have with me my colleague Giuseppe Porcelli, who is a principal solutions architect at AWS, and Mark Neumann, who is a product owner for the machine learning and AI platform at BMW Group.

Together we are going to talk about how to scale your machine learning development using SageMaker Studio. Thank you, gentlemen.

Ok. So before I begin talking about how to use SageMaker Studio to scale your machine learning development, I wanted to do just a quick recap.

SageMaker Studio was launched back in 2019 and it was launched as the first fully integrated development environment for machine learning. It brought together everything that you need for ML into one single unified visual interface, which means you have purpose built tools for every step of the machine learning workflow accessible within the same visual interface.

So tools like data labeling, feature engineering, building models using code editors and then for tuning, deploying and managing the end to end machine learning workflow - you can find the tools for every step in the same visual interface. This also means you can quickly move back and forth between these steps, make changes, observe the outcomes and iterate fast to quickly roll out your new machine learning capabilities in your organization.

And today, tens of thousands of customers are using SageMaker Studio, which gives us a highly privileged opportunity to work with these customers to understand the trends that impact the machine learning developer experience. And I'm going to discuss some of these trends.

So one of the first trends I want to talk about is just the increasing pace of AI adoption. One of the key indicators of AI becoming pervasive in our economy is just the growth in AI investments. The AI Index report published by Stanford University early this year estimates that the global corporate investments in AI grew by a staggering 13 times over the last decade and the trend is not stopping, it's going to accelerate further.

What this means for the developer experience, especially with generative AI, is that developers are asking for tools that are highly performant and make them more productive in quickly getting from data to model to insights.

The next trend I want to talk about is the proliferation in data and AI specific job roles and specializations. Given that machine learning is so complex, it's highly unlikely that one person possesses all the skill sets needed to develop and deploy a model.

For example, a data engineer might specialize in pulling together data from multiple different sources, integrating them together and then transforming them into a form and shape which is useful for analytics and machine learning. While a data scientist may specialize in doing feature engineering on top of this data and then choosing algorithms to build an experiment with models.

And then a machine learning engineer might have the skill sets to take this model, deploy it into production and automate the process of deployment and retraining the models at a cadence.

Given this rapid proliferation in AI related roles, more and more machine learning developers are asking for tools which are purpose built for their specific task.

And the third trend I want to talk about is the continued difficulty of taking your machine learning models from prototyping to production. A Gartner survey estimates that only a little over 50% of models make it from prototyping to production.

Now, the tools, the skill sets and the processes for building a model are substantially different from those for deploying the model, and developers are asking for tools which are more self-serve, so that model builders can quickly take their prototypes into production.

I'm going to summarize these key trends in the form of four Ps.

The first two are performance and productivity. As I said, to keep up with the pace of AI adoption, developers need tools which are highly performant and make them productive by removing any undifferentiated heavy lifting out of the way.

The next P is preference - given the proliferation of data related roles, developers need tools which are purpose built for their specific tasks.

And finally production - model builders need more self-serve tools to take their prototypes into production.

And in order to address all four Ps, the four key pillars of the developer experience, I am super pleased to announce the launch of the next generation of SageMaker Studio, which comes with a blazing fast start-up experience, an even broader selection of code editors, AI-based assistance for all your ML tasks, and new tools to automatically convert your machine learning code into pipelines.

Some of these tools were launched in our leadership keynotes over the last two days, with the latest one being the code editor which was launched in Swami Sivasubramanian's keynote this morning.

We are going to do a deep dive on each of these innovations one by one. First, let's talk about performance, and how the new blazing fast start-up of Studio meets the performance standards of a modern-day machine learning developer.

We have done a couple of things to improve the start-up experience. First of all, the initial setup of Studio has been significantly simplified, which means in one single click you can now set up a SageMaker Studio domain with IAM-based identity and access management, fully managed connectivity to the public internet, and the underlying compute and storage for building your machine learning models.

Setting up Studio is now literally as simple as flipping a switch to turn on a smartphone.

Now, once you have set up Studio, you can launch it with a simple click and it loads the application in your browser in less than five seconds, giving you instant access to all the tools.

Now, one of the first things you might want to do is start writing code, for which you want to spin up a code editor. So what you do is you go and create a SageMaker space. A SageMaker space is like a workspace in the cloud that stores your compute, storage and runtime preferences for your development environment.

The space comes with many capabilities. For example, you can stop a space, come back to it later, restart it again and resume your work from where you left off. All your work is automatically saved on the underlying S3 storage. You can also scale up and scale down the compute and the storage of the space with a simple click.

So for example, you could start a space on a lightweight CPU instance when you are doing data exploration and then immediately flip to a beefier GPU instance as you begin your model training. The space also comes with several capabilities to share and collaborate artifacts with your team members.

You can attach a shared file system like Amazon EFS onto a space to share your code, datasets and artifacts with your team. You can even create a shared space in which you and your peers can come together and review and co-edit the files in real time.

Once you have the space set up, you can launch an IDE such as JupyterLab or RStudio from the space. JupyterLab loads up in your browser in a couple of seconds and comes preconfigured with the SageMaker Distribution.

SageMaker distribution is a data science and machine learning runtime, which comes pre-configured with several popular data science and machine learning packages. So you can quickly get started with writing your machine learning code and we are going to give you a demo of this entire experience very soon.

However, I'll show you a couple of slides about what the visual experience looks like. This is the home page of the new SageMaker Studio. You can see on your top left there's an Applications Gallery from where you can pick a code editor of your choice.

Let's say you pick JupyterLab. It immediately opens a modal in the center of the screen to help you create a JupyterLab space. All you need to do is select the compute for your space, the storage for your space, and the runtime. And you can see defaults for each one of them, where the default runtime is the SageMaker Distribution.

Once you have made these choices or reviewed these choices, you can quickly launch the space and get started with writing your code. You can run multiple spaces in parallel each with a different combination of compute, storage and runtimes if you are working on multiple different projects or multiple different experiments at the same time.

You can come back to this page, review all your running spaces, terminate a space, stop a space, resume a space - you can do all of that from this one central place. And as I said, we'll walk you through the demo of this experience very soon.

Let's talk about the next P - preference - where developers are asking for their preferred tools for their specific tasks. As you saw, SageMaker space comes with support for both JupyterLab and RStudio, but our customers have been telling us they need more choice. They need IDEs with a much more robust support for debugging, refactoring and deployment tools.

And one of the IDEs they have been asking us for is Visual Studio Code. So I'm super excited to announce the launch of the new Code Editor in SageMaker Studio, built on top of Visual Studio Code Open Source. This is built by AWS and it uses the Visual Studio Code open source codebase, which means all your familiar VS Code shortcuts, terminals, and debugging and refactoring tools work in Code Editor.

It also comes pre-installed with an extensions marketplace powered by Open VSX from the Eclipse Foundation, which gives you access to more than 3,000 VS Code compatible extensions. And one of the popular extensions pre-installed in Code Editor is the AWS Toolkit, which gives you easy access to many AWS services like Lambda, S3, Redshift and CodeWhisperer.

So you can quickly get started with building and deploying applications on AWS. And finally, similar to JupyterLab, Code Editor runs on a SageMaker space, so you get all the benefits of the space - you can attach a shared file system, and you can resize compute and storage on the fly. It comes preconfigured with the SageMaker Distribution so you can quickly get started with writing code.

Ok. Now, with the broadest choice of IDEs available within SageMaker Studio, your machine learning developers can come in and pick and choose a tool of their choice for the task at hand. As an example, a data scientist might come in and pick JupyterLab from the App Gallery and experiment tracking tools from the Tool Gallery, and then begin building and experimenting with models.

While a machine learning engineer might come in and pick Code Editor from the App Gallery and a pipeline tool from the Tool Gallery to start deploying and automating model pipelines.

Next, I'm going to talk about the third P - productivity - where developers are asking for more efficient ways to get from data to insights. The new SageMaker Studio comes built in with generative AI based assistance for all your ML tasks.

Your editors, JupyterLab and VS Code, both come pre-installed with automated code generation powered by CodeWhisperer, which means you can now generate Python and data science code simply by adding a comment in your code in plain natural language.

You no longer have to spend hours looking up code samples on the internet, debugging those samples, or remembering API syntax - you can just get started coding in the IDE itself. In our internal studies, developers who used CodeWhisperer were found to be 57% more productive than those who didn't.
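As an illustration of this comment-driven flow, here is a minimal sketch; the natural-language comment and the resulting function are hypothetical, but they show the kind of inline suggestion CodeWhisperer surfaces in the editor.

```python
import pandas as pd

# With CodeWhisperer enabled, typing a plain-language comment like the one below
# prompts an inline suggestion for the function body (illustrative, not an actual capture).

# function that loads a CSV file into a pandas DataFrame and drops rows with missing values
def load_clean_csv(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)  # read the raw CSV
    return df.dropna()      # drop rows containing missing values
```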

The IDEs also come built in with an AI-based chat companion, so you can have a natural language conversation with the chat companion to do tasks beyond code generation. You can ask for advice, for example to explain a code sample, to debug an error, or even to ask for recommendations to refactor your code.

The chat companion is very powerful and highly customizable. You can pick and choose the foundation model that powers the chat companion. For example, you could choose to run the Claude v2 model developed by Anthropic, available on Amazon Bedrock. Or you could choose to run Stardust, developed by Hugging Face, available on the SageMaker JumpStart model hub.

The JumpStart model hub now comes with more than 150 open source models for multiple different tasks. And all of those models can easily be deployed from the JumpStart interface onto SageMaker endpoints. From there, they can be used to power the chat interface, so you can play with those models right in your IDE. It's a pretty fun experience, so I would definitely encourage everyone to give it a shot and try out different models in the IDE itself.
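The session describes deploying JumpStart models through the Studio interface; as a hedged alternative, a minimal sketch of doing the same programmatically with the SageMaker Python SDK's JumpStartModel might look like this (the model ID and instance type are placeholders, not values from the session):

```python
from sagemaker.jumpstart.model import JumpStartModel

# Deploy an open source model from the JumpStart model hub to a SageMaker endpoint.
# The model_id is a placeholder; browse the JumpStart hub for the ID you want.
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # GPU instance assumed for LLM inference
)

response = predictor.predict({"inputs": "Explain what a SageMaker endpoint is."})
print(response)
```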

Next, I'm going to talk about the final P - production - and how we build tools that enable model builders to self-serve and get their prototypes into production. Here, we have done a couple of innovations.

We have significantly changed the SageMaker Python SDK: we have introduced a new set of decorators with which you can annotate your Python code and automatically run that code as a job on SageMaker. The decorators automatically determine the runtime needed to run the code as a job, based on a requirements file or a YAML file that you provide at runtime.

What this means for you is that you can take code that you have written and debugged on your laptop and, once you are satisfied with it, quickly scale it out to a beefier instance in the cloud without making any code changes, just by adding a decorator.
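A minimal sketch of the @remote decorator, assuming a requirements.txt for dependencies and an illustrative instance type (the function itself is a toy example, not code from the session):

```python
from sagemaker.remote_function import remote

# Run a local Python function as a SageMaker training job by decorating it.
@remote(instance_type="ml.m5.xlarge", dependencies="./requirements.txt")
def multiply(a: float, b: float) -> float:
    # This body executes remotely on SageMaker-managed infrastructure.
    return a * b

result = multiply(3.0, 4.0)  # blocks until the remote job completes, then returns the value
print(result)
```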

And this works not just for one Python function. If you have multiple Python functions in your code, where each function is designed for a specific task, you can stitch them together into a pipeline on SageMaker. All you need to do is annotate those functions with a @step decorator in Python, as in the sketch below.
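A minimal sketch of the @step decorator, with toy functions and assumed instance types (not the session's actual code):

```python
from sagemaker.workflow.function_step import step
from sagemaker.workflow.pipeline import Pipeline

# Two plain Python functions annotated with @step become SageMaker Pipelines steps.
@step(instance_type="ml.m5.xlarge")
def preprocess(raw_s3_uri: str) -> str:
    # ... load, clean, and write features back to S3 (details elided) ...
    return raw_s3_uri.replace("raw", "features")

@step(instance_type="ml.m5.2xlarge")
def train(features_s3_uri: str) -> str:
    # ... fit a model on the prepared features (details elided) ...
    return features_s3_uri.replace("features", "model")

# Passing one step's output into the next defines the dependency graph.
model_artifact = train(preprocess("s3://my-bucket/raw/data.csv"))
pipeline = Pipeline(name="my-two-step-pipeline", steps=[model_artifact])
```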

If you're a notebook developer and want to run your notebook as a job, it's even simpler in SageMaker Studio. You go to SageMaker Studio, choose the notebook, and create a schedule for running it as a job. SageMaker automatically triggers the job, determines the dependencies that were used in authoring the notebook, and then uses them to create a container and run the notebook as a job on SageMaker. It pushes all the outputs of the job back into SageMaker Studio for you to review and monitor.

You can even orchestrate multiple such notebooks in the form of a pipeline - you can actually create a DAG of these running notebooks. For example, you might have a notebook which takes in data from S3 and performs some feature engineering, another notebook which takes these features and performs hyperparameter tuning, and another notebook which takes model candidates, does model evaluation and generates evaluation reports. You can stitch them together into a pipeline using the new SageMaker Python SDK.
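A hedged sketch of chaining notebooks as pipeline steps with NotebookJobStep; the notebook paths, image URI, and kernel name are assumptions to be replaced with values from your environment:

```python
from sagemaker.workflow.notebook_job_step import NotebookJobStep
from sagemaker.workflow.pipeline import Pipeline

# Each NotebookJobStep runs one notebook as a SageMaker job.
prep_step = NotebookJobStep(
    name="feature-engineering",
    input_notebook="notebooks/preprocess.ipynb",
    image_uri="<sagemaker-distribution-image-uri>",  # placeholder
    kernel_name="python3",
)
tuning_step = NotebookJobStep(
    name="hyperparameter-tuning",
    input_notebook="notebooks/tune.ipynb",
    image_uri="<sagemaker-distribution-image-uri>",  # placeholder
    kernel_name="python3",
)
tuning_step.add_depends_on([prep_step])  # run tuning only after feature engineering

pipeline = Pipeline(name="notebook-dag-pipeline", steps=[prep_step, tuning_step])
```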

Finally, we have dramatically simplified the model deployment process. Now you can deploy a model onto SageMaker in just two simple steps. The new ModelBuilder class in the SageMaker Python SDK allows you to pick up a model artifact and call its build() function, where SageMaker automatically determines all the ingredients for deploying this model onto an endpoint. This includes determining what inference server to use - it could be a Triton server or a TorchServe server, depending on the type of model artifact.

It determines the appropriate compute for running this model. It also automatically generates the code scripts which act as glue, transforming incoming inference requests into a form that the model can consume, and packaging the response generated by the model into a REST API output.

SageMaker then takes all these ingredients and deploys them onto an endpoint which is high throughput, low latency and fault tolerant.
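A minimal sketch of the two-step ModelBuilder flow described above, assuming an in-memory XGBoost model and illustrative example payloads (names and values are not from the session):

```python
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from xgboost import XGBClassifier

model = XGBClassifier()  # in practice, a classifier you have already trained

sample_input = [[300.1, 1500.0, 40.0, 0.0]]  # example feature vector (assumption)
sample_output = [0]                          # example prediction (assumption)

builder = ModelBuilder(
    model=model,
    schema_builder=SchemaBuilder(sample_input, sample_output),
)

sm_model = builder.build()    # step 1: SageMaker infers server, container, and compute
predictor = sm_model.deploy(  # step 2: deploy to a real-time endpoint
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
```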

Now to give you a demo of all these awesome capabilities, I'm going to invite Giuseppe back on the stage to help us with a live demo.

Thank you. That's great. So before we get into the demo, let me spend a couple of words on the use case that we are trying to solve today. We want to solve a pretty simple problem: detecting failures in machinery with machine learning. Well, it's a hard problem in general, but it's simple given the data that we're going to work with today.

We are going to use the AI4I Predictive Maintenance dataset from the University of California Irvine (UCI) Machine Learning Repository, which is made up of a few features that you can see here - the type of machinery, the air temperature, the rotational speed, the process temperature - and the variable that we want to predict is the machine failure variable, which is binary. As you can imagine, we want to train a binary classification model to address this use case.

What we're going to do is use SageMaker Studio to build, train and deploy the machine learning model. Plus, we will use Amazon SageMaker Pipelines to orchestrate the entire end-to-end machine learning workflow.

Great. Let me move to the demo. As Sumit mentioned, we have extremely simplified the onboarding experience to SageMaker Studio, and you can literally onboard to Studio in one click. If you click on this button here, "Set Up For Single User", what happens is that a SageMaker Studio domain is configured automatically for you, with a single user profile created with a set of default configurations - an Amazon SageMaker full access execution role, default networking configurations, and so on. And you can get started with this domain right away.

Otherwise, if you want to set up Studio for organizations, we have also simplified that process with a wizard-based user interface that allows you to customize how SageMaker Studio is configured in your organization.

For this example, we already have a pre-created domain, just this one. Let me get into the domain and then open SageMaker Studio for this specific user.

Great, we are now in the new SageMaker Studio user interface. As you can see, we have the Application Gallery on the top left, the left navigation, and the landing page here on the main screen, where you can choose whether you want to work with JupyterLab or the Code Editor. And you also have the Getting Started resources accessible.

Since we want to train this binary classification model, we approach this problem as a data scientist. The first thing that I want to do is use JupyterLab to explore my data, preprocess my data, and then build a machine learning model. So I'm going to create a JupyterLab space here - let's give it a name - and the space will be ready in a second, and we will be able to customize the configuration, as Sumit mentioned.

So we can configure the storage that we want to allocate to this specific space. We can choose to add lifecycle configurations to customize the way the environment is configured, through shell scripts which are executed at bootstrap of the underlying instance. We can attach a custom EFS file system, and we can choose from a variety of compute resources, including CPU and GPU instances, as well as fast launch instances which start in less than a minute.

So let's select an instance, for example, and as the runtime image we use the SageMaker Distribution, which is set by default.

Now let me run this space, and I want to spend a couple more words on the SageMaker Distribution. Back in May, at JupyterCon, we launched the SageMaker Distribution project. This is an open source project. What we have done is put together in a single distribution the most common machine learning frameworks and libraries. We have built this as a Conda environment, and we make sure that the dependencies and the versions are correct.

We have also delivered the distribution in the form of a Docker container, so you can go to GitHub, pull this Docker container, and even try using the SageMaker Distribution locally. The SageMaker Distribution powers both the JupyterLab applications and the Code Editor spaces in SageMaker Studio. This is pretty important because we can move from one environment to the other using the same runtime, keeping the runtime consistent, without having to customize the runtime when we move from one place to the other.

Great, as you can see, in less than a minute we have our space ready. But I already have a space with all the code that we need pre-created, so let me open JupyterLab there.

So JupyterLab loads; as I said, this is a repo that I've cloned for this particular example. Let me get into the first notebook. As a data scientist, as I said, I want to make sure that I can explore this data, preprocess it, and then fit and train a model.

As I mentioned, the SageMaker Distribution already contains the most popular frameworks and libraries. Here I want to work with Pandas, XGBoost and Scikit-Learn. We're going to use Scikit-Learn and Pandas to transform the data and to do the data exploration and preprocessing. Then we're going to use XGBoost to create the binary classification model.

We can definitely customize the runtime environment by installing additional libraries, as you would normally do. In this case, we are installing a visualization library, Seaborn.

Then we download the dataset from the UCI Machine Learning Repository and start with some exploratory data analysis. We can run some of these cells here. What you see is that we are taking a look at the dataset, describing it, then checking the value counts for the machine failure variable - how many positive versus negative examples we have in this dataset. Then we plot this information, with standard plotting operations.

Then at some point I want to show you: say I want to compute the unique count of values for the columns in this data frame, and maybe I don't remember the API that I need to call. We can use Amazon CodeWhisperer, which is running in JupyterLab and provides a suggestion for this function.

Another example: I want a function that removes a column from a data frame. Here we see the CodeWhisperer suggestion; I can accept the suggestion and I have my function ready. So it's pretty powerful tooling that can help us develop within the notebook itself.

Great. Let's move on. What we're doing here is removing some columns that we don't need for this specific problem - we don't want to include them - and then plotting some of the variables to check potential correlations, using a Seaborn pairplot.

And then we get into the data preparation code, the data preprocessing code. This code is defined as a standard Python function, as you would expect, called preprocess(), which takes as input our data frame with all the data and does a few operations, like train/validation/test splitting.

Then it applies some transformations using Scikit-Learn - specifically, we scale the numerical attributes and one-hot encode the categorical ones so we can fit the model later on. This function returns, as you can see, the transformed data - the preprocessed data - plus the featurizer model itself, built with Scikit-Learn, which is able to transform our data. A simplified sketch of such a function follows.
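A simplified sketch of such a preprocess() function, assuming the AI4I column names and ordinary Scikit-Learn components (not the repository's actual code):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def preprocess(df: pd.DataFrame):
    # Separate the binary target from the features ("Machine failure" per the dataset description).
    X = df.drop(columns=["Machine failure"])
    y = df["Machine failure"]

    # Train/validation/test split (roughly 70/15/15).
    X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
    X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

    numerical = X.select_dtypes(include="number").columns.tolist()
    categorical = [c for c in X.columns if c not in numerical]

    # Scale numerical attributes, one-hot encode categorical ones.
    featurizer = ColumnTransformer([
        ("scale", StandardScaler(), numerical),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])

    X_train_t = featurizer.fit_transform(X_train)  # fit only on the training split
    X_val_t = featurizer.transform(X_val)
    X_test_t = featurizer.transform(X_test)

    return (X_train_t, y_train), (X_val_t, y_val), (X_test_t, y_test), featurizer
```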

So I can run this function, and we have our new model, the featurizer model; this is the representation of this model, and we can look at the data that has been transformed - standard preprocessing operations here.

Then we move into the model training. Again, I have a train function which takes as input the preprocessed data - the training and validation data - plus a few hyperparameters for the XGBoost algorithm, and then it fits the XGBoost model. That's what it's doing here: we are training the XGBoost model, and the return value of this function is the fitted XGBoost model.

We are also computing some metrics like accuracy, precision and recall, which are pretty convenient when we are working with a binary classification problem; a simplified sketch of the train function follows.
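A simplified sketch of such a train() function, with illustrative hyperparameter defaults (not the repository's actual code):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score
from xgboost import XGBClassifier

def train(X_train, y_train, X_val, y_val, max_depth: int = 5, eta: float = 0.2):
    # Fit an XGBoost binary classifier on the preprocessed data.
    model = XGBClassifier(max_depth=max_depth, learning_rate=eta, objective="binary:logistic")
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

    # Report metrics that are convenient for a binary classification problem.
    preds = model.predict(X_val)
    print("accuracy:", accuracy_score(y_val, preds))
    print("precision:", precision_score(y_val, preds))
    print("recall:", recall_score(y_val, preds))
    return model
```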

Great. Now, the last thing that you would do as a data scientist, maybe, is use these two models to run some inference on the test dataset, which is what we are doing here in this function.

Great. Now let's say we are satisfied with what we have done here in the notebook and we want to scale this code on different compute infrastructure. Why would I want to do that? There could be multiple reasons: maybe my dataset size increases and I want to use a larger instance, or a set of instances, to train my machine learning model. Or maybe I want to automate the execution of my code.

So I am in a notebook but I have to make sure that my code can also run as part of a pipeline, for example.

Great, how can we do this? There are at least a couple of options. One option is to use the integrated notebook scheduling functionality which is available in the JupyterLab environment. What you can do here is say: I want to schedule the execution of this full notebook, either now or on a schedule - so either I execute it on demand or on a schedule - and this runs in the form of a SageMaker job. So it will run on SageMaker infrastructure.

The other option, which we're going to show you here, is the @remote decorator that Sumit mentioned. With one line of code we import the decorator and decorate the Python function. When we run this code, exactly the same function that we were running here in the notebook is executed as a SageMaker training job on separate infrastructure.

So what the decorator is doing is taking this function, understanding the configuration of your runtime environment, and then executing the function as a SageMaker training job. If I want to monitor the execution of the job, I can go back to SageMaker Studio, go to the Jobs screens, and look at the training jobs here.

And as you can see, there is one job that is executing, which is this preprocessing job. The preprocessing code is now running as a separate job on separate infrastructure via SageMaker training.

Great. You can look at the information for these jobs, like the input artifacts and some other configurations. What I want to show you is that the runtime environment here has been picked automatically: this is the SageMaker Distribution again, used as the Docker container to run our training job.

So the runtime environment is consistent across the JupyterLab environment, the Code Editor environment (which we will see shortly), as well as the jobs. Keeping the environment consistent when you move to jobs reduces the amount of configuration work you have to do.

Great. Our jobs are running; we don't have to wait for the execution. Now, let's say that I have executed my two jobs - one for data processing, the other one for the actual training of the model - and I have my two models, the featurizer model and the actual XGBoost model that does the classification.

What I want to do now is work with the ML engineer to deploy this model to a real-time endpoint. I can go back to SageMaker Studio and open Code Editor now. I'm not going to create a space in this demo - it's the same procedure that we have seen for JupyterLab. We just open a Code Editor space that is already running in my environment, and we will see the Code Editor environment loading here.

Great. Let me open the specific folder that I have in my user home directory, with this repo again cloned, and we move to the deployment module.

Great. I have defined the deployment as a standard Python script. The deployment procedure at a high level consists of the following steps: we load the two models into memory, the featurizer model and the XGBoost model, then we create two SageMaker models. A SageMaker model represents some extra metadata that you need to deploy the model to a real-time endpoint. To build these SageMaker models we use the new ModelBuilder class that Sumit introduced.

Then, after we build the two SageMaker models, we combine them into what is called a serial inference pipeline in SageMaker, which is just one of the ways to deploy machine learning models to real-time endpoints in SageMaker.

Why do I want to do that? Because I want to make sure that the input provided to the first model is just the raw data that may come from the field. This model then transforms the data and passes it to the XGBoost model for the actual inference. A quick sketch of combining two models this way is shown below.
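A hedged sketch of combining two already-built SageMaker models into a serial inference pipeline with PipelineModel; the featurizer_model and xgboost_model placeholders stand in for the outputs of the ModelBuilder step, and the instance type is an assumption:

```python
import sagemaker
from sagemaker.pipeline import PipelineModel

session = sagemaker.Session()
role = sagemaker.get_execution_role()

featurizer_model = ...  # sagemaker.model.Model built for the Scikit-Learn featurizer (placeholder)
xgboost_model = ...     # sagemaker.model.Model built for the XGBoost classifier (placeholder)

# Requests hit the featurizer first, then its output is passed to the classifier.
pipeline_model = PipelineModel(
    models=[featurizer_model, xgboost_model],
    role=role,
    sagemaker_session=session,
)

predictor = pipeline_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
```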

And finally, we deploy the model. What I want to focus on specifically is how a model is built, so let's look at the new ModelBuilder class. The ModelBuilder class takes as input the actual XGBoost model. Then we provide a few other parameters, like the requirements needed for inference. And most importantly, we pass the schema builder.

The schema builder defines, through an example input and output, the marshaling functions for your model - how data has to be serialized and deserialized. At the same time, you can customize this through extra classes. In this case, I'm using a request translator because I also wanted to further customize the way inputs are provided to my model.

You might be asking: what happens if I want to customize the way the actual inference happens? I have done this for the Scikit-Learn model, where by overriding the inference logic you can also customize how inference works through the ModelBuilder. Why? Because for the Scikit-Learn model I didn't want to call the predict method; I wanted to call the transform method to transform our data. A sketch of this kind of customization follows.
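A hedged sketch of overriding the inference logic so the featurizer calls transform() instead of predict(), using the SDK's InferenceSpec; the file name and example payloads are assumptions:

```python
import joblib
from sagemaker.serve.builder.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.spec.inference_spec import InferenceSpec

class FeaturizerSpec(InferenceSpec):
    def load(self, model_dir: str):
        # Load the serialized Scikit-Learn featurizer from the model directory.
        return joblib.load(f"{model_dir}/featurizer.joblib")

    def invoke(self, input_object, model):
        # Transform the raw input instead of predicting on it.
        return model.transform(input_object)

sample_input = [["L", 300.1, 310.2, 1500.0, 40.0, 0.0]]  # raw record (assumption)
sample_output = [[0.1, 0.2, 0.3]]                        # transformed features (assumption)

featurizer_builder = ModelBuilder(
    inference_spec=FeaturizerSpec(),
    schema_builder=SchemaBuilder(sample_input, sample_output),
)
featurizer_model = featurizer_builder.build()
```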

Great. That said, we can open a new terminal, go to the deploy folder and run python deploy.py. What's happening here is that the ModelBuilder class is packaging our models, automatically determining the container images we need to use for inference, setting the appropriate configurations for the underlying serving stacks, and is now deploying the machine learning models.

So there is indeed an endpoint that we are creating, called SageMaker-bt-d endpoint plus a suffix. How can we verify this? We move back to the SageMaker UI, go to the Endpoints section, and we see that there is an endpoint here in Creating state. This is the endpoint that we are building right now.

Great. At the same time, if we want to test some inference, I'm taking an endpoint that is already up and running. What we can do is run the script with an extra inference argument that I have provided, to execute some inferences and see the results for a positive and a negative example for our predictive maintenance problem.

So based on the attributes (I believe it's the rotational speed that has the most influence here), we are getting a positive and a negative result.

Great. So we have seen how to do data exploration, preprocess data, and build the model. Then we have done a model deployment step. Now we want to stitch it all together in a SageMaker pipeline. How can we do this? Using the new @step decorator that Sumit introduced. Let me move to the workflow here and open the pipeline.

But before going into the details of how the pipeline is defined, I just want to show you the steps that we are going to create. The first step, as I said, is the preprocessing step. As you can see, this is exactly the same preprocessing code that we were running in the notebook; it is the same function that I have moved into a Python script.

Then we have the train function. Again, it's the same function that we were running in the notebook, moved into a Python script - no specific SDK code needs to be written here to define the step itself.

Then I have an evaluation function, which is very similar to the test function that we had in the notebook, but we are also returning an evaluation report from this function. After we train these two models, and before deploying, we want to store the models in the SageMaker Model Registry and attach an evaluation report, so that I can then access the model registry, look at the specific version, and check the performance of my model.

Next, we have the register model step. First of all, we create the two models, the Scikit-Learn SageMaker model and the XGBoost SageMaker model. These are the same functions that we used for deployment - it's the same code. Then we put these two models together as a pipeline model, and we register the model in the SageMaker Model Registry, attaching the evaluation report, which is here.

The last step in the pipeline is about the deployment. This is very simple: it takes the model from the model registry and deploys it to a SageMaker endpoint.

How do we put all these steps together? How do we stitch them together? We have to take a look at this function which creates the steps. What we do is use the step decorator. A decorator is a Python function. Here, we are not using the decorator with the annotation syntax; we're using the decorator as a function. The reason is that we are implementing modular code - we are engineering our code for more modularity.

Here we are passing the various functions we have seen so far (preprocess, train, test, register, deploy) to the decorator, and we are making sure that the outputs of the previous step are passed to the subsequent steps accordingly, with standard Python code. Finally, with standard SageMaker Pipelines code, we create a pipeline, pass the steps to it, and define a few parameters, like max depth and deploy model, which is a boolean variable to decide whether we want to deploy the model or not. Then we upsert the pipeline definition into the system and start its execution. A sketch of this wiring is shown below.
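A hedged sketch of this wiring, calling step() as a function over the five functions described above; the function names, parameter names, and S3 path mirror the description but are assumptions, not the repository's code:

```python
from sagemaker.workflow.function_step import step
from sagemaker.workflow.parameters import ParameterBoolean, ParameterInteger
from sagemaker.workflow.pipeline import Pipeline

max_depth = ParameterInteger(name="MaxDepth", default_value=5)
deploy_model = ParameterBoolean(name="DeployModel", default_value=True)

# Wrap each plain Python function as a pipeline step, then wire outputs to inputs.
data = step(preprocess, name="preprocess")("s3://my-bucket/raw/ai4i2020.csv")
model = step(train, name="train")(data, max_depth)
report = step(evaluate, name="evaluate")(model, data)
package = step(register, name="register")(model, report)
deployment = step(deploy, name="deploy")(package, deploy_model)

pipeline = Pipeline(
    name="predictive-maintenance-pipeline",
    parameters=[max_depth, deploy_model],
    steps=[deployment],  # upstream steps are pulled in through the data dependencies
)
pipeline.upsert(role_arn="<execution-role-arn>")  # insert or update the pipeline definition
execution = pipeline.start()                      # start an execution
```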

Let's see how this works. What the decorator allows us to do is convert the Python functions that we have seen into SageMaker pipeline steps automatically, without worrying too much about writing extra SDK code. We are now adding the pipeline into the system and starting its execution.

And if we go back to SageMaker Studio, we can look at the pipeline screens. This is my pipeline; there is one execution in Executing state, which we started just now. We can take a look at the graph of this pipeline, which is indeed preprocess, train, evaluation, model registration in the model registry, and finally deployment of our model. If we take the specific execution, we can see where it is - now it's in the preprocessing stage. And we can also take a look at the parameters that were used to execute this pipeline, plus additional information and metadata related to the specific execution.

The last thing that I wanted to show you is the SageMaker Model Registry. One of the steps of the pipeline registers a new model version in the SageMaker Model Registry, so that we can decide whether we want to promote this model to production, by comparing this version with previous versions of the model.

We open the so-called model package group, which lists all the versions of the model that I have produced so far. We take, for example, this version 10 here, and you can see the evaluation report that I attached to the model in the model registry, with the accuracy metrics for the XGBoost model. You can also look, for example, at how this model would be deployed for real-time inference.

So all the information about containers, environment variables, and how the serving stack has to be configured has been applied automatically through the ModelBuilder class. Great, this concludes my demo. I'm very happy to invite Mark Neumann from BMW Group here on stage, who will tell us how BMW is scaling machine learning development using Amazon SageMaker Studio.

I think you know that BMW Group is a multi-brand automotive company. Of course, we are known for BMW cars and motorcycles, but there are also other brands, like our Mini brand and Rolls-Royce luxury automobiles, which belong to BMW Group as well. And did you know that in North America, BMW Group is actually the largest automotive exporter?

So there's a plant in South Carolina; I'm going to be there next week. They're producing the SUVs, the X series vehicle models, and distributing them worldwide. Of course, a large proportion is sold in the US, but they are distributing these worldwide.

At BMW Group, my team is responsible for building and providing the AI and machine learning platform services, and we work a lot with different business segments to help them build the actual AI solutions. For example, there's one team we work really closely with; they are responsible for the automated quality inspections in our worldwide plants.

This includes Spartanburg as well. What they managed over the last years is to ramp up a global solution that achieves the high-quality inspections that we need in the plants. What they actually do is equip the production lines with sensors - video cameras, acoustic recordings, other sensor measurements - they capture those recordings, and they use AI models to evaluate them.

What you can see in the picture is a camera pointing towards the production line, and for every vehicle that passes, an image is taken. In this case, it's about presence detection: the AI model that gets the image is checking whether the part highlighted in green is there or not. There are other use cases checking that the right part has been used, checking for anomalies, or checking the distances between parts.

And why is this relevant? Because they have doubled the number of models each year over the last years, and now they have hundreds of models in the production lines that need to be trained and developed. And whenever there are changes, for example regarding lighting in the plants, or the production setup changes, they need to retrain those models. It's quite some effort. For this they need a capable environment, and they would like to focus on their model development, not on setting up the environment. And that's where we come into play.

For use cases like this one, and given the growth that we have at the company, we needed to redefine our approach. That's the story I wanted to share with you today: how we did that. But to be clear, that's not the only segment where we benefit from AI; here are just a few more examples.

For example, in product development we're using machine learning as well, to design the features and models that our customers admire. One thing we can use machine learning for there is to identify the control levers that have an effect on the performance of our vehicle functions. This is about explainability: it's not about using the model in production, it's about understanding those control levers.

There are many other cases, of course. In supply chain and logistics, it helps us to predict our demand more accurately and reduce planning effort - very important in these times, looking at the last years with the semiconductor shortage, for example, to manage these scenarios. In vehicle production, we've seen the quality inspections, and there's more, like predictive maintenance. And in sales and after-sales, it helps us to improve the interactions that we have with our customers.

And even for our workforce, AI helps us to be more efficient. What's important for employees in a large enterprise is to find the right information, and of course you've seen all those generative AI use cases here at the conference. This is also something that we are trying to leverage to scale AI in those business processes.

We believe there needs to be a strong platform so that the individual use case teams can actually focus on their use case and don't have to do all this setup work. And that's why my team is providing these platform services.

Before we do the deep dive on machine learning and AI development, I wanted to give you a high-level overview of the platform services that we have and how we look at scaling. One thing we have is AI business services, which is basically a cluster of services that are not domain specific but can be used in all domains. For example, there's an API for text translation and an API for extracting information from documents. What's good about that is teams don't need to reinvent the wheel; they can simply use the service, integrate it into their processes, and all is good.

Then there's the custom machine learning and AI part, like we've seen with vehicle production, where teams need to build AI models. For this, we provide the tools for data scientists as well as business domain experts, and we'll dive deep on that. Then we have the management of unstructured data like images and audio: we need to record it, manage it, and provide labeling tools and labeling services to annotate the data.

Then for digital assistants, teams benefit from a platform where they can develop those assistants and operate them in a safe way. And with MLOps, we provide best-practice solutions to industrialize the AI models that we have developed. We've seen in the demo that there will be a smoother integration between AI development and production; that is something that we are going to leverage there as well.

Then the last part, which is becoming more and more relevant these days, is governance for AI. We need to have an overview of our AI use cases: we need to know where they are and what they use. We have a central repository for all the AI models, to have a safe, revision-proof storage for them and all the metadata associated with them. And we now even have risk assessment processes to assess the individual AI use cases. So that's the overview of the portfolio.

But today I wanted to dive deep on the machine learning development part - the code-centric part for data scientists and machine learning engineers. For us, it all started way back with special-purpose environments. Looking back about seven years, we saw an increase in use cases that work with unstructured data - images, audio data, text. What they mostly did was take larger pre-existing models and do transfer learning on our data to solve those use cases. What they needed was sufficient GPU performance and GPU memory in decent amounts.

That's why we ramped up a deep learning platform for them. They could use Jupyter notebooks and a terminal in a containerized environment that we ran on premises in our data centers, and we managed this environment for them. Then there came use cases that wanted to use more sensitive data; we saw the limits of that platform approach and ramped up a new platform which provided the necessary level of security for them.

Then there are, of course, many data scientists and machine learning engineers who work locally and do their experiments there, plus many other local solutions from other software vendors. The challenge we had with this fragmented approach was the multitude of operational and governance overhead for us in the end, and for users as well: if they were working on different use cases or different aspects, they had to use different platforms.

We had real challenges managing the compute demands, because we needed to plan ahead, purchase hardware, ramp things up, and exchange existing hardware and so on - really tough. We had peaks where users were unhappy, and we had times when the hardware was not properly utilized, like on weekends, for example. It's really hard to manage.

That's why we decided to overcome this fragmentation and these challenges and come up with a one-fits-all solution that would replace those platforms and motivate even the local users to switch to a central solution and bring all their workloads there. We coined the name Jupyter Managed for the solution, because we wanted to make clear that there is a one-fits-all approach now, which should take in all the combined requirements from the different use cases.

Starting with tools, we needed to provide all the tools that teams need - and there will be a growing demand for that, of course - and everyone needs to be able to access their data, be it data in our group-wide data lake in the cloud, the Cloud Data Hub, data in on-premises databases, or uploaded local data. We need to support all of that. Of course, the new solution needs to be elastic and scale with demand - that was one of the major requirements - and have a low operations effort.

From a security perspective, we had to bring in our corporate standards: security incident and event reporting, integration with our identity and access management systems, malware scanning for people who upload data, for example. And we needed to support different teams up to different legal entities - having people from BMW Bank on there, people from automotive in there - and clearly separate their environments in a compliant way.

Last but not least, we wanted to foster collaboration in the teams, with sharing of data and code inside the tooling. We looked at different alternatives and finally came to SageMaker Studio and said we want to use that solution, because SageMaker Studio already fulfilled many of the requirements, which allowed my team to actually work on the integration. So we didn't have to build the platform from scratch; we could work on the integration.

I wanted to give you a glimpse of the provisioning and onboarding process for teams and how we do that at BMW now. What we have is a BMW-specific service catalog for IT services. A new data science team goes there and requests a new workspace, and they have to specify, for example, their cost center, because we need to charge the costs internally to them.

Then an automated process kicks off, and this automated process creates a new AWS account for them that is managed by us. We provision all the infrastructure into it - the SageMaker Studio domain, for example, the custom kernels that we need, the lifecycle configurations that we need, and some of the security features that we add on top.

Once that's set up, all the users can enter a central portal and access their workspace. When they're in Studio, they can directly import data from our data lake, integrate with the on-premises databases, and basically create value directly. They don't have to mess around with all the configuration, the setup, the security and all of that - we are taking care of it.

As I said, SageMaker Studio allowed us to focus on this integration, and I just wanted to mention a few features that we really value - but there are a lot more. For example, choosing different instance sizes for the different workloads is very good for us because we have one platform for all the different cases, and teams can use SageMaker training with distributed training jobs for larger workloads.

We can bring in our custom kernels with BMW-specific libraries to interact with our data lake, for example. And there's the option to schedule notebooks - we've seen it before in the demo - which is very important for some of our users. So all in all, the solution with SageMaker Studio is very good for us, because we benefit from new features like the ones we've seen today, features that are added constantly.

We also value that SageMaker Studio builds on open source components, like we've seen today with VS Code and JupyterLab, for example. That's very good for us; we really value that.

So what's next for BMW Group with SageMaker Studio? We will continue onboarding teams to the new solution - not everyone is on board yet, so it's still ongoing. What we are working on is integrating our MLOps solutions for AWS, which we have already had for some years. We already have a SageMaker-based MLOps solution, and we are going to integrate it for a smoother experience.

We of course want to leverage the features for increased developer and data science productivity, like code generation. Many users are looking forward to the faster start-up times of notebooks, so this is really valued. And last but not least, we will provide access to large language models and other foundation models inside our development environment - for example, for people to be able to analyze text or do clustering with large language models.

So with this, we see ourselves as well prepared, and I really appreciate the new Studio release. And with that, I would like to invite you back on stage. Thank you for the demo and the release; I really appreciate it.

Thank you, Mark. Thank you for giving us insights into how you're using Studio at BMW. And you have been a great audience, by the way. Thank you for coming to this session today.
