SaaS DevOps deep dive: Automating multi-tenant deployments

All right. Well, welcome, everybody.

Thanks a lot for joining our session today. Today we are talking about SaaS and DevOps, and more specifically we are focusing on automation best practices in a software-as-a-service (SaaS) based model.

As you can see here, my name is Anu Sharma. I'm a Solutions Architect here at AWS, and with me is my colleague Alex, who is also a Solutions Architect. We are both part of a team called AWS SaaS Factory, and in our role we help AWS customers design, build, and architect solutions on AWS.

Today's talk is divided into three sections.

First, we will talk about the problem statement we are here to solve today. I'll talk about what is interesting and challenging about automating SaaS workflows.

Then we will have Alex come on stage, and he will show how to solve that use case in a containerized environment.

Then I'll come back and talk about the same problem statement and how to solve it in a serverless example. The idea is to give some concrete examples, and at the end we also have Git repository links which we will share with you so that you can get your hands dirty later. Just keep in mind this is a 300-to-400-level session, so we're not talking about EKS or serverless fundamentals here, but rather trying to solve an advanced use case in a SaaS workflow.

All right. So let's talk about what a basic SaaS automation workflow even looks like.

When most of us think of a SaaS application, we think of some sort of centrally hosted environment where our customers come and access our SaaS application; we are basically selling a service to our customers. And we think of some sort of shared environment, some shared resources that our customers will be accessing at the same time.

Typically you would use some sort of deployment pipeline or infrastructure as code to create this SaaS environment. Now, this SaaS application is useless unless you have customers onboarded into it.

So typically SaaS providers will build what we call a tenant onboarding service, and the role of this service is to onboard customers into your SaaS application. What that means is creating some tenant configurations, creating some users, creating the metering and billing profiles, et cetera.

Now, once you have onboarded some tenants or customers into the SaaS application, your tenant users will come and start accessing your application.

At the same time, you need to make sure that your deployment pipelines are updating your environment consistently. That is, you as a SaaS provider make sure that your SaaS application architecture is consistently updated with all the latest deployments and releases, and ideally that happens with total transparency to your customers.

The SaaS provider's customers will ideally not know that releases are going on behind the scenes. In some cases, SaaS providers will also build what we call a tenant offboarding service, and the role of tenant offboarding is pretty much the opposite of onboarding: removing all those configurations, users, and profiles from the system.

Now, we wish and hope that most SaaS providers can automate their workflows in this way, because if you are able to share your resources across tenants, you are first of all getting cost efficiency and cost savings, and then you are getting operational excellence, because there's only one environment, one set of resources, to update.

But what we have seen over the past few years working with multiple customers is that it is not as basic as you would think. In fact, we often see SaaS providers creating per-tenant, per-tier resources.

Now, you might think, why would you do that? There are actually a bunch of reasons. Sometimes we see that SaaS providers have an on-premises software deployment model and are trying to migrate it to a SaaS-based environment, and the software itself does not really have support for multi-tenancy built in. The business is after you to get to SaaS as soon as possible, and as a technical person you don't have much choice at that point but to create those siloed, per-tenant resources and go to SaaS while you keep rearchitecting your application behind the scenes. Or it could be just about guaranteeing SLAs to a certain tier of tenants.

So that's exactly what we are showing here. We have a basic tier with a pooled environment where all your basic tier tenants come and access your SaaS application, but there are siloed or dedicated resources for certain premium tier tenants.

Many times your architecture is designed in a way that you have no option but to create those siloed resources to guarantee certain SLAs. And at that point, you might even have different tenant- or tier-based configurations.

If you look at this example specifically, we have a pod count of two and memory of 256 for a basic tier tenant, but for premium tier tenants you might have better infrastructure: more high availability, lower latency requirements, et cetera.

Many times we have seen that you have to create those dedicated resources just to support the SLAs for a certain tier of tenants. Again, it goes back to your business drivers and how you want to go to market with your SaaS strategy. Compliance is another factor: regardless of whether your application is architected in a way that can support multiple customers, there are many times we have seen that you have to have a deployment in Europe, in America, in Canada, just because you have to support the local laws and regulations.

Obviously, another factor could be just blast radius. You might have hundreds or thousands of tenants and customers and you just want to have a limited blast radius if something goes wrong, so you might end up creating those partitioned SaaS deployments.

In these circumstances, obviously, you're compromising on cost savings: you have multiple resources, and that will cost you. You compromise on operational efficiency too, because you have more than one resource to update; every time you release a feature, you're releasing it across all your SaaS deployments.

So given this, let's look at a more realistic automation workflow.

In this case, given our example of tier-based deployment where we have a basic tier that pools certain tenants, you might think of creating this basic tier to begin with, as part of bootstrapping your application, because you know you will always have some basic tier tenants.

Now, when certain tenants onboard into your application, your onboarding service has the responsibility of knowing that this tenant is a basic tier tenant and therefore needs to be onboarded into the basic tier pooled environment, the basic tier pooled resources. As part of doing that, it will create some configurations, create some users, add metering and billing profiles, et cetera.

But what if you are now provisioning dedicated resources for a siloed tenant, a premium tier tenant? Now you need to add another function to your onboarding flow, which we call tenant provisioning, and the role of this provisioning is, at this point, creating those siloed tenant resources.

At this time, you also might want to look at the configuration of the siloed resources that need to be created. And, this is very important, your deployment pipeline will assume the additional responsibility of keeping all these resources in sync.

Keep in mind that in a SaaS environment, you're offering a service to a customer, and you need to make sure all your tenant deployments, all your tenant resources, come from a single code base. That's pretty much how you will typically operate as a SaaS provider: everything comes from a single, consistent code base, and all the tenants and tiers have to be on a single release eventually.

That's the goal we are trying to achieve today, and this is the problem statement that we will look at in our examples: how do you achieve that seamless provisioning experience, and how do you make sure your deployment pipeline is able to consistently keep all these environments in sync? Obviously, tenant offboarding also takes on additional responsibility now, because you might have to delete certain tenant resources.

Having said that, let's jump to the examples, and I'll have Alex come on stage and talk about how to implement this in a container example. Thanks, Anu.

Thank you. OK. Hi.

Before I dive into the specific implementation details of tenant onboarding and the deployment pipeline, I guess we should first take a look at the application deployment model. In this case, we'll have again those two tiers that Anu mentioned before:

the basic tier and the premium tier. For basic tier customers, we're going to use shared resources; we call that a pooled deployment model. In this case, they will share a Kubernetes namespace running on an Amazon EKS cluster, and they will share DynamoDB and SQS resources. For the premium tier tenants, we're going to use a dedicated namespace in Kubernetes. That's an arbitrary decision for this specific use case, and of course every use case can have other requirements.

We're also going to create dedicated DynamoDB and SQS resources. That's going to be our application deployment layout, and that's what we'll need to provision and update. So let's take a look.

I also would like to cover what kind of automation tooling we're going to be using here. Hopefully you saw that in the abstract, because these are quite specific choices, but you can potentially use other tools in a similar way.

We need to have two specific pieces. First, we need to define our tier configuration. Like Anu mentioned, that's how we essentially translate business requirements, essentially our pricing and packaging models, into a technology specification.

Let's say I decided that for my basic tier, like Anu showed before, we're going to use this much memory and this much CPU and a certain number of instances or containers, and for the premium tier tenants we can use more memory to get lower latency and so on.

That's our tier configuration template. And when we onboard tenants, we're going to create specific instances of those configurations, and that's what we call tenant configuration in this case. We'll have multiple of those.

In this talk, we're going to show you how to achieve this using Terraform, which is an infrastructure automation and provisioning tool, and Helm, which allows you to manage applications in Kubernetes. We're going to use Helm charts and Helm releases, where a Helm chart describes the deployment model of an application in a Kubernetes cluster.

It can include resources such as Ingresses, Deployments, and those kinds of resources. Helm releases are instances of a Helm chart that we can deploy multiple times, essentially mimicking our basic and premium tenant resources.

Next, we'll need to implement tenant provisioning and the deployment pipeline. Here we're going to use Argo Workflows, which is an open-source workflow orchestration tool, and Flux, which allows us to define continuous delivery and progressive delivery workflows.

Those are going to be our examples here, but you can of course use other tools like Argo CD and so on and so forth. Instead of Terraform, you could use the AWS Cloud Development Kit (CDK), and so forth.

First, I would like to walk you through what a tier configuration looks like, and let's start with the Terraform implementation.

By the way, those screenshots are from a builder session that we will also point you to at the end, so later, after re:Invent, since it has already run, you can look at the slides and run through it yourself. That is the repository structure from there.

Here we have our Terraform templates, and you can see that I have files per tier. So I have a file for silo resources, which is my premium tier,

and I have a file for my pool resources, which is my basic tier. If we peek inside the silo resources file, what we find is that this is a Terraform module, and this module acts like an application because it references another module, called tenant apps, at a very specific version. That tenant apps module essentially defines all those Ingress, Deployment, and other Kubernetes resources. Sorry, AWS resources, like the DynamoDB table and SQS.

We also provide some parameters, like deploy consumer and deploy producer. Those parameters that I pass to the module in effect capture my business configuration. I can say that for my silo customers, my premium tier customers, I always deploy dedicated resources for the producer and consumer components, but for my basic tier customers maybe I will not deploy dedicated resources for these; I deploy them separately. That's exactly where I can play with this configuration and make those adaptations.

So that was Terraform, and you can see that I also have a tenant ID as a placeholder. Later on, I'll show you how we replace this tenant ID during onboarding, and we essentially capture a very specific instance of this configuration for each and every tenant.

Now let's take a look at Helm, which covers the Kubernetes resources in this case. Here we have, again, Helm release templates for every tier; let's peek into the silo one. Here again, the tenant ID is a placeholder, and you can see at the bottom that I have two configuration values with a concrete specification: deploy apps and deploy ingress.

Deploy ingress in this case takes care of routing. In our scenario, if I use a subdomain per tenant, for example, it doesn't matter whether I onboard a basic tier customer or a premium tier customer, I will always need to deploy an ingress to provide them with a subdomain. But maybe for basic tier tenants I don't deploy apps each and every time, because we share resources there. For this silo deployment model, though, I will deploy the applications for each and every tenant.

That's again where I define concrete parameters but template the tenant ID configuration. And if our business model changes, this is where we would make those decisions and turn the changes into a technology specification.
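As a rough illustration, a silo tier template along these lines could be expressed as a Flux HelmRelease manifest. This is a minimal sketch, assuming Flux's HelmRelease API; the chart name, repository reference, and the deployApps/deployIngress value names are illustrative stand-ins for the files shown on the slide, not the exact contents of the builder session repository:

```yaml
# Silo tier HelmRelease template (illustrative; TENANT_ID is replaced at onboarding time)
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: tenant-TENANT_ID          # placeholder substituted by the provisioning workflow
  namespace: tenant-TENANT_ID     # premium tier tenants get a dedicated namespace
spec:
  interval: 5m
  chart:
    spec:
      chart: tenant-app           # hypothetical chart name
      version: "1.0.0"            # pinned version; see the staggered deployment discussion later
      sourceRef:
        kind: HelmRepository
        name: tenant-charts       # hypothetical Helm repository (e.g. backed by Amazon ECR)
        namespace: flux-system
  values:
    tenantId: "TENANT_ID"
    deployApps: true              # silo tenants get their own application deployments
    deployIngress: true           # every tenant gets an ingress and therefore a subdomain
```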

Now let's look at tenant configuration, which, as I mentioned before, is essentially an instance of a tier configuration. In the Terraform case, we have another folder where, after onboarding a tenant, I create a file each time. Let's take a look at tenant one, which uses a silo deployment model. You can see it's still the same file that you've seen before, but in this case I have a concrete tenant ID, and we'll see later on how we can use this version, mainly for Helm, to do staggered deployments and how it helps us maintain this in version control.

If I look at Helm, again I have those Helm release files for each tenant that I onboard, regardless of whether they are silo or pool. What matters is essentially the values that I capture inside the Helm release file. Again, a Helm release is an instance of a Helm chart that describes the deployment layout of my Kubernetes resources in the cluster. OK?

So these were the files, the way we designed tier configuration and tenant configuration. Now let me walk you through the process of how we use them. OK.

As I mentioned before, before we have any tenants, we need to bootstrap our system. We need to have certain resources created before we can onboard any tenant, like creating an AWS account, for example. That is, I guess, quite obvious, so let's take a look at the other additional resources.

First, we need source code repositories; we need a way to maintain our service source code. Then we need to have some build process that turns this source code into, let's say, a container image, and we need to store that container image in a repository. In this case, we're using Amazon ECR, the Amazon Elastic Container Registry. Since we're using Helm to manage our Kubernetes applications, we'll also create a Helm chart version and push it to a repository to use later on. And finally, we need a way to manage our tier and tenant configuration, essentially our config repo, to store all the references to those Helm charts, which in turn reference the container images.

We also have a Kubernetes cluster. In this case, we need to pre-provision Amazon EKS, and it can be another application or a central team that manages this cluster; there are multiple levels of tenancy here, but you need to have a cluster ready before you onboard any tenants into that cluster.

Since we use Flux and Argo Workflows, we need to install the Flux controller, in its own namespace in this case, and we also need to install the Argo workflow controller. And finally, here is a choice: what we decided to do here is to pre-provision the basic tier resources, because the intended experience is that whenever a basic tier customer signs up, we want them to have an immediate response.

Maybe for premium tier customers there is a different sales process where we interact with the sales team, and while we close the deal we can provision those resources. But for the basic tier, let's say free trials and those kinds of things, we want that to be immediate. So in this case we decided to pre-provision, but when you do this is indeed a business decision. OK.

Looking at this, now let's see how we do tenant onboarding on top of those resources. We already have the Kubernetes resources and AWS resources that I showed you before, we have our config repo with the templated tier configuration there, and we installed the Flux controller.

What does the Flux controller do here? It watches the configuration repository for changes in tenant configuration. Whenever we want to add a new tenant, we want Flux to know about that.
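For reference, pointing Flux at a config repo of this shape typically involves a GitRepository source and a Kustomization that syncs the path holding the per-tenant HelmRelease files. The repository URL and paths below are placeholders; this is a minimal sketch rather than the exact setup used in the session:

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: tenant-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/example-org/tenant-config.git   # placeholder config repo URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: tenant-helm-releases
  namespace: flux-system
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: tenant-config
  path: ./helm/tenants   # assumed folder where the per-tenant HelmRelease files are committed
  prune: true            # remove the cluster resources when a tenant file is deleted (offboarding)
```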

Let's assume we have an external tenant management system that notifies our application that a new tenant has signed up. In this case, we have a premium tier tenant with a tenant ID of one; usually there is some kind of unique ID per tenant, this is just an example.

We have a provisioning component here that receives this event and triggers an Argo workflow execution to orchestrate the tenant provisioning. What the Argo workflow does, generically, is read the tier configuration, those template files I showed you before, replace the placeholders with actual values, both for Terraform and for Helm, and first deploy the AWS resources, because usually those are a prerequisite for the runtime code running in the Kubernetes cluster, like databases or queues.

Then the workflow writes the concrete tenant configuration back to the config repository, both Terraform and Helm. Flux watches for changes in the Helm resources, recognizes that there is a new tenant, and essentially reconciles the Kubernetes resources by creating a new namespace and the service deployment.
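A very rough sketch of that provisioning flow as an Argo Workflows manifest might look like the following. The template names, container images, and commands are hypothetical placeholders for the steps described above (render the tier template, apply Terraform, commit the concrete tenant configuration back to the config repo); a real workflow would also mount the config repo and handle credentials:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: tenant-provisioning-
spec:
  entrypoint: provision-tenant
  arguments:
    parameters:
      - name: tenant-id
      - name: tier                          # e.g. "silo" or "pool"
  templates:
    - name: provision-tenant
      steps:
        - - name: render-config             # replace TENANT_ID placeholders in the tier templates
            template: render-config
        - - name: apply-terraform           # create the tenant's AWS resources (DynamoDB, SQS, ...)
            template: apply-terraform
        - - name: commit-tenant-config      # push the concrete Terraform/HelmRelease files to the config repo
            template: commit-tenant-config
    - name: render-config
      container:
        image: registry.example.com/config-renderer:latest   # hypothetical image
        args: ["--tenant-id", "{{workflow.parameters.tenant-id}}", "--tier", "{{workflow.parameters.tier}}"]
    - name: apply-terraform
      container:
        image: hashicorp/terraform:1.6
        command: [sh, -c]
        args: ["cd /config/terraform/tenants && terraform init && terraform apply -auto-approve"]
    - name: commit-tenant-config
      container:
        image: alpine/git:latest
        command: [sh, -c]
        args: ["cd /config && git add . && git commit -m 'Onboard tenant {{workflow.parameters.tenant-id}}' && git push"]
```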

OK. Now, if I want to provision resources for another tenant, another premium tier tenant, let's say tenant two, it's going to be pretty much the same thing. It will execute the workflow again; the workflow will pull the tier configuration template, replace the placeholder with the tenant ID value, deploy the AWS resources, and commit the Helm release configuration back to the config repo, which in turn will trigger Flux to reconcile the Kubernetes resources, create a new namespace for this tenant because it's a premium tenant, and deploy the service.

That hopefully showed in a bit more detail how the Argo workflow executes. So now we have talked about tenant onboarding, how we onboard new premium tenants. Let's look at the other part that Anu mentioned, the deployment pipeline: how we continuously update those resources in an effective manner.

We already have our two tenants provisioned, and of course we also have the basic tier; it's just not shown here to save some space. We also added the Amazon ECR repository that I showed you in the bootstrap step. That's where we publish our Helm charts for continuous deployment, for the deployment pipeline.

We need Flux to also look at the changes in the Helm charts whenever we push a new service version. I am leaving infrastructure updates out of scope here; Anu will touch on that in the serverless example, but it's a similar approach with the other tools and Terraform.

So let's look at our pipeline. What I need to do is configure Flux to watch the ECR repo again. If, as a developer, I push a new change to my service source code, what I then need to do is build this source code, push the container image to the Amazon ECR repository (that is not depicted here), and also create a new version of the Helm chart. So it was 1.0, now it is 1.1, and we configured Flux to watch for minor version changes.

You can see it highlighted in light blue: 1.x. Whenever I push a Helm chart version that matches that pattern, Flux will automatically reconcile all resources matching that semantic version. Essentially, it will apply those changes in a random order. Now, we have basic tier tenants and premium tier tenants,

and we may have many tenants. We usually would like to have some control over this rollout, because maybe we want to test those changes first on a small number of tenants, monitor the metrics, provide some bake time to see if the system behaves well with the change, and maybe roll it back before we have a big blast radius.

At Amazon, we usually call this staggered deployments, where you essentially deploy changes in waves. You may first deploy to the basic tier and then to the premium tier, but not all premium tier tenants, a subset of them, et cetera.

What I want to show you is how you could achieve this in this example. In that case, we don't put a wildcard on the version selector in the Helm releases, in the tenant configuration. Instead we put a concrete version, 1.0. So if we again create a new version of the container image and push it to the Amazon ECR repository, and now we have version 1.1, in this case Flux will not do anything, because the version doesn't match. It will just ignore it.

Now, what you can do here is implement a workflow that uses the Flux notification controller: first push changes, let's say to tenant one, changing the Helm release configuration of tenant one to version 1.1, and then listen for the reconciliation result, either success or failure. The Flux notification controller sends an event if reconciliation is successful or raises an alert if it fails. It's an asynchronous flow, so you need to implement some wait loop here. But once you receive this event and everything is successful, you can also promote the version in the tenant two configuration. And you can do a similar thing for Terraform; if you remember, we had a concrete version of a Terraform module we used for the application, so you can do a similar progression for the infrastructure resources at each and every step.
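To make the two modes concrete, here is roughly how the chart version selector inside a HelmRelease might differ between an automatic rollout and a staggered, pinned-per-tenant rollout. The field paths follow the Flux HelmRelease spec; the chart name and surrounding file layout are assumptions:

```yaml
# Automatic rollout: any 1.x chart version pushed to the Helm repository is reconciled
spec:
  chart:
    spec:
      chart: tenant-app
      version: "1.x"      # semver range; Flux picks up 1.1, 1.2, ... as they are published
---
# Staggered rollout: each tenant file pins an exact version; a promotion workflow edits
# this value (1.0.0 -> 1.1.0) tenant by tenant and waits for the reconciliation event
spec:
  chart:
    spec:
      chart: tenant-app
      version: "1.0.0"    # concrete version; new chart releases are ignored until promoted
```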

OK. So that was the containers example, how we can build tenant onboarding and a deployment pipeline for containers, and now Anu will cover the serverless example. Thank you. Thanks, Alex. Thanks, Alex, excuse me.

So now we get to see the same problem statement and how to solve it in a serverless environment. For the application deployment model, we are again going to refer back to the one that we showed before for our container example. As part of the problem statement here, we have a pooled environment, or pooled deployment, where all the basic tier tenants share a single set of resources, and we have siloed environments for our premium tier tenants. So let's look at how to solve that whole provisioning and deployment problem using serverless services. By the way, sorry, I forgot to mention that the overall architecture of the SaaS application itself is also serverless, as you can see: we have API Gateway as our API layer, we have Lambda functions for compute, and we have DynamoDB as our data store. It's just that this single SaaS application is deployed across different tenants and tiers.

Now, in terms of tooling, what we will use in our case is the AWS Serverless Application Model, or AWS SAM, for building our infrastructure as code. Again, as Alex mentioned, you can choose whatever you want, you can use CDK or Terraform, but in our example we have used SAM, which is specifically designed to define infrastructure for serverless applications. In terms of deployments, we will be leveraging CodePipeline, which is again a managed service for building your code deployment pipelines. And more interestingly, we will actually be orchestrating the whole deployment workflow using AWS Step Functions. I'll show you how that Step Function works and its definition, and I'll show you how to stagger deployments across multiple tenants and tiers as we go along.

In terms of configuration, we are leveraging DynamoDB as our key-value data store for keeping all tenant and tier configurations. The provisioning service is basically a very simple Lambda function behind an API Gateway.

All right. So let's start hooking all these services together and see how it works. First, as you might remember from when I was describing the problem statement, our idea and our goal is to have all tenants share a single code base and single releases. What that means is we want to have a template which we can leverage to build the infrastructure for all those tenants.

This template.yaml is exactly what is supposed to do that, and it is a single template for building all tenants and tiers. But the idea is to be able to parameterize this template, and that's how we support different configurations for different tiers. As you can see, we have a parameter called provisioned concurrency, which means you can provide higher provisioned concurrency for certain premium tier tenants and guarantee lower latencies, because when you use the provisioned concurrency feature for Lambda functions, it pre-warms a certain number of instances of the Lambda function and avoids those cold starts. That's the idea of parameterizing and using this parameter: we can still use the same code base but configure the infrastructure differently for different tenants.
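As a minimal sketch of what such a parameterized template could look like (the function name, handler, and the exact parameter and condition names are illustrative, not the contents of the session's template.yaml):

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Parameters:
  ProvisionedConcurrency:
    Type: Number
    Default: 0                      # basic tier: no pre-warmed instances

Conditions:
  UseProvisionedConcurrency: !Not [!Equals [!Ref ProvisionedConcurrency, 0]]

Resources:
  OrderFunction:                    # hypothetical service function
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.12
      CodeUri: src/order/
      AutoPublishAlias: live        # provisioned concurrency attaches to a published version/alias
      ProvisionedConcurrencyConfig:
        Fn::If:
          - UseProvisionedConcurrency
          - ProvisionedConcurrentExecutions: !Ref ProvisionedConcurrency
          - !Ref AWS::NoValue
```

The premium tier configuration would then pass a value such as ProvisionedConcurrency=5 when the stack is deployed, while the basic tier stack keeps the default and skips provisioned concurrency entirely.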

Now let's look at how the bootstrapping works. The first thing when it comes to bootstrapping is that we have to create those shared services and deployment pipelines: all those provisioning and onboarding services, and the deployment pipeline itself, have to be created before we start building the SaaS application plane. In our case, this bootstrap template, again infrastructure as code built using SAM, is used to build all those shared services.

The next thing, similar to what we did for EKS, is to bootstrap that basic tier pooled environment. This is where we are leveraging a DynamoDB table that we call the deployment configuration, and I'll show you how this deployment configuration glues all the provisioning and deployment together in one place.

The next thing we have done is use a Lambda trigger to basically start an AWS CodePipeline. The idea is that as soon as you create an entry in this DynamoDB table, DynamoDB Streams, which we have enabled on the table, will invoke that trigger and start your CodePipeline.
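One plausible way to wire this up in SAM (the table, function, and pipeline names are assumptions for illustration) is a DynamoDB stream event source on a small trigger function whose handler calls StartPipelineExecution:

```yaml
Resources:
  DeploymentConfigTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: stackName
          AttributeType: S
      KeySchema:
        - AttributeName: stackName
          KeyType: HASH
      StreamSpecification:
        StreamViewType: NEW_AND_OLD_IMAGES          # enable DynamoDB Streams on the table

  PipelineTriggerFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: trigger.handler                      # handler calls codepipeline:StartPipelineExecution
      Runtime: python3.12
      CodeUri: src/trigger/
      Environment:
        Variables:
          PIPELINE_NAME: tenant-deployment-pipeline # hypothetical pipeline name
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action: codepipeline:StartPipelineExecution
              Resource: '*'
      Events:
        DeploymentConfigStream:
          Type: DynamoDB
          Properties:
            Stream: !GetAtt DeploymentConfigTable.StreamArn
            StartingPosition: LATEST
            BatchSize: 1
```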

Now, this CodePipeline does a bunch of things. First, it clones the source code repository, which includes the template.yaml as well. Then it uses sam build and sam package to produce our deployment artifacts, which are basically zip files containing all the code for all the Lambda functions, and places them in an S3 bucket. Then we start the deployment step, and as you can see, that step is actually a Step Function.

What we have done is integrate the Step Functions workflow into the CodePipeline. I will explain at a high level how this workflow works, and then we'll go into the details and look at the Step Functions definition in a few minutes.

What this Step Functions workflow does is actually monitor that deployment configuration. As part of the workflow, the first step is: go look at the deployment configuration DynamoDB table and see if there is a new stack to create. In this case, it happens to be the basic tier stack, because we are doing bootstrapping. At the same time, it also looks at the tier configuration DynamoDB table, and if you recall, we have all those settings per tier, and you can keep adding as many columns as you want to this DynamoDB table. Then you basically use this configuration as parameters to the package.yaml that was built in the previous step, and the output is your basic tier stack, which in our case includes a Cognito user pool, an API Gateway, your Lambda functions, and DynamoDB tables. At this point, you have your basic tier stack up and running.

Now, when you are onboarding customers into your basic tier stack, you already have this stack up and running, so all you have to do is maybe just add some configuration at this point. In our case, what we are doing is creating a user pool group within that shared user pool every time we onboard a basic tier tenant, and we create a tenant admin user. The admin user gets an email saying, hey, your account has been created, you can now go and start accessing your SaaS application, and the tenant admin user can then go and add more users and do that self-service onboarding.

But what about your silo model? What about those silo tenants, and how do you orchestrate the onboarding of silo tenants in this case? Because, if you remember, we are creating dedicated resources for those silo customers.

We will again hook into that same deployment configuration DynamoDB table. What does our provisioning service do? If you remember, we had that onboarding service, which kicks off the provisioning service to start the provisioning process. The provisioning service will create an entry in the same DynamoDB table, keyed by tenant name, tenant ID, whatever you want to choose. It will again trigger the same pipeline, and the pipeline will orchestrate the same steps again: it will clone the repository, do the build, and start the deployment using the Step Functions workflow. Again, it will look into this deployment configuration table, and this time it will see that there is a new tenant one stack to be created, so it will go ahead and create that stack using the package.yaml, and this time it will use the premium tier configuration. By the way, I'm not showing all the columns for all the tables here; just assume that tenant one is a premium tier tenant, and we need to pass the premium tier configuration to this package.yaml to build the stack that is specific to that premium tier tenant. And again, the output is that you get a dedicated user pool, your APIs for the premium tier, your Lambda functions, and your DynamoDB tables, all created for your premium tier tenant.

Now, what about your deployment pipeline, though? Again, the idea is that you want to be consistent in how you deploy changes across all these tenants. The only difference between provisioning and deployment is that the trigger point changes, because now you're committing your source into your source code repository. What this CodePipeline will do is monitor the changes to the repository and trigger as soon as you make changes to your source code, which could be a Lambda function change or whatever changes you are making. You will again build using the same sam build and sam package and produce those deployment artifacts, but now your Step Functions workflow is responsible for updating all those environments consistently. It doesn't matter how many there are; it could be five, it could be 10, it could be 50. The key is again going back to that deployment configuration, looking at how many stacks need to be updated, using that as the hook to glue everything together, and using that deployment configuration and those tier configurations to update all those tenant and tier stacks.

Now, there is one thing that Alex touched on at a high level: the concept of staggered deployments. In many cases, we have seen SaaS providers deploy their features to certain basic tier tenants first, see how it goes, and then roll out those features to certain premium tier tenants. The wave number column here is being leveraged to implement that, and the idea is to deploy the changes in waves: wave number one is deployed first, then wave number two, and so on. You might have any number of waves depending on your use case. This is how staggered deployments look at a high level: your deployment pipeline does a source build, and first it deploys to the basic tier tenants as part of wave one.

Now, what do you want to do then? You want to monitor that deployment for a certain time. You can do that in multiple ways: you can look at your error logs, your CloudWatch logs, and see that there are no errors for that basic tier for a certain amount of time, or you can build some sort of manual approval process where your business teams get feedback from your basic tier tenants, and once they are comfortable, they push these changes to your premium tier tenants.

OK. So I've been talking about the Step Function and how this works. Let's look into the details of the Step Function and how this whole workflow is orchestrated.

By the way, for those who don't know what Step Functions are: Step Functions is an AWS service which allows you to orchestrate workflows, and there are different tasks within a Step Function,

different states that you can define. In our case, the first task is actually a Lambda function. The Lambda function looks into those deployment and tier configurations that we defined and discussed before, and the output of this Lambda function is an array of stacks. This is basically the array that you have built using that deployment configuration we have seen before.

If you pay attention to the elements of this stacks array, stack name, template URL, parameters, and wave number, you will see that these values are again coming from those deployment and tier configurations that we built. Stack name was something we defined as part of the deployment configuration table, and that's exactly where it comes from.

The template URL is actually passed in by our CodeBuild step; the CodeBuild step provides us the S3 location of all those build artifacts. Your parameters come exactly from your tier configuration, so depending on which tier this tenant belongs to, you provide those parameters from your tier configuration. And the wave number, as I mentioned, is part of the deployment configuration.
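Pulling those pieces together, the output of that first Lambda task might look roughly like this; the field names and values are illustrative, inferred from the description above rather than copied from the actual solution:

```yaml
# Hypothetical output of the "get stacks" Lambda, consumed by the Map state
stacks:
  - stackName: stack-basic                                 # from the deployment configuration table
    templateUrl: s3://build-artifacts-bucket/package.yaml  # from the CodeBuild step (assumed bucket name)
    parameters:
      ProvisionedConcurrency: "0"                          # basic tier configuration
    waveNumber: 1
  - stackName: stack-tenant-1
    templateUrl: s3://build-artifacts-bucket/package.yaml
    parameters:
      ProvisionedConcurrency: "5"                          # premium tier configuration
    waveNumber: 2
```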

Once you have built this array within the Lambda function, the next thing you want to do is deploy your waves, and you want to deploy each wave in parallel. In order to deploy things in parallel in Step Functions, we have something called a Map state. What the Map state does is execute all of its iterations in parallel, and the idea here is to first deploy the changes for wave number one.

I'm not showing the exact code here, but basically there is some logic around the Map state which filters down your stacks array and only takes the elements that are part of wave number one. So assume that in the first iteration of your workflow, you are deploying the first wave.

What happens within this Map state is that we are using the SDK integration support that Step Functions provides for calling other AWS services, and we are actually using the CloudFormation SDK integration to check whether the stack exists or not. By the way, this is how we were able to leverage the same state machine, the same workflow, for both provisioning and updates. In your use case you might have separate workflows; I don't think that's important. What is important is the deployment and tier configuration which enables this whole workflow.

If the stack exists, then it will update it, and that's the use case when a developer is making changes to the code. If the stack does not exist, you create it, which is the provisioning use case in our example.
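A sketch of that deploy-wave portion of the state machine, written as it might appear inside a SAM state machine definition in YAML, could look like the following. The state names, the wave filtering input path, and the error handling are assumptions made for illustration; notably, createStack and updateStack return as soon as the operation starts, so a real workflow would also need to wait for stack completion:

```yaml
DeployWave:
  Type: Map
  ItemsPath: $.stacksInWave            # assumed: the current wave's stacks, filtered beforehand
  MaxConcurrency: 0                    # 0 = no limit; deploy all stacks in this wave in parallel
  Iterator:
    StartAt: CheckStackExists
    States:
      CheckStackExists:
        Type: Task
        Resource: arn:aws:states:::aws-sdk:cloudformation:describeStacks
        Parameters:
          StackName.$: $.stackName
        ResultPath: null               # keep the original item as input for the next state
        Next: UpdateStack
        Catch:
          - ErrorEquals: ["States.TaskFailed"]   # describeStacks fails when the stack does not exist
            ResultPath: null
            Next: CreateStack
      CreateStack:
        Type: Task
        Resource: arn:aws:states:::aws-sdk:cloudformation:createStack
        Parameters:
          StackName.$: $.stackName
          TemplateURL.$: $.templateUrl
          Capabilities: ["CAPABILITY_IAM", "CAPABILITY_AUTO_EXPAND"]
        End: true
      UpdateStack:
        Type: Task
        Resource: arn:aws:states:::aws-sdk:cloudformation:updateStack
        Parameters:
          StackName.$: $.stackName
          TemplateURL.$: $.templateUrl
          Capabilities: ["CAPABILITY_IAM", "CAPABILITY_AUTO_EXPAND"]
        End: true
  Next: WaitForApproval                # hypothetical next state; see the callback example below
```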

Now, the next thing you want to do is wait before your next wave is deployed, and this is where you build that approval process. Your Step Function can wait up to one year in this state, so you could actually build a workflow which waits up to one year. Obviously, that's not the intention here. The intention is that once you have deployed the changes to your basic tier tenants, you want to send a message.

In our case, we are using SQS, which is again a managed service that you can use for sending and receiving messages. Now what you can do is build a process on top of it: you can subscribe to those messages and build a process which monitors the logs in CloudWatch, or whatever observability tooling you have, and makes sure there are no errors in those logs. Or your business teams can just monitor your environment and make sure there is no problem and that what you have is consistent with what your business asked of you.

Now, once you have done the due diligence, you send a callback to your Step Function. I'll show you the code for how to call back into a Step Function. The idea is to then move on to the next wave, which is the premium tier tenants in our use case. What we do is increment this loop, and again, there is a Lambda function we are using at this stage; it's not shown here, but it's a Lambda function which increments the wave number and drives this iteration for us.

So we are using multiple patterns in the Step Function, as you can see: we are invoking Lambda functions, we are invoking SDK APIs, and we are looping through the Map state to achieve this overall workflow that's specific to our use case. Once you set the next wave number to two, your next wave of tenants starts getting deployed. This is how you can build what we call staggered deployments across tenants, which in our use case is basically multiple environments if you think about it.

All right. Let's look at some of the definition and some of the internals of this Step Functions workflow. On the left side, we have an example of how you can use a Task state to invoke a Lambda function from a Step Functions workflow. In this case, all I'm doing is specifying which Lambda function needs to be invoked; the Step Functions workflow will invoke the Lambda function, it will do whatever it needs to do, and then it produces the output that can be passed to the next step in your state machine.

The right side is an example of how you can use an SDK integration within your Step Functions workflow. In this example, as you can see, I am using the AWS SDK integration for CloudFormation to create a new stack within the workflow, and similarly you can use it for updating your stack as well. This is also how we implemented those staggered deployments and that wait task, as I was mentioning.

In a Step Function, when you use the keyword we call waitForTaskToken, the Step Function knows that it needs to wait for a callback to come back. In this case specifically, you can see we have sendMessage.waitForTaskToken, and we are passing the task token, which the Step Function generates for you, in the message body.

When somebody subscribes to that queue, they will receive the message, look at the body, and see that they have to approve and monitor the basic tier deployment. You can obviously send all those parameters as part of your Step Functions workflow. And once you have done the due diligence, as I mentioned before, you can do a callback using a simple CLI command, as you can see here, or an SDK call, and send a task success using the same task token.
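For illustration, the wait-for-approval step could look roughly like this; the queue URL, message fields, and state names are placeholders, while the service integration and the SendTaskSuccess callback shown in the comment follow the standard Step Functions callback pattern:

```yaml
WaitForApproval:
  Type: Task
  Resource: arn:aws:states:::sqs:sendMessage.waitForTaskToken   # pause until SendTaskSuccess/Failure is called
  Parameters:
    QueueUrl: https://sqs.us-east-1.amazonaws.com/123456789012/deployment-approvals   # hypothetical queue
    MessageBody:
      action: "Approve and monitor the basic tier deployment"
      taskToken.$: $$.Task.Token     # task token from the context object, handed to the approver
  Next: SetNextWave                  # hypothetical state that increments the wave number

# Once the due diligence is done, the approver (or an automated monitor) calls back, for example:
#   aws stepfunctions send-task-success \
#       --task-token "<token from the message body>" \
#       --task-output '{"approved": true}'
```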

Now you are basically instructing the Step Function to move to the next step, which is the next wave of tenants, the premium tier in our example.

OK. So we looked at two concrete examples of how to automate SaaS workflows, in a containerized EKS environment and in a serverless environment. What are our key takeaways from today?

The first, I would say, is that automation is the key to SaaS agility. Having per-tenant, per-tier resources does not mean that you have to compromise the agility of your SaaS environment. If you're building a SaaS application deployment model and you have this scenario where you have different tiers and tenants, you should still be able to build those deployment pipelines and all that automation so that you do not compromise the agility, and basically the growth, of your solution, which depends on frequent releases.

We also looked at how your application tiering model drives your provisioning and deployment design: having silo deployments, having dedicated resources for silo customers, having a basic tier pool, and how that affects how you build your provisioning and deployment design. Again, in your use case you might have it slightly different, you might have a different tiering mechanism, but the idea is that you should be able to design your workflows based on your tier model.

Always maintain a single software version for all tiers. If there's one key takeaway you want to take away today, it is to maintain a single software version for all tiers. We looked at how you can implement staggered deployments and gradually roll out your releases to tenants, but that does not mean you leave certain tenants ten versions behind; that's an anti-pattern that we do not recommend to our customers in a typical SaaS-based model. Also, moving between tiers requires further automation.

One thing we did not cover today is that you might have more automation use cases: you might want to move your tenants across tiers, and we did not talk about the offboarding service. All these variables can play a role when you have to build more automation workflows in a SaaS environment.

Here are the links for the repositories that we discussed. The one on the far left, the SaaS GitOps workshop, is basically the EKS example that Alex shared with us, and the one on the far right is the serverless SaaS reference solution, which has the serverless example, including the Step Functions implementation.

We also have a couple more interesting links that we have shared here: the serverless SaaS workshop and the Amazon EKS SaaS reference solution. If you are building SaaS solutions, please do check out these repositories. They will give you really good guidance around how to automate your SaaS workflows, and general guidance around tenant isolation, security, and everything else that is present in those reference solutions and that we did not touch on today.

Our team is also running more sessions. In fact, BWP301 is very much relevant to what we discussed today. It's an example from the Buy with Prime team at Amazon around how they built their operational footprint, how they implemented tenant isolation testing, et cetera. The rest of these sessions might also be relevant for you if you're building SaaS solutions on AWS.

And we also have more chalk talks and workshops. In fact, SAS403 is something that we are running on Thursday. It's a workshop that we built that talks about how to build tenant-aware dashboards and observability into your SaaS application. It talks about how to build mechanisms to limit your tenants to certain actions, and it covers various details like tenant isolation testing, et cetera.

The easiest way is to just filter by SaaS in your session catalog and you'll get all of those. So thank you so much. That's all we had for today. We'll be here for a few minutes if you have any more questions. Thank you so much for joining us today.

余额充值