Automate everything: Options and best practices

Aaron: My name is Aaron Lima. I'm a Principal Solutions Architect from New Jersey. I'm here presenting with Emily Arnao, who's a Principal Solutions Architect out of Melbourne, Australia. And then a little bit later, we're gonna hear from Ashiesh Wadhwa, who's the Director of Cloud Engineering at Sun Life.

In terms of the agenda, we're going to look at an example from the past and see what we can learn from that example. From that example, we're gonna learn why automation matters. Then we're gonna look at a day in the life of a cloud admin and then we're gonna revisit that day in the life when they've implemented some AWS services to automate tasks within their organization.

We'll hear from Sun Life about their journey on the cloud and how they built automation into their processes. And then finally, we'll have a summary of next steps.

I think we can all agree that the invention of the car has changed the world. However, when it first came out, it was a luxury item that only a few could afford.

Henry Ford had a goal of making the car affordable to every family. On a visit to a Chicago meat packing plant, he noticed how meat flowed conveniently on a conveyor belt and how labor was used efficiently to create affordable meat for families.

So he started to think maybe this could be applied to manufacturing cars. Ford thought through the workflow of producing a car and was able to implement the automotive assembly line. By 1924 the price of the Ford Model T had dropped to $260, which made it attainable to most American families and achieved Ford's business goals.

The key part was it also gave Ford a competitive advantage as the automotive industry headed into the Great Depression. In 1919, there were some 2000 American motor companies. And by 1940 there were only three companies that accounted for 90% of all US car sales - Chrysler, GM, and of course Ford. Each of those companies had adopted the assembly line and automation which allowed them to thrive during the headwinds of their time.

Ford's staff went from 14,000 employees to 52,000, so there was a lot more employment. And their daily wages doubled. So automation impacted the employees as well and gave them different career opportunities moving forward.

The past can inform the future and help us set a vision for our organizations. Automation changed an industry and allowed those three car manufacturers who embraced it to weather one of the worst financial crises in the Great Depression. And it changed the work life of employees and gave them new opportunities.

So let's fast forward to today - why we're all here and how this applies to our industry of information technology. Despite being in an industry leading with automation and AI/ML, there are still a lot of manual processes in many organizations.

We have to ask ourselves, 100 years after Ford moved forward with automation to improve his business, how are we doing with automation to truly thrive in ours?

According to one report, 94% of knowledge workers said they spend some portion of their day performing repetitive, time consuming tasks. And according to a Deloitte report, 80% of developers focus on operations and maintenance rather than innovating.

I was speaking to a customer a couple months ago about getting visibility in their AWS estate. When we talked about services to turn on, their immediate reaction was “Wow, I have to go to each account manually and turn it on.” When I asked about their automation strategy, they didn't have a plan.

Between the pandemic, the Great Resignation, and indicators of an economic recession, there’s no better time to start thinking about automation and how it can impact your organization and your career.

Why should we automate it all? Let's look at some statistics from a 2021 State of DevOps report. In this report, organizations were classified into four categories, but we'll focus on the "elite" - those with a high degree of automation.

Elite performers have a 3x lower change failure rate - they can make changes without disruption. They can go from code to production in under an hour - security checks, QA, deployments are automated. They are more agile and secure. And they can recover services within an hour - so fewer nights and weekends recovering from incidents.

Elite performers create a culture around three things:

  1. Everything is codified - high visibility into their environment so they can respond automatically.

  2. Automated deployments - code to production in under an hour

  3. Fast automated rollbacks - less than an hour to recover

This is about impacting your work-life balance. Automation means fewer nights and weekends working.

The good news is adopting automation now is easier than ever thanks to cloud APIs and services. Automation at scale is available to all organizations regardless of size or skills.

Looking back 10 years ago, making a change in production was a big deal - months of planning, weeks if you were lucky. Even provisioning a VM took 90 days. The automation was harder with the technology then.

Back then, changes involved detailed plans, guides, installation instructions, lots of clicking through and documenting. We had some scripts, but they were fragile. And it was near impossible to have good visibility into the environment.

So working on these changes meant giving up nights and weekends. Lots of manual effort in planning, documenting, executing, communicating. And it was harder to recover when things went wrong.

Emily: So that was typical before the cloud. With lots of manual steps, being asked to do a production change was stressful. All the planning, documenting, executing, communicating - a lot of work. Giving up nights and weekends for changes was typical. And it was harder to recover when things went wrong.

So today, if you're using the cloud, you don't actually have to be involved in all of these manual processes. Software and infrastructure changes are a lot easier to plan for, to execute, and to repeat if you need to.

So for me, this is why I'm at AWS. When I saw how things could be done differently, I was like, yes, sign me up, I want some more of that infrastructure as code. I want managed services to help me manage provisioning, scaling and more.

And so most customers, when moving to the cloud, are taking advantage of infrastructure as code. They are using infrastructure as code to provision scalable cloud resources. But it's not just about infrastructure as code; there's so much more that you can do to make your life at work easier.

So think about it, what else can you do to allow more focus on creating business outcomes and value and less time on maintaining or recovering systems?

So today, Aaron and I actually still experience customer situations like this. We have customers who are using cloud but still have a lot of these traditional IT processes that have been brought over. Those organizations are gaining some of the cloud benefits, like flexibility and elasticity, but many times these operations tasks are still manual. We were discussing this and we were like, why is that the case?

Is it that organizations just don't want to be elite performers? Because it's actually really accessible for you all. I don't know. All we could think of is, you know, maybe some other people aren't as excited about weekends and holidays as Aaron and I are. We're always looking to optimize and automate.

But even if that's not you, as we heard earlier, with that growing need for IT skills, it's not the time to settle for suboptimal levels of automation. Just think about it: for every hour that you spend converting some manual effort into automation, that's an hour that you can spend on creating or improving something else.

Aaron mentioned that State of DevOps report. So we talked about the metrics that elite performers achieve. Let's focus on two. So we've got that change failure rate and that time to restore service.

When you're combining cloud services and DevOps processes, this is great. You can make a lot of changes more rapidly. So it means you can deliver things faster. But every time you make a change, there's that chance that sometimes things go wrong, right?

So it's critical to be able to recover quickly from that failed change. Now, we heard before that everything is code. If you've got all your steps documented in code, and if you can easily find what went wrong with telemetry, this allows you to recover quickly.
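As a rough illustration of the two metrics just discussed, the following minimal Python sketch computes a change failure rate and a mean time to restore from a list of deployment records. The `Deployment` record and its fields are hypothetical and are not taken from the State of DevOps report or from any AWS service.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    # Hypothetical record shape, for illustration only.
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # when service was restored after a failure


def change_failure_rate(deployments: List[Deployment]) -> float:
    """Fraction of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_restore(deployments: List[Deployment]) -> Optional[timedelta]:
    """Average time from a failed deployment to restored service."""
    durations = [
        d.restored_at - d.deployed_at
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Tracking these two numbers over time is one simple way to see whether added automation is actually making changes safer and recovery faster.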

So with all of this, making that change is a lot less stressful, a lot less nail-biting, and a lot less likely to ruin your Saturday night. Aaron, you were telling me about a great recovery example here. Can you share that?

Aaron: Yes. I've been here at AWS for seven years. In my first year, I was a TAM, and I covered a customer who had a fully automated process. They had the ability to tear down and build up their environments. There was an S3 outage, and I called them up to say, hey, there's an outage. They told me, we already know, and we've already moved to another region, because everything was fully automated in their case. I was extremely taken aback. I recently shared this with another customer, and the customer said, oh, they must have been all serverless. They weren't. They were using servers, they were using Redshift, and they were replicating the data to the other region. But because they had that ability to spin up and spin down with automation, it was easy for them to recover.

Emily: Let's see. So if you're listening and you think, oh, we're a small team, I don't think we really need it. You might think, hey, we only run a few workloads. Maybe we can manage to get by with, you know, less automation for a while.

But you know, you actually don't need a large team. You don't need to be running 100 workloads on cloud to get all the benefits from using more automation.

So our goal, all three of us actually today is to help you realize that, you know, whether you've got two engineers or 2000, we want to show you that looking for opportunities to automate, this is gonna make software delivery more reliable, more visible, really important when you're communicating to all your stakeholders and frankly easier, right?

So this is gonna be vital for all organizations to be able to thrive and to be successful into the future. So let's illustrate this a little bit with a fictional customer. Her name is Mary Major.

We're just gonna go through a little bit of a week in the life of Mary. So she leads a cloud platform team, maybe some of you in the audience are also leading a cloud platform team. And her job is to provision and manage AWS accounts and set and govern cloud standards for her organization.

They've got AWS for their website, CRM systems. They're running some analytics and some ML workloads and some other things. They've got infrastructure as code, so they do have some automation. But it's Monday, right? Mary's come from a meeting where the business says, hey, we've got this new initiative, and in that meeting they all agreed: yes, these workloads need to be isolated.

So that means two new accounts at least, right? Prod and non prod. Mary's team manage a number of AWS accounts and every time they get a request for a new account, that means pulling together their account configuration checklist, confirming the controls, confirming the permissions that they need to apply, they need to configure logging, monitoring integration to third party tools and also apply some policies on the usage of AWS services.
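One way to make a checklist like this less error-prone is to write it down as data and check accounts against it. The sketch below is a minimal illustration of that idea in Python. The control names in `ACCOUNT_BASELINE` and the shape of `account_state` are hypothetical; in practice the state would come from whatever inventory or compliance tooling the team uses.

```python
from typing import Dict, List

# Hypothetical baseline: the control names are illustrative, not an AWS standard.
ACCOUNT_BASELINE: Dict[str, bool] = {
    "cloudtrail_enabled": True,
    "config_recorder_enabled": True,
    "s3_public_access_blocked": True,
    "third_party_monitoring_integrated": True,
}


def missing_controls(account_state: Dict[str, bool],
                     baseline: Dict[str, bool] = ACCOUNT_BASELINE) -> List[str]:
    """Return the required baseline controls the account does not satisfy, sorted by name."""
    return sorted(
        control
        for control, required in baseline.items()
        if required and not account_state.get(control, False)
    )
```

For example, an account that only has CloudTrail and Config turned on would be reported as missing the S3 and monitoring controls, without anyone having to walk the checklist by hand.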

It's a big checklist they're running through. Now, Tuesday: her team are also trying to maintain visibility of what's happening in all of their environments. Being asked to add these two new accounts and go through all these checklists takes a bit of time.

So unfortunately, whilst working through that manual account creation checklist, the engineer got called away. Someone said, hey, actually, we're just interested in the compliance report of all the existing accounts.

So he went away and spent time running that manual report across all of their accounts. He came back, and he'd already handed over those new accounts. And then he realized later in the evening, wait a minute, I forgot to include those new accounts.

So the engineer ran through those compliance checklists again and realized that they had actually missed some of the controls, which would potentially allow malicious actors to access their environment. They'd missed some controls for detecting public S3 buckets, missed some controls for ports, and missed some other controls on security groups.

So in this case, the engineer stayed after hours, enabled those security controls, and updated the team. But this is like a whole day just spent doing this.

So Wednesday, Mary's team has to hassle one of their workload teams, the CRM team. They use a commercial off-the-shelf product, a COTS product, which runs on a number of EC2 instances. Now, every time there is an operating system update, as part of the shared responsibility model, her team has to make sure that all of the different workloads update their EC2 instances.

Typically she'll send out some email reminder to that team and hassle them: hey, patch time, update. And then regularly they'll run some periodic reports to see what the patch compliance is.

Now, an OS update has been ready for a while, and the CRM team is behind on that patch schedule again. The engineer had been blocking out Wednesday to do patching, but the business said, hey, can you help with some functional changes to the environment instead? And they didn't get around to patching.

So, if the engineer doesn't want that CRM system to be behind yet again, on patch compliance, maybe they're gonna have to stay back that evening.

Aaron, this was a sore point for you. Patch compliance.

Aaron: Yeah, patch compliance is a sore point for me because early in my career, I had one job where all I did was make sure that everything was patched and report on patching. And it wasn't an extremely exciting job until I, you know, recommended to my employer: can we automate this? Can I just automate this so I don't have to do this all the time?

And then that was a fun process and, you know, a boost in my career in terms of next steps. But yeah, I wasn't doing that same thing day in, day out, manually patching.

Emily: Correct. So, let's get to Thursday, right? There's another COTS workload that needs attention. The team has raised a ticket to Mary's team, and they need an engineer to help update the software that's running for that COTS product.

These requests come in quite frequently: a vendor has a software update, and they need a way to remotely log on to that environment to apply some configuration changes or maybe run a script.

So Mary's team are maintaining bastion hosts on EC2 for all of the workloads that need that remote logon, that means that all of those operating system patches also need to be applied to those bastion hosts. So this is just more infrastructure they're maintaining.

Now, she's got these requests for these new accounts, she's getting a lot of new things in, and she's just thinking, hang on, how many tickets do I have that just relate to patching, remote logon, and so on? She's starting to feel a little bit tired of this tedious work.

It gets to Friday, finally. It's almost the weekend, we're almost done. Hey, a new request comes in. A lot of teams want to use cloud, and this time there's a team that actually wants to start running some workloads for one of the international subsidiaries in another region.

So that means that Mary is going to have to enable a new region and new accounts and extend governance there. During that meeting, this same lead also says, remember, we're also close to closing an acquisition of another smaller company, and they actually already use AWS.

So we need to bring all of their workloads under our governance as well. By this stage, Mary is realizing that some more automation is gonna give her some greater visibility and control. She knows her team can get the work done.

They've got the right skills. But with all of those current manual processes, she's really not that confident to say, "Yeah, yeah, we can do that. No problem." Instead, she says "I'm gonna need some time to estimate this." This is effectively making the business wait. This is just not a good position to be in.

So her team has had a long week, and what have they been doing? Just keeping their heads above water with business-as-usual activities. Mary and her team want to help; they want to be involved in creating awesome outcomes for the business. But all they're doing is operational tasks, manually, right?

So they could decide to work the weekend to try and get ahead. But you can see this is just something that's gonna keep happening. So let's take a look at a framework that Mary and her team can use to help add some more automation and hopefully make their week at work and their weekend a whole lot more enjoyable and rewarding.

Aaron, can you share?

Aaron: So if you've attended another COP session this week, you might have seen this slide. This slide represents the Cloud Ops journey, or the Cloud Ops operations model. It has basically three steps or pillars.

The first is set up. This is where you're laying down your cloud foundation, with all the governance and compliance built in from the start. The second is that, once you have all of that in place, you have the confidence that you can migrate workloads into AWS in a secure and compliant manner. And finally, the third is that once a workload is inside the cloud, you continue to operate it through application monitoring, enhancements, and performance, and through detecting and remediating compliance issues, doing that in an automated fashion. And why this Cloud Ops model?

Well, this is based upon another data-driven study from AWS, independent research of 1500 AWS customers, which found a return on investment of 241% over three years, a staff productivity increase of 62%, and then the all-important carbon savings of 88% from coming into the cloud. And if you took a look at a bunch of the keynotes, you know that that's something that we're continuing to work on here at AWS.

So again, to bring us back to what we're talking about in our session, it comes down to building a culture from the top down around three things, right? Everything should be codified; you have visibility into the security and performance of your environment and your workloads; and you automate and respond, not only from a provisioning standpoint, but also to the incidents that are happening inside of your organization.

So let's talk about this in practice. And let's take a look through Mary's week again. But this time, she's implemented some AWS services and some of the things that we would recommend for AWS customers to do. And so we start with Monday.

Now, imagine, as we heard, that Mary had some infrastructure as code built out. But now Mary's team has decided to take that infrastructure as code and start implementing it in Control Tower. They've set up Account Factory in Control Tower as well, along with some customizations.

So now they can spin up an account, and when they spin up that account, Security Hub is enabled, Config is enabled, and CloudTrail is enabled, all out of the gate with automation from Control Tower. The accounts all have the IAM roles needed for automation processes, and their VPCs and security groups are all created on the fly.

So now, because they have that automation in place, when the business comes to Mary, instead of opening up a request, she has enabled self-service. She allows the business to create those accounts without having to interact with Mary's team. And she knows that the full set of governance controls and guardrails are going to land in that account.
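To make the self-service idea a little more concrete, here is a hedged Python sketch that only builds the request parameters for provisioning an account through Account Factory, which is exposed as a Service Catalog product. It does not call AWS. The helper name, the provisioned product naming scheme, and all example values are hypothetical, and the parameter keys should be checked against the current Control Tower documentation before use.

```python
from typing import Dict


def account_factory_request(account_name: str, account_email: str, org_unit: str,
                            sso_email: str, sso_first: str, sso_last: str,
                            artifact_name: str) -> Dict[str, object]:
    """Build keyword arguments for a Service Catalog provision_product call.

    Nothing is sent to AWS here. The result could be passed to
    boto3.client("servicecatalog").provision_product(**request).
    """
    params = {
        "AccountName": account_name,
        "AccountEmail": account_email,
        "ManagedOrganizationalUnit": org_unit,
        "SSOUserEmail": sso_email,
        "SSOUserFirstName": sso_first,
        "SSOUserLastName": sso_last,
    }
    return {
        "ProductName": "AWS Control Tower Account Factory",
        "ProvisioningArtifactName": artifact_name,  # product version, supplied by the caller
        "ProvisionedProductName": f"account-{account_name}",  # hypothetical naming scheme
        "ProvisioningParameters": [{"Key": k, "Value": v} for k, v in params.items()],
    }
```

Calling this once for a prod account and once for a non-prod account gives two requests with identical structure, which is the point: every account starts from the same definition rather than from a manually followed checklist.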

Now, let's take a look at what this might look like in practice with a real quick demo. Can you walk us through this demo, Emily?

Emily: Sure. So here I'm in Control Tower, and you can see a few of the actions. There are lots of different things, and a few updates this week for Control Tower, so I highly recommend you check it out. You can see here that you can apply your controls as well. And in your landing zone, you can easily specify the regions that you're operating in, so you can see here the ones that are governed and not governed. Mary could make use of that.

You can easily see your entire organization and all of your accounts, and you can easily create a brand new account using Account Factory, as Aaron mentioned. So for the examples that we heard from Mary, this is an easy way to solve all of that. And now, with Control Tower having APIs, you can also do that programmatically, outside the console.

Aaron: Now we come on to Tuesday. Control Tower has been implemented and, as we mentioned previously, it's enabling AWS Config and AWS CloudTrail in every single account inside of the organization. Again, this is automation out of the box, built in as part of that account vending mechanism in Control Tower.

She's also enabling Security Hub and GuardDuty. GuardDuty helps with automated, advanced threat detection and response on AWS. So if something happens and it becomes a security finding inside of an account, you can have an automatic remediation that either shuts down an EC2 instance or changes a security group.
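The automatic remediation described here usually starts with a small piece of routing logic that decides what to do for a given finding. The Python sketch below shows only that decision step and performs no AWS calls. It assumes the finding arrives as an EventBridge-style event with `detail.type`, `detail.severity`, and `detail.resource.instanceDetails.instanceId`; the mapping table, the severity threshold, and the action names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical mapping from finding-type prefix to a remediation action name.
REMEDIATIONS: Dict[str, str] = {
    "UnauthorizedAccess:EC2/": "restrict_security_group",
    "CryptoCurrency:EC2/": "stop_instance",
    "Backdoor:EC2/": "stop_instance",
}
MIN_SEVERITY = 7.0  # assumed threshold: only act automatically on high-severity findings


def choose_remediation(event: Dict) -> Optional[Dict[str, str]]:
    """Pick a remediation for a finding event, or None if no automatic action applies."""
    detail = event.get("detail", {})
    if detail.get("severity", 0) < MIN_SEVERITY:
        return None
    instance_id = (
        detail.get("resource", {}).get("instanceDetails", {}).get("instanceId")
    )
    if not instance_id:
        return None
    finding_type = detail.get("type", "")
    for prefix, action in REMEDIATIONS.items():
        if finding_type.startswith(prefix):
            return {"action": action, "instance_id": instance_id}
    return None
```

Keeping the decision separate from the action makes it easy to test the routing rules on sample events before wiring them to anything that stops instances or edits security groups.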

Now, when you're managing multiple AWS accounts, having this in a central account is important, and that's what Mary has done. In this central account, she's given access to her security team so that they can see what's happening. They don't have to say, hey, Mary, what's going on in the account? Give me a report. They have access to that information.

So by automatically enabling these services in every single account in the multi-account environment through this mechanism in Control Tower, they have high visibility into their AWS estate.

So on Tuesday, Mary just checks that the account that was created on Monday by the business has all the security controls and the right security posture. Now she can report and show the business that it is a compliant account according to the guardrails that have been created.

Now in building these detective controls, she knows that these accounts are secure. We'll look at this in practice.

Go ahead. Sorry, I jumped out. That's all right.

Emily: So in Control Tower, Mary is able to provide access to the security team. They can log into that audit account, and here they're going to go to Security Hub. Now, it wouldn't be fun if I showed you Security Hub with everything looking green. So as part of this demo, we played around and deployed a whole bunch of resources that were non-compliant with the standards. If this were the case, Mary would easily be able to see: what are the best practices that have been applied? Can I see, for all of the different accounts in my environment, how compliant they are with the standards, and which resources are most out of standard? You can see here just some examples of the kind of security controls that you can easily apply. You have that in one place for all of the accounts, rather than having someone in your team doing manual checks to try and figure out what's compliant in which environment. You can easily filter, and you can also build automations and integrations to remediate some of these findings as well.
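The kind of cross-account view shown in the demo can be approximated in a few lines once findings are available as data. This Python sketch counts failed controls per account and severity. It assumes each finding is a dictionary with `AwsAccountId`, `Compliance.Status`, and `Severity.Label` fields, in the style of the AWS Security Finding Format; how the findings are fetched is left out.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def failed_controls_by_account(findings: Iterable[Dict]) -> Dict[str, Counter]:
    """Count FAILED compliance findings per account, broken down by severity label."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for finding in findings:
        if finding.get("Compliance", {}).get("Status") != "FAILED":
            continue
        label = finding.get("Severity", {}).get("Label", "UNKNOWN")
        summary[finding["AwsAccountId"]][label] += 1
    return dict(summary)
```

A summary like this is what replaces the manual compliance report from Mary's original Tuesday: the same question, answered from data that is already being collected.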

Aaron: And the point that I forgot to mention on the other slide is that Mary now knows that there are no public S3 buckets or open security groups.

So now let's advance on to Wednesday. On Wednesday, we were talking about patch management, my favorite topic. Mary's team are especially grateful that they've looked at AWS Systems Manager and adopted it. Systems Manager is a superstar for operations, automation, and management of EC2 instances inside of AWS.

There are other things it can automate inside of an AWS account as well. But before using Systems Manager, they had an infrastructure just to manage and maintain patch management. Now they've eliminated the need to manage that infrastructure at scale inside of AWS. So it's much easier for them, and they've eliminated cost.

The way they did that is they created a State Manager association in every single account as part of their bootstrapping process. It scans instances on a weekly basis to see how they measure up against the patch baseline. And then Mary has created a maintenance window that runs every Saturday and remediates any of the non-compliant EC2 instances inside of her estate.
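The weekly scan and the Saturday maintenance window can both be described as plain parameters. The Python sketch below builds keyword arguments in the shape expected by the Systems Manager `create_association` and `create_maintenance_window` API calls, without calling AWS. The association and window names, the chosen hours, and the "all instances" target are assumptions for illustration, and registering the actual patch install task with the window is omitted.

```python
from typing import Dict

VALID_DAYS = {"SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"}


def weekly_cron(day: str, hour_utc: int) -> str:
    """Build a six-field weekly cron expression such as cron(0 3 ? * SAT *)."""
    if day not in VALID_DAYS:
        raise ValueError(f"unknown day of week: {day}")
    if not 0 <= hour_utc <= 23:
        raise ValueError(f"hour out of range: {hour_utc}")
    return f"cron(0 {hour_utc} ? * {day} *)"


def patch_scan_association(day: str = "SUN", hour_utc: int = 2) -> Dict[str, object]:
    """Keyword arguments for a weekly scan-only association (no patches are installed)."""
    return {
        "Name": "AWS-RunPatchBaseline",
        "AssociationName": "weekly-patch-scan",  # hypothetical name
        "Targets": [{"Key": "InstanceIds", "Values": ["*"]}],  # assumed: all managed instances
        "ScheduleExpression": weekly_cron(day, hour_utc),
        "Parameters": {"Operation": ["Scan"]},
    }


def patch_install_window(day: str = "SAT", hour_utc: int = 3) -> Dict[str, object]:
    """Keyword arguments for a weekly maintenance window in which patches get installed."""
    return {
        "Name": "saturday-patching",  # hypothetical name
        "Schedule": weekly_cron(day, hour_utc),
        "Duration": 3,  # hours the window stays open
        "Cutoff": 1,    # stop starting new tasks this many hours before the window closes
        "AllowUnassociatedTargets": False,
    }
```

Because both definitions are just data, they can live in the same bootstrapping code that creates each account, so every new account gets the same scan and patch schedule.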

Now, because Security Hub is also on, patch compliance and non-compliance automatically get reported into Security Hub as findings. So the security team who's looking at that has a good understanding of what's compliant and what's not, and they only need to bother Mary if there are any non-compliant resources.

But Mary has also automated sending out that report to management to show that she's compliant. There's a way in Patch Manager to export a report, and then you can send that report to show people: here's the set of EC2 instances across my estate that are compliant with my patch baseline.

And let's take a look at what this might look like in practice. Here we're taking a look at Systems Manager, which is the operational cockpit inside of AWS for managing your EC2 instances. And I should click one more time to make sure that the video plays.

What we see is that Mary has gone into Quick Setup. In Quick Setup, there's the ability to create a configuration across your AWS organization. So we see here that they've created a Quick Setup configuration across the organization that has set up the foundational components for Systems Manager to operate inside of these accounts. And Mary can take a look at which accounts have been configured and whether any configuration has failed.

And then she goes back to the console, and now we're going to take a look at Explorer. In this environment, Explorer is aggregating the data across all of the organization. So we've created what's called a data sync, an organizational data sync, that's gonna go into each one of these accounts and collect how many EC2 instances we have and what software has been installed on those EC2 instances.

So we can see that we have 34 instances. Now, when you click into these 34 instances, the key here is that they are not in just one region. We see that they're in multiple regions that are being managed and governed by Mary's team, and they're also in different AWS accounts.

Right. So this is across the estate, getting a holistic view of what's going on. Now Mary is going to go back, and we're going to take a look at how we're doing against patch compliance. She's going to go back to the Explorer dashboard, go down, and take a look at the instances that are non-compliant for patching. And we notice that there are three instances that have been non-compliant in the past 15 days.

Mary's going to click on that and find out whose accounts those instances are in. So she can identify which accounts and which regions are non-compliant pretty quickly and easily. This is everything that has been set up, and it's automation that she has just from using Systems Manager and Patch Manager.

Now, if we go on to Thursday, Mary's team are again thankful that they started using Systems Manager. They not only eliminated their patch infrastructure, but they also got rid of all those bastion hosts that they had previously. So now the CIO is extremely happy with the team, because a lot of cost has been eliminated inside of the organization.

What they've started to use is Session Manager. They've used this as a mechanism to give people access, both SSH and RDP, to the EC2 instances in their organization. Mary's not only made the CIO happy, but she's also made all the security folks happy, because now they're shutting down port 22 and port 3389. And they've also instituted policies where Mary can restrict which commands are actually run on these boxes using a Session Manager policy.
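Once remote access goes through Session Manager, a natural follow-up check is whether any security group still exposes port 22 or 3389 to the internet. The Python sketch below inspects a single security group given as a dictionary in the shape returned by the EC2 `describe_security_groups` API (`IpPermissions`, `FromPort`, `ToPort`, `IpRanges`, `Ipv6Ranges`). It is an illustrative check only and does not fetch or modify anything.

```python
from typing import Dict, List

REMOTE_ADMIN_PORTS = {22: "SSH", 3389: "RDP"}
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_admin_ports(security_group: Dict) -> List[str]:
    """Return the remote admin protocols (SSH, RDP) this group opens to the internet."""
    exposed = set()
    for rule in security_group.get("IpPermissions", []):
        protocol = rule.get("IpProtocol")
        if protocol not in ("tcp", "-1"):
            continue  # SSH and RDP run over TCP; "-1" means all protocols
        cidrs = {r.get("CidrIp") for r in rule.get("IpRanges", [])}
        cidrs |= {r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])}
        if not cidrs & OPEN_CIDRS:
            continue
        if protocol == "-1":
            exposed.update(REMOTE_ADMIN_PORTS.values())
            continue
        low, high = rule.get("FromPort"), rule.get("ToPort")
        if low is None or high is None:
            continue
        for port, name in REMOTE_ADMIN_PORTS.items():
            if low <= port <= high:
                exposed.add(name)
    return sorted(exposed)
```

An empty result for every group is a quick way to confirm that the bastion-style access paths really are closed.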

Mary's also set up configuration where all the commands, all the things that are happening inside of these EC2 instances, are being logged to an S3 bucket so that they can be used for forensic analysis to see if anything's happened. Because this has all been created, Mary has also been able to give the line of business the ability to run commands against the EC2 instances that they've provisioned.

So in Service Catalog, they checked out a product, and then they were able to go to a runbook. While they were working with the COTS vendor on site, they were able to do that update, because Run Command in an Automation runbook took a snapshot first and then applied the changes, so they can revert if they need to without any complication for Mary.
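The snapshot-first pattern described here is independent of any particular service, so it can be sketched in plain Python. In the example below a dictionary stands in for the instance's state and a copy of it stands in for the snapshot; the function name and the idea of passing the change as a callable are hypothetical and are not how an Automation runbook is actually written.

```python
from typing import Callable, Dict


def update_with_snapshot(state: Dict[str, str],
                         apply_change: Callable[[Dict[str, str]], None]) -> Dict[str, str]:
    """Apply a change to a copy of the state, falling back to the snapshot on failure."""
    snapshot = dict(state)  # stands in for taking a snapshot before touching anything
    working = dict(state)
    try:
        apply_change(working)  # stands in for the vendor's update script
    except Exception:
        return snapshot  # revert: the pre-change state is returned untouched
    return working
```

The value of the pattern is that the revert path is decided before the change starts, which is what lets the line of business run the update without Mary's team standing by.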

So let's just take a look at what the Session Manager experience inside of Systems Manager might look like in practice. Here we're in the Systems Manager console yet again, and we're going to go to Fleet Manager. Fleet Manager is basically where you can do a lot of the things inside of Systems Manager. In this case, we're going to click on a Linux instance and start a terminal.

Once we start that terminal, we can see that I'm right on the command line and I can run an ls command. But again, Mary can put in place a Session Manager policy that prohibits me from running an ls command and discovering what's going on in the system.

Now we terminate the session, and all of that's been logged through CloudTrail. So if the security team needs to see what happened when somebody logged in, they have that visibility as well.

We're also gonna go into Fleet Manager and RDP into a Windows instance. We're gonna put in the administrator name, which is not something I would recommend, but in this case, in a demo environment, we're logging in as administrator, putting in the password, and connecting. As with all things Windows, it's gonna take a little bit. And then we're gonna see a Windows screen in just a second here on the console. And if you want to, you can expand it and do whatever administrative things you need to do in your fleet of EC2 instances.

So now we've worked through Monday through Thursday, and now we're here on Friday. On Friday, Mary's team is able to answer a confident yes and yes to the additional requirements that the business has asked of her. Now she can expand with Control Tower to another region and extend the governance and control guardrails that she has. And she can also bring in the additional acquisition account and apply her organization's governance controls to that acquisition account.

And because she was able to say yes and yes and not really do much work on Friday, her team takes advantage of that time, since everything's been automated, to start thinking about additional automation in their toolkit to help the business build, or to better manage change in their expanded environment.

Now, one of the pillars of the AWS Well-Architected Framework is the Operational Excellence pillar, and one of the recommendations in that pillar is to dedicate some time and resources to continuous incremental improvement, to evolve the effectiveness and efficiency of your operations. So now that Mary's team has automated many of their manual processes, they are no longer running behind on meeting the current requirements. So now they're taking a look at how they can evolve and improve their operations.

So one of the other things that Mary's team is thinking about is that they've only used Systems Manager Patch Manager, as well as Inventory and Session Manager. So they start to think, maybe we can implement Incident Manager to manage our incidents, or Change Manager to automate change management inside of our organization, with integration with our partners such as ServiceNow.

Now, they also think about using Secrets Manager because they want to get more secure, and there's a lot of sharing of passwords in plain text with developers and teams. So they decide, let's take a look at Secrets Manager and how we can vault these, and again, get the benefits of CloudTrail integration to see who is taking secrets out and using them in any kind of fashion.
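
A minimal sketch of what reading a vaulted secret might look like from application code, instead of passing passwords around in plain text. `get_secret_value` with a `SecretId` argument is the boto3 Secrets Manager call; the `read_secret` helper, the `FakeSecretsClient` stand-in and the secret name `demo/db` are hypothetical and exist only so the example runs without AWS credentials.

```python
import json


def read_secret(client, secret_id):
    """Fetch a text secret and decode it if it holds JSON."""
    # Secrets stored as binary come back under "SecretBinary" instead;
    # this sketch only handles text secrets.
    response = client.get_secret_value(SecretId=secret_id)
    secret = response["SecretString"]
    try:
        return json.loads(secret)
    except json.JSONDecodeError:
        return secret


def make_client(region_name):
    """Create a real Secrets Manager client (needs boto3 and credentials)."""
    import boto3  # imported lazily so the sketch loads without boto3 installed
    return boto3.client("secretsmanager", region_name=region_name)


class FakeSecretsClient:
    """Hypothetical stand-in for the boto3 client so the sketch runs offline."""

    def __init__(self, secrets):
        self._secrets = secrets

    def get_secret_value(self, SecretId):
        return {"Name": SecretId, "SecretString": self._secrets[SecretId]}


fake = FakeSecretsClient(
    {"demo/db": '{"username": "app", "password": "example-only"}'}
)
credentials = read_secret(fake, "demo/db")
print(credentials["username"])
```

Against a real account, `read_secret(make_client("us-east-1"), "demo/db")` would make the same call, and each retrieval would show up in CloudTrail as a `GetSecretValue` event.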

They also consider running a workshop with other teams inside of their organization to tell them about the benefits of automation and how to use these APIs inside of AWS, so that those teams can also be successful in what they're doing. And really, at the end of Friday, they're not thinking about work on the weekend.

So Mary is able to enjoy her pets, her family, going out to dinner in the city that she's in, playing video games, or binge watching on Netflix and catching up on shows. But now let's talk to a real-life customer. I want to welcome Ashiesh Wadhwa from SunLife to talk about their journey in automation inside of SunLife.

Ashiesh: Ok. Thank you. So I am a real-life Mary, a manager, and honestly, we didn't give them the use case when they were preparing the slides. But when I went through those slides, I actually thought that this is what we are. And if we have managers in the audience as well, they are probably going through the same problems that we go through in our organization.

But before I go into a use case on how we have developed cloud at SunLife, I will quickly go through who we are at SunLife. We are an insurance provider, and we have been rated among the top 100 asset managers in the world. We are in the business of taking care of people. We have more than 30,000 advisors, distributors and partners across the world. And the biggest challenge that we have, which is good for business but a challenge from my perspective as an engineering lead, is that we are across the world and it's not a small setup.

Our company is set up mainly in the US, in Asia, as well as in Canada. But setting up cloud in SunLife is not an easy journey because, in my personal opinion, it's not difficult to set up cloud. What is difficult is to set up cloud in a financial institution. When you are regulated, you need to be compliant, and you need to make sure all your guardrails and controls are in place. Because at SunLife, we consider ourselves the data guardians and the data caretakers of our consumers. How we implement that, and make sure that we are maintaining the pace of cloud within the organization, is quite important.

We have a cloud-first strategy. This is a note from our CIO, who clearly mentions that she really wants us to be very innovative and bring cloud solutions faster to our consumers. So that's the biggest buy-in we got. Our CIO completely agrees that we should be cloud first, and that we need to motivate every application within SunLife to move towards cloud.

We are also very focused on being the digital leader in our financial industry, and we really want to provide bigger solutions to our consumers faster and in a more secure manner. How I relate to what Aaron and Emily said is that, as I said, we have a global customer base, and as senior management is focused on going to cloud, we really want to make sure that a developer sitting in Hong Kong and a developer sitting in the US have the right guardrails available. But the challenge is that we are now telling the business, guys, you need to go to cloud, and you need to prepare a migration plan: this is what your application looks like, and this is how you have to move to cloud. But how do we keep that pace so that we are available for those businesses? Because at the end of the day, I don't want to be on the receiving end where we are not ready from the cloud platform perspective.

So our biggest challenge is how cloud adoption can take place across the globe. And the second thing is that we really want to scale up the platform, not the cloud teams. We don't want to create admin pods in Asia and admin pods in the US or Canada. We really want to be centralized, and everything should still work well.

I'm just gonna give you a quick overview of how we have dealt with this within the cloud space at SunLife. We have a central team in Canada that manages this. We really wanted to provide our developers autonomy, and we really wanted to get out of their way. We don't want a developer to be dependent on us, waiting for us to deliver resources to them, provisioning an S3 bucket for them or building SQS queues or whatever.

So we really wanted to make sure that our developers sitting in Hong Kong, Singapore, Canada or the US can self-deploy those services. That was the biggest challenge we had.

Second thing was, whenever they're deploying those services, are we actually providing them the security controls? Because we really want them to be focused on building applications rather than thinking about, oh, is it secure to use this over here or not, and whether it is compliant with the directives that the organization provides.

Third thing is building more and more cloud automation, and I'm going to talk more about this. So starting in 2022, we came up with an AWS workload account strategy. We were a shared account shop, in which we used to have shared accounts and shared resources, each resource shared by multiple roles, each role using multiple resources. So if we had to change or delete a role, that could break another resource or another account. So we were quite messed up at that stage.

So we moved to the workload account strategy. The second thing we built in 2022 was Control Tower. As Aaron and Emily were saying, Control Tower has been the biggest win for us in 2022. Why? Because we don't have to be bothered with enabling an account in Hong Kong and enabling an account in the US, as I always say. Because all accounts are provisioned through Control Tower, they have the minimum guardrails, and as per the CISO's objectives or directives, we put them into that provisioning environment.

And the third thing is cloud automation. This diagram that you see on the screen is how we were set up in 2021 - all resources shared, IAM permissions shared. So if I have to break an IAM permission, I might break a couple of pieces of code, and we are still struggling with that. We get a lot of Sora alerts. We get a lot of GuardDuty alerts where we are deprovisioning access or deprovisioning roles while those resources are actually connected to somebody else.
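
The shared-role problem described above can be sketched as a simple lookup: before removing a role, list everything that still depends on it. The role and resource names and the `affected_by_role_removal` helper are hypothetical; this illustrates the blast radius in a shared account and is not SunLife's tooling.

```python
def affected_by_role_removal(role, role_usage):
    """Return the resources that still depend on `role`, sorted by name."""
    return sorted(
        resource for resource, roles in role_usage.items() if role in roles
    )


# Hypothetical shared-account layout: resource name -> roles that use it.
role_usage = {
    "orders-queue": {"shared-app-role"},
    "reports-bucket": {"shared-app-role", "analytics-role"},
    "audit-table": {"audit-role"},
}

# Removing the shared role would touch two unrelated teams' resources.
print(affected_by_role_removal("shared-app-role", role_usage))
```

In a one-workload-per-account model, the same lookup stays inside a single account, which is the isolation the workload account strategy is after.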

So we went with the workload account approach. So what is the workload account approach? Each account that you enable is attached to a cost center. So if you are a developer, you get your workload account, and we give you the CloudFormation templates and Terraform module libraries through which you can deploy your resources by yourself. And your cost center is attached to it. So you don't need to worry about what your cost is, or about not having visibility into the cost. In the shared account model, each resource is shared, so you don't understand what cost is being incurred by which resource and which account is using it.

If John is my developer sitting in Hong Kong, he can get a workload account, attach a cost center to it, and deploy whatever resources our team is building through Terraform modules, and those modules come with IAM roles and policies. So you don't have to worry about, oh, I need to contact an admin sitting in India to deploy a service for me. It is readily available through an account, and you can just double click on the modules and deploy the services.
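
To make the cost center attachment concrete, here is a small sketch that rolls up per-account spend by cost center, which is the visibility a shared account cannot give. The `WorkloadAccount` record, the account IDs, the cost center codes and the amounts are all invented for illustration; in practice the per-account figures would come from a billing export.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadAccount:
    """Hypothetical record created when a developer requests an account."""
    account_id: str
    owner: str
    cost_center: str


def cost_by_cost_center(accounts, monthly_cost):
    """Sum per-account spend under the cost center each account is attached to."""
    center_of = {a.account_id: a.cost_center for a in accounts}
    totals = defaultdict(float)
    for account_id, amount in monthly_cost.items():
        # Spend from an account nobody registered is surfaced, not dropped.
        totals[center_of.get(account_id, "UNASSIGNED")] += amount
    return dict(totals)


accounts = [
    WorkloadAccount("111111111111", "john", "CC-1001"),
    WorkloadAccount("222222222222", "mei", "CC-1001"),
    WorkloadAccount("333333333333", "sam", "CC-2002"),
]
# Made-up monthly spend per account ID.
monthly_cost = {
    "111111111111": 120.0,
    "222222222222": 80.0,
    "333333333333": 45.5,
    "444444444444": 10.0,
}

totals = cost_by_cost_center(accounts, monthly_cost)
print(totals)
```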

So that's what we got away from. We were a blocker earlier, but that's what we are getting away from. I won't say that we are 100% there on Terraform modules or CloudFormation templates for deploying those services. But it's a journey - AWS releases a lot of services, and we are still working on it and testing it with the consumers. So that's the first problem we sorted out. We gave the developers autonomy to manage their own accounts and manage their own cost centers. And we, sitting in Canada, don't have to deal with any admin kind of issues with them. They have their views, they understand their cost, they deploy their resources, they spin up and spin down by themselves.

The next problem was how we enable these environments and how we make sure the code is uniform and compliant. In 2021, on a bad day, an engineer could misconfigure an environment, and there could be a breach or an incident that comes up. And our engineers were saying, OK, so we need to set up another engineering team in Asia to handle that.

We came across Control Tower. We were a Landing Zone shop, and we moved from Landing Zone to Control Tower. Through Control Tower, we have implemented those minimal guardrails that get provisioned when an account is created. So I'm not worried: whenever an account is created in any location across the globe, it should be compliant, because from our engineering perspective, we make sure all those guardrails are implemented in those accounts through Control Tower.
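
The idea of a minimal guardrail baseline can be sketched as a set comparison: every account is expected to carry the baseline, and anything missing is reported. The guardrail names, account names and helper functions below are hypothetical and are not Control Tower identifiers; Control Tower applies its controls to organizational units itself, so this only illustrates the check, not how Control Tower enforces it.

```python
# Hypothetical baseline every new account is expected to carry.
BASELINE_GUARDRAILS = frozenset({
    "disallow-public-s3-read",
    "require-cloudtrail-enabled",
    "require-root-mfa",
})


def missing_guardrails(enabled):
    """Return the baseline guardrails absent from `enabled`, sorted by name."""
    return sorted(BASELINE_GUARDRAILS - set(enabled))


def non_compliant_accounts(accounts):
    """Map each account lacking part of the baseline to what it is missing."""
    report = {}
    for account, enabled in accounts.items():
        missing = missing_guardrails(enabled)
        if missing:
            report[account] = missing
    return report


# Made-up accounts in two regions with the guardrails enabled on each.
accounts = {
    "hk-dev": {"disallow-public-s3-read", "require-cloudtrail-enabled",
               "require-root-mfa"},
    "us-dev": {"disallow-public-s3-read", "require-cloudtrail-enabled"},
}

print(non_compliant_accounts(accounts))
```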

And the best part of Control Tower is that our cloud operations team can centrally manage and see the footprint of cloud across SunLife, because they don't need to set up individual pods in those zones. You can attach your monitoring, oversight or observability tools to Control Tower, and you can have complete visibility into what account is being provisioned in Australia, Hong Kong or the US while sitting in one location.

So as I said, our goal is not to scale up the cloud teams. Our goal is to scale up the platform. So Control Tower has actually helped us a lot on this front.

So these are some of the controls that we implement through Control Tower. I could go into detail about this, but I really don't want to, because these are mostly AWS best practices, and it all depends on the cloud CISO office of your organization and what kind of directives they want to push in. So whenever a cloud engineer on my team works on building these controls and guardrails, they go through the AWS best practices, and we also incorporate our organization's cloud security directives whenever we are building guardrails within Control Tower.

So where are we right now? SunLife is backed by Control Tower right now. There's one Control Tower through which a developer, as I have repeatedly said, sitting in any location, can go through a storefront and enable a workload account. The workload account comes attached to a cost center. In each workload account, you can deploy those services that are being built through Terraform modules or CloudFormation templates. And we don't have to worry about whether those accounts are compliant or not. Our engineers make sure that those minimal standards and controls are baked in, and then the developers are on their own. There can be cases where they want custom-built controls, and those can be built on top of that. But Control Tower ensures that all your accounts are uniform. All the controls that you want to enable in all regions are maintained through Control Tower pipelines. That has been our SunLife journey.

So the biggest challenge that we were facing was global enablement of the cloud platform. We accomplished that through our workload account strategy, as I said. And going back to Control Tower, this has helped us enable those accounts and maintain uniformity across those accounts. Over to you, Emily.

Thank you.
