Building a comprehensive security solution with AWS security services

李白的朋友王维

Tags: aws, Amazon Web Services, technology, artificial intelligence, re:Invent 2023, generative AI, cloud services

Article link: https://blog.csdn.net/just2gooo/article/details/134833245

So, hello everyone. It's the third day of re:Invent. Is everyone having fun? Great.

So we've got an exciting topic to discuss with you today: building a comprehensive security solution with AWS security services. We've got an equally compelling guest speaker from DBS Bank.

By way of introductions, my name is Mun Hossain. I'm a principal focused on AWS security. The security products I look after are Network Firewall, Web Application Firewall, and DDoS protection. I've been with AWS for about a year and a half and in cybersecurity for over two decades. Prior to this, I managed product management teams working on web gateways, email gateways, XDR, and so on.

With me today on the AWS side, I have Michael.

Hey, everybody. My name is Michael White. I'm a senior security SA here at AWS. I'm focused especially on some of the services that Mun just talked about and that we're going to talk about in our presentation today. I spend most of my time at AWS talking to some of our larger customers about how to use these network security services and how to get value out of them.

Great.

Hello everyone. I'm Abira Subramanian. I lead security engineering for public cloud at DBS Bank. I lead a team that creates the controls for secure adoption of cloud services. I've been in the public cloud space for seven years and in the security space for 13 years.

Thank you, Abira.

So from an agenda perspective, there will be three discrete areas. In the first area, we'll talk about the problem statement: the siloed approach that organizations typically use to deal with multi-vector threats. And we'll talk about some of the best practices around that.

In the second section, I will do some role playing with Michael, where I will be the bad actor and Michael will be the defender of Company X. He's a SecOps manager. So I'm going to levy a bunch of attacks in his direction, and he's going to incrementally thwart those attacks with the different types of services he has in his quiver.

We will then conclude that second part with Michael and me talking about the overall solution: how you can look at threats in an automated manner across the different components of a multi-vector threat, as well as remediation actions that you can perform at different points of the network based on efficiency.

And then we will turn it over to Abira at DBS Bank to talk about real-world examples of all of these things in action. It's really telling when you hear a real-world customer talk about these things; we can learn from the challenges they faced and the ways they mitigated them.

So with that, talking about the problem statement in general: there are lots of stats out there, and according to the Neustar report, 77% of attacks are multi-vector attacks. What do I mean by multi-vector attacks? This is when an attacker uses two or more tools to levy attacks at different networking and application resources.

Now, the problem is that organizations tend to throw lots of resources, lots of bodies, and lots of security tools at the problem. An IBM report indicates that enterprise customers typically use north of 45 different vendors for security controls to protect against these attacks. That's a staggering number.

The challenge is that when you throw more resources and more security tools at this, it compounds the problem. After some point, you get diminishing returns.

We also have another interesting stat from the same IBM report: in terms of containment, it takes a whopping 322 days to actually contain an attack.

So we'll explain on this slide why we have that diminishing returns effect when throwing different types of tools and defenses at these attacks.

So if you look at this slide, attacks typically start off with reconnaissance activity, where you're rattling the doorknobs. That's followed by backdoor access, where the attacker gains access to certain resources within the protected environment, followed by the actual initial vector of the kill chain, which is the infection with malware.

So you can use rootkits and things like that to initially infect the systems. Then the attacker will want to open up command and control channels, whereby they get unfettered access to resources and can tweak the malware to behave differently based on the responses they're seeing. That's followed by lateral infection, where the blast radius expands exponentially and you can infect different types of systems, networking services, and applications laterally, in an east-west direction.

Now, the challenge is that when you layer on a plethora of different tools to detect the attacks and then compound that with mitigation techniques, it becomes hairy and messy very quickly.

So let's look at just the middle part. If you were dealing with a single-vector threat that involved just the malware infection, you could potentially detect it with WAF and contain it with sandboxing, which would be all well and good because you're just dealing with that sliver.

But when you're dealing with the entire gamut of the attack, there is no silver bullet. The other thing to point out is that with all of these different solutions, you've got different people administering them, different management consoles to push policy, different monitoring consoles to correlate alerts, and different APIs to deal with.

So now you can see what I meant by diminishing returns: the more security tools you throw at it, the more complex the issue becomes.

So now, rather than inundating you with slides, we're going to have this fun session where I play a bad actor and try to launch different types of attacks at my security operations admin on the other end at Company X.

The first thing we'll start off with is a fairly unsophisticated attack related to reconnaissance, where I do ping sweeps and port scans. The typical tools I would use in this case would be something like Nmap or masscan.

So the first thing I want to do is understand what live hosts are out there. I've got a quiver of attack tools I want to target at the live hosts, so I need to figure out which hosts are live. I will send out a bunch of ICMP echo requests and get a bunch of ICMP echo replies, and that will tell me that certain hosts are live.

The next thing I want to do is understand what services and what ports are open. This is where I can map the applications that typically use certain ports to refine where I actually perform the attack.

So in this case, I'm going to send UDP and TCP packets to the different ports, and I'll get a response telling me this port is open, this port is closed, or this port is filtered.
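To make the three port states concrete, here is a minimal Python sketch of how the reply to a TCP SYN probe is commonly interpreted. The response labels and the function name are hypothetical, chosen for illustration; real scanners such as Nmap weigh more signals than this.

```python
from enum import Enum
from typing import Optional


class PortState(Enum):
    OPEN = "open"
    CLOSED = "closed"
    FILTERED = "filtered"


def classify_tcp_probe(response: Optional[str]) -> PortState:
    """Map the reply to a TCP SYN probe onto a port state.

    `response` is a hypothetical label for what came back:
    "syn-ack", "rst", "icmp-unreachable", or None for a timeout.
    """
    if response == "syn-ack":
        return PortState.OPEN      # something is listening
    if response == "rst":
        return PortState.CLOSED    # host is up, nothing is listening
    # No reply or an ICMP error usually means a filter dropped the probe.
    return PortState.FILTERED
```

From the defender's side, the goal is to make as many probes as possible land in the "filtered" bucket, so the scan reveals little.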

So now I've done a couple of attacks and, you know, these attacks just aren't working for me. For some reason, my IPs are getting blocked for this reconnaissance activity, which indicates to me that in addition to the basic protection of security groups and NACLs, Company X is probably using something like a firewall or an IPS.

So thankfully, here on the defender side, I'm using something called AWS Network Firewall. This is a managed firewall service from AWS. It allows me to deploy firewall endpoints anywhere in my environment where I want to inspect and control traffic, similar to what he's talking about here.

In my case, I have it deployed looking at all of my ingress flows, so it's looking at any traffic coming from my internet gateway into my VPC. It gives me visibility into threats and the ability to block them.

I could also use it in a similar fashion with a NAT gateway to look at outbound flows and say, all right, these are the specific workloads I want to allow access to on the internet from my environment.

Underneath the hood, Network Firewall uses something called Suricata.

We expose the rich complexity of the Suricata rules engine to you, and we also have some easier ways to ease into that as well. What that gives me is the capability to deeply inspect the traffic.

We've also modified Suricata within the Network Firewall service so that it's able to do more firewall functions. If you're familiar with Suricata, you probably know it as an IDS/IPS engine.

We've added capabilities to improve upon that. One of those capabilities, which we're seeing here on the screen, is the ability to create policies that have a default drop action. So I can say: if traffic is not in this prescribed set of applications and protocols that I want to allow into my environment, drop it by default.
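The default drop idea can be sketched in a few lines of Python. This is only a model of the decision logic, not the Network Firewall API; the `Flow` type and the `ALLOWED` set are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    protocol: str   # e.g. "tcp"
    dst_port: int


# Hypothetical allowlist: the only (protocol, port) pairs expected inbound.
ALLOWED = {("tcp", 443), ("tcp", 80)}


def evaluate(flow: Flow) -> str:
    """Return "pass" for allowlisted flows and "drop" for everything else."""
    if (flow.protocol, flow.dst_port) in ALLOWED:
        return "pass"
    return "drop"  # default action: anything not explicitly allowed
```

The point of the design is that a forgotten rule fails closed: an SSH port left open by mistake is still dropped because nobody allowlisted it.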

So what does that actually mean? It means that if my web application developers, for instance, dropped the ball and were careless with how they set up security groups or NACLs for the public-facing resources, I have this additional security layer that I control as a security team to block that access.

In addition, I can validate that traffic in a deeper way. An example that would deal with the same reconnaissance activities is to look at the TLS SNI, which is the Server Name Indication for TLS traffic.

What I could do there is validate that the traffic coming into those web application endpoints is actually trying to reach me, versus just randomly knocking on IP addresses to see what's going to respond. As a result, I can block that.
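As an illustration of the SNI check, the following Python sketch renders a Suricata-style rule that passes inbound TLS only when the client asks for a specific hostname. The domain, SID, and helper name are placeholders, and the sketch assumes the policy's default action drops everything that no pass rule matches.

```python
def sni_pass_rule(domain: str, sid: int) -> str:
    """Build a Suricata-style rule that passes TLS traffic whose SNI is `domain`."""
    return (
        f"pass tls $EXTERNAL_NET any -> $HOME_NET 443 "
        f'(tls.sni; content:"{domain}"; startswith; endswith; nocase; '
        f'msg:"Allow TLS for {domain}"; flow:to_server; sid:{sid}; rev:1;)'
    )


# Hypothetical hostname and SID, for illustration only.
rule = sni_pass_rule("shop.example.com", 1000001)
```

A scanner that connects to the bare IP address sends no matching SNI, so it never matches the pass rule and falls through to the default drop.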

A side benefit is that it also keeps my web application endpoints off of services like Shodan or Censys, which you've probably heard of. They are essentially mass scanners scanning the whole internet, and they're often used as tools to either supplement that reconnaissance effort or replace it entirely.

And so by keeping my endpoints off of those, I also keep myself off of the attacker's radar.

Great. So, you know, that didn't work; IP filtering techniques are in place. So I do different types of reconnaissance, like using newly formed domains, just to find out that Michael on the other end is probably blocking bad domains using DNS filtering.

So now I change my method. I'm going to look at something that has a different entry point.

I know that Company X does a lot of e-commerce, so they likely have web servers. So I look at my OWASP Top 10 and choose a more targeted attack, in this case a SQL injection.

For the SQL injection, I'm going to look for web forms where I can run queries against underlying back-end databases that contain PII and company confidential information, which I can then harvest for my ransomware attack down the road.

So I look for different web forms. I insert some SQL code, modify some code, manipulate some code, run some queries against the back-end database. And what I find is that it's just not working.
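For readers who have not seen a SQL injection up close, here is a small self-contained Python sketch using an in-memory SQLite database. The table, names, and payload are invented for the example; it shows the application-side flaw that a WAF rule is meant to backstop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)


def lookup_unsafe(name: str) -> list:
    # Vulnerable: user input is concatenated into the SQL text.
    query = f"SELECT email FROM customers WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def lookup_safe(name: str) -> list:
    # Parameterized: the driver treats the input strictly as data.
    return conn.execute(
        "SELECT email FROM customers WHERE name = ?", (name,)
    ).fetchall()


payload = "nobody' OR '1'='1"
leaked = lookup_unsafe(payload)   # the injected OR clause matches every row
blocked = lookup_safe(payload)    # no customer has that literal name
```

The injected quote turns the form input into part of the query, which is exactly the pattern that inspection of incoming requests looks for.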

So this guy won't stop with the attacks anyway.

I have not left my application developers on their own. I'm not relying on their prowess in terms of validating all the inputs coming into their web applications.

I have something called AWS Web Application Firewall. You've probably heard of it. Some people just call it WAF for short. With AWS WAF, I'm deeply inspecting all of the web requests that come into these applications.

AWS WAF also comes with a feature called AWS Managed Rules. The AWS Managed Rules deal with things like what you see on the screen here, SQL injection, but they also look across the OWASP Top 10 for other things, like cross-site scripting, or maybe a recent vulnerability like the Log4j exploit.

They're actively updated by AWS. Because these rules are managed by AWS, I don't have to worry about continuously updating these rule sets on my own.
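As a rough sketch of what attaching managed rules looks like, the Python below builds the rule entries a web ACL definition might carry. The dictionary shape follows the WAFv2 rule structure as I understand it, and the three rule group names are the AWS managed groups for SQL injection, common threats, and known bad inputs; treat both as assumptions to check against the current AWS WAF documentation. Nothing here calls AWS.

```python
def managed_rule(name: str, group: str, priority: int) -> dict:
    """Build one web ACL rule entry that references an AWS managed rule group."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {"VendorName": "AWS", "Name": group}
        },
        # "None" keeps the actions defined inside the managed group itself.
        "OverrideAction": {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


rules = [
    managed_rule("sqli", "AWSManagedRulesSQLiRuleSet", 0),
    managed_rule("common", "AWSManagedRulesCommonRuleSet", 1),
    managed_rule("known-bad-inputs", "AWSManagedRulesKnownBadInputsRuleSet", 2),
]
```

Because the entries only reference the groups by name, the rule content behind them can be updated by the vendor without any change to this definition.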

The other thing I'm doing is deploying my WAF protections to these endpoints with something called AWS Firewall Manager. AWS Firewall Manager is a way of essentially centralizing my deployment of AWS WAF.

And I can actually create rules that govern how those protections are applied. For instance, as new applications get deployed into production, I can have a rule that says as soon as an application goes live, maybe with a production tag or in a specific OU within my AWS organization, it immediately gets those protections from attacks like this.

Great. So, you know, I look at my OWASP Top 10, and the SQL injection didn't work. I try some cross-site scripting, but as Michael just indicated, WAF can also detect and protect against that.

So now I try something else, something a little more at scale and more volumetric in nature, where I try to starve resources, whether they're networking resources or application resources, with a DDoS attack.

What's worked for me in the past is a rapid reset tool that I've used successfully.

So I'm going to use this rapid reset tool. What this rapid reset tool does is open HTTP/2 streams and immediately cancel them with reset (RST_STREAM) frames, over and over. If the attack is successful, the resets inundate the server until its capacity is exhausted and it cannot accept any more legitimate connections.

So thankfully, again, I'm using something called Amazon CloudFront. You've probably heard of it. It's a content distribution network that I have sitting in front of my application workloads. In this case, CloudFront was able to automatically mitigate this type of HTTP rapid reset attack. You probably read about it recently: Google, Cloudflare, and Amazon all reported on how we dealt with this.
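The publicly documented HTTP/2 Rapid Reset technique (CVE-2023-44487) has a client open streams and cancel them right away with RST_STREAM frames. One generic way to spot that behavior is to count client-initiated resets per connection in a sliding window. The Python sketch below shows that idea with made-up thresholds; the talk does not describe how CloudFront implements its mitigation, so this is not a description of it.

```python
import time
from collections import deque
from typing import Optional


class ResetRateMonitor:
    """Flag a connection that cancels too many streams within a time window."""

    def __init__(self, max_resets: int = 100, window_seconds: float = 1.0):
        self.max_resets = max_resets
        self.window_seconds = window_seconds
        self._resets = deque()  # timestamps of recent stream resets

    def record_reset(self, now: Optional[float] = None) -> bool:
        """Record one client-initiated reset; return True if the limit is exceeded."""
        now = time.monotonic() if now is None else now
        self._resets.append(now)
        # Drop timestamps that have aged out of the window.
        while self._resets and now - self._resets[0] > self.window_seconds:
            self._resets.popleft()
        return len(self._resets) > self.max_resets
```

A server could close the connection once `record_reset` returns True, since a legitimate client rarely cancels streams at that rate.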

But along with that, I have a newly launched security dashboard. Essentially, it allows me to look at the security trends for all of my CloudFront distributions and the applications they're protecting. And I can see specifically what was allowed or blocked within the context of this latest attack.

I'm also using another service called AWS Shield Advanced. You've probably heard of AWS Shield; that's something all customers get as a benefit of using AWS. Shield Advanced gives me additional layers of protection on top of that.

One of the things it lets me do is pick and choose which of my resources are most mission critical: which would have the biggest impact on my business and are most likely to be targeted by those who would launch something like a DDoS attack.

Additionally, this works in conjunction with WAF. As it sees these attacks coming in in real time, it can write layer 7 WAF rules that will combat or mitigate that attack traffic.

Along with that, it will back-test those rules in real time before it puts them into place. It looks at a historical baseline of what my application traffic looks like to make sure I'm not going to get a bunch of false positives with this attack mitigation.
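The back-testing step can be pictured as replaying a candidate rule over known-good traffic and measuring how much of it would be blocked. Here is a minimal Python sketch of that check; the request shape, the 1% threshold, and the function names are assumptions for illustration, not Shield Advanced internals.

```python
from typing import Callable, Iterable

Request = dict  # simplified request record, e.g. {"path": "/"}


def false_positive_rate(
    rule: Callable[[Request], bool], baseline: Iterable[Request]
) -> float:
    """Fraction of known-good baseline requests the candidate rule would block."""
    baseline = list(baseline)
    if not baseline:
        return 0.0
    blocked = sum(1 for request in baseline if rule(request))
    return blocked / len(baseline)


def safe_to_deploy(
    rule: Callable[[Request], bool],
    baseline: Iterable[Request],
    max_fp_rate: float = 0.01,  # hypothetical tolerance
) -> bool:
    """Accept the rule only if it blocks few enough legitimate requests."""
    return false_positive_rate(rule, baseline) <= max_fp_rate
```

For example, a rule that blocks every request has a false positive rate of 1.0 against any non-empty baseline and would be rejected before it ever touches production traffic.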

Another big benefit of Shield Advanced is access to the Shield Response Team. We call them the SRT for short. They are a 24/7 team of DDoS experts who are specifically good at looking at attacks on the AWS network. They're involved in all sorts of discussions around the large-scale attacks we see on AWS, and they're available to help customers in a crisis.

As a Shield Advanced customer, you can also consult with them about how to build resilient architectures. And if you empower them to do so, they can proactively start putting mitigations in place on your behalf as soon as these attacks come in.

So at this point, I'm super frustrated, nothing's working for me. So I'm gonna do one last ditch attempt and then throw in the towel. So since the volumetric attacks are not working, I'm going to be a little more stealthy. And I'm gonna go below the radar because it's obvious that the WAF rules are being triggered by a certain request per second threshold.

So I'm gonna try to do something below that. I'm gonna use bots to automate the attack, in conjunction with residential and mobile proxies, to target some of these applications. Now, why do I have this setup? First of all, I want to mimic human behavior so that it looks natural, so I'll randomize the timing and the way in which I do the attacks. I will also use different IP pools.

So for example, if there's any IP rate limiting going on, I'll be able to subvert those capabilities. I will also be using residential proxies and mobile proxies because they map to legitimate IPs so that if there's any IP filtering, which I know Company X is doing, it will subvert that technique as well.
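To see why rotating IP pools defeats simple per-IP rate limiting, consider this small Python sketch. The addresses come from documentation ranges, and the limit of 100 requests per window is an arbitrary assumption.

```python
from collections import Counter


def blocked_ips(requests: list, limit_per_ip: int) -> set:
    """Return source IPs whose request count in one window exceeds the limit."""
    counts = Counter(requests)
    return {ip for ip, count in counts.items() if count > limit_per_ip}


# One noisy source: 1,000 requests from a single IP is caught.
single_source = ["198.51.100.7"] * 1000
# Same volume spread over 200 proxy IPs: 5 requests each stays under the limit.
rotating = [f"203.0.113.{i % 200}" for i in range(1000)]
```

With a limit of 100, `blocked_ips(single_source, 100)` flags the lone address, while `blocked_ips(rotating, 100)` flags nothing even though the total volume is identical. That gap is what per-client telemetry, rather than per-IP counting, has to close.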

And then the last thing is that I want to ensure that all of the IPs that I'm using for this attack are from different geographies. So it mimics real world traffic.

So we've already talked a little bit about AWS WAF and its capabilities. I've also employed some newer premium capabilities that have been launched around the ability to both identify and control bots.

So typically a web application firewall is going to have pretty limited capabilities in terms of identifying bots, especially the stealthy variety that he's referring to. That web application firewall is looking at web requests coming in on a case-by-case basis and, based just on the information in that web request, deciding: should I allow it or should I block it? So there's a minimal amount of information there, and oftentimes bots can forge a lot of that information to look good, for instance by playing with the user agent or things like that.

But with the targeted bot control feature that we see here on the screen, I have additional capabilities to essentially force all the clients accessing my application to run some additional JavaScript. That JavaScript allows me to collect a lot of rich telemetry about what that client is and what it's up to, things that are much harder to fake because they're actually doing compute on the back end.

So it's much harder for an attacker to fake the results of that telemetry. Now, for my simpler attacks, like a really unsophisticated DDoS attack where somebody's using a script or curl requests, that kind of stuff will get blocked, right? Because the client won't even be able to perform those additional JavaScript actions.

But for the more sophisticated varieties, where somebody is hiding behind these residential proxies, I'm able to apply all that rich telemetry to WAF machine learning models that can detect even these stealthy, coordinated bot activities and block them.

So that was the tit for tat between myself and Company X, and Michael obviously outsmarted me. So now what we're going to do is highlight the problems and then talk about the solutions.

So what are the key problems? You know, we had three classes of attacks. One of the key problems that organizations deal with is that when they make a detection, the warranted remediation action doesn't necessarily belong to the entity that made the detection.

Meaning that if I see something bad on a firewall, I can put a deny list on the firewall. That's the simplistic case. But typically with these multi-vector attacks, you see it somewhere and, in order to optimize the mitigation, you have to mitigate somewhere totally different.

So how do you actually automate mapping that diverse set of detections to remediation actions when they don't always perfectly align? The second piece, and I think most of you in this room can probably agree with me, applies especially to larger organizations: when you have those three types of attacks, you can have three different teams working on them.

These different teams, like NetOps, SecOps, AppSec, DevOps, and so on, don't typically talk well to each other. They don't communicate well with each other, and certainly not to the point where they're sharing intel to proactively go after threats.

So we need to figure out a way where, for example, if the NetOps person dealing with attack one has an offending source IP, that individual can communicate it to the SecOps team dealing with attack three. That way they have much richer context, and the resulting meta event that these multiple events roll up to is much more actionable and does not result in false positives with the potential of disrupting legitimate traffic.

So these are the two key problems. Now, we'll talk about the solutions before I hand it over to Abira, who will talk about how it's actually performed in the real world.

So in this solution, you know, I talked about the disconnect between detection and mitigation. In this case, Amazon GuardDuty can monitor the environment and look at things like CloudTrail logs and VPC flow logs. And here it has found VPC flow logs indicating a possible communication channel, a C2 channel, that has been set up to send egress traffic out to nefarious destinations.

So in this case, I've detected it in GuardDuty, but GuardDuty is not mitigating it. I need a way to trigger an event or an action so that when I detect it in GuardDuty, I can mitigate it on Network Firewall, as an example. It could be WAF, it could be DNS Firewall, but for this example let's use Network Firewall.

So how do I bridge that gap? We bridge that gap with Step Functions and automations that we can do with Lambda that will trigger based on an alert and perform a response action to block that activity.

So what is that secret sauce in the middle relative to the Step Functions? Yeah. So, as Mun mentioned, this is actually a blog post on the website that you can go to right now; we'll share the link at the end so you can read about this in more detail. And Abira is going to talk about how they specifically customized this type of solution for their environment.

So what exactly happened? We mentioned that GuardDuty does some sort of detection, right? And we want to take that and perform some sort of automated action as a result. So in this case, again, GuardDuty detected something. It's looking at VPC flow logs, and it sees communication with IP X, which we know is bad, either because of some anomaly in the communication or because threat intel indicates that this specific IP has been used for malicious behavior recently.

So what we want to do is take action and block it. The first step here is for GuardDuty to send these events to Security Hub. Security Hub has the ability to integrate findings from many different services, and it also publishes those findings to Amazon EventBridge.

In this solution, we have a rule that looks at all those findings for GuardDuty findings that match the case Mun just mentioned, where a remote IP address that we know is malicious is part of the finding. That kicks off the Step Functions workflow that you see here at a high level.

What we're essentially doing is taking that IP address and putting it into a DynamoDB table. We record the time when that occurred so we can prune it later; because IPs are ephemeral, we want to take those IPs out after some specified period of time. Then I push all those IPs into a firewall policy to block them in real time. At the very end, I get an SNS message to let me know whether the remediation workflow succeeded or there were any errors.
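The workflow steps just described (record the IP with a timestamp, prune stale entries, render block rules) can be sketched in plain Python. A dictionary stands in for the DynamoDB table, the 12-hour retention is a made-up value, and the finding key is an assumption about where a GuardDuty finding forwarded through Security Hub carries the remote IPv4 address; verify it against your own findings before relying on it.

```python
import ipaddress

RETENTION_SECONDS = 12 * 60 * 60  # hypothetical: keep a blocked IP for 12 hours
# Assumed key under which the finding carries the remote IPv4 address.
REMOTE_IP_KEY = (
    "aws/guardduty/service/action/networkConnectionAction/"
    "remoteIpDetails/ipAddressV4"
)


def extract_remote_ip(finding: dict):
    """Return the remote IP from a finding, or None if it is absent or invalid."""
    value = finding.get("ProductFields", {}).get(REMOTE_IP_KEY)
    try:
        return str(ipaddress.ip_address(value)) if value else None
    except ValueError:
        return None


def record(table: dict, ip: str, now: float) -> None:
    table[ip] = now  # the dict stands in for the DynamoDB table


def prune(table: dict, now: float) -> None:
    """Remove IPs that were last seen longer ago than the retention period."""
    for ip in [ip for ip, seen in table.items() if now - seen > RETENTION_SECONDS]:
        del table[ip]


def drop_rules(table: dict, first_sid: int = 2000000) -> list:
    """Render one Suricata-style drop rule per blocked IP."""
    return [
        f'drop ip $HOME_NET any -> {ip} any (msg:"Blocked by automation"; '
        f"sid:{first_sid + i}; rev:1;)"
        for i, ip in enumerate(sorted(table))
    ]
```

In the real solution each function would be a separate state in the Step Functions workflow, with a final notification step reporting success or failure.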

So if you recall, that was problem one. Problem two is that we need to correlate events and roll them up into meta events that are actionable. What you see at the top of this slide are a number of different services. Each of the services has produced alerts about activity that just took place. And if you look at the meter icons on top of those services, they're all green. They're all benign alerts because these alerts were viewed in a silo, without the context of the broader picture.

So now how can I visualize all of these alerts in these different systems and services and combine everything into a meta event that will tell a very different story that will be actionable and the remediation action will be very nuanced based on the diverse visibility that I am now dealing with.

So we have a relatively new service, launched in the last year, that was built specifically to solve this problem of how I get all my security data in one place. It's called Amazon Security Lake.

Amazon Security Lake helps in a number of different ways with this particular problem. First of all, it brings all that data together into a centralized data lake. It also transforms and normalizes that data to the OCSF, the Open Cybersecurity Schema Framework, which is an open standard that a whole bunch of security vendors have signed up to support. We have lots of customers that are really excited about this and are trying to standardize their security log data on it as well.

So now that I have all that in one place and in a standardized format, I can do a lot more interesting things with it. I can query it in place with tools like Amazon Athena, which helps me correlate across all these different services to produce the sort of meta event that Mun is referring to.
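Once events share one schema, the correlation itself is simple. The Python sketch below groups normalized events by IP address and emits a meta event when several independent sources mention the same address. The flat `source` and `ip` fields are a deliberate simplification and not the actual OCSF attribute names.

```python
from collections import defaultdict


def correlate_by_ip(events: list, min_sources: int = 2) -> list:
    """Group events by IP and emit a meta event when several sources agree."""
    sources_by_ip = defaultdict(set)
    for event in events:
        sources_by_ip[event["ip"]].add(event["source"])
    return [
        {"ip": ip, "sources": sorted(sources)}
        for ip, sources in sorted(sources_by_ip.items())
        if len(sources) >= min_sources
    ]


# Four alerts that each look benign when viewed in their own silo.
events = [
    {"source": "network-firewall", "ip": "198.51.100.23", "severity": "low"},
    {"source": "waf", "ip": "198.51.100.23", "severity": "low"},
    {"source": "vpc-flow", "ip": "198.51.100.23", "severity": "low"},
    {"source": "waf", "ip": "203.0.113.9", "severity": "low"},
]
meta_events = correlate_by_ip(events)
```

Three low-severity alerts about the same address from three different services tell a different story than any one of them alone, which is the meta event the talk refers to.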

I can also do other things, right? I can forward this on to other tools that I'm already using for visualization or correlation. Maybe you're using Amazon OpenSearch as your SIEM, or a third-party SIEM like Splunk or QRadar, and you want to do the correlation over there. I can selectively forward those events there.

The other thing I can do, and this was demoed earlier here at re:Invent, is use tools like Amazon Kendra and its access to some of the frontier models to interact with this data using natural language.

So what are some examples of something I might want to do? I could, for instance, ask: anywhere in this pile of security data, was one of my resources communicating externally with this malicious domain that my SecOps team just identified? Or maybe I found security event A, and I really want to know whether the IP address in this event occurred in any other security events. I can just ask that in natural language. Or maybe I've identified a particular IP address that I know is involved in something malicious, and I want to know everything in the environment that communicated with it. So I could say: please summarize all of the VPC flow log info for any communications with that particular IP address.

Thank you, Michael. So for the remainder of our session, the next 20 minutes, I'd like to reintroduce Abira to talk about real-world examples where he's put this to the test. There are challenges he saw along the way that he iteratively mitigated as well. So I'll turn it over to you, Abira.

Thanks, Mun. So let me ask you a quick question. How many of you here in the crowd have heard of DBS Bank, Singapore? Quick show of hands. Quite a few here. All right.

So let me quickly introduce DBS. DBS is one of the leading financial services groups in Asia. We are the biggest bank in Southeast Asia, headquartered in Singapore, with a global presence. We have more than 50% market share in Singapore, and we have won many accolades such as Safest Bank in Asia, Most Valuable Brand in Singapore, and Safest Asian Private Bank.

One of our top missions is to become a transformative bank that is digital to its core, and that's truly embedded in the culture.

DBS runs multiple workloads in the AWS cloud across multiple regions for its compute, storage, and analytical needs. So let me walk through some of the challenges we faced as we scaled our security operations.

The applications are deployed under a multi-account strategy, and they all use a central proxy for their egress traffic. As the accounts scaled, the key challenge we had was securing the egress traffic and preventing threats such as ransomware, command and control activity, and zero-day exploits.

The second challenge is to prevent data exfiltration risk. We need the visibility to look at the external data movement and to have the guardrails to contain them.

The third challenge is manual response and remediation. AWS provides multiple detective controls for identifying exfiltration and unauthorized access, but these often produce alerts in isolation. With a growing number of alerts, we relied heavily on manual triaging to discern what was a genuine threat, and this often led to increased operational toil for our operators.

Another critical aspect we aimed to address was reducing the handoffs between different teams, such as platform engineering and SOC, thus minimizing the time taken to respond.

So with that, we had three key objectives:

The first objective is identifying key metrics and indicators for unusual traffic patterns and data spikes. We needed to detect anomalous patterns on the VPC networks through real-time analysis so that we can prevent threats arising from the egress traffic.
The second objective is to detect exfiltration on the data sources by identifying anomalous activity such as a sudden spike in API calls or access to data from an unusual location.
In terms of IAM, any subtle misconfiguration in the IAM policies can enable attackers to escalate privileges and move laterally, or to exploit a misconfigured IAM role to carry out an SSRF attack. These threat activities can easily go unnoticed in the sea of IAM data but can have far-reaching implications if exploited. Hence, detecting these anomalies helps us identify any unauthorized data movement in our environment.

The third objective is to have automated remediation that allows us to respond to threats quickly. We want to proactively prevent threats as they emerge and to have an automated incident response playbook to reduce manual operations.
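The "sudden spike in API calls" mentioned in the second objective is often detected with a simple statistical baseline. Here is a minimal Python sketch using a z-score; the threshold of three standard deviations and the function name are assumptions, and the talk does not say which method DBS actually uses.

```python
from statistics import mean, stdev


def is_spike(history: list, current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it sits more than `threshold` std devs above the mean."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current > mu  # flat baseline: any increase stands out
    return (current - mu) / sigma > threshold
```

For example, against an hourly history of `[100, 110, 90, 105, 95]` API calls, a reading of 500 is flagged while a reading of 108 is not.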

So let's look at the implementation options we had to achieve our objectives:

If I can draw your attention to column one, direct egress, that is option one. This is the current state, where we use a traditional proxy setup for our egress traffic and multiple detective controls for identifying threats. But the downside of this approach is that incident response still required manual intervention and triaging, and it gave us fewer options for automation.

In the middle column, we have option two, which is Network Firewall, a fully managed service that scales automatically with the network traffic. It provides us with advanced filtering capability for the egress traffic. We can also define rules based on fully qualified domain names and ensure that the ports are used only by their intended protocols. While this solved the central proxy issue by providing us with fine-grained policies for controlling the egress traffic and some automation opportunities, we still had the problem of dealing with multiple detective controls sending in isolated alerts. Threat correlation remained manual and very difficult, and this led to an increased incident response time.
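To make the domain-based filtering concrete, here is a minimal sketch of the rule group body that a Network Firewall stateful domain list uses. The field names (`RulesSourceList`, `Targets`, `TargetTypes`, `GeneratedRulesType`) follow the Network Firewall API; the helper name and the example domains are assumptions, and this is not the configuration used in the talk.

```python
import json


def build_domain_allowlist_rule_group(domains):
    """Build the RuleGroup body for a stateful domain-list rule group.

    A leading dot (".example.com") matches the domain's subdomains.
    """
    return {
        "RulesSource": {
            "RulesSourceList": {
                "Targets": sorted(set(domains)),
                # Inspect both the TLS SNI and the HTTP Host header.
                "TargetTypes": ["TLS_SNI", "HTTP_HOST"],
                "GeneratedRulesType": "ALLOWLIST",
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical allowlist; replace with the domains your workloads need.
    body = build_domain_allowlist_rule_group([".amazonaws.com", "example.com"])
    print(json.dumps(body, indent=2))
```

The resulting dictionary is the kind of value that would be passed as the `RuleGroup` argument when creating or updating a stateful rule group.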

And with option three, in the last column, what we did was combine the detection and remediation capabilities into an automated workflow. So this is an integrated solution. When we receive an alert from GuardDuty or other security services, we correlate this alert and invoke a corrective action. In this case, it's updating a firewall rule group or updating an IAM policy with restrictive permissions.

The benefit of this approach is that it provided us with an automated remediation and it streamlined our security response operations.

So just as a quick comparison between the three options: option one was more manual. Option two, Network Firewall, gave us a little bit of automation capability, but we still had the manual threat correlation problem. And with option three, we got the full orchestration capability, and it solved the problems of threat correlation and the handoff between different teams, which were some of the problem statements raised by Mun earlier in his talk.

So with that our implementation path was to choose option three for controlling and protecting the AWS egress traffic.

So we achieved this by going ahead with a two-phased approach.

Phase one focused on visibility. We leveraged the telemetry across multiple detective services and the feeds from different threat intelligence sources. By correlating these alerts from different security sources, we can then distinguish the noise from actual threats and get more meaningful insights. We need to do the visibility first so that we can reduce the false positives and then increase the accuracy of the automation in the next phase.

In the next phase, we moved to automation, where we focused on high confidence alerts by orchestrating automated correlation and corrective action. This accelerated our incident response and reduced the handoff between different teams, which was the problem we had earlier.

To initiate the visibility phase, we enabled an ecosystem of services that are designed to enhance visibility and detect threats across your entire cloud infrastructure.

We use Amazon GuardDuty, which detects the attack tactics associated with anomalous activity and data exfiltration events. GuardDuty continuously monitors and analyzes your AWS event data, such as CloudTrail, VPC Flow Logs, and DNS logs, across all your workloads and then reports its findings to a central security account, which can be designated as the delegated administrator.

GuardDuty has specific detection coverage across services such as S3, IAM, Lambda, and EKS, as well as malware on EBS volumes. The findings can then be investigated using Amazon Detective, which can be used to determine the root cause of the incident.

Next we use AWS IAM Access Analyzer, which detects any resources that are shared with an external entity outside of a zone of trust. With IAM Access Analyzer, you can define the analyzers and then validate your data perimeter across your organization. It then sends all the findings to a central security account, which is designated as the delegated administrator.

It also provides features such as policy validation and policy generation for authoring least-privilege permissions across IAM and SCP policies.

The next one that we used was AWS Network Access Analyzer, which helps you detect any unintended egress traffic from your environment. With Network Access Analyzer, you define network access scopes, which can pinpoint any potential network routes that do not align with your network control objectives. It also helps you verify network segmentation, trusted network access, and trusted network paths in your environment.

The next service that we used was AWS Config. It detects any resource misconfiguration that allows public exposure or data exfiltration. It provides you with customizable and prebuilt config rules to do compliance checks.

Finally, all of these services can be integrated with AWS Security Hub, where the findings will be ingested. You can then consolidate all of these findings and assign a CVSS score to get more meaningful insights for your automation.

Later, once we have the visibility sorted, we can then move on to automating our remediation use case.

So in the previous detection phase, we saw how GuardDuty supports multiple detector types. Here is a sample mock GuardDuty finding. We can find information such as the type of the finding, the severity, the account information and the resource id that's involved in this attack.
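For readers following along without the slide, the fields mentioned above can be pulled out of a finding as sketched below. The event is a mock with made-up values (account ID, instance ID, and IP address are placeholders), and the helper name `summarize_finding` is an assumption; the nested key names follow the GuardDuty finding format as delivered through EventBridge.

```python
# Mock EventBridge event carrying a GuardDuty finding. All values are made up.
MOCK_EVENT = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {
        "type": "UnauthorizedAccess:EC2/MaliciousIPCaller.Custom",
        "severity": 5,
        "accountId": "111122223333",
        "resource": {
            "resourceType": "Instance",
            "instanceDetails": {"instanceId": "i-0123456789abcdef0"},
        },
        "service": {
            "action": {
                "actionType": "NETWORK_CONNECTION",
                "networkConnectionAction": {
                    "remoteIpDetails": {"ipAddressV4": "198.51.100.7"},
                },
            }
        },
    },
}


def summarize_finding(event):
    """Extract the fields a remediation workflow typically needs."""
    detail = event["detail"]
    action = detail.get("service", {}).get("action", {})
    remote_ip = (
        action.get("networkConnectionAction", {})
        .get("remoteIpDetails", {})
        .get("ipAddressV4")
    )
    instance_id = (
        detail.get("resource", {}).get("instanceDetails", {}).get("instanceId")
    )
    return {
        "finding_type": detail["type"],
        "severity": detail["severity"],
        "account_id": detail["accountId"],
        "resource_id": instance_id,
        "remote_ip": remote_ip,  # None when the finding has no network action
    }
```

Not every finding type carries a network connection action, so the sketch returns `None` for fields that are absent instead of raising.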

So all of these findings can then be exported to AWS Security Hub via Amazon EventBridge. Based on the finding, we can then orchestrate different remediation workflows.

And when it comes to automation, we don't have to start from scratch. Here is a sample solution from the blog "Automatically block suspicious traffic with AWS Network Firewall and Amazon GuardDuty" where a Step Function ingests a GuardDuty finding and then orchestrates a different remediation workflow, which is something Michael highlighted earlier in his talk as well.

We can leverage the code samples from this solution and adjust them to better fit your organizational needs.

So let's look at a remediation use case that we implemented for when we get a high fidelity insight on a network threat, putting all the pieces together from detection to auto remediation.

Here in the detection phase, in the security account, we have enabled GuardDuty, which gets findings about any anomalous activity in the network traffic by analyzing the VPC Flow Logs across all your workload accounts. We also have Network Access Analyzer, which, by defining network access scopes across all your workload accounts, gets findings about network paths that allow unintended access to the internet.

These findings can then be ingested into AWS Security Hub, as you can see in step two. Security Hub then consolidates all these findings with other security findings as well and exposes those events to the security tooling account via Amazon EventBridge.

So in EventBridge, we created patterns which we can then use to invoke different orchestration workflows based on the finding type.
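An event pattern of this kind might look like the sketch below, which selects high and critical severity GuardDuty findings imported into Security Hub. The `source` and `detail-type` values and the `ProductName` and `Severity.Label` fields come from the Security Hub finding format; the choice of filter is an assumption, since the talk does not show the actual patterns. The `matches` helper is a deliberately simplified local stand-in for exact-value matching only, not EventBridge's real matching engine.

```python
import json

# Pattern for an EventBridge rule (assumed filter, not the speaker's).
HIGH_SEVERITY_GUARDDUTY_PATTERN = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {
        "findings": {
            "ProductName": ["GuardDuty"],
            "Severity": {"Label": ["HIGH", "CRITICAL"]},
        }
    },
}


def matches(pattern, event):
    """Simplified exact-value matcher for trying a pattern locally."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        candidates = actual if isinstance(actual, list) else [actual]
        if isinstance(expected, dict):
            # Nested pattern: at least one candidate object must match.
            if not any(isinstance(c, dict) and matches(expected, c) for c in candidates):
                return False
        elif not any(c in expected for c in candidates):
            return False
    return True


if __name__ == "__main__":
    print(json.dumps(HIGH_SEVERITY_GUARDDUTY_PATTERN, indent=2))
```

Separate patterns per finding type let each rule target a different workflow, which is the routing idea described above.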

Next, we can move on to the automation phase. Here in the security tooling account, we have the automation engine, which is orchestrated by Step Functions and Lambda. The Step Function ingests the findings from Security Hub and then orchestrates the workflow for remediation.

So we made this component more generic so that we can use the same solution for automating different use cases.
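One way such an engine could be laid out is sketched here as an Amazon States Language definition built in Python. The state names, the `$.remediate` flag, and the three Lambda ARNs are all hypothetical; the talk does not show the actual state machine, so treat this as one plausible shape of an enrich, decide, remediate, notify flow.

```python
import json


def build_state_machine_definition(enrich_arn, remediate_arn, notify_arn):
    """Return an Amazon States Language definition as a dict.

    The ARNs point at Lambda functions that are assumed to exist.
    """
    return {
        "Comment": "Sketch: enrich a finding, then remediate if warranted",
        "StartAt": "Enrich",
        "States": {
            "Enrich": {"Type": "Task", "Resource": enrich_arn, "Next": "ShouldRemediate"},
            "ShouldRemediate": {
                "Type": "Choice",
                "Choices": [
                    # Assumes the enrichment step sets a boolean 'remediate'.
                    {"Variable": "$.remediate", "BooleanEquals": True, "Next": "Remediate"}
                ],
                "Default": "Notify",
            },
            "Remediate": {"Type": "Task", "Resource": remediate_arn, "Next": "Notify"},
            "Notify": {"Type": "Task", "Resource": notify_arn, "End": True},
        },
    }


if __name__ == "__main__":
    definition = build_state_machine_definition(
        "arn:aws:lambda:REGION:111122223333:function:enrich",      # placeholder
        "arn:aws:lambda:REGION:111122223333:function:remediate",   # placeholder
        "arn:aws:lambda:REGION:111122223333:function:notify",      # placeholder
    )
    print(json.dumps(definition, indent=2))
```

Keeping the decision in a Choice state means the same state machine can serve different use cases, with only the Lambda logic changing per finding type.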

So here on step three, you can see the enrichment lambda which correlates the findings from GuardDuty with Network Access Analyzer findings and then it decides whether an orchestration workflow needs to be invoked for remediation.

The enrichment lambda also manages exemptions for handling deviations for any authorized use case of your application. It then updates the insights back into AWS Security Hub with the severity score and then invokes the remediation for the next step of actions.
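The correlation and exemption logic described above can be sketched as a small pure function. Everything here is an assumption for illustration: the flattened input shapes, the field names, and the rule that remediation needs both a high severity GuardDuty finding and a matching Network Access Analyzer finding. The 7.0 threshold reflects GuardDuty's documented boundary for High severity.

```python
HIGH_SEVERITY = 7.0  # GuardDuty rates findings from 7.0 upward as High or Critical


def enrich(gd_finding, naa_findings, exemptions):
    """Decide whether a finding should trigger automated remediation.

    gd_finding:   dict with 'resource_id', 'remote_ip', 'severity' (hypothetical shape)
    naa_findings: list of dicts with 'resource_id' for resources that
                  Network Access Analyzer found to have an internet path
    exemptions:   set of (resource_id, remote_ip) pairs approved by exception
    """
    key = (gd_finding["resource_id"], gd_finding["remote_ip"])
    if key in exemptions:
        return {"remediate": False, "reason": "exempted"}

    corroborated = any(
        f["resource_id"] == gd_finding["resource_id"] for f in naa_findings
    )
    if corroborated and gd_finding["severity"] >= HIGH_SEVERITY:
        return {"remediate": True, "reason": "high severity with confirmed internet path"}
    return {"remediate": False, "reason": "needs manual review"}
```

Checking exemptions first keeps authorized traffic from ever reaching the remediation step, which matters once the action is automatic.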

Now moving on to the remediation phase, here on steps four and five, we have the remediation lambda, which then invokes the corrective actions. So in this case, it updates a firewall rule group with a stateless rule blocking traffic to the external IP that we got from the finding.

So in this case, the Network Firewall is deployed in the centralized pattern, which is easier for policy administration. The remediation lambda then updates the remediation status back into Security Hub and notifies the security operators to review and investigate as the next step.
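The rule group update in this use case could be sketched as follows. The function takes a client object that is assumed to be a boto3 `network-firewall` client; `describe_rule_group` and `update_rule_group` are real operations on that client, but the helper name, the IPv4-only `/32` handling, and the choice to append the rule at the lowest precedence are assumptions, not the speaker's implementation.

```python
def block_ip(nfw_client, rule_group_arn, ip_address):
    """Add a stateless drop rule for ip_address; return False if already blocked.

    nfw_client is assumed to be boto3.client("network-firewall").
    """
    current = nfw_client.describe_rule_group(RuleGroupArn=rule_group_arn)
    rule_group = current["RuleGroup"]
    rules = rule_group["RulesSource"]["StatelessRulesAndCustomActions"]["StatelessRules"]

    cidr = f"{ip_address}/32"  # IPv4 only in this sketch
    for rule in rules:
        destinations = rule["RuleDefinition"]["MatchAttributes"].get("Destinations", [])
        if any(d["AddressDefinition"] == cidr for d in destinations):
            return False  # nothing to do

    # Appended after existing rules; pick a lower number if an earlier
    # rule would otherwise pass this traffic before the drop is reached.
    next_priority = max((r["Priority"] for r in rules), default=0) + 1
    rules.append({
        "Priority": next_priority,
        "RuleDefinition": {
            "MatchAttributes": {
                "Sources": [{"AddressDefinition": "0.0.0.0/0"}],
                "Destinations": [{"AddressDefinition": cidr}],
            },
            "Actions": ["aws:drop"],
        },
    })

    # The update token guards against overwriting a concurrent change.
    nfw_client.update_rule_group(
        UpdateToken=current["UpdateToken"],
        RuleGroupArn=rule_group_arn,
        RuleGroup=rule_group,
    )
    return True
```

Passing the client in, instead of creating it inside the function, keeps the logic testable with a stub and lets the caller decide which account and region the firewall lives in.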

Here's another remediation use case we implemented which is to remediate the exfiltration risk on S3.

So here in the detection phase, in the security account, we have the following.

Step 1: GuardDuty gets all the findings about any anomalous access patterns on S3 and IAM users from all your workload accounts.

We use IAM Access Analyzer to get findings about resources that are shared with an external entity, by defining analyzers with your organization as the zone of trust.

Now all of these findings get ingested into AWS Security Hub which exposes the findings via Amazon EventBridge to the next step of orchestration actions.

So here in the automation phase in the security tooling account, we use the same automation engine that we used in the previous solution.

So here on step three, we have the enrichment lambda which consolidates the events with GuardDuty findings and IAM Access Analyzer findings and then orchestrates the remediation workflow.

Now moving to the remediation phase, we have the remediation lambda on steps four and five. In this case, it updates the S3 bucket with a restrictive bucket policy and the IAM role with deny permissions to block the unauthorized user.
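The talk does not show the restrictive bucket policy itself. As one possible form, the sketch below builds a deny statement that blocks principals outside a given AWS Organization, using the `aws:PrincipalOrgID` and `aws:PrincipalIsAWSService` condition keys. The helper name, the bucket name, and the organization ID are placeholders, and a real remediation would need to merge this with the bucket's existing policy, since applying a bucket policy replaces the whole document.

```python
import json


def build_org_only_bucket_policy(bucket_name, org_id):
    """Return a bucket policy that denies access from outside the organization."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessFromOutsideOrg",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {
                    "StringNotEqualsIfExists": {"aws:PrincipalOrgID": org_id},
                    # Leave calls made by AWS services themselves unaffected.
                    "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute your own bucket and organization ID.
    policy = build_org_only_bucket_policy("example-bucket", "o-exampleorgid")
    print(json.dumps(policy, indent=2))
```

With boto3, the serialized document would be applied through the S3 client's `put_bucket_policy(Bucket=..., Policy=json.dumps(policy))` call.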

As you can see from here, we use the same solution to orchestrate different remediation use cases based on the threat insight that we have got.

Here I want to leave you with a couple of takeaways that we learned from our implementation:

Start with the visibility phase. AWS offers multiple detective controls, so enable them and automate the threat detection metrics. Consolidate all of these alerts in Security Hub for a comprehensive ticketing workflow.
Once we have visibility of the security posture, we can then move to correlation. Assign severity scores to produce high fidelity insights and prioritize them accordingly. By correlating and managing the exemptions, we can reduce false positives and alert fatigue for our security operators.
Automation and feedback loop: automation provided us with streamlined security operations, faster response times, and fewer handoffs between different teams. Establish a constant feedback loop by validating your threat response and remediation using game days and chaos security programs.

So with this, I come to the end of my sharing. Passing on to Mun for some key calls to action. Thank you.
