"Ok. Have you ever found yourself having to follow some user activity? Have you had to correlate it, looking in one place, then in the secure logs somewhere, then at the CloudTrail logs? Did you find that a challenge? Did you wish you had more time to just do better things and have everything in the same place?
My name is Anya De Velda. Today I'm gonna tell you how to centralize user activity from external sources using AWS CloudTrail Lake.
So first, let's just think about the problem that we're trying to solve. You might have an access key that you've discovered has been used by somebody who's no longer in the business, or some resource configuration has changed and it's no longer compliant, and you're wondering what happened there. Or a database server has disappeared from UAT, and you're wondering: What has happened? How did this happen? When did it happen? Was it intentional? Was it automated? Why am I no longer compliant? Where did it happen? And why did it happen? So that's the problem.
Well, what's the challenge? The challenge is that even if you look in your CloudTrail in one account, there's lots and lots of activity coming in from everywhere, from the console, from the CLI. But you don't just have one account, right? You have multiple accounts, so that activity is in lots of accounts. Imagine having to look in multiple accounts to find it. And of course, you'll have some data in CloudTrail, but there'll also be some data, say, in Config that will give you more context about what's changed on the resource. Then you might be using Okta to log on, so there'll be some activity somewhere else, in a different place. Or again, it could be on a hybrid instance somewhere. It's really, really hard for humans to correlate all that info, especially when you're in a hurry and you want to get the answers quickly. And of course, that's all while maintaining immutability, right? Because if somebody can delete that data, then you might never find it. So that's the problem and the challenge.
Here are some examples of what this activity may look like. CloudTrail shows you the actions taken by users, roles, and AWS services in the console, using the CLI, et cetera. So here's an example of a CloudTrail event. It shows you who: it was a user called Mary, from this source IP address, trying to do something. When did they try to do it? That's the event time. What were they trying to do? They were creating a user called Paolo. How did they do it? They did it on a Mac. And what was the outcome? A user called Paolo was created. So that's just an example of some of the activity in CloudTrail.
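As a rough illustration of the who/when/what/how/outcome reading described above, here is a minimal Python sketch. The field names follow the general shape of a CloudTrail record, but the IP address, timestamp, and user agent are placeholders rather than values from the talk, and `summarize` is a hypothetical helper.

```python
# Illustrative CloudTrail-style record. The IP address, timestamp and user
# agent are placeholders, not values shown in the talk.
sample_event = {
    "eventTime": "2023-01-01T00:00:00Z",
    "eventSource": "iam.amazonaws.com",
    "eventName": "CreateUser",
    "sourceIPAddress": "203.0.113.10",
    "userAgent": "example-agent (Macintosh)",
    "userIdentity": {"type": "IAMUser", "userName": "Mary"},
    "requestParameters": {"userName": "Paolo"},
    "responseElements": {"user": {"userName": "Paolo"}},
}


def summarize(event: dict) -> dict:
    """Map a CloudTrail-style record onto the questions asked in the talk."""
    return {
        "who": event["userIdentity"].get("userName"),
        "from": event.get("sourceIPAddress"),
        "when": event.get("eventTime"),
        "what": event.get("eventName"),
        "target": event.get("requestParameters"),
        "how": event.get("userAgent"),
        # An errorCode, when present, means the call failed; otherwise the
        # response elements describe what was created or changed.
        "outcome": event.get("errorCode") or event.get("responseElements"),
    }
```

Calling `summarize(sample_event)` gives one small dictionary per event, which is roughly the view you want when scanning many events in a hurry.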
What does activity look like in AWS Config? This is an example of somebody setting encryption settings on an S3 bucket. And then there's activity somewhere else. This is just a screenshot of the /var/log/secure file from a Linux server, and somebody created a user called Anya demo. I wonder who that was.
Ok. So if the problem is trying to centralize all the data, and the challenge is that the data could be everywhere, what's the solution? I'm a solutions architect, so the answer is always "it depends", right? It depends on your use case. But if these are the problems you're trying to address, most likely it's AWS CloudTrail Lake. If you need to have control over the underlying storage, then maybe that's an anti-pattern, but otherwise CloudTrail Lake will help you with that challenge.
So just in case you haven't heard of CloudTrail Lake, it's a managed data lake, and it allows you to capture the data, aggregate it, visualize it and analyze it. It's a turnkey solution, so with a few clicks of a button you can start collecting the audit events. You don't have to do any ETL, the data is ready for you to query, and it's of course immutable: once the event is ingested, you can't change it. The QR code on the screen tells you more about a recent announcement on the way that you can store the data, depending on your use cases. So check it out if that's something that you're interested in.
So with CloudTrail Lake, you can ingest data from AWS sources. You can ingest data from CloudTrail, and you can do that for your whole organization; from AWS Config, again for your whole organization; and you might see some data from Audit Manager there. You can also store data from third-party ISV sources, and I'll show you some examples of the ones that you can ingest data from. And then you can also ingest other data from on-premises or other hybrid applications. From there, you can audit, visualize and query that data.
So there are two types of integrations that you can have in CloudTrail Lake. The first type is the direct integrations, the partner integrations. This is the list of some of the partner integrations that you can have. You configure the CloudTrail Lake channel you want to send the data to, and the partner will put the data in the right format and send it into CloudTrail Lake for you. But for the sake of today, in this quick lightning talk, I'm gonna show you the custom integrations, let's just say from a Linux server, but it could be database activity or just some other integration.
So this is where you have to run an application, put the data into the format that CloudTrail Lake expects, and then use the PutAuditEvents API to send the data over to CloudTrail Lake. Again, this is just a foundational example of how to send external data into CloudTrail Lake. I've got some QR codes at the end that show you how to do that more at scale. But just for the sake of today's demo, it's really simple. I've got a remote VM that's running somewhere. It has a secure log file, and I'm just gonna run a script that will use the PutAuditEvents API to push this data into CloudTrail Lake.
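The custom-integration flow described here (put the data into the expected format, then call PutAuditEvents) could look roughly like the following Python sketch. It is not the script from the demo. The helper names, the `"1.0"` version string, the `"remote-vm.secure-log"` event source, and the `"LinuxUser"` identity type are all illustrative assumptions; `client` is expected to be a boto3 `cloudtrail-data` client.

```python
import json
import uuid
from datetime import datetime, timezone


def build_event_data(event_name, user, host, account_id, raw_line):
    """Build one event in the general shape CloudTrail Lake integrations expect.

    The version, eventSource and userIdentity.type values are assumptions
    chosen for illustration; pick values that make sense for your source.
    """
    return {
        "version": "1.0",
        "UID": str(uuid.uuid4()),
        "userIdentity": {"type": "LinuxUser", "principalId": user},
        "eventSource": "remote-vm.secure-log",
        "eventName": event_name,
        # A real script would normally parse the timestamp from the log line;
        # the current UTC time is used here to keep the sketch short.
        "eventTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "recipientAccountId": account_id,
        "additionalEventData": {"host": host, "rawLine": raw_line},
    }


def send_event(client, channel_arn, event_data):
    """Send a single event through PutAuditEvents and return the raw response."""
    return client.put_audit_events(
        channelArn=channel_arn,
        # eventData is passed as a JSON string, not as a nested object.
        auditEvents=[{"id": event_data["UID"], "eventData": json.dumps(event_data)}],
    )
```

With real credentials, `client` would come from something like `boto3.client("cloudtrail-data", region_name=...)`. The response contains `successful` and `failed` lists, so it is worth checking `failed` rather than assuming every event was accepted.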
Once the data is in CloudTrail Lake, we're gonna query the data. Again, this is just foundational, just to show you how this works. So let me switch over to the demo.
Ok. So I am now in the CloudTrail Lake console and I'm gonna head over to event data stores here. And of course, my account's decided to log out, quite handy. Just bear with me. Let me just log back in and let me refresh this. Okey dokey. So I'm in the CloudTrail Lake console. I've got my event data stores. If I wanted to create one, I would click on this button, I would give it a name, and I would select how long I want to store it for. And then from there, I would select what type of events I want to push into my event data store in CloudTrail Lake.
So as you can see, I can push AWS events here from CloudTrail, different types of events, including for my organization. I can also push some CloudTrail Insights events or some AWS Config events, but I want events from integrations. And again, the choices that I have here are those partner integrations I could select. But instead I'm gonna go to my custom integration, and then from there it's literally next and finish. So that's how simple it is to create an event data store.
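The console steps above (create an event data store for integration events, then attach a custom channel) have API equivalents. The sketch below is one way to script them, assuming `cloudtrail` is a boto3 `cloudtrail` client; the function names, the selector name, and the 90-day retention default are hypothetical choices, not values from the demo.

```python
def integration_selectors(name="ExternalActivity"):
    """Advanced event selectors that pick up events from integrations."""
    return [
        {
            "Name": name,
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["ActivityAuditLog"]}
            ],
        }
    ]


def create_custom_integration(cloudtrail, store_name, channel_name, retention_days=90):
    """Create an event data store plus a custom channel that feeds it.

    Returns (event_data_store_arn, channel_arn). The channel ARN is the value
    a sending script later passes to PutAuditEvents.
    """
    store = cloudtrail.create_event_data_store(
        Name=store_name,
        RetentionPeriod=retention_days,
        AdvancedEventSelectors=integration_selectors(),
    )
    channel = cloudtrail.create_channel(
        Name=channel_name,
        Source="Custom",
        Destinations=[
            {"Type": "EVENT_DATA_STORE", "Location": store["EventDataStoreArn"]}
        ],
    )
    return store["EventDataStoreArn"], channel["ChannelArn"]
```

Keeping the selectors in their own function makes it easy to inspect or reuse them without touching AWS.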
But as I already have one that I created earlier, I'm just gonna expand that for you. The relevant data that I need here is this channel ARN. That's what I'm concerned about, because I'm gonna need it later so that we can push the data to CloudTrail Lake.
So let's connect to this remote VM. Ultimately, it's just an EC2 instance that's running in another account, but it could be a hybrid instance somewhere. I'm just gonna connect to it remotely; it doesn't run in this account. So I'm just gonna SSH to it, and I'm gonna clear this so you can see it better. Ok. And let's just add a user. It's like a spelling test, isn't it, a live demo. Ok.
So let's just have a look at this entry in the log file. Actually, let's not use less, because there's gonna be a lot. Let's just tail it. There you go. I'm just going to tail the last 15 lines, and somewhere in there you should see this user-add event for "reinvent", and just the copy of the event. What I would normally do is have the secure log file rotating and have a crontab job running on a schedule to send the data off to CloudTrail Lake. But since we're in a demo, I'm just gonna pause that log rotation quickly, and I'm just gonna quickly run a script, and then I'll explain to you on a different screen what this script does. So the script will just run, and it just loops through some lines.
So what is the script actually doing? It's just a simple Python script. Again, this is just to show you the principles of what we're doing here. But just some of the key things: we initialize the profile so that we can actually send the data into CloudTrail Lake. I've hard-coded the region in there. I'm specifying a file path, so where I want the script to pick up my log files from, and there's that channel ARN here. This is where I'm specifying which CloudTrail Lake channel to send my log data to. Some more things here: I've hard-coded the instance ID as remote VM one, and then from there I start actually reading the events from the lines. So there's that looping through 100 lines. I'm grabbing the data from my secure log file, I'm building it out, and this is where I actually put the data into the correct format that CloudTrail Lake is expecting. Once I've built out that data, I can then send it over to CloudTrail Lake. So this is me doing so: the response equals the CloudTrail put audit events call. So it's putting that data over to CloudTrail Lake. And hopefully, when we look over at CloudTrail Lake, we'll be able to see that data.
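The loop described above (take the last 100 lines of the secure log, turn each into an event, send them) might be structured like this. This is a sketch under stated assumptions, not the demo script: `last_lines`, `batches`, and `send_lines` are hypothetical helpers, `to_event_data` is a caller-supplied function that returns an event dictionary with a `UID` key, and `client` is assumed to be a boto3 `cloudtrail-data` client.

```python
import json
from collections import deque
from itertools import islice

MAX_EVENTS_PER_CALL = 100  # PutAuditEvents accepts at most 100 events per request


def last_lines(lines, n=100):
    """Return the final n lines of any iterable of lines, such as an open file."""
    return [line.rstrip("\n") for line in deque(lines, maxlen=n)]


def batches(items, size=MAX_EVENTS_PER_CALL):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def send_lines(client, channel_arn, lines, to_event_data):
    """Convert each log line to an event and send in batches.

    Returns the list of failed entries reported by the service.
    """
    failed = []
    for chunk in batches(lines):
        events = [to_event_data(line) for line in chunk]
        response = client.put_audit_events(
            channelArn=channel_arn,
            auditEvents=[
                {"id": event["UID"], "eventData": json.dumps(event)}
                for event in events
            ],
        )
        failed.extend(response.get("failed", []))
    return failed
```

A typical call would open the log with `open(path, errors="replace")`, pass the file object to `last_lines`, and hand the result to `send_lines`. Accepting any iterable keeps the tail logic easy to test without touching the file system.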
And just a note, in case you're wondering: I'm using CodeWhisperer. I actually come from an operational background, so I'm not a fan of writing code, and I can use CodeWhisperer to help me write some of this. So let's just say I wanted to send files to S3. In theory, if my laptop continues to behave today, that should actually... there you go. It's gonna give me information on how to do it, which is quite cool, because then I don't have to do that myself. So that's just a quick bit of additional information on how to do that. Okey dokey.
So let's go back to the console. The data's been sent in, so let's just have a look in CloudTrail Lake at my data. If I head over here, I've got some queries that I've prewritten, so you don't have to watch me type them. There's my event data stores that I have, and I'm using the demo event data store. If I was to just run this first query here, which is just a select from the event data store, yes, I will get some data, but it will just be a lot of data, and it might not be quite what I'm looking for just yet. So there are the query results. It's pulling some data, and for me it's 25 items, but I wanna look for something maybe more informative or specific.
So let's search for anything that's coming in from today. It is the 28th today, right? So let's change that to the 28th and let's have a look at what's coming through from today. So again, that query should run. It is the 28th today, right? I hope so. Bear with me. There we go. Ok. So there's all the data that came in from today. Obviously, if you were collecting this data from multiple instances, you would have that data already in there straight away. But let's look for some data with the actual account. I think I called it "reinvent". There we go.
So again, this query should now return. Oh, I don't remember what I called that user, but let's just have a look, it should be 10. Ok? Obviously bad spelling, but you get the idea: I'm able to actually search for part of the trace here. Ok? You get the idea, it should return what I'm looking for.
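The three queries run in the demo (select everything, filter to today, search for part of a value) can also be issued programmatically. The sketch below builds the SQL text and polls for results, assuming `cloudtrail` is a boto3 `cloudtrail` client. The default column name `eventData.eventTime` is an assumption about how integration events are exposed in the event data store, so verify it in the query editor; `build_query` and `run_query` are hypothetical helpers.

```python
import time


def quote(value):
    """Quote a string literal for SQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_query(store_id, since=None, like=None, time_column="eventData.eventTime"):
    """Build a SELECT over one event data store.

    since: timestamp string such as '2024-01-28 00:00:00' (exclusive lower bound).
    like:  (column, substring) pair for a partial-match search.
    """
    clauses = []
    if since:
        clauses.append(f"{time_column} > {quote(since)}")
    if like:
        column, text = like
        clauses.append(f"{column} LIKE {quote('%' + text + '%')}")
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return f"SELECT * FROM {store_id}{where}"


def run_query(cloudtrail, statement, poll_seconds=1.0, sleep=time.sleep):
    """Start a query, wait for it to finish, and return the first page of rows."""
    query_id = cloudtrail.start_query(QueryStatement=statement)["QueryId"]
    while True:
        result = cloudtrail.get_query_results(QueryId=query_id)
        status = result["QueryStatus"]
        if status == "FINISHED":
            return result.get("QueryResultRows", [])
        if status in ("FAILED", "CANCELLED", "TIMED_OUT"):
            raise RuntimeError(f"query {query_id} ended with status {status}")
        sleep(poll_seconds)
```

Here `store_id` is the event data store ID, the last segment of its ARN. Only the first page of results is returned; a fuller version would follow `NextToken`.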
But what else is really handy once I've got this data centralized? For example, say you come across a specific IP entry. You're looking at some of the activity in your secure logs, and you're noticing an IP address coming through, and you're thinking there's something not quite right going on here. What are they doing? It looks like they're trying all these different accounts, they're trying to connect. What's happening on this remote VM? Rather than having to go to a different console or a different product, you can switch over to your other event data store, which in this case is this one here from CloudTrail, and you can search for the IP address from the same console, from the same place. What else did they do in the console? What other activities did they perform? Again, everything is all in one place here, so you're able to get to the data much quicker. And you can see that they were doing some queries and some other bits and pieces, but you've got that information all in one place. So that's just a really quick whistle-stop tour of how to send the data from an external source, in this case a Linux server, and have it all in one place so you can query it all together using CloudTrail Lake.
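The pivot described above, taking one suspicious IP address and asking every event data store about it, can be sketched as a small query generator. The column names are assumptions: CloudTrail management events expose `sourceIPAddress`, while a custom integration store only has a source IP column if the sending script populated one, so the caller supplies the column per store. `ip_pivot_queries` is a hypothetical helper.

```python
import ipaddress


def ip_pivot_queries(ip, stores):
    """Build one query per event data store for the same source IP address.

    stores maps an event data store ID to the name of its source-IP column.
    """
    # Validate and normalize the address; raises ValueError on malformed input,
    # which also keeps arbitrary text out of the SQL string.
    addr = str(ipaddress.ip_address(ip))
    return {
        store_id: f"SELECT * FROM {store_id} WHERE {column} = '{addr}'"
        for store_id, column in stores.items()
    }
```

Each resulting statement can then be run against its store in the query editor or through the API, giving the "what else did this address do" view across sources.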
Again, you wouldn't be running one single script on an EC2 instance. You'd probably do this more centrally. So there's a couple of QR codes here. The first QR code is a blog post from my colleagues, and basically they do this, but more at scale. They created Systems Manager Automation documents that collect the data from the EC2 instances in a central place and then push it to CloudTrail Lake. So that's one of the ways that you could do this. You could also use a Lambda function to pull the data in, or you could put your scripts and your schedule into the user data of the EC2 instance to push the data across that way. So those are some of the ways to get that external data into CloudTrail Lake.
The second QR code is just in case you haven't worked with CloudTrail Lake before. We have a repository of sample queries that you can get started with, so you don't have to write them yourself. So that's another QR code to check out.
If you haven't come to speak to us at our Cloud Operations kiosk, please come and see us. If you want to talk about CloudTrail Lake or anything else cloud ops related, come and say hello. There's also swag. And please don't forget to fill out the session survey. Thank you so much for coming to see me today."