Security analytics and observability with Amazon OpenSearch Service

Hello, everyone, and thank you for joining our session. I hope that you had a good first day and a lot of learning at re:Invent today. We're going to be talking about two important use cases: you're going to learn how OpenSearch can help you with observability and security analytics.

My name is Mohammed Ali, and I have with me my colleague Hager Waif. We are both OpenSearch SAs at AWS, which means we help AWS customers build solutions with OpenSearch and, more importantly, get value out of their machine data. So we're going to be discussing that and how OpenSearch helps you with it.

We're going to show you a couple of reference architectures and a couple of demos, and then we'll give you some resources at the end to get started with these use cases.

So IDC predicts that there will be 129 zettabytes of data produced in 2023. That's 129 billion terabytes, which is staggering. And as we adopt generative AI, machines are going to produce more and more data. In fact, most of the data today is machine generated, and getting value out of this data is challenging.

So just to give you a perspective on what data we are talking about: the applications, services, and APIs we operate all capture their performance and behavioral data in the form of signals like metrics and logs. We have security tools and DevOps tooling monitoring our servers and databases: how they are performing, what volume of data is flowing through the network, who is accessing our applications. And then we have IoT devices that continuously monitor our business and operational environments, react at the edge and, most importantly, send the data back to the base for analysis.

All of this data has immense value and can help you answer many questions. For example, with the data from applications and infrastructure, you can determine whether an application is operating as expected, or troubleshoot it in case of a failure. From our security tooling, we can detect whether there is traffic from a suspicious IP address or someone trying to access our application from an unknown IP, which could trigger a protective action such as isolating that server or starting an investigation.

And with IoT devices, if you were monitoring your factory floors, you would be able to answer whether your employees are working in a safe environment. So as a builder, how do you go about building machine data insight solutions?

This is what a typical solution looks like. You collect the data from the source and put it into a software system that is designed to work with large volumes of machine data, and that gives you the ability to ask questions and get insight, both in the form of signals like alerts and in the form of visual insights that help you understand trends in high-velocity time series data.

One such popular software is OpenSearch. OpenSearch is a fully open source search and analytics suite. It has quickly risen in popularity and is now number four among the popular search engines. We have over 60 partners, and hundreds of new features have been released since its launch. Within this year alone, we have released eight minor versions, each adding to the core functionality or to the different use cases you can build with it. All of that comes under the Apache 2.0 license, for free.

Now, OpenSearch is a distributed system; it runs on many machines. If you are a company that does not want to manage that software and the infrastructure that runs OpenSearch, you can use it as a managed service, and OpenSearch is available as a managed service from many cloud providers. AWS also offers OpenSearch as a managed service, aptly called Amazon OpenSearch Service, which makes it easy for you to run this distributed technology at scale, with ease, with high reliability, and securely within your AWS account.

OpenSearch Service comes out of the box tested and enabled with the plugins that help you build those use cases, like observability, highly scalable vector search, and security analytics. The service also offers integrations with the wider AWS and open source ecosystem to help you build applications quickly. We have features like UltraWarm that allow you to store large volumes of machine data for a longer period of time at a fraction of the cost.

So as a developer, when you're building applications with OpenSearch, this is how you generally go about it. You identify the infrastructure or the application that will generate your machine data; it could be logs, metrics, or any other type of signal. Then you send that data to OpenSearch in JSON format (OpenSearch speaks JSON), and OpenSearch indexes all the fields within that JSON document for you by default. It then gives you the UIs and APIs, the UI being OpenSearch Dashboards, that help you create those insights: it helps you search, analyze, and slice and dice the data. And that is common to all types of use cases.
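To make that concrete, here is a minimal sketch of indexing a single JSON log event through the OpenSearch document API. The domain endpoint, the app-logs index name, and the field names are illustrative assumptions, not values from the demo.

```bash
# Minimal sketch: index one JSON log event into a hypothetical "app-logs" index.
# The endpoint, index name, and field names below are illustrative assumptions.
curl -XPOST "https://search-my-domain.us-east-1.es.amazonaws.com/app-logs/_doc" \
  -H 'Content-Type: application/json' \
  -d '{
        "timestamp": "2023-11-27T10:15:00Z",
        "service": "payment",
        "level": "ERROR",
        "message": "checkout operation failed"
      }'
```

Every field in that document is indexed by default, so it is immediately searchable through the APIs or from OpenSearch Dashboards.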

Observability is a very popular use case with OpenSearch, so let's dive into some of the features that OpenSearch offers to make it easy for you to build observability platforms. But before that, let me define what an observability platform means. Observability platforms help you collect and analyze data from your applications and associated infrastructure so that you can understand their internal state and be alerted in real time if there is a problem, and they then give you tools to troubleshoot the applications quickly. This helps in two ways. First, your developers spend less time troubleshooting and debugging an application and more time building the applications and user experiences that matter to the business. Second, it helps you minimize the outage window or downtime, because downtime has consequences: it costs real money. Do you know that for an average Fortune 1000 company, unplanned downtime costs exceed a billion dollars? And then there are significant reputational ramifications, which are not measurable.

Now, speaking of measuring: how do you know that you have a strong observability strategy in place, and how do you measure your success with it? These are the two particular metrics that customers use to measure how they are helping their company reduce downtime. One is mean time to detect, and the other is mean time to repair. Reducing both directly impacts the cost associated with downtime: the faster you detect and the faster you repair, the lower the impact of the downtime on your business. And this applies to both application downtime and security breaches.

So a strong observability strategy for both application and security data enables you to capture and improve on these metrics. How do observability platforms deliver on these promises? They do so by collecting and processing the signals from your applications and security tooling. As the industry has matured, there has been a general consensus on which signals make a good foundation. Capturing metrics gets you alerted if there is a problem in your application, say your user experience is degraded because of a higher error rate from one of your microservices, and that can notify you before your users start complaining about it.

Then come traces. Traces track the end-to-end request processing lifecycle and help you pinpoint the component that is causing the failure or degradation of functionality. And once you know through traces which component is causing the failure, logs surface the events that help you perform the troubleshooting and identify the root cause. Correlating these three types of signals is critical and will help you minimize the downtime, or the impact of the downtime.

So essentially, what we are saying is that this platform helps you reduce the time to detect and the time to investigate and remediate, so you can build applications that are more available, reliable, and performant. Regardless of which technology you use to observe your applications, the reference architecture remains the same.

We have applications that produce these signals, and we need a mechanism to collect the signals from those applications and pass them on to what is generally referred to as a buffering layer, which is basically a layer where you collect the data, read and write it in bulk, and write it to the analytics engine, which in our case is OpenSearch. From there, you start building rules to create alerts and detect problems, and building visualizations to perform the analysis.

If you were building a similar architecture in AWS, your signal producers could be applications running in any compute environment, like EC2 or EKS, or even third-party tools. Generally, people use agent-based collection with collectors, which can be open source or AWS native. Many of you operate polyglot architectures, which means different languages are involved, so it becomes a challenge to standardize the way you collect these signals. This is where OpenTelemetry comes into the picture. It is a very popular observability framework that offers vendor-agnostic SDKs and libraries to instrument applications. It supports many languages and collects log, trace, and metric data. It is fast becoming a de facto standard, which is why monitoring and observability vendors are offering compatibility.

Using such a common standard and tooling allows you to build an architecture where you can replace the backend observability engine without impacting the large landscape of applications you have already instrumented. From the collection layer, you send these signals, as I said, to a buffering layer. For most log analytics use cases we have seen, S3 is sufficient as the buffer for collecting the logs. But if you have a requirement for low latency, then you can use some sort of streaming-based buffering, and that's where Amazon OpenSearch Ingestion comes into the picture.

OpenSearch Ingestion can read from S3, and you can also point your OpenTelemetry Collector directly at it. It processes, parses, and enriches the data and writes it to OpenSearch. For those who are not familiar with OpenSearch Ingestion: it is a fully managed service that delivers real-time log, trace, and metric data to Amazon OpenSearch Service domains (the clusters) or OpenSearch Serverless collections. It is essentially a managed service for OpenSearch Data Prepper. If you're not familiar with Data Prepper, it is a community-driven open source project under the umbrella of the OpenSearch project. It allows you to filter, transform, and enrich the data and deliver it to OpenSearch clusters.
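As a rough sketch of the "point your OpenTelemetry Collector directly at it" path, the exporter side of a Collector configuration could look like the snippet below. The ingestion endpoint URL, the Region, and the use of the sigv4auth extension from the Collector contrib distribution for request signing are assumptions for illustration, not the exact demo setup.

```yaml
# Illustrative OpenTelemetry Collector snippet; endpoint, region, and names are placeholders.
extensions:
  sigv4auth:
    region: "us-east-1"
    service: "osis"          # assumed signing service name for OpenSearch Ingestion

receivers:
  otlp:
    protocols:
      grpc:

exporters:
  otlphttp/traces:
    traces_endpoint: "https://my-pipeline-abc123.us-east-1.osis.amazonaws.com/v1/traces"
    auth:
      authenticator: sigv4auth

service:
  extensions: [sigv4auth]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/traces]
```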

When you use OpenSearch Ingestion, it offers AWS PrivateLink support, which means the data remains within your secure network, in your VPC. Under the hood, Data Prepper works on the basis of what are called pipelines. A pipeline has a source, processors, and sinks. The source is where the pipeline reads or receives the data from, let's say an OpenTelemetry Collector. It passes the data on to processors, which filter, enrich, and convert the data into a format that can be loaded into OpenSearch. Then you have the sink, where OpenSearch and Amazon S3 are supported sinks.

A pipeline is defined in YAML format, with sections for the source, processors, and sink. On the screen you see a pipeline where the source is HTTP; this creates an HTTP endpoint with that resource path, collects the data, and passes it on to the processor. In the processor, we are asking the Ingestion service to use the common Apache log parsing pattern to parse the data and create the fields in the JSON document. And in the sink, we are telling it to write to a specific domain: this is where we provide the endpoint for the domain and the index name the data should be written to.
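A minimal sketch of that kind of pipeline definition is shown below, assuming a hypothetical resource path, domain endpoint, IAM role, and index name rather than the exact values from the slide.

```yaml
# Illustrative OpenSearch Ingestion / Data Prepper pipeline; all endpoints, ARNs,
# and names are placeholders.
version: "2"
apache-log-pipeline:
  source:
    http:
      path: "/logs/ingest"                 # hypothetical resource path
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]    # common Apache access-log pattern
  sink:
    - opensearch:
        hosts: [ "https://search-my-domain.us-east-1.es.amazonaws.com" ]
        index: "apache-access-logs"
        aws:
          region: "us-east-1"
          sts_role_arn: "arn:aws:iam::123456789012:role/osi-pipeline-role"
```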

Now, many of you operate applications in AWS. You may be using AWS services or common open source applications like Apache web servers, so we have created a set of blueprints that you can use as starting templates for collecting these common signals. You basically select a template, we write that YAML for you, and then you just plug in the specific details of your endpoints and your servers.

Finally, the analytics engine in our architecture is OpenSearch, and OpenSearch Dashboards is the visualization tool, the user interface that comes bundled with OpenSearch Service. The important thing to note here is that the pipelines work with any type of signal: log data, metric data, or trace data, from applications or security tools. Getting your data into OpenSearch follows this same pattern, so it works for both security and application use cases.

Now, on the screen there's a QR code for a solution that some of the AWS SAs have put together to create a centralized logging solution for you. You could write metrics to OpenSearch and use OpenSearch for that; however, we believe you should use a purpose-built tool for each job, which is why Amazon Managed Service for Prometheus comes into the picture. It is a managed Prometheus service designed to store large volumes of metric data, and Grafana is the dashboarding tool that works very closely with Prometheus. In fact, open source OpenSearch Dashboards can also surface and query data from Prometheus. So this completes our architecture, and in fact this is exactly the architecture you will see in our demo today.

So let's jump into the demo. For this demo, I have prepared an ecommerce application that runs in an EKS cluster. We have six containers representing different microservices, instrumented with OpenTelemetry. We collect traces and metrics through OpenTelemetry and logs through Fluent Bit, and pass them on to the OpenTelemetry Collector. The metrics are written to Amazon Managed Service for Prometheus, and the logs and traces are forwarded to OpenSearch Ingestion, which writes the data to OpenSearch Service. We'll use OpenSearch Dashboards for investigation, and Grafana and OpenSearch Dashboards for alerting in our use case.

Now, my application is a simple shopping cart application: you select products and go through the whole checkout process. For this application, I have planted a bug, so when we interact with it there will be errors. Let's set up our observability dashboard first. For that, I log into OpenSearch Dashboards, go to Applications, and compose an application. What does that mean? It means we tell OpenSearch which services from the trace data make up my application, so I can hand it over to a single team that can monitor it or use this dashboard to observe the application. We also define which log source is relevant for this particular application.

So here we provide the log source and select the services. This data has already been collected from OpenTelemetry and is sitting in OpenSearch, which is how it knows the names of the services. As soon as we select all the services, it shows us how they interact in the application composition map. So our application composition is ready.

We create that, and then we go over to Amazon Managed Grafana to show you the dashboards built on the metrics. Here I have created a dashboard that monitors the CPU, memory, and disk utilization of the containers running in EKS, and most importantly there is a panel for this demo monitoring the payment and order services. I have created an alerting rule here on the CPU utilization metrics. The rule says: if the CPU utilization goes above a certain level and stays there for 32 seconds, alert me in my SRE Slack channel.
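The expression behind such a rule might look roughly like the PromQL sketch below. The metric is the standard cAdvisor CPU counter, but the namespace and pod labels, the two-minute window, and the 0.8 threshold are assumptions; the "stays there for 32 seconds" part would be handled by the alert rule's pending period in Grafana.

```promql
# Hypothetical alert expression: average CPU usage rate of the payment pods
# over the last two minutes, compared against an assumed threshold.
avg(rate(container_cpu_usage_seconds_total{namespace="ecommerce", pod=~"payment.*"}[2m])) > 0.8
```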

That's one place where we can monitor: the metrics. The other piece is the log signals. For logs, we have the alerting plugin in OpenSearch that allows you to create monitors. Here we create a monitor that looks at the sample application log data and checks whether there are more than a certain number of error log lines in the last two minutes; if so, it sends a message to the SRE channel, the same channel set up for the SRE team. Here we provide all the details and the notification is configured.
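For reference, a monitor like that can be defined through the Alerting plugin API. The sketch below is only an approximation: the index pattern, the level field, the ten-error threshold, and the schedule are assumptions, and the exact monitor schema varies a bit between plugin versions.

```bash
# Rough sketch of a query-level alerting monitor; index pattern, field names,
# threshold, and schedule are illustrative assumptions.
curl -XPOST "https://search-my-domain.us-east-1.es.amazonaws.com/_plugins/_alerting/monitors" \
  -H 'Content-Type: application/json' -d '
{
  "type": "monitor",
  "name": "app-error-log-monitor",
  "enabled": true,
  "schedule": { "period": { "interval": 1, "unit": "MINUTES" } },
  "inputs": [{
    "search": {
      "indices": [ "app-logs*" ],
      "query": {
        "size": 0,
        "query": {
          "bool": {
            "filter": [
              { "range": { "@timestamp": { "gte": "now-2m" } } },
              { "term":  { "level": "ERROR" } }
            ]
          }
        }
      }
    }
  }],
  "triggers": [{
    "name": "too-many-errors",
    "severity": "1",
    "condition": {
      "script": { "source": "ctx.results[0].hits.total.value > 10", "lang": "painless" }
    },
    "actions": []
  }]
}'
```

The actions list would then point at the notification channel for the SRE Slack channel.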

So I'll generate the traffic, which starts producing those errors because there is a bug in the application, and then we get an alert; this one is from OpenSearch Dashboards. For this demo I'll just keep pushing traffic so the CPU actually goes above that level, and then Grafana also sends us an alert. Here we see the CPU go higher than the threshold, shown as a yellow dotted line; once it stays above it for 32 seconds, we get an alert in the Slack channel.

Now, within the alert, a good practice is to include the link to the dashboard where the team can start investigating. So my team clicks this link and jumps straight into the Observability dashboard we set up. Here we see all the traces and all the functionality, like checkout and pay order, in the application composition map. I can see that payment is causing a lot of errors, but in the trace groups we see checkout at 100% error, so we'll drill down into it and see how the client checkout trace group has a problem.

We drill down to individual user requests, which are the traces. We can click any of these traces, and it shows us how the services call each other. You can see the whole stack of services, and you can also see where the errors are coming from, so we have a good idea of where the error originates, which is the payment service. You can view it in different ways and go into the details of each trace.

Now let's correlate it with the logs. We'll query it here in the language called PPL. PPL allows you to query data easily from OpenSearch using a piped processing format. Here we put in the trace ID and surface the event in the logs that potentially tells us there was a problem. To investigate further, I want to see the events that happened before and after this one. As you can see, you can view what happened after and what happened before this event, and if there was anything of interest that caused the problem, we can find it there.
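A PPL query of that shape, issued here through the PPL REST endpoint, could look like the sketch below; the index name, the trace ID value, and the field names (traceId, serviceName, body, time) are assumptions rather than the exact ones from the demo.

```bash
# Hypothetical PPL query: pull the log events that carry one trace ID.
# Index name, trace ID, and field names are illustrative assumptions.
curl -XPOST "https://search-my-domain.us-east-1.es.amazonaws.com/_plugins/_ppl" \
  -H "Content-Type: application/json" \
  -d "{\"query\": \"source=otel-logs | where traceId = '7df3e4a1c0ffee00' | fields time, serviceName, body | sort time\"}"
```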

Now, the error itself has a lot of detail: it says that payment_service.py, line 77, has an error called "checkout operation failed". This is how we troubleshoot and get to the root cause. Because the bug was planted by me by changing a certain variable in the code, I am able to fix the problem right here. The other important thing is that when you fix a bug, you need to see the feedback on whether it has actually resolved the problem.

So as soon as I rebuild and deploy my payment service again and send some traffic, the error rate should start going down. We generate the traffic and look at the dashboard again. You can see 87.5%, which is lower than 100%, and I'll continue to generate traffic for the error rate to keep dropping. Now let's look at the last few minutes of data and see if this is all resolved. I'll just check the last 10 minutes at that moment, and this tells me that everything is fixed. Looking at the error rate again, there is no dot showing errors there anymore. So we were able to detect, investigate, and resolve the problem, and see the feedback from it.

Now, that was the out-of-the-box user experience that comes with observability data. You may be wondering what is required to visualize your custom logs, or the logs of common services from AWS. We have an open source project called OpenSearch Integrations, which is a collection of common visualization assets for popular sources. So with one click of a button, if you are using the web application firewall, it can create the dashboard for you.

The good thing is that because these are assets, you can create a comprehensive dashboard that looks at all the layers of your application. It works for both application logs and security logs. However, for security logs you might want to do a bit more. So for that, I'm going to call up my colleague Hager, who is going to take you through the features we offer in open source for security analytics.

Thank you, Ali. Hi everyone. So far you have learned how to analyze logs, traces, and metrics using the observability features in OpenSearch Service. I want to focus on one particular pillar of observability, which is log analytics, and more specifically security log analytics. In this section, we'll go through why this should matter to you and the different strategies you can leverage with OpenSearch Service to build your security log analytics solutions.

Currently, we are witnessing an increase in the frequency of security issues and attacks, and we are also witnessing an increase in their sophistication and complexity. This is due to multiple factors, but one of them is that the systems themselves are getting more complex in nature, and so are the security issues; they follow the state of the systems. Another reason why security issues are getting harder to detect and investigate is that they are now powered by artificial intelligence. AI is ubiquitous; it is integrated into all domains of our lives, from self-driving cars to healthcare diagnosis systems, which is pushing governments to build new legislation to make these self-driving cars safer and to protect your private data.

This means that your teams, your DevOps teams, teams across the organization, should be aware of this and understand what is coming in terms of legislation in order to keep up with it. Also, when we talk about securing your system, that also means you are securing the access of your customers and making your application trustworthy. Customers are now more aware of privacy issues and concerns, so when you protect your system, you are actually protecting your business.

This journey in cybersecurity raises many challenges for your teams, not only the security teams but also the DevOps and developer teams, who should be aware of the current security landscape and able to handle the challenges that come with it. First, your organization and your teams should be able to collect and store the high volume of data that we are generating on a daily basis. They should also be able to integrate newly arriving data from different sources: security log data comes from the cloud, from on-premises applications, from third-party providers, and so on. So you should be able to empower your teams to easily integrate these new sources of data and analyze them.

All these applications are churning out logs in different formats, or different languages if you want to call it that. This challenge leads your teams to create hundreds of queries to analyze this log data and correlate it to understand the context of a security issue, or, even in observability, the context of a system issue. You should also be able to leverage the community's knowledge. This is very important: we need to grow all together, so leverage as much as possible the open standards that are available in the community. The cybersecurity community is very active, and we will discuss in a few slides the common standards you should be aware of to handle these challenges. This will also help lower the barrier of entry for your teams, whether DevOps engineers, developers, or security teams. With that in mind, I would like to share a few strategies for handling these challenges.

As the first strategy, I would like to introduce the security analytics plugin in Amazon OpenSearch Service. It is built into the managed OpenSearch service and offers features and tools to detect, investigate, and resolve issues, just as for observability. Ali earlier mentioned the two metrics you need to focus on: MTTD, mean time to detect, and MTTR, mean time to resolve. But there is another metric you should be able to reduce over time for security analytics use cases, which is MTTC, the mean time to complete: basically the time your team spends to complete an investigation and resolve a security issue.

The security analytics plugin also offers a couple of features, and I will show more of them later. The first one is that you can use, out of the box, more than 10 log types. This means the security analytics plugin automatically offers mappings for DNS logs, for example, or system logs, Windows logs, and so on. And you can use, out of the box, more than 2,200 Sigma rules. Sigma rules are based on an open source project called Sigma, and they give you wide coverage of the threats happening across systems and industries in the cybersecurity community.
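To give a feel for the format, below is a minimal Sigma-style rule sketch. It is illustrative only, not one of the bundled rules; the firewall log source category, the field names, and the internal CIDR ranges are assumptions based on Sigma's generic network taxonomy.

```yaml
# Illustrative Sigma rule sketch (not a bundled rule; fields and ranges are assumptions)
title: RDP Service Reachable From the Public Internet
status: experimental
description: Flags RDP (TCP 3389) connections whose source is outside the private address ranges.
logsource:
  category: firewall
detection:
  selection:
    dst_port: 3389
  filter_internal:
    src_ip|cidr:
      - '10.0.0.0/8'
      - '172.16.0.0/12'
      - '192.168.0.0/16'
  condition: selection and not filter_internal
level: high
```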

Sigma simplifies the process of creating security rules, and it also helps you share the security rules you build in your organization with other organizations and vice versa: you take from the community and you give back to the community. The second standard I would like to share with you today is OCSF, the Open Cybersecurity Schema Framework. It is an open standard as well, and it provides you with standards for log formatting.

We talked about DNS logs, system logs, and so on. OCSF gives you a way to map the data, a common formatting for the data. How does that work? OCSF organizes log data into categories. You have OCSF categories like, for example, Network Activity, Findings, and Identity and Access Management, and under each category you have OCSF classes: for example, an RDP Activity class and an SSH Activity class under the Network Activity category. We have over 145 partners working on the project, including, of course, AWS, and also partners like Splunk and Cloudflare, among others.

Coming back to the features available in the security analytics plugin: first, you can create a detector. The detector is the mechanism through which you set up which rules you would like to apply and which threats you would like to analyze in your system. You can also leverage, out of the box, the MITRE ATT&CK knowledge base. This is a global knowledge base of adversary tactics in cybersecurity; it gives you an understanding of the different threats and also of how to mitigate them. I will show you in the demo later how to use MITRE ATT&CK in your investigations.

Then you can use the Sigma rules, the more than 2,200 Sigma rules I mentioned, or you can build your own rules in the security analytics plugin. You can leverage the out-of-the-box log types, and we also recently added support for building custom log types. Within the same plugin you can create alerts: what frequency of alerts you would like to receive, and through which channel, whether email, SMS notification, linking it to PagerDuty, for example, or Chime, Slack, and so on. Whenever there is a threat, or let's say a potential issue in your system, that is called a finding.

When a detector completes an execution, it generates findings for you; we will show you in the demo how that works. Once the findings are generated, the plugin can give you a visualization of the relationships, or correlations, between the different findings in the system. This is very important in security analytics investigations, because you might have network security logs coming in, but you should be able to correlate that network security data with your application security logs as well.

Now, what does the detection workflow look like? First, of course, you need to identify and index the log data coming from your security tools, systems, network, and so on. You then build the detector, in which you define the security rules you would like to use, configure the alerting mechanism, and then use the findings to detect and, of course, investigate the security issues. This is what the correlation engine looks like.

On this screen you see OpenSearch Dashboards, and here there is a graph of one finding and its correlations with three other findings. The red color means the severity of that finding is high; the green color means the severity is low. However, pay attention: it might still be a risk you should be aware of in your system, and probably worth checking as well.

On the right-hand side, you see the correlation cards. In the correlation cards you have information about each finding: which detector generated the finding, the rules, the log type sources, and so on.

With no further ado, let's go to the demo. Here, let's imagine you are a DevOps engineer or a security engineer. You received an alert via email, for example, and you connect to OpenSearch Dashboards to investigate that issue.

First, you go to OpenSearch Dashboards and, under OpenSearch Plugins, select the Security Analytics plugin. Here you have different tabs: Overview, Findings (the findings that have been generated in the system), Alerts, Detectors, and Correlations. For our investigation, we stick with the Overview.

Here I can see that we have a total of 97 findings generated over a couple of days, with a total of four active alerts, within this histogram. You can also change the group-by to display the number of findings by log type; so here we change the group-by to log type, and then go back to the original one. You can hover over the chart and see the number of alerts per day and the number of findings. Under the histogram, you can see more details about the findings.

So here are the recent alerts with the alert severity, and then the recent findings generated in your system, with the rule name that triggered each finding, the severity, and the name of the detector.

Now, this graph is important. It shows, for example, that we have a high percentage of findings here, 68% of them, related to a publicly accessible RDP server. So this raises a flag for you: hey, there are a lot of findings generated here, we need to check what is happening in the system. And then you have the list of detectors that are available in your system as well.

Back to the investigation. We move to the Findings, where we need to investigate the publicly accessible RDP service and understand its context, understand what is happening through the logs received from the different applications.

Here I can see all the findings and the details related to them: the time the finding was triggered, the finding ID, the rule name, the threat detector that generated the finding, the log types, the rule severity, and of course the actions.

So we check the publicly accessible RDP server finding. We can see the rule severity is high, so we definitely need to check this out. For that, I go under Actions and view the details of this issue.

Here in the finding details you see the rule details. The rule is called "Publicly Accessible RDP Services". I have a description, but also tags about that specific attack, and this is where we leverage the MITRE ATT&CK knowledge base.

I can get more details in OpenSearch Dashboards about this rule: who created it, the author, the source. And here I click on the tag, and it redirects me to the public MITRE ATT&CK website, where I can understand this kind of issue and the possible mitigations I can apply. Once I have more information about this rule, I know how I can resolve the issue.

I go back to OpenSearch Dashboards. Here we can also get more information about the false positive cases, the rule status, and so on. These details are important in your investigation so that you are aware of outliers and the like.

Now, in an investigation it's also important to understand what happened before and after that event, and this is where we check the surrounding documents. Here OpenSearch Dashboards shows me five events that happened prior to that event, so it is probably worth checking whether there is anything unusual there.

Now, under the finding details I can also see correlations; here I can see 12 of them. Let's jump into that. I don't need to visit each one, so I will jump directly to the correlation graph to get a visual understanding of the system status.

Here we have a graph that has been generated for us, and if I scroll to the finding in the middle, this is our finding: the publicly accessible RDP service. Let's check the correlation with the highest severity: we have "sign-in failure, bad password threshold". This means, first of all, that you have a publicly accessible RDP service that is open to the public. Second, you have someone who is trying to access your application and is triggering this wrong-password issue. So you definitely need to fix the publicly accessible RDP service, but you also need to check who is trying to access your application and fix that issue at the same time.

Here you can also get an understanding of the other rules that have been triggered and the other potential risks that are correlated with this specific issue. And speaking of the cards: in the cards here you can see all the details about all the findings that are correlated with the publicly accessible RDP services. So I went through that quickly.

So now you understand how to use the security analytics plugin to detect, investigate, and resolve issues. Let's take a use case where your security team needs to index a new type of data coming from an on-premises environment, for example, and you don't want to overwhelm the engineering team, or wait for them, to build the ingestion pipeline. This happens all the time, right?

So how do we make it easier for your security teams, SecOps teams, and so on, to be less dependent on the engineering team? This is where I would like to introduce Amazon Security Lake. If you are not familiar with the service, it is a managed service that we released at re:Invent last year, and it automatically centralizes security log data into a purpose-built data lake based on Amazon S3. It has adopted the OCSF format, the standard we discussed earlier.

It automatically transforms the log data coming from AWS, from other cloud providers, from on-premises, and from third-party tools into OCSF format, and stores it in Parquet format in S3 for better performance. On top of that, you can use any analytics solution you like.

In this use case, we use Amazon OpenSearch Service to read the Security Lake data and analyze it, but you could also use Amazon Athena, for example, or Amazon SageMaker to train your models and understand the trends in your security data.

So how does Security Lake integrate into the pipeline? We bring back the reference architecture that Ali shared with you before: the same pipeline for observability applies to security use cases, because security log analytics is part of observability as a whole.

Within this pipeline, we focus on the collection and buffering stages, and this is where we replace them with Amazon Security Lake. Security Lake automatically receives the logs from your sources, transforms them into OCSF and Parquet, and stores them in S3 for you.

And since we are using Amazon OpenSearch Service, it uses a data access pattern, or subscription model, with Security Lake. In this use case, Security Lake automatically creates an Amazon SQS queue for you, which sends a notification to OpenSearch Ingestion whenever there is new data available in the Security Lake.

OpenSearch Ingestion receives the notification, reads the data from Amazon S3, and stores it in OpenSearch Service, where you can use OpenSearch Dashboards with the security analytics plugin to build your investigation use case, or simply to analyze the data, use PPL, and build dashboards.

Now, this is what the pipeline looks like. Let's move to the transformation. The good news for your team and your organization is that OpenSearch Ingestion offers out-of-the-box blueprints to read from Security Lake and index into OpenSearch, and this is how that pipeline looks.

We'll go through it part by part. The first part is the source. You just need to update the SQS queue URL that Security Lake automatically created in your account, update the AWS Region, and update the STS role ARN with the ingestion role. The ingestion role should have permissions that allow OpenSearch Ingestion to read the notifications, delete the notifications, and also read from Amazon S3.

The second part is the processor. The default processing here is as simple as putting the OCSF class name into lower case, replacing the white spaces, and dropping some entries coming from the S3 events. But you can, of course, customize the processing based on your use case requirements.

Now moving to the sink. Within the sink, you just update the host that will receive the log data; here you put the OpenSearch endpoint. You also need to specify the pipeline role that allows OpenSearch Ingestion to write to OpenSearch Service, and then you specify the index that will receive this data.

In the blueprint, we follow a naming pattern where we put the OCSF category, the OCSF class, and the ingestion date. You can follow the pattern we recommend, or, if you have other requirements in your organization, you can of course update and change the blueprint.
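Put together, a Security Lake pipeline of that shape can be sketched roughly as below. The queue URL, role ARNs, domain endpoint, and index pattern are placeholders, and the processor list is abridged, so treat it as the general shape rather than the exact blueprint.

```yaml
# Abridged sketch of a Security Lake to OpenSearch ingestion pipeline; all URLs,
# ARNs, and index names are placeholders, and the real blueprint has more options.
version: "2"
security-lake-pipeline:
  source:
    s3:
      notification_type: "sqs"
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/AmazonSecurityLake-example-queue"
      codec:
        parquet: {}
      compression: "none"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/security-lake-ingestion-role"
  processor:
    - lowercase_string:
        with_keys: [ "class_name" ]        # normalize the OCSF class name
    - substitute_string:
        entries:
          - source: "class_name"
            from: " "                      # replace white spaces in the class name
            to: "_"
  sink:
    - opensearch:
        hosts: [ "https://search-my-domain.us-east-1.es.amazonaws.com" ]
        index: "ocsf-${/category_name}-${/class_name}-%{yyyy.MM.dd}"
        aws:
          region: "us-east-1"
          sts_role_arn: "arn:aws:iam::123456789012:role/osi-pipeline-role"
```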

Looking at the different strategies we shared with you, whether for observability or security analytics use cases, there are several benefits to consider, and these are also what you should look at when you build your own architecture for observability and security analytics.

First, the architectures and solutions we shared are extendable, in the sense that OpenSearch Service, as you've seen, can integrate for observability with Prometheus and Grafana and with security analytics solutions, and you can also integrate with Amazon Security Lake, which connects through OpenSearch Ingestion.

So you need to think in terms of a solution, not just a single service, and understand how you can extend that solution to handle your use case and its evolution. The observability and security analytics features will also help your team analyze real-time data, and historical data if needed.

And it offers you the features and the tooling to empower your team to quickly analyze any issue, to quickly investigate and resolve it, to reduce downtime, and to reduce the security impact on your business.

And in terms of cost, which is of course one of the main best practices, or pillars, you should follow when building a new architecture: think about how you can build an optimized architecture while reducing cost, and here OpenSearch Service offers different options.

One of them is to leverage storage tiering: hot, UltraWarm, and cold, so you can store more data for a longer time at lower cost.
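One way to automate that tiering is with an Index State Management policy on the domain. The sketch below is hypothetical: the policy name, index pattern, and the 7-day, 30-day, and 365-day ages are assumptions, and the final delete step is optional.

```bash
# Hypothetical ISM policy: move indexes to UltraWarm after 7 days, to cold storage
# after 30 days, and delete them after a year. Names and ages are assumptions.
curl -XPUT "https://search-my-domain.us-east-1.es.amazonaws.com/_plugins/_ism/policies/log-tiering-policy" \
  -H 'Content-Type: application/json' -d '
{
  "policy": {
    "description": "Tier log indexes across hot, UltraWarm, and cold storage",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [ { "state_name": "warm", "conditions": { "min_index_age": "7d" } } ]
      },
      {
        "name": "warm",
        "actions": [ { "warm_migration": {} } ],
        "transitions": [ { "state_name": "cold", "conditions": { "min_index_age": "30d" } } ]
      },
      {
        "name": "cold",
        "actions": [ { "cold_migration": { "timestamp_field": "@timestamp" } } ],
        "transitions": [ { "state_name": "delete", "conditions": { "min_index_age": "365d" } } ]
      },
      {
        "name": "delete",
        "actions": [ { "cold_delete": {} } ],
        "transitions": []
      }
    ],
    "ism_template": { "index_patterns": [ "app-logs-*" ] }
  }
}'
```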

There are three main takeaways I would like to share with you tonight. First, think about centralizing your observability and security data whenever possible; this helps you build the context around your issues and understand the health of your system overall, whether for application observability or security analytics. Second, leverage the open standards; it is important to work as a community.

Whether you learn from the community, for example by using the Sigma rules for security analytics, or you give back and share with it, this will help your team ramp up quickly on the current landscape of security analytics and observability.

And third, think about using managed services. In this case, you leverage OpenSearch Service for observability and security analytics to let your team focus on what they should be focusing on, rather than spending their time on infrastructure management, software updates, patches, and so on.

These are some resources we'd like to share with you tonight. If you're not familiar with OpenSearch Service, the first QR code will take you to the official documentation. You can also have a look at the best practices, the reference architectures, and the new releases on the OpenSearch blog.

The last QR code is the centralized logging with OpenSearch solution, which will help your organization quickly get started with log analytics use cases.

Thank you everyone.

Big thanks from me.

Big thanks to Ali, who will take it from here.

Thank you, everyone. And for the data heroes out there, this QR code points to other sessions you can attend to learn more about AWS data.

Thank you for attending our session. There is a survey available in your mobile application, so please use it to provide us with feedback and let us know how we can improve. We hope you enjoyed our session, and have a good rest of the evening.
