Detection Engineering for Cloud-Native Security

by Jamie Lewis, Venture Partner

The rapid move to cloud-native architectures is having a profound impact on enterprise security posture and operations. In the world of containers, microservices, and orchestration frameworks, the notions of an “application” running on a “machine” in a persistent “state” are obsolete. The application, or service, is now a distributed system, consisting of multiple components running on a highly variable number of nodes, in a nearly constant state of change. Traditional security controls that rely on machine isolation and a predictable system state are ineffective. Security policies that are blind to service-to-service communications and controls that lack horizontal scalability simply cannot keep pace with today’s microservice applications.

To protect business assets in cloud-native environments, organizations must bring security practices and technologies into architectural alignment with the systems they are meant to protect. Just as DevOps enables continuous development and deployment pipelines, “DevSecOps” must enable continuous security pipelines. That means establishing new methods, capabilities, and instrumentation, ensuring that systems designed to protect cloud-native systems embody these basic characteristics:

  • Extensive, real-time visibility: Partial or after-the-fact visibility will not suffice. Both the infrastructure layer and applications, wherever they are, must be visible.

  • Rapid, iterative feedback loops: Feedback loops allow security measures to adapt continually to rapidly changing environments.

  • An engineering approach to solving security problems: Automation, continuous measurement, and controlled experimentation will be the predominant method of solving security problems across the enterprise, replacing manual analysis and control updates.

Detection engineering is a pivotal aspect of this alignment. Detection engineering uses automation and leverages the cloud-native stack to discover threats and orchestrate responses before they can do significant damage. As part of a move to DevSecOps, detection engineering can improve security posture. Consequently, organizations making the transition to cloud-native architectures should consider how and when to incorporate detection into their security programs.

Detection Engineering

Detection engineering is the continuous process of deploying, tuning, and operating automated infrastructure for finding active threats in systems and orchestrating responses. Indeed, both the terms “detection” and “engineering” carry important connotations when it comes to the new approaches to security we’re discussing.

Detection: The debate over preventative versus detective controls isn’t new. Conventional wisdom holds that the key is finding the right balance between the two given an organization’s risk profile. But most enterprises have invested significantly more in prevention than they have in detection. In fact, much of the cybersecurity industry has focused on the creation, marketing, and sale of prevention technologies and products. But those products are failing. Insider threats, social engineering, zero-day attacks, determined (often state-sponsored) attackers, and many other factors have made an over-reliance on prevention a losing bet. It’s simply smarter, and more effective, for security managers to focus on detection rather than attempting to build impenetrable systems.

Engineering: The goal of any successful alerting system is to separate signal from noise, distilling meaningful and actionable alerts from the collection of event information, moving them up the chain for remediation and response. In a typical security operations center (SOC), analysts process those alerts, determining their severity and whether to escalate them to a higher-level analyst or incident response team. Processing alerts involves compiling contextual data (who, what system, how it’s used, what roles, and so on), filtering according to some or all of that contextual data, comparing events with threat feeds, and assembling a coherent picture of what happened — all before deciding what to do about the alert.

Given the sheer volume of alerts and event logs complex systems can create, this is a staggering task at best. Overwhelmed by mountains of busywork, security programs suffer from analyst burnout and alert fatigue. As analysts become desensitized to the staggering data load, real problems slip through their fingers. As Ryan McGeehan said in a recent post, “when a human being is needed to manually receive an alert, contextualize it, investigate it, and mitigate it . . . it is a declaration of failure.”

Consequently, organizations such as Netflix, Lyft, and Square have started treating threat detection as an engineering problem, using automation to avoid these pitfalls and make security teams more effective. They are also avoiding the silos that separate detection, response, and development teams, following the DevOps mindset when building detection mechanisms and integrating them with response orchestrations.

Detection Engineering Infrastructure

In practice, implementing detection engineering requires an integrated infrastructure that consists of the following components:

  • Data sources
  • Event pipelines
  • The correlation engine
  • Response orchestration
  • Testing and feedback loops

Our definitions and examples of detection engineering infrastructure components rely heavily on the work of Alex Maestretti, formerly the engineering manager of Netflix’s detection and response team, and Ryan McGeehan. We thank them for sharing their insights with us and allowing us to reference them here.

Data Sources

An effective detection system must start with data about the applications and infrastructure that it is protecting. But getting the right (and good) data is not always easy. Some of the key aspects to consider are:

  • Cloud-native visibility: The cloud-native architecture has introduced new layers and components such as containers, orchestrators, service mesh, and others. Collecting data from these systems will likely require new tools and instrumentation.

  • Align instrumentation and log consumption: Instead of creating a central team that deals with logs written by others, security engineers should work directly with DevOps to instrument data collection. The security organization should also hold those engineers accountable for the results. Netflix, for instance, holds the philosophy that the person who logs the data should be the person who consumes that data. This brings a new level of rigor and discipline to the process of creating useful logs.

  • Find the best sources: Different systems support different levels of instrumentation and use different data schemas and models. Security managers need to find and understand the different logs and formats and decide which are relevant to threat detection.

  • Decentralized detection: The traditional detection architecture gathers massive amounts of data into a centralized data lake and runs detection algorithms there. Real-time detection is nearly impossible in that model because the volume of analysis required is enormous. A distributed architecture that performs detection closer to the major sources of data enables much faster detection and higher-quality data collection.

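The decentralized model described above can be sketched as a small host-local filter: instead of shipping every raw event to a central data lake, an agent scores events near the source and forwards only those worth central analysis. The event fields, scores, and threshold below are illustrative assumptions, not part of any particular product.

```python
# Sketch of decentralized detection: score events at the source and
# forward only suspicious ones to the central pipeline. Field names,
# scores, and the threshold are illustrative assumptions.

SENSITIVE_PATHS = ("/etc/shadow", "/root/.ssh")

def score_event(event: dict) -> int:
    """Assign a rough suspicion score to a raw host event."""
    score = 0
    if event.get("syscall") == "execve" and event.get("uid") == 0:
        score += 2  # root launching a new binary
    if any(event.get("path", "").startswith(p) for p in SENSITIVE_PATHS):
        score += 3  # touching sensitive files
    return score

def local_filter(events, threshold=3):
    """Keep only events that clear the host-local threshold; the rest
    never leave the node, keeping the central pipeline fast."""
    return [e for e in events if score_event(e) >= threshold]

forwarded = local_filter([
    {"syscall": "open", "uid": 1000, "path": "/tmp/app.log"},
    {"syscall": "open", "uid": 0, "path": "/etc/shadow"},
    {"syscall": "execve", "uid": 0, "path": "/usr/bin/curl"},
])
print(len(forwarded))  # 1: only the sensitive-file access is forwarded
```

Pushing this first pass to the node trades a little agent complexity for a large reduction in what the central engine must analyze in real time.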
Event Pipelines

Event pipelines gather event data, streamline them via automated mechanisms, and prepare them for further examination before a human ever has to see them. To ensure effective detective controls, security teams must consider these issues when building event pipelines:

  • Stream vs. Batch: Today, many SaaS products and most legacy systems can’t push log and instrumentation data to the security system, making batch processes a necessity. But organizations should instrument services to stream data in real time to event pipelines when practical and possible.

  • Normalization vs. Workflows: Security teams should avoid the difficult work of normalizing event data by creating a pipeline for each data source, basing each pipeline’s workflow on the data in that system. Templates and reusable modules for common data types can streamline these efforts.

  • Alert Frameworks: A rigorous framework for creating events and alerts, and the rules that drive them, is essential. The process should follow the same engineering standards that govern software projects, making alerts subject to peer review and using a version-controlled repository. And the person who writes an alert should be accountable for its results. Palantir’s incident response team posted an excellent write up on such a framework.

  • Enrichment: The low quality of security alerts is a major contributor to alert fatigue. Automation should enrich event data, adding critical information, eliminating manual labor, and improving alert quality. In general, less-expensive enrichments — such as a health and schema check or lookups for specific values, such as the geo-location of an IP address — occur in the event pipeline.

  • Machine Learning: Proven machine learning techniques increase the ability of the system to detect active threats and conditions that warrant further investigation, reducing false positives. Algorithms for feature extraction run on the data, building models of actions, looking for anomalies and conditions, generating model-based events that drive event triggers.

  • Forensic Data Storage: Detection engineering infrastructure should store and archive event data, which can be crucial when it comes to forensic activity.

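A minimal pipeline stage combining two of the ideas above, the health/schema check and a cheap geo-location enrichment, might look like the sketch below. `geoip_lookup` is a stand-in for a real lookup service, and the schema and field names are assumptions made for the example.

```python
# Sketch of an event-pipeline stage: validate the event's schema, then
# attach an inexpensive enrichment (a geo-location lookup). geoip_lookup
# stands in for a real service; the schema and fields are assumptions.

REQUIRED_FIELDS = {"timestamp", "source_ip", "action"}

def geoip_lookup(ip: str) -> str:
    """Hypothetical stand-in for a real geo-IP service."""
    return "internal" if ip.startswith("10.") else "external"

def enrich(event: dict) -> dict:
    # Health/schema check: reject malformed events early rather than
    # letting them pollute downstream correlation.
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"malformed event, missing fields: {missing}")
    # Cheap enrichment belongs here in the pipeline; expensive
    # enrichment waits for the correlation engine.
    enriched = dict(event)
    enriched["geo"] = geoip_lookup(event["source_ip"])
    return enriched

event = enrich({"timestamp": 1584403200, "source_ip": "203.0.113.9",
                "action": "login"})
print(event["geo"])  # "external"
```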
The Correlation Engine

The event pipeline passes only the events warranting further inspection onto the correlation engine, which will ultimately determine automated responses to alerts, including notifying humans. Key factors include:

  • Choice of Platform: The correlation engine is typically a data analytics platform. Commercial detection engineering products such as Capsule8 include their own data analytics engine. Some organizations rely on Splunk while others use Elasticsearch. (Disclosure: Rain Capital has an investment in Capsule8.)

  • Further Enrichment: The correlation engine uses rules and more expensive enrichment to determine whether the system triggers an alert sent to a human or invokes automated response mechanisms. Security teams will need additional tools and services to gather and include data such as the user accounts involved in an event, their security classification, who they work for, and their contact information. Context-specific information can include screenshots of what the user saw (in the case of a phishing attack, for example), how a given operation was launched (manually vs. automatically), what privilege levels were used, and any privilege escalation that occurred. Automated communication mechanisms, such as Slackbots, can reach out to get confirmations of activity or more information from users.

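Put together, a correlation rule might route each enriched event to one of three destinations: an automated response, a human analyst, or nowhere. The rules, thresholds, and field names below are invented for illustration; a real engine would load its rules from a version-controlled repository.

```python
# Sketch of a correlation-engine decision: after further enrichment,
# rules route each event to an automated response, a human analyst,
# or nowhere. Rules and field names are illustrative assumptions.

def route(event: dict) -> str:
    # Expensive enrichment (account ownership, role, contact info)
    # would be gathered here before the rules run.
    if event.get("privilege_escalation") and event.get("geo") == "external":
        return "automated_response"  # high confidence: act immediately
    if event.get("failed_logins", 0) > 10:
        return "human_alert"         # ambiguous: needs analyst judgment
    return "drop"                    # not worth anyone's attention

print(route({"privilege_escalation": True, "geo": "external"}))
print(route({"failed_logins": 25}))
print(route({"failed_logins": 2}))
```

Because every event ends in exactly one of these outcomes, the routing function itself becomes a natural unit for peer review and testing.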
Response Orchestration

Automation should do as much as possible to mitigate events as part of response orchestration, before an alert reaches a human. When it does reach a human, a properly enriched alert should contain a reasonable set of response actions. The right set of response actions will depend, of course, on the application and the type of event. In general, however, these are some of the factors to consider:

  • Automate common fixes: If a common temporary fix involves re-deploying a cluster in Kubernetes, for example, a workflow could re-deploy it automatically, triggered by a rule. Alternatively, an alert could include re-deployment as an option, allowing the human receiving the alert to do so with just the click of a mouse, saving a great deal of time and effort.

  • Build in communication: Effective response and escalation orchestration should include mechanisms that can quickly bring users, developers, and other relevant players into the communication loop. Integration with Slackbots is often a key component of detection engineering. Dropbox recently released Securitybot, a communication mechanism designed specifically for integration with detection engineering systems, under an open source license.

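The re-deployment example above might be wired up roughly as follows: high-confidence fixes run automatically, while everything else reaches a human with the fix attached as a one-click action. `redeploy_workload` and `notify_channel` are placeholders for a real orchestration API and a Slack/Securitybot-style integration, not actual library calls.

```python
# Sketch of response orchestration: common fixes either run
# automatically or ride along with the alert as one-click actions.
# redeploy_workload and notify_channel are placeholders for real
# orchestration and chat-ops integrations.

def redeploy_workload(name: str) -> str:
    """Placeholder for, e.g., a Kubernetes rollout-restart call."""
    return f"redeployed {name}"

def notify_channel(message: str, actions: list) -> dict:
    """Placeholder for a chat-ops notification with attached actions."""
    return {"message": message, "actions": actions}

def respond(alert: dict):
    # High-confidence, low-risk fixes run without a human in the loop.
    if alert["confidence"] == "high" and alert["fix"] == "redeploy":
        return redeploy_workload(alert["workload"])
    # Everything else goes to a human, with the fix one click away.
    return notify_channel(
        f"Suspicious activity on {alert['workload']}",
        actions=[("Redeploy", alert["workload"])],
    )

print(respond({"confidence": "high", "fix": "redeploy",
               "workload": "payments"}))  # redeployed payments
```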
Testing and Feedback Loops

Given a set of response options, the security team should capture feedback on how alerts are working for alert tuning and improvement. These feedback loops should include end-to-end integration testing, as well as red-team and other testing efforts. McGeehan says security teams should “treat detection the same way you’d treat a build pipeline supported by CI/CD platforms like Jenkins.” He recommends simple scenario tests and writing direct attacks on the detection mechanisms. Canary testing is another useful technique for testing changes to controls. More often than not, the team will gain valuable insight into how the system works, driving improvements in both existing and future detection controls.

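Treating detection like a CI-supported build pipeline means the rules themselves get tests. The sketch below replays a synthetic attack through a toy detection rule and asserts that it fires, and that it stays quiet on benign traffic; the rule and events are invented for illustration.

```python
# Sketch of scenario tests for a detection rule: replay synthetic
# attack and benign events through the rule and assert the outcome,
# exactly as a CI job would. Rule and events are invented examples.

def detects_sensitive_read(event: dict) -> bool:
    """Toy rule: flag non-root reads of /etc/shadow."""
    return event.get("path") == "/etc/shadow" and event.get("uid") != 0

def test_fires_on_attack():
    attack = {"path": "/etc/shadow", "uid": 1000}
    assert detects_sensitive_read(attack), "detection failed to fire"

def test_quiet_on_benign():
    benign = {"path": "/var/log/syslog", "uid": 1000}
    assert not detects_sensitive_read(benign), "false positive"

# A CI job or scheduled canary would run these continuously:
test_fires_on_attack()
test_quiet_on_benign()
print("detection tests passed")
```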
Conclusion

As organizations move to cloud-native systems, security must evolve, gaining higher degrees of alignment in terms of the technology stack and DevOps mindset. That means creating continuous security pipelines that accompany the continuous development pipelines inherent to the cloud-native ecosystem. Detection engineering is one way organizations can accomplish that goal.

This post is an excerpt from a paper originally published at raincapital.vc

Originally published at https://www.raincapital.vc on March 17, 2020.

Translated from: https://medium.com/swlh/detection-engineering-for-cloud-native-security-190afdd4558c
