Why Are Access Control Lists So Hard?

weixin_26755331

Original article: https://medium.com/@eneiford/why-are-access-control-lists-so-hard-27ebeb186c4f

We’ve all been there. There’s a critical delivery, and application teams are in a panic, requesting resources at the last minute to meet a deadline they had months to plan for, yet somehow they are still scrambling. Project managers are whining to anyone in a position of power about how the infrastructure teams are standing in the way of meeting deadlines. Then the inevitable happens: a developer is unable to access a particular resource, and immediately the cry rises around the office. You read the indignation in the Slack messages, you hear the exasperation on the Zoom calls: someone failed to open up the ACL.

“How could this happen? I told you to copy the access from my other host!”

The subtext is clear. How could you be so incompetent? Why are Access Control Lists always wrong? Can’t you do your job?!

And they are correct. We are terrible people. As network engineers, we do enjoy the power we hold over an organization. No engineering team is so reviled, yet so critical to the function of an organization (well, maybe security). No team has to understand so thoroughly how every aspect of the business operates, while at the same time having no fucking idea what anyone else is doing.

I need to understand how your application behaves, I need to understand why your application does what it does, but only at a very macro level. I need to know how your processes interact with other processes, I could give two fucks what the result of those processes are (unless of course they need to interact with a third process).

I digress. To understand why we suck at ACLs, we need to understand why ACLs are shit to begin with.

Problem #1:

ACLs are largely stored in device configurations: routers, firewalls, load balancers. Most of these devices have configuration files best suited to very specific functions. Need to route traffic around your infrastructure? Great: put in a router, establish your peers, set your route policies, and steer that traffic, baby. What that thing DOES is pretty clear. You can issue concise commands and get direct information about the operation of that device. Add policy enforcement to the mix and whoa there, Bessy, you done fucked up.

Firewall configurations within the data center can (and will) quickly grow to hundreds of thousands of lines.

What if you have applications with tight latency requirements, and someone at some point said, OK great, let’s just use our switches to apply the access filters in TCAM at line speed? Oh fuck you. Now you done fucked up twice. That one is hammering a square peg into a round hole. Switches are not designed to manage massive ACLs; they often have restrictions on the amount of grouping they can accommodate (and trust me, you need to be able to implement a sane object-grouping strategy; more on this to come). Once a config starts to become unwieldy, the problems in managing your access policies are just going to snowball.
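
To make the TCAM point concrete: on many switch platforms, a rule written against object groups is expanded into one hardware entry per source, destination, and service combination before it is programmed. The sketch below assumes that full expansion (some platforms compress or optimize entries, so treat the count as an upper-bound illustration) and counts what a single grouped rule would consume. The group contents are the hypothetical ones from the LDAP example later in this post; `expand_rule` is an illustrative helper, not a vendor tool.

```python
from itertools import product

# Hypothetical group contents, mirroring the LDAP example in this post.
ldap_server_sources = ["1.1.1.1", "1.1.1.2", "1.1.1.3", "1.1.2.1"]
ldap_servers = ["2.2.2.1", "2.2.2.2"]
services = [("tcp", 389), ("tcp", 636)]


def expand_rule(sources, destinations, svcs):
    """Expand one grouped rule into flat (src, dst, proto, port) entries.

    Assumes the platform fully expands object groups into individual
    entries; real hardware may compress them.
    """
    return [
        (src, dst, proto, port)
        for src, dst, (proto, port) in product(sources, destinations, svcs)
    ]


entries = expand_rule(ldap_server_sources, ldap_servers, services)
print(f"1 ACL line -> {len(entries)} hardware entries")  # 4 * 2 * 2 = 16
```

Under this assumption, every host added to a source group costs |destinations| × |services| more entries, which is how a policy that looks small in the config can chew through TCAM quickly.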

Through all my years of education, never once was I sitting in class when a professor paused and said, “That is how an Access Control List functions; now let’s talk about how to manage access control policies long-term. Because if you don’t, hoo-boy, at some point in your career, everyone is going to hate you.”

Problem #2:

Responsibility. Application owners are responsible for making the application work. They have a shit job too, so I don’t blame them on this one. The firewall rules that permit application communication are part of the application. Yet I have never joined an organization as a network engineer where the application teams actually owned those rules. I’ve heard mythical stories of organizations that have a security team responsible for administering ACLs, but in practice, it’s always been neteng’s responsibility. Security often gets to wield their giant random ban-hammer over whether a particular ACL can be implemented, but they are always one step removed from both the application teams and the network team. Having a security review and approval process exacerbates a bad ACL strategy. Being human, they are just as shitty at their jobs as the rest of us. They will arbitrarily approve or deny ACLs depending on their mood, or on whether their coffee was burnt. They carry their own air of superiority that comes with being the gatekeepers of pretty much everything anyone else is allowed to do (excluding their own team, of course).

Problem #3:

Risk aversion. Everyone is goddamn petrified of deleting firewall rules. You can check all the fucking NetFlow exports, packet captures, hit counters, whatever the fuck you want, to prove to the application team, your boss, whoever, that an ACL is not being used. As soon as someone somewhere removes an ACL line and something breaks, suddenly no one will let you remove another firewall rule, ever, for any reason. So firewall configurations go stale. Often they are confusing as shit as well: obfuscated object groups with meaningless names, nested object groups that act like Russian dolls, and deny and permit rules spattered throughout the policy, making visual inspection of the configuration impossible. Compound that with generations of engineers adding rules with their own idiosyncrasies and bad habits, and you have a recipe for disaster. You change a firewall 1000 times with no impact? Good for you, way to do your damn job. You fuck up once and miss a dependency or pull the wrong line, and that’s all anyone will remember. They don’t care how shitty and tedious managing ACLs is. Toughen up, buttercup, don’t break my shit.
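
The evidence-gathering step can at least be made systematic. A minimal sketch, assuming you have already exported per-rule hit counts and last-hit timestamps into a simple list (the `RuleStats` structure, rule IDs, and the 90-day window are hypothetical choices, not any vendor's export format): it reports rules with no recorded hits inside the window as removal candidates. It produces a review list, not a delete list; a quarterly or annual job can sit idle longer than any window you pick, and a zero means little if counters were recently cleared.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class RuleStats:
    """Hypothetical per-rule record built from exported hit counters."""
    rule_id: str
    hit_count: int
    last_hit: Optional[datetime]  # None if the rule has never matched


def removal_candidates(rules: List[RuleStats], now: datetime, window_days: int = 90) -> List[str]:
    """Return IDs of rules with no hits inside the lookback window.

    A rule qualifies if it has never matched, or if its last hit is
    older than `window_days` before `now`.
    """
    cutoff = now - timedelta(days=window_days)
    candidates = []
    for rule in rules:
        if rule.hit_count == 0 or rule.last_hit is None or rule.last_hit < cutoff:
            candidates.append(rule.rule_id)
    return candidates


now = datetime(2021, 1, 1)
stats = [
    RuleStats("permit-ldap", 51234, datetime(2020, 12, 31)),
    RuleStats("permit-legacy-ftp", 0, None),
    RuleStats("permit-old-batch", 12, datetime(2020, 6, 1)),
]
print(removal_candidates(stats, now))  # ['permit-legacy-ftp', 'permit-old-batch']
```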

OK, so how do we fix it?

I’m glad you asked. Because it’s possible, and it’s actually not hard. It takes some buy-in from the organization, and there will be a period of pain on the way to a sane policy. If the existing rule set is already out of control, it often turns into a years-long effort. Installing a new hardware platform is a great time to start a greenfield deployment of the new firewall ACL strategy. Let the legacy rule base wither and die, and every time someone complains, steer them to the better way.

1. Understand that your firewall is going to manage access for lots of things for a very long time.

Physical network infrastructure often has expected lifecycles of 7+ years. Figure out a way to organize the chaos.

Develop an organizational strategy that works and STICK TO IT, even if there is some pushback. Exceptions are the enemy here. If you ever want to get to the point where you are automating this rule set, think like a Mandalorian: “This is the way.”

Here are some general guidelines I have found effective:

Create object groups that have meaningful names. Don’t use change ticket numbers or other referential information that becomes impossible to decipher later. If you look at large organizations, how many applications do you actually support… hundreds, thousands? In the scheme of things, the number of applications is not that large. I find that naming object groups after the application they support works best.
Name your object groups after the service they DIRECTLY support. Say application X (appX) consumes LDAP as part of its service. Even if LDAP currently supports only appX, name the LDAP group for the LDAP service it is protecting, not for the appX it supports by proxy. As soon as application Y (appY) comes along and also needs to talk to LDAP, you’ll understand why it was a good idea. Always think of an object group in relation to the simplest element that the ACLs referencing it will be protecting.
Use object-group nesting effectively. Nests that are too deep are a nightmare to maintain, while no nesting at all limits your ability to manage your ACL rule set effectively. I find that a 2-level nesting strategy works for 95% of situations. I never nest beyond a third level.
Level 1 object group: contains like things (e.g. appX web servers). This object group should contain no nested objects, and should contain every like entity the group represents. The objects within the group should be interchangeable in function within the scope of the object group.
Level 2 object group: an access group that contains only level 1 object groups. Using the example above, you would have the following structure:

Object Groups:
  appX_app_servers  (level 1 object group)
    1.1.1.1
    1.1.1.2
    1.1.1.3
  appY_app_servers  (level 1 object group)
    1.1.2.1
  LDAP_servers  (level 1 object group)
    2.2.2.1
    2.2.2.2
  LDAP_server_sources  (level 2 object group)
    appX_app_servers
    appY_app_servers

ACL:
  LDAP_server_sources > LDAP_servers TCP389, TCP636 Permit

Even if a level 1 object group only contains a single host, go ahead and follow the standard. Create the object group and then add that object group into any level 2 object group that will grant the access it requires.
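
The two-level convention is simple enough to check mechanically. A minimal sketch, assuming groups are held as a plain dictionary (a hypothetical in-memory model, not any vendor's API): level 1 groups may contain only addresses, level 2 groups may contain only level 1 groups, and a crude check rejects names that look like change tickets (the `CHG`/`INC`/`RITM` prefixes are assumed examples). It then flattens `LDAP_server_sources` into the hosts the ACL actually covers.

```python
import ipaddress
import re

# Hypothetical model of the example above: group name -> (level, members).
GROUPS = {
    "appX_app_servers": (1, ["1.1.1.1", "1.1.1.2", "1.1.1.3"]),
    "appY_app_servers": (1, ["1.1.2.1"]),
    "LDAP_servers": (1, ["2.2.2.1", "2.2.2.2"]),
    "LDAP_server_sources": (2, ["appX_app_servers", "appY_app_servers"]),
}

# Assumed examples of ticket-style names that should never be group names.
TICKET_NAME = re.compile(r"^(CHG|INC|RITM)\d+", re.IGNORECASE)


def is_address(value):
    """True if value parses as an IP address or prefix."""
    try:
        ipaddress.ip_network(value, strict=False)
        return True
    except ValueError:
        return False


def validate(groups):
    """Return a list of convention violations (empty means compliant)."""
    problems = []
    for name, (level, members) in groups.items():
        if TICKET_NAME.match(name):
            problems.append(f"{name}: looks like a ticket number, not a service")
        for member in members:
            if level == 1 and not is_address(member):
                problems.append(f"{name}: level 1 may only hold addresses, got {member}")
            if level == 2 and groups.get(member, (None,))[0] != 1:
                problems.append(f"{name}: level 2 may only hold level 1 groups, got {member}")
    return problems


def flatten(groups, name):
    """Expand a level 1 or level 2 group into its addresses."""
    level, members = groups[name]
    if level == 1:
        return list(members)
    return [addr for child in members for addr in groups[child][1]]


print(validate(GROUPS))  # []
print(flatten(GROUPS, "LDAP_server_sources"))
# ['1.1.1.1', '1.1.1.2', '1.1.1.3', '1.1.2.1']
```

This sketch enforces a strict two levels; if you allow the occasional third level as described above, the level 2 check would need to accept level 2 members as well.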

2. Work with the application teams to ensure they understand it is in their best interest to “own” the object groups that represent their service.

Application teams often already have to create network diagrams of their application to use as a reference. These are almost never detailed enough to include firewall object groups. If you can get the firewall object-group names that represent the service nodes into a network diagram that the application team owns, you have won half the battle. Security will sign off on the design, and everyone is on the same page.

When the requests start coming in asking to “add 1.1.1.4 to appX_app_servers”, suddenly there is no ambiguity. This makes for a much better relationship with your application teams than getting a request that says “copy access from 1.1.1.2 for host 1.1.1.4”.

Which one of those conversations sounds easier to automate?
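
The structured request maps almost directly onto a config change. A minimal sketch, assuming a Cisco ASA-style target, where adding a host to a network object group is `object-group network <name>` followed by `network-object host <ip>`. The request format and the `request_to_asa_config` function are hypothetical; a real pipeline would wrap this in approval, validation, and deployment steps.

```python
import ipaddress
import re

# Hypothetical request format: "add <ip> to <object-group>"
REQUEST = re.compile(r"^add\s+(\S+)\s+to\s+(\w+)$", re.IGNORECASE)


def request_to_asa_config(request, known_groups):
    """Turn 'add <ip> to <group>' into ASA-style object-group lines."""
    match = REQUEST.match(request.strip())
    if not match:
        raise ValueError(f"unrecognized request: {request!r}")
    host, group = match.groups()
    ipaddress.ip_address(host)  # raises ValueError if not a valid IP
    if group not in known_groups:
        raise ValueError(f"unknown object group: {group}")
    return [
        f"object-group network {group}",
        f" network-object host {host}",
    ]


groups = {"appX_app_servers", "appY_app_servers", "LDAP_servers"}
for line in request_to_asa_config("add 1.1.1.4 to appX_app_servers", groups):
    print(line)
# object-group network appX_app_servers
#  network-object host 1.1.1.4
```

The “copy access from 1.1.1.2 for host 1.1.1.4” request does not fit this pattern and raises `ValueError`, which is the point: turning it into config requires a human to reverse-engineer what “access” means.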

If you can get security sign-off on the firewall groups that represent an application during the POC and pre-prod deployments, there is never a question of whether access will be permitted in the future. The access has already been permitted. Is 1.1.1.4 an appX_app_server? Yes? Well, then it automatically inherits all the access policies of the other appX_app_servers just by being placed into the object group that represents those servers.
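
The inheritance falls out of evaluating rules against group membership instead of individual IPs. A minimal sketch, assuming a toy in-memory policy (the `level1`, `level2`, and `rules` structures are hypothetical, built from the example groups above): once 1.1.1.4 is added to `appX_app_servers`, it is permitted to LDAP without any rule being touched.

```python
# Hypothetical toy policy: level 1 groups hold IPs, level 2 groups hold level 1 groups.
level1 = {
    "appX_app_servers": {"1.1.1.1", "1.1.1.2", "1.1.1.3"},
    "appY_app_servers": {"1.1.2.1"},
    "LDAP_servers": {"2.2.2.1", "2.2.2.2"},
}
level2 = {"LDAP_server_sources": ["appX_app_servers", "appY_app_servers"]}

# Each rule: (source group, destination group, set of (protocol, port)).
rules = [("LDAP_server_sources", "LDAP_servers", {("tcp", 389), ("tcp", 636)})]


def members(group):
    """Resolve a level 1 or level 2 group name to a set of IPs."""
    if group in level1:
        return level1[group]
    return set().union(*(level1[child] for child in level2[group]))


def permitted(src, dst, proto, port):
    """Explicit permit only; anything not matched is implicitly denied."""
    return any(
        src in members(s) and dst in members(d) and (proto, port) in svc
        for s, d, svc in rules
    )


print(permitted("1.1.1.4", "2.2.2.1", "tcp", 636))  # False
level1["appX_app_servers"].add("1.1.1.4")  # the only change requested
print(permitted("1.1.1.4", "2.2.2.1", "tcp", 636))  # True
```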

3. Deny by default, permit by exception

The default access policy for security enforcement should be deny (edge ACLs are a different beast for a different conversation). You should fight tooth and nail to keep every line of your ACL a permit. There are situations where stacking a narrow permit over a broad deny can accomplish an ask without much thought, but it sets a bad precedent. If at all possible, stick to this rule. You can almost always accomplish the same access by permitting only the exact traffic, without having to resort to inserting a deny in the middle of your rule base. A strict but simple access policy will be much easier to automate down the road. Explicit permit, implicit deny all. That way, the only thing in your ACL is the traffic you want to know about, not the traffic you want to exclude.
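
A toy evaluator shows why the permit-only style is easier to reason about. This is a sketch, assuming top-down, first-match semantics with an implicit deny at the end (typical of firewall ACLs, but check your platform); the rule tuples and addresses are hypothetical. With only permits, line order never changes a decision; once a deny is stacked into the middle, swapping two lines silently flips the result for the same flow.

```python
import ipaddress

# Rule: (action, source network, destination network, TCP port). Hypothetical values.
PERMIT_ONLY = [
    ("permit", "10.0.0.5/32", "2.2.2.1/32", 389),
    ("permit", "10.0.1.0/24", "2.2.2.1/32", 389),
]

STACKED = [
    ("permit", "10.0.0.5/32", "2.2.2.1/32", 389),  # narrow permit...
    ("deny", "10.0.0.0/24", "2.2.2.1/32", 389),    # ...stacked over a broad deny...
    ("permit", "0.0.0.0/0", "2.2.2.1/32", 389),    # ...that only matters because of this line
]


def evaluate(rules, src, dst, port):
    """Top-down, first match wins; no match falls through to the implicit deny."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for action, src_net, dst_net, rule_port in rules:
        if (s in ipaddress.ip_network(src_net)
                and d in ipaddress.ip_network(dst_net)
                and port == rule_port):
            return action
    return "deny"  # implicit deny all


flow = ("10.0.0.5", "2.2.2.1", 389)

# Permit-only: reordering lines never changes the decision.
print(evaluate(PERMIT_ONLY, *flow), evaluate(PERMIT_ONLY[::-1], *flow))  # permit permit

# Stacked: swap the first two lines and the same flow is now denied.
swapped = [STACKED[1], STACKED[0], STACKED[2]]
print(evaluate(STACKED, *flow), evaluate(swapped, *flow))  # permit deny
```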

The flip side of this coin is negated access. Don’t do this either. Some firewalls allow you to use negation (NOT).

e.g. NOT appY_servers > LDAP_servers TCP389 / TCP636 Permit

Check Point firewalls specifically come to mind, but I am sure there are others. This is just as bad as a deny buried in your rule base. You want to define good traffic, as opposed to excluding bad traffic. Let the default deny block everything else.
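
Both habits are easy to catch before they land in the rule base. A minimal sketch, assuming rules are available as simple dictionaries (the field names `action`, `src`, `dst`, `src_negated`, `dst_negated` and the `appZ_servers` group are hypothetical, not any vendor's export format): it flags any rule that is not a plain permit and any rule that negates its source or destination.

```python
def lint_rules(rules):
    """Flag rules that break 'explicit permit, implicit deny'."""
    findings = []
    for index, rule in enumerate(rules, start=1):
        if rule.get("action") != "permit":
            findings.append(f"rule {index}: explicit {rule.get('action')} in rule base")
        for side in ("src", "dst"):
            if rule.get(f"{side}_negated", False):
                findings.append(f"rule {index}: negated {side} NOT {rule[side]}")
    return findings


rules = [
    {"action": "permit", "src": "LDAP_server_sources", "dst": "LDAP_servers"},
    {"action": "permit", "src": "appY_servers", "src_negated": True, "dst": "LDAP_servers"},
    {"action": "deny", "src": "appZ_servers", "dst": "LDAP_servers"},
]
for finding in lint_rules(rules):
    print(finding)
# rule 2: negated src NOT appY_servers
# rule 3: explicit deny in rule base
```

Run as a pre-change check, a linter like this turns “we don’t do that here” from a code-review argument into a failed build.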

Firewall ACL Tools:

There are many security products that attempt to do some of this work for you. I have used several of them with varying levels of success (and frustration): Cisco CSM, Cisco Tetration, Juniper Junos Space, Palo Alto Panorama. Each of these tools has its strengths, but first build your solid foundation of access strategy, object-group membership, and shared ACL ownership, and then let these tools handle some of the administration of implementing the policies in a sane way.

Just trying to do my part to make us network / security engineers a little less hated.
