要写易删除,而不易扩展的代码

译者序


本文托管在 GitHub 上: https://github.com/freedombird9/code-easy-to-delete,欢迎 Star 或纠错。


好的文章总是见解独到,功底深厚而逻辑清晰。这是一篇关于如何设计、架构代码的文章。文章的观点新颖而有力。作者的观点是,我们所做的一切 —— 重构、模块化、分层,等等,都是为了让我们的代码易于被删改,都是为了让遗留代码不成为我们的负担,而不是为了代码复用。

作者认为,经过七个不同的开发阶段,最终便可以提炼出这样的代码。每个阶段都有详细的介绍和例子。

初读文章,可能会有抽象、晦涩之感。但多读几遍之后,其主旨就会变得清晰。

一个晚上的彻夜不眠,有了这篇中文翻译,与大家分享,希望对读者有所助益。

本文托管在 GitHub 上,水平有限,还望大家多多指点。

感谢

谢谢秋兄将这篇文章分享给我。

中文翻译如下

编程是一件很糟糕的事 —— 在荒废了自己的一生之后所学到的东西

要写容易删除,而不容易扩展的代码。


每一行代码的写出都毫无理由,因软弱而被维护,又因偶然而被删除。 Jean-Paul Sartre’s Programming in ANSI C.


每写一行代码,都会有一个代价:维护。为了不在代码上花费太多,我们有了可复用的软件。但是代码复用有一个问题:当你以后想要修改的时候它就会成为一个障碍。

一个 API 的用户越多,为了引入修改而需要重写的代码就越多。相似的,你依赖第三方 API 越多,当其有任何改变时你的麻烦就越多。管理代码之间的兼容性,或者模块之间的依赖关系在大型系统中是一个很重要的问题。而且随着项目越来越久,这个问题就会变得越复杂。


今天我的观点是,如果我们要去计算一个程序有多少行代码,我们不应该将其看成是「产生了多少行」,而应该看成「耗费了多少行。」 EWD 1036


如果我们将「有多少行代码」看成是「耗费了多少行代码」的话,那么当我们删除这些代码的时候,我们就降低了维护成本。我们应该努力开发可丢弃的(disposable)软件,而不是可复用的软件。

我不需要告诉你删除代码比写代码更有趣吧。

为了写易于删除的代码:重复你自己以避免产生模块依赖性,但是不要重复管理这些代码。同时将你的代码分层:在易于实现但不易于使用的模块的基础上构建易于使用的 API。拆分你的代码:将很难于实现且很可能会改变的模块互相隔离,并同时和其他的模块隔离。不要将每一个选项都写死,容许在运行时做改变。不要试图同时去做上述所有的事情,或许你在一开始就不要写这么多代码。

阶段0:不写代码

代码有多少行本身并不能告诉我们什么,但是代码行数的数量级可以:50、500、5000、10000、25000,等等。一个一百万行的庞然大物显然会比一个一万行的程序更折磨人,替换它所花费的时间、金钱和精力也会显著更多。

虽然代码越多,摒弃起来就越困难,但是少写一行代码本身并不能省掉任何事情。

即使如此,最容易删除的代码是你一开始就避免写出来的代码。

阶段1:复制粘贴代码

写可复用的代码,是一件在事后有了代码库中的使用示例后更容易做到的事情,而不是事前就能预料好的。往好的方面看,仅仅是利用文件系统,你或许就已经在复用很多代码了,所以何必这么担心呢?一点点冗余是健康的。

先把代码复制粘贴若干次,而不是急着提炼出一个库函数,是完全没有问题的,这样你可以先搞清楚这段代码会被怎样使用。一旦把一个东西变成共享的 API,改变起来就会更困难。

调用你的函数的那段代码会依赖于其实现背后有意或无意的行为。使用你的函数的程序员不会根据你的文档去调用,而会根据他们观察到的函数行为去调用。

删除函数内的代码比删除一个函数更简单。

阶段2:不要复制粘贴代码

当你已经复制粘贴足够多次数时,或许就是该提炼出一个函数的时候了。这是「把我从标准库中拯救出来」的东西:「打开一个配置文件并返回一个哈希表」,「删除这个文件夹」。这些例子包括了无状态函数,或者有一些全局信息,如环境变量的函数。这些是最终会出现在一个叫做 “util” 文件中的东西。

旁白:建一个 util 文件夹,把不同的工具函数放在不同的文件里。单个 util 文件总是会不断变大,直到它既太大又难以拆分。使用单个 util 文件是一种不卫生的做法。

对于应用或者项目而言通用性越强的代码,就越容易复用,被改变或者删除的可能性就越低。比如日志记录、第三方 API、文件句柄(handle)或者进程相关的库。其他你不会删除掉的代码有列表、哈希表,以及其他集合。这不是因为它们的接口通常都很简单,而是因为它们的作用域不会随着时间的增长而变大。

我们要努力将代码中难以删除的部分与易于删除的部分分隔得尽可能开,而不是使所有代码都变得易于删除。

阶段3:写更多的模版

虽然我们通过库来避免复制粘贴,但是我们常常会需要复制粘贴来使用这些库,最后导致写了更多的代码。不过我们给这些代码另外一个名字:模版(boilerplate)。模版和复制粘贴在很大程度上很像,除了每次使用模版的时候都会在不同的地方做一些改变,而不是一次次重复完全一样的东西。

就像复制粘贴一样,我们会重复部分代码以避免引入依赖性,以获得灵活度,代价则是冗余。

需要模版的库通常有网络协议、有线格式(wire formats)、解析套件,或者很难将策略(一个程序应该做的)和协议(一个程序能做的)交织起来而又不限制可选项的东西。这种代码是很难被删除的:与其他的电脑通信或者处理不同的文件通常是一种必需,而我们永远不想让业务逻辑充斥其中。

写模版不是在练习代码复用:我们尽可能将变化频繁的部分和相对更稳定的部分分隔开。应最小化库的依赖性或责任,即使我们必须通过模版来使用它们。

你会写更多的代码,但是这些多出来的代码都是在易于删除的部分。

阶段4:不要写模版

当库需要迎合所有要求的时候,模版的作用最为明显。但是有时候重复的东西太多了。是时候将一个弹性很大的库用一个考虑到了策略、流程和状态的库打包起来了。开发易用的 API 就是将模版转换成一个库。

这比你想象中的要普遍:最为流行和备受喜爱的 Python http 客户端模块 requests 就是一个很成功的例子,它将一个使用起来更为繁琐的库 urllib3 打包,为用户提供了一套更加简单的接口。当使用 http 的时候,requests 照顾到普遍的工作流,而对用户隐藏了许多实际的细节。相比而言,urllib3 处理流水线和连接管理,不对用户隐藏任何细节。

当把一个库包进另一个库的时候,与其说是为了隐藏细节,倒不如说是为了将不同的关切分开:requests 关心的是常见的 http 冒险,urllib3 则是给你工具,让你自己选择你自己的冒险。

我并不是主张让你去建一个 /protocol/ 和 /policy/ 文件夹,但是你确实应该尝试使 util 不受业务逻辑的干扰,并且在易于实现的库的基础上开发易于使用的库。你并不需要将一个库全部写完之后再在上面写另一个库。

将一个第三方库打包起来通常也是很好的实践,即使它们不是协议类的库。你可以写一个适合你的代码的库,而不是在整个项目中都锁定一个选择。开发一个好用的 API 和开发一个具有扩展性的 API 通常是互相冲突的。

像这样将不同的关切分开,能让我们在使一些用户很高兴的同时不会让其他用户想做的事情变得不可能。当你从一开始就有一个好的 API 的时候,分层是最简单的。但是在一个写得不好的 API 上开发出一个好的 API 则会很困难。好的 API 在设计之时就会站在使用者的位置上考虑问题,而分层则是我们意识到我们不能同时让所有人都高兴。

分层更多的是为了使那些很难删除的代码易于使用(在不让业务逻辑污染它们的情况下),而不仅仅是关于写以后可以删除的代码。

阶段5:写一大段代码

你已经复制粘贴了,你已经重构了,你已经分层了,你已经构建了,但是代码在最后还是需要做一些事情的。有时候最好的做法是放弃,然后写一大段垃圾代码将剩余部分弄在一起。

业务逻辑是那种有着无尽的边界情况和快速而肮脏的hack的代码。这是没问题的,我对此并不反对。其他的风格,如「游戏代码」,或者「创始人代码」,也是同一个东西:采用捷径来节省大量的时间。

原因?有时候删掉一个大的错误比删掉18个小的交错在一起的错误更为容易。大量的编程都是探索性的,犯几次错误然后去迭代比想着一开始就做对更快速。

这个对于更有趣味或者更有创造性的尝试来说更为正确。如果你正在写你的第一个游戏:不要写成一个游戏引擎。类似的,不要在写好一个应用之前就去写一个框架。第一次的时候尽管大胆的去写一堆乱七八糟的代码。你是不会知道怎样拆分成模块的,除非你是先知。

单一代码仓库(monorepo)有类似的取舍:你事先不会知道怎样拆分你的代码,而坦率地说,部署一个大的错误要比部署 20 个紧密耦合的错误更容易。

当你知道哪些代码将会被很快舍弃、删除,或者替换的时候,你就可以采用更多的捷径。特别是当你要做一次性的客户网站或者活动页面的时候。或者任何一个你拿着一个模板批量产出副本,或者去填补框架所留下的缺口的地方。

我不是说你应该把同一团泥球重复写上十遍,来把你的错误打磨得完美。引用 Perlis 的话:「所有东西都应该自顶向下地构建,除了第一次的时候。」你应该在每一次尝试时都去犯新的错误,接纳新的风险,然后通过迭代慢慢地完善。

成为一个专业的软件开发者的过程就是不断积累后悔和错误清单的过程。你从成功身上学不到任何东西。并不是你能知道好的代码是什么样的,而是你对坏的代码记忆犹新。

项目不管怎样最终都会失败或者成为遗留代码。失败比成功更频繁。写十个大的泥球,看它们能将你带向哪里,比尝试去给一个粪球抛光更快。

一次性删掉所有的代码比一段一段的去删更容易。

阶段6:把你的代码拆分成小块

大段的代码是最容易写的,但同时维护起来也最为昂贵。一个看起来很简单的修改,最后却会以毫无章法的方式波及代码库的几乎每个部分。本来作为一个整体删除起来很简单的东西,现在变得不可能一段一段地去删除了。

就像我们根据相互独立的任务来将我们的代码分层一样,从特定平台的代码到特定领域的代码,我们同样需要找到一种方法来梳理出顶层逻辑。


从一系列困难的、或者很可能改变的设计决定开始。然后去设计一个个模块,让每一个模块都对其他模块隐藏这样一个决定。 D. Parnas


我们根据代码之间没有共享的部分来拆分代码,而不是将其拆分成有共同功能的模块。我们把写起来、维护起来,或者删除起来最让人沮丧的部分互相隔离开。

我们构建模块不是为了复用,而是为了易于修改。

不幸的是,有些问题相比其他的问题而言分割起来更加困难和复杂。虽然单一责任原则说「每一个模块都应该只去解决一个难题」,但更重要的是「每一个难题都只应该由一个模块去解决」。

当一个模块做两件事情的时候,通常都是因为改变一部分需要另外一部分的改变。一个写得很糟糕但是有着简单接口的组件,通常比需要互相协调的两个组件更容易使用。


我今天不会再试图进一步定义「松耦合」这个简略说法所涵盖的那类东西,或许我永远也无法以清晰易懂的方式给出定义。但是当我看到它的时候我能认出来,而本案中涉及的代码库并不属于那一类。 SCOTUS Justice Stewart


你如果可以在一个系统中删除某一模块而不用因此去重写其他模块的话,这个系统就通常被称为是松耦合的。但是解释松耦合是什么样的比在一开始就建立一个这样的系统要容易多了。

甚至只把一个变量写死一次,或者用命令行参数来代替写死的变量,都可以算是松耦合。松耦合就是让你能够在改变想法的同时不需要改写太多的代码。

比如,微软 Windows 的内部 API 和外部 API 就是因为这个目的而存在的。外部 API 与桌面程序的生命周期捆绑在一起,内部 API 则和内核捆绑在一起。隐藏这些 API,在给微软灵活性的同时,又不会在这个过程中弄坏太多软件。

HTTP 中也有松耦合的例子:在你的 HTTP 服务器前设置一个缓存。将图片移到 CDN 上,仅改变一下指向它们的链接。这两者都不会弄坏你的浏览器。

HTTP 的错误码是另外一个关于松耦合的例子:服务器之间常见的问题都有自己独特的错误码。当你收到400的时候,再尝试一次还是会得到同样的结果。如果是500则可能会变。结果是,HTTP客户端可以替代程序员处理许多的错误。

当把一个软件分解成更小的部分时,必须要考虑到如何去处理错误。这件事说比做容易。


我勉强决定去使用 LaTeX。 《在存在软件错误的情况下构建可靠的分布式系统》(Making reliable distributed systems in the presence of software errors),Armstrong, 2003


Erlang/OTP 在处理错误方面有独到之处:监督树(supervision trees)。大致来说,每一个 Erlang 进程都由一个监督进程发起并监视。当一个进程遇到了问题的时候,它就会退出。当进程退出的时候,其监督进程会将其重启。

(这些监督进程由一个引导进程(bootstrap process)发起,当监督进程遇到错误的时候,引导进程会将其重启)

其思想是,快速的失败然后重启比去处理错误要快。像这样的错误处理看起来跟直觉相反 —— 当错误发生的时候通过放弃处理来获得可靠性。但是重启是解决暂时性错误的灵丹妙药。

错误处理和恢复最好是在代码的外层进行。这被称为端对端(end-to-end)原则。端对端原则说在一个连接的远端处理错误比在中间处理要更容易。即使在中间层进行处理,最终顶层的检查也无法被省去。如果不管怎样都需要在顶层来处理错误,那么为什么还要在里层去处理它们呢?

错误处理是使一个系统紧密耦合在一起的诸多方式之一。紧耦合(tight coupling)还有许多其他的例子,但是单独挑出一个来批评它设计得糟糕,有一点不公平。IMAP 除外。

IMAP 中的每一个操作都像雪花一样独特,有自己特有的选项和处理方式。错误处理相当痛苦:错误可能在另一个操作的结果返回到一半时冒出来。

IMAP 使用独特的令牌,而不是 UUID,来识别每一条信息。这些令牌也可能因为一个操作而中途被改变。许多操作都不是原子操作。找到一种可靠的方式将一封email从一个文件夹移动到另一个文件夹花费了25年时间。它还采用了一种特别的 UTF-7 编码,和一种独特的 base64 编码。

以上这些都不是我编的。

相比而言,文件系统和数据库是远程储存中好得多的例子。在文件系统中,操作的种类是固定的,但是却有很多可操作的对象。

虽然 SQL 像是一个比文件系统要宽泛得多的接口,它仍然遵循相同的模式:若干针对集合(set)的操作,以及许许多多可以被操作的行。虽然不能总是用一个数据库去替换掉另一个数据库,但是比起任何一种自制的查询语言,找到能和 SQL 一起工作的东西要容易得多。

其他松耦合的例子还有带中间件(middleware)、过滤器(filter)和管道(pipeline)的系统。例如,Twitter 的 Finagle 让服务使用共同的 API,这使得通用的超时处理、重试机制和身份验证检查都能被毫不费力地加到客户端和服务器端的代码中。

(我很确定如果我不在这提UNIX管道的话,肯定会有人向我抱怨)

首先我们将我们的代码分层,但现在其中的一些层要共享一个接口:一系列有着不同实现的相同行为和操作。好的松耦合通常就意味着一致的接口。

一个健康的代码库不一定要完美地呈现出模块化。模块化的部分使写代码变得很有趣,就像乐高积木的趣味来自于它所有的零件都可以被拼在一起一样。一个健康的代码库会有一些赘言和冗余,它们让各个活动部件之间保持恰到好处的距离,这样你就不会把手夹在里面。

松耦合的代码不一定就是易于删除的代码,但是它们替代和修改起来都会容易得多。

阶段7:持续的写代码

如果在写新代码的时候不需要去考虑旧有的代码,那么测试新的想法就要容易很多。这并不是说你就应该去写微服务(microservices)而不是单体应用(monolith),而是说你的系统在你摸索方向的同时,还要能够支撑一两个在其之上的试验。

功能发布控制(feature flag)是能让你在以后改变主意的一种方法。虽然 feature flag 被视作一种测试不同功能的方法,但同时它能让你在不重新部署的情况下就应用修改。

Google Chrome 是一个很好的例子,能说明 feature flag 带来的好处。他们发现,维持固定发布周期最困难的部分,就是合并长期存在的功能分支所需要的时间。

能够在不需要重新编译的情况下激活和关闭新的代码,大的修改就可以在不影响现存代码的情况下被分解为更小的合并。如果新功能在代码库中更早出现的话,当一个长期的功能开发影响到其他部分的时候就会表现得更加明显。

Feature flag 并不仅仅是一个命令行开关,它是一种将功能发布与合并分支解耦、将功能发布与代码部署解耦的方式。当软件更新需要花费数小时、数天、甚至数周的时候,能够在运行中改变功能就变得越来越重要了。随便问一个运维人员,你就会知道:任何一个可能在半夜把你叫起来的系统,都值得在运行时去控制。

你更多的是要有一个反馈回路,而不是不停的迭代。模块更多的是用来隔离不同组件以应对改变的,而不仅是用来做代码复用的。处理代码的更改不仅仅是开发新的功能,同时也是抛弃掉旧的功能。写具有扩展性的代码是寄希望于三个月后你能把所有事情都做对。写可以被删除的代码则是基于相反的假设。

我在上文中谈到的策略 —— 分层、隔离、共同的接口、组合 —— 并不是关于怎样写出优秀的软件,而是关于怎样构建一个可以随着时间而改变的软件。


因此,管理上的问题不是要不要建一个试验性的系统然后把它抛弃掉。你会这么做的。[……]所以做好抛弃它的打算吧;无论如何你都会的。 Fred Brooks


你不必要将它全部抛弃,但是你需要删除某些部分。好的代码并不是要第一次就做对一件事。好的代码是那些不会造成障碍的遗留代码(legacy code)。

好的代码总是易于删除的代码。


programming is terrible
lessons learned from a life wasted

Write code that is easy to delete, not easy to extend.

“Every line of code is written without reason, maintained out of weakness, and deleted by chance” Jean-Paul Sartre’s Programming in ANSI C.

Every line of code written comes at a price: maintenance. To avoid paying for a lot of code, we build reusable software. The problem with code re-use is that it gets in the way of changing your mind later on.

The more consumers of an API you have, the more code you must rewrite to introduce changes. Similarly, the more you rely on a third-party API, the more you suffer when it changes. Managing how the code fits together, or which parts depend on others, is a significant problem in large scale systems, and it gets harder as your project grows older.

My point today is that, if we wish to count lines of code, we should not regard them as “lines produced” but as “lines spent” EWD 1036

If we see ‘lines of code’ as ‘lines spent’, then when we delete lines of code, we are lowering the cost of maintenance. Instead of building re-usable software, we should try to build disposable software.

I don’t need to tell you that deleting code is more fun than writing it.

To write code that’s easy to delete: repeat yourself to avoid creating dependencies, but don’t repeat yourself to manage them. Layer your code too: build simple-to-use APIs out of simpler-to-implement but clumsy-to-use parts. Split your code: isolate the hard-to-write and the likely-to-change parts from the rest of the code, and each other. Don’t hard code every choice, and maybe allow changing a few at runtime. Don’t try to do all of these things at the same time, and maybe don’t write so much code in the first place.


Step 0: Don’t write code

The number of lines of code doesn’t tell us much on its own, but the magnitude does: 50, 500, 5,000, 10,000, 25,000, etc. A million line monolith is going to be more annoying than a ten thousand line one, and take significantly more time, money, and effort to replace.

Although the more code you have the harder it is to get rid of, saving one line of code saves absolutely nothing on its own.

Even so, the easiest code to delete is the code you avoided writing in the first place.


Step 1: Copy-paste code

Building reusable code is something that’s easier to do in hindsight with a couple of examples of use in the code base, than foresight of ones you might want later. On the plus side, you’re probably re-using a lot of code already by just using the file-system, why worry that much? A little redundancy is healthy.

It’s good to copy-paste code a couple of times, rather than making a library function, just to get a handle on how it will be used. Once you make something a shared API, you make it harder to change.

The code that calls your function will rely on both the intentional and the unintentional behaviours of the implementation behind it. The programmers using your function will not rely on what you document, but what they observe.
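A small sketch of how that happens in practice (hypothetical names, Python): the helper only documents what it filters, but a caller latches onto the incidental ordering of its output.

    def active_users(accounts):
        """Return the names of accounts that are marked active."""
        # The sorted order is an implementation detail, not a documented promise.
        return sorted(name for name, active in accounts.items() if active)

    def first_active_user(accounts):
        # This caller quietly depends on the incidental alphabetical order above.
        # Swap the sorted list for a set and it breaks, even though nothing
        # documented about active_users() changed.
        return active_users(accounts)[0]

    if __name__ == "__main__":
        print(first_active_user({"zoe": True, "amy": True, "bob": False}))  # -> amy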

It’s simpler to delete the code inside a function than it is to delete a function.


Step 2: Don’t copy paste code

When you’ve copy and pasted something enough times, maybe it’s time to pull it up to a function. This is the “save me from my standard library” stuff: the “open a config file and give me a hash table”, “delete this directory”. This includes functions without any state, or functions with a little bit of global knowledge like environment variables. The stuff that ends up in a file called “util”.
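For illustration only, a minimal sketch of what those two helpers might look like once pulled up into a util module; the file name, function names, and the choice of JSON as the config format are assumptions, not anything prescribed above.

    # util/files.py (hypothetical): small, mostly stateless helpers of the
    # "save me from my standard library" kind described above.
    import json
    import os
    import shutil

    def load_config(path):
        """Open a config file and give me a hash table (here: JSON file -> dict)."""
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)

    def delete_directory(path):
        """Delete this directory, doing nothing if it is already gone."""
        if os.path.isdir(path):
            shutil.rmtree(path)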

Aside: Make a util directory and keep different utilities in different files. A single util file will always grow until it is too big and yet too hard to split apart. Using a single util file is unhygienic.

The less specific the code is to your application or project, the easier it is to re-use and the less likely it is to change or be deleted. Library code like logging, or third party APIs, file handles, or processes. Other good examples of code you’re not going to delete are lists, hash tables, and other collections. Not because they often have very simple interfaces, but because they don’t grow in scope over time.

Instead of making code easy-to-delete, we are trying to keep the hard-to-delete parts as far away as possible from the easy-to-delete parts.


Step 3: Write more boilerplate

Despite writing libraries to avoid copy pasting, we often end up writing a lot more code through copy paste to use them, but we give it a different name: boilerplate. Boilerplate is a lot like copy-pasting, but you change some of the code in a different place each time, rather than the same bit over and over.

Like with copy paste, we are duplicating parts of code to avoid introducing dependencies, gain flexibility, and pay for it in verbosity.

Libraries that require boilerplate are often stuff like network protocols, wire formats, or parsing kits, stuff where it’s hard to interweave policy (what a program should do), and protocol (what a program can do) together without limiting the options. This code is hard to delete: it’s often a requirement for talking to another computer or handling different files, and the last thing we want to do is litter it with business logic.
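As a rough illustration, this is the kind of boilerplate a protocol-level library such as the standard library’s http.client invites: the same connection setup repeated at each call site, changed slightly each time, with the business logic kept out of it. The paths, headers, and helper names are invented for the sketch.

    import http.client
    import json

    def fetch_status(host):
        conn = http.client.HTTPSConnection(host, timeout=5)
        try:
            conn.request("GET", "/status")
            return conn.getresponse().status
        finally:
            conn.close()

    def fetch_settings(host):
        # Nearly the same lines again, changed in a different place each time:
        # a different path, an extra header, and JSON decoding of the body.
        conn = http.client.HTTPSConnection(host, timeout=5)
        try:
            conn.request("GET", "/settings", headers={"Accept": "application/json"})
            return json.loads(conn.getresponse().read())
        finally:
            conn.close()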

This is not an exercise in code reuse: we’re trying to keep the parts that change frequently away from the parts that are relatively static. Minimising the dependencies or responsibilities of library code, even if we have to write boilerplate to use it.

You are writing more lines of code, but you are writing those lines of code in the easy-to-delete parts.


Step 4: Don’t write boilerplate

Boilerplate works best when libraries are expected to cater to all tastes, but sometimes there is just too much duplication. It’s time to wrap your flexible library with one that has opinions on policy, workflow, and state. Building simple-to-use APIs is about turning your boilerplate into a library.

This isn’t as uncommon as you might think: One of the most popular and beloved python http clients, requests, is a successful example of providing a simpler interface, powered by a more verbose-to-use library urllib3 underneath. requests caters to common workflows when using http, and hides many practical details from the user. Meanwhile, urllib3 does the pipelining, connection management, and does not hide anything from the user.
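To make the contrast concrete, here is the same request at both layers. Both libraries are real, but the URL is only a placeholder and error handling is omitted; treat it as a sketch rather than a recommendation.

    import urllib3
    import requests

    # urllib3: you hold the pool, pick the method, and decode the body yourself.
    pool = urllib3.PoolManager()
    response = pool.request("GET", "https://example.org/api")
    body = response.data.decode("utf-8")

    # requests: the common workflow is the default, and the details stay hidden.
    body = requests.get("https://example.org/api", timeout=5).text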

It is not so much that we are hiding detail when we wrap one library in another, but we are separating concerns: requests is about popular http adventures, urllib3 is about giving you the tools to choose your own adventure.

I’m not advocating you go out and create a /protocol/ and a /policy/ directory, but you do want to try and keep your util directory free of business logic, and build simpler-to-use libraries on top of simpler-to-implement ones. You don’t have to finish writing one library to start writing another atop.

It’s often good to wrap third party libraries too, even if they aren’t protocol-esque. You can build a library that suits your code, rather than lock in your choice across the project. Building a pleasant to use API and building an extensible API are often at odds with each other.

This split of concerns allows us to make some users happy without making things impossible for other users. Layering is easiest when you start with a good API, but writing a good API on top of a bad one is unpleasantly hard. Good APIs are designed with empathy for the programmers who will use it, and layering is realising we can’t please everyone at once.

Layering is less about writing code we can delete later, and more about making the hard to delete code pleasant to use (without contaminating it with business logic).


Step 5: Write a big lump of code

You’ve copy-pasted, you’ve refactored, you’ve layered, you’ve composed, but the code still has to do something at the end of the day. Sometimes it’s best just to give up and write a substantial amount of trashy code to hold the rest together.

Business logic is code characterised by a never ending series of edge cases and quick and dirty hacks. This is fine. I am ok with this. Other styles like ‘game code’, or ‘founder code’ are the same thing: cutting corners to save a considerable amount of time.

The reason? Sometimes it’s easier to delete one big mistake than try to delete 18 smaller interleaved mistakes. A lot of programming is exploratory, and it’s quicker to get it wrong a few times and iterate than think to get it right first time.

This is especially true of more fun or creative endeavours. If you’re writing your first game: don’t write an engine. Similarly, don’t write a web framework before writing an application. Go and write a mess the first time. Unless you’re psychic you won’t know how to split it up.

Monorepos are a similar tradeoff: You won’t know how to split up your code in advance, and frankly one large mistake is easier to deploy than 20 tightly coupled ones.

When you know what code is going to be abandoned soon, deleted, or easily replaced, you can cut a lot more corners. Especially if you make one-off client sites, event web pages. Anything where you have a template and stamp out copies, or where you fill in the gaps left by a framework.

I’m not suggesting you write the same ball of mud ten times over, perfecting your mistakes. To quote Perlis: “Everything should be built top-down, except the first time”. You should be trying to make new mistakes each time, take new risks, and slowly build up through iteration.

Becoming a professional software developer is accumulating a back-catalogue of regrets and mistakes. You learn nothing from success. It is not that you know what good code looks like, but the scars of bad code are fresh in your mind.

Projects either fail or become legacy code eventually anyway. Failure happens more than success. It’s quicker to write ten big balls of mud and see where it gets you than try to polish a single turd.

It’s easier to delete all of the code than to delete it piecewise.


Step 6: Break your code into pieces

Big balls of mud are the easiest to build but the most expensive to maintain. What feels like a simple change ends up touching almost every part of the code base in an ad-hoc fashion. What was easy to delete as a whole is now impossible to delete piecewise.

In the same way we have layered our code to separate responsibilities, from platform specific to domain specific, we need to find a means to tease apart the logic atop.

[Start] with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others. D. Parnas

Instead of breaking code into parts with common functionality, we break code apart by what it does not share with the rest. We isolate the most frustrating parts to write, maintain, or delete away from each other.

We are not building modules around being able to re-use them, but being able to change them.

Unfortunately, some problems are more intertwined and hard to separate than others. Although the single responsibility principle suggests that ‘each module should only handle one hard problem’, it is more important that ‘each hard problem is only handled by one module’.

When a module does two things, it is usually because changing one part requires changing the other. It is often easier to have one awful component with a simple interface, than two components requiring a careful co-ordination between them.

I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description [”loose coupling”], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the code base involved in this case is not that. SCOTUS Justice Stewart

A system where you can delete parts without rewriting others is often called loosely coupled, but it’s a lot easier to explain what one looks like rather than how to build it in the first place.

Even hardcoding a variable once can be loose coupling, or using a command line flag over a variable. Loose coupling is about being able to change your mind without changing too much code.
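A minimal sketch of that smallest form of loose coupling in Python: one value hardcoded exactly once, with an optional command line flag to override it at runtime. The flag name and default path are invented for illustration.

    import argparse

    DEFAULT_CACHE_DIR = "/tmp/app-cache"  # hardcoded once, in one obvious place

    def parse_args(argv=None):
        parser = argparse.ArgumentParser()
        parser.add_argument("--cache-dir", default=DEFAULT_CACHE_DIR,
                            help="where to keep cached files")
        return parser.parse_args(argv)

    if __name__ == "__main__":
        print("using cache dir:", parse_args().cache_dir)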

For example, Microsoft Windows has internal and external APIs for this very purpose. The external APIs are tied to the lifecycle of desktop programs, and the internal API is tied to the underlying kernel. Hiding these APIs away gives Microsoft flexibility without breaking too much software in the process.

HTTP has examples of loose coupling too: Putting a cache in front of your HTTP server. Moving your images to a CDN and just changing the links to them. Neither breaks the browser.

HTTP’s error codes are another example of loose coupling: common problems across web servers have unique codes. When you get a 400 error, doing it again will get the same result. A 500 may change. As a result, HTTP clients can handle many errors on the programmer’s behalf.
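A sketch of what that buys a client, using the requests library: retry the responses that may change (5xx), never the ones that will not (4xx). The retry count and backoff are arbitrary choices for the example, not anything the article prescribes.

    import time
    import requests

    def get_with_retries(url, attempts=3):
        for attempt in range(attempts):
            response = requests.get(url, timeout=5)
            if response.status_code < 500:
                return response       # 2xx, 3xx, 4xx: retrying a 400 will not help
            time.sleep(2 ** attempt)  # 5xx: back off, then try again
        return response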

How your software handles failure must be taken into account when decomposing it into smaller pieces. Doing so is easier said than done.

I have decided, reluctantly to use LaTeX. Making reliable distributed systems in the presence of software errors, Armstrong, 2003

Erlang/OTP is relatively unique in how it chooses to handle failure: supervision trees. Roughly, each process in an Erlang system is started by and watched by a supervisor. When a process encounters a problem, it exits. When a process exits, it is restarted by the supervisor.

(These supervisors are started by a bootstrap process, and when a supervisor encounters a fault, it is restarted by the bootstrap process)

The key idea is that it is quicker to fail-fast and restart than it is to handle errors. Error handling like this may seem counter-intuitive, gaining reliability by giving up when errors happen, but turning things off-and-on again has a knack for suppressing transient faults.
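As a loose analogy only (plain Python, not Erlang/OTP, and all names invented), the shape of the idea is a worker that raises instead of patching over the fault, and a supervisor whose whole job is to restart it:

    import time

    def supervise(worker, max_restarts=5):
        restarts = 0
        while True:
            try:
                worker()
                return                    # the worker finished normally
            except Exception as error:    # the worker "exited" with a fault
                restarts += 1
                if restarts > max_restarts:
                    raise                 # give up; let the layer above decide
                print(f"worker failed ({error!r}), restarting")
                time.sleep(0.1)           # restart, rather than handle, the fault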

Error handling, and recovery are best done at the outer layers of your code base. This is known as the end-to-end principle. The end-to-end principle argues that it is easier to handle failure at the far ends of a connection than anywhere in the middle. Even if you handle some errors inside, you still have to do the final top level check at the outer layer, so why bother handling them on the inside?

Error handling is one of the many ways in which a system can be tightly bound together. There are many other examples of tight coupling, but it is a little unfair to single one out as being badly designed. Except for IMAP.

In IMAP almost every operation is a snowflake, with unique options and handling. Error handling is painful: errors can come halfway through the result of another operation.

Instead of UUIDs, IMAP generates unique tokens to identify each message. These can change halfway through the result of an operation too. Many operations are not atomic. It took more than 25 years to get a way to move email from one folder to another that reliably works. There is a special UTF-7 encoding, and a unique base64 encoding too.

I am not making any of this up.

By comparison, both file systems and databases make much better examples of remote storage. With a file system, you have a fixed set of operations, but a multitude of objects you can operate on.

Although SQL may seem like a much broader interface than a filesystem, it follows the same pattern. A number of operations on sets, and a multitude of rows to operate on. Although you can’t always swap out one database for another, it is easier to find something that works with SQL over any homebrew query language.
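A small illustration of that “few operations, many objects” shape, using Python’s built-in sqlite3; the schema and data are invented for the example.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, folder TEXT, body TEXT)")
    conn.executemany("INSERT INTO messages (folder, body) VALUES (?, ?)",
                     [("inbox", "hello"), ("inbox", "world"), ("spam", "buy now")])
    # The same handful of verbs apply no matter how many rows there are.
    conn.execute("UPDATE messages SET folder = 'archive' WHERE folder = 'inbox'")
    print(conn.execute("SELECT id, body FROM messages WHERE folder = 'archive'").fetchall())
    conn.close()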

Other examples of loose coupling are other systems with middleware, or filters and pipelines. For example, Twitter’s Finagle uses a common API for services, and this allows generic timeout handling, retry mechanisms, and authentication checks to be added effortlessly to client and server code.
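A toy Python analogy (not Finagle’s actual API, and all names made up): if every service shares the same call signature, generic concerns such as retries and authentication can be layered on without touching the service itself.

    import functools

    def with_retries(service, attempts=3):
        """Wrap any request -> response callable with a generic retry policy."""
        @functools.wraps(service)
        def wrapped(request):
            last_error = None
            for _ in range(attempts):
                try:
                    return service(request)
                except ConnectionError as error:
                    last_error = error
            raise last_error
        return wrapped

    def with_auth(service, token):
        """Attach a credential to every request before passing it on."""
        @functools.wraps(service)
        def wrapped(request):
            return service({**request, "auth": token})
        return wrapped

    def echo_service(request):
        return {"status": 200, "echo": request}

    service = with_retries(with_auth(echo_service, token="secret"))
    print(service({"path": "/hello"}))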

(I’m sure if I didn’t mention the UNIX pipeline here someone would complain at me)

First we layered our code, but now some of those layers share an interface: a common set of behaviours and operations with a variety of implementations. Good examples of loose coupling are often examples of uniform interfaces.

A healthy code base doesn’t have to be perfectly modular. The modular bit makes it way more fun to write code, in the same way that Lego bricks are fun because they all fit together. A healthy code base has some verbosity, some redundancy, and just enough distance between the moving parts so you won’t trap your hands inside.

Code that is loosely coupled isn’t necessarily easy-to-delete, but it is much easier to replace, and much easier to change too.


Step 7: Keep writing code

Being able to write new code without dealing with old code makes it far easier to experiment with new ideas. It isn’t so much that you should write microservices and not monoliths, but your system should be capable of supporting one or two experiments atop while you work out what you’re doing.

Feature flags are one way to change your mind later. Although feature flags are seen as ways to experiment with features, they allow you to deploy changes without re-deploying your software.
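A minimal sketch of such a flag in Python, read from the environment at call time so flipping it needs no redeploy; a real system would more likely consult a config service, and the flag and function names are invented.

    import os

    def flag_enabled(name, default=False):
        """Read FEATURE_<NAME> from the environment on every call."""
        value = os.environ.get(f"FEATURE_{name.upper()}")
        return default if value is None else value.strip().lower() in ("1", "true", "on")

    def render_checkout():
        if flag_enabled("new_checkout"):
            return "new checkout flow"
        return "old checkout flow"

    if __name__ == "__main__":
        print(render_checkout())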

Google Chrome is a spectacular example of the benefits they bring. They found that the hardest part of keeping a regular release cycle was the time it took to merge long-lived feature branches in.

By being able to turn the new code on-and-off without recompiling, larger changes could be broken down into smaller merges without impacting existing code. With new features appearing earlier in the same code base, it made it more obvious when long running feature development would impact other parts of the code.

A feature flag isn’t just a command line switch, it’s a way of decoupling feature releases from merging branches, and decoupling feature releases from deploying code. Being able to change your mind at runtime becomes increasingly important when it can take hours, days, or weeks to roll out new software. Ask any SRE: Any system that can wake you up at night is one worth being able to control at runtime.

It isn’t so much that you’re iterating, but you have a feedback loop. It is not so much you are building modules to re-use, but isolating components for change. Handling change is not just developing new features but getting rid of old ones too. Writing extensible code is hoping that in three months time, you got everything right. Writing code you can delete is working on the opposite assumption.

The strategies I’ve talked about — layering, isolation, common interfaces, composition — are not about writing good software, but about how to build software that can change over time.

The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. […] Hence plan to throw one away; you will, anyhow. Fred Brooks

You don’t need to throw it all away but you will need to delete some of it. Good code isn’t about getting it right the first time. Good code is just legacy code that doesn’t get in the way.

Good code is easy to delete.




Acknowledgments

Thank you to all of my proof readers for your time, patience, and effort.

Further Reading

Layering/Decomposition

On the Criteria To Be Used in Decomposing Systems into Modules, D.L. Parnas.

How To Design A Good API and Why it Matters, J. Bloch.

The Little Manual of API Design, J. Blanchette.

Python for Humans, K. Reitz.


Common Interfaces

The Design of the MH Mail System, a Rand technical report.

The Styx Architecture for Distributed Systems

Your Server as a Function, M. Eriksen.


Feedback loops/Operations lifecycle

Chrome Release Cycle, A. Laforge.

Why Do Computers Stop and What Can Be Done About It?, J. Gray.

How Complex Systems Fail, R. I. Cook.


The technical is social before it is technical.

All Late Projects Are the Same; Software Engineering: An Idea Whose Time Has Come and Gone?, T. DeMarco.

Epigrams in Programming, A. Perlis.

How Do Committees Invent?, M.E. Conway.

The Tyranny of Structurelessness, J. Freeman

