Refactoring Duplicate Code

郝ren

Original article: https://medium.com/machine-words/refactoring-duplicate-code-5e17f20dcb51

As a software engineer working on a large project, you’ll likely be asked to do cleanup work on the code to make it more readable and more maintainable. There are several different ways in which code can be reorganized, but one of the most common refactoring tasks you’ll be given is eliminating duplicate code.

In fact, the very word refactor sort of implies elimination of redundancies. That is because the term originally comes from the word factor in the mathematical sense. When we “factor” an expression, we pull out the repeated uses of a variable and combine them into a single instance, so that an expression such as a × b + a × c becomes a × (b + c).

There are a number of benefits to doing this.


One obvious benefit is that it makes the program smaller. Less code means less effort needed to maintain and debug that code.


A somewhat less obvious benefit is that spending time to optimize or improve common, shared code yields a higher payoff. If you have a nifty sort algorithm that is used in 10 different places in your program, and you make it twice as fast, you’ll get a larger overall benefit than if that same sort algorithm was only used once. Similarly, if you fix a bug in a common piece of code, the benefits of that bug fix apply to every user of that code.

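As a rough illustration of that payoff, here is a minimal Java sketch. The class `Greetings` and its methods are hypothetical names invented for this example and do not come from any real codebase; the point is only that both callers pick up any fix or speedup made to the shared `normalize` helper.

```java
import java.util.Locale;

// Hypothetical example: the names and the formatting rule are illustrative only.
final class Greetings {
    // Shared helper: the single place where the trimming and casing rule lives.
    // A bug fixed here is fixed for every caller at once.
    static String normalize(String rawName) {
        String trimmed = rawName.trim();
        if (trimmed.isEmpty()) {
            return "Guest";
        }
        return Character.toUpperCase(trimmed.charAt(0))
                + trimmed.substring(1).toLowerCase(Locale.ROOT);
    }

    // Both callers used to carry their own copy of the logic above.
    static String welcome(String rawName) {
        return "Welcome, " + normalize(rawName) + "!";
    }

    static String farewell(String rawName) {
        return "Goodbye, " + normalize(rawName) + "!";
    }
}
```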

It’s easier to think about this benefit if you adopt the perspective that code is always changing and evolving.


You see, when I was a young and naive programmer starting out on my career in the late 70s, I was greatly impressed by the beauty of the idea that code was “forever”: once you solved a problem by expressing that solution in code, it was solved for all time, and there was no need to go back and revisit that problem again.

However, I later learned that I was dead wrong. I realized that code is not a static artifact — it’s not runes carved in stone or mathematical formulae. Code is constantly in motion, being improved and updated. Code changes because the world changes, and code has to keep up. There are a few “eternal” algorithms and data structures that have lasted for decades, but even those have to adapt to new programming languages and machine architectures. In fact, I have grown highly skeptical of code that never changes — this is often a sign of abandonment rather than success.


If you think about a function or a class as an ever-evolving stream of improvements rather than a timeless monument of intellect, you can easily imagine that it is much easier to manage a small collection of evolving entities than a larger number of them. Thus, refactoring duplicate code can make this evolution easier.


Some programmers are pretty obsessive about eliminating duplicate code: whenever they see two blocks of code that look suspiciously similar, they get a kind of itch that needs to be scratched, and the way to scratch it is to merge those two blocks of code into one.

Refactoring isn’t always easy

However, eliminating duplicate code is not always as easy as you might imagine. There are several challenges which can occur.


Probably the most common source of difficulty is where you have two blocks of code that are similar, but not identical. In a case like this, you’ll need to generalize; that is, you’ll need to come up with a new function that can handle the work of both of the old code fragments you are trying to replace. Generalization requires creative thought; it can’t just be done mechanically. The more different the two code blocks are, the harder it will be to come up with a general solution that fits both, especially if those differences aren’t merely superficial deviations but abstract and fundamental.
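When the two blocks differ only in a value or a small piece of behavior, the generalization can be fairly mild. The following Java sketch assumes two hypothetical functions, `sumOfEvens` and `sumOfPositives`, that originally contained the same loop with a different test inside it; passing the test in as a parameter lets them share one loop. All names here are invented for illustration.

```java
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical example: two near-duplicate loops merged by parameterizing the difference.
final class Totals {
    // The generalized function: the only thing that varied between the
    // original copies was the condition, so it becomes a parameter.
    static int sumWhere(List<Integer> values, IntPredicate keep) {
        int total = 0;
        for (int value : values) {
            if (keep.test(value)) {
                total += value;
            }
        }
        return total;
    }

    // The old entry points survive as thin, clearly named wrappers.
    static int sumOfEvens(List<Integer> values) {
        return sumWhere(values, v -> v % 2 == 0);
    }

    static int sumOfPositives(List<Integer> values) {
        return sumWhere(values, v -> v > 0);
    }
}
```

Keeping the wrappers means existing call sites do not have to change, and each one still says what it means.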

It’s more difficult to generalize over differences if those differences are ones of structure or type, rather than differences of value. The reason is that types can’t be passed in as ordinary parameters (well, they can be passed in as template parameters in languages that support generics, but most languages have limits on what templates can do; see type erasure), and structural differences can’t be passed in as parameters at all. Thus, if you have two functions that look similar, but one operates on scalars and the other operates on arrays, you are going to have a hard time combining them into a single function that efficiently handles both.
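In Java, a difference of type can sometimes be absorbed by a generic method. The sketch below is a hypothetical example (the class name `Largest` is invented here) in which one method stands in for what might otherwise be separate "largest integer" and "largest string" functions. The comment at the end notes the kind of limit that type erasure imposes.

```java
import java.util.List;

// Hypothetical example: one generic method replacing per-type duplicates.
final class Largest {
    // T is a type parameter, so the same code serves Integer, String,
    // or any other type that knows how to compare itself.
    static <T extends Comparable<? super T>> T of(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("items must not be empty");
        }
        T best = items.get(0);
        for (T item : items) {
            if (item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }

    // Limit imposed by type erasure: T is not available at run time, so
    // expressions such as `new T[n]` or `item instanceof T` would not
    // compile inside the method above.
}
```

A structural difference, such as one caller holding a single value and another holding a list, is not something this technique removes; the caller still has to adapt its data to the shape the method expects.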

Another problem you may have is that the blocks of code may be located in places that are hard to access — deep inside of another class or function, or even in a separate library. Merging those code blocks will require moving the code to some common context where it can be reached by all.


The code in question might also have a lot of dependencies on the context where it lives. For example, a block of code inside a function might depend on local variables defined within the scope of the function body. Refactoring that common block into a separate function will require passing in all those variables as parameters instead. If there are too many such parameters, then calling the refactored common code will be cumbersome.
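One common way to soften this, shown here only as a sketch, is to bundle the former local variables into a small value object so the extracted function takes one argument instead of many. The names `InvoiceLine` and `Pricing`, and the pricing rule itself, are assumptions made up for this example.

```java
// Hypothetical example: local variables that the extracted code depended on
// are gathered into one small immutable object.
final class InvoiceLine {
    final int quantity;
    final long unitPriceCents;
    final int discountPercent;

    InvoiceLine(int quantity, long unitPriceCents, int discountPercent) {
        this.quantity = quantity;
        this.unitPriceCents = unitPriceCents;
        this.discountPercent = discountPercent;
    }
}

final class Pricing {
    // Without InvoiceLine this would need three parameters, and every
    // additional dependency would lengthen every call site.
    static long lineTotalCents(InvoiceLine line) {
        long gross = line.quantity * line.unitPriceCents;
        return gross - gross * line.discountPercent / 100;
    }
}
```

This trades a long parameter list for an extra type, so it only pays off when the grouped values genuinely belong together.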

Up to this point, we’ve talked about eliminating duplicate code as if it were always a good thing — perhaps challenging at times, but always worth attempting. But is that really true?


In fact, there are cases where eliminating duplicate code is not a good idea, and I want to outline some of those cases.


Divergent Destinies

Remember how I mentioned that code is ever-evolving? Well, you might have two functions that look identical now — but those two functions might be on radically different trajectories. In other words, those functions might be very different in a future version of the code, and tying them together prematurely might stifle the evolution of one or both of them.


A real-world example I can think of: say you have a simple function that appends strings to a buffer. And let’s say that you discover that there’s another function, in another part of the code, that does nearly the same thing. So the logical thing to do is to combine them, right?


Except I haven’t told you the whole story. You see, these two functions live inside of a multi-threaded environment. One of those functions is called in a way that is always thread-safe — that is, the function is only ever called by one thread. The other function, however, is called by many threads.


Now, you notice that this second function is made thread-safe by the fact that all of the calls to it are guarded by a mutex (mutual exclusion) lock. So no problem, right? We can merge the two functions, and then multi-threaded callers can guard any calls to the string append function by acquiring the lock first, while the single-threaded code doesn’t need to bother with locks. So they both can share the same code…right?


Except…mutex locks are expensive in terms of CPU time. Using a mutex lock may be good enough for now. But at some point in the future you might want to switch to something better and faster. This might mean rewriting your string append function to use a “lockless” algorithm that takes advantage of atomic CPU instructions. That’s a much more complicated implementation that requires a total rewrite of the function, but it’s much faster than trying to acquire a mutex every time you call it.

However, even atomic instructions aren’t free. They are much faster than a mutex, sure — but they are slower than using regular non-atomic instructions.


If you kept these two functions separate, without refactoring them, then there is no problem: the single-threaded users can continue to call the fast, non-thread-safe version. The multi-threaded users can call the thread-safe version that uses atomic instructions, which is slightly slower, but those callers are willing to pay that minor cost for thread safety.

However, if you combined those two functions into one, now you have a problem. If you rewrite the function to use atomic instructions, it will speed up all of the multi-threaded users, since they no longer need to use locking, but it will slow down all the single-threaded ones. By making our function try to serve too many needs at once, we have achieved less than optimal performance.
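To make the scenario concrete, here is a minimal Java sketch of the two appenders kept separate. The class names are hypothetical, and the sketch makes one simplifying assumption that differs from the story above: the lock lives inside the thread-safe class rather than being acquired by each caller. It does not attempt the lock-free rewrite; it only shows that the two bodies look nearly identical today while having different obligations.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example: used from exactly one thread, so it pays no
// synchronization cost at all.
final class SingleThreadLog {
    private final StringBuilder buffer = new StringBuilder();

    void append(String text) {
        buffer.append(text);
    }

    String contents() {
        return buffer.toString();
    }
}

// Hypothetical example: shared between threads, so every access is guarded.
// This is the version that might later be rewritten around atomic operations.
final class SharedLog {
    private final StringBuilder buffer = new StringBuilder();
    private final ReentrantLock lock = new ReentrantLock();

    void append(String text) {
        lock.lock();
        try {
            buffer.append(text);
        } finally {
            lock.unlock();
        }
    }

    String contents() {
        lock.lock();
        try {
            return buffer.toString();
        } finally {
            lock.unlock();
        }
    }
}
```

Because the classes are separate, `SharedLog` can change its synchronization strategy later without touching the callers of `SingleThreadLog`.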

How can you avoid scenarios like these? The problem is that you can’t predict the future of a given block of code just by visually inspecting it. You can, however, get a sense of what changes are likely to happen in the future if you are intimately familiar with the development plans for your whole project. Using the previous example, if you know that thread safety and performance improvement are important goals for your team, you can then predict that someday someone will want to migrate mutex-based code into atomic, lock-free code. This in turn guides your decisions around refactoring that code.

Muddled Meanings

One bad habit I sometimes see is programmers refactoring code based on the incidental details of implementation rather than a clear, common definition of what the code is supposed to mean.


Suppose Dave and Alice each have a list of chores. Dave’s list looks like this:


1. Feed the cat
2. Write a letter to Mom
3. Do laundry
4. Check bank balance
5. Organize bookshelf

Alice’s list looks like this:

1. Pay bills
2. Write a letter to Mom
3. Do laundry
4. Check bank balance
5. Clean the BBQ grill

A zealous programmer might look at these two lists and spot the fact that items 2–4 on each list are the same. Why not eliminate the duplication by combining those three steps in a separate function?


The problem is that there is no logical connection between tasks 2, 3 and 4. A function that performed these tasks would be logically incoherent — what would you even call it? You wouldn’t want to have a name like WriteLetterToMomAndDoLaundryAndCheckBankBalance. You can’t call it DoChores, since there are other chores that are not included. DoMiscChores or DoCommonChores are also bad name choices because they are vague about what the function does.


Unless you can find some unifying concept or principle that defines the meaning of the newly-refactored function, then by refactoring the code you have actually made the program harder to understand rather than easier — which defeats the purpose of refactoring in the first place.


Good naming really helps — if you can come up with a succinct name for the function that accurately describes what it does, then that’s a good sign that you have identified a clear semantic meaning for the refactored code.


Readability ROI

There’s one other aspect to consider when refactoring code, which is that indirection has a mental cost. To get a full understanding of this cost, please refer to footnote one¹ of this essay.


You see what happened? Your reading was interrupted and you had to scroll down to the bottom of the article. The same is true for function calls — there are many instances where the meaning of the code is not clear unless you read each of the executable statements in order, and refactoring things into separate functions can make reading harder.


This is especially true in certain contexts where the sequential nature of the code serves as a kind of log or document — for example, unit test code tends to be very linear, where the various assertions and expectations listed in the test case can also serve as a set of bullet points outlining the programmer’s assumptions. Fancy refactoring can detract from this, making the code harder to read as a linear document.

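A test written in this linear style might look like the following Java sketch. It deliberately avoids any test framework; `check` is a tiny helper defined here, and the class and method names are invented for the example. Read top to bottom, the messages form the bullet list of assumptions described above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical example: a linear test whose assertions read like a checklist.
final class StackBehaviorTest {
    // Minimal assertion helper so the sketch needs no external library.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static void pushThenPopReturnsItemsInReverseOrder() {
        Deque<String> stack = new ArrayDeque<>();
        check(stack.isEmpty(), "a new stack is empty");

        stack.push("a");
        stack.push("b");
        check(stack.size() == 2, "two pushes give size 2");

        check(stack.pop().equals("b"), "pop returns the most recent item");
        check(stack.pop().equals("a"), "then the one pushed before it");
        check(stack.isEmpty(), "the stack ends up empty");
    }
}
```

Pulling the pushes or the checks into shared helpers would shorten the method, but a reader would then have to jump elsewhere to see what is being assumed at each step.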

As with the previous section, having good naming can mitigate this to a great extent. If the name of a function is descriptive enough that you don’t actually need to look inside of it to know what it does, then you can still read the calling code sequentially without excessive jumping around.

Conclusion

Many of these pitfalls that I have outlined can be categorized as “overfactoring”, a term I sometimes use to mean an overly-zealous approach to refactoring, especially in cases where code looks superficially similar, but in fact has deep differences. There is often a lot of pressure in software teams to reduce duplicated code, even in cases where such factoring is not warranted.


So as with most things in software, it’s a balancing act. Eliminating duplicate code is a good thing in the majority of cases; but be careful not to take it too far.


Footnotes

¹ All human writing systems allow text to be scanned and interpreted sequentially. The mechanism of footnotes, however, breaks this pattern, requiring the reader to consume the text in a non-linear fashion, “jumping around” from place to place. In literature, footnotes are generally only used for non-essential text, so that the reader has the option to skip over them if they wish.