How hard can IT translation be? Correcting a CSDN translated article

Latest recommended article published 2014-08-20 07:31:26

Wolf0403

Article link: https://blog.csdn.net/Wolf0403/article/details/8473437

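Today Fenng posted a Weibo link to a CSDN translated article, 《GitHub历史上最糟糕宕机事故回放及反省》 (roughly, "A replay of and reflection on the worst outage in GitHub's history").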

http://weibo.com/1577826897/zdd2J1oh8

Because I trust the Fenng brand, I clicked through to read it.

The CSDN translation opens with this paragraph:

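"The OSChina community translated part of this blog post, but unfortunately left out a lot of important content. CSDN has compiled and translated the remaining parts to share with everyone."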

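So this Chinese version has been through at least two rounds of translation and review. And yet... I spent two and a half hours checking it against the original. Ignoring all layout problems and clumsy sentences, I corrected only the most basic and most serious semantic errors, listed below. Apart from items 1 and 2, every error falls into one of two groups: content or referents rendered wrongly, or a meaning that is the exact opposite of the original.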

CSDN translation: http://www.csdn.net/article/2013-01-05/2813427-Github-Downtime-last-Saturday

Original post: https://github.com/blog/1364-downtime-last-saturday

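Each correction below gives the English original first (items 1 and 2 have none), then the CSDN translation, then my translation.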

1、
在次之前，……
在此之前，……

2、
(n-service
In-service

3、
... but our traffic was low enough at the time that it didn't pose any real problems.

并没有解决真正的问题

并没有导致真正的问题

4、

... to revert the software update and return to a redundant state at 1300 PST if we did not have a plan for resolving the issues ...
并且回滚到太平洋时间13:00的状态
并且在 13:00 PST 的时候开始回滚升级，恢复充分冗余的状态

5、

... When the agent on one of the switches is terminated, the peer has a 5 second timeout period where it waits to hear from it again. If it does not hear from the peer, but still sees active links between them, it assumes that the other switch is still running but in an inconsistent state. In this situation it is not able to safely takeover the shared resources so it...

一个部署在交换机上的代理被终止，一个节点在等待再次响应时出现了5秒的延迟。节点间无法彼此响应，但它们之间的链路仍然是联通的，可以预想其他的交换机也在以类似的状态运行，但都已经处于不能同步(消息)的状态。在这种情况下，交换机之间无法安全的接管共享资源，因此它……

一个部署在交换机上的代理被终止后，（成对部署的）另一个节点将等待 5 秒钟窗口期判断前者是否会恢复。如果它无法收到第一个节点的响应，却看到两者之间链路处于活跃状态，它会默认对方处于运行但状态不同步的情况。在这种情况下，它不能安全地接管与另一个路由器共同管理的资源，因此它……
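The peer's rule in item 5 is easy to misread, as the CSDN version shows, so here is a minimal Python sketch of it as a decision function. It is a hypothetical model and not the switch vendor's code: `peer_decision`, `Action` and its member names are invented for illustration, and the branch where a silent peer with dead links can be taken over safely is an assumption. Only the 5-second window and the "no heartbeat but links still active, so assume the peer is running but inconsistent and do not take over" logic come from the original post.

```python
from enum import Enum

# From the original post: the peer waits 5 seconds to hear from the other switch.
PEER_TIMEOUT_S = 5.0


class Action(Enum):
    NO_ACTION = "within the timeout window, keep waiting"
    SAFE_TAKEOVER = "take over the shared resources"           # assumed branch
    DISRUPTIVE_FAILOVER = "fall back to the disruptive method"


def peer_decision(seconds_since_heartbeat: float, link_active: bool) -> Action:
    """Hypothetical model of how a switch judges its silent peer (item 5)."""
    if seconds_since_heartbeat < PEER_TIMEOUT_S:
        # The peer may still come back; no conclusion yet.
        return Action.NO_ACTION
    if link_active:
        # No heartbeat, yet the links are up: assume the peer is running
        # but in an inconsistent state, so taking over is not safe.
        return Action.DISRUPTIVE_FAILOVER
    # Assumption (not stated in the post): a silent peer with dead links
    # is really gone, so a clean takeover is possible.
    return Action.SAFE_TAKEOVER


if __name__ == "__main__":
    print(peer_decision(3.0, link_active=True))   # Action.NO_ACTION
    print(peer_decision(6.0, link_active=True))   # Action.DISRUPTIVE_FAILOVER
    print(peer_decision(6.0, link_active=False))  # Action.SAFE_TAKEOVER
```

The point the CSDN version lost is visible in the middle branch: the "inconsistent state" belongs to the silent peer as judged by the waiting switch, not to "the other switches" in general.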

6、

When the agent was terminated on the first switch, the links between peers did not go down since the agent is unable to instruct the hardware to reset the links. They do not reset until the agent restarts and is again able to issue commands to the underlying switching hardware. With unlucky timing and the extra time that is required for the agent to record its running state for analysis, the link remained active long enough for the peer switch to detect a lack of heartbeat messages while still seeing an active link and failover using the more disruptive method.

当在第一台交换机上的代理被终止，节点间的链路并没有减少，只是代理无法指示硬件重置链路，直到代理重新启动，并再次向相关交换器硬件发出命令。
(Note: the second half of this passage is missing from the CSDN version.)
当第一个交换机上的代理被终止时，这一对路由器之间的链接并未被中断，因为代理无法操作硬件去中断链接。只有当代理程序重新启动后才可能发送命令操作底层硬件。当时间非常不巧，且路由器还需要更多额外时间由代理程序为分析而记录运行状态时，这一对路由器之间的链接保持了足够长时间的活跃状态，最终使得对端路由器发现了在活跃线路上心跳消息的缺失，因此进行了后面这种有着更强破坏性的故障转移操作。
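Item 6 is essentially a race between the agent's restart and the peer's timeout. The sketch below models that race under stated assumptions: the function name and every duration are hypothetical (the post gives no restart times). Only the 5-second peer window, and the fact that the links stay up until the restarted agent can reset them (including the extra time it spends recording its running state), come from the original.

```python
# Peer wait window from the original post (item 5).
PEER_TIMEOUT_S = 5.0


def triggers_disruptive_failover(restart_s: float, state_dump_s: float,
                                 timeout_s: float = PEER_TIMEOUT_S) -> bool:
    """Hypothetical timeline for item 6.

    t = 0: the agent on the first switch is terminated. Heartbeats stop,
    but the links stay up, because nothing can tell the hardware to reset
    them until the agent is running again.
    """
    # The agent only returns after restarting *and* recording its running
    # state for analysis; until then the link still looks active to the peer.
    agent_back_at = restart_s + state_dump_s
    # If the peer's window expires first, it sees "no heartbeat, link
    # active" and takes the more disruptive failover path.
    return agent_back_at >= timeout_s


if __name__ == "__main__":
    print(triggers_disruptive_failover(3.0, 1.0))                  # False
    print(triggers_disruptive_failover(3.0, 2.5))                  # True
    print(triggers_disruptive_failover(3.0, 2.5, timeout_s=10.0))  # False
```

With these made-up numbers, a 3 s restart plus a 1 s state dump stays inside the window, while a 2.5 s dump pushes it past the window. Passing a larger `timeout_s` corresponds to the vendor's plan in item 14 to allow more time for link failure to be detected.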

7、

When this happened it caused a great deal of churn within the network as all of our aggregated links had to be re-established, leader election for spanning-tree had to take place, and all of the links in the network had to go through a spanning-tree reconvergence. This effectively caused ...

当发生这些的时候它引起巨大的流量损失并且我们所有的链路要重新建立，leader选择使用生成树协议(spanning-tree网络协议)，并且所有网络中的链路通过生成树协议恢复。
这个过程带来了网络内部的巨大波动，因为所有的聚合链路要重新建立、spanning-tree 协议中要求的领导者选举过程必须完成，而且网络中的所有链路都必须重新进行 spanning-tree 的收敛过程。这一切直接导致了……

8、

We want to be certain that we don't wind up in a "split-brain" situation where data is written to both nodes simultaneously since this could result in potentially unrecoverable data corruption.

我们想确保我们没有进入一个”精神分裂“状态，比如数据写入到了两个节点中而且还无法回复这个数据传输中断的情况。
我们想确保我们没有进入“精神分裂”（也就是数据被同时写入两个节点）的状态，因为这样可能导致无法恢复的数据错误。

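(Note: the error in the first half is not serious. It merely treats a clause that defines the term, the one and only case, as one example among several. As for the second half... was it machine-translated?)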

9、

When the network recovered and the cluster messaging between nodes came back, a number of pairs were in a state where both nodes expected to be active for the same resource. This resulted in a race where the nodes terminated one another and we wound up with both nodes stopped for a number of our fileserver pairs.
当网络恢复的时候，分布在节点间的簇信息传送回来，很多对结点都在抢相同的资源，由此导致严重竞争，我们关闭了这些节点。
当网络恢复、节点之间的消息被送达之后，许多对服务器都处于这样的状态：两台服务器都认为自己应该接管共享的资源。这个竞争状态导致两台服务器互相停止了对端（译者注：利用前述 STONITH 进程）。我们有多对文件服务器最终都处于全部停止的状态。
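The race in item 9, and the split-brain concern from item 8 behind it, can be modeled as two nodes that each act on the same snapshot of the pair. This is a hypothetical Python sketch, not GitHub's cluster software: `Node`, `stonith_round` and the node names are invented, and STONITH ("shoot the other node in the head") fencing is reduced to setting a flag.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    wants_active: bool   # believes it should own the shared resource
    running: bool = True


def stonith_round(a: Node, b: Node) -> None:
    """Hypothetical model of the race in item 9."""
    # Both decisions come from the same snapshot, taken right after cluster
    # messaging returns, so neither node can see the other's action: the race.
    a_fences_b = a.running and a.wants_active and b.wants_active
    b_fences_a = b.running and b.wants_active and a.wants_active
    # Each node stops its peer rather than risk split-brain writes.
    if a_fences_b:
        b.running = False
    if b_fences_a:
        a.running = False


if __name__ == "__main__":
    a, b = Node("fs1a", True), Node("fs1b", True)   # both expect to be active
    stonith_round(a, b)
    print(a.running, b.running)   # False False: the whole pair is stopped
    c, d = Node("fs2a", True), Node("fs2b", False)  # only one expects to be active
    stonith_round(c, d)
    print(c.running, d.running)   # True True
```

This is exactly what the CSDN version reversed. GitHub did not shut the nodes down ("我们关闭了这些节点"). The nodes stopped each other.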

10、

We monitored the network for roughly thirty minutes to ensure that it was stable before beginning recovery.
严密监视网络30分钟，以确保是否稳定复苏
（在开始恢复操作之前）我们严密监视网络 30 分钟……

(Note: the CSDN version turns a step taken before recovery began into the last step of the recovery sequence. Worrying.)

11、

When both nodes are stopped in this way it's important that the node that was active before the failure is active again when brought back online, since it has the most up to date view of what the current state of the filesystem should be.
当双节点由于上诉故障停止工作后，待再次重新联机的时候再次激发这些活跃节点尤为重要，因为它们影响到文件系统的当前状态。
如前述被停止的成对节点在恢复时，重新激活故障之前已被激活运行的节点尤为重要，因为它拥有对于当前文件系统应该所处的正确状态的最新信息。
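Item 11's recovery rule, which the CSDN version rendered as re-activating "these active nodes", comes down to choosing which node of a stopped pair to start first. Below is a minimal Python sketch that assumes each pair records which node was active before the failure. `PairMember` and `recovery_order` are hypothetical names and do not belong to any real tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairMember:
    name: str
    was_active_before_failure: bool


def recovery_order(pair: List[PairMember]) -> List[str]:
    """Hypothetical helper: bring the previously active node back first.

    That node holds the most up-to-date view of what the filesystem state
    should be, so it must become active again; its partner follows.
    """
    active = [m for m in pair if m.was_active_before_failure]
    if len(active) != 1:
        raise ValueError("expected exactly one node to have been active before the failure")
    first = active[0]
    return [first.name] + [m.name for m in pair if m is not first]


if __name__ == "__main__":
    pair = [PairMember("fs1b", False), PairMember("fs1a", True)]
    print(recovery_order(pair))   # ['fs1a', 'fs1b']
```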

12、

This recovery was a very time consuming process and we made the decision to leave the site in maintenance mode until we had recovered every fileserver pair.
这种恢复是一个非常耗时的过程，我们决定在维护模式下离开现场，直到最终恢复每一个文件服务器对。
……，我们决定在所有文件服务器对全部恢复之前，一直保持（GitHub.com 运行在）维护模式。

13、

... and we returned the site to service at 20:23 PST.
……我们在太平洋时间20:23返回现场工作。
……我们在太平洋时间 20:23 重新上线。

14、

our vendor plans to revisit the respective timeouts so that more time is given for link failure to be detected to guard against this type of event.
我们的网络供应商将重新审查个别延迟时间的状况，以应对链路故障的问题并防止此类事情的再次发生。
我们的网络供应商将重新审查个别延迟时间的状况，（通过增加路由器超时设置）使路由器有更多时间去检查判断链路故障，以防止此类事件再次发生。

15、

We are postponing any software upgrades to the aggregation network until we have a functional duplicate of our production environment in staging to test against.
我们已经推迟了所有针对聚合网络的软件升级事宜，直到我们测试成功生产环境的功能复制模式。
……直到我们在测试环境（Staging）建立起与生产环境功能完全一致的副本用于测试。

16、

The fact that the cluster communication between fileserver nodes relies on any network infrastructure has been a known problem for some time. We're actively working with our hosting provider to address this.

文件服务器节点所依赖的网络设施发生故障影响了集群的通信，我们积极地与主机提供商协调并解决了这个问题。
文件服务器节点之间的通信依赖于（其它的）网络基础设施是一个长久以来已知的问题。我们正在与主机提供商积极协调、寻找解决这个问题的方法。

17、

We are reviewing all of our high availability configurations with fresh eyes to make sure that the failover behavior is appropriate.
我们正在重新评估我们的高可用性配置环境，并以全新的方式去实现故障迁移。
我们引入了新的人员对我们的高可用性配置进行重新评估，以确保故障迁移行为是合适的。

That's all.