Part 2: Cassandra 1.0.x Internode Communication (Draft)

名剑传奇

Published 2011-11-23 13:27:54

Part 2 translates the sections on internode communication, cluster membership, and failure detection and recovery.

Original text

About Internode Communications (Gossip)

Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about.

In Cassandra, the gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster. A gossip message has a version associated with it, so that during a gossip exchange, older information is overwritten with the most current state for a particular node.
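To make the versioned exchange concrete, here is a minimal Python sketch of one gossip round. It assumes a simplified model in which each node keeps a map from endpoint to (version, state) and an incoming entry wins only if its version is higher. The class and method names (GossipNode, merge, gossip_round) are hypothetical; Cassandra's real gossiper is more elaborate (it compares digests over a multi-message handshake and tracks a generation number alongside the version), so treat this only as an illustration of "newer version overwrites older".

```python
import random


class GossipNode:
    """Toy gossip participant that tracks (version, state) for every endpoint it knows."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        # endpoint name -> (version, state); starts out knowing only itself
        self.known = {name: (0, "NORMAL")}

    def update_own_state(self, state):
        # Every local change bumps the version so peers can tell it is newer.
        self.version += 1
        self.known[self.name] = (self.version, state)

    def merge(self, remote_known):
        # For each endpoint, keep whichever entry carries the higher version.
        for endpoint, (version, state) in remote_known.items():
            local = self.known.get(endpoint)
            if local is None or version > local[0]:
                self.known[endpoint] = (version, state)

    def gossip_round(self, cluster, fanout=3, rng=random):
        # Exchange state with up to `fanout` randomly chosen peers; both sides merge.
        peers = [node for node in cluster if node is not self]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            peer.merge(self.known)
            self.merge(peer.known)


if __name__ == "__main__":
    cluster = [GossipNode(f"node{i}") for i in range(4)]
    cluster[0].update_own_state("LEAVING")
    # In a 4-node cluster, a fanout of 3 reaches every other node in one round.
    cluster[0].gossip_round(cluster)
    print([node.known["node0"] for node in cluster])  # four copies of (1, 'LEAVING')
```

Because every node repeats such a round each second with a few random peers, a state change spreads through the cluster in a handful of rounds, and a stale copy can never overwrite a fresher one.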

Translation

About Internode Communication (Gossip) (translation draft)

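Cassandra uses a protocol called "gossip" to discover the location and state of the nodes that have joined the cluster. Gossip is a peer-to-peer (P2P) protocol through which cluster nodes periodically exchange state information, both about themselves and about the other nodes they have learned of.

The gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. What the nodes exchange is information about themselves and about the other nodes they have gossiped about, so every node in the cluster quickly learns about all the others.

Every gossip message carries a version, so that during a gossip exchange, older information about a particular node is overwritten by that node's most current state.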

Translator's note

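In Chinese, "gossip" is 八卦 (bagua), the same word as the eight trigrams, a mysterious Chinese "technique". The bagua is said to be a scientific system that connects in all directions, and since its elements are all linked together, there must be cause-and-effect relationships among them. Cassandra's gossip technique really does rival the Chinese bagua, and it goes further by turning the idea into a practical, workable solution.

All over the world there is something called gossip news (八卦新闻 in Chinese), which takes its name from the same property: you pass it to me, I pass it to you, you pass it to someone else. It resembles a 1-to-N data relationship in which N is unknown; it could be 10 or 10,000, just as you can never estimate how many people will hear a piece of gossip. Yet gossip spreads very quickly, because everyone is interested in it.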

Original text

About Cluster Membership and Seed Nodes

When a node first starts up, it looks at its configuration file to determine the name of the Cassandra cluster it belongs to and which node(s), called seeds, to contact to obtain information about the other nodes in the cluster. These cluster contact points are configured in the cassandra.yaml configuration file for a node.

To prevent partitions in gossip communications, all nodes in a cluster should have the same list of seed nodes listed in their configuration file. This is most critical the first time a node starts up. By default, a node will remember other nodes it has gossiped with between subsequent restarts.
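Since a mismatched seed list is how a cluster ends up partitioned, an operator might want to check that every node agrees before starting it. The sketch below assumes each node's seed addresses have already been read out of its cassandra.yaml (in Cassandra 1.0 they sit in the comma-separated `seeds` parameter under `seed_provider`); reading the files is left out, and the helper names are hypothetical.

```python
from collections import defaultdict


def seed_list_groups(seeds_by_node):
    """Group nodes by the seed list they are configured with (order ignored).

    seeds_by_node maps a node address to its list of seed addresses, e.g. the
    comma-separated `seeds` value from that node's cassandra.yaml, already parsed.
    """
    groups = defaultdict(list)
    for node, seeds in seeds_by_node.items():
        groups[frozenset(seeds)].append(node)
    return dict(groups)


def seeds_consistent(seeds_by_node):
    # A single group means every node bootstraps gossip from the same seeds.
    return len(seed_list_groups(seeds_by_node)) <= 1


if __name__ == "__main__":
    cluster = {
        "10.0.0.1": ["10.0.0.1", "10.0.0.2"],
        "10.0.0.2": ["10.0.0.2", "10.0.0.1"],
        "10.0.0.3": ["10.0.0.3"],  # seeds only itself, so on first start it may gossip alone
    }
    print(seeds_consistent(cluster))  # False
    for seeds, nodes in seed_list_groups(cluster).items():
        print(sorted(seeds), "<-", sorted(nodes))
```

Comparing the lists as sets rather than in file order is a deliberate choice: the order of seeds does not change which nodes get contacted, only the membership of the list matters here.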

Note

The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes.

To know what range of data it is responsible for, a node must also know its own token and those of the other nodes in the cluster. When initializing a new cluster, you should generate tokens for the entire cluster and assign an initial token to each node before starting up. Each node will then gossip its token to the others. See About Data Partitioning in Cassandra for more information about partitioners and tokens.
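Generating tokens "for the entire cluster" usually means spacing them evenly around the token ring. A minimal sketch, assuming the RandomPartitioner (the Cassandra 1.0 default), whose tokens range from 0 to 2**127; the function name is hypothetical, and other partitioners need a different scheme.

```python
def random_partitioner_tokens(node_count):
    """Evenly spaced initial tokens over RandomPartitioner's 0 .. 2**127 token range."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]


if __name__ == "__main__":
    # Each printed value would be set as initial_token in that node's cassandra.yaml
    # before the node is started for the first time.
    for i, token in enumerate(random_partitioner_tokens(4)):
        print(f"node {i}: initial_token: {token}")
```

Python's arbitrary-precision integers matter here: the values exceed 64 bits, so computing them with fixed-width integer types would overflow.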

Translation

About Cluster Membership and Seed Nodes

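When a node starts up for the first time, it reads its configuration file to determine the name of the cluster it belongs to and which nodes, called seed nodes, it should contact to obtain information about the other nodes in the cluster. These contact points are set in each node's cassandra.yaml file.

To prevent partitions in gossip communication, every node in the cluster should list the same seed nodes in its configuration file. This matters most the first time a node starts. By default, a node remembers the other nodes it has gossiped with across subsequent restarts.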

Note

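Designating a node as a seed serves only to bootstrap the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, and beyond bootstrapping nodes they have no other special role in cluster operations.

To know which range of data it is responsible for, each node must know its own token and the tokens of the other nodes in the cluster. When initializing a new cluster, you should generate tokens for the entire cluster and assign an initial token to each node before starting it; each node then gossips its token to the others. For more on partitioners and tokens, see "About Data Partitioning in Cassandra".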

Original text

About Failure Detection and Recovery

Failure detection is a method for locally determining, from gossip state, if another node in the system is up or down. Failure detection information is also used by Cassandra to avoid routing client requests to unreachable nodes whenever possible. (Cassandra can also avoid routing requests to nodes that are alive, but performing poorly, through the dynamic snitch.)
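The routing rule in this paragraph can be pictured as a filter followed by a ranking. The sketch below assumes a local liveness map (what the failure detector concluded from gossip) and a per-replica latency figure used as a crude stand-in for the dynamic snitch's score; the function and its inputs are hypothetical, and the real dynamic snitch scores replicas in a more involved way.

```python
def order_replicas(replicas, is_alive, latency_ms):
    """Drop replicas that are locally marked down, then prefer the fastest ones.

    replicas:   candidate endpoints that hold a copy of the requested key
    is_alive:   endpoint -> bool, the local failure detector's verdict from gossip state
    latency_ms: endpoint -> recent average latency, a crude stand-in for a
                dynamic-snitch score (hypothetical; the real snitch is more involved)
    """
    live = [r for r in replicas if is_alive.get(r, False)]
    # Endpoints with no latency sample yet sort after those that have one.
    return sorted(live, key=lambda r: latency_ms.get(r, float("inf")))


if __name__ == "__main__":
    replicas = ["A", "B", "C"]
    alive = {"A": True, "B": False, "C": True}
    latency = {"A": 40.0, "B": 1.0, "C": 5.0}
    print(order_replicas(replicas, alive, latency))  # ['C', 'A']: B is down, C is faster
```

Note that B is excluded even though it reports the lowest latency: being reachable is checked first, and performance only decides the order among live replicas.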

The gossip process tracks heartbeats from other nodes both directly (nodes gossiping directly to it) and indirectly (nodes heard about secondhand, thirdhand, and so on). Rather than have a fixed threshold for marking nodes without a heartbeat as down, Cassandra uses an accrual detection mechanism to calculate a per-node threshold that takes into account network conditions, workload, or other conditions that might affect perceived heartbeat rate. During gossip exchanges, every node maintains a sliding window of inter-arrival times of gossip messages from other nodes in the cluster. The value of phi is based on the distribution of inter-arrival time values across all nodes in the cluster. In Cassandra, configuring the phi_convict_threshold property adjusts the sensitivity of the failure detector. The default value is fine for most situations, but DataStax recommends increasing it to 12 for Amazon EC2 due to the network congestion frequently experienced on that platform.
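A minimal sketch of an accrual (phi) detector helps show what phi_convict_threshold tunes. It assumes inter-arrival times are exponentially distributed, a common simplification for phi accrual detectors, in which case phi reduces to the current silence divided by (mean interval times ln 10). The class name, the 1000-sample window, and the use of seconds are illustrative assumptions rather than Cassandra's code; only the threshold values 8 (the default) and 12 (the EC2 recommendation) come from the configuration discussed above.

```python
import math
from collections import deque


class ArrivalWindow:
    """Sliding window of heartbeat inter-arrival times for one remote node."""

    def __init__(self, size=1000):
        self.intervals = deque(maxlen=size)  # oldest samples fall off automatically
        self.last_arrival = None

    def heartbeat(self, now):
        # Record the gap (in seconds) since the previous heartbeat from this node.
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        # Assume exponentially distributed gaps: P(still silent after t) = exp(-t / mean).
        # phi = -log10 of that probability, which simplifies to t / (mean * ln 10).
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        return (now - self.last_arrival) / (mean * math.log(10))


def is_convicted(window, now, phi_convict_threshold=8.0):
    # The node is considered down once phi exceeds the configured threshold.
    return window.phi(now) > phi_convict_threshold


if __name__ == "__main__":
    window = ArrivalWindow()
    for second in range(10):          # steady heartbeats once per second
        window.heartbeat(float(second))
    for silence in (10, 20, 30):      # seconds since the last heartbeat
        now = 9.0 + silence
        print(silence, round(window.phi(now), 2),
              is_convicted(window, now), is_convicted(window, now, 12.0))
```

Under these assumptions, with one-second heartbeats a node is convicted after about 18.4 seconds of silence at a threshold of 8 (8 × ln 10) and after about 27.6 seconds at 12. Raising the threshold therefore buys tolerance for congested networks at the cost of slower detection, and because the threshold is expressed relative to the observed mean interval, a node whose heartbeats normally arrive slowly is given proportionally more time.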

Node failures can result from various causes such as hardware failures, network outages, and so on. Node outages are often transient but can last for extended intervals. A node outage rarely signifies a permanent departure from the cluster, and therefore does not automatically result in permanent removal of the node from the ring. Other nodes will still try to periodically initiate gossip contact with failed nodes to see if they are back up. To permanently change a node’s membership in a cluster, administrators must explicitly add or remove nodes from a Cassandra cluster using the nodetool utility.
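The idea that live nodes keep probing failed ones can be sketched as a per-round target choice: always gossip with one live peer, and occasionally also with a peer currently marked down. The probability used below, len(unreachable) / (len(live) + 1), is an assumption loosely modeled on Cassandra's gossiper and should be checked against its source; the function name is hypothetical.

```python
import random


def choose_gossip_targets(live, unreachable, rng=random):
    """Pick the peers one node contacts in a single gossip round (simplified).

    One random live peer is always contacted. In addition, one peer that is
    currently marked down is probed with probability
    len(unreachable) / (len(live) + 1), so failed nodes keep being contacted
    and are noticed again once they come back.
    """
    targets = []
    if live:
        targets.append(rng.choice(sorted(live)))
    if unreachable:
        probability = len(unreachable) / (len(live) + 1)
        if rng.random() < probability:
            targets.append(rng.choice(sorted(unreachable)))
    return targets


if __name__ == "__main__":
    rng = random.Random(42)
    rounds = 10_000
    probes = sum(
        len(choose_gossip_targets({"A", "B", "C"}, {"X"}, rng)) == 2
        for _ in range(rounds)
    )
    # With 3 live peers and 1 down peer, X is probed in roughly 1/4 of rounds.
    print(f"down node probed in {probes} of {rounds} rounds")
```

Probing down nodes only some of the time keeps them from being forgotten without wasting most of each round on unreachable peers. When no peer is live, the probability reaches 1 and a down node is probed every round. Nothing in this loop ever removes a node from membership; as the paragraph says, that takes an explicit nodetool operation.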

When a node comes back online after an outage, it may have missed writes for the replica data it maintains. Once the failure detector marks a node as down, missed writes are stored by other replicas if hinted handoff is enabled (for a period of time, anyway). However, it is possible that some writes were missed between the interval of a node actually going down and when it is detected as down. Or if a node is down for longer than max_hint_window_in_ms (one hour by default), hints will no longer be saved. For that reason, it is best practice to routinely run nodetool repair on all nodes to ensure they have consistent data, and to also run repair after recovering a node that has been down for an extended period.
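The hint-window rule can be written as two small checks. This is a sketch, not Cassandra's code: the function names are hypothetical, hinted_handoff_enabled and max_hint_window_in_ms mirror the cassandra.yaml settings named above (with the documented one-hour default), and whether a downtime exactly equal to the window still counts as inside it is an assumption.

```python
HOUR_MS = 60 * 60 * 1000  # 3,600,000 ms: the documented default for max_hint_window_in_ms


def should_store_hint(down_since_ms, now_ms, hinted_handoff_enabled=True,
                      max_hint_window_in_ms=HOUR_MS):
    """Would a coordinator still save a hint for a write aimed at this down replica?"""
    if not hinted_handoff_enabled:
        return False
    # Whether the boundary itself counts as "inside" the window is an assumption.
    return now_ms - down_since_ms <= max_hint_window_in_ms


def hints_cover_outage(downtime_ms, max_hint_window_in_ms=HOUR_MS):
    """True if the whole (detected) outage falls inside the hint window.

    Even then, writes missed before the failure detector marked the node down
    were never hinted, so routine nodetool repair is still advisable.
    """
    return downtime_ms <= max_hint_window_in_ms


if __name__ == "__main__":
    for minutes in (10, 59, 61, 180):
        downtime = minutes * 60 * 1000
        print(f"down {minutes:>3} min -> hints cover outage: {hints_cover_outage(downtime)}")
```

A False result from hints_cover_outage is the "down for an extended period" case in the paragraph, where running repair after recovery is the only way to bring the node's replicas back in line.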

Translation

About Failure Detection and Recovery

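Failure detection is a way for a node to determine locally, from gossip state, whether another node in the system is up or down. Cassandra also uses failure-detection information to avoid routing client requests to unreachable nodes whenever possible. (Through the dynamic snitch, Cassandra can also avoid routing requests to nodes that are alive but performing poorly.)

The gossip process tracks heartbeats from other nodes both directly (from nodes gossiping with it) and indirectly (through nodes heard about secondhand, thirdhand, and so on). Instead of using a fixed threshold to mark a node without a heartbeat as down, Cassandra uses an accrual detection mechanism that computes a per-node threshold, taking into account network conditions, workload, and other factors that may affect the observed heartbeat rate. During gossip exchanges, each node maintains a sliding window of the inter-arrival times of gossip messages from the other nodes. The value of phi is based on the distribution of these inter-arrival times across all nodes in the cluster. In Cassandra, the phi_convict_threshold property adjusts the sensitivity of the failure detector. The default value works for most situations, but because of the frequent network congestion on Amazon EC2, DataStax recommends raising it to 12 there.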

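Node failures can have many causes, such as hardware failures and network outages. Node outages are often transient but can last for extended periods. An outage rarely means that a node has left the cluster for good, so it does not automatically cause the node to be permanently removed from the ring. Other nodes keep trying periodically to initiate gossip with failed nodes to see whether they are back up. To permanently change a node's membership in a cluster, an administrator must explicitly add or remove the node with the nodetool utility.

When a node comes back online after an outage, it may have missed writes for the replica data it holds. Once the failure detector marks a node as down, and if hinted handoff is enabled, the other replicas store the writes it misses (though only for a limited time). However, some writes may still be missed in the interval between the node actually going down and its being detected as down. And if a node stays down longer than max_hint_window_in_ms (one hour by default), hints are no longer saved. For this reason, the best practice is to run nodetool repair routinely on all nodes to keep their data consistent, and also to run repair after recovering a node that has been down for an extended period.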

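Translator's note: This translation is quite poor; the main problem is that I still need to understand Cassandra's complex failure-handling mechanisms better. I also found the wording very clumsy in places, and I hope experienced readers can help review it.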