Implementing a Consistent Hashing Algorithm and Testing Its Balance

Latest recommended article published on 2022-03-27 21:14:18

测试猿David

Article link: https://blog.csdn.net/weixin_50271247/article/details/108776895

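1. Implement a consistent hashing algorithm in a programming language you are familiar with.

2. Write test cases for the algorithm: with 1,000,000 key-value pairs and 10 server nodes, compute the standard deviation of the number of pairs stored on each server, to evaluate how unevenly the algorithm distributes the storage load.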

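Consistent hashing was proposed at MIT in 1997 as an algorithm for implementing distributed hashing (DHT). It was designed to solve the hot spot problem on the Internet, a goal very similar to that of CARP. Consistent hashing fixes the problems caused by the simple hashing that CARP uses, which makes distributed hashing (DHT) truly usable in P2P environments.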

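Consistent hashing defines four properties for judging the quality of a hash algorithm in a dynamically changing cache environment:

1. Balance: the hash results should be spread across all caches as evenly as possible, so that all cache space is used. Many hash algorithms satisfy this condition.

2. Monotonicity: if some content has already been assigned to caches by hashing and new caches are then added to the system, the hash results should guarantee that previously assigned content maps either to its original cache or to a new cache, and never to a different cache in the old set.

3. Spread: in a distributed environment, a client may not see all caches, only a subset of them. When clients map content to caches by hashing, different clients may see different sets of caches and therefore get inconsistent results, so the same content ends up mapped to different caches by different clients. This should clearly be avoided, because storing the same content in several caches lowers the storage efficiency of the system. Spread is defined as the severity of this effect. A good hash algorithm should avoid such inconsistency as far as possible, that is, keep spread low.

4. Load: load looks at the spread problem from the other side. Since different clients may map the same content to different caches, a particular cache may also be assigned different content by different users. As with spread, this should be avoided, so a good hash algorithm should keep the load on each cache as low as possible.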
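The monotonicity property can be checked with a small experiment. The following is a minimal sketch, not the code used later in this article: the class name `MonotonicityDemo`, the server names, the `"#"` virtual-node naming scheme, and the stand-in `hash` function are all assumptions made for illustration. It records where each key maps, adds a new server, and counts the keys that moved to any server other than the new one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MonotonicityDemo {
    // Hypothetical stand-in hash for illustration only: mixes the bits of String.hashCode().
    static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        return h;
    }

    // Place `replicas` virtual nodes for one server on the ring.
    static void addServer(TreeMap<Integer, String> ring, String server, int replicas) {
        for (int i = 0; i < replicas; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    // Walk clockwise: first virtual node at or after the key's hash, wrapping to the start.
    static String lookup(TreeMap<Integer, String> ring, String key) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Returns how many keys moved to a server other than the newly added one.
    static int countBadMoves(int numKeys) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        for (String s : new String[] {"A", "B", "C"}) {
            addServer(ring, s, 100);
        }
        List<String> before = new ArrayList<>();
        for (int k = 0; k < numKeys; k++) {
            before.add(lookup(ring, "key-" + k));
        }

        addServer(ring, "D", 100); // a new cache joins the system

        int badMoves = 0;
        for (int k = 0; k < numKeys; k++) {
            String after = lookup(ring, "key-" + k);
            if (!after.equals(before.get(k)) && !after.equals("D")) {
                badMoves++;
            }
        }
        return badMoves;
    }

    public static void main(String[] args) {
        // Prints 0: a key either stays where it was or moves to the new server D.
        System.out.println("keys moved between old servers: " + countBadMoves(10000));
    }
}
```

The count is zero regardless of the hash function used, because adding virtual nodes for D can only insert D entries between a key and its previous owner on the ring.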

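Implementing consistent hashing and testing its balance involves three key points:

1) Choice of hash function: MurmurHash is used. MurmurHash is a non-cryptographic hash function suited to general hash-based lookup. It was created by Austin Appleby in 2008, several variants have appeared since, and all of them have been released into the public domain. Compared with other popular hash functions, MurmurHash distributes keys that follow a strong pattern more randomly.

2) Generating the 1,000,000 data items: random numbers are used so that the test is realistic.

3) Lookup efficiency: a TreeMap is used to speed up lookups. TreeMap is an ordered key-value collection implemented as a red-black tree.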
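To make the third point concrete, here is a minimal sketch of a clockwise lookup on a ring stored in a `TreeMap`. The class name `RingLookupDemo`, the node names, and the ring positions 100, 200, and 300 are made up for illustration; a real ring would use hash values as keys.

```java
import java.util.Map;
import java.util.TreeMap;

public class RingLookupDemo {
    // Returns the node that owns `position`: the first entry at or after it,
    // or the smallest entry when the search runs past the end of the ring.
    static String owner(TreeMap<Integer, String> ring, int position) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(position); // O(log n) tree search
        if (e == null) {
            e = ring.firstEntry(); // wrap around to the start of the ring
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ring = new TreeMap<>();
        ring.put(100, "node-A");
        ring.put(200, "node-B");
        ring.put(300, "node-C");

        System.out.println(owner(ring, 150)); // node-B
        System.out.println(owner(ring, 200)); // node-B (an exact match counts)
        System.out.println(owner(ring, 301)); // node-A (wraps around)
    }
}
```

Because the map keeps its keys sorted, finding the next node clockwise is a single tree search rather than a scan over all virtual nodes.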

Code:

package hash;

// Holds the information of one cache server (name, IP, port) and counts the entries stored on it
public class CacheNode {

	private String name; // node name
	private String ip; // node IP address
	private String port; // port
	private int numCache; // number of entries stored on this node

	public CacheNode(String strName, String strIp, String StrPort) {
		name = strName;
		ip = strIp;
		port = StrPort;
		numCache = 0;
	}

	// Record one more entry stored on this node
	public void AddNumCache() {
		numCache++;
	}

	public int GetNumCache() {
		return numCache;
	}
}



package hash;

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class Shard<S> { // S encapsulates the information of a machine node

	private TreeMap<Integer, S> nodes; // hash ring: virtual node hash -> physical node
	private List<S> shards; // list of physical machine nodes
	private int VirNodeNum; // number of virtual nodes per physical node

	public Shard(List<S> shards, int nodeNum) {
		super();
		this.shards = shards;
		VirNodeNum = nodeNum;
		init();
	}

	private void init() { // build the consistent hash ring
		nodes = new TreeMap<Integer, S>();
		for (int i = 0; i != shards.size(); ++i) { // every physical node gets its own virtual nodes
			final S shardInfo = shards.get(i);
			for (int n = 0; n < VirNodeNum; n++) {
				// map VirNodeNum virtual nodes to this physical node
				nodes.put(hash("SHARD-" + i + "-NODE-" + n), shardInfo);
			}
		}
	}

	public S getShardInfo(String key) {
		// Walk clockwise: first virtual node whose hash is >= the key's hash (one O(log n) search)
		Map.Entry<Integer, S> entry = nodes.ceilingEntry(hash(key));
		if (entry == null) {
			entry = nodes.firstEntry(); // past the last virtual node: wrap around to the first
		}
		return entry.getValue(); // the physical node behind that virtual node
	}

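	/*
	 * MurmurHash is a non-cryptographic hash algorithm with very high performance.
	 * It is much faster than traditional hash algorithms such as CRC32, MD5, and SHA-1
	 * (the latter two are cryptographic hashes, inherently complex and therefore unavoidably slower),
	 * and its collision rate is reportedly low. http://murmurhash.googlepages.com/
	 */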

	

    /**
     * Generates 32 bit hash from byte array of the given length and
     * seed.
     *
     * @param data byte array to hash
     * @param length length of the array to hash
     * @param seed initial seed value
     * @return 32 bit hash of the given array
     */
    public static int hash32(final byte[] data, int length, int seed) {
        // 'm' and 'r' are mixing constants generated offline.
        // They're not really 'magic', they just happen to work well.
        final int m = 0x5bd1e995;
        final int r = 24;

        // Initialize the hash to a random value
        int h = seed ^ length;
        int length4 = length / 4;

        // Mix the input four bytes at a time (little-endian)
        for (int i = 0; i < length4; i++) {
            final int i4 = i * 4;
            int k = (data[i4 + 0] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Handle the last few bytes of the input array (the cases fall through on purpose)
        switch (length % 4) {
        case 3: h ^= (data[(length & ~3) + 2] & 0xff) << 16;
        case 2: h ^= (data[(length & ~3) + 1] & 0xff) << 8;
        case 1: h ^= (data[length & ~3] & 0xff);
                h *= m;
        }

        // Final mix so that the last bytes are well incorporated
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;

        return h;
    }

    /**
     * Generates 32 bit hash from byte array with default seed value.
     *
     * @param data byte array to hash
     * @param length length of the array to hash
     * @return 32 bit hash of the given array
     */
    public static int hash32(final byte[] data, int length) {
        return hash32(data, length, 0x9747b28c);
    }

    /**
     * Generates 32 bit hash from a string.
     *
     * @param text string to hash
     * @return 32 bit hash of the given string
     */
    public static int hash(final String text) {
        // Use a fixed charset so the hash does not depend on the platform default
        final byte[] bytes = text.getBytes(java.nio.charset.StandardCharsets.UTF_8);
        return hash32(bytes, bytes.length);
    }
}



package hash;

import java.util.ArrayList;
import java.util.List;

public class HashTest {

	public static void main(String[] args) {

		final int NUM_CACHENODE = 10; // number of cache server nodes
		final int NUM_KEYS = 1000000; // number of keys to distribute
		String name; // node name
		String ip; // node IP
		String port = "9000"; // node port

		// Try 10, 30, 50, ..., 490 virtual nodes per server
		for (int VirnodeNum = 10; VirnodeNum < 500; VirnodeNum = VirnodeNum + 20) {
			List<CacheNode> listCacheNode = new ArrayList<CacheNode>();

			// Prepare the cache nodes
			for (int i = 0; i < NUM_CACHENODE; i++) {
				name = String.format("HashCache_.%d", i);
				ip = String.format("192.168.50.%d", i);
				CacheNode node = new CacheNode(name, ip, port);
				listCacheNode.add(node);
			}

			// The measured time covers building the ring, generating the keys, and the lookups
			long startTime = System.currentTimeMillis();

			Shard<CacheNode> shardCache = new Shard<CacheNode>(listCacheNode, VirnodeNum);

			// Distribute 1,000,000 keys
			for (int n = 0; n < NUM_KEYS; n++) {
				// "%f" keeps six decimal places, so the random keys are not all distinct
				String key = String.format("%f", Math.random()); // TODO key selection
				CacheNode keyHash = shardCache.getShardInfo(key);
				keyHash.AddNumCache(); // count the key on the node it maps to
			}

			long endTime = System.currentTimeMillis();

			// Standard deviation around the mean load per node (1,000,000 / 10 = 100,000)
			double std = 0;
			double avg = (double) NUM_KEYS / NUM_CACHENODE;
			for (int i = 0; i < NUM_CACHENODE; i++) {
				CacheNode node = listCacheNode.get(i);
				// System.out.println(" node: " + i + "  cached entries: " + node.GetNumCache());
				double diff = avg - node.GetNumCache();
				std = std + diff * diff;
			}
			std = std / NUM_CACHENODE;
			std = Math.sqrt(std);

			System.out.println(" virtual nodes: " + VirnodeNum + "  time for 1,000,000 lookups: "
					+ (endTime - startTime) + "ms" + " standard deviation: " + std);

			listCacheNode.clear();
		}
	}
}

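Test results
[Figure: test result data]

Conclusion

The unevenness of the storage load is evaluated with the standard deviation. Taking lookup time into account as well, the test data suggests using 150 to 250 virtual nodes per server.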
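The standard deviation used for this evaluation can be computed with a small helper. The sketch below is an illustration rather than part of the test program above: the class name `LoadStats` and the sample counts are assumptions. It computes the population standard deviation from the actual mean of the counts instead of a hard-coded 100,000.

```java
public class LoadStats {
    // Population standard deviation of the per-node entry counts.
    static double stdDev(long[] counts) {
        double mean = 0;
        for (long c : counts) {
            mean += c;
        }
        mean /= counts.length;

        double sumSq = 0;
        for (long c : counts) {
            sumSq += (c - mean) * (c - mean);
        }
        return Math.sqrt(sumSq / counts.length);
    }

    public static void main(String[] args) {
        // Made-up counts for four nodes: mean 100, each node off by 10
        long[] counts = {90, 110, 90, 110};
        System.out.println("standard deviation: " + stdDev(counts)); // 10.0
    }
}
```

A smaller value means the keys are spread more evenly; a value of 0 would mean every node stores exactly the same number of entries.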