Hash algorithm interview question, explained

Latest recommended article published on 2024-07-16 10:21:55

enyes_fang

Category: Algorithm. Tags: interview question, hash, hashmap

Article link: https://blog.csdn.net/fangchao2061/article/details/17242563

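Interview question:

The search input is a string. There are 30 million records in total, but most of them are duplicates; after deduplication, roughly 3 million distinct strings remain. Find the 10 most popular entries among the 30 million inputs. Each input string is at most 255 bytes, and only 1 GB of memory may be used. Describe the approach, write the algorithm (in Java), and give the space and time complexity.

Approach: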

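Even in the worst case (no duplicates among the 3 million distinct strings, each at the maximum length), the string data occupies about 3M × 1 KB / 4 = 0.75 GB (more precisely, 3,000,000 × 255 B = 765 MB), so all distinct strings can be held in memory and processed there. This estimate counts string payload only; per-object JVM overhead comes on top of it.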
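The arithmetic behind that estimate can be reproduced with a few lines of Java. This is only a back-of-the-envelope sketch: the class MemoryEstimate and the method worstCasePayloadBytes are hypothetical names introduced for illustration, and the figure covers raw string bytes only, not JVM object or HashMap entry overhead.

```java
// Back-of-the-envelope estimate of the worst-case string payload.
// MemoryEstimate and worstCasePayloadBytes are illustrative names, not from the article.
class MemoryEstimate {
    // Raw payload bytes if every distinct string has the maximum length.
    static long worstCasePayloadBytes(long distinctStrings, int maxBytesPerString) {
        return distinctStrings * maxBytesPerString;
    }

    public static void main(String[] args) {
        // 3 million distinct strings, 255 bytes each (the limits stated in the question).
        long bytes = worstCasePayloadBytes(3_000_000L, 255);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.println("payload bytes: " + bytes);
        System.out.println("payload GiB:   " + gib);
    }
}
```

The payload comes to 765,000,000 bytes, which is below the 1 GB limit, but the margin left for object overhead is small if the strings really were all 255 bytes long.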

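Use a hash map whose key is the string (located internally via the string's hash value) and whose value is the number of times that string has occurred, to count the occurrences of each string. Then use an array or list of length 10 to hold the 10 strings with the highest counts seen so far.

Both time and space complexity are therefore O(n): each of the n inputs costs one expected O(1) hash map update, the final pass over the distinct strings does constant work per entry because the top list has a fixed length of 10, and the map holds at most one entry per distinct string.

Implementation: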

package org.enyes.algorithm.hash;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class SearchTimeCounter
{
    // Number of top entries to report.
    private int maxCount = 10;

    // Maps each search string to the number of times it has been added.
    private Map<String, Integer> searchTimeMap = null;

    public SearchTimeCounter()
    {
        searchTimeMap = new HashMap<String, Integer>();
    }

    // size is the initial capacity of the underlying HashMap.
    public SearchTimeCounter(int size)
    {
        searchTimeMap = new HashMap<String, Integer>(size);
    }

    // Records one occurrence of searchString.
    public void addSearchString(String searchString)
    {
        Integer searchCount = searchTimeMap.get(searchString);
        searchTimeMap.put(searchString, searchCount == null ? 1 : searchCount + 1);
    }

    // Prints the number of distinct strings, then the maxCount most frequent entries.
    public void printTopSearch()
    {
        List<Map.Entry<String, Integer>> result = new ArrayList<Map.Entry<String, Integer>>(maxCount);
        Map.Entry<String, Integer> ct = null;
        // Index of the smallest entry in result, or -1 while result is not yet full.
        int minIndex = -1;
        Iterator<Map.Entry<String, Integer>> it = searchTimeMap.entrySet().iterator();
        while (it.hasNext())
        {
            ct = it.next();
            if (minIndex == -1)
            {
                // Still filling the list: accept every entry.
                result.add(ct);
                minIndex = minSearchIndex(result);
            }
            else if (result.get(minIndex).getValue() < ct.getValue())
            {
                // The new entry beats the current minimum: replace it.
                result.set(minIndex, ct);
                minIndex = minSearchIndex(result);
            }
        }
        System.out.println(searchTimeMap.size());
        System.out.println(result);
    }

    // Returns the index of the entry with the smallest count,
    // or -1 if the list holds fewer than maxCount entries.
    private int minSearchIndex(List<Map.Entry<String, Integer>> topSearchList)
    {
        int size = topSearchList.size();
        if (size < maxCount)
        {
            return -1;
        }
        int minIndex = 0;
        for (int i = 1; i < size; i++)
        {
            if (topSearchList.get(i).getValue() < topSearchList.get(minIndex).getValue())
            {
                minIndex = i;
            }
        }
        return minIndex;
    }
}
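
The linear scan for the current minimum is cheap when the list has only 10 slots. For a larger top-k, a bounded min-heap is a common alternative. The following is a minimal sketch of that variant, not the method used in this article; the class HeapTopK and the method topK are hypothetical names introduced for illustration, and the sketch assumes the counts have already been collected in a map.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative alternative: select the top k counts with a bounded min-heap.
class HeapTopK {
    // Returns up to k entries with the largest counts, most frequent first.
    static List<Map.Entry<String, Integer>> topK(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        Comparator<Map.Entry<String, Integer>> byCount = Map.Entry.comparingByValue();
        // Min-heap ordered by count: the head is the weakest of the current top k.
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(k, byCount);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (heap.size() < k) {
                heap.offer(entry);
            } else if (heap.peek().getValue() < entry.getValue()) {
                // The new entry beats the weakest kept entry: swap them.
                heap.poll();
                heap.offer(entry);
            }
        }
        result.addAll(heap);
        result.sort(byCount.reversed());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 5);
        counts.put("b", 1);
        counts.put("c", 9);
        counts.put("d", 3);
        System.out.println(topK(counts, 2)); // prints [c=9, a=5]
    }
}
```

With m distinct strings, this selection costs O(m log k) instead of the O(m · k) worst case of rescanning the list; for k = 10 the difference is negligible, so the simpler scan is a reasonable choice here.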

Test code:

package org.enyes.algorithm.hash;

import java.util.Random;

import org.junit.Test;

public class SearchTimeCounterTest
{
    private static final char[] CHARS = "qazwsxedcrfvtgbyhnujmikolp".toCharArray();

    private Random random = new Random();

    @Test
    public void testGetTopSearch()
    {
        // First pass: measure the cost of generating the random strings alone.
        long startTime = System.currentTimeMillis();
        int count = 1000 * 10000;
        for (int i = 0; i < count; i++)
        {
            getRandomString();
        }
        long endTime = System.currentTimeMillis();
        System.out.println("Generated [" + count + "] random strings, cost time [" + (endTime - startTime) + "]ms");

        // Second pass: generate the strings again and count them.
        startTime = System.currentTimeMillis();
        SearchTimeCounter counter = new SearchTimeCounter();
        for (int i = 0; i < count; i++)
        {
            counter.addSearchString(getRandomString());
        }
        counter.printTopSearch();
        endTime = System.currentTimeMillis();
        System.out.println("total cost time [" + (endTime - startTime) + "]ms");
    }

    // Builds a random 4-letter lowercase string.
    private String getRandomString()
    {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < 4; i++)
        {
            result.append(CHARS[random.nextInt(26)]);
        }
        return result.toString();
    }
}

Result:

Generated [10000000] random strings, cost time [4344]ms

456976

[jeuw=45, njwf=44, govv=44, ghkf=45, pdpj=44, lcyz=44, qwsb=44, shqc=45, elbc=47, dgvi=45]

total cost time [7969]ms

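Because the strings are random, their frequencies are all about the same. From the results: generating 10 million random strings took about 4 seconds and produced about 450,000 distinct strings (456,976 = 26^4, that is, every possible 4-letter string occurred). The counting run took about 8 seconds in total, which includes roughly 4 seconds for generating the strings again.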
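
Since uniformly random input yields nearly flat frequencies, the printed top 10 is hard to verify by eye. One way to check the counting logic is a deliberately skewed input whose ranking is known in advance. The sketch below is an assumption-laden illustration, not part of the original test: the class SkewedWorkloadCheck, its methods, and the keys "query1" to "query20" are all hypothetical, and it ranks by fully sorting the entries rather than by the fixed-size list used in the article.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative check with a skewed input whose top 10 is known in advance.
class SkewedWorkloadCheck {
    // Hypothetical input: "queryK" occurs K * 100 times, for K = 1..20.
    static Map<String, Integer> countSkewedInput() {
        Map<String, Integer> counts = new HashMap<>();
        for (int k = 1; k <= 20; k++) {
            for (int i = 0; i < k * 100; i++) {
                counts.merge("query" + k, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Sorts all entries by count, descending, and keeps the first n keys.
    static List<String> topKeys(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Expected ranking: query20 first, down to query11 in tenth place.
        System.out.println(topKeys(countSkewedInput(), 10));
    }
}
```

Because every key has a distinct count, the expected ranking is unambiguous, which makes a mismatch easy to spot.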