Starting from the Database Hash Join


Introduction

Databases commonly use three join mechanisms: nested loops, hash, and merge.

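Nested loops join: the simplest of the three. Typically the smaller row set serves as the outer table, and each of its rows probes the index of the inner (large) table, if such an index exists. I won't say more about it here.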
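For reference, the probing pattern just described can be sketched in a few lines of Java. This is only an illustration, not how any particular engine implements it: the `Row` type, the `join` method, and the use of a `TreeMap` as a stand-in for the inner table's index are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

final class NestedLoopJoinSketch {
    /** Hypothetical outer-table row: a join key plus a payload. */
    static final class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
    }

    /**
     * For every outer row, seek the inner table's index on the join column.
     * The TreeMap stands in for an ordered index (key -> inner row payload).
     */
    static List<String> join(List<Row> outer, TreeMap<Integer, String> innerIndex) {
        List<String> result = new ArrayList<>();
        for (Row o : outer) {
            String inner = innerIndex.get(o.key); // one index seek per outer row
            if (inner != null) {
                result.add(inner + ":" + o.value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> innerIndex = new TreeMap<>();
        innerIndex.put(1, "alice");
        innerIndex.put(2, "bob");
        List<Row> outer = Arrays.asList(new Row(2, "pen"), new Row(3, "ink"));
        System.out.println(join(outer, innerIndex)); // prints [bob:pen]
    }
}
```

The cost is one index seek per outer row, which is why this join works best when the outer input is small and the inner table is indexed on the join column.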

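Having grown used to how powerful SQL Server and Oracle are, I now work with MySQL, and the sight of a table join makes me want to cough up blood. Consider this post a tribute to the past.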

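Hash join: SQL Server, whose optimizer is cost-based (CBO), uses this method often. For large inputs without a useful index it is usually more efficient than a nested loops join, and the same holds for Oracle. The technique is also very common in application code.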

Hashing in database table joins:

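The figure below shows the simplest form of the hashing process (I drew it from material I have read and from what I already know; details may be imprecise, but the overall process is correct).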

As the figure above shows:

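1. From the table statistics, the query's execution plan derives estimated row counts (under the CBO, this is the key to deciding which join mechanism is used). The input with the smaller result set is chosen; the hash function is applied to the join column of each of its rows, and if the result equals the number of a chain head, the row becomes a successor node of that head.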

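2. The large data set is read through some access method, the same hash function is applied to each of its rows, and the result is matched against the hash chains.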

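3. If no chain head corresponds to the hash value computed for a row of the large data set, the database discards that row and moves on to the next one.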

The hash join is most efficient when every chain holds exactly one node.
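The three steps can be sketched in Java as a build phase followed by a probe phase. This is a minimal illustration under assumptions, not any engine's actual code: the `Row` type, the `join` method, and the `bucketCount` parameter are invented for the example, and the hash function is simply the key modulo the number of chain heads. One detail the steps leave implicit is made explicit here: rows with different keys can land on the same chain, so the probe still has to compare the keys themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HashJoinSketch {
    /** Hypothetical row type: a join key plus a payload. */
    static final class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
    }

    static List<String> join(List<Row> small, List<Row> large, int bucketCount) {
        // Step 1 (build): hash each row of the smaller input and append it
        // to the chain whose head number equals the hash value.
        List<List<Row>> heads = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            heads.add(new ArrayList<>());
        }
        for (Row s : small) {
            heads.get(Math.floorMod(s.key, bucketCount)).add(s);
        }

        // Step 2 (probe): hash each row of the larger input with the same
        // function and walk the matching chain.
        List<String> result = new ArrayList<>();
        for (Row l : large) {
            List<Row> chain = heads.get(Math.floorMod(l.key, bucketCount));
            // Step 3: an empty chain, or a chain without an equal key,
            // yields nothing, so the row is simply dropped.
            for (Row s : chain) {
                if (s.key == l.key) {
                    result.add(s.value + "-" + l.value);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> small = Arrays.asList(new Row(1, "a"), new Row(2, "b"));
        List<Row> large = Arrays.asList(
                new Row(2, "x"), new Row(3, "y"), new Row(1, "z"), new Row(5, "w"));
        System.out.println(join(small, large, 4)); // prints [b-x, a-z]
    }
}
```

With four chain heads, keys 1 and 5 share a chain, so the row with key 5 walks the chain and finds no equal key. The longer the chains, the more of these wasted comparisons each probe pays, which is why one node per chain is the best case.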

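Note: a real database implementation is far more complex than the figure above. For example, when memory cannot hold the whole hash table, the database falls back to multi-pass hashing. Even so, the figure is enough to suggest how the same technique can be used in our own code to make it faster.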
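One common way to picture multi-pass hashing is a partitioned (Grace-style) hash join: both inputs are first split by a hash of the join key, and then each pair of partitions is joined with a hash table small enough to fit in memory. The sketch below is only an illustration of that idea and makes no claim about what SQL Server or Oracle actually do; `Row`, `partition`, `join`, and the `parts` parameter are hypothetical names, and the partitions are kept as in-memory lists where a database would spill them to disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PartitionedHashJoinSketch {
    /** Hypothetical row type: a join key plus a payload. */
    static final class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
    }

    /** Pass 1: split rows by hash of the join key (a database would write these to disk). */
    static List<List<Row>> partition(List<Row> rows, int parts) {
        List<List<Row>> out = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            out.add(new ArrayList<>());
        }
        for (Row r : rows) {
            out.get(Math.floorMod(Integer.hashCode(r.key), parts)).add(r);
        }
        return out;
    }

    /** Pass 2: join partition i of one input only with partition i of the other. */
    static List<String> join(List<Row> small, List<Row> large, int parts) {
        List<List<Row>> smallParts = partition(small, parts);
        List<List<Row>> largeParts = partition(large, parts);
        List<String> result = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            // Only this partition's hash table has to be in memory at once.
            Map<Integer, List<Row>> table = new HashMap<>();
            for (Row s : smallParts.get(p)) {
                table.computeIfAbsent(s.key, k -> new ArrayList<>()).add(s);
            }
            for (Row l : largeParts.get(p)) {
                for (Row s : table.getOrDefault(l.key, Collections.<Row>emptyList())) {
                    result.add(s.value + "-" + l.value);
                }
            }
        }
        return result;
    }
}
```

Because equal keys always hash to the same partition number, no match is lost by joining the partitions independently.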

Hashing in application code: use 1

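In real applications, some base information is often loaded into memory.

If the data is stored in a one-dimensional structure (a list or array) and every lookup scans it, each lookup costs O(n), so matching n records against it costs O(n^2) on average.

With the structure below, the cost depends on the dictionary's lookup efficiency, which is O(1) on average per key for a hash-based dictionary: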

                    // Create a two-level structure: outer key = bucket, inner key = "resourceType-ResId"
                    Dictionary<string, Dictionary<string, ResourceInfo>> resDict = new Dictionary<string, Dictionary<string, ResourceInfo>>(2000);
                    string cmdText = string.Format(@"
                                    SELECT distinct resourceid,ResourceType into #temp
                                      FROM [.....{0}] with(nolock)
                                      where DownloadDate={1};

                                    select a.f_id,a.restype,isnull(f_parent_cateid,0) PCID,isnull(f_cateid,0) CID,isnull(f_isgrant,0) IsGrant,f_isgame IsGame,
                                     f_identifier ResIdentifier,f_authorid AuthorId,f_License License,f_promotionprice PromotionPrice
                                    from  .... a with(nolock) inner join #temp b
                                      on a.restype=b.ResourceType and a.f_id=b.ResourceID;
                                    
                                    drop table #temp;"
                        , statdate.AddDays(-1).Year
                        ,statdate.AddDays(-1).ToString("yyyyMMdd"));

                    using (SqlDataReader reader = SqlHelper.ExecuteReader(computingDB_ConnString, CommandType.Text, cmdText))
                    {
                        while (reader.Read())
                        {
                            int resourceType = Convert.ToInt32(reader["restype"]);
                            ResourceInfo resInfo = new ResourceInfo
                            {
                                ResId = Convert.ToInt32(reader["f_id"]),
                                PCID = Convert.ToInt32(reader["PCID"] == DBNull.Value ? 0 : reader["PCID"]),
                                CID = Convert.ToInt32(reader["CID"] == DBNull.Value ? 0 : reader["CID"]),
                                IsGrant = Convert.ToByte(reader["IsGrant"] == DBNull.Value ? 0 : reader["IsGrant"]),
                                IsGame = Convert.ToByte(reader["IsGame"] == DBNull.Value ? 0 : reader["IsGame"]),
                                ResIdentifier =Convert.ToString(reader["ResIdentifier"] == DBNull.Value?"":reader["ResIdentifier"]),
                                AuthorId = Convert.ToInt32(reader["AuthorId"] == DBNull.Value ? 0 : reader["AuthorId"]),
                                // License 0 means paid
                                License = Convert.ToInt32(reader["License"] == DBNull.Value ? -1 : reader["License"]),
                                PromotionPrice = Convert.ToInt32(reader["PromotionPrice"] == DBNull.Value ? 0 : reader["PromotionPrice"])
                            };
                            
                            // Store the record in the two-level structure
                            string key = (resInfo.ResId % 200).ToString();
                            if (!resDict.ContainsKey(key))
                            {
                                resDict.Add(key, new Dictionary<string, ResourceInfo>(4000));
                            }
                            resDict[key].Add(resourceType.ToString() + "-" + resInfo.ResId.ToString(), resInfo);
                        }
                    }
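The block above only fills the structure. A lookup has to rebuild the same two keys, first the bucket and then the composite key inside it. The Java sketch below mirrors the key scheme of the C# code (`ResId % 200` for the outer key, `resourceType-ResId` for the inner key); the class name, the `put` and `find` methods, and the use of a plain `String` as the stored value are assumptions made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class TwoLevelIndexSketch {
    // Mirrors the "ResId % 200" bucketing of the original code.
    private static final int BUCKETS = 200;

    private final Map<String, Map<String, String>> index = new HashMap<>();

    // floorMod keeps the bucket non-negative even for a negative id;
    // for non-negative ids it equals the plain % used in the original.
    private static String outerKey(int resId) {
        return String.valueOf(Math.floorMod(resId, BUCKETS));
    }

    private static String innerKey(int resourceType, int resId) {
        return resourceType + "-" + resId;
    }

    /** Stores a record; an existing entry with the same keys is overwritten. */
    void put(int resourceType, int resId, String info) {
        index.computeIfAbsent(outerKey(resId), k -> new HashMap<>())
             .put(innerKey(resourceType, resId), info);
    }

    /** Two average-case O(1) lookups; returns null when nothing matches. */
    String find(int resourceType, int resId) {
        Map<String, String> inner = index.get(outerKey(resId));
        return inner == null ? null : inner.get(innerKey(resourceType, resId));
    }
}
```

Unlike `Dictionary.Add` in the C# code, which throws on a duplicate key, `Map.put` here silently replaces the old value; which behavior is wanted depends on whether duplicates in the source data should be treated as an error.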

Use 2

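The code below applies the same idea in the reduce phase of a Hadoop job. Be aware that if the data is skewed, the JVM on a single node may run out of memory.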

@ReduceConfig
public static class ReduceTask extends Reducer<Text, Text, Text, IntWritable> {

    private IntWritable rval = new IntWritable();
    private Multiset<Text> multiset=HashMultiset.create();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {

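        multiset.clear();
        for (Text item:values){
            // Hadoop reuses the same Text instance across iterations, so store a copy.
            multiset.add(new Text(item));
        }

        // Emit the number of distinct values seen for this key.
        rval.set(multiset.elementSet().size());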
        context.write(key, rval);
    }
}
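The memory risk comes from holding every distinct value of a key in the multiset at once. One possible way around it, assuming the job can be arranged so that each key's values reach the reducer already sorted (in Hadoop this is usually done with a secondary sort, which the code above does not set up), is to count distinct values by comparing each value with the previous one. The sketch below shows only that counting step in plain Java, without the Hadoop API; `countDistinct` is a hypothetical helper and the values are assumed to be non-null.

```java
import java.util.Arrays;

final class SortedDistinctCountSketch {
    /**
     * Counts distinct values in an already sorted sequence.
     * Only the previous value is kept, so memory use does not grow
     * with the number of values for the key.
     */
    static int countDistinct(Iterable<String> sortedValues) {
        int count = 0;
        String previous = null;
        for (String v : sortedValues) {
            if (previous == null || !v.equals(previous)) {
                count++; // first value, or a value different from its predecessor
            }
            previous = v;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(Arrays.asList("a", "a", "b", "c", "c"))); // prints 3
    }
}
```

If the values are not sorted the result is wrong (repeats that are not adjacent get counted again), so this only helps when the sort order is guaranteed upstream.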

end

Reposted from: https://my.oschina.net/osenlin/blog/517206