Finding the Median of a Large Set of Positive Integers

bnuf

Article link: https://blog.csdn.net/bnufq/article/details/7492370

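1. Definition of the median

(1) Let item be an array of length len. Ranks below are counted in ascending order, starting from 1.

(2) When len is odd, the median is the (floor(len/2)+1)-th smallest number in item.

(3) When len is even, the median is given by the (len/2)-th and (len/2+1)-th smallest numbers in item.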
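
Both cases reduce to integer division, which floors automatically. A minimal sketch of the rank computation; the helper name medianRanks is hypothetical and not part of the original program:

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: returns the 1-based ranks (ascending order) of the
// element(s) that define the median of an array of length len.
// For odd len both members of the pair are the same rank.
std::pair<long long, long long> medianRanks(long long len)
{
    assert(len > 0);                  // the median of an empty array is undefined
    long long upper = len / 2 + 1;    // integer division floors len/2
    long long lower = (len % 2 == 0) ? len / 2 : upper;
    return {lower, upper};
}
```

For len = 5 this yields rank 3 twice; for len = 6 it yields ranks 3 and 4.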

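2. Analysis

(1) Given the definition, the natural idea is to sort item and then read off the k-th number, but sorting large-scale data is too expensive. Moreover, sorting yields far more information than the median requires, and most of it is never used.

(2) Because only the median is wanted, the relative order of most of the data does not affect the result; what matters is the order near the median.

(3) This suggests bucketing the data; the algorithm is given in section 3.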
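
For comparison, the sort-based idea from point (1) might look like the sketch below. It is mainly useful as a reference for checking results on small inputs; the name medianBySorting and the use of std::vector are assumptions made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Baseline: sort a copy and read the middle element(s).
// Costs O(n log n) time and O(n) extra memory, which is what the
// bucket approach tries to avoid on large inputs.
std::pair<int, int> medianBySorting(std::vector<int> item)  // by value: sorts a copy
{
    assert(!item.empty());
    std::sort(item.begin(), item.end());
    const std::size_t len = item.size();
    const std::size_t upper = len / 2;   // 0-based index of rank len/2+1
    const std::size_t lower = (len % 2 == 0) ? upper - 1 : upper;
    return {item[lower], item[upper]};   // equal values when len is odd
}
```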

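3. Algorithm: finding the k-th smallest number

(1) Split the value range into 2^16 (= 1<<16 = 65536) buckets keyed by the high 16 bits of each value; scan item once and count how many values fall into each bucket.

(2) Accumulate the counts from bucket 0 upward to find the id of the bucket, space, that contains the k-th smallest number, together with that number's rank inside the bucket, denoted index.

(3) Scan item again; for each integer that belongs to space, take it modulo 2^16 (equivalent to subtracting id*2^16) and count how often each resulting value occurs.

(4) Find the offset id1 at which the cumulative count reaches index; the k-th smallest number is id1 + id*2^16.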
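
The four steps can be sketched as a single function. This is an illustration under stated assumptions rather than the article's own code: the name kthSmallest, the std::vector interface, and the 1-based k are all choices made here, and the values are assumed to be non-negative 32-bit ints.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: k-th smallest value (k is 1-based) using two counting passes.
int kthSmallest(const std::vector<int>& item, std::size_t k)
{
    assert(k >= 1 && k <= item.size());
    const int kBuckets = 1 << 16;                  // 2^16 = 65536 buckets
    std::vector<std::size_t> count(kBuckets, 0);

    // Step (1): histogram of the high 16 bits.
    for (int v : item) {
        assert(v >= 0);
        ++count[v >> 16];
    }

    // Step (2): bucket id that holds rank k, and the rank inside that bucket.
    int id = 0;
    std::size_t before = 0;                        // elements in buckets below id
    while (before + count[id] < k) {
        before += count[id];
        ++id;
    }
    const std::size_t index = k - before;

    // Step (3): histogram of the low 16 bits, restricted to bucket id.
    std::fill(count.begin(), count.end(), 0);
    for (int v : item) {
        if ((v >> 16) == id) ++count[v & 0xFFFF];
    }

    // Step (4): offset id1 at which the cumulative count reaches index.
    int id1 = 0;
    std::size_t seen = 0;
    while (seen + count[id1] < index) {
        seen += count[id1];
        ++id1;
    }
    return id1 + id * kBuckets;
}
```

Both loops stop at the first bucket whose cumulative count reaches the target rank, so duplicates are handled without special cases.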

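4. Algorithm: finding the median

(1) When len is odd, find the (floor(len/2)+1)-th smallest number and output it.

(2) When len is even, find the (len/2)-th and (len/2+1)-th smallest numbers and output both; step (1) of section 3 needs to run only once, because both searches share the same bucket counts.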
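
One way to picture the shared first pass: build the high-bits histogram once, then query it for each rank. The names BucketPos, locate and highBitsHistogram below are hypothetical, introduced only for this sketch, which again assumes non-negative 32-bit ints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct BucketPos {
    int id;             // bucket number (the high 16 bits)
    std::size_t index;  // 1-based rank inside that bucket
};

// First pass, run once: count the values by their high 16 bits.
std::vector<std::size_t> highBitsHistogram(const std::vector<int>& item)
{
    std::vector<std::size_t> count(1 << 16, 0);
    for (int v : item) {
        assert(v >= 0);
        ++count[v >> 16];
    }
    return count;
}

// Find the bucket that holds 1-based rank k; can be called for several ranks
// against the same histogram without rescanning the data.
BucketPos locate(const std::vector<std::size_t>& count, std::size_t k)
{
    assert(k >= 1);
    std::size_t before = 0;  // elements in lower buckets
    for (std::size_t id = 0; id < count.size(); ++id) {
        if (before + count[id] >= k) {
            return {static_cast<int>(id), k - before};
        }
        before += count[id];
    }
    assert(false && "k exceeds the number of counted elements");
    return {-1, 0};
}
```

For even len, calling locate with len/2 and then with len/2+1 reuses the same histogram. When both ranks land in the same bucket, a single refinement pass over that bucket would be enough to resolve both.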

5. Code (untested)

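#include <iostream>
#include <cstring>
using namespace std;
/*
 * Finds the median of a large data set whose values are all positive
 * and fit in an int.
 * In a real application the data would be read from a file; for
 * simplicity the program searches the array item directly.
 * Time complexity: O(n) with 2 to 3 passes, where n is the length of the array.
 * Space complexity: 256 KB (65536 four-byte counters)
*/

void findMiddleNum(int* item, int len)
{
    if (len <= 0) return;   // an empty array has no median

    int count[65536];
    // Pass 1: count the values by their high 16 bits.
    memset(count, 0, sizeof(count));
    for (int i = 0; i < len; i++)
    {
        count[item[i] >> 16]++;
    }

    int sum = 0, i = 0, pos1 = 0, pos2, index1 = 0, index2;
    if (len % 2 == 0)
    {
        // pos1: bucket holding the (len/2)-th smallest value. Needed only for
        // even len; for len == 1 this loop would not run and pos1 would be -1.
        while (sum <= len/2 - 1)
        {
            sum += count[i];
            i++;
        }
        pos1 = i - 1;
        index1 = len/2 - (sum - count[pos1]);   // 1-based rank inside bucket pos1
    }
    // pos2: bucket holding the (len/2+1)-th smallest value.
    while (sum <= len/2)
    {
        sum += count[i];
        i++;
    }
    pos2 = i - 1;
    index2 = len/2 + 1 - (sum - count[pos2]);   // 1-based rank inside bucket pos2

    if (len % 2 == 0)
    {
        // Extra pass for even len: count the low 16 bits inside bucket pos1.
        memset(count, 0, sizeof(count));
        for (i = 0; i < len; i++)
        {
            if ((item[i] >> 16) == pos1)
            {
                count[item[i] - pos1*65536]++;
            }
        }
        sum = 0;
        i = 0;
        while (sum < index1)
        {
            sum += count[i];
            i++;
        }
        cout << (i-1) + pos1*65536 << " ";
    }
    // Final pass: count the low 16 bits inside bucket pos2.
    memset(count, 0, sizeof(count));
    for (i = 0; i < len; i++)
    {
        if ((item[i] >> 16) == pos2)
        {
            count[item[i] - pos2*65536]++;
        }
    }
    sum = 0;
    i = 0;
    while (sum < index2)
    {
        sum += count[i];
        i++;
    }
    cout << (i-1) + pos2*65536 << endl;
}

int main()
{
    int a[6] = {1, 2, 4, 3, 5, 6};
    findMiddleNum(a, 6);   // prints: 3 4
    return 0;
}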

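6. Summary

Finding the median takes two or three scans of the source data (three when the number of elements is even, because two numbers must be found). Memory use is a fixed 256 KB of counters regardless of the input size, and the running time is linear in the number of elements.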
