How to compute percentiles in Java: optimizing an algorithm for the 90th percentile

Latest recommended article published on 2023-02-15 21:45:48

Tim Pan


Article link: https://blog.csdn.net/weixin_30062561/article/details/114599169


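Problem

Given a set of files, each containing a single line of comma-separated numbers, find the tp90 line of that data stream, i.e. the 90th percentile: the number that sits at the 90% position once the data is sorted.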

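Notes

For example, if the file contains 2,3,4,5,10,8,9,1,6,7, the 90% position after sorting is the 9th element, which is 9. If 90% of the length is not an integer, round down: for a stream of 115 numbers, 115 * 90% = 103.5, so take the 103rd number.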
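The rounding rule can be sketched in a few lines. This is only an illustration of the arithmetic described above; the class name Tp90Rank and the method rank are hypothetical and do not appear in the original solution. Integer arithmetic is used here as an assumption, to sidestep floating-point rounding.

```java
final class Tp90Rank {
    /** 1-based rank of the 90th-percentile element among n values, rounded down. */
    static int rank(int n) {
        // Multiply in long so that large n cannot overflow before the division.
        return (int) ((long) n * 9 / 10);
    }

    public static void main(String[] args) {
        System.out.println(rank(10));  // 9
        System.out.println(rank(115)); // 103
    }
}
```

Subtracting 1 from this rank gives the zero-based index into the sorted data.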

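Approach

It is actually simple. The most straightforward way is to load the data into an array, sort it with the JDK's built-in sort, and read off the value at the target position. (The JDK uses TimSort for object arrays and collections; Arrays.sort on a primitive int[] uses dual-pivot quicksort.) With 10 million numbers on a laptop (SSD, 8 GB RAM, i5 processor), this takes roughly 7 s.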
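The article does not show the baseline code, so the following is only a guess at what such a full-sort version might look like. The class name NaiveTp90 is hypothetical, and the sketch assumes the whole line has already been read into a String and contains at least one number.

```java
import java.util.Arrays;

final class NaiveTp90 {
    /** Parses one comma-separated line, sorts every value, and returns the tp90 value. */
    static int tp90(String line) {
        int[] values = Arrays.stream(line.trim().split(","))
                .mapToInt(s -> Integer.parseInt(s.trim()))
                .toArray();
        Arrays.sort(values); // sorts the whole array, which is the cost the optimization avoids
        int rank = (int) ((long) values.length * 9 / 10); // 1-based, rounded down
        return values[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        System.out.println(tp90("2,3,4,5,10,8,9,1,6,7")); // 9
    }
}
```

Every number goes through a String and a full sort here, which is exactly the work the optimized version tries to skip.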

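Optimization

Optimization ideas

There is no need to sort everything. We only need to roughly locate the interval that contains the target and then sort within that interval.

Below the order of hundreds of millions of values, multithreading is unnecessary: its overhead outweighs the gain, for example because it is unfriendly to the CPU's L2 cache. This is an important optimization point.

Avoid the JDK collection classes and wrapper classes wherever possible. Even something as small as converting a string to a number leaves considerable room for optimization, and avoiding them is likewise cache-friendly.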
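The first idea can be tried out on an in-memory array before any file I/O is involved. The sketch below is a simplified illustration, not the author's code: the class name BucketTp90 is hypothetical, and it assumes a non-empty array of non-negative ints with a bucket width of 1024.

```java
import java.util.Arrays;

final class BucketTp90 {
    /** Returns the tp90 value of a non-empty array of non-negative ints by sorting one bucket only. */
    static int tp90(int[] values) {
        int max = 0;
        for (int v : values) max = Math.max(max, v);
        // Bucket b counts the values in [b * 1024, b * 1024 + 1023].
        int[] counts = new int[(max >> 10) + 1];
        for (int v : values) counts[v >> 10]++;

        int target = Math.max((int) ((long) values.length * 9 / 10), 1) - 1; // zero-based rank
        int bucket = 0;
        int below = 0; // number of values in the buckets before `bucket`
        while (below + counts[bucket] <= target) {
            below += counts[bucket++];
        }

        // Collect and sort only the values that fall into the target bucket.
        int[] section = new int[counts[bucket]];
        int len = 0;
        for (int v : values) {
            if ((v >> 10) == bucket) section[len++] = v;
        }
        Arrays.sort(section);
        return section[target - below];
    }

    public static void main(String[] args) {
        System.out.println(tp90(new int[] {2, 3, 4, 5, 10, 8, 9, 1, 6, 7})); // 9
    }
}
```

How much this saves depends on the data: if most values land in a single bucket, the final sort is nearly as large as the full one.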

The code

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.util.Arrays;

public class Solution {

    public static final int ASCII_0 = 48;

    public int getTp90Line(File file){
        int[] record = new int[1024*10];// one counter per bucket; bucket k counts the values in [k*1024, k*1024+1023]
        byte[] buffer = new byte[1024*1024*10];// I/O buffer
        int all = 0;// how many numbers have been read
        int temp = 0;// the number currently being parsed
        // First pass: count how many numbers fall into each bucket
        try(BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))){
            int i;
            while((i = bis.read(buffer,0,buffer.length)) > 0){
                for(int a = 0;a < i;a++){
                    if(buffer[a] == ','){
                        all++;
                        int index = temp >> 10;// divide by 1024 using a shift
                        if(index >= record.length){
                            // grow the bucket array
                            int[] newRecord = new int[index+1];
                            System.arraycopy(record,0,newRecord,0,record.length);
                            record = newRecord;
                        }
                        record[index]++;
                        temp = 0;
                    }else if(buffer[a] >= '0' && buffer[a] <= '9'){
                        // skip anything that is not a digit, such as a trailing newline
                        int n = buffer[a] - ASCII_0;
                        temp = temp*10+n;// string-to-number conversion using primitives only
                    }
                }
            }
            // the last number has no comma after it, so count it here
            all++;
            int index = temp >> 10;
            if(index >= record.length){
                int[] newRecord = new int[index+1];
                System.arraycopy(record,0,newRecord,0,record.length);
                record = newRecord;
            }
            record[index]++;
        } catch (Exception e){
            e.printStackTrace();
        }
        int tp90 = (int) (all*0.9d)-1;// zero-based rank of the 90th percentile
        int start = 0;
        int lessThanTp90 = 0;// how many numbers sit in the buckets before the target bucket
        while(start < record.length){
            int t = lessThanTp90 + record[start];
            if(t > tp90){
                break;
            }else{
                lessThanTp90 = t;
            }
            start++;
        }
        // start is now the bucket that contains the target
        int[] targetSection = new int[record[start]];
        int len = 0;
        temp = 0;// reset the parser state left over from the first pass
        try(BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))){
            int i;// second pass: the bucket holding the 90th percentile is known, so keep only that bucket's values and sort them
            while((i = bis.read(buffer,0,buffer.length)) > 0){
                for(int a = 0;a < i;a++){
                    if(buffer[a] == ','){
                        int index = temp >> 10;
                        if(index == start){
                            targetSection[len++]=temp;
                        }
                        temp = 0;
                    }else if(buffer[a] >= '0' && buffer[a] <= '9'){
                        int n = buffer[a] - ASCII_0;
                        temp = temp*10+n;
                    }
                }
            }
            // the last number has no comma after it
            int index = temp >> 10;
            if(index == start){
                targetSection[len++]=temp;
            }
        } catch (Exception e){
            e.printStackTrace();
        }
        Arrays.sort(targetSection);
        return targetSection[tp90 - lessThanTp90];
    }
}

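Results

On the same machine, 10 million numbers take roughly 300~400 ms, and memory usage is much smaller too. That is about a 20x improvement in time; the space improvement (which should also exceed 20x) is left aside here. Note that the approach has to be adapted to the distribution of the data: this algorithm only illustrates some optimization ideas. In other words, in some scenarios, writing a few fairly crude utilities yourself, tailored to your own data, can greatly improve application performance. After all, going from 7 seconds to 350 milliseconds is a world of difference in user experience, whereas going from 350 milliseconds to 35 milliseconds is far less noticeable.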
