Grouping by multiple fields with a stream: group by multiple fields with aggregation in Java 8

I have a list of domain objects that relate to web access records. These domain objects can stretch into the thousands in number.

I don't have the resources or requirement to store them in a database in raw format, so instead I want to precompute aggregations and put the aggregated data in a database.

I need to aggregate the total bytes transferred in 5-minute windows, like the following SQL query:

select
    round(request_timestamp, '5') as window, -- round timestamp to the nearest 5 minutes
    cdn,
    isp,
    http_result_code,
    transaction_time,
    sum(bytes_transferred)
from web_records
group by
    round(request_timestamp, '5'),
    cdn,
    isp,
    http_result_code,
    transaction_time
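
As a side note, the window rounding in that query could be sketched in Java roughly as follows. This is only an illustration: `WindowRounding` and `floorToFiveMinutes` are hypothetical names, and the sketch assumes each timestamp is floored to the start of its 5-minute window rather than rounded to the nearest boundary. How `WebRecord::getFiveMinuteWindow` is actually implemented is not shown in the question.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

class WindowRounding {

    // Length of one aggregation window in milliseconds (5 minutes).
    private static final long WINDOW_MILLIS = TimeUnit.MINUTES.toMillis(5);

    // Assumption: floor to the start of the window, not "nearest" rounding.
    static Date floorToFiveMinutes(Date timestamp) {
        long millis = timestamp.getTime();
        // floorDiv keeps the result correct for timestamps before the epoch as well.
        return new Date(Math.floorDiv(millis, WINDOW_MILLIS) * WINDOW_MILLIS);
    }

    public static void main(String[] args) {
        Date now = new Date();
        System.out.println(now + " falls into the window starting at " + floorToFiveMinutes(now));
    }
}
```

All records whose timestamps fall inside the same 5-minute span then produce equal `Date` values, which is what makes the window usable as a grouping key.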

In Java 8, my first attempt looks like this. I am aware this solution is similar to the response in "Group by multiple field names in java 8":

Map<Date, Map<String, Map<String, Map<String, Map<String, Integer>>>>> aggregatedData =
    webRecords
        .stream()
        .collect(Collectors.groupingBy(WebRecord::getFiveMinuteWindow,
                 Collectors.groupingBy(WebRecord::getCdn,
                 Collectors.groupingBy(WebRecord::getIsp,
                 Collectors.groupingBy(WebRecord::getResultCode,
                 Collectors.groupingBy(WebRecord::getTxnTime,
                 Collectors.reducing(0,
                                     WebRecord::getReqBytes,
                                     Integer::sum)))))));

This works, but it's ugly; all those nested maps are a nightmare! To "flatten" or "unroll" the map into rows, I have to do this:

for (Date window : aggregatedData.keySet()) {
    for (String cdn : aggregatedData.get(window).keySet()) {
        for (String isp : aggregatedData.get(window).get(cdn).keySet()) {
            for (String resultCode : aggregatedData.get(window).get(cdn).get(isp).keySet()) {
                for (String txnTime : aggregatedData.get(window).get(cdn).get(isp).get(resultCode).keySet()) {
                    Integer bytesTransferred = aggregatedData.get(window).get(cdn).get(isp).get(resultCode).get(txnTime);
                    AggregatedRow row = new AggregatedRow(window, cdn, distId...

As you can see this is pretty messy and difficult to maintain.

Does anyone have ideas for a better way to do this? Any help would be greatly appreciated.

I'm wondering if there is a nicer way to unroll the nested maps, or if there is a library that allows you to do a GROUP BY on a collection.

Solution

You should create a custom key for your map. The simplest way is to use Arrays.asList:

Function<WebRecord, List<Object>> keyExtractor = wr ->
    Arrays.<Object>asList(wr.getFiveMinuteWindow(), wr.getCdn(), wr.getIsp(),
                          wr.getResultCode(), wr.getTxnTime());

Map<List<Object>, Integer> aggregatedData = webRecords.stream().collect(
    Collectors.groupingBy(keyExtractor, Collectors.summingInt(WebRecord::getReqBytes)));

In this case the keys are lists of 5 elements in fixed order. Not quite object-oriented, but simple. Alternatively you can define your own type which represents the custom key and create proper hashCode/equals implementations.
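
The custom key type mentioned above might look something like the following minimal sketch. The names `GroupByKeyDemo` and `AggregationKey` are hypothetical, and the `WebRecord` class here is only a stand-in reduced to the getters used in the question; the real class presumably has more to it. The sketch also shows that unrolling the result into rows becomes a single loop over the map entries.

```java
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class GroupByKeyDemo {

    // Hypothetical stand-in for the question's WebRecord (assumed getters only).
    static final class WebRecord {
        private final Date fiveMinuteWindow;
        private final String cdn, isp, resultCode, txnTime;
        private final int reqBytes;

        WebRecord(Date fiveMinuteWindow, String cdn, String isp,
                  String resultCode, String txnTime, int reqBytes) {
            this.fiveMinuteWindow = fiveMinuteWindow;
            this.cdn = cdn;
            this.isp = isp;
            this.resultCode = resultCode;
            this.txnTime = txnTime;
            this.reqBytes = reqBytes;
        }

        Date getFiveMinuteWindow() { return fiveMinuteWindow; }
        String getCdn() { return cdn; }
        String getIsp() { return isp; }
        String getResultCode() { return resultCode; }
        String getTxnTime() { return txnTime; }
        int getReqBytes() { return reqBytes; }
    }

    // Hypothetical custom key: one immutable object per combination of grouping fields.
    static final class AggregationKey {
        final Date window;
        final String cdn, isp, resultCode, txnTime;

        AggregationKey(WebRecord wr) {
            this.window = wr.getFiveMinuteWindow();
            this.cdn = wr.getCdn();
            this.isp = wr.getIsp();
            this.resultCode = wr.getResultCode();
            this.txnTime = wr.getTxnTime();
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof AggregationKey)) return false;
            AggregationKey k = (AggregationKey) o;
            return Objects.equals(window, k.window) && Objects.equals(cdn, k.cdn)
                    && Objects.equals(isp, k.isp) && Objects.equals(resultCode, k.resultCode)
                    && Objects.equals(txnTime, k.txnTime);
        }

        @Override
        public int hashCode() {
            return Objects.hash(window, cdn, isp, resultCode, txnTime);
        }
    }

    // One flat map instead of five nested ones: key -> summed bytes.
    static Map<AggregationKey, Integer> aggregate(List<WebRecord> records) {
        return records.stream().collect(
                Collectors.groupingBy(AggregationKey::new,
                        Collectors.summingInt(WebRecord::getReqBytes)));
    }

    public static void main(String[] args) {
        Date window = new Date(0L);
        List<WebRecord> records = Arrays.asList(
                new WebRecord(window, "cdn1", "isp1", "200", "fast", 100),
                new WebRecord(window, "cdn1", "isp1", "200", "fast", 50),
                new WebRecord(window, "cdn2", "isp1", "404", "slow", 7));

        // Unrolling into rows is a single loop over the entries (iteration order is unspecified).
        aggregate(records).forEach((key, bytes) ->
                System.out.println(key.window.getTime() + "," + key.cdn + "," + key.isp + ","
                        + key.resultCode + "," + key.txnTime + "," + bytes));
    }
}
```

Compared with the `List<Object>` key, a dedicated class keeps the field names and types, at the cost of writing `equals` and `hashCode` by hand.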
