Java Row class: what is the best way to store data from a CSV in Java classes? A single list of Row objects, or a single object with nested objects?...

Date,Locality,District,New Cases,Hospitalizations,Deaths
5/21/2020,Accomack,Eastern Shore,709,40,11
5/21/2020,Albemarle,Thomas Jefferson,142,19,4
5/21/2020,Alleghany,Alleghany,9,4,0
5/21/2020,Amelia,Piedmont,22,7,1
5/21/2020,Amherst,Central Virginia,25,3,0
5/21/2020,Appomattox,Central Virginia,25,1,0
5/21/2020,Arlington,Arlington,1763,346,89
... // skipped down to the next day
5/20/2020,Accomack,Eastern Shore,709,39,11
5/20/2020,Albemarle,Thomas Jefferson,142,18,4
5/20/2020,Alleghany,Alleghany,10,4,0
5/20/2020,Amelia,Piedmont,21,7,1
5/20/2020,Amherst,Central Virginia,25,3,0
5/20/2020,Appomattox,Central Virginia,24,1,0
5/20/2020,Arlington,Arlington,1728,334,81
5/20/2020,Augusta,Central Shenandoah,88,4,1
... // continued

I have data for a US state, like the above, in a CSV file, and I would like to run some analysis on it and serve the results through a REST API. The analysis consists of various aggregations, such as: total cases across the state by date, total cases for the entire state, total cases grouped by district, total cases for a district by date, total cases for a county by date, and so on. In short, all the basic group-bys one could do with this data.

My problem is figuring out how to store this data properly in Java, without a database. I have one working implementation that uses a list of Row objects, where each Row object holds exactly one row of the CSV. Using Java's Stream API, I have been able to filter that list and compute some of these statistics. I then package the statistics into a single Row object or a List and hand it to the API to be serialized as JSON. This has worked OK, but I feel it is not the best way.
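For concreteness, a minimal sketch of that list-of-rows approach might look like the following. The class names (`Row`, `RowStats`), the field names, and the naive comma split are assumptions made for illustration, not the actual implementation; the sample lines are taken from the CSV above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// One immutable object per CSV line (hypothetical names mirroring the CSV header).
final class Row {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("M/d/yyyy");

    final LocalDate date;
    final String locality;
    final String district;
    final int cases;
    final int hospitalizations;
    final int deaths;

    Row(LocalDate date, String locality, String district,
        int cases, int hospitalizations, int deaths) {
        this.date = date;
        this.locality = locality;
        this.district = district;
        this.cases = cases;
        this.hospitalizations = hospitalizations;
        this.deaths = deaths;
    }

    // Naive parser: assumes no quoted fields and no commas inside values.
    static Row parse(String line) {
        String[] f = line.split(",");
        return new Row(LocalDate.parse(f[0].trim(), FMT), f[1].trim(), f[2].trim(),
                Integer.parseInt(f[3].trim()),
                Integer.parseInt(f[4].trim()),
                Integer.parseInt(f[5].trim()));
    }
}

final class RowStats {
    // Sum of the cases column per date, sorted by date.
    static Map<LocalDate, Integer> casesByDate(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> r.date, TreeMap::new, Collectors.summingInt(r -> r.cases)));
    }

    // Same aggregation, restricted to a single district.
    static Map<LocalDate, Integer> casesForDistrictByDate(List<Row> rows, String district) {
        List<Row> filtered = rows.stream()
                .filter(r -> r.district.equals(district))
                .collect(Collectors.toList());
        return casesByDate(filtered);
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                Row.parse("5/21/2020,Amherst,Central Virginia,25,3,0"),
                Row.parse("5/21/2020,Appomattox,Central Virginia,25,1,0"),
                Row.parse("5/21/2020,Arlington,Arlington,1763,346,89"),
                Row.parse("5/20/2020,Amherst,Central Virginia,25,3,0"),
                Row.parse("5/20/2020,Appomattox,Central Virginia,24,1,0"),
                Row.parse("5/20/2020,Arlington,Arlington,1728,334,81"));

        System.out.println(casesByDate(rows));
        // prints {2020-05-20=1777, 2020-05-21=1813}
        System.out.println(casesForDistrictByDate(rows, "Central Virginia"));
        // prints {2020-05-20=49, 2020-05-21=50}
    }
}
```

Each new group-by is then one more small method over the same flat list, which is the main appeal of this layout.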

Is there some other, more object-oriented way to make use of the Date, District, County, and Cases columns?

I was thinking of doing something like this:

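import java.time.LocalDate;
import java.util.List;

// The whole state: a name plus its districts.
class State {
    List<District> districtList;
    String name;
}

// A district: a name plus its county entries.
class District {
    List<County> countyList;
    String name;
}

// One county on one date.
class County {
    LocalDate date;
    String name;
    int cases;
    // more stuff
}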

Then I would create one State object with a list of District objects, each with a list of many County objects, one per date.

Does this seem like overkill? Is there some other clean way to read this dataset into a data structure that makes it easy to aggregate summary information?

The way I'm doing it now works, but I am looking for a better way!

Solution

From your description, your approach seems sound and properly object-oriented. However, without additional information (e.g. specific aggregations that might dictate otherwise), it seems odd to have multiple "duplicate" County objects in your District objects, one for every date. For example:

[{"date":"5/21/2020","name":"Accomack"},
 {"date":"5/20/2020","name":"Accomack"}]

From an object-oriented view, it seems you'd want an additional level of aggregation by date, with each date containing a list of County rows.
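A minimal sketch of that extra date level, assuming a date -> district -> county-entries layout built from nested maps, could look like this. `CountyEntry`, `DateIndex`, and the method names are hypothetical, introduced only for illustration; the sample values come from the CSV in the question.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One county's figures on one date (hypothetical class for illustration).
final class CountyEntry {
    final LocalDate date;
    final String county;
    final String district;
    final int cases;

    CountyEntry(LocalDate date, String county, String district, int cases) {
        this.date = date;
        this.county = county;
        this.district = district;
        this.cases = cases;
    }
}

final class DateIndex {
    // Builds date -> district -> entries, so each date holds its own county rows.
    static Map<LocalDate, Map<String, List<CountyEntry>>> build(List<CountyEntry> entries) {
        Map<LocalDate, Map<String, List<CountyEntry>>> byDate = new TreeMap<>();
        for (CountyEntry e : entries) {
            byDate.computeIfAbsent(e.date, d -> new TreeMap<>())
                  .computeIfAbsent(e.district, d -> new ArrayList<>())
                  .add(e);
        }
        return byDate;
    }

    // Total cases for one district on one date; 0 if either key is absent.
    static int districtTotal(Map<LocalDate, Map<String, List<CountyEntry>>> index,
                             LocalDate date, String district) {
        Map<String, List<CountyEntry>> districts =
                index.getOrDefault(date, Collections.emptyMap());
        List<CountyEntry> counties =
                districts.getOrDefault(district, Collections.emptyList());
        int total = 0;
        for (CountyEntry c : counties) {
            total += c.cases;
        }
        return total;
    }

    public static void main(String[] args) {
        LocalDate may21 = LocalDate.of(2020, 5, 21);
        LocalDate may20 = LocalDate.of(2020, 5, 20);
        List<CountyEntry> entries = Arrays.asList(
                new CountyEntry(may21, "Amherst", "Central Virginia", 25),
                new CountyEntry(may21, "Appomattox", "Central Virginia", 25),
                new CountyEntry(may20, "Amherst", "Central Virginia", 25),
                new CountyEntry(may20, "Appomattox", "Central Virginia", 24));

        Map<LocalDate, Map<String, List<CountyEntry>>> index = build(entries);
        System.out.println(districtTotal(index, may20, "Central Virginia")); // prints 49
    }
}
```

With this layout, a per-date lookup is a direct map access, while aggregations that cut across dates still have to walk every date bucket.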

One consideration: if your aggregations align better with a database-style approach, I would keep each row from the source data as is and query the rows directly, filtering and sorting them with Stream lambdas.
