Structuring text with a regular pattern but a variable number of lines per group

【Question】

I have a CSV file with non-standardized content. It looks something like this:

John, 001
01/01/2015, hamburger
02/01/2015, pizza
03/01/2015, ice cream
Mary, 002
01/01/2015, hamburger
02/01/2015, pizza
John, 003
04/01/2015, chocolate

Now, what I'm trying to do is write logic in Java to separate them. I would like "John, 001" to be the header, and all the rows under John and before Mary to belong to John.

Will this be possible? Or should I just do it manually?

Edit:
For the input, even though it is not standardized, a noticeable pattern is that rows without names always start with a date.
My output goal is a Java object that I can eventually store in the database, in the format below.

Name, hamburger, pizza, ice cream, chocolate
John, 01/01/2015, 02/01/2015, 03/01/2015, NA
Mary, 01/01/2015, 02/01/2015, NA, NA
John, NA, NA, NA, 04/01/2015

【Answer】

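This problem calls for a fair amount of structured computation. Java has no ready-made class library for this kind of grouping and alignment, so a hand-written implementation tends to be complicated and hard to read. In this case SPL can be used to help, and the code is more intuitive and easier to understand: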

     A                                        B
1    =file("D:\\noneStand.csv").cursor@c()    =["hamburger","pizza","ice cream","chocolate"]
2    =create(name,${foodlist})
3    for A1;!isdigit(left(#1,1))              =A3.to(2,).align(B1,#2)
4                                             =A2.record(A3.#1 | B3.(#1))

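A1: Reads the file noneStand.csv as a cursor, with a comma as the delimiter.

A2: Creates the two-dimensional table that holds the result. ${foodlist} dynamically parses the parameter into an expression. foodlist is a parameter whose value is hamburger,pizza,'ice cream',chocolate

A3: Loops over A1, storing one complete group of data in A3 on each iteration. When the first character of a row's first field is a letter, the data before that row is closed off as one group. B3 and B4 form the body of the loop.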
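For readers who want to see the grouping step in the Java the question asks about, here is a minimal sketch. It is not the SPL implementation; the class name GroupRows and the methods isHeader and group are hypothetical, and it assumes the lines have already been read into a list and that every date row starts with a digit, as the question states.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupRows {
    // Assumption from the question: date rows always start with a digit,
    // so any non-empty line that does not is treated as a header row.
    static boolean isHeader(String line) {
        return !line.isEmpty() && !Character.isDigit(line.charAt(0));
    }

    // Split the lines into groups: each header row opens a new group
    // and the date rows that follow are appended to it.
    static List<List<String>> group(List<String> lines) {
        List<List<String>> groups = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (isHeader(line) || groups.isEmpty()) {
                groups.add(new ArrayList<>()); // open a new group
            }
            groups.get(groups.size() - 1).add(line);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "John, 001", "01/01/2015, hamburger", "02/01/2015, pizza", "03/01/2015, ice cream",
            "Mary, 002", "01/01/2015, hamburger", "02/01/2015, pizza",
            "John, 003", "04/01/2015, chocolate");
        for (List<String> g : group(lines)) {
            System.out.println(g); // one line per group, header row first
        }
    }
}
```

With the sample data from the question this yields three groups (John 001, Mary 002, John 003), so the two John blocks stay separate, matching the desired output.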

B3: Aligns the records of A3 (the loop variable) from the 2nd onward to foodlist. For example, the aligned result for the Mary group is:

01/01/2015, hamburger
02/01/2015, pizza
NA, NA
NA, NA

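B4: Appends a record to A2. A3.#1 returns the first field of the first record of A3 (for example, Mary). B3.(#1) is the set formed from the first field of each record in B3, that is, [01/01/2015, 02/01/2015, NA, NA]. "|" concatenates the two.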
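The align-and-append steps described for B3 and B4 can be approximated in plain Java as follows. This is only a sketch under stated assumptions: the class AlignGroup and the method toRecord are hypothetical names, each data row is assumed to contain exactly two comma-separated fields (date, food), and a missing food is written as the string "NA" to match the output format in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AlignGroup {
    // Build one output row from a group: the name, then one date per food
    // in foodList order, with "NA" where the group has no row for that food.
    static List<String> toRecord(List<String> group, List<String> foodList) {
        Map<String, String> dateByFood = new HashMap<>();
        for (String line : group.subList(1, group.size())) { // skip the header row
            String[] parts = line.split(",", 2);             // assumes "date, food"
            // keep the first date seen for a food if it repeats within a group
            dateByFood.putIfAbsent(parts[1].trim(), parts[0].trim());
        }
        List<String> record = new ArrayList<>();
        record.add(group.get(0).split(",", 2)[0].trim());    // name from the header row
        for (String food : foodList) {
            record.add(dateByFood.getOrDefault(food, "NA"));
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> foodList = List.of("hamburger", "pizza", "ice cream", "chocolate");
        List<String> mary = List.of("Mary, 002", "01/01/2015, hamburger", "02/01/2015, pizza");
        System.out.println(String.join(", ", toRecord(mary, foodList)));
        // prints: Mary, 01/01/2015, 02/01/2015, NA, NA
    }
}
```

Calling toRecord once per group and collecting the results gives the rows of the target table; the header row "Name, hamburger, pizza, ice cream, chocolate" comes directly from the food list.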
