Toward the Best CSV Parsing? (Java)

1. Does CSV parsing really need a library?

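If you find yourself asking this question, you are probably underestimating how serious the problem is. For text to be parsed correctly and reliably, there has to be a specification behind it.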

Here is how the Wikipedia article "Comma-separated values" introduces RFC 4180 (worth a quick read):

RFC 4180 formalized CSV. It defines the MIME type “text/csv”, and CSV files that follow its rules should be very widely portable. Among its requirements:

  • MS-DOS-style lines that end with (CR/LF) characters (optional for the last line).
  • An optional header record (there is no sure way to detect whether it is present, so care is required when importing).
  • Each record “should” contain the same number of comma-separated fields.
  • Any field may be quoted (with double quotes).
  • Fields containing a line-break, double-quote or commas should be quoted. (If they are not, the file will likely be impossible to process correctly).
  • A (double) quote character in a field must be represented by two (double) quote characters.
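
To make the quoting rules above concrete, here is a minimal sketch of a quote-aware splitter for a single record, contrasted with a naive split on commas. The class name Rfc4180Sketch and the method parseRecord are hypothetical names chosen for illustration. The sketch assumes the whole record is already held in one string and does not validate malformed input, so it is not a substitute for a real parser.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: handles quoted fields and doubled quotes for ONE record.
class Rfc4180Sketch {

    /** Splits one CSV record into fields, honoring double-quoted fields. */
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // two quotes inside a quoted field = one literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // a lone quote closes the quoted field
                    }
                } else {
                    current.append(c);       // commas and line breaks are plain data here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a quoted field
            } else if (c == ',') {
                fields.add(current.toString()); // an unquoted comma ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Made-up record: a comma and doubled quotes inside quoted fields.
        String record = "1,\"Smith, John\",\"He said \"\"hi\"\"\"";
        System.out.println("naive split pieces: " + record.split(",").length);
        System.out.println("quote-aware fields: " + parseRecord(record));
    }
}
```

For this record the naive split produces four pieces because it cuts inside the quoted name, while the quote-aware version returns the three intended fields. Records that span several lines, CR/LF handling and error reporting are all left out here, which is exactly the work a library takes off your hands.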

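Without further ado, you may have already tried one of these:

  1. Hand-rolled parsing - you are at the mercy of fate, with no RFC 4180 behind you
  2. commons-csv - basic parsing
  3. opencsv - basic parsing + Java Bean mapping + nested parsing (already very good)

Hopefully you have deliberately worked your way up to option 3 and are mapping rows to Java beans. From there, going by the csv parsers comparison, univocity-parsers is the one you should be using.

You may still be thinking: what on earth is this, and why should I use it?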

univocity-parsers is currently used by many commercial and open-source projects, including Spark-CSV, Apache Camel and Apache Drill.

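2. Example: parsing a nested structure with univocity-parsers

Just one example here: how to parse the data below. Points to note:

  1. The column headers are noticeably irregular;
  2. rdfs:label is a nested field whose values are separated by ;.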

index,IRI,skos:prefLabel,rdfs:label(include skos:prefLabel),rdfs:subClassOf,vertical
1,user:Activity,活动,user:Event,
2,user:AdministrativeDepartment,行政机关,行政机构,user:Organization,
3,user:AdministrativeEnforcementOfLawDepartment,行政执法机关,user:LegalDepartment,
4,user:AdministrativeRegion,行政区,user:GeographicalArea,地点
5,user:AgentCompany,经纪公司,经纪人公司,user:Company,
6,user:Album,专辑,user:Audio,
7,user:Animal,动物界,user:Kingdom,
8,user:ArtWorks,美术作品,user:CreativeWork,
9,user:Athlete,运动员,user:SportsPerson,
10,user:Audio,音频,user:CreativeWork,
11,user:Biology,生物,owl:Thing,
12,user:Books,图书,user:Publication,
13,user:Building,建筑物,user:Place,地点
14,user:BusinessEvent,商业事件,user:Event,
15,user:BusinessPerson,商业人物,user:Person,
16,user:Cartoon,动画,user:Video,
17,user:CharitableOrganization,慈善组织,公益组织;公益团体;慈善机构;慈善团体,user:Organization,

2.1 Java Bean definition
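
As a rough illustration of the shape such a bean could take, the sketch below uses plain Java with no library annotations so that it runs without dependencies. The class name OntologyClass, its field names, and the helpers fromFields and splitLabels are all hypothetical. With univocity-parsers, the column-to-field binding is normally declared with the library's annotations rather than written by hand as it is here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bean for one row of the sample data; every name is illustrative.
class OntologyClass {
    int index;            // column "index"
    String iri;           // column "IRI"
    String prefLabel;     // column "skos:prefLabel"
    List<String> labels;  // column "rdfs:label(include skos:prefLabel)", split on ';'
    String subClassOf;    // column "rdfs:subClassOf"
    String vertical;      // column "vertical", null when the cell is empty

    /** Builds a bean from the six fields of one record that is already split. */
    static OntologyClass fromFields(String[] f) {
        if (f.length != 6) {
            throw new IllegalArgumentException("expected 6 fields, got " + f.length);
        }
        OntologyClass c = new OntologyClass();
        c.index = Integer.parseInt(f[0].trim());
        c.iri = f[1];
        c.prefLabel = f[2];
        c.labels = splitLabels(f[3]);
        c.subClassOf = f[4];
        c.vertical = f[5].isEmpty() ? null : f[5];
        return c;
    }

    /** Splits the nested label cell on ';', dropping blank entries. */
    static List<String> splitLabels(String raw) {
        List<String> out = new ArrayList<>();
        if (raw == null) {
            return out;
        }
        for (String part : raw.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up ASCII stand-in shaped like row 17 of the sample data.
        // split(",", -1) keeps the trailing empty "vertical" cell; a real CSV
        // parser should do this step, since a plain split ignores quoting.
        String row = "17,user:CharitableOrganization,charity,nonprofit;foundation,user:Organization,";
        OntologyClass c = fromFields(row.split(",", -1));
        System.out.println(c.iri + " -> " + c.labels);
    }
}
```

The nested column is kept as a list rather than a raw string so that callers never have to know about the ; separator. Note that fromFields insists on exactly six fields, so any record with a different field count is rejected instead of being silently shifted by one column.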


                