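1. Does CSV parsing really need a library?
If you find yourself asking this, you are probably underestimating the problem: text can only be parsed reliably when there is a specification to parse it against.
Here is how Wikipedia's "Comma-separated values" article summarizes RFC 4180 (worth a quick read):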
RFC 4180 formalized CSV. It defines the MIME type “text/csv”, and CSV files that follow its rules should be very widely portable. Among its requirements:
- MS-DOS-style lines that end with (CR/LF) characters (optional for the last line).
- An optional header record (there is no sure way to detect whether it is present, so care is required when importing).
- Each record “should” contain the same number of comma-separated fields.
- Any field may be quoted (with double quotes).
- Fields containing a line-break, double-quote or commas should be quoted. (If they are not, the file will likely be impossible to process correctly).
- A (double) quote character in a field must be represented by two (double) quote characters.
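To see why these rules matter, here is a minimal sketch in plain Java that compares a naive split on commas with a small parser that honors quoted fields and doubled quotes. The class name Rfc4180Sketch and the method parseRecord are made up for illustration. The sketch handles a single record only and is not a full RFC 4180 implementation.

```java
import java.util.ArrayList;
import java.util.List;

class Rfc4180Sketch {
    // Parses one CSV record, honoring double-quoted fields and "" escapes.
    // Illustrative only: no validation of malformed input.
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is one literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas and line breaks are data here
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"Smith, John\",\"said \"\"hi\"\"\"";
        // The naive split cuts through the quoted comma and yields 4 pieces.
        System.out.println(line.split(",").length);
        // The quote-aware parser yields the 3 intended fields.
        System.out.println(parseRecord(line));
    }
}
```

Even this small sketch ignores records that span several lines, header detection and encoding issues, which is exactly the kind of detail a dedicated library takes care of.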
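Without further ado, chances are you have used one of these:
- Hand-rolled parsing - you are at the mercy of fate, with no RFC 4180 behind you
- commons-csv - basic parsing
- opencsv - basic parsing + Java Bean mapping + nested parsing (already quite good)
Hopefully you have deliberately made your way to the third option and are working at the (Java) bean level. From there, going by the csv parsers comparison, univocity-parsers is the one to use.
You may still be wondering: what is this thing, and why should I use it?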
univocity-parsers is currently used by many commercial and open-source projects, including Spark-CSV, Apache Camel and Apache Drill.
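2. Parsing a nested structure with univocity-parsers
Here is just one example: how to parse the data below. Points to note:
- The column headers vary noticeably in form;
- rdfs:label is a nested field whose values are separated by ;.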
index,IRI,skos:prefLabel,rdfs:label(include skos:prefLabel),rdfs:subClassOf,vertical
1,user:Activity,活动,user:Event,
2,user:AdministrativeDepartment,行政机关,行政机构,user:Organization,
3,user:AdministrativeEnforcementOfLawDepartment,行政执法机关,user:LegalDepartment,
4,user:AdministrativeRegion,行政区,user:GeographicalArea,地点
5,user:AgentCompany,经纪公司,经纪人公司,user:Company,
6,user:Album,专辑,user:Audio,
7,user:Animal,动物界,user:Kingdom,
8,user:ArtWorks,美术作品,user:CreativeWork,
9,user:Athlete,运动员,user:SportsPerson,
10,user:Audio,音频,user:CreativeWork,
11,user:Biology,生物,owl:Thing,
12,user:Books,图书,user:Publication,
13,user:Building,建筑物,user:Place,地点
14,user:BusinessEvent,商业事件,user:Event,
15,user:BusinessPerson,商业人物,user:Person,
16,user:Cartoon,动画,user:Video,
17,user:CharitableOrganization,慈善组织,公益组织;公益团体;慈善机构;慈善团体,user:Organization,
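Before bringing in a library, the two points above can be sketched in plain Java: look up cells by header name, then split the nested rdfs:label cell on ; a second time. Everything here (NestedLabelSketch, OntologyClass, indexHeaders, toBean) is a hypothetical name chosen for illustration, not the univocity-parsers API. The sketch assumes each row carries all six columns, as row 17 does, and it does not try to repair shorter rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NestedLabelSketch {
    // Hypothetical bean: the field names are illustrative, not from the original post.
    static final class OntologyClass {
        String iri;
        String prefLabel;
        List<String> labels = new ArrayList<>();
        String subClassOf;
    }

    // Maps each header name to its column position so cells can be looked up by name.
    static Map<String, Integer> indexHeaders(String headerLine) {
        Map<String, Integer> positions = new LinkedHashMap<>();
        String[] names = headerLine.split(",", -1);
        for (int i = 0; i < names.length; i++) {
            positions.put(names[i].trim(), i);
        }
        return positions;
    }

    // Returns the named cell, or "" when the header is unknown or the row is too short.
    static String cell(Map<String, Integer> headers, String[] cells, String name) {
        Integer i = headers.get(name);
        return (i == null || i >= cells.length) ? "" : cells[i].trim();
    }

    static OntologyClass toBean(Map<String, Integer> headers, String row) {
        // A plain split is only acceptable because the sample rows contain no quoted fields.
        String[] cells = row.split(",", -1);
        OntologyClass bean = new OntologyClass();
        bean.iri = cell(headers, cells, "IRI");
        bean.prefLabel = cell(headers, cells, "skos:prefLabel");
        bean.subClassOf = cell(headers, cells, "rdfs:subClassOf");
        // Second-level split: the nested field uses ';' as its own delimiter.
        String nested = cell(headers, cells, "rdfs:label(include skos:prefLabel)");
        for (String label : nested.split(";")) {
            if (!label.trim().isEmpty()) {
                bean.labels.add(label.trim());
            }
        }
        return bean;
    }

    public static void main(String[] args) {
        String header = "index,IRI,skos:prefLabel,rdfs:label(include skos:prefLabel),rdfs:subClassOf,vertical";
        // Made-up ASCII row in the same shape as the sample data.
        OntologyClass c = toBean(indexHeaders(header), "0,user:Example,Example,Sample;Demo,owl:Thing,");
        System.out.println(c.iri + " -> " + c.labels + " subClassOf " + c.subClassOf);
    }
}
```

Fed row 17 from the sample, toBean would produce a bean with four labels. A bean-mapping library does the same two steps declaratively, and also copes with quoting, which this sketch deliberately leaves out.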