Several third-party SerDes are generally available for processing data stored in CSV files. This article uses CSVSerDe:
1. Adding the third-party SerDe
First, add the third-party SerDe JAR to the Hive classpath with the following command:
hive> add jar /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/csv-serde-1.1.2.jar;
Added /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/csv-serde-1.1.2.jar to class path
Added resource: /home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/lib/csv-serde-1.1.2.jar
The JAR can be downloaded from this link: csv-serde-1.1.2.jar. The rest of this article walks through the process using a sample CSV file.
2. The sample CSV log file looks like this:
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.0
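These rows show why a CSV-aware SerDe is needed. Quoted fields contain commas ("ac, abs, moon"), and a doubled quote ("") inside a quoted field stands for a literal quote character. The sketch below uses Python's standard csv module only to show how a quote-aware parser reads these rows, compared with splitting on every comma. That naive split is roughly what a table declared with plain ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' sees. This is an illustration under that assumption, not how CSVSerDe itself is implemented. The helper names naive_split and csv_parse are hypothetical.

```python
import csv
import io

# The three sample rows from the CSV log file above.
SAMPLE = '''1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.0
'''


def naive_split(text):
    """Hypothetical helper: split every line on each comma, ignoring quotes.

    This approximates a delimiter-only table definition with no quote handling.
    """
    return [line.split(",") for line in text.splitlines()]


def csv_parse(text):
    """Hypothetical helper: parse with standard CSV quoting rules.

    A quoted field may contain commas, and "" inside a quoted field
    is read as a single literal quote character.
    """
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    print("naive split:")
    for row in naive_split(SAMPLE):
        print(len(row), row)
    print("csv parse:")
    for row in csv_parse(SAMPLE):
        print(len(row), row)
```

With the naive split, the three rows yield 7, 5, and 6 fields, and the quote characters stay embedded in the values. With the quote-aware parser, every row yields the expected 5 columns. For example, the third row's model becomes Venture "Extended Edition, Very Large", and its fourth column is an empty string.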