Reposted from: http://blog.csdn.net/mango_song/article/details/12562137
1. Overview
Suppose a text file f1.txt has the following format:
- 1 tom
- 2 jame
- 3 mango
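The first column is the id and the second column is the name. The two columns are separated by whitespace of variable length (spaces, tabs, and so on).
We want to create a user table that can read f1.txt. Specifying a field delimiter when creating the table does not work here, because the separator is not a fixed string, so we need a custom Hive SerDe (serializer/deserializer).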
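To make the delimiter problem concrete, here is a small standalone sketch in plain Java (no Hive involved). The class name WhitespaceSplitDemo and the sample line are made up for illustration. It compares splitting on one fixed character, which is roughly what a single field delimiter amounts to, with splitting on any run of whitespace.

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceSplitDemo {
    // Split on exactly one space character, like a fixed field delimiter.
    static List<String> splitOnSingleSpace(String line) {
        return Arrays.asList(line.split(" "));
    }

    // Split on any run of whitespace (spaces, tabs, ...).
    static List<String> splitOnWhitespaceRun(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // Hypothetical row: two spaces, a tab, then one space between the columns.
        String line = "3  \t mango";
        System.out.println(splitOnSingleSpace(line).size());   // extra, partly empty fields
        System.out.println(splitOnWhitespaceRun(line));        // the two intended fields
    }
}
```

With a fixed delimiter the row breaks into more than two pieces, some of them empty, while the whitespace-run split yields just the id and the name. That is the behavior the custom deserializer needs to provide.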
2. Create a new Maven project and add dependencies on hive-serde 0.11.0 and hadoop-core 1.0.3.
Create a SerdeTest class that implements the Deserializer interface:
- In initialize(), describe the table's fields and their types
- In deserialize(Writable text), parse text into id and name
- getObjectInspector() returns ObjectInspectorFactory.getStandardStructObjectInspector(structFieldNames,structFieldObjectInspectors)
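- package com.renren.hive.tools;
- import java.util.ArrayList;
- import java.util.List;
- import java.util.Properties;
- import java.util.StringTokenizer;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.hive.serde2.Deserializer;
- import org.apache.hadoop.hive.serde2.SerDeException;
- import org.apache.hadoop.hive.serde2.SerDeStats;
- import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
- import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
- import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.ObjectInspectorOptions;
- import org.apache.hadoop.io.Writable;
- public class SerdeTest implements Deserializer {
- // Column names and their object inspectors, filled in by initialize()
- private List<String> structFieldNames = new ArrayList<String>();
- private List<ObjectInspector> structFieldObjectInspectors = new ArrayList<ObjectInspector>();
- @Override
- public ObjectInspector getObjectInspector() throws SerDeException {
- // Describe each row as a struct of (id, name)
- return ObjectInspectorFactory.getStandardStructObjectInspector(
- structFieldNames, structFieldObjectInspectors);
- }
- @Override
- public Object deserialize(Writable text) throws SerDeException {
- // StringTokenizer splits on runs of whitespace (space, tab, ...) by default
- List<Object> result = new ArrayList<Object>();
- StringTokenizer tokenizer = new StringTokenizer(text.toString());
- int index = 0;
- while (tokenizer.hasMoreTokens()) {
- if (index == 0) {
- // First token is the integer id
- result.add(Integer.valueOf(tokenizer.nextToken()));
- } else {
- // Remaining token is the name
- result.add(tokenizer.nextToken());
- }
- index++;
- }
- return result;
- }
- @Override
- public void initialize(Configuration arg0, Properties arg1)
- throws SerDeException {
- // Reset first so that a repeated initialize() call does not duplicate columns
- structFieldNames.clear();
- structFieldObjectInspectors.clear();
- // Column 1: id (int)
- structFieldNames.add("id");
- structFieldObjectInspectors.add(ObjectInspectorFactory
- .getReflectionObjectInspector(Integer.TYPE,
- ObjectInspectorOptions.JAVA));
- // Column 2: name (string)
- structFieldNames.add("name");
- structFieldObjectInspectors.add(ObjectInspectorFactory
- .getReflectionObjectInspector(String.class,
- ObjectInspectorOptions.JAVA));
- }
- @Override
- public SerDeStats getSerDeStats() {
- // This SerDe does not collect statistics
- return null;
- }
- }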
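The deserialize() method above assumes every line is well formed: a non-numeric first token makes Integer.valueOf throw, and a line with fewer than two tokens yields a short row. A more defensive version of the parsing step might look like the following sketch. It is plain Java with no Hive dependencies so that it can be run on its own. The class name RowParserSketch, the method parseRow, and the choice to map bad or missing fields to null are assumptions made for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class RowParserSketch {
    // Parses "<id><whitespace><name>" into a two-element row: [Integer id, String name].
    // Assumption: a missing or non-numeric id and a missing name become null
    // instead of raising an exception. Tokens beyond the second are ignored.
    static List<Object> parseRow(String line) {
        StringTokenizer tokenizer = new StringTokenizer(line == null ? "" : line);
        Integer id = null;
        String name = null;
        if (tokenizer.hasMoreTokens()) {
            try {
                id = Integer.valueOf(tokenizer.nextToken());
            } catch (NumberFormatException e) {
                id = null; // assumption: treat an unparsable id as NULL
            }
        }
        if (tokenizer.hasMoreTokens()) {
            name = tokenizer.nextToken();
        }
        List<Object> row = new ArrayList<Object>(2);
        row.add(id);
        row.add(name);
        return row;
    }

    public static void main(String[] args) {
        System.out.println(parseRow("1 \t tom"));  // well-formed line
        System.out.println(parseRow("x jame"));    // non-numeric id
        System.out.println(parseRow("3"));         // name missing
    }
}
```

Because the row always has exactly two elements, it stays aligned with the two fields declared in initialize(). Whether silently returning null is preferable to failing the query depends on how much you trust the input data.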
- mvn clean package
Copy the generated jar, hive-serde-tool-1.0.1-SNAPSHOT.jar, into hive_home/lib and add the following to hive-site.xml:
- <property>
- <name>hive.aux.jars.path</name>
- <value>file:///home/dp/hive/lib/hive-serde-tool-1.0.1-SNAPSHOT.jar</value>
- </property>
Create the table, specifying the custom SerDe:
- hive -e "create table test row format serde 'com.renren.hive.tools.SerdeTest'"
5. Load and query the data
- hive -e "load data local inpath 'f1.txt' overwrite into table test"
- hive -e "select * from test"