1. First, create a Configuration instance:
Configuration config = new Configuration();
2. Then create a DatasetRepositoryFactory:
DatasetRepositoryFactory datasetRepositoryFactory = new DatasetRepositoryFactory();
It has three setters:
setConf() takes the Configuration instance
setNamespace() takes the data directory (namespace)
setBasePath() takes the HDFS address - hdfs://......
3. Create a DatasetDefinition:
DatasetDefinition definition = new DatasetDefinition();
definition.setFormat(Formats.AVRO.getName()); // file format
definition.setTargetClass(YourEntity.class);
definition.setAllowNullValues(false); // whether null values are allowed
definition.setPartitionStrategy(
    new PartitionStrategy.Builder()
        .identity("partition property name in the entity", "partition field name in the table")
        .build());
4. Create a DataStoreWriter<YourEntity>:
new AvroPojoDatasetStoreWriter<ImportPOJO>(ImportPOJO.class, the DatasetRepositoryFactory instance created above, the DatasetDefinition instance created above)
5. Create a DatasetOperations:
DatasetTemplate datasetOperations = new DatasetTemplate();
datasetOperations.setDatasetDefinitions(Collections.singletonList(the DatasetDefinition instance created above));
datasetOperations.setDatasetRepositoryFactory(the DatasetRepositoryFactory instance created above);
(DatasetOperations) datasetOperations // a plain cast is enough, since DatasetTemplate implements the interface
After this chain of calls you end up with a DatasetOperations instance.
Its write(Collection<T> records) method accepts a Collection (e.g. a List<XXX>) and writes the records to HDFS in the format configured above - Avro in my case.
Because of Kerberos I have not yet been able to verify that this succeeds; I will test it again.
==================================================
The notes above are a bit messy, so here is a simple test that serializes an entity to an Avro file and uploads it to HDFS:
package com.avro;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.junit.Test;
import org.kitesdk.data.Formats;
import org.kitesdk.data.PartitionStrategy;
import org.springframework.data.hadoop.store.DataStoreWriter;
import org.springframework.data.hadoop.store.dataset.AvroPojoDatasetStoreWriter;
import org.springframework.data.hadoop.store.dataset.DatasetDefinition;
import org.springframework.data.hadoop.store.dataset.DatasetOperations;
import org.springframework.data.hadoop.store.dataset.DatasetRepositoryFactory;
import org.springframework.data.hadoop.store.dataset.DatasetTemplate;

import com.avro.pojo.ImportPOJO;

public class AVROTest {

    @Test
    public void test() throws Exception {
        Configuration config = new Configuration();

        DatasetRepositoryFactory datasetRepositoryFactory = new DatasetRepositoryFactory();
        datasetRepositoryFactory.setConf(config);
        datasetRepositoryFactory.setNamespace("files");
        datasetRepositoryFactory.setBasePath("hdfs://localhost:9000/");
        datasetRepositoryFactory.afterPropertiesSet();

        DatasetDefinition definition = new DatasetDefinition();
        definition.setFormat(Formats.AVRO.getName());
        definition.setTargetClass(ImportPOJO.class);
        definition.setAllowNullValues(false);
        definition.setPartitionStrategy(
                new PartitionStrategy.Builder()
                        .identity("id", "pre_id")
                        .build());

        DataStoreWriter<ImportPOJO> dataStoreWriter =
                new AvroPojoDatasetStoreWriter<ImportPOJO>(ImportPOJO.class, datasetRepositoryFactory, definition);

        DatasetTemplate datasetTemplate = new DatasetTemplate();
        datasetTemplate.setDatasetDefinitions(Collections.singletonList(definition));
        datasetTemplate.setDatasetRepositoryFactory(datasetRepositoryFactory);
        DatasetOperations datasetOperations = (DatasetOperations) datasetTemplate;

        ImportPOJO pojo = new ImportPOJO();
        pojo.setId(4);
        pojo.setName("rihanna");
        pojo.setAge(24);

        List<ImportPOJO> list = new ArrayList<ImportPOJO>();
        list.add(pojo);
        datasetOperations.write(list);
    }
}
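The ImportPOJO entity class is never shown above. A minimal sketch, with fields inferred purely from the setters the test calls (id, name, age), might look like the following; the real class may use different field types or carry extra fields and Avro-related annotations:

```java
// Hypothetical reconstruction of com.avro.pojo.ImportPOJO, inferred from the
// setId/setName/setAge calls in the test above. A plain no-args-constructor
// POJO like this is what AvroPojoDatasetStoreWriter reflects over.
public class ImportPOJO {

    private int id;
    private String name;
    private int age;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```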
I did not add anything else to the Configuration, because everything runs on my local machine. My system is Linux and I run Eclipse as root, but the user on HDFS is hadoop, so for convenience I added the JVM argument -DHADOOP_USER_NAME=hadoop when running the test, and that was enough. Test it first.
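As an alternative to passing the JVM argument in the Eclipse run configuration, the same property can be set programmatically. This is a sketch based on the assumption (true for the usual Hadoop client behavior, where UserGroupInformation reads HADOOP_USER_NAME from the environment or the system properties when Kerberos is not in use) that it takes effect only if set before any Hadoop class resolves the login user:

```java
// Sets the HDFS user from code instead of via -DHADOOP_USER_NAME=hadoop.
// Must run before Hadoop's UserGroupInformation is first initialized,
// otherwise the login user will already have been resolved.
public class HadoopUserSetup {
    public static void main(String[] args) {
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        System.out.println(System.getProperty("HADOOP_USER_NAME"));
    }
}
```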