Workflow for integrating Hadoop in Spring:
(1) Add the spring-data-hadoop dependency to the Maven pom.xml
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop</artifactId>
    <version>2.5.0.RELEASE</version>
</dependency>
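Note: the Spring Hadoop release artifacts were historically published to the Spring release repository; if Maven cannot resolve the dependency from Central, adding that repository may help (a sketch, assuming the standard repo.spring.io release URL):

```xml
<repositories>
    <repository>
        <id>spring-releases</id>
        <url>https://repo.spring.io/libs-release</url>
    </repository>
</repositories>
```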
(2) Declare the hdp namespace in beans.xml under resources
Copy the namespace declaration from the official reference:
https://docs.spring.io/spring-hadoop/docs/2.5.0.RELEASE/reference/html/springandhadoop-config.html
<hdp:configuration id="hadoopConfiguration">
    fs.defaultFS=hdfs://hadoop000:8020
</hdp:configuration>
<hdp:file-system id="fileSystem" configuration-ref="hadoopConfiguration" user="root"/>
Move values that may change into application.properties, e.g.:
spring.hadoop.fsUri=hdfs://hadoop000:8020
Then reference the property from beans.xml:
<context:property-placeholder location="application.properties"/>
<hdp:configuration id="hadoopConfiguration">
    fs.defaultFS=${spring.hadoop.fsUri}
</hdp:configuration>
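Putting the fragments together, resources/beans.xml would look roughly like this (a sketch; the hdp and context namespace declarations follow the Spring Hadoop reference linked above, and the hadoop000:8020 address is the one used in these notes):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
           http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <!-- resolves ${spring.hadoop.fsUri} from application.properties -->
    <context:property-placeholder location="application.properties"/>

    <hdp:configuration id="hadoopConfiguration">
        fs.defaultFS=${spring.hadoop.fsUri}
    </hdp:configuration>

    <hdp:file-system id="fileSystem" configuration-ref="hadoopConfiguration" user="root"/>
</beans>
```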
(3) fileSystem is then injected by Spring; everything else stays the same.
Accessing HDFS through Spring Hadoop
Under test, create a spring package with a SpringHadoopApp test class:

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class SpringHadoopApp {

    private ApplicationContext ctx;
    private FileSystem fileSystem;

    @Before
    public void setUp() {
        ctx = new ClassPathXmlApplicationContext("beans.xml");
        fileSystem = (FileSystem) ctx.getBean("fileSystem");
    }

    @After
    public void tearDown() {
        ctx = null;
    }

    // create a directory
    @Test
    public void testMkdir() throws Exception {
        fileSystem.mkdirs(new Path("/springhdfs/"));
    }

    // read the contents of an HDFS file
    @Test
    public void testCat() throws Exception {
        FSDataInputStream in = fileSystem.open(new Path("/springhdfs/hello.txt"));
        IOUtils.copyBytes(in, System.out, 1024);
        in.close();
    }
}
Appendix: accessing HDFS from Spring Boot
(1) Add the dependency
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop-boot</artifactId>
    <version>2.5.0.RELEASE-hadoop25</version>
</dependency>
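With spring-data-hadoop-boot, the HDFS URI can be supplied through Boot's own application.properties instead of XML; the auto-configuration binds the spring.hadoop.fsUri property (a minimal sketch, assuming the same hadoop000:8020 address used above):

```
# src/main/resources/application.properties
spring.hadoop.fsUri=hdfs://hadoop000:8020
```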
(2) Inject FsShell

@SpringBootApplication
public class SpringBootHDFSApp implements CommandLineRunner {

    @Autowired
    FsShell fsShell; // exposes many shell-like operations (ls, cat, mkdir, ...)

    @Override
    public void run(String... strings) throws Exception {
        for (FileStatus fileStatus : fsShell.lsr("/springhdfs")) {
            System.out.println(fileStatus.getPath()); // print each entry
        }
    }

    public static void main(String[] args) {
        SpringApplication.run(SpringBootHDFSApp.class, args);
    }
}
Get familiar with MapReduce and Hive on your own.