I recently needed to integrate Spring Boot with Hadoop's HDFS. I ran into many problems along the way and could not find a ready-made tutorial online, so everything here was worked out by trial and error; it took a long time before the project would even build. I hope this writeup saves you that trouble.
The point of the integration is to fetch a List from the database, turn that List into a CSV file, and load it into HDFS for machine learning.
This article covers how to get the integration working and how to turn List data into a CSV file stored in HDFS.
A quick summary of the problems you will run into:
1. The project uses @Slf4j, but Hadoop automatically pulls in log4j, so the two logging bindings conflict (see the exclusion sketch after this list).
2. After the integration, embedded Tomcat may fail to start.
3. The dependencies often fail to download completely (I solved this by simply retrying the download until it succeeded).
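If the logging conflict from problem 1 still shows up with your dependency tree, one common fix (a sketch of my own, not something this project strictly required) is to exclude Hadoop's log4j bindings so Logback remains the only SLF4J binding:

<!-- Sketch: exclude Hadoop's log4j bindings so they cannot clash with Logback -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>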
Below is my Pom.xml. This file matters most: the integration problems are solved mainly in Pom.xml.
For reference:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ratings</groupId>
    <artifactId>ratings</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>war</packaging>
    <name>ratings</name>
    <description>ratings</description>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.3.RELEASE</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <!-- <tomcat.version>8.0.9</tomcat.version>
        <dependency>
            <groupId>org.apache.tomcat</groupId>
            <artifactId>tomcat-juli</artifactId>
            <version>${tomcat.version}</version>
        </dependency> -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>net.sourceforge.javacsv</groupId>
            <artifactId>javacsv</artifactId>
            <version>2.0</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <!-- hot deployment
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency> -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.mybatis.spring.boot</groupId>
            <artifactId>mybatis-spring-boot-starter</artifactId>
            <version>1.3.2</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <optional>true</optional>
            <version>2.0.2.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
The Pom.xml above is the key to the whole integration. Once this Pom.xml builds cleanly, the problems are basically solved.
My stack is Spring Boot plus MyBatis plus HDFS. The key code below covers the HDFS operations, for your reference.
I generate the CSV inside a ServiceImpl, where the data is packaged and loaded into HDFS, so you need to know three things:
1. how to create the CSV file in HDFS
2. how to write the data into the CSV file
3. how to download a file from HDFS to the local machine.
First, creating the file. Here is my code:
// Create the HDFS directory if it does not exist, then create an empty CSV file inside it.
// "name" and "url" are fields of the enclosing class; see the sketch after this method.
public String mkdir(String filename, String filepath) throws IOException {
    Configuration conf = new Configuration();
    conf.set(name, url);
    Path srcPath = new Path(filepath);
    FileSystem fs = srcPath.getFileSystem(conf);
    if (fs.isDirectory(srcPath)) {
        System.out.println("Directory already exists!");
    } else if (fs.mkdirs(srcPath)) {
        System.out.println("Directory created!");
    } else {
        System.out.println("Failed to create directory!");
        return "500";
    }
    // Create the empty CSV file exactly once and close the stream.
    String path = filepath + "/" + filename + ".csv";
    Path filePath = new Path(path);
    FSDataOutputStream outputStream = fs.create(filePath);
    try {
        outputStream.write("".getBytes());
    } finally {
        outputStream.close();
    }
    System.out.println("CSV file created!");
    return path;
}
That covers the process of creating the file.
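One thing the snippet above does not show: conf.set(name, url) relies on two fields of the enclosing helper class. For completeness, here is a minimal sketch of that class; the values below are placeholders assuming a local single-node setup, so point url at your own NameNode:

public class HdfsFile {
    // Configuration key and NameNode address used by conf.set(name, url).
    // Placeholder values; adjust to your cluster.
    private final String name = "fs.defaultFS";
    private final String url = "hdfs://localhost:9000";

    // mkdir(...) above and downloadFile(...) at the end of this article live here
}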
Next, how to write the data into the CSV (it works the same whether the CSV lives in HDFS or on your local Windows machine; tested both ways).
@Override
public String u_output(int userId, String initPath) {
    HdfsFile hdfs = new HdfsFile();
    // Query once and reuse the result instead of hitting the mapper twice.
    List<Ratings> list = baseMapper.u_output(userId);
    if (list != null) {
        for (Ratings ratings : list) {
            ratings.setUserId(userId);
        }
        if (list.size() > 0) {
            try {
                DateUntil date = new DateUntil();
                String filename = date.getDate() + userId;
                System.out.println("File name: " + filename);
                String filepath = hdfs.mkdir(filename, initPath);
                System.out.println("File path: " + filepath);
                // Compare strings with equals(), not ==/!=
                if (!"500".equals(filepath) && !filepath.isEmpty()) {
                    CsvWriter csvWriter = null;
                    try {
                        csvWriter = new CsvWriter(filepath, ',', Charset.forName("UTF-8"));
                        String[] csvHeader = { "userId", "movieId" };
                        csvWriter.writeRecord(csvHeader);
                        for (int i = 0; i < list.size(); i++) {
                            Ratings data = list.get(i);
                            String uid = String.valueOf(data.getUserId());
                            String mid = String.valueOf(data.getMovieId());
                            String[] csvContent = { uid, mid };
                            csvWriter.writeRecord(csvContent);
                        }
                    } finally {
                        if (csvWriter != null) {
                            csvWriter.close();
                        }
                        System.out.println("-------- CSV file written --------");
                        // Hadoop's checksum mechanism leaves a hidden .crc file next
                        // to the CSV when writing through the local file system;
                        // delete it so it does not pollute the directory.
                        String path = initPath + "/." + filename + ".csv.crc";
                        System.out.println("crc file path: " + path);
                        File fn = new File(path);
                        if (fn.exists()) {
                            fn.delete();
                            System.out.println("crc file deleted");
                        }
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return "200";
    } else
        return "500";
}
The above is the implementation of the service method; how you adapt the parameters, namely userId and initPath, depends on your own needs. (Note the JavaCSV dependency imported in Pom.xml: it is what lets us write CSV files quickly!)
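For illustration, a hedged sketch of how the method might be called from a controller; the controller class, the mapping, and the RatingsService interface name are my assumptions, not code from this project:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RatingsController {

    @Autowired
    private RatingsService ratingsService; // assumed interface declaring u_output

    @GetMapping("/export/{userId}")
    public String export(@PathVariable int userId) {
        // "/data/ratings" is a placeholder output directory
        return ratingsService.u_output(userId, "/data/ratings");
    }
}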
The last piece is downloading the CSV file.
Code below:
// srcPath is the HDFS path, dstPath is the local destination directory
public void downloadFile(String dstPath, String srcPath) throws IOException {
    Path path = new Path(srcPath);
    Configuration conf = new Configuration();
    conf.set(name, url);
    FileSystem hdfs = path.getFileSystem(conf);
    File rootfile = new File(dstPath);
    if (!rootfile.exists()) {
        rootfile.mkdirs();
    }
    try {
        if (hdfs.isFile(path)) {
            // Only download .csv files
            String fileName = path.getName();
            if (fileName.toLowerCase().endsWith("csv")) {
                FSDataInputStream in = null;
                FileOutputStream out = null;
                try {
                    in = hdfs.open(path);
                    File srcfile = new File(rootfile, fileName);
                    if (!srcfile.exists())
                        srcfile.createNewFile();
                    out = new FileOutputStream(srcfile);
                    IOUtils.copyBytes(in, out, 4096, false);
                    System.out.println("Download succeeded!");
                } finally {
                    IOUtils.closeStream(in);
                    IOUtils.closeStream(out);
                }
            }
        } else if (hdfs.isDirectory(path)) {
            // Mirror the HDFS directory one level down in the local destination
            // (dstPath is expected to end with a path separator).
            String filePath = path.toString();
            String[] subPath = filePath.split("/");
            String newdstPath = dstPath + subPath[subPath.length - 1] + "/";
            System.out.println("newdstPath=======" + newdstPath);
            FileStatus[] srcFileStatus = hdfs.listStatus(path);
            if (srcFileStatus != null) {
                for (FileStatus status : srcFileStatus) {
                    // Recurse to download the files under this subdirectory
                    downloadFile(newdstPath, status.getPath().toString());
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
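To tie it all together, a minimal hedged usage example; the local path and the HDFS URI are placeholders for my assumed local setup, so adjust them to your own machine and cluster:

// Hypothetical usage: download every CSV under an HDFS directory to local disk.
// The local path must end with a separator because of how newdstPath is built.
HdfsFile hdfs = new HdfsFile();
hdfs.downloadFile("D:/data/csv/", "hdfs://localhost:9000/data/ratings");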
Finally, if anything here is unclear, or if you find a problem with this approach, feel free to contact me and I will reply as soon as I can.