I recently needed to integrate Spring Boot with Hadoop's HDFS. I ran into many problems along the way and could not find a ready-made tutorial online, so everything here was worked out by trial and error; it took a long time before the project would even build. I hope this writeup saves you that trouble.
The point of the integration is to fetch a List from the database, turn that List into a CSV file, and load it into HDFS for machine learning.
This article covers how to get the integration working and how to turn List data into a CSV file stored in HDFS.
A quick summary of the problems you will run into:
1. The project uses @Slf4j, but Hadoop automatically pulls in log4j, so the two logging bindings conflict (see the exclusion sketch after this list).
2. After the integration, embedded Tomcat may fail to start.
3. The dependencies often fail to download completely (I solved this by simply retrying the download until it succeeded).
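If the logging conflict from problem 1 still shows up with your dependency tree, one common fix (a sketch of my own, not something this project strictly required) is to exclude Hadoop's log4j bindings so Logback remains the only SLF4J binding:

<!-- Sketch: exclude Hadoop's log4j bindings so they cannot clash with Logback -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>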
Below is my Pom.xml. This file matters most: the integration problems are solved mainly in Pom.xml.
For reference:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ratings</groupId>
    <artifactId>ratings</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>war</packaging>
    <name>ratings</name>
    <description>ratings</description>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.3.RELEASE</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <!-- <tomcat.version>8.0.9</tomcat.version>
        <dependency>
            <groupId>org.apache.tomcat</groupId>
            <artifactId>tomcat-juli</artifactId>
            <version>${tomcat.version}</version>
        </dependency> -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>net.sourceforge.javacsv</groupId>
            <artifactId>javacsv</artifactId>
            <version>2.0</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <!-- hot deployment
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency> -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-jdbc</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.mybatis.spring.boot</groupId>
            <artifactId>mybatis-spring-boot-starter</artifactId>
            <version>1.3.2</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <optional>true</optional>
            <version>2.0.2.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
The Pom.xml above is the key to the whole integration. Once this Pom.xml builds cleanly, the problems are basically solved.
My stack is Spring Boot plus MyBatis plus HDFS. The key code below covers the HDFS operations, for your reference.
I generate the CSV inside a ServiceImpl, where the data is packaged and loaded into HDFS, so you need to know three things:
1. how to create the CSV file in HDFS
2. how to write the data into the CSV file
3. how to download a file from HDFS to the local machine.
First, creating the file. Here is my code:
// Create the HDFS directory if it does not exist, then create an empty CSV file inside it.
// "name" and "url" are fields of the enclosing class; see the sketch after this method.
public String mkdir(String filename, String filepath) throws IOException {
    Configuration conf = new Configuration();
    conf.set(name, url);
    Path srcPath = new Path(filepath);
    FileSystem fs = srcPath.getFileSystem(conf);
    if (fs.isDirectory(srcPath)) {
        System.out.println("Directory already exists!");
    } else if (fs.mkdirs(srcPath)) {
        System.out.println("Directory created!");
    } else {
        System.out.println("Failed to create directory!");
        return "500";
    }
    // Create the empty CSV file exactly once and close the stream.
    String path = filepath + "/" + filename + ".csv";
    Path filePath = new Path(path);
    FSDataOutputStream outputStream = fs.create(filePath);
    try {
        outputStream.write("".getBytes());
    } finally {
        outputStream.close();
    }
    System.out.println("CSV file created!");
    return path;
}
That covers the process of creating the file.
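One thing the snippet above does not show: conf.set(name, url) relies on two fields of the enclosing helper class. For completeness, here is a minimal sketch of that class; the values below are placeholders assuming a local single-node setup, so point url at your own NameNode:

public class HdfsFile {
    // Configuration key and NameNode address used by conf.set(name, url).
    // Placeholder values; adjust to your cluster.
    private final String name = "fs.defaultFS";
    private final String url = "hdfs://localhost:9000";

    // mkdir(...) above and downloadFile(...) at the end of this article live here
}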
Next, how to write the data into the CSV (it works the same whether the CSV lives in HDFS or on your local Windows machine; tested both ways).
@Override
public String u_output(int userId, String initPath) {
    HdfsFile hdfs = new HdfsFile();
    // Query once and reuse the result instead of hitting the mapper twice.
    List<Ratings> list = baseMapper.u_output(userId);
    if (list != null) {
        for (Ratings ratings : list) {
            ratings.setUserId(userId);
        }
        if (list.size() > 0) {
            try {
                DateUntil date = new DateUntil();
                String filename = date.getDate() + userId;
                System.out.println("File name: " + filename);
                String filepath = hdfs.mkdir(filename, initPath);
                System.out.println("File path: " + filepath);
                // Compare strings with equals(), not ==/!=
                if (!"500".equals(filepath) && !filepath.isEmpty()) {
                    CsvWriter csvWriter = null;
                    try {
                        csvWriter = new CsvWriter(filepath, ',', Charset.forName("UTF-8"));
                        String[] csvHeader = { "userId", "movieId" };
                        csvWriter.writeRecord(csvHeader);
                        for (int i = 0; i < list.size(); i++) {
                            Ratings data = list.get(i);
                            String uid = String.valueOf(data.getUserId());
                            String mid = String.valueOf(data.getMovieId());
                            String[] csvContent = { uid, mid };
                            csvWriter.writeRecord(csvContent);
                        }
                    } finally {
                        if (csvWriter != null) {
                            csvWriter.close();
                        }
                        System.out.println("-------- CSV file written --------");
                        // Hadoop's checksum mechanism leaves a hidden .crc file next
                        // to the CSV when writing through the local file system;
                        // delete it so it does not pollute the directory.
                        String path = initPath + "/." + filename + ".csv.crc";
                        System.out.println("crc file path: " + path);
                        File fn = new File(path);
                        if (fn.exists()) {
                            fn.delete();
                            System.out.println("crc file deleted");
                        }
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return "200";
    } else
        return "500";
}
The above is the implementation of the service method; how you adapt the parameters, namely userId and initPath, depends on your own needs. (Note the JavaCSV dependency imported in Pom.xml: it is what lets us write CSV files quickly!)
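For illustration, a hedged sketch of how the method might be called from a controller; the controller class, the mapping, and the RatingsService interface name are my assumptions, not code from this project:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RatingsController {

    @Autowired
    private RatingsService ratingsService; // assumed interface declaring u_output

    @GetMapping("/export/{userId}")
    public String export(@PathVariable int userId) {
        // "/data/ratings" is a placeholder output directory
        return ratingsService.u_output(userId, "/data/ratings");
    }
}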
The last piece is downloading the CSV file.
Code below:
// srcPath is the HDFS path, dstPath is the local destination directory
public void downloadFile(String dstPath, String srcPath) throws IOException {
    Path path = new Path(srcPath);
    Configuration conf = new Configuration();
    conf.set(name, url);
    FileSystem hdfs = path.getFileSystem(conf);
    File rootfile = new File(dstPath);
    if (!rootfile.exists()) {
        rootfile.mkdirs();
    }
    try {
        if (hdfs.isFile(path)) {
            // Only download .csv files
            String fileName = path.getName();
            if (fileName.toLowerCase().endsWith("csv")) {
                FSDataInputStream in = null;
                FileOutputStream out = null;
                try {
                    in = hdfs.open(path);
                    File srcfile = new File(rootfile, fileName);
                    if (!srcfile.exists())
                        srcfile.createNewFile();
                    out = new FileOutputStream(srcfile);
                    IOUtils.copyBytes(in, out, 4096, false);
                    System.out.println("Download succeeded!");
                } finally {
                    IOUtils.closeStream(in);
                    IOUtils.closeStream(out);
                }
            }
        } else if (hdfs.isDirectory(path)) {
            // Mirror the HDFS directory one level down in the local destination
            // (dstPath is expected to end with a path separator).
            String filePath = path.toString();
            String[] subPath = filePath.split("/");
            String newdstPath = dstPath + subPath[subPath.length - 1] + "/";
            System.out.println("newdstPath=======" + newdstPath);
            FileStatus[] srcFileStatus = hdfs.listStatus(path);
            if (srcFileStatus != null) {
                for (FileStatus status : srcFileStatus) {
                    // Recurse to download the files under this subdirectory
                    downloadFile(newdstPath, status.getPath().toString());
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
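To tie it all together, a minimal hedged usage example; the local path and the HDFS URI are placeholders for my assumed local setup, so adjust them to your own machine and cluster:

// Hypothetical usage: download every CSV under an HDFS directory to local disk.
// The local path must end with a separator because of how newdstPath is built.
HdfsFile hdfs = new HdfsFile();
hdfs.downloadFile("D:/data/csv/", "hdfs://localhost:9000/data/ratings");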
Finally, if anything here is unclear, or if you find a problem with this approach, feel free to contact me and I will reply as soon as I can.