Hadoop with Spring Boot: HDFS Data Query and Insert

Table of Contents

🎃Preface:

🎃Add Hadoop and HDFS dependencies to the Spring Boot project. You can use the Apache Hadoop Java API, or Spring for Apache Hadoop to simplify things.

🎃 Configure the Hadoop and HDFS connection information, including the Hadoop configuration files and the HDFS connection address.

🎃Use the Hadoop APIs to query and add data, for example the HDFS FileSystem API to read and write files, and MapReduce to process data.

🎃Processing data with MapReduce:

Create a MapReduce job that counts how many times each word appears in text files stored on HDFS

🎃An API endpoint that runs the MapReduce job

🎃Testing with a main method

API request:


🎃Preface:

  🎃This is just a personal note.

🎃Add Hadoop and HDFS dependencies to the Spring Boot project. You can use the Apache Hadoop Java API, or Spring for Apache Hadoop to simplify things.

Pick dependencies as needed (hadoop-client already pulls in hadoop-common plus the HDFS and MapReduce client libraries, so you usually do not need every block below):

 
<dependencies>
    <!-- Spring Boot web starter for the REST endpoints -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- Raw Hadoop APIs: common + HDFS -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.3.1</version>
    </dependency>

    <!-- Alternative: hadoop-client bundles the common, HDFS and MapReduce client libraries -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.3.1</version>
    </dependency>

    <!-- Optional: Spring for Apache Hadoop, used if you rely on the spring.hadoop.* properties shown below -->
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-hadoop</artifactId>
        <version>2.5.0.RELEASE</version>
    </dependency>

    <!-- Needed for the MapReduce word-count example below -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>3.3.1</version>
    </dependency>
</dependencies>


🎃 Configure the Hadoop and HDFS connection information, including the Hadoop configuration files and the HDFS connection address. For example, in application.properties:

# Hadoop configuration
spring.hadoop.config.fs.defaultFS=hdfs://localhost:9000
spring.hadoop.config.dfs.replication=1
spring.hadoop.config.dfs.blocksize=128m
spring.hadoop.config.dfs.client.use.datanode.hostname=true
spring.hadoop.config.dfs.client.read.shortcircuit=true
spring.hadoop.config.dfs.domain.socket.path=/var/run/hadoop-hdfs/dn._PORT

# HDFS configuration
spring.hadoop.fsUri=hdfs://localhost:9000
spring.hadoop.fsUser=root
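
If you are not using Spring for Apache Hadoop's auto-configuration, the same two HDFS settings can be wired into a plain FileSystem bean by hand. Below is a minimal sketch; the HdfsConfig class name is my own, and it is meant as an alternative to the FileSystem bean defined in the Application class further down, not something to use alongside it.

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.fs.FileSystem;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HdfsConfig {

    @Value("${spring.hadoop.fsUri}")
    private String fsUri;

    @Value("${spring.hadoop.fsUser}")
    private String fsUser;

    // Build a Hadoop FileSystem client from the properties above.
    @Bean
    public FileSystem fileSystem() throws IOException, InterruptedException, URISyntaxException {
        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
        conf.set("fs.defaultFS", fsUri);
        // Connect as the configured user instead of the local OS account.
        return FileSystem.get(new URI(fsUri), conf, fsUser);
    }
}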

🎃Use the Hadoop APIs to query and add data, for example the HDFS FileSystem API to read and write files, and MapReduce to process data.

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/hdfs")
public class HdfsController {

    @Autowired
    private FileSystem fileSystem;

    // Read a file from HDFS and return its content as text.
    // Note: a single @PathVariable cannot contain '/', so this only matches
    // plain file names, not nested HDFS paths.
    @GetMapping("/read/{path}")
    public ResponseEntity<String> read(@PathVariable String path) throws IOException {
        Path filePath = new Path(path);
        if (!fileSystem.exists(filePath)) {
            return ResponseEntity.notFound().build();
        }
        try (FSDataInputStream inputStream = fileSystem.open(filePath)) {
            String content = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
            return ResponseEntity.ok(content);
        }
    }

    // Create a new file in HDFS with the request body as its content.
    @PostMapping("/write/{path}")
    public ResponseEntity<Void> write(@PathVariable String path, @RequestBody String content) throws IOException {
        Path filePath = new Path(path);
        if (fileSystem.exists(filePath)) {
            // Refuse to overwrite an existing file.
            return ResponseEntity.badRequest().build();
        }
        try (FSDataOutputStream outputStream = fileSystem.create(filePath)) {
            IOUtils.write(content, outputStream, StandardCharsets.UTF_8);
        }
        return ResponseEntity.ok().build();
    }
}

Use curl or another HTTP client (for example Postman) to send GET and POST requests to test the API:

# Read a file
curl http://localhost:8080/hdfs/read/test.txt

# Write a file
curl -X POST -H "Content-Type: text/plain" -d "Hello, HDFS!" http://localhost:8080/hdfs/write/test.txt
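
Querying is not limited to reading single files. Below is a minimal sketch of one more endpoint that lists a directory with FileSystem.listStatus; the /list mapping and its response shape are my own assumptions, not part of the original controller.

// Hypothetical addition inside the HdfsController above
// (extra imports: java.util.List, java.util.ArrayList,
//  org.apache.hadoop.fs.FileStatus, org.springframework.web.bind.annotation.RequestParam).
@GetMapping("/list")
public ResponseEntity<List<String>> list(@RequestParam(defaultValue = "/") String dir) throws IOException {
    Path dirPath = new Path(dir);
    if (!fileSystem.exists(dirPath)) {
        return ResponseEntity.notFound().build();
    }
    List<String> entries = new ArrayList<>();
    for (FileStatus status : fileSystem.listStatus(dirPath)) {
        // Report name, type and size for each entry.
        entries.add(status.getPath().getName()
                + (status.isDirectory() ? " (dir)" : " (" + status.getLen() + " bytes)"));
    }
    return ResponseEntity.ok(entries);
}

Example call: curl "http://localhost:8080/hdfs/list?dir=/user"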
 

🎃Processing data with MapReduce:

Create a MapReduce job that counts how many times each word appears in text files stored on HDFS:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sum the counts for each word.
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}

🎃An API endpoint that runs the MapReduce job

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

// In a real project, merge this with the read/write controller above:
// two controllers with the same class name and request mapping will clash.
@RestController
@RequestMapping("/hdfs")
public class HdfsController {

    @Autowired
    private FileSystem fileSystem;

    @Autowired
    private Configuration configuration;

    // Submit the word-count job and block until it finishes.
    @PostMapping("/wordcount")
    public ResponseEntity<Void> wordCount(@RequestParam String inputPath, @RequestParam String outputPath) throws Exception {
        Job job = Job.getInstance(configuration, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCount.WordCountMapper.class);
        // The reducer is associative, so it can double as a combiner.
        job.setCombinerClass(WordCount.WordCountReducer.class);
        job.setReducerClass(WordCount.WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputPath));
        // The output directory must not exist yet, or the job will fail.
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        boolean success = job.waitForCompletion(true);
        return success ? ResponseEntity.ok().build() : ResponseEntity.badRequest().build();
    }
}
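
Once the job succeeds, the counts land in part-r-* files under the output directory. Below is a minimal sketch of reading them back with the same FileSystem bean; the readWordCountOutput helper is my own addition, not part of the original post.

// Hypothetical helper for the controller above: collect the word-count output into one string
// (extra imports: org.apache.hadoop.fs.FileStatus, org.apache.hadoop.fs.FSDataInputStream,
//  org.apache.commons.io.IOUtils, java.nio.charset.StandardCharsets).
public String readWordCountOutput(String outputPath) throws IOException {
    StringBuilder result = new StringBuilder();
    // Each reducer writes one part-r-NNNNN file; _SUCCESS is just a marker file.
    for (FileStatus status : fileSystem.listStatus(new Path(outputPath))) {
        if (!status.getPath().getName().startsWith("part-")) {
            continue;
        }
        try (FSDataInputStream in = fileSystem.open(status.getPath())) {
            result.append(IOUtils.toString(in, StandardCharsets.UTF_8));
        }
    }
    return result.toString();
}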

🎃Testing with a main method

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    // A single Hadoop Configuration pointing at the HDFS NameNode,
    // shared by the FileSystem bean and the MapReduce job.
    @Bean
    public Configuration configuration() {
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://localhost:9000");
        return configuration;
    }

    @Bean
    public FileSystem fileSystem(Configuration configuration) throws IOException {
        return FileSystem.get(configuration);
    }
}

API request:

curl -X POST "http://localhost:8080/hdfs/wordcount?inputPath=/user/input&outputPath=/user/output"

(The URL is quoted so that the shell does not interpret the & as a background operator.)

Finally, keep Hadoop and HDFS security and performance in mind: data encryption and compression, splitting the data and processing it in parallel, and so on. Hadoop's security and performance tuning features can be used to make the system more stable and efficient.
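
As one concrete example on the performance side, the word-count job's output can be compressed by adding a couple of lines before the job is submitted. A minimal sketch assuming the gzip codec; any codec available on the cluster works.

// Optional tuning before job.waitForCompletion(true):
// compress the reducer output to save HDFS space and I/O.
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.GzipCodec.class);

// Map output can be compressed independently to cut shuffle traffic.
job.getConfiguration().setBoolean("mapreduce.map.output.compress", true);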

