Table of Contents
🎃 Add the Hadoop and HDFS dependencies to the Spring Boot project. You can use Apache Hadoop's Java API directly, or use Spring for Apache Hadoop to simplify things.
🎃 Configure the connection information for Hadoop and HDFS, including the Hadoop configuration files and the HDFS connection address.
🎃 Use the Hadoop API to query and add data. For example, use the HDFS FileSystem API to read and write files, and MapReduce to process data.
Create a MapReduce job that counts word occurrences in text files stored on HDFS
🎃 Preface:
🎃 This is just a personal note.
🎃 Add the Hadoop and HDFS dependencies to the Spring Boot project. You can use Apache Hadoop's Java API directly, or use Spring for Apache Hadoop to simplify things.
Pick the dependencies that fit your situation:
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>3.3.1</version>
    </dependency>
    <!-- Optional: Spring for Apache Hadoop, if you prefer its abstractions -->
    <dependency>
        <groupId>org.springframework.data</groupId>
        <artifactId>spring-data-hadoop</artifactId>
        <version>2.5.0.RELEASE</version>
    </dependency>
</dependencies>
🎃 Configure the connection information for Hadoop and HDFS, including the Hadoop configuration files and the HDFS connection address, in application.properties:
# Hadoop configuration (the spring.hadoop.* keys are interpreted by Spring for Apache Hadoop)
spring.hadoop.config.fs.defaultFS=hdfs://localhost:9000
spring.hadoop.config.dfs.replication=1
spring.hadoop.config.dfs.blocksize=128m
spring.hadoop.config.dfs.client.use.datanode.hostname=true
spring.hadoop.config.dfs.client.read.shortcircuit=true
spring.hadoop.config.dfs.domain.socket.path=/var/run/hadoop-hdfs/dn._PORT
# HDFS configuration
spring.hadoop.fsUri=hdfs://localhost:9000
spring.hadoop.fsUser=root
🎃 Use the Hadoop API to query and add data. For example, use the HDFS FileSystem API to read and write files, and MapReduce to process data.
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/hdfs")
public class HdfsController {

    @Autowired
    private FileSystem fileSystem;

    // Read a file from HDFS and return its content as UTF-8 text
    @GetMapping("/read/{path}")
    public ResponseEntity<String> read(@PathVariable String path) throws IOException {
        Path filePath = new Path(path);
        if (!fileSystem.exists(filePath)) {
            return ResponseEntity.notFound().build();
        }
        // try-with-resources ensures the stream is closed even if reading fails
        try (FSDataInputStream inputStream = fileSystem.open(filePath)) {
            String content = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
            return ResponseEntity.ok(content);
        }
    }

    // Create a new file on HDFS; refuses to overwrite an existing one
    @PostMapping("/write/{path}")
    public ResponseEntity<Void> write(@PathVariable String path, @RequestBody String content) throws IOException {
        Path filePath = new Path(path);
        if (fileSystem.exists(filePath)) {
            return ResponseEntity.badRequest().build();
        }
        try (FSDataOutputStream outputStream = fileSystem.create(filePath)) {
            IOUtils.write(content, outputStream, StandardCharsets.UTF_8);
        }
        return ResponseEntity.ok().build();
    }
}
Test the API by sending GET and POST requests with curl or another HTTP client (Postman also works):
# Read a file
curl http://localhost:8080/hdfs/read/test.txt
# Write a file
curl -X POST -H "Content-Type: text/plain" -d "Hello, HDFS!" http://localhost:8080/hdfs/write/test.txt
🎃 Processing data with MapReduce:
Create a MapReduce job that counts word occurrences in text files stored on HDFS.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in each input line
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
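Before submitting anything to a cluster, the tokenize-then-sum logic above can be sanity-checked with plain Java collections. This is a minimal sketch with no Hadoop dependencies; `LocalWordCount` is a hypothetical helper class, not part of the project above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {

    // Same logic as the mapper + reducer: split each line on whitespace,
    // then sum the count for every distinct token.
    public static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("hello hdfs", "hello hadoop");
        System.out.println(counts.get("hello"));  // 2
        System.out.println(counts.get("hadoop")); // 1
    }
}
```

If this local version produces the expected counts, any discrepancy in the cluster output points at job configuration rather than the map/reduce logic itself.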
🎃 Expose the MapReduce job through an API
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/hdfs")
public class WordCountController { // separate class name to avoid clashing with HdfsController above

    @Autowired
    private FileSystem fileSystem;

    @Autowired
    private Configuration configuration;

    @PostMapping("/wordcount")
    public ResponseEntity<Void> wordCount(@RequestParam String inputPath, @RequestParam String outputPath) throws Exception {
        // MapReduce refuses to run if the output directory already exists
        fileSystem.delete(new Path(outputPath), true);

        Job job = Job.getInstance(configuration, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCount.WordCountMapper.class);
        // Summing is associative and commutative, so the reducer can double as a combiner
        job.setCombinerClass(WordCount.WordCountReducer.class);
        job.setReducerClass(WordCount.WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        boolean success = job.waitForCompletion(true);
        return success ? ResponseEntity.ok().build() : ResponseEntity.badRequest().build();
    }
}
🎃 Testing via the main method
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    // A single Hadoop Configuration shared by the FileSystem bean and the
    // MapReduce job, so both point at the same NameNode
    @Bean
    public Configuration configuration() {
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://localhost:9000");
        return configuration;
    }

    @Bean
    public FileSystem fileSystem(Configuration configuration) throws IOException {
        return FileSystem.get(configuration);
    }
}
API request (quote the URL so the shell does not treat & as a command separator):
curl -X POST "http://localhost:8080/hdfs/wordcount?inputPath=/user/input&outputPath=/user/output"
Finally, keep Hadoop and HDFS security and performance in mind: encrypting and compressing data, splitting data for parallel processing, and so on. Hadoop's security and performance-tuning tools can help improve the system's stability and efficiency.
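As one concrete example of the tuning mentioned above, map-output and final-output compression can be enabled through standard MapReduce properties. The keys below are standard Hadoop 2+/3 property names; the codec choices are an assumption and depend on what is installed on the cluster (Snappy requires the native library on cluster nodes, gzip works out of the box):

```properties
# Compress intermediate map output to reduce shuffle traffic
spring.hadoop.config.mapreduce.map.output.compress=true
spring.hadoop.config.mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec

# Compress the final job output
spring.hadoop.config.mapreduce.output.fileoutputformat.compress=true
spring.hadoop.config.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
```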