Project: Data Ingestion, Processing, Storage, and Query
Notes on an onboarding test project I did shortly after joining the company.
Goal
Read a local .zip file.
After the file is uploaded, process the data and send it to Kafka; consume it from Kafka, process and format it, store it in Elasticsearch, and provide a paginated query page.
1. Uploading the file
- Submit the file through a page:
xxx.zip
The file is submitted via an HTML form:
<form th:action="@{/upload/file}" method="post" enctype="multipart/form-data">
<input type="file" name="file" id="file">
<input type="submit" value="上传">
</form>
The backend receives the file object and stores it under the project directory (that is the only way to get the file's absolute path so Java can read the zip directly).
@PostMapping("/upload/file")
public String upload(@RequestParam("file") MultipartFile file) throws IOException, InterruptedException {
    if (file == null || file.isEmpty()) {
        return "index";
    }
    File savePos = new File("src/main/resources/upload");
    if (!savePos.exists()) { // create the directory if it does not exist
        savePos.mkdirs();
    }
    // generate a new file name, keeping the original extension
    String uuid = UUID.randomUUID().toString().replace("-", "").toLowerCase();
    String originalFilename = file.getOriginalFilename();
    int i = originalFilename.lastIndexOf(".");
    String newFileName = uuid.concat(originalFilename.substring(i));
    // copy the file into the project under src/main/resources/upload/xxxFile.xx
    String fullPath = savePos.getCanonicalPath() + "/" + newFileName;
    file.transferTo(new File(fullPath));
    ...
}
- Reading the zip directly in Java
Read the contents via ZipInputStream:
@PostMapping("/upload/file")
public String upload(@RequestParam("file") MultipartFile file) throws IOException, InterruptedException {
    ...
    // read the file contents; loop over every entry rather than only the first
    try (ZipInputStream zipInputStream = new ZipInputStream(
            new BufferedInputStream(new FileInputStream(fullPath)), StandardCharsets.UTF_8)) {
        ZipEntry entry;
        while ((entry = zipInputStream.getNextEntry()) != null) {
            // do not close this reader: that would also close the zip stream
            BufferedReader br = new BufferedReader(new InputStreamReader(zipInputStream, StandardCharsets.UTF_8));
            String line;
            // print every line of the current entry
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
            zipInputStream.closeEntry();
        }
    }
}
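The zip handling can be exercised end to end without an upload. A self-contained sketch that writes a small two-entry archive to a temp file and reads every entry back (class and file names here are illustrative, not from the project):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.*;

class ZipReadDemo {
    // Read every line of every entry in a zip, not just the first entry.
    static List<String> readAllLines(File zip) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(
                new BufferedInputStream(new FileInputStream(zip)), StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) { // loop over all entries
                // a fresh reader per entry; ZipInputStream reports EOF at each entry boundary
                BufferedReader br = new BufferedReader(new InputStreamReader(zin, StandardCharsets.UTF_8));
                String line;
                while ((line = br.readLine()) != null) {
                    lines.add(line);
                }
                zin.closeEntry();
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // build a two-entry zip in a temp file for the demo
        File zip = File.createTempFile("demo", ".zip");
        try (ZipOutputStream zout = new ZipOutputStream(new FileOutputStream(zip))) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("b.txt"));
            zout.write("world".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        System.out.println(readAllLines(zip)); // prints [hello, world]
        zip.delete();
    }
}
```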
2. Sending and receiving data with Kafka
Set up the Kafka environment (the important settings: the topic name, the broker address and port, and the consumer's group.id).
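The getProperties() call used below is not shown in these notes. A minimal sketch of what the producer and consumer settings look like; the broker address and group id here are placeholders, not the project's real values:

```java
import java.util.Properties;

// Sketch of Kafka client settings; "your.host.ip:9092" and the group id are placeholders.
class KafkaConfig {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "your.host.ip:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "your.host.ip:9092");
        // consumers sharing a group.id split the topic's partitions between them
        props.put("group.id", "soar-test-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```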
- Producer: sending the data
// read the file contents
...
String line;
// create the Kafka producer
Properties kafkaProps = getProperties();
KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);
// send one message per line
while ((line = br.readLine()) != null) {
    ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC, line);
    producer.send(record);
}
// flush pending messages, then close the producer and the streams
producer.flush();
producer.close();
zipInputStream.closeEntry();
fileInputStream.close();
- Consumer: receiving the data
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(TOPIC));
try {
    while (true) {
        // poll(long) is deprecated in newer clients; prefer poll(Duration.ofMillis(100))
        ConsumerRecords<String, String> records = consumer.poll(100);
        if (records.isEmpty()) continue; // poll never returns null
        for (ConsumerRecord<String, String> record : records) {
            // process record
        }
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    consumer.close();
}
3. Data processing
Data to process:
xxx.zip
Source data shape:
{
...
"basicInfo": {
"lastTime": "",
"firstTime": "",
"total": 1,
"data": "",
"dataType": "",
"attackAction": [""],
"attackInProtocol": [],
"malwareClass": [
""
],
"tags": ""
},
...
}
Target data:
{
"lastTime": "",
"firstTime": "",
"total": 1,
"url": "",
"attackInProtocol": [],
"tags": ["",""]
}
Processing requirements:
- Extract basicInfo
- Use the value of dataType as a key and data as its value
- Merge the attackAction and malwareClass fields into a new tags field (drop the original tags)
- Write the result into ES, index name soar_test_data
Data-processing code:
public static Map<String, Object> handleData(String json) {
    Received received = JSON.parseObject(json, Received.class);
    BasicInfo basicInfo = received.getBasicInfo();
    Map<String, Object> resMap = new HashMap<>();
    resMap.put("lastTime", basicInfo.getLastTime());
    resMap.put("firstTime", basicInfo.getFirstTime());
    resMap.put("total", basicInfo.getTotal());
    // the value of dataType becomes the key, data becomes the value
    resMap.put(basicInfo.getDataType(), basicInfo.getData());
    resMap.put("attackInProtocol", basicInfo.getAttackInProtocol());
    // merge attackAction and malwareClass into the new tags field
    List<String> tags = new ArrayList<>(basicInfo.getAttackAction());
    tags.addAll(basicInfo.getMalwareClass());
    resMap.put("tags", tags);
    return resMap;
}
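The reshaping can be unit-checked without Kafka or fastjson by applying the same steps to an already-parsed map. A self-contained sketch (plain maps stand in for the beans, so the casts are unchecked):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TransformDemo {
    // Same reshaping as handleData, on plain maps: dataType's value becomes the
    // key for data, attackAction + malwareClass become tags, old tags is dropped.
    @SuppressWarnings("unchecked")
    static Map<String, Object> transform(Map<String, Object> basicInfo) {
        Map<String, Object> res = new HashMap<>();
        res.put("lastTime", basicInfo.get("lastTime"));
        res.put("firstTime", basicInfo.get("firstTime"));
        res.put("total", basicInfo.get("total"));
        res.put((String) basicInfo.get("dataType"), basicInfo.get("data"));
        res.put("attackInProtocol", basicInfo.get("attackInProtocol"));
        List<String> tags = new ArrayList<>((List<String>) basicInfo.get("attackAction"));
        tags.addAll((List<String>) basicInfo.get("malwareClass"));
        res.put("tags", tags); // the source tags field is intentionally not copied
        return res;
    }
}
```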
- Use the value of the url field as the document id (primary key) in ES
Configure ES in Spring Boot.
Code that bulk-sends the consumed data to ES:
// build the request options once; the bulk call can be slow, so raise the socket timeout
RequestConfig requestConfig = RequestConfig.custom()
        .setConnectTimeout(5000)
        .setSocketTimeout(120000)
        .build();
RequestOptions options = RequestOptions.DEFAULT.toBuilder()
        .setRequestConfig(requestConfig)
        .build();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    if (records.isEmpty()) continue;
    // a fresh BulkRequest per batch; reusing one across iterations would re-send old documents
    BulkRequest bulkRequest = new BulkRequest();
    for (ConsumerRecord<String, String> record : records) {
        // the message value is a JSON string
        String json = record.value();
        // transform it into the target shape
        Map<String, Object> resMap = handleData(json);
        String resData = JSON.toJSONString(resMap);
        IndexRequest indexRequest = new IndexRequest(INDEX);
        // use the url field as the document id
        indexRequest.id((String) resMap.get("url"));
        indexRequest.source(resData, XContentType.JSON);
        bulkRequest.add(indexRequest);
    }
    // execute the batch
    BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, options);
    System.out.println(bulkResponse);
}
4. Querying and displaying the data
- Query the top 10 documents, sorted by the total field in descending order
This mainly exercises ES operations; the approach is straightforward.
@GetMapping("/data")
public ModelAndView showData() throws IOException {
    SearchRequest request = new SearchRequest(INDEX);
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.matchAllQuery());
    builder.sort("total", SortOrder.DESC);
    builder.size(10); // ES returns 10 hits by default, but being explicit is clearer
    request.source(builder);
    SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
    SearchHits hits = response.getHits();
    List<Integer> totals = new ArrayList<>();
    List<String> urls = new ArrayList<>();
    for (SearchHit hit : hits) {
        Map<String, Object> hitMap = hit.getSourceAsMap();
        totals.add((Integer) hitMap.get("total"));
        urls.add((String) hitMap.get("url"));
    }
    ModelAndView modelAndView = new ModelAndView();
    modelAndView.addObject("urls", urls);
    modelAndView.addObject("totals", totals);
    modelAndView.setViewName("data");
    return modelAndView;
}
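The project goal mentions a paginated query page; the same SearchSourceBuilder supports it through from/size. A sketch of the offset arithmetic, where the page and size request parameters are assumptions of mine, not names from the project:

```java
// Hypothetical helper: map a 1-based page number and page size to the offset
// that SearchSourceBuilder.from(...) expects, paired with builder.size(size).
class PageWindow {
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }
}
```

So page 3 with size 10 becomes builder.from(20).size(10). Note that ES rejects windows where from + size exceeds index.max_result_window (10000 by default), so very deep paging needs search_after instead.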
Displaying a pie chart with echarts (the screenshot came straight from the official site).
Import echarts.js and adapt the official demo:
import * as echarts from 'echarts';
var chartDom = document.getElementById('main');
var myChart = echarts.init(chartDom);
var option;
option = {
title: {
text: 'Top 10 by total',
left: 'center',
top: 10
},
tooltip: {
trigger: 'item'
},
legend: {
right: 10,
top: 'center',
orient: 'vertical'
},
series: [
{
name: 'Access From',
type: 'pie',
radius: ['40%', '70%'],
avoidLabelOverlap: false,
label: {
show: false,
position: 'center'
},
emphasis: {
label: {
show: true,
fontSize: 40,
fontWeight: 'bold'
}
},
labelLine: {
show: false
},
data: [
{ value: 1048, name: 'Search Engine' },
{ value: 735, name: 'Direct' },
{ value: 580, name: 'Email' },
{ value: 484, name: 'Union Ads' },
{ value: 300, name: 'Video Ads' }
]
}
]
};
option && myChart.setOption(option);
5. Test environment
Spring Boot 2.5.1
Java 11
Elasticsearch 7.13.1
6. Problems encountered
Setting up Kafka on Linux (single node, using Kafka's bundled ZooKeeper)
The main work is editing Kafka's configuration file
server.properties
to change:
broker.id=0
listeners=PLAINTEXT://your.host.ip:9092
advertised.listeners=PLAINTEXT://your.host.ip:9092
zookeeper.connect=your.host.ip:2181
host.name=your.host.ip
At first I wrote your.host.ip
as localhost
and could never connect: advertised.listeners is the address the broker hands back to clients, so localhost only works for clients on the broker machine itself.
Kafka consumer flooding the logs at startup
When the consumer starts, org.apache.kafka.clients
logs very frequently; create a logback.xml
to adjust the log level:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<logger name="org.apache.kafka.clients" level="info" />
</configuration>
Getting the absolute path of an uploaded file in Spring Boot
I searched online for a long time and found nothing, because exposing the client-side absolute path of an uploaded file would be inherently unsafe. The workaround is a relay: first store the file inside the project, then resolve the project path to obtain an absolute path for it.
if (file == null || file.isEmpty()) {
    return "index";
}
File savePos = new File("src/main/resources/upload");
if (!savePos.exists()) { // create the directory if it does not exist
    savePos.mkdirs();
}
// generate a new file name, keeping the original extension
String uuid = UUID.randomUUID().toString().replace("-", "").toLowerCase();
String originalFilename = file.getOriginalFilename();
int i = originalFilename.lastIndexOf(".");
String newFileName = uuid.concat(originalFilename.substring(i));
// copy the file into the project under src/main/resources/upload/xxxFile.xx
String fullPath = savePos.getCanonicalPath() + "/" + newFileName;
file.transferTo(new File(fullPath));
Parsing JSON strings
I used fastjson
; writing a bean for the payload is enough.
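For reference, a minimal sketch of what those beans might look like. The field names mirror the keys in the source JSON sample above; the exact class layout in the project may differ, and fastjson binds through these getters/setters when calling JSON.parseObject(json, Received.class):

```java
import java.util.List;

// Sketch of the beans backing JSON.parseObject(json, Received.class).
class Received {
    private BasicInfo basicInfo;

    public BasicInfo getBasicInfo() { return basicInfo; }
    public void setBasicInfo(BasicInfo basicInfo) { this.basicInfo = basicInfo; }
}

class BasicInfo {
    private String lastTime;
    private String firstTime;
    private int total;
    private String data;
    private String dataType;
    private List<String> attackAction;
    private List<String> attackInProtocol;
    private List<String> malwareClass;
    private String tags;

    public String getLastTime() { return lastTime; }
    public void setLastTime(String lastTime) { this.lastTime = lastTime; }
    public String getFirstTime() { return firstTime; }
    public void setFirstTime(String firstTime) { this.firstTime = firstTime; }
    public int getTotal() { return total; }
    public void setTotal(int total) { this.total = total; }
    public String getData() { return data; }
    public void setData(String data) { this.data = data; }
    public String getDataType() { return dataType; }
    public void setDataType(String dataType) { this.dataType = dataType; }
    public List<String> getAttackAction() { return attackAction; }
    public void setAttackAction(List<String> attackAction) { this.attackAction = attackAction; }
    public List<String> getAttackInProtocol() { return attackInProtocol; }
    public void setAttackInProtocol(List<String> attackInProtocol) { this.attackInProtocol = attackInProtocol; }
    public List<String> getMalwareClass() { return malwareClass; }
    public void setMalwareClass(List<String> malwareClass) { this.malwareClass = malwareClass; }
    public String getTags() { return tags; }
    public void setTags(String tags) { this.tags = tags; }
}
```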
I don't remember the other problems.
7. Summary
The task itself was fairly simple, but I ran into plenty of baffling problems along the way and spent two or three days finishing it. It was still worthwhile, so I wrote this markdown doc to record it.