Project: Data Ingestion, Processing, Storage, and Query

Notes on an employee-assessment project I built shortly after joining the company.

Goal

Read a local .zip file after upload, process the data and send it to Kafka; consume the Kafka messages, process and format them, store them in Elasticsearch, and provide a paginated query page.

I. Uploading the File

  • Submit the file (xxx.zip) through a web page

Submit the file via an HTML form:

<form th:action="@{/upload/file}" method="post" enctype="multipart/form-data">
    <input type="file" name="file" id="file">
    <input type="submit" value="Upload">
</form>

The backend receives the file object and saves it under the project directory (this is how we get an absolute path, so that Java can read the zip directly):

@PostMapping("/upload/file")
public String upload(@RequestParam("file") MultipartFile file) throws IOException, InterruptedException {
    if (file == null) {
        return "index";
    }
    File savePos = new File("src/main/resources/upload");
    if (!savePos.exists()) {  // create the folder if it doesn't exist
        savePos.mkdirs();
    }
    // generate a new file name: uuid + original extension
    String uuid = UUID.randomUUID().toString().replace("-", "").toLowerCase();
    String originalFilename = file.getOriginalFilename();
    int i = originalFilename.lastIndexOf(".");
    String newFileName = uuid.concat(originalFilename.substring(i));

    // copy the upload to src/main/resources/upload/xxxFile.xx inside the project
    String fullPath = savePos.getCanonicalPath() + "/" + newFileName;
    file.transferTo(new File(fullPath));
    ...
}
  • Reading the zip directly in Java

Read the contents with a ZipInputStream:

@PostMapping("/upload/file")
public String upload(@RequestParam("file") MultipartFile file) throws IOException, InterruptedException {
    ...
    // read the file contents
    FileInputStream fileInputStream = new FileInputStream(fullPath);
    ZipInputStream zipInputStream = new ZipInputStream(new BufferedInputStream(fileInputStream), StandardCharsets.UTF_8);
    ZipEntry entry = zipInputStream.getNextEntry(); // position the stream on the first entry
    BufferedReader br = new BufferedReader(new InputStreamReader(zipInputStream, StandardCharsets.UTF_8));
    String line;
    // print each line until the entry is exhausted
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
}

II. Sending and Receiving Data with Kafka

Set up the Kafka environment (the key settings: the TOPIC, the broker address and port, and the consumer's group.id).
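The configuration itself isn't shown here, so a minimal sketch of the producer and consumer Properties, assuming String keys/values; the broker address, group id, and method names are placeholders of mine, not from the project:

```java
import java.util.Properties;

public class KafkaConfig {

    // Producer side: broker address/port plus key/value serializers.
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "your.host.ip:9092"); // broker address and port
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    // Consumer side: same broker, plus the mandatory group.id.
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "your.host.ip:9092");
        props.put("group.id", "test-group"); // any stable consumer group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest"); // read from the beginning on first start
        return props;
    }
}
```

These Properties objects are then passed straight to `new KafkaProducer<>(props)` and `new KafkaConsumer<>(props)` as in the snippets that follow.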

  • Producer sends the data

// read the file contents
...
String line;
// build the Kafka producer
Properties kafkaProps = getProperties();
KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);
// send one message per line
while ((line = br.readLine()) != null) {
    ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC, line);
    producer.send(record);
}
// flush pending sends, then close the streams
producer.close();
zipInputStream.closeEntry();
fileInputStream.close();
  • Consumer receives the data

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList(TOPIC));
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
        if (records.isEmpty()) continue;
        for (ConsumerRecord<String, String> record : records) {
            // process the record
        }
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    consumer.close();
}

III. Data Processing

Input data

xxx.zip

Source data format

{
	...
	"basicInfo": {
		"lastTime": "",
		"firstTime": "",
		"total": 1,
		"data": "",
		"dataType": "",
		"attackAction": [""],
		"attackInProtocol": [],
		"malwareClass": [
			""
		],
		"tags": ""
	},
	...
}

Target data:

{
		"lastTime": "",
		"firstTime": "",
		"total": 1,
		"url": "",
		"attackInProtocol": [],
		"tags": ["",""]
}

Processing requirements:

  1. Extract basicInfo
  2. Use dataType as a key and data as its value
  3. Merge the attackAction and malwareClass fields into a new tags field (drop the original tags)
  4. Index into ES under the index name soar_test_data

Data-processing code

    public static Map<String, Object> handleData(String json) {
        Received received = JSON.parseObject(json, Received.class);
        BasicInfo basicInfo = received.getBasicInfo();
        Map<String, Object> resMap = new HashMap<>();
        resMap.put("lastTime", basicInfo.getLastTime());
        resMap.put("firstTime", basicInfo.getFirstTime());
        resMap.put("total", basicInfo.getTotal());
        // dataType as the key, data as the value (e.g. "url": "...")
        resMap.put(basicInfo.getDataType(), basicInfo.getData());
        resMap.put("attackInProtocol", basicInfo.getAttackInProtocol());
        // merge attackAction and malwareClass into tags
        List<String> tags = new ArrayList<>(basicInfo.getAttackAction());
        tags.addAll(basicInfo.getMalwareClass());
        resMap.put("tags", tags);
        return resMap;
    }
  • Use the url field's value as the ES document id (primary key)

Configuring ES in Spring Boot
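One way to wire it (an assumption; the project's actual setup isn't shown): with the elasticsearch-rest-high-level-client dependency on the classpath, Spring Boot 2.5 auto-configures a RestHighLevelClient from application.yml:

```yaml
# application.yml -- host and port are placeholders
spring:
  elasticsearch:
    rest:
      uris: http://your.host.ip:9200
```

The auto-configured client can then be injected with `@Autowired RestHighLevelClient restHighLevelClient`, which is how the bulk-indexing code obtains it.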

Code that bulk-sends the consumed data to ES:

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    if (records.isEmpty()) continue;
    // a fresh BulkRequest per batch, otherwise already-indexed requests accumulate
    BulkRequest bulkRequest = new BulkRequest();
    for (ConsumerRecord<String, String> record : records) {
        // the record value is a JSON string
        String json = record.value();
        // transform the JSON string into the target map
        Map<String, Object> resMap = handleData(json);
        String resData = JSON.toJSONString(resMap);

        // index the serialized result into ES
        IndexRequest indexRequest = new IndexRequest(INDEX);
        // use the url value as the document id
        indexRequest.id((String) resMap.get("url"));
        indexRequest.source(resData, XContentType.JSON);
        bulkRequest.add(indexRequest);
    }
    // execute the batch with extended timeouts
    RequestConfig requestConfig = RequestConfig.custom()
        .setConnectTimeout(5000)
        .setSocketTimeout(120000)
        .build();
    RequestOptions options = RequestOptions.DEFAULT.toBuilder()
        .setRequestConfig(requestConfig)
        .build();
    BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, options);
    System.out.println(bulkResponse);
}

IV. Querying and Displaying the Data

  1. Query the top 10 sorted by the total field, descending

This is mostly an exercise in ES operations; the approach is simple:

@GetMapping("/data")
public ModelAndView showData() throws IOException {
    SearchRequest request = new SearchRequest(INDEX);
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.matchAllQuery());
    builder.sort("total", SortOrder.DESC);
    builder.size(10); // ES returns 10 hits by default anyway; make it explicit
    request.source(builder);
    SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
    SearchHits hits = response.getHits();
    Iterator<SearchHit> iterator = hits.iterator();
    List<Integer> totals = new ArrayList<>();
    List<String> urls = new ArrayList<>();
    // at most `size` hits come back, so the bound below is just a safeguard
    for (int i = 0; i < 10; ++i) {
        if (iterator.hasNext()) {
            SearchHit hit = iterator.next();
            Map<String, Object> hitMap = hit.getSourceAsMap();
            totals.add((Integer) hitMap.get("total"));
            urls.add((String) hitMap.get("url"));
        }
    }
    ModelAndView modelAndView = new ModelAndView();
    modelAndView.addObject("urls", urls);
    modelAndView.addObject("totals", totals);
    modelAndView.setViewName("data");
    return modelAndView;
}
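The goal mentions a paginated query page, which the top-10 code above doesn't cover. ES paginates with from/size (builder.from(offset).size(pageSize)); a minimal sketch of the offset arithmetic, with class and method names that are mine, not from the project:

```java
// Shallow pagination for ES: page numbers start at 1 and
// from = (page - 1) * pageSize. Note that ES rejects requests where
// from + size exceeds 10000 by default (index.max_result_window).
public class PageOffset {
    public static int from(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }
}
```

For example, page 3 with 10 results per page becomes `builder.from(PageOffset.from(3, 10)).size(10);`.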
  2. Displaying a pie chart with ECharts (screenshot taken straight from the official site)


Import echarts.js and adapt the official demo:

import * as echarts from 'echarts';

var chartDom = document.getElementById('main');
var myChart = echarts.init(chartDom);
var option;

option = {
  title: {
    text: 'Top 10 sorted by the total field',
    left: 'center',
    top: 10
  },
  tooltip: {
    trigger: 'item'
  },
  legend: {
    right: 10,
    top: 'center',
    orient: 'vertical'
  },
  series: [
    {
      name: 'Access From',
      type: 'pie',
      radius: ['40%', '70%'],
      avoidLabelOverlap: false,
      label: {
        show: false,
        position: 'center'
      },
      emphasis: {
        label: {
          show: true,
          fontSize: 40,
          fontWeight: 'bold'
        }
      },
      labelLine: {
        show: false
      },
      data: [
        { value: 1048, name: 'Search Engine' },
        { value: 735, name: 'Direct' },
        { value: 580, name: 'Email' },
        { value: 484, name: 'Union Ads' },
        { value: 300, name: 'Video Ads' }
      ]
    }
  ]
};

option && myChart.setOption(option);

V. Test Environment

Spring Boot 2.5.1
Java 11
Elasticsearch 7.13.1

VI. Problems Encountered

Setting up Kafka on Linux (single node, using Kafka's bundled ZooKeeper)

The fix is mostly editing Kafka's configuration file.

Changes to server.properties:

broker.id=0
listeners=PLAINTEXT://your.host.ip:9092
advertised.listeners=PLAINTEXT://your.host.ip:9092
zookeeper.connect=your.host.ip:2181
host.name=your.host.ip

At first I wrote your.host.ip as localhost, and the clients could never connect.

The Kafka consumer floods the logs on startup

When the consumer starts, org.apache.kafka.clients logs very frequently; tune the log level by adding a logback.xml:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <logger name="org.apache.kafka.clients" level="info" />
</configuration>

Getting the absolute path of an uploaded file in Spring Boot

I searched for a long time without finding a direct way; exposing a client file's absolute path through an upload would be insecure by design. The workaround is a "relay": first save the file inside the project, then resolve the project's own path to obtain an absolute path to the file.

if (file == null) {
    return "index";
}
File savePos = new File("src/main/resources/upload");
if (!savePos.exists()) {  // create the folder if it doesn't exist
    savePos.mkdirs();
}
// generate a new file name: uuid + original extension
String uuid = UUID.randomUUID().toString().replace("-", "").toLowerCase();
String originalFilename = file.getOriginalFilename();
int i = originalFilename.lastIndexOf(".");
String newFileName = uuid.concat(originalFilename.substring(i));

// copy the upload to src/main/resources/upload/xxxFile.xx inside the project
String fullPath = savePos.getCanonicalPath() + "/" + newFileName;
file.transferTo(new File(fullPath));

Parsing JSON strings

I used fastjson; writing a bean class is enough.
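A sketch of what those beans might look like, matching the source JSON shown earlier (field set abbreviated; fastjson then maps the string via JSON.parseObject(json, Received.class), and the fastjson import is omitted here):

```java
import java.util.List;

// Top-level wrapper; fastjson fills it via JSON.parseObject(json, Received.class).
public class Received {
    private BasicInfo basicInfo;
    public BasicInfo getBasicInfo() { return basicInfo; }
    public void setBasicInfo(BasicInfo basicInfo) { this.basicInfo = basicInfo; }
}

// Mirrors the "basicInfo" object; remaining accessors follow the same pattern.
class BasicInfo {
    private String lastTime;
    private String firstTime;
    private int total;
    private String data;
    private String dataType;
    private List<String> attackAction;
    private List<String> attackInProtocol;
    private List<String> malwareClass;
    private String tags;

    public int getTotal() { return total; }
    public void setTotal(int total) { this.total = total; }
    public String getDataType() { return dataType; }
    public void setDataType(String dataType) { this.dataType = dataType; }
    // ... getters/setters for the other fields look the same
}
```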

I don't remember the other problems.

VII. Summary

The task itself was fairly simple, but I hit plenty of baffling problems along the way, and it took two or three days to finish. I still learned a lot, so I wrote this markdown document to record it.
