Project: Data Ingestion, Processing, Storage, and Query
Notes on an onboarding test project I did shortly after joining the company.
Goal
Read a local .zip file.
After the file is uploaded, process the data and send it to Kafka; consume it from Kafka, process and format it, store it in Elasticsearch, and provide a paginated query page.
1. Uploading the file
- Submit the file through a page:
xxx.zip
The file is submitted via an HTML form:
<form th:action="@{/upload/file}" method="post" enctype="multipart/form-data">
<input type="file" name="file" id="file">
<input type="submit" value="上传">
</form>
The backend receives the file object and stores it under the project directory (that is the only way to get the file's absolute path so Java can read the zip directly).
@PostMapping("/upload/file")
public String upload(@RequestParam("file") MultipartFile file) throws IOException, InterruptedException {
    if (file == null || file.isEmpty()) {
        return "index";
    }
    File savePos = new File("src/main/resources/upload");
    if (!savePos.exists()) { // create the directory if it does not exist
        savePos.mkdirs();
    }
    // generate a new file name, keeping the original extension
    String uuid = UUID.randomUUID().toString().replace("-", "").toLowerCase();
    String originalFilename = file.getOriginalFilename();
    int i = originalFilename.lastIndexOf(".");
    String newFileName = uuid.concat(originalFilename.substring(i));
    // copy the file into the project under src/main/resources/upload/xxxFile.xx
    String fullPath = savePos.getCanonicalPath() + "/" + newFileName;
    file.transferTo(new File(fullPath));
    ...
}
- Reading the zip directly in Java
Read the contents via ZipInputStream:
@PostMapping("/upload/file")
public String upload(@RequestParam("file") MultipartFile file) throws IOException, InterruptedException {
    ...
    // read the file contents; loop over every entry rather than only the first
    try (ZipInputStream zipInputStream = new ZipInputStream(
            new BufferedInputStream(new FileInputStream(fullPath)), StandardCharsets.UTF_8)) {
        ZipEntry entry;
        while ((entry = zipInputStream.getNextEntry()) != null) {
            // do not close this reader: that would also close the zip stream
            BufferedReader br = new BufferedReader(new InputStreamReader(zipInputStream, StandardCharsets.UTF_8));
            String line;
            // print every line of the current entry
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
            zipInputStream.closeEntry();
        }
    }
}
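The zip handling can be exercised end to end without an upload. A self-contained sketch that writes a small two-entry archive to a temp file and reads every entry back (class and file names here are illustrative, not from the project):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.*;

class ZipReadDemo {
    // Read every line of every entry in a zip, not just the first entry.
    static List<String> readAllLines(File zip) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(
                new BufferedInputStream(new FileInputStream(zip)), StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) { // loop over all entries
                // a fresh reader per entry; ZipInputStream reports EOF at each entry boundary
                BufferedReader br = new BufferedReader(new InputStreamReader(zin, StandardCharsets.UTF_8));
                String line;
                while ((line = br.readLine()) != null) {
                    lines.add(line);
                }
                zin.closeEntry();
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // build a two-entry zip in a temp file for the demo
        File zip = File.createTempFile("demo", ".zip");
        try (ZipOutputStream zout = new ZipOutputStream(new FileOutputStream(zip))) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("b.txt"));
            zout.write("world".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        System.out.println(readAllLines(zip)); // prints [hello, world]
        zip.delete();
    }
}
```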
2. Sending and receiving data with Kafka
Set up the Kafka environment (the important settings: the topic name, the broker address and port, and the consumer's group.id).
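The getProperties() call used below is not shown in these notes. A minimal sketch of what the producer and consumer settings look like; the broker address and group id here are placeholders, not the project's real values:

```java
import java.util.Properties;

// Sketch of Kafka client settings; "your.host.ip:9092" and the group id are placeholders.
class KafkaConfig {
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "your.host.ip:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "your.host.ip:9092");
        // consumers sharing a group.id split the topic's partitions between them
        props.put("group.id", "soar-test-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```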
- Producer: sending the data
// read the file contents
...
String line;
// create the Kafka producer
Properties kafkaProps = getProperties();
KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);
// send one message per line
while ((line = br.readLine()) != null) {
    ProducerRecord<String, String> record = new ProducerRecord<>(TOPIC, line);
    producer.send(record);
}
// flush pending messages, then close the producer and the streams
producer.flush();
producer.close();
zipInputStream.closeEntry();
fileInputStream.close();
- Consumer: receiving the data
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(TOPIC));
try {
    while (true) {
        // poll(long) is deprecated in newer clients; prefer poll(Duration.ofMillis(100))
        ConsumerRecords<String, String> records = consumer.poll(100);
        if (records.isEmpty()) continue; // poll never returns null
        for (ConsumerRecord<String, String> record : records) {
            // process record
        }
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    consumer.close();
}
3. Data processing
Data to process:
xxx.zip
Source data shape:
{
...
"basicInfo": {
"lastTime": "",
"firstTime": "",
"total": 1,
"data": "",
"dataType": "",
"attackAction": [""],
"attackInProtocol": [],
"malwareClass": [
""
],
"tags": ""
},
...
}
Target data:
{
"lastTime": "",
"firstTime": "",
"total": 1,
"url": "",
"attackInProtocol": [],
"tags": ["",""]
}
Processing requirements:
- Extract basicInfo
- Use the value of dataType as a key and data as its value
- Merge the attackAction and malwareClass fields into a new tags field (drop the original tags)
- Write the result into ES, index name soar_test_data
Data-processing code:
public static Map<String, Object> handleData(String json) {
    Received received = JSON.parseObject(json, Received.class);
    BasicInfo basicInfo = received.getBasicInfo();
    Map<String, Object> resMap = new HashMap<>();
    resMap.put("lastTime", basicInfo.getLastTime());
    resMap.put("firstTime", basicInfo.getFirstTime());
    resMap.put("total", basicInfo.getTotal());
    // the value of dataType becomes the key, data becomes the value
    resMap.put(basicInfo.getDataType(), basicInfo.getData());
    resMap.put("attackInProtocol", basicInfo.getAttackInProtocol());
    // merge attackAction and malwareClass into the new tags field
    List<String> tags = new ArrayList<>(basicInfo.getAttackAction());
    tags.addAll(basicInfo.getMalwareClass());
    resMap.put("tags", tags);
    return resMap;
}
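The reshaping can be unit-checked without Kafka or fastjson by applying the same steps to an already-parsed map. A self-contained sketch (plain maps stand in for the beans, so the casts are unchecked):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TransformDemo {
    // Same reshaping as handleData, on plain maps: dataType's value becomes the
    // key for data, attackAction + malwareClass become tags, old tags is dropped.
    @SuppressWarnings("unchecked")
    static Map<String, Object> transform(Map<String, Object> basicInfo) {
        Map<String, Object> res = new HashMap<>();
        res.put("lastTime", basicInfo.get("lastTime"));
        res.put("firstTime", basicInfo.get("firstTime"));
        res.put("total", basicInfo.get("total"));
        res.put((String) basicInfo.get("dataType"), basicInfo.get("data"));
        res.put("attackInProtocol", basicInfo.get("attackInProtocol"));
        List<String> tags = new ArrayList<>((List<String>) basicInfo.get("attackAction"));
        tags.addAll((List<String>) basicInfo.get("malwareClass"));
        res.put("tags", tags); // the source tags field is intentionally not copied
        return res;
    }
}
```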
- Use the value of the url field as the document id (primary key) in ES
Configure ES in Spring Boot.
Code that bulk-sends the consumed data to ES:
// build the request options once; the bulk call can be slow, so raise the socket timeout
RequestConfig requestConfig = RequestConfig.custom()
        .setConnectTimeout(5000)
        .setSocketTimeout(120000)
        .build();
RequestOptions options = RequestOptions.DEFAULT.toBuilder()
        .setRequestConfig(requestConfig)
        .build();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    if (records.isEmpty()) continue;
    // a fresh BulkRequest per batch; reusing one across iterations would re-send old documents
    BulkRequest bulkRequest = new BulkRequest();
    for (ConsumerRecord<String, String> record : records) {
        // the message value is a JSON string
        String json = record.value();
        // transform it into the target shape
        Map<String, Object> resMap = handleData(json);
        String resData = JSON.toJSONString(resMap);
        IndexRequest indexRequest = new IndexRequest(INDEX);
        // use the url field as the document id
        indexRequest.id((String) resMap.get("url"));
        indexRequest.source(resData, XContentType.JSON);
        bulkRequest.add(indexRequest);
    }
    // execute the batch
    BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, options);
    System.out.println(bulkResponse);
}
4. Querying and displaying the data
- Query the top 10 documents, sorted by the total field in descending order
This mainly exercises ES operations; the approach is straightforward.
@GetMapping("/data")
public ModelAndView showData() throws IOException {
    SearchRequest request = new SearchRequest(INDEX);
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.matchAllQuery());
    builder.sort("total", SortOrder.DESC);
    builder.size(10); // ES returns 10 hits by default, but being explicit is clearer
    request.source(builder);
    SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
    SearchHits hits = response.getHits();
    List<Integer> totals = new ArrayList<>();
    List<String> urls = new ArrayList<>();
    for (SearchHit hit : hits) {
        Map<String, Object> hitMap = hit.getSourceAsMap();
        totals.add((Integer) hitMap.get("total"));
        urls.add((String) hitMap.get("url"));
    }
    ModelAndView modelAndView = new ModelAndView();
    modelAndView.addObject("urls", urls);
    modelAndView.addObject("totals", totals);
    modelAndView.setViewName("data");
    return modelAndView;
}
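The project goal mentions a paginated query page; the same SearchSourceBuilder supports it through from/size. A sketch of the offset arithmetic, where the page and size request parameters are assumptions of mine, not names from the project:

```java
// Hypothetical helper: map a 1-based page number and page size to the offset
// that SearchSourceBuilder.from(...) expects, paired with builder.size(size).
class PageWindow {
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }
}
```

So page 3 with size 10 becomes builder.from(20).size(10). Note that ES rejects windows where from + size exceeds index.max_result_window (10000 by default), so very deep paging needs search_after instead.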
Displaying a pie chart with echarts (the screenshot came straight from the official site).
Import echarts.js and adapt the official demo:
import * as echarts from 'echarts';
var chartDom = document.getElementById('main');
var myChart = echarts.init(chartDom);
var option;
option = {
title: {
text: 'Top 10 by total',
left: 'center',
top: 10
},
tooltip: {
trigger: 'item'
},
legend: {
right: 10,
top: 'center',
orient: 'vertical'
},
series: [
{
name: 'Access From',
type: 'pie',
radius: ['40%', '70%'],
avoidLabelOverlap: false,
label: {
show: false,
position: 'center'
},
emphasis: {
label: {
show: true,
fontSize: 40,
fontWeight: 'bold'
}
},
labelLine: {
show: false
},
data: [
{ value: 1048, name: 'Search Engine' },
{ value: 735, name: 'Direct' },
{ value: 580, name: 'Email' },
{ value: 484, name: 'Union Ads' },
{ value: 300, name: 'Video Ads' }
]
}
]
};
option && myChart.setOption(option);
5. Test environment
Spring Boot 2.5.1
Java 11
Elasticsearch 7.13.1
6. Problems encountered
Setting up Kafka on Linux (single node, using Kafka's bundled ZooKeeper)
The main work is editing Kafka's configuration file
server.properties
to change:
broker.id=0
listeners=PLAINTEXT://your.host.ip:9092
advertised.listeners=PLAINTEXT://your.host.ip:9092
zookeeper.connect=your.host.ip:2181
host.name=your.host.ip
At first I wrote your.host.ip
as localhost
and could never connect: advertised.listeners is the address the broker hands back to clients, so localhost only works for clients on the broker machine itself.
Kafka consumer flooding the logs at startup
When the consumer starts, org.apache.kafka.clients
logs very frequently; create a logback.xml
to adjust the log level:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<logger name="org.apache.kafka.clients" level="info" />
</configuration>
Getting the absolute path of an uploaded file in Spring Boot
I searched online for a long time and found nothing, because exposing the client-side absolute path of an uploaded file would be inherently unsafe. The workaround is a relay: first store the file inside the project, then resolve the project path to obtain an absolute path for it.
if (file == null || file.isEmpty()) {
    return "index";
}
File savePos = new File("src/main/resources/upload");
if (!savePos.exists()) { // create the directory if it does not exist
    savePos.mkdirs();
}
// generate a new file name, keeping the original extension
String uuid = UUID.randomUUID().toString().replace("-", "").toLowerCase();
String originalFilename = file.getOriginalFilename();
int i = originalFilename.lastIndexOf(".");
String newFileName = uuid.concat(originalFilename.substring(i));
// copy the file into the project under src/main/resources/upload/xxxFile.xx
String fullPath = savePos.getCanonicalPath() + "/" + newFileName;
file.transferTo(new File(fullPath));
Parsing JSON strings
I used fastjson
; writing a bean for the payload is enough.
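For reference, a minimal sketch of what those beans might look like. The field names mirror the keys in the source JSON sample above; the exact class layout in the project may differ, and fastjson binds through these getters/setters when calling JSON.parseObject(json, Received.class):

```java
import java.util.List;

// Sketch of the beans backing JSON.parseObject(json, Received.class).
class Received {
    private BasicInfo basicInfo;

    public BasicInfo getBasicInfo() { return basicInfo; }
    public void setBasicInfo(BasicInfo basicInfo) { this.basicInfo = basicInfo; }
}

class BasicInfo {
    private String lastTime;
    private String firstTime;
    private int total;
    private String data;
    private String dataType;
    private List<String> attackAction;
    private List<String> attackInProtocol;
    private List<String> malwareClass;
    private String tags;

    public String getLastTime() { return lastTime; }
    public void setLastTime(String lastTime) { this.lastTime = lastTime; }
    public String getFirstTime() { return firstTime; }
    public void setFirstTime(String firstTime) { this.firstTime = firstTime; }
    public int getTotal() { return total; }
    public void setTotal(int total) { this.total = total; }
    public String getData() { return data; }
    public void setData(String data) { this.data = data; }
    public String getDataType() { return dataType; }
    public void setDataType(String dataType) { this.dataType = dataType; }
    public List<String> getAttackAction() { return attackAction; }
    public void setAttackAction(List<String> attackAction) { this.attackAction = attackAction; }
    public List<String> getAttackInProtocol() { return attackInProtocol; }
    public void setAttackInProtocol(List<String> attackInProtocol) { this.attackInProtocol = attackInProtocol; }
    public List<String> getMalwareClass() { return malwareClass; }
    public void setMalwareClass(List<String> malwareClass) { this.malwareClass = malwareClass; }
    public String getTags() { return tags; }
    public void setTags(String tags) { this.tags = tags; }
}
```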
I don't remember the other problems.
7. Summary
The task itself was fairly simple, but I ran into plenty of baffling problems along the way and spent two or three days finishing it. It was still worthwhile, so I wrote this markdown doc to record it.