#filebeat configuration
filebeat.yml
-input type, paths, encoding (can be omitted when the source is UTF-8)
-output target: logstash or ES
filebeat startup command
-filebeat.exe -e -c filebeat.yml
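The points above amount to only a few lines of filebeat.yml. A minimal sketch (the source path and the logstash host are placeholders, not from the original setup):

```yaml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - E:\FGQ\ELK\poi.csv      # placeholder source path
    # encoding: gbk             # only needed when the source is not UTF-8

output.logstash:
  hosts: ["127.0.0.1:5044"]     # must match the beats port in logstash.conf
```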
#logstash configuration
logstash.conf file
-the pipeline config passed at startup, see logstash-geonames.conf below
logstash startup command
-logstash.bat -f ../config/logstash-geonames.conf
#es configuration
elasticsearch.yml
-fixing cross-origin (CORS) issues:
http.cors.enabled: true
http.cors.allow-origin: "*"
es startup
-double-click elasticsearch.bat
#es-head configuration
see the es-head setup notes on GitHub
#kibana configuration
defaults are fine
#note: the fields must include lon and lat so that kibana can recognize geo_point data
#_mapping
POST http://localhost:9200/geochina/infors/_mapping?include_type_name=true
{
  "properties": {
    "id": {
      "type": "keyword"
    },
    "lon": {
      "type": "float"
    },
    "lat": {
      "type": "float"
    },
    "name": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_smart"
    },
    "address": {
      "type": "keyword"
    },
    "telephone": {
      "type": "keyword"
    },
    "type": {
      "type": "keyword"
    },
    "areaid": {
      "type": "keyword"
    },
    "wgslng": {
      "type": "float"
    },
    "wgslat": {
      "type": "float"
    },
    "bdlng": {
      "type": "float"
    },
    "bdlat": {
      "type": "float"
    },
    "updatetime": {
      "type": "keyword"
    },
    "isdelete": {
      "type": "keyword"
    },
    "areaname": {
      "type": "keyword"
    },
    "parentname": {
      "type": "keyword"
    },
    "location": {
      "type": "geo_point"
    }
  }
}
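Assuming ES is reachable on localhost:9200 as in the URL above, the index and the mapping can be applied with curl before starting the pipeline (a request sketch; mapping.json is assumed to hold the JSON body shown above):

```shell
# create the index first (settings left at defaults)
curl -X PUT "http://localhost:9200/geochina?include_type_name=true"

# then apply the mapping body shown above
curl -X POST "http://localhost:9200/geochina/infors/_mapping?include_type_name=true" \
     -H "Content-Type: application/json" \
     -d @mapping.json
```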
#config note: the columns must include lon and lat, otherwise kibana cannot recognize geo_point data
# Sample Logstash configuration for creating a simple
# Beats -> Logstash -> Elasticsearch pipeline.
input {
  beats {
    port => 5044
  }
}
filter {
  csv {
    skip_header => "true"
    separator => ","
    columns => ["id","lon","lat","name","address","telephone","type","areaid","wgslng","wgslat","bdlng","bdlat","updatetime","isdelete","areaname","parentname"]
    # both location sub-fields set in a single add_field hash
    add_field => {
      "[location][lon]" => "%{wgslng}"
      "[location][lat]" => "%{wgslat}"
    }
    remove_field => ["message","headers","@version","version","ecs","@timestamp","tags","agent","input","host","log","offset"]
  }
  mutate {
    convert => {
      # type conversions
      "id" => "string"
      "lon" => "float"
      "lat" => "float"
      "name" => "string"
      "address" => "string"
      "telephone" => "string"
      "type" => "string"
      "areaid" => "string"
      "wgslng" => "float"
      "wgslat" => "float"
      "bdlng" => "float"
      "bdlat" => "float"
      "updatetime" => "string"
      "isdelete" => "string"
      "areaname" => "string"
      "parentname" => "string"
      "[location][lon]" => "float"
      "[location][lat]" => "float"
    }
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "geochina3"
    document_type => "infors3"
  }
}
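The filter above just splits each CSV line into named columns, converts the numeric ones, and copies wgslng/wgslat into a nested location object. A rough Python equivalent of that per-event transform (column names come from the columns list above; the function name is made up for illustration):

```python
COLUMNS = ["id", "lon", "lat", "name", "address", "telephone", "type",
           "areaid", "wgslng", "wgslat", "bdlng", "bdlat",
           "updatetime", "isdelete", "areaname", "parentname"]

FLOAT_FIELDS = {"lon", "lat", "wgslng", "wgslat", "bdlng", "bdlat"}

def csv_line_to_doc(line):
    """Mimic the csv + mutate filters: split, convert types, add location."""
    values = line.rstrip("\n").split(",")
    doc = dict(zip(COLUMNS, values))
    for field in FLOAT_FIELDS:
        doc[field] = float(doc[field])
    # like add_field: location is built from the WGS-84 columns
    doc["location"] = {"lon": doc["wgslng"], "lat": doc["wgslat"]}
    return doc
```

Note that, as in the logstash config, location is filled from wgslng/wgslat rather than the raw lon/lat columns.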
#other quick notes
query
{
  "query": {
    "match": {
      "name": "玉泉山"
    }
  }
}
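To run this query against the index created earlier (a request sketch; index name geochina and local ES as above):

```shell
curl -X GET "http://localhost:9200/geochina/_search" \
     -H "Content-Type: application/json" \
     -d '{"query": {"match": {"name": "玉泉山"}}}'
```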
#troubleshooting
1. The data file is UTF-8 encoded. How to avoid garbled text when importing with filebeat+logstash+es?
filebeat.yml: the config file itself defaults to UTF-8 and filebeat reads data as UTF-8 by default, so no encoding setting is needed.
logstash.conf: the default input encoding is UTF-8, so no special configuration is needed either.
2. The data file is GB2312/GBK encoded. How to avoid garbled text when importing with filebeat+logstash+es?
filebeat.yml: filebeat reads data as UTF-8 by default, but the source data is GB2312, so an encoding setting must be added to the input:
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    #- /var/log/*.log
    - E:\FGQ\ELK\areanamePareaname.txt
  encoding: gbk    # must be a single supported encoding name; gbk covers GB2312
logstash.conf: logstash expects UTF-8, so the GB2312 data must be converted to UTF-8 to avoid garbled text. With the filebeat encoding setting above, filebeat already re-encodes the stream to UTF-8, so declaring the charset as UTF-8 is enough; if the stream still arrived as GB2312/GBK, set charset to "GBK" instead so logstash does the conversion. See the following configuration:
input {
  beats {
    codec => plain {
      charset => "UTF-8"
    }
    port => 5044
  }
}
3. If the source file's encoding is unknown and the approaches above (gbk, gb2312, plain, utf-8, etc.) have all been tried, convert the source file to an encoding you know. Two ways to convert:
Option 1: no limit on source file size (I used this on a 47 GB csv)
Get-Content "E:\FGQ\ELK\poi.csv" | Out-File "E:\FGQ\ELK\poiutf-8\poiutf8.csv" -encoding utf8
Get-Content E:\FGQ\ELK\poi.csv | Out-File E:\FGQ\ELK\poi_6.csv -encoding utf8
Option 2: source file must not exceed 2 GB
powershell batch conversion of text file encodings (GBK to UTF-8)
I had a batch of SQL files that a program bulk-loaded into a local DB, but the imported Chinese text came out garbled (a pile of ????), and the Chinese in the insert statements in the log was already garbled, so it looked like an encoding problem. The SQL files turned out to be GBK (the system default encoding), so I wanted to convert them all to UTF-8. Not wanting to hunt for tools, I just used PowerShell.
At first the built-in get-content and set-content commands seemed convenient enough, but the result was GBK converted to UTF-8 with BOM, which was not what I wanted. Posting it anyway for reference:
@echo off
powershell.exe -command "dir *.sql -R|foreach-object{(Get-Content $_.FullName -Encoding Default) | Set-Content $_.FullName -Encoding UTF8 };Write-Host 'Conversion complete...'"
pause
So I turned to .NET instead, which produced the plain UTF-8 I wanted:
@echo off
powershell.exe -command "dir *.sql -R|foreach-object{[void][System.IO.File]::WriteAllBytes($_.FullName,[System.Text.Encoding]::Convert([System.Text.Encoding]::GetEncoding('GBK'),[System.Text.Encoding]::UTF8,[System.IO.File]::ReadAllBytes($_.FullName)))};Write-Host 'Conversion complete...'"
pause
Script usage notes
The script was tested under powershell 5.1.
The powershell command is embedded in a CMD script, so save it as a .bat and double-click to run (it recursively walks all subdirectories; if you don't want that, change dir *.sql -R to dir *.sql).
For other text formats such as csv or txt, change dir *.sql in the script to dir *.csv or dir *.txt.
Other encoding conversions can follow the same pattern with minor changes.
If the script runs successfully (no errors), do NOT run it again, or you may genuinely corrupt the text.
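If PowerShell is not an option, the same GBK-to-UTF-8 conversion can be sketched in Python; because it streams line by line it also avoids the 2 GB limit of the ReadAllBytes approach (the function name is mine, not from the original notes):

```python
def convert_encoding(src_path, dst_path, src_encoding="gbk"):
    """Stream-convert a text file to UTF-8 (no BOM), line by line."""
    with open(src_path, "r", encoding=src_encoding) as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)
```

Unlike the first PowerShell variant, this writes UTF-8 without a BOM, and unlike the in-place .NET variant it writes to a separate output file, so re-running it is harmless.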
4. Every field value in the source data is wrapped in double quotes, and after importing into es the geo_point type does not take effect in kibana.
The cause is that the index _mapping in es and the columns in the logstash config did not include the lon and lat fields.
Alternatively,
recreate the index and its _mapping and try again.
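When geo_point does not take effect, a quick sanity check is to read the mapping back and confirm location is mapped as geo_point (request sketch, same local ES assumed):

```shell
curl -X GET "http://localhost:9200/geochina/_mapping?include_type_name=true"
```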
#temporary notes
elasticsearch {
  hosts => ["127.0.0.1:9200"]
  index => "geochina"
  document_type => "infors"
}
stdout {
  codec => rubydebug
}
codec => json
codec => plain {
  charset => "UTF-8"
}
codec => plain {
  charset => "GBK"
}
codec => json_lines
ELK: the complete process of handling 150 million records