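Use case: analyze how much storage each daily Elasticsearch index uses, convert the sizes to MB, and sort the indices by size in descending order.
1. Read the file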
import pandas as pd
import numpy as np
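# Columns are separated by runs of whitespace; the raw string keeps '\s' from being treated as a string escape.
result = pd.read_table('./analysis.txt', sep=r'\s+')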
result.head(10)
Output:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open taskmanager-20181202 7C6CUgPESu2Go35zottXXw 1 1 5829 0 1.3mb 703.6kb
green open accor-20181202 KPspuAYXRrGuZ70LQamVoQ 1 1 56188745 0 36.2gb 18.1gb
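2. Write a helper function that converts the store.size values to MB
def gb_to_mb(store_size):
    # Convert a size string such as "36.2gb", "1.3mb" or "703.6kb" to a float in MB.
    if store_size.find("gb") > -1:
        return float(store_size[0:store_size.find("gb")]) * 1024
    elif store_size.find("mb") > -1:
        return float(store_size[0:store_size.find("mb")])
    else:
        # Anything else is assumed to be in kb.
        return float(store_size[0:store_size.find("kb")]) / 1024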
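The function above treats every value that is not gb or mb as kb, so a very small index reported in plain bytes (for example "261b") or a very large one reported in tb would be converted incorrectly. Below is a minimal sketch of a stricter variant. The name size_to_mb and the unit table are my own additions, not part of the original script, and it assumes the sizes always look like a number followed directly by b, kb, mb, gb or tb.

```python
import re

# Multipliers that convert each unit to megabytes (1 mb = 1024 kb).
# Assumption: only these five units appear in the store.size column.
_UNIT_TO_MB = {
    "b": 1 / (1024 * 1024),
    "kb": 1 / 1024,
    "mb": 1.0,
    "gb": 1024.0,
    "tb": 1024.0 * 1024,
}

# A number (integer or decimal) followed by one of the units above.
_SIZE_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([kmgt]?b)\s*$", re.IGNORECASE)


def size_to_mb(store_size):
    """Hypothetical helper: convert a size string such as '36.2gb' to MB."""
    match = _SIZE_RE.match(str(store_size))
    if match is None:
        # Fail loudly instead of silently guessing the unit.
        raise ValueError(f"unrecognized size: {store_size!r}")
    number, unit = match.groups()
    return float(number) * _UNIT_TO_MB[unit.lower()]
```

It can be dropped into the next step in place of gb_to_mb, e.g. result['store.size'].map(size_to_mb). Raising on unknown input is a deliberate choice here: a bad row stops the run instead of ending up in the report with a wrong size.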
3. Apply the function from step 2
result['store_size_format'] = result['store.size'].map(gb_to_mb)
Output:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size store_size_format
green open taskmanager-20181202 7C6CUgPESu2Go35zottXXw 1 1 5829 0 1.3mb 703.6kb 1.3
green open accor-20181202 KPspuAYXRrGuZ70LQamVoQ 1 1 56188745 0 36.2gb 18.1gb 37068.8
4. Write the result to another file
result[['index', 'pri', 'rep', 'store_size_format']].sort_values(by='store_size_format', ascending=False).to_csv("./process_result.csv")
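If the goal is a total per day and not a ranking of individual indices, the date suffix in the index name can be used as a grouping key. The following is only a sketch under one assumption that the original script does not state: every index name ends in "-YYYYMMDD", as the two sample rows do. The "day" column and the daily_mb name are made up for this illustration, and the small DataFrame stands in for the real result frame.

```python
import pandas as pd

# Tiny stand-in for the processed frame; values copied from the sample rows above.
result = pd.DataFrame({
    "index": ["taskmanager-20181202", "accor-20181202"],
    "store_size_format": [1.3, 37068.8],
})

# Assumption: every index name ends in "-YYYYMMDD".
# Names that do not match become NaT and are dropped by groupby.
result["day"] = pd.to_datetime(
    result["index"].str.extract(r"-(\d{8})$", expand=False),
    format="%Y%m%d",
)

# Total MB per day, largest day first.
daily_mb = (
    result.groupby("day")["store_size_format"]
    .sum()
    .sort_values(ascending=False)
)
print(daily_mb)
```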