TASK:
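Measure the average CPU usage and memory usage of the whole application (learn app) on its host (for example, an EC2 machine) during a performance test.
Environment:
The learn app is deployed on a mesos-cluster as a single container.
The mesos-cluster runs on AWS EC2 machines.
Build a small tool that collects rough statistics (similar in function to the commercial newrelic-infra agent).
Approach:
Every 3 minutes, sample the CPU usage and the memory usage, then compute the averages.
Memory usage is read directly from the mem_rss_bytes field of the Mesos endpoint /monitor/statistics.
CPU usage is derived from the cpus_system_time_secs and cpus_user_time_secs fields of the same endpoint with a small calculation:
cpu usage = (current total CPU time - total CPU time at the previous sample) / (current timestamp - timestamp of the previous sample)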
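As a worked illustration of the formula above, here is a minimal sketch. It assumes two samples shaped like the statistics object returned by /monitor/statistics. The function name cpu_usage and the sample values are made up for illustration. The result is expressed in CPU cores (1.0 means one fully used core); dividing it by cpus_limit would give the share of the container's allocation.

```python
def cpu_usage(prev, curr):
    """Average number of CPU cores used between two samples."""
    prev_total = prev["cpus_user_time_secs"] + prev["cpus_system_time_secs"]
    curr_total = curr["cpus_user_time_secs"] + curr["cpus_system_time_secs"]
    elapsed = curr["timestamp"] - prev["timestamp"]
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr_total - prev_total) / elapsed


# Hypothetical samples taken 180 seconds apart (not real measurements).
prev = {"cpus_user_time_secs": 1000.0, "cpus_system_time_secs": 200.0, "timestamp": 5000.0}
curr = {"cpus_user_time_secs": 1270.0, "cpus_system_time_secs": 290.0, "timestamp": 5180.0}

usage = cpu_usage(prev, curr)
print(usage)  # 360 CPU-seconds over 180 wall-clock seconds -> 2.0 cores
```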
knowledge:
1) The optparse module
It is powerful and easy to use, and it generates standard command-line help that follows Unix/POSIX conventions.
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-u", "--url", dest="api_url", help="Mesos api url")
parser.add_option("-e", "--executor-id", dest="executor_id", help="Executor id")
(options, args) = parser.parse_args()
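To see what the parser produces, the sketch below passes an explicit argument list to parse_args instead of reading the real command line. The executor id "example-executor-id" is a placeholder chosen for illustration.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-u", "--url", dest="api_url", help="Mesos api url")
parser.add_option("-e", "--executor-id", dest="executor_id", help="Executor id")

# An explicit argument list stands in for sys.argv[1:] in this sketch.
argv = ["-u", "http://10.146.20.205:5051/monitor/statistics",
        "-e", "example-executor-id"]  # placeholder id
options, args = parser.parse_args(argv)

print(options.api_url)      # value given with -u / --url
print(options.executor_id)  # value given with -e / --executor-id
print(args)                 # leftover positional arguments (none here)
```

Options that are not supplied are left as None, so a script may want to check both values before it starts polling.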
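2) Standard HTTP API call
The agent node of the mesos-cluster is the machine on which the container actually runs.
agent-node IP address: 10.146.20.205
http://10.146.20.205:5051/monitor/statistics
This endpoint returns the resource statistics of every container task, one entry per task, for example: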
{
    "executor_id": "performance_jenkins_feature_perf-testing-downgrade_learn.be8907ab-e3d5-11e7-a9bf-469c20cc1bbb",
    "executor_name": "Command Executor (Task: performance_jenkins_feature_perf-testing-downgrade_learn.be8907ab-e3d5-11e7-a9bf-469c20cc1bbb) (Command: sh -c '/mnt/mesos/s...')",
    "framework_id": "2df1664b-918a-4602-bd18-5d1c097352cb-34970",
    "source": "performance_jenkins_feature_perf-testing-downgrade_learn.be8907ab-e3d5-11e7-a9bf-469c20cc1bbb",
    "statistics": {
        "cpus_limit": 4.1,
        "cpus_system_time_secs": 10279.03,
        "cpus_user_time_secs": 39016.26,
        "mem_limit_bytes": 17213423616,
        "mem_rss_bytes": 9182752768,
        "timestamp": 1513676172.31889
    }
},
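Notes:
cpus_user_time_secs: cumulative CPU time spent in user mode up to that moment
cpus_system_time_secs: cumulative CPU time spent in system (kernel) mode up to that moment
mem_rss_bytes: memory (resident set size) in use at that moment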
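The fields described above can be read with the standard json module. The sketch below is an illustration only: it assumes the endpoint returns a JSON array of such entries, uses a response trimmed to a single entry, and replaces the long executor id with the placeholder "learn-example". The numeric values are the ones from the sample above.

```python
import json

# Trimmed response: one entry, placeholder executor id, values from the sample.
raw = """
[
  {
    "executor_id": "learn-example",
    "statistics": {
      "cpus_limit": 4.1,
      "cpus_system_time_secs": 10279.03,
      "cpus_user_time_secs": 39016.26,
      "mem_limit_bytes": 17213423616,
      "mem_rss_bytes": 9182752768,
      "timestamp": 1513676172.31889
    }
  }
]
"""

entries = json.loads(raw)

# Index the statistics by executor id so one task can be looked up directly.
by_id = {e["executor_id"]: e["statistics"] for e in entries}
stats = by_id.get("learn-example")

# Cumulative CPU seconds (user + system) and memory use relative to the limit.
cpu_total_secs = stats["cpus_user_time_secs"] + stats["cpus_system_time_secs"]
mem_fraction = stats["mem_rss_bytes"] / float(stats["mem_limit_bytes"])

print(cpu_total_secs)
print(round(mem_fraction, 4))
```

A single sample only gives cumulative CPU seconds; a usage figure needs two samples and the difference between them.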
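Main program
import time

current = None  # previous sample; None until the first poll succeeds
while True:
    rsp = poll(options.api_url)
    e = findExecutorMetric(rsp, options.executor_id)
    if e is not None:  # skip this round if the executor is not listed
        stat = Statistics()
        stat.fromJson(e)
        if current is not None:
            stat.diffUse(current)
        current = stat
        dumpStat(current)
    time.sleep(3)  # seconds; use 180 for the 3-minute interval described in the approach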
Class definition
# Statistics object
class Statistics:
    cpus_user_time_secs = 0.0
    cpus_system_time_secs = 0.0
    cpus_limit = 0.0
    cpus_total_usage = 0.0
    mem_rss_bytes = 0.0
    mem_limit_bytes = 0.0
    disk_used_bytes = 0.0
    disk_limit_bytes = 0.0
    timestamp = 0.0

    # Compute CPU usage (in cores) relative to an earlier sample
    def diffUse(self, stat):
        cpus_user_usage = (self.cpus_user_time_secs - stat.cpus_user_time_secs) / (self.timestamp - stat.timestamp)
        cpus_system_usage = (self.cpus_system_time_secs - stat.cpus_system_time_secs) / (self.timestamp - stat.timestamp)
        self.cpus_total_usage = cpus_user_usage + cpus_system_usage

    # Fill the fields from one entry of the /monitor/statistics response
    def fromJson(self, e):
        stat = e['statistics']
        self.cpus_user_time_secs = stat['cpus_user_time_secs']
        self.cpus_system_time_secs = stat['cpus_system_time_secs']
        self.cpus_limit = stat['cpus_limit']
        self.cpus_total_usage = 0.0
        self.mem_rss_bytes = stat['mem_rss_bytes']
        self.mem_limit_bytes = stat['mem_limit_bytes']
        self.disk_used_bytes = 0.0
        self.disk_limit_bytes = 0.0
        self.timestamp = stat['timestamp']
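Helper functions
import json
import sys
import urllib2  # Python 2 standard library

# search element by executor_id
def findExecutorMetric(rsp, executor_id):
    for e in rsp:
        if e['executor_id'] == executor_id:
            return e
    return None

# fetch the endpoint and decode the JSON body
def poll(api_url):
    return json.loads(urllib2.urlopen(api_url).read())

# print one CSV line: timestamp,cpu usage,memory in bytes
def dumpStat(stat):
    fields = (str(stat.timestamp), str(stat.cpus_total_usage), str(stat.mem_rss_bytes))
    print(','.join(fields))
    sys.stdout.flush()  # output is redirected to a file; flush so no samples are lost when the script is stopped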
Usage guide:
Step 1: before the load test starts, run
python metric-monitor.py -u http://10.146.20.205:5051/monitor/statistics -e performance_jenkins_feature_perf-testing-downgrade_learn.be8907ab-e3d5-11e7-a9bf-469c20cc1bbb >> learn_perf_testing_downgrade_20171219-1.txt &
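-e: the executor-id of the container task
-u: the URL of the mesos-cluster agent-node API
Step 2: when the load test ends, stop the script. The raw data has now been collected.
Step 3: compute and analyze the data
Import the data from the txt file into Excel, plot a chart, and compute the average.
Plot a trend chart:
Data -> Get external data: converts the txt data into Excel columns
Insert -> Chart -> Line: select the columns imported in the previous step to draw the trend chart
Compute the average:
Select the column data together with the next empty cell and enter a formula such as =average(H2:H798)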
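As an alternative to Excel, the averages could also be computed with a few lines of Python. This is a minimal sketch, assuming each line of the output file has the form timestamp,cpu usage,memory bytes as printed by dumpStat. The function name average_metrics and the sample rows are invented for illustration. The first row is skipped by default because its CPU usage is always 0.0 (there is no earlier sample to diff against).

```python
import csv


def average_metrics(lines, skip_first=True):
    """Return (average cpu usage, average memory bytes) from CSV lines."""
    rows = [row for row in csv.reader(lines) if row]
    if skip_first:
        rows = rows[1:]  # first sample has no predecessor, so its cpu value is 0.0
    if not rows:
        raise ValueError("no samples to average")
    cpu = [float(r[1]) for r in rows]
    mem = [float(r[2]) for r in rows]
    return sum(cpu) / len(cpu), sum(mem) / len(mem)


# Invented rows in the format timestamp,cpu,mem (not real measurements).
sample = [
    "1000.0,0.0,100",
    "1003.0,2.0,200",
    "1006.0,4.0,400",
]
avg_cpu, avg_mem = average_metrics(sample)
print(avg_cpu)  # mean of 2.0 and 4.0
print(avg_mem)  # mean of 200 and 400
```

To use it on real data, pass an open file object for the txt file written in step 1 instead of the sample list.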
[1].http://mesos.apache.org/documentation/latest/endpoints/slave/monitor/statistics/
[2].http://www.cnblogs.com/captain_jack/archive/2011/01/11/1933366.html