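There is no need to introduce what ganglia does; plenty of material is available online, and my earlier posts cover it in detail. This post is about what lies beyond ganglia's built-in monitoring. Out of the box, ganglia monitors a machine's basic metrics (CPU, load, memory, swap and so on) in great detail. In practice, though, these numbers only reflect the general state of the machine and say little about any specific service, so service-level monitoring still matters a lot. At first I did not really understand how to add my own service metrics and could only hunt for code online, until I read the MySQL monitoring script below.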
#!/bin/bash
#
# Get statistics from MySQL and feed them into Ganglia for monitoring.
#
# To use, simply adjust the path to gmetric and mysql credentials as appropriate.
#
# Author: David Winterbottom (david.winterbottom@gmail.com)
# Site: http://codeinthehole.com
# Config
# Define variables. GMETRIC points to gmetric, the ganglia tool used to feed data into ganglia; the other names are self-explanatory.
declare -r GMETRIC=/usr/bin/gmetric
declare -r NEW_DATA_FILE=/tmp/mysql-stats.new
declare -r OLD_DATA_FILE=/tmp/mysql-stats.old
declare -r MYSQL_USER="root"
declare -r MYSQL_PASSWORD="insert-password-here"
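# Sanity checks: verify that gmetric exists. It is normally present on the cluster's main machine but may be missing on individual nodes; that is no problem, just copy the binary over. If the config file is not where gmetric looks for it, create a symlink with ln -s in the expected directory.
if ! test -x "$GMETRIC" ; then
printf "The command %s is not available\n" "$GMETRIC"
exit 192
fi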
# Function for submitting metrics: this hands the data to ganglia through gmetric, and it is the heart of what I want to show. gmetric --type <type> --name <name> --value <value> --units <unit of measure>
function record_value {
if [ $# -lt 1 ]; then
printf "You must specify a look-up value\n"
exit 192
fi
LOOKUP_VAR=$1
GANGLIA_NAME=${2-unspecified}
GANGLIA_TYPE=${3-float}
GANGLIA_UNITS=${4-units}
GANGLIA_VALUE=`grep "$LOOKUP_VAR[^_]" "$NEW_DATA_FILE" | awk '{print $2}'`
printf " * %s: %s\n" "$GANGLIA_NAME" "$GANGLIA_VALUE"
$GMETRIC --type "$GANGLIA_TYPE" --name "$GANGLIA_NAME" --value "$GANGLIA_VALUE" --units "$GANGLIA_UNITS"
}
# Function for submitting delta metrics: reads the values collected in the old and new data files and sends the per-second rate to ganglia.
function record_value_rate
{
if [ $# -lt 1 ]; then
printf "You must specify a look-up value\n"
exit 192
fi
MYSQL_VAR=$1
GANGLIA_NAME=${2-unspecified}
GANGLIA_TYPE=${3-float}
GANGLIA_UNITS=${4-"per second"}
# Get values from old and new files
PREVIOUS_VALUE=`grep "$MYSQL_VAR[^_]" "$OLD_DATA_FILE" | awk '{print $2}'`
NEW_VALUE=`grep "$MYSQL_VAR[^_]" "$NEW_DATA_FILE" | awk '{print $2}'`
DELTA_VALUE=$(( NEW_VALUE - PREVIOUS_VALUE ))
PREVIOUS_TIMESTAMP=`date -r "$OLD_DATA_FILE" +%s`
NEW_TIMESTAMP=`date -r "$NEW_DATA_FILE" +%s`
DELTA_TIMESTAMP=$(( NEW_TIMESTAMP - PREVIOUS_TIMESTAMP ))
# A zero time delta would cause a division by zero below, so skip that case too
if [ $DELTA_VALUE -lt 0 ] || [ $DELTA_TIMESTAMP -le 0 ]; then
# Something strange here - MySQL may just have started. Ignore for now
printf "Weird data value - skipping\n"
else
# Need to pipe to bc to perform floating point operations
DELTA_RATE=`echo "scale=4; $DELTA_VALUE/$DELTA_TIMESTAMP" | bc -l`
printf " * $GANGLIA_NAME -- Previous value: $PREVIOUS_VALUE, new value: $NEW_VALUE, delta: $DELTA_VALUE, previous timestamp: $PREVIOUS_TIMESTAMP, new timestamp: $NEW_TIMESTAMP, delta: $DELTA_TIMESTAMP, $DELTA_RATE per second\n"
$GMETRIC --type "$GANGLIA_TYPE" --name "$GANGLIA_NAME" --value "$DELTA_RATE" --units "$GANGLIA_UNITS"
fi
}
# Read MySQL statistics into a temporary file: this part is easy to follow, it simply asks MySQL for its status.
mysql --user="$MYSQL_USER" --password="$MYSQL_PASSWORD" --execute "SHOW GLOBAL STATUS" > "$NEW_DATA_FILE"
# Submit metrics: the MySQL values to monitor.
record_value_rate "Connections" "MYSQL_CONNECTIONS" "float" "Connections/sec"
record_value_rate "Com_update" "MYSQL_UPDATE_QUERIES" "float" "Queries/sec"
record_value_rate "Com_select" "MYSQL_SELECT_QUERIES" "float" "Queries/sec"
record_value_rate "Com_insert" "MYSQL_INSERT_QUERIES" "float" "Queries/sec"
record_value_rate "Com_delete" "MYSQL_DELETE_QUERIES" "float" "Queries/sec"
record_value_rate "Created_tmp_tables" "MYSQL_CREATED_TMP_TABLES" "float" "Tables created/sec"
record_value_rate "Slow_queries" "MYSQL_SLOW_QUERIES" "float" "Queries/sec"
record_value_rate "Qcache_hits" "MYSQL_QUERY_CACHE_HITS" "float" "Hits/sec"
record_value "Qcache_queries_in_cache" "MYSQL_QUERIES_IN_CACHE" "float" "Queries"
record_value_rate "Questions" "MYSQL_QUESTIONS" "float" "Questions/sec"
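# Threads_connected is a current count rather than a cumulative counter, so report the value itself instead of a rate
record_value "Threads_connected" "MYSQL_THREADS_CONNECTED" "float" "Threads connected"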
record_value "Threads_running" "MYSQL_THREADS_RUNNING" "float" "Threads running"
# Copy data: overwrite the old data file with the new one.
cp "$NEW_DATA_FILE" "$OLD_DATA_FILE"
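The delta logic in record_value_rate is easier to reason about when tried in isolation. Below is a minimal sketch of the same calculation as a standalone function. The name compute_rate is hypothetical and not part of the original script, and awk is used in place of bc for the floating-point division purely for convenience.

```bash
#!/bin/bash
# compute_rate PREV NEW PREV_TS NEW_TS   (hypothetical helper)
# Prints the per-second rate with 4 decimals. Returns 1 when the counter
# went backwards or no time has passed (for example after a restart).
compute_rate() {
    delta=$(( $2 - $1 ))
    elapsed=$(( $4 - $3 ))
    if [ "$delta" -lt 0 ] || [ "$elapsed" -le 0 ]; then
        return 1
    fi
    # awk stands in for bc here; both perform the floating-point division
    awk -v d="$delta" -v t="$elapsed" 'BEGIN { printf "%.4f\n", d / t }'
}

# A counter that rose from 100 to 160 over 60 seconds
compute_rate 100 160 1000 1060
```

Keeping the calculation separate from the gmetric call means the skip conditions (negative delta, zero or negative elapsed time) can be checked without touching a running ganglia setup.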
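That is the whole script, and I believe anyone who knows a little shell and MySQL can follow it. So what did I learn from it? 1. Reuse: with small changes, the same code could probably monitor redis, tomcat and the like. 2. The key point is really just gmetric; once its parameters are understood, you are nearly done. I wondered whether it really has only four parameters, so, in the spirit of getting to the bottom of things, I ran gmetric --help (which is useful most of the time) and got the following output: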
  -h, --help            Print help and exit
  -V, --version         Print version and exit
  -c, --conf=STRING     The configuration file to use for finding send channels
                          (default=`/usr/local/etc/gmond.conf')
  -n, --name=STRING     Name of the metric
  -v, --value=STRING    Value of the metric
  -t, --type=STRING     Either
                          string|int8|uint8|int16|uint16|int32|uint32|float|double
  -u, --units=STRING    Unit of measure for the value e.g. Kilobytes, Celcius
                          (default=`')
  -s, --slope=STRING    Either zero|positive|negative|both (default=`both')
  -x, --tmax=INT        The maximum time in seconds between gmetric calls
                          (default=`60')
  -d, --dmax=INT        The lifetime in seconds of this metric (default=`0')
  -g, --group=STRING    Group of the metric
  -C, --cluster=STRING  Cluster of the metric
  -D, --desc=STRING     Description of the metric
  -T, --title=STRING    Title of the metric
  -S, --spoof=STRING    IP address and name of host/device (colon separated) we
                          are spoofing (default=`')
  -H, --heartbeat       spoof a heartbeat message (use with spoof option)
What each option does needs little further explanation; the descriptions in the output cover it.
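As a rough illustration of the options beyond the four used in the MySQL script, here is a sketch that also sets --group, --slope, --tmax and --dmax. The helper name submit_gauge and the values 60 and 300 are assumptions chosen for the example. By default the sketch only prints the command line (a dry run); setting GMETRIC to the real binary path would actually submit.

```bash
#!/bin/bash
# Dry run by default: print the command instead of executing gmetric.
# Set GMETRIC=/usr/bin/gmetric (path assumed) to really submit.
GMETRIC=${GMETRIC:-echo gmetric}

# submit_gauge NAME VALUE UNITS GROUP   (hypothetical helper)
submit_gauge() {
    # --tmax: maximum time in seconds between gmetric calls
    # --dmax: lifetime in seconds of the metric
    # $GMETRIC is left unquoted on purpose so "echo gmetric" splits in two
    $GMETRIC --name "$1" --value "$2" --type float --units "$3" \
             --group "$4" --slope both --tmax 60 --dmax 300
}

submit_gauge MYSQL_THREADS_RUNNING 3 "Threads running" mysql
```

Grouping related metrics with --group keeps them together in the web interface, and a non-zero --dmax lets a metric disappear once the script stops reporting it.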
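That gives us a general recipe for monitoring any service with ganglia:
0. Decide what to monitor.
1. Fetch the data. Almost every application offers some way to query its own status.
2. Extract the data. Use a script to pull out the values you want to monitor.
3. Submit the data with gmetric.
4. Run steps 1 to 3 periodically from crontab.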
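The steps above can be sketched for Redis, one of the candidates mentioned earlier. This is only an outline under some assumptions: redis-cli INFO prints key:value lines, connected_clients is the field of interest, and the function names and the metric name REDIS_CONNECTED_CLIENTS are made up for the example.

```bash
#!/bin/bash
# Step 2: print the value of KEY from "key:value" lines read on stdin.
# redis-cli ends each line with a carriage return, so strip it.
extract_field() {
    awk -F: -v key="$1" '$1 == key { sub(/\r$/, "", $2); print $2 }'
}

# Steps 1 and 3: fetch the status, extract one value, submit it.
report_connected_clients() {
    clients=$(redis-cli INFO | extract_field connected_clients)
    [ -n "$clients" ] || return 1
    gmetric --name REDIS_CONNECTED_CLIENTS --value "$clients" \
            --type uint32 --units "Clients"
}

# Only submit when both tools are installed on this machine.
if command -v redis-cli >/dev/null 2>&1 && command -v gmetric >/dev/null 2>&1; then
    report_connected_clients || echo "no data from redis-cli" >&2
fi
```

Step 4 would then be a crontab entry along these lines (the script path is hypothetical):
* * * * * /usr/local/bin/redis-ganglia.sh >/dev/null 2>&1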
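Example: monitoring squid
0. What is worth monitoring in squid? Hits, memory, files and so on. Let's take "File descriptor usage for squid" as the example.
1. Fetch the data: ./squidclient -p <port> (omit to use the default) -h <ip> (omit to use the local host) mgr:info
2. Extract the data. This takes some fluency with shell, though other approaches work too. Mine is fairly simple: ./squidclient -p80 -hxxx.xxx.com mgr:info|tail -n12|head -n7|awk -F":" '{print $2}'
3. Submit. See the --help output above for the parameters.
4. Write the crontab entry.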
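Steps 2 and 3 of the squid example can also be written by matching a label instead of counting lines with tail and head, which keeps working if the report gains or loses lines. The sketch below assumes the mgr:info report contains a line labelled "Number of file desc currently in use"; the function name fd_in_use, the metric name SQUID_FD_IN_USE and the SQUID_HOST and SQUID_PORT variables are hypothetical.

```bash
#!/bin/bash
# Step 2: read a mgr:info report on stdin and print the number that
# follows the "Number of file desc currently in use" label.
fd_in_use() {
    awk -F: '/Number of file desc currently in use/ {
        gsub(/[ \t]/, "", $2)   # drop the padding around the number
        print $2
    }'
}

# Steps 1 and 3, run only when both tools are installed.
# Host and port defaults are assumptions; adjust them to your setup.
if command -v squidclient >/dev/null 2>&1 && command -v gmetric >/dev/null 2>&1; then
    value=$(squidclient -p "${SQUID_PORT:-3128}" -h "${SQUID_HOST:-localhost}" mgr:info | fd_in_use)
    if [ -n "$value" ]; then
        gmetric --name SQUID_FD_IN_USE --value "$value" \
                --type uint32 --units "File descriptors"
    fi
fi
```

For step 4, a crontab entry such as the following would run it every minute (the script path is hypothetical):
* * * * * /usr/local/bin/squid-ganglia.sh >/dev/null 2>&1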