Nightingale (夜莺) Installation and Configuration

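Overview

Nightingale is an upgraded successor to open-falcon. Highlights:
Adds an organization directory tree, which makes management easier;
Adds condition expressions to alert strategies, including AND conditions;
Adds an observation period (留观时长).

Drawbacks:
The project is still iterating quickly, so some features may behave unpredictably under high concurrency.
It does not yet integrate with k8s, so it loses out on the k8s market.
Although it is open source and covers day-to-day operations monitoring, tying it closely to your own business still requires developers who can do secondary development.

Recorded on: 2020.6.30
Version at the time: v2.7.2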


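Deployment environment

Location | Private IP | Instance spec | Role | Service components | Notes
---------|------------|---------------|------|--------------------|------
DiDi Cloud | 10.255.0.183 | 1C/2G/40G/1M | Dependencies | redis/nginx/mysql | Budget is limited, so these run on the same DiDi Cloud instance
DiDi Cloud | 10.255.0.183 | 1C/2G/40G/1M | Monitoring server | n9e-tsdb/n9e-index/n9e-monapi/n9e-collector/n9e-judge/n9e-transfer | Budget is limited, so these run on the same DiDi Cloud instance
Alibaba Cloud | 172.26.155.228 | 2C/4G/40G/1M | Monitored host / client | n9e-collector/prometheus-exporter-collector/redis-exporter | Used to test collection and reporting of services on Alibaba Cloud
UCLOUD | 10.7.23.6 | 2C/4G/40G/1M | Monitored host / client | n9e-collector/prometheus-exporter-collector/redis-exporter | Used to test collection and reporting of services on UCLOUD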

Go environment installation and configuration

Download
cd /opt && wget https://studygolang.com/dl/golang/go1.15.linux-amd64.tar.gz

Extract
tar -zxvf go1.15.linux-amd64.tar.gz

No rename is needed
The archive unpacks into a directory that is already named go, so /opt/go exists right after extraction.

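Configure the Go workspace
In GOPATH mode, which this build uses, Go code must live under GOPATH. GOPATH is a working directory that contains three subdirectories:

$GOPATH
    src        Go source code; every project goes under $GOPATH/src
    bin        executables produced by go install
    pkg        compiled package objects, such as .a files

Create /opt/gocode/{src,bin,pkg}; GOPATH will be set to /opt/gocode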
mkdir -p /opt/gocode/{src,bin,pkg}

/opt/gocode/
├── bin
├── pkg
└── src
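
Before building, the workspace layout can be checked with a small helper. This is only a sketch: the function name check_gopath_layout is made up for illustration and is not part of Go or Nightingale.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a directory has the src/bin/pkg layout.
# Prints every missing subdirectory and returns non-zero if any is absent.
check_gopath_layout() {
    local root=$1 missing=0 sub
    for sub in src bin pkg; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_gopath_layout /opt/gocode after the mkdir above should print nothing and return 0.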

Set the GOPATH environment variables
Create /etc/profile.d/go_profile.sh (sourced through /etc/profile at login) with the GOPATH settings and the Go SDK path


[root@ZBX-PRMS go]# cat /etc/profile.d/go_profile.sh
#!/bin/bash

# Go installation directory (SDK and standard library sources)
export GOROOT=/opt/go

# Go project workspace
export GOPATH=/opt/gocode

# Put the go toolchain on PATH
export PATH=$GOROOT/bin:$PATH

# Where go install places executables
export GOBIN=$GOPATH/bin


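Check which go binary is on PATH (log in again or run source /etc/profile.d/go_profile.sh first so the new variables are loaded)
which go

Show the Go environment
go env

Show the Go version to confirm the installation succeeded
go version

Configure a domestic mirror for go modules

The default proxy (GOPROXY="https://proxy.golang.org,direct") is blocked in mainland China, so go get fails. Use the Alibaba Cloud Go mirror to speed things up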

go env -w GO111MODULE=on
go env -w GOPROXY=https://mirrors.aliyun.com/goproxy/,direct

Build Nightingale from source

# The project is not managed with go modules, so it must be built under github.com/didi
echo $GOPATH
/opt/gocode

mkdir -p $GOPATH/src/github.com/didi
cd $GOPATH/src/github.com/didi

# Clone the code, then build and package it; pack runs build automatically and produces a tar.gz
git clone https://github.com/didi/nightingale.git

If GitHub is slow to reach from this machine, the mirror below may speed up the clone, although it does not always help
git clone https://github.com.cnpmjs.org/didi/nightingale.git

cd nightingale && ./control build && ./control pack

Install the database

Skip this step if a database already exists


Initialize the database

mysql -uroot -p -P 3306 < $GOPATH/src/github.com/didi/nightingale/sql/n9e_hbs.sql
mysql -uroot -p -P 3306 < $GOPATH/src/github.com/didi/nightingale/sql/n9e_mon.sql
mysql -uroot -p -P 3306 < $GOPATH/src/github.com/didi/nightingale/sql/n9e_uic.sql
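
The three imports above can be wrapped so that a missing schema file is caught before anything reaches the database. This is a sketch under assumptions: import_schemas is a hypothetical helper, and the client command is passed in as arguments rather than hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: feed the three Nightingale schema files to a client command.
# Usage: import_schemas <sql_dir> <command> [args...]
import_schemas() {
    local sql_dir=$1; shift
    local name file
    # Fail before touching the database if any schema file is missing.
    for name in n9e_hbs n9e_mon n9e_uic; do
        file="$sql_dir/$name.sql"
        [ -f "$file" ] || { echo "missing schema file: $file" >&2; return 1; }
    done
    # Run the client once per file, stopping at the first failure.
    for name in n9e_hbs n9e_mon n9e_uic; do
        echo "importing $name" >&2
        "$@" < "$sql_dir/$name.sql" || return 1
    done
}
```

For the setup in this document the call would be import_schemas "$GOPATH/src/github.com/didi/nightingale/sql" mysql -uroot -p -P 3306; with -p the client still prompts for the password once per file.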

Enter the MySQL root password at each prompt.

Redis installation and configuration

yum install -y redis

vi /etc/redis.conf
daemonize yes

systemctl enable redis && systemctl restart redis && systemctl status redis

nginx installation and configuration

 
yum -y install nginx
systemctl enable nginx && systemctl restart nginx && systemctl status nginx


[root@10-255-0-183 pub]# cat /etc/nginx/nginx.conf
user root;

worker_processes auto;
worker_cpu_affinity auto;

error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

include /usr/share/nginx/modules/*.conf;

events {
    use epoll;
    worker_connections 204800;
}

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile            on;
    tcp_nopush          on;
    tcp_nodelay         on;
    keepalive_timeout   65;
    types_hash_max_size 2048;

    include             /etc/nginx/mime.types;
    default_type        application/octet-stream;

    include /etc/nginx/conf.d/*.conf;


    proxy_connect_timeout   500ms;
    proxy_send_timeout      1000ms;
    proxy_read_timeout      3000ms;
    proxy_buffers           64 8k;
    proxy_busy_buffers_size    128k;
    proxy_temp_file_write_size 64k;
    proxy_redirect off;
    proxy_next_upstream error invalid_header timeout http_502 http_504;

    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Real-Port $remote_port;
    proxy_set_header Host $http_host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    gzip on;
    gzip_min_length 1k;
    gzip_buffers 4 16k;
    gzip_comp_level 2;
    gzip_types application/javascript application/x-javascript text/css text/javascript image/jpeg image/gif image/png;
    gzip_vary off;
    gzip_disable "MSIE [1-6]\.";

    upstream n9e.monapi {
        server 127.0.0.1:5800;
        keepalive 10;
    }

    upstream n9e.index {
        server 127.0.0.1:5830;
        keepalive 10;
    }

    upstream n9e.transfer {
        server 127.0.0.1:5810;
        keepalive 10;
    }

    server {
        listen       81 default_server;
        server_name  _;
        root         /usr/share/nginx/html;

        # Load configuration files for the default server block.
        include /etc/nginx/default.d/*.conf;

        location / {
            root /opt/gocode/src/github.com/didi/nightingale/pub;
        }

        location /api/portal {
            proxy_pass http://n9e.monapi;
        }

        location /api/index {
            proxy_pass http://n9e.index;
        }

        location /api/transfer {
            proxy_pass http://n9e.transfer;
        }
    }

}


Test and reload the configuration
nginx -t
nginx -s reload

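Edit the configuration files

The configuration files are in the etc directory. Pay particular attention to mysql.yml and set the MySQL user name and password there. The Redis password is empty by default; if you have set one, update the monapi and judge configuration files to match. etc/address.yml lists the port each module listens on; if any of them conflicts with another local service, change it by hand.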

mysql.yml

[root@10-255-0-183 etc]# cat mysql.yml
---
uic:
  addr: "root:agc&P8m.@tcp(127.0.0.1:3306)/n9e_uic?charset=utf8&parseTime=True&loc=Asia%2FShanghai"
  max: 16
  idle: 4
  debug: false
mon:
  addr: "root:agc&P8m.@tcp(127.0.0.1:3306)/n9e_mon?charset=utf8&parseTime=True&loc=Asia%2FShanghai"
  max: 16
  idle: 4
  debug: false
hbs:
  addr: "root:agc&P8m.@tcp(127.0.0.1:3306)/n9e_hbs?charset=utf8&parseTime=True&loc=Asia%2FShanghai"
  max: 16
  idle: 4
  debug: false
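
Because the password in this setup contains an &, editing the addr lines with a naive sed substitution would corrupt it. The sketch below escapes the replacement first. It assumes GNU sed (for -i) and the addr: "user:pass@tcp(...)" layout shown above; set_mysql_credentials is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite user:password in every addr line of a mysql.yml-style file.
# Usage: set_mysql_credentials <file> <user> <password>
set_mysql_credentials() {
    local file=$1 user=$2 pass=$3 escaped
    # Escape the characters that are special in a sed replacement: & \ and the | delimiter.
    escaped=$(printf '%s' "$user:$pass" | sed -e 's/[&|\]/\\&/g')
    # Replace everything between addr: " and @tcp( with the new credentials.
    sed -i -e "s|addr: \".*@tcp(|addr: \"${escaped}@tcp(|" "$file"
}
```

The helper handles single-line values only, and it leaves every other line of the file untouched.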

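Start the module processes

The release package ships with a control script for starting and stopping services. Run ./control start all to start every module and ./control status to check that each process is up. Nightingale has six core modules, so confirm that the process count is right. Then install nginx; a sample configuration is provided in etc/nginx.conf, and the pub directory in it must point to the real path. That completes a single-machine deployment, and the page is served through nginx. If the nginx log shows permission errors, check the SELinux configuration on the machine and try disabling SELinux.
Edit /etc/selinux/config directly: /etc/sysconfig/selinux is only a symlink to it, and sed -i would replace the symlink with a regular file.
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config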

Start as systemd services
Edit the unit files under etc/service as needed, then copy them to /usr/lib/systemd/system

Then, from the etc/service directory, enable the services at boot
[root@10-255-0-183 service]# systemctl enable *.service
Created symlink from /etc/systemd/system/multi-user.target.wants/n9e-index.service to /usr/lib/systemd/system/n9e-index.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/n9e-judge.service to /usr/lib/systemd/system/n9e-judge.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/n9e-monapi.service to /usr/lib/systemd/system/n9e-monapi.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/n9e-transfer.service to /usr/lib/systemd/system/n9e-transfer.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/n9e-tsdb.service to /usr/lib/systemd/system/n9e-tsdb.service.

List the boot-time status of the key services
[root@10-255-0-183 service]# systemctl list-unit-files | grep -E "n9e|sender|mysql|redis|nginx"
dingtalk-sender.service                       enabled
mail-sender.service                           enabled
mysql.service                                 enabled
mysqld.service                                enabled
mysqld@.service                               disabled
n9e-collector.service                         enabled
n9e-index.service                             enabled
n9e-judge.service                             enabled
n9e-monapi.service                            enabled
n9e-transfer.service                          enabled
n9e-tsdb.service                              enabled
nginx.service                                 enabled
redis-sentinel.service                        disabled
redis.service                                 enabled

Reboot, then check that the services come back up
reboot

Check the service processes
[root@10-255-0-183 ~]# ps -ef | grep -E "n9e|sender|mysql|redis|nginx" | grep -v "grep" --color=auto | sort -n
mysql     3103     1 22 14:28 ?        00:01:10 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
redis     2986     1  0 14:28 ?        00:00:00 /usr/bin/redis-server 127.0.0.1:6379
root      2877     1  0 14:28 ?        00:00:00 nginx: master process /usr/sbin/nginx
root      2878  2877  0 14:28 ?        00:00:00 nginx: worker process
root      2979     1  1 14:28 ?        00:00:04 /opt/gocode/src/github.com/didi/nightingale/n9e-collector
root      2980     1  0 14:28 ?        00:00:00 /opt/gocode/src/github.com/n9e/mail-sender/mail-sender
root      2983     1  0 14:28 ?        00:00:01 /opt/gocode/src/github.com/didi/nightingale/n9e-transfer
root      2985     1  0 14:28 ?        00:00:00 /opt/gocode/src/github.com/n9e/dingtalk-sender/dingtalk-sender
root      2987     1  0 14:28 ?        00:00:00 /opt/gocode/src/github.com/didi/nightingale/n9e-judge
root      2988     1  0 14:28 ?        00:00:00 /opt/gocode/src/github.com/didi/nightingale/n9e-index
root      2989     1  0 14:28 ?        00:00:00 /opt/gocode/src/github.com/didi/nightingale/n9e-tsdb
root      3141     1  0 14:28 ?        00:00:01 /opt/gocode/src/github.com/didi/nightingale/n9e-monapi
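
Checking the listing by eye is easy to get wrong, so the comparison against the six core modules can be scripted. This is a sketch: missing_modules is a hypothetical helper that reads a ps -ef listing on stdin, and the module list is copied from the services shown above.

```shell
#!/usr/bin/env bash
# Core module names, taken from the service list in this document.
N9E_MODULES="n9e-monapi n9e-transfer n9e-tsdb n9e-index n9e-judge n9e-collector"

# Hypothetical helper: read a `ps -ef` listing on stdin and print every
# expected module that does not appear in it. Returns non-zero if any is missing.
missing_modules() {
    local listing name status=0
    listing=$(cat)
    for name in $N9E_MODULES; do
        # -w keeps n9e-index from matching a longer name such as n9e-indexer.
        if ! printf '%s\n' "$listing" | grep -qw -- "$name"; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}
```

After a reboot, ps -ef | missing_modules prints nothing when all six modules are running.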

Unit file for each service

n9e-collector.service

 
[root@10-255-0-183 service]# cat n9e-collector.service
[Unit]
Description=Nightingale collector
After=network-online.target
Wants=network-online.target

[Service]
# modify when deploy in prod env
User=root
Group=root

Type=simple
ExecStart=/opt/gocode/src/github.com/didi/nightingale/n9e-collector
WorkingDirectory=/opt/gocode/src/github.com/didi/nightingale

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
[root@10-255-0-183 service]#

n9e-index.service

 
[root@10-255-0-183 service]# cat n9e-index.service
[Unit]
Description=Nightingale index
After=network-online.target
Wants=network-online.target

[Service]
# modify when deploy in prod env
User=root
Group=root

Type=simple
ExecStart=/opt/gocode/src/github.com/didi/nightingale/n9e-index
WorkingDirectory=/opt/gocode/src/github.com/didi/nightingale

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
[root@10-255-0-183 service]#

n9e-judge.service

 
[root@10-255-0-183 service]# cat n9e-judge.service
[Unit]
Description=Nightingale judge
After=network-online.target
Wants=network-online.target

[Service]
# modify when deploy in prod env
User=root
Group=root

Type=simple
ExecStart=/opt/gocode/src/github.com/didi/nightingale/n9e-judge
WorkingDirectory=/opt/gocode/src/github.com/didi/nightingale

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
[root@10-255-0-183 service]#

n9e-monapi.service

 
[root@10-255-0-183 service]# cat n9e-monapi.service
[Unit]
Description=Nightingale monapi
After=network-online.target
Wants=network-online.target

[Service]
# modify when deploy in prod env
User=root
Group=root

Type=simple
ExecStart=/opt/gocode/src/github.com/didi/nightingale/n9e-monapi
WorkingDirectory=/opt/gocode/src/github.com/didi/nightingale

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
[root@10-255-0-183 service]#

n9e-transfer.service

 
[root@10-255-0-183 service]# cat n9e-transfer.service
[Unit]
Description=Nightingale transfer
After=network-online.target
Wants=network-online.target

[Service]
# modify when deploy in prod env
User=root
Group=root

Type=simple
ExecStart=/opt/gocode/src/github.com/didi/nightingale/n9e-transfer
WorkingDirectory=/opt/gocode/src/github.com/didi/nightingale

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
[root@10-255-0-183 service]#

 

n9e-tsdb.service

 
[root@10-255-0-183 service]# cat n9e-tsdb.service
[Unit]
Description=Nightingale tsdb
After=network-online.target
Wants=network-online.target

[Service]
# modify when deploy in prod env
User=root
Group=root

Type=simple
ExecStart=/opt/gocode/src/github.com/didi/nightingale/n9e-tsdb
WorkingDirectory=/opt/gocode/src/github.com/didi/nightingale

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
[root@10-255-0-183 service]#
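
The six unit files above differ only in the module name, so they can be generated from one template instead of being edited by hand. This is a sketch, not how the release package produces them: render_unit and write_units are hypothetical helpers, and N9E_HOME is assumed to be the build directory used in this document.

```shell
#!/usr/bin/env bash
# Assumed install location, matching the paths in the unit files above.
N9E_HOME=/opt/gocode/src/github.com/didi/nightingale

# Hypothetical helper: print a unit file for one module (e.g. "tsdb").
render_unit() {
    local module=$1
    cat <<EOF
[Unit]
Description=Nightingale ${module}
After=network-online.target
Wants=network-online.target

[Service]
# modify when deploy in prod env
User=root
Group=root

Type=simple
ExecStart=${N9E_HOME}/n9e-${module}
WorkingDirectory=${N9E_HOME}

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
EOF
}

# Hypothetical helper: write all six unit files into a directory (default: current directory).
write_units() {
    local dir=${1:-.} module
    for module in collector index judge monapi transfer tsdb; do
        render_unit "$module" > "$dir/n9e-${module}.service"
    done
}
```

Running write_units /usr/lib/systemd/system followed by systemctl daemon-reload would install them in one step.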

 

Configuration files in detail

address module configuration

 
[root@10-255-0-183 etc]# cat address.yml
---
monapi:
  http: 0.0.0.0:5800
  addresses:
    - 0.0.0.0

transfer:
  http: 0.0.0.0:5810
  rpc: 0.0.0.0:5811
  addresses:
    - 0.0.0.0

tsdb:
  http: 0.0.0.0:5820
  rpc: 0.0.0.0:5821

index:
  http: 0.0.0.0:5830
  rpc: 0.0.0.0:5831

judge:
  http: 0.0.0.0:5840
  rpc: 0.0.0.0:5841

collector:
  http: 0.0.0.0:2058


[root@10-255-0-183 etc]#
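
Port conflicts with other local services can be spotted before the modules start by pulling the listeners out of address.yml. This is a sketch: list_ports and ports_in_use are hypothetical helpers, the parsing only understands the simple layout shown above, and ports_in_use assumes the ss tool from iproute2 is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "module protocol port" for every http/rpc listener
# in an address.yml-style file.
list_ports() {
    awk '
        # A top-level key such as "monapi:" starts a new module.
        /^[a-z]+:[ \t]*$/ { module = $1; sub(/:$/, "", module); next }
        # Indented "http:" or "rpc:" lines carry host:port.
        /^[ \t]+(http|rpc):/ {
            proto = $1; sub(/:$/, "", proto)
            n = split($2, parts, ":")
            print module, proto, parts[n]
        }
    ' "$1"
}

# Hypothetical helper: report listeners whose port is already bound on this host.
ports_in_use() {
    local module proto port
    list_ports "$1" | while read -r module proto port; do
        if ss -ltn | awk '{print $4}' | grep -q ":$port\$"; then
            echo "$module $proto port $port is already in use"
        fi
    done
}
```

Run ports_in_use etc/address.yml while the Nightingale modules are stopped; any line it prints points at a port that needs changing.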

collector module configuration

Collection settings

 
[root@10-255-0-183 etc]# cat collector.yml
logger:
  dir: logs/collector
  level: WARNING
  keepHours: 2

identity:
  specify: ""
  #shell: ifconfig `route|grep '^default'|awk '{print $NF}'`|grep inet|awk '{print $2}'|awk -F ':' '{print $NF}'|head -n 1
  shell: ips=`/sbin/ifconfig -a | grep -v "docker" | grep -v "veth" | grep -v "lo:" | grep -v "enp*" | grep -v "^br" | grep -v "127.0.0.1" | grep -v "172.16.0.1" | grep -v "192.168.0.1" | grep -v  "172.18.0.1" | grep -v "172.19.0.1" | grep -v "172.17.0.1" | grep -v inet6 | grep "inet" | awk '{print $2}' | tr -d "addr:"`; host_name=`hostname --fqdn`; echo "${host_name}-${ips}"

stra:
  enable: true
  portPath: ./etc/port
  procPath: ./etc/proc
  logPath: ./etc/log

sys:
  # timeout in ms
  # interval in second
  timeout: 1000
  interval: 20
  plugin: ./plugin

  # monitor nic which filtered by prefix
  ifacePrefix:
    - eth
    - em
    - ens

  # ignore disk mount point
  mountIgnore:
    prefix:
      - /var/lib
      - /run
    # collect anyway
    exclude: []

  ignoreMetrics:
    - cpu.core.idle
    - cpu.core.util
    - cpu.core.sys
    - cpu.core.user
    - cpu.core.nice
    - cpu.core.guest
    - cpu.core.irq
    - cpu.core.softirq
    - cpu.core.iowait
    - cpu.core.steal
[root@10-255-0-183 etc]#
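
The identity shell command above chains many grep -v filters over ifconfig output. For comparison, here is a shorter alternative built on ip -o -4 addr show. It is a sketch rather than the configuration used in this deployment: pick_primary_ipv4 is a hypothetical helper, it skips only lo, docker*, veth* and br* interfaces, and unlike the original it returns a single address.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ip -o -4 addr show` output on stdin and print the
# first IPv4 address that is not on a loopback, docker, veth or bridge interface.
pick_primary_ipv4() {
    awk '
        $2 == "lo" { next }
        $2 ~ /^(docker|veth|br)/ { next }
        $3 == "inet" { split($4, a, "/"); print a[1]; exit }
    '
}
```

An identity string in the same host-ip shape could then be produced with echo "$(hostname --fqdn)-$(ip -o -4 addr show | pick_primary_ipv4)".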

index module configuration

Index settings

 
[root@10-255-0-183 etc]# cat index.yml
logger:
  dir: logs/index
  level: WARNING
  keepHours: 2
identity:
  specify: ""
  #shell: ifconfig `route|grep '^default'|awk '{print $NF}'`|grep inet|awk '{print $2}'|awk -F ':' '{print $NF}'|head -n 1
  shell: ips=`/sbin/ifconfig -a | grep -v "docker" | grep -v "veth" | grep -v "lo:" | grep -v "enp*" | grep -v "^br" | grep -v "127.0.0.1" | grep -v "172.16.0.1" | grep -v "192.168.0.1" | grep -v  "172.18.0.1" | grep -v "172.19.0.1" | grep -v "172.17.0.1" | grep -v inet6 | grep "inet" | awk '{print $2}' | tr -d "addr:"`; host_name=`hostname --fqdn`; echo "${host_name}-${ips}"
[root@10-255-0-183 etc]#

judge (alert evaluation) module configuration

 
[root@10-255-0-183 etc]# cat judge.yml
query:
  connTimeout: 1000
  callTimeout: 2000
  indexCallTimeout: 2000

redis:
  addrs:
    - 127.0.0.1:6379
  db: 0
  pass: ""
  # timeout:
  #   conn: 500
  #   read: 3000
  #   write: 3000

identity:
  specify: ""
  #shell: ifconfig `route|grep '^default'|awk '{print $NF}'`|grep inet|awk '{print $2}'|awk -F ':' '{print $NF}'|head -n 1
  shell: ips=`/sbin/ifconfig -a | grep -v "docker" | grep -v "veth" | grep -v "lo:" | grep -v "enp*" | grep -v "^br" | grep -v "127.0.0.1" | grep -v "172.16.0.1" | grep -v "192.168.0.1" | grep -v "172.18.0.1" | grep -v "172.19.0.1" | grep -v "172.17.0.1" | grep -v inet6 | grep "inet" | awk '{print $2}' | tr -d "addr:"`; host_name=`hostname --fqdn`; echo "${host_name}-${ips}"

logger:
  dir: logs/judge
  level: WARNING
  keepHours: 2
[root@10-255-0-183 etc]#

monapi module configuration

Configures the API service and enables multiple alert channels

 
[root@10-255-0-183 etc]# cat monapi.yml
---
salt: "PLACE_SALT"

logger:
  dir: "logs/monapi"
  level: "WARNING"
  keepHours: 24

http:
  secret: "PLACE_SECRET"

# for ldap authorization
ldap:
  host: "ldap.example.org"
  port: 389
  baseDn: "dc=example,dc=org"
  # AD: manange@example.org
  bindUser: "cn=manager,dc=example,dc=org"
  bindPass: "*******"
  # openldap: (&(uid=%s))
  # AD: (&(sAMAccountName=%s))
  authFilter: "(&(uid=%s))"
  attributes:
    dispname: "cn"
    email: "mail"
    phone: "mobile"
    im: ""
  coverAttributes: false
  autoRegist: false
  tls: false
  startTLS: false

# notify support: voice, sms, mail, im
# if we have all of notice channel
notify:
   p1: ["voice", "sms", "mail", "im"]
   p2: ["sms", "mail", "im"]
   p3: ["mail", "im"]

# if we only have mail channel
#notify:
#  p1: ["mail"]
#  p2: ["mail"]
#  p3: ["mail"]

# addresses accessible using browsers
link:
  stra: http://n9e.cqops.club:81/#/monitor/strategy/%v
  event: http://n9e.cqops.club:81/#/monitor/history/his/%v
  claim: http://n9e.cqops.club:81/#/monitor/history/cur/%v

# for alarm event and message queue
redis:
  addr: "127.0.0.1:6379"
  db: 0
  pass: ""
  # in ms
  # timeout:
  #   conn: 500
  #   read: 3000
  #   write: 3000

tokens:
  - abc

MySQL connection configuration

Database connection addresses

 
[root@10-255-0-183 etc]# cat mysql.yml
---
uic:
  addr: "root:agc&P8m.@tcp(127.0.0.1:3306)/n9e_uic?charset=utf8&parseTime=True&loc=Asia%2FShanghai"
  max: 16
  idle: 4
  debug: false
mon:
  addr: "root:agc&P8m.@tcp(127.0.0.1:3306)/n9e_mon?charset=utf8&parseTime=True&loc=Asia%2FShanghai"
  max: 16
  idle: 4
  debug: false
hbs:
  addr: "root:agc&P8m.@tcp(127.0.0.1:3306)/n9e_hbs?charset=utf8&parseTime=True&loc=Asia%2FShanghai"
  max: 16
  idle: 4
  debug: false
[root@10-255-0-183 etc]#

transfer module configuration

Data forwarding settings

 
[root@10-255-0-183 etc]# cat transfer.yml
backend:
  maxConns: 20000
  # in ms
  # connTimeout: 1000
  # callTimeout: 3000
  cluster:
    tsdb01: 127.0.0.1:5821
  influxdb:
    enabled: false
    username: "influx"
    password: "admin123"
    precision: "s"
    database: "n9e"
    address: "http://127.0.0.1:8086"

  opentsdb:
    enabled: false
    address: "127.0.0.1:4242"
  kafka:
    enabled: false
    brokersPeers: "192.168.1.1:9092,192.168.1.2:9092"
    topic: "n9e"

logger:
  dir: logs/transfer
  level: WARNING
  keepHours: 2
[root@10-255-0-183 etc]#

tsdb module configuration

Data storage settings

 
[root@10-255-0-183 etc]# cat tsdb.yml
rrd:
  storage: data/5821
cache:
  keepMinutes: 120
logger:
  dir: logs/tsdb
  level: WARNING
  keepHours: 2

[root@10-255-0-183 etc]#

Set up a file download service

Lets other monitored instances download the packaged files quickly

Generate a password

 
printf "admin:$(openssl passwd -crypt admin.)\n" >> /etc/nginx/conf.d/htpasswd
cat /etc/nginx/conf.d/htpasswd
admin:xxxx

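Add a configuration file

The file name must end in .conf so that the include /etc/nginx/conf.d/*.conf line in nginx.conf picks it up
vi conf.d/download.cqops.club.conf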
[root@10-255-0-183 conf.d]# cat download.cqops.club.conf
server {
    listen       82;
    root         /opt/gocode/src/github.com/didi/nightingale/;
    server_name download.cqops.club;

    location / {
        index  index.html index.htm;

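    # Require basic authentication
    auth_basic "nginx basic for n9e.cqops.club";
    auth_basic_user_file conf.d/htpasswd;

    # Enable directory listing
    autoindex on;

    # Default is on, which shows the exact file size in bytes
    # off shows an approximate size in kB, MB or GB
    autoindex_exact_size off;

    # Default is off, which shows file times in GMT
    # on shows file times in the server's local time
    autoindex_localtime on;

    # Tell the browser not to cache files
    add_header Cache-Control no-store;

    # Rate limiting, disabled for now; enable if needed
    #limit_conn one 8;
    #limit_rate 2048k;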

    }

    error_page 404 /404.html;
        location = /40x.html {
    }

    error_page 500 502 503 504 /50x.html;
        location = /50x.html {
    }

access_log /var/log/nginx/access.log  main;

}

[root@10-255-0-183 conf.d]# pwd
/etc/nginx/conf.d
[root@10-255-0-183 conf.d]#

Let browsers display log files inline

 
[root@10-255-0-183 nginx]# cat mime.types

types {
    # add this line
    text/log log;

[root@10-255-0-183 nginx]# pwd
/etc/nginx

Reload the service

 
nginx -t && nginx -s reload

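Usage

In a browser

Enter the address in the browser; a user name and password prompt appears, and access is granted once you fill it in
http://download.cqops.club:82

With wget

wget --http-user=admin --http-password=admin. http://download.cqops.club:82/n9e-2020-06-30-16-15-13.tar.gz

With curl

curl -u admin:admin. -O http://download.cqops.club:82/n9e-2020-06-30-16-15-13.tar.gz

View logs online

http://download.cqops.club:82/logs/

Note: only view logs online when the log volume is small; otherwise the browser and the server may not cope with the load (ignore this if you have resources to spare)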
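
One way to judge whether the log volume is small enough is to list oversized files before opening the logs page. This is a sketch: large_logs is a hypothetical helper, and the 100M default is an arbitrary threshold chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under a directory that are larger than a threshold.
# The threshold uses find's size syntax (for example 1k, 100M, 1G).
large_logs() {
    local dir=$1 threshold=${2:-100M}
    find "$dir" -type f -size "+$threshold"
}
```

For this deployment the call would be large_logs /opt/gocode/src/github.com/didi/nightingale/logs; an empty result means no file exceeds the threshold.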


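Additional notes

If no machines appear on the "all objects" page, or machines appear without monitoring data, check the files under every logs directory; there will be a clue. For example, are commands such as ifconfig and route missing because the net-tools package is not installed? For problems with the machine environment itself, search online.

Nightingale's core modules do not send alerts themselves; they only push alert messages into Redis. Alert channels differ between companies: some want email, others SMS, phone calls, WeChat, DingTalk, or an in-house IM. Each company is expected to write its own adapter and contribute it to the community. Email has the SMTP standard, so it is a good first channel to test with, and a mail-sending module, mail-sender, is provided as a sample.

Email alert channel configuration

Build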

 
cd ~ && cd $GOPATH/src && mkdir -p github.com/n9e && cd github.com/n9e
git clone https://github.com/n9e/mail-sender.git
cd mail-sender && ./control build

Configure startup

 
cp /opt/gocode/src/github.com/n9e/mail-sender/etc/mail-sender.service /usr/lib/systemd/system

Unit file contents

 
[root@10-255-0-183 etc]# cat mail-sender.service
[Unit]
Description=Nightingale mail sender
After=network-online.target
Wants=network-online.target

[Service]
User=root
Group=root

Type=simple
ExecStart=/opt/gocode/src/github.com/n9e/mail-sender/mail-sender
WorkingDirectory=/opt/gocode/src/github.com/n9e/mail-sender

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
[root@10-255-0-183 etc]# pwd
/opt/gocode/src/github.com/n9e/mail-sender/etc
[root@10-255-0-183 etc]#

Reload systemd and start the service

 
systemctl daemon-reload && systemctl enable mail-sender.service && systemctl restart mail-sender.service && systemctl status mail-sender.service

Change the alert detail and alert strategy URLs used in emails

 
[root@10-255-0-183 nightingale]# grep -ri "n9e.example.com" ./*
./etc/monapi.yml:  stra: http://n9e.example.com/#/monitor/strategy/%v
./etc/monapi.yml:  event: http://n9e.example.com/#/monitor/history/his/%v
./etc/monapi.yml:  claim: http://n9e.example.com/#/monitor/history/cur/%v
[root@10-255-0-183 nightingale]# pwd
/opt/gocode/src/github.com/didi/nightingale
[root@10-255-0-183 nightingale]#

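Test that the mail module works

Run ./mail-sender -t you@example.com (replace the address with one you use, such as your own QQ mailbox). The program reads the configuration files under etc automatically and sends a test email to you@example.com

Test the email channel with an alert strategy

In the "alert strategy" menu, add a strategy whose threshold is set low enough that normal readings satisfy it, to test whether email sending works
For example cpu.idle >= 1.00: every data point meets this condition, so an email alert is triggered
Once the test passes, restore a realistic threshold or delete the test strategy

Mail sending log

Sending logs are written to the logs/n9e-monapi directory; if email sending fails, look there for the detailed log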

DingTalk alert channel configuration

Build

 
cd ~ && cd $GOPATH/src && mkdir -p github.com/n9e
cd github.com/n9e && git clone https://github.com/n9e/dingtalk-sender.git
cd dingtalk-sender && ./control build

Get the DingTalk robot token

 
https://oapi.dingtalk.com/robot/send?access_token=5bf0d1b423afbe82af72e99e71f892b4e7869f1c20d3bb82bd8397411f7de7ff

Edit dingtalk-sender.yml

 
[root@10-255-0-183 dingtalk-sender]# cat etc/dingtalk-sender.yml
---
logger:
  dir: "logs/dingtalk-sender"
  level: "DEBUG"
  keepHours: 24

redis:
  addr: "127.0.0.1:6379"
  pass: ""
  db: 0
  idle: 5
  timeout:
    conn: 500
    read: 3000
    write: 3000

# Leave this as is; worker is the number of concurrent calls to dingtalk
consumer:
  queue: "/n9e/sender/im"
  worker: 10

# dingtalk only supports group alerts; it is recommended to leave this unset and configure it in the web UI
dingtalk:
  token: "5bf0d1b423afbe82af72e99e71f892b4e7869f1c20d3bb82bd8397411f7de7ff"
  mobiles:
#    - "18500001111"


[root@10-255-0-183 dingtalk-sender]# pwd
/opt/gocode/src/github.com/n9e/dingtalk-sender

Test that the DingTalk robot can send

 
[root@10-255-0-183 dingtalk-sender]# ./dingtalk-sender -t 5bf0d1b423afbe82af72e99e71f892b4e7869f1c20d3bb82bd8397411f7de7ff
2020/06/19 14:39:09 maxprocs: Leaving GOMAXPROCS=1: CPU quota undefined
runner.cwd: /opt/gocode/src/github.com/n9e/dingtalk-sender
runner.hostname: 10-255-0-183
parse configuration file: /opt/gocode/src/github.com/n9e/dingtalk-sender/etc/dingtalk-sender.yml
2020-06-19 14:39:09.517400 ERROR config/funcs.go:46 test send to %s success!!!
5bf0d1b423afbe82af72e99e71f892b4e7869f1c20d3bb82bd8397411f7de7ff
[root@10-255-0-183 dingtalk-sender]#
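
If the test fails, the robot webhook can also be exercised directly, which separates token problems from dingtalk-sender problems. The sketch below only builds the JSON body for a text message; dingtalk_text_payload is a hypothetical helper, it handles single-line content only, and robots that use a keyword security setting would additionally need that keyword in the content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body of a DingTalk robot text message.
# Backslashes and double quotes in the content are escaped; newlines are not handled.
dingtalk_text_payload() {
    local escaped
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$escaped"
}

# Example call, left commented out because it needs network access and a real token:
# curl -s -H 'Content-Type: application/json' \
#      -d "$(dingtalk_text_payload 'n9e test message')" \
#      "https://oapi.dingtalk.com/robot/send?access_token=$TOKEN"
```

Here TOKEN is a placeholder shell variable holding the access_token value from the robot address.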

Configure the service

 
cp /opt/gocode/src/github.com/n9e/dingtalk-sender/etc/dingtalk-sender.service /usr/lib/systemd/system

Unit file contents

 
[root@10-255-0-183 etc]# cat dingtalk-sender.service
[Unit]
Description=Nightingale dingtalk sender
After=network-online.target
Wants=network-online.target

[Service]
User=root
Group=root

Type=simple
ExecStart=/opt/gocode/src/github.com/n9e/dingtalk-sender/dingtalk-sender
WorkingDirectory=/opt/gocode/src/github.com/n9e/dingtalk-sender

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
[root@10-255-0-183 etc]# pwd
/opt/gocode/src/github.com/n9e/dingtalk-sender/etc
[root@10-255-0-183 etc]#

Reload systemd and start the service

 
systemctl daemon-reload && systemctl enable dingtalk-sender.service && systemctl restart dingtalk-sender.service && systemctl status dingtalk-sender.service

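WeCom (企业微信) robot alert channel configuration

Prerequisites

Ask your WeCom administrator to add a robot. The key piece of information is:
The robot's webhook address, https://qiyeweixin.qq.com/token=xxxx

Build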

 
cd $GOPATH/src
mkdir -p github.com/n9e
cd github.com/n9e
git clone https://github.com/n9e/wechatrobot-sender.git
cd wechatrobot-sender
./control build

Configuration

Test and verify

/opt/gocode/src/github.com/n9e/wechatrobot-sender/wechatrobot-sender -t "WeCom robot token address"

Configure the service

 
[root@n9e-01 etc]# cat wechatrobot-sender.service
[Unit]
Description=Nightingale wechatrobot sender
After=network-online.target
Wants=network-online.target

[Service]
User=root
Group=root

Type=simple
ExecStart=/opt/gocode/src/github.com/n9e/wechatrobot-sender/wechatrobot-sender
WorkingDirectory=/opt/gocode/src/github.com/n9e/wechatrobot-sender

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
[root@n9e-01 etc]# pwd
/opt/gocode/src/github.com/n9e/wechatrobot-sender/etc
[root@n9e-01 etc]#

 


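WeCom application alert channel configuration

Prerequisites

Register a WeCom application yourself (search online for the procedure). The key pieces of information are:
corp_id: "xxx"
agent_id: xxx
secret: "xxx"

Build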

 
cd $GOPATH/src
mkdir -p github.com/n9e
cd github.com/n9e
git clone https://github.com/n9e/wechat-sender.git
cd wechat-sender
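# This project does not need go modules; turn them off temporarily, and remember to re-enable them in the environment if other projects need them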
export GO111MODULE=off
go build

Configuration

Put the WeCom application settings in /opt/gocode/src/github.com/n9e/wechat-sender/etc/wechat-sender.yml, as follows:
[root@n9e-01 etc]# cat wechat-sender.yml
---
logger:
  dir: "logs/wechat-sender"
  level: "DEBUG"
  keepHours: 24

redis:
  addr: "127.0.0.1:6379"
  pass: ""
  idle: 5
  timeout:
    conn: 500
    read: 3000
    write: 3000

# Leave this as is; worker is the number of concurrent calls to wechat
consumer:
  queue: "/n9e/sender/im"
  worker: 10

wechat:
  corp_id: "xxx"
  agent_id: xxx
  secret: "xxx"
[root@n9e-01 etc]# pwd
/opt/gocode/src/github.com/n9e/wechat-sender/etc
[root@n9e-01 etc]#

Test and verify

/opt/gocode/src/github.com/n9e/wechat-sender/wechat-sender -t "WeCom user name, that is, the user ID"

If the result shows success and the test message arrives, the channel works

Configure the service

 
[root@n9e-01 etc]# cat wechat-sender.service
[Unit]
Description=Nightingale wechat sender
After=network-online.target
Wants=network-online.target

[Service]
User=root
Group=root

Type=simple
ExecStart=/opt/gocode/src/github.com/n9e/wechat-sender/wechat-sender
WorkingDirectory=/opt/gocode/src/github.com/n9e/wechat-sender

Restart=always
RestartSec=1
StartLimitInterval=0

[Install]
WantedBy=multi-user.target
[root@n9e-01 etc]# pwd
/opt/gocode/src/github.com/n9e/wechat-sender/etc
[root@n9e-01 etc]#



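PrometheusAlert alert channel configuration

Deploying a separate sender for each alert channel is tedious to maintain. PrometheusAlert, a project found on GitHub, consolidates the channels and is strongly recommended

Deploy with a container

# Clone the project source
git clone https://github.com/feiyu563/PrometheusAlert.git

# Pull the container image
docker pull feiyu563/prometheus-alert:latest

# Create the configuration file
mkdir /etc/prometheusalert-center/
cp PrometheusAlert/conf/app.conf /etc/prometheusalert-center/

# Edit the configuration file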
[root@aliyun PrometheusAlert]# cat /etc/prometheusalert-center/app.conf
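#---------------------↓ Global settings -----------------------
appname = PrometheusAlert
# Listening port
httpport = 8080
runmode = dev
# Proxy setting, e.g. proxy = http://123.123.123.123:8080
proxy =
# Enable JSON request bodies
copyrequestbody = true
# Alert message title
title=PrometheusAlert
# Link to the alert platform
GraylogAlerturl=http://graylog.org
# DingTalk alerts: logo URL for alert messages
logourl=https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png
# DingTalk alerts: logo URL for recovery messages
rlogourl=https://raw.githubusercontent.com/feiyu563/PrometheusAlert/master/doc/alert-center.png
# SMS alert level (an SMS is sent when the level equals 3). Levels: 0 info, 1 warning, 2 moderately severe, 3 severe, 4 disaster
messagelevel=3
# Phone alert level (a voice call is made when the level equals 4). Levels: 0 info, 1 warning, 2 moderately severe, 3 severe, 4 disaster
phonecalllevel=4
# Default number to call (required for testing SMS and phone calls from the page)
defaultphone=xxxxxxxx
# Phone notification on recovery: 0 off, 1 on
phonecallresolved=0
# Automatic alert suppression: for alerts from the same source, only the first alert at the highest level is sent and the rest are muted, which reduces duplicate messages and prevents alert storms. 0 off, 1 on
silent=0
# Log output target: file or console
logtype=file
# Log file path
logpath=logs/prometheusalertcenter.log
# Convert Prometheus alert times to the CST time zone (do not enable if they are already in CST)
prometheus_cst_time=0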

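#---------------------↓ webhook -----------------------
# Enable the DingTalk alert channel (several channels can be enabled at once): 0 off, 1 on
open-dingding=1
# Default DingTalk robot address
ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxx
# @ everyone: 0 off, 1 on
dd_isatall=1

# Enable the WeCom alert channel (several channels can be enabled at once): 0 off, 1 on
open-weixin=1
# Default WeCom robot address
wxurl=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx

# Enable the Feishu alert channel (several channels can be enabled at once): 0 off, 1 on
open-feishu=0
# Default Feishu robot address
fsurl=https://open.feishu.cn/open-apis/bot/hook/xxxxxxxxx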


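#---------------------↓ Tencent Cloud API -----------------------
# Enable the Tencent Cloud SMS alert channel (several channels can be enabled at once): 0 off, 1 on
open-txdx=0
# Tencent Cloud SMS API key
TXY_DX_appkey=xxxxx
# Tencent Cloud SMS template ID; for the template content, "prometheus告警:{1}" can serve as a reference
TXY_DX_tpl_id=xxxxx
# Tencent Cloud SMS sdk app id
TXY_DX_sdkappid=xxxxx
# Tencent Cloud SMS signature; use the signature that passed your own review
TXY_DX_sign=腾讯云

# Enable the Tencent Cloud phone alert channel (several channels can be enabled at once): 0 off, 1 on
TXY_DH_open-txdh=0
# Tencent Cloud phone API key
TXY_DH_phonecallappkey=xxxxx
# Tencent Cloud phone template ID
TXY_DH_phonecalltpl_id=xxxxx
# Tencent Cloud phone sdk app id
TXY_DH_phonecallsdkappid=xxxxx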

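#---------------------↓ Huawei Cloud API -----------------------
# Enable the Huawei Cloud SMS alert channel (several channels can be enabled at once): 0 off, 1 on
open-hwdx=0
# Huawei Cloud SMS API key
HWY_DX_APP_Key=xxxxxxxxxxxxxxxxxxxxxx
# Huawei Cloud SMS API secret
HWY_DX_APP_Secret=xxxxxxxxxxxxxxxxxxxxxx
# Huawei Cloud APP access address (address and port of the API)
HWY_DX_APP_Url=https://rtcsms.cn-north-1.myhuaweicloud.com:10743
# Huawei Cloud SMS template ID
HWY_DX_Templateid=xxxxxxxxxxxxxxxxxxxxxx
# Huawei Cloud signature name; it must be an approved signature whose type matches the template, so fill in your own
HWY_DX_Signature=华为云
# Huawei Cloud signature channel number
HWY_DX_Sender=xxxxxxxxxx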

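#---------------------↓ Alibaba Cloud API -----------------------
# Enable the Alibaba Cloud SMS alert channel (several channels can be enabled at once): 0 off, 1 on
open-alydx=0
# AccessKey ID of the Alibaba Cloud main account (SMS)
ALY_DX_AccessKeyId=xxxxxxxxxxxxxxxxxxxxxx
# Alibaba Cloud SMS API secret
ALY_DX_AccessSecret=xxxxxxxxxxxxxxxxxxxxxx
# Alibaba Cloud SMS signature name
ALY_DX_SignName=阿里云
# Alibaba Cloud SMS template ID
ALY_DX_Template=xxxxxxxxxxxxxxxxxxxxxx

# Enable the Alibaba Cloud phone alert channel (several channels can be enabled at once): 0 off, 1 on
open-alydh=0
# AccessKey ID of the Alibaba Cloud main account (phone)
ALY_DH_AccessKeyId=xxxxxxxxxxxxxxxxxxxxxx
# Alibaba Cloud phone API secret
ALY_DH_AccessSecret=xxxxxxxxxxxxxxxxxxxxxx
# Caller number displayed for Alibaba Cloud phone calls; it must be a number you have purchased
ALY_DX_CalledShowNumber=xxxxxxxxx
# Alibaba Cloud phone text-to-speech (TTS) template ID
ALY_DH_TtsCode=xxxxxxxx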

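#---------------------↓ Ronglian Cloud (容联云) API -----------------------
# Enable the Ronglian Cloud phone alert channel (several channels can be enabled at once): 0 off, 1 on
RLY_DH_open-rlydh=0
# Ronglian Cloud base API address
RLY_URL=https://app.cloopen.com:8883/2013-12-26/Accounts/
# Ronglian Cloud console SID
RLY_ACCOUNT_SID=xxxxxxxxxxx
# Ronglian Cloud api-token
RLY_ACCOUNT_TOKEN=xxxxxxxxxx
# Ronglian Cloud app_id
RLY_APP_ID=xxxxxxxxxxxxx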


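# Start PrometheusAlert with the configuration mounted
docker run --detach --restart always -p 8080:8080 -v /etc/localtime:/etc/localtime -v /etc/prometheusalert-center:/app/conf -v /data/PrometheusAlert-db:/app/db --name prometheusalert-center feiyu563/prometheus-alert:latest

# To rebuild the container, stop and remove it, then repeat the docker run command above
docker stop prometheusalert-center && docker rm prometheusalert-center

# Once started, open this address in a browser: http://127.0.0.1:8080

# Update the image. A plain docker restart keeps running the old image,
# so after pulling, remove the container and repeat the docker run command above
docker pull feiyu563/prometheus-alert:latest
docker stop prometheusalert-center && docker rm prometheusalert-center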


Add a template

In the PrometheusAlert admin UI, click "AlertTemplate" in the top navigation bar,
then click "添加模板" (add template) and fill in the following:
Template name: n9e-qywx-webhook
Template type: 企业微信 (WeCom)
Template purpose: Other
Template content:
> <font color="info">Alert status</font>: {{ if .status }}recovered{{ else }}alerting{{ end }}
> <font color="info">Alert level</font>: P{{.priority}}
> <font color="info">Strategy name</font>: {{.sname}}
> <font color="info">endpoint</font>: {{.endpoint}}
{{range $v:=.detail}}
> <font color="info">metric</font>: {{$v.metric}}
> <font color="info">tags</font>: {{$v.tags}}
{{end}}
> <font color="info">Current value</font>: {{.value}}
> <font color="info">Alert description</font>: {{.info}}
> <font color="info">Triggered at</font>: {{ .created | printf "%.19s" }}
> <font color="info">Alert details</font>: http://n9e.cqops.club:81/#/monitor/history/his/{{.id}}


Message protocol JSON content:
If you can find a relevant log entry, paste it here to test the template.

Finally, save the template

Call address:
http://127.0.0.1:8080/prometheusalert?type=wx&tpl=n9e-qywx-webhook&wxurl=<WeCom robot address>
http://127.0.0.1:8080/prometheusalert?type=wx&tpl=n9e-qywx-webhook&wxurl=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx


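Configure alert forwarding in Nightingale

In the Nightingale console, click "报警策略" (alert strategy) and add a strategy:
In the field "通知我自己开发的系统(报警回调, 请确认是 IDC 内可访问的地址)" (notify my own system: alert callback, make sure the address is reachable inside the IDC), enter the PrometheusAlert call address in this form:
127.0.0.1:8080/prometheusalert?type=wx&tpl=n9e-qywx-webhook&wxurl=https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx

Adjust the IP address to your setup: if the service is exposed to the public network, use the public IP; if PrometheusAlert was started bound to a specific IP, use that IP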


Test and verify

Set a threshold that triggers easily:
for example cpu.idle>=1

Watch the prometheusalert-center log:
docker logs -f prometheusalert-center

The configured channel receives the alert message:
xxx



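Retention period for historical alerts

Documentation in progress

Refreshing the index

Documentation in progress

Suppressing alerts briefly during service start and stop

Add a judge wait and Redis cleanup to the control file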

 
#!/bin/bash

# release version
version=2.7.2

processname=('n9e-monapi' 'n9e-transfer' 'n9e-tsdb'  'n9e-index' 'n9e-collector' 'n9e-judge')

CWD=$(cd $(dirname $0)/; pwd)
cd $CWD


function usage() {
	echo $"Usage: $0 [ {start|stop|restart|status|debug|start_all|stop_all|restart_all|status_all} ]"
	exit 0
}

# Wait before starting or stopping judge
function judge_wait() {
	# Keep this equal to the collector's reporting interval
    declare -i interval=20
    declare -i counter=0
    declare -i max_counter=${interval}+5
    printf "Note: to avoid nodata alerts when services restart after a code update, waiting for the collector reporting interval (%d) + 5 seconds before starting the judge service!" "${interval}"
    until (( counter >= max_counter ));
    do
        printf "."
        counter+=1
        sleep 1
    done
    echo
}

REDIS_CLI=/bin/redis-cli
HOST='127.0.0.1'
PORT=6379
keyPrefix='/n9e/sender/*'
db_number='0'
Client="${REDIS_CLI} -h ${HOST} -p ${PORT} -n ${db_number}" 

# Remove existing Redis keys before starting or stopping judge
function clean_redis_judge_key() {
	# /bin/redis-cli -h 127.0.0.1 -p 6379 -n 0 keys /n9e/sender/*
	# /bin/redis-cli -h 127.0.0.1 -p 6379 -n 0 keys /n9e/sender/* | xargs /bin/redis-cli -h 127.0.0.1 -p 6379 -n 0 del
	# -r (GNU xargs) skips the del call when no keys match
	${Client} keys "${keyPrefix}" | xargs -r ${Client} del
	${Client} keys "${keyPrefix}"
}

function get_processname_info() {
	server_name="$1"
	# ps -ef | grep n9e-tsdb | grep -v "grep" | awk '{print $2}'
	pid=`ps -ef | grep ${server_name} | grep -v "grep" | awk '{print $2}'`
	# ps -p 28817 -o lstart
	start_time=`ps -p ${pid} -o lstart`
	ProcNumber=`ps -ef | grep -w ${server_name} | grep -v "grep" | wc -l`
	# ps -ef | grep n9e-tsdb | grep -v "grep"
	procStatus=`ps -ef | grep ${server_name} | grep -v "grep" | awk '{print $8}'`
	echo "Info: ${server_name} | ${procStatus} | ${pid} | ${start_time}"
}

# Start all services
function start_all() {
	
	# monapi http: 5800 
	systemctl start n9e-monapi
	echo "n9e-monapi start success."

	# transfer http: 5810; rpc: 5811
	systemctl start n9e-transfer
	echo "n9e-transfer start success."

	# tsdb http: 5820; rpc: 5821
	systemctl start n9e-tsdb
	echo "n9e-tsdb start success."

	# index http: 5830; rpc: 5831
	systemctl start n9e-index
	echo "n9e-index start success."

	# collector http: 2058
	systemctl start n9e-collector
	echo "n9e-collector start success."

	judge_wait
	clean_redis_judge_key

	# judge http: 5840; rpc: 5841
	systemctl start n9e-judge
	echo "n9e-judge start success."

	# mail-sender
	#systemctl start mail-sender
	#echo "mail-sender start success."

	# sms-sender
	#systemctl start sms-sender
	#echo "sms-sender start success."

	# dingtalk-sender
	#systemctl start dingtalk-sender
	#echo "dingtalk-sender start success."

}

# Restart all services
function restart_all() {
	# monapi http: 5800 
	systemctl restart n9e-monapi
	echo "n9e-monapi restart success."

	# transfer http: 5810; rpc: 5811
	systemctl restart n9e-transfer
	echo "n9e-transfer restart success."

	# tsdb http: 5820; rpc: 5821
	systemctl restart n9e-tsdb
	echo "n9e-tsdb restart success."

	# index http: 5830; rpc: 5831
	systemctl restart n9e-index
	echo "n9e-index restart success."

	# collector http: 2058
	systemctl restart n9e-collector
	echo "n9e-collector restart success."

	judge_wait
	clean_redis_judge_key

	# judge http: 5840; rpc: 5841
	systemctl restart n9e-judge
	echo "n9e-judge restart success."

	# mail-sender
	#systemctl restart mail-sender
	#echo "mail-sender restart success."

	# sms-sender
	#systemctl restart sms-sender
	#echo "sms-sender restart success."

	# dingtalk-sender
	#systemctl restart dingtalk-sender
	#echo "dingtalk-sender restart success."

}

function start() {
	server_name="$1"
    systemctl start "${server_name}"
	echo "${server_name} start success."
}

function stop_all() {
	# monapi http: 5800 
	systemctl stop n9e-monapi
	echo "n9e-monapi stop success."

	# transfer http: 5810; rpc: 5811
	systemctl stop n9e-transfer
	echo "n9e-transfer stop success."

	# tsdb http: 5820; rpc: 5821
	systemctl stop n9e-tsdb
	echo "n9e-tsdb stop success."

	# index http: 5830; rpc: 5831
	systemctl stop n9e-index
	echo "n9e-index stop success."

	# collector http: 2058
	systemctl stop n9e-collector
	echo "n9e-collector stop success."

	judge_wait
	clean_redis_judge_key

	# judge http: 5840; rpc: 5841
	systemctl stop n9e-judge
	echo "n9e-judge stop success."

	# mail-sender
	#systemctl stop mail-sender
	#echo "mail-sender stop success."

	# sms-sender
	#systemctl stop sms-sender
	#echo "sms-sender stop success."

	# dingtalk-sender
	#systemctl stop dingtalk-sender
	#echo "dingtalk-sender stop success."

}

function stop() {
    server_name="$1"
    systemctl stop "${server_name}"
	echo "${server_name} stop success."
}

function restart() {
    server_name="$1"
    systemctl restart "${server_name}"
	if [ "${server_name}" = "n9e-judge" ]; then
		clean_redis_judge_key
	fi
	echo "${server_name} restart success."
}

function status() {
    server_name="$1"
    systemctl status "${server_name}"
}

function status_all() {
	systemctl status | grep -v "grep" | grep "n9e"
	if [ $? != 0 ]; then
		echo "n9e service is not running"
	fi
}

function reload() {
    server_name="$1"
    systemctl reload "${server_name}"
	echo "${server_name} reload success."
}

function debug() {
    server_name="$1"
    journalctl -u "${server_name}"
}

case "$1" in
	start)
		start "$2"
		;;
	stop)
		stop "$2"
		;;
	restart)
		restart "$2"
		;;
	status)
		status "$2"
		;;
	debug)
		debug "$2"
		;;
	start_all)
		start_all
		;;
	stop_all)
		stop_all
		;;
	restart_all)
		restart_all
		;;
	status_all)
		status_all
		;;
	*)
		usage
esac
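
The cleanup step in the script relies on the Redis KEYS command, which walks the whole keyspace in one blocking call. On a larger instance, a variant based on redis-cli's --scan mode may be gentler. The sketch below is one possible alternative, not part of the original control script: redis_cmd and clean_keys_by_scan are hypothetical names, and the connection settings simply repeat the ones used above.

```shell
#!/bin/bash

# Thin wrapper so the connection settings live in one place.
redis_cmd() {
    /bin/redis-cli -h 127.0.0.1 -p 6379 -n 0 "$@"
}

# clean_keys_by_scan PATTERN
# Deletes every key matching PATTERN, iterating with SCAN instead of KEYS.
# Prints the number of del calls that succeeded.
clean_keys_by_scan() {
    local pattern="$1" key
    local -i deleted=0
    while IFS= read -r key; do
        # Skip blank lines so del is never called without a key
        [ -n "$key" ] || continue
        redis_cmd del "$key" >/dev/null && deleted+=1
    done < <(redis_cmd --scan --pattern "$pattern")
    echo "$deleted"
}

# Example (not run automatically):
# clean_keys_by_scan '/n9e/sender/*'
```

Deleting keys one at a time costs a round trip per key, which is acceptable for the small number of sender keys expected here; for large key counts, batching the del calls would be the next step.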