Sentinel
redis的sentinel系统用于管理多个redis服务器,该系统主要执行三个任务:监控、提醒、自动故障转移。
1、监控(Monitoring): Redis Sentinel实时监控主服务器和从服务器运行状态,并且实现自动切换。
2、提醒(Notification):当被监控的某个 Redis 服务器出现问题时, Redis Sentinel 可以向系统管理员发送通知, 也可以通过 API 向其他程序发送通知。
3、自动故障转移(Automatic failover): 当一个主服务器不能正常工作时,Redis Sentinel 可以将一个从服务器升级为主服务器, 并对其他从服务器进行配置,让它们使用新的主服务器。当应用程序连接Redis 服务器时, Redis Sentinel会告之新的主服务器地址和端口。
注意:在使用sentinel监控主从节点的时候,从节点需要是使用动态方式配置的,如果直接修改配置文件,后期sentinel实现故障转移的时候会出问题。
# Example sentinel.conf # port <sentinel-port> # The port that this sentinel instance will run on # sentinel实例运行的端口 port 26379 # sentinel announce-ip <ip> # sentinel announce-port <port> # # The above two configuration directives are useful in environments where, # because of NAT, Sentinel is reachable from outside via a non-local address. # # When announce-ip is provided, the Sentinel will claim the specified IP address # in HELLO messages used to gossip its presence, instead of auto-detecting the # local address as it usually does. # # Similarly when announce-port is provided and is valid and non-zero, Sentinel # will announce the specified TCP port. # # The two options don't need to be used together, if only announce-ip is # provided, the Sentinel will announce the specified IP and the server port # as specified by the "port" option. If only announce-port is provided, the # Sentinel will announce the auto-detected local IP and the specified port. # # Example: # # sentinel announce-ip 1.2.3.4 # dir <working-directory> # Every long running process should have a well-defined working directory. # For Redis Sentinel to chdir to /tmp at startup is the simplest thing # for the process to don't interferer with administrative tasks such as # unmounting filesystems. dir /tmp # sentinel monitor <master-name> <ip> <redis-port> <quorum> # master-name : master Redis Server名称 # ip : master Redis Server的IP地址 # redis-port : master Redis Server的端口号 # quorum : 主实例判断为失效至少需要 quorum 个 Sentinel 进程的同意,只要同意 Sentinel 的数量不达标,自动failover就不会执行 # # Tells Sentinel to monitor this master, and to consider it in O_DOWN # (Objectively Down) state only if at least <quorum> sentinels agree. # # Note that whatever is the ODOWN quorum, a Sentinel will require to # be elected by the majority of the known Sentinels in order to # start a failover, so no failover can be performed in minority. # # Slaves are auto-discovered, so you don't need to specify slaves in # any way. Sentinel itself will rewrite this configuration file adding # the slaves using additional configuration options. # Also note that the configuration file is rewritten when a # slave is promoted to master. # # Note: master name should not include special characters or spaces. # The valid charset is A-z 0-9 and the three characters ".-_". # sentinel monitor mymaster 127.0.0.1 6379 2 # sentinel auth-pass <master-name> <password> # # Set the password to use to authenticate with the master and slaves. # Useful if there is a password set in the Redis instances to monitor. # # Note that the master password is also used for slaves, so it is not # possible to set a different password in masters and slaves instances # if you want to be able to monitor these instances with Sentinel. # # However you can have Redis instances without the authentication enabled # mixed with Redis instances requiring the authentication (as long as the # password set is the same for all the instances requiring the password) as # the AUTH command will have no effect in Redis instances with authentication # switched off. # # Example: # # sentinel auth-pass mymaster MySUPER--secret-0123passw0rd # sentinel down-after-milliseconds <master-name> <milliseconds> # # Number of milliseconds the master (or any attached slave or sentinel) should # be unreachable (as in, not acceptable reply to PING, continuously, for the # specified period) in order to consider it in S_DOWN state (Subjectively # Down). # 选项指定了 Sentinel 认为Redis实例已经失效所需的毫秒数。当实例超过该时间没有返回PING,或者直接返回错误, 那么 Sentinel 将这个实例标记为主观下线(subjectively down,简称 SDOWN ) # # Default is 30 seconds. sentinel down-after-milliseconds mymaster 30000 # sentinel parallel-syncs <master-name> <numslaves> # # How many slaves we can reconfigure to point to the new slave simultaneously # during the failover. Use a low number if you use the slaves to serve query # to avoid that all the slaves will be unreachable at about the same # time while performing the synchronization with the master. # 选项指定了在执行故障转移时, 最多可以有多少个从Redis实例在同步新的主实例, 在从Redis实例较多的情况下这个数字越小,同步的时间越长,完成故障转移所需的时间就越长。 sentinel parallel-syncs mymaster 1 # sentinel failover-timeout <master-name> <milliseconds> # # Specifies the failover timeout in milliseconds. It is used in many ways: # # - The time needed to re-start a failover after a previous failover was # already tried against the same master by a given Sentinel, is two # times the failover timeout. # # - The time needed for a slave replicating to a wrong master according # to a Sentinel current configuration, to be forced to replicate # with the right master, is exactly the failover timeout (counting since # the moment a Sentinel detected the misconfiguration). # # - The time needed to cancel a failover that is already in progress but # did not produced any configuration change (SLAVEOF NO ONE yet not # acknowledged by the promoted slave). # # - The maximum time a failover in progress waits for all the slaves to be # reconfigured as slaves of the new master. However even after this time # the slaves will be reconfigured by the Sentinels anyway, but not with # the exact parallel-syncs progression as specified. # 如果在该时间(ms)内未能完成failover操作,则认为该failover失败 # # Default is 3 minutes. sentinel failover-timeout mymaster 180000 # SCRIPTS EXECUTION # # sentinel notification-script and sentinel reconfig-script are used in order # to configure scripts that are called to notify the system administrator # or to reconfigure clients after a failover. The scripts are executed # with the following rules for error handling: # # If script exits with "1" the execution is retried later (up to a maximum # number of times currently set to 10). # # If script exits with "2" (or an higher value) the script execution is # not retried. # # If script terminates because it receives a signal the behavior is the same # as exit code 1. # # A script has a maximum running time of 60 seconds. After this limit is # reached the script is terminated with a SIGKILL and the execution retried. # NOTIFICATION SCRIPT # # sentinel notification-script <master-name> <script-path> # # Call the specified notification script for any sentinel event that is # generated in the WARNING level (for instance -sdown, -odown, and so forth). # This script should notify the system administrator via email, SMS, or any # other messaging system, that there is something wrong with the monitored # Redis systems. # # The script is called with just two arguments: the first is the event type # and the second the event description. # # The script must exist and be executable in order for sentinel to start if # this option is provided. # 指定sentinel检测到该监控的redis实例指向的实例异常时,调用的报警脚本。该配置项可选,但是很常用。 # # Example: # # sentinel notification-script mymaster /var/redis/notify.sh # CLIENTS RECONFIGURATION SCRIPT # # sentinel client-reconfig-script <master-name> <script-path> # # When the master changed because of a failover a script can be called in # order to perform application-specific tasks to notify the clients that the # configuration has changed and the master is at a different address. # # The following arguments are passed to the script: # # <master-name> <role> <state> <from-ip> <from-port> <to-ip> <to-port> # # <state> is currently always "failover" # <role> is either "leader" or "observer" # # The arguments from-ip, from-port, to-ip, to-port are used to communicate # the old address of the master and the new address of the elected slave # (now a master). # # This script should be resistant to multiple invocations. # # Example: # # sentinel client-reconfig-script mymaster /var/redis/reconfig.sh
哨兵模式启动: