Alwayson 系统视图、常用sql、性能计数器、扩展事件

最新推荐文章于 2024-06-30 16:52:57 发布

Hehuyi_In

最新推荐文章于 2024-06-30 16:52:57 发布

阅读量2k

点赞数

分类专栏：监控 SQLServer alwayson

本文链接：https://blog.csdn.net/Hehuyi_In/article/details/116884739

版权

SQLServer 同时被 3 个专栏收录

142 篇文章 26 订阅

订阅专栏

监控

18 篇文章 0 订阅

订阅专栏

alwayson

13 篇文章 1 订阅

订阅专栏

一、系统视图

1. 系统目录视图（System catalog view）

由于存储alwayson的配置信息，一旦确定后如果不修改配置不会再变化。

System catalog view	Description
sys.availability_databases_cluster	查看本实例中的ag数据库信息 Contains one row for each availability database on the instance of SQL Server that is hosting an availability replica for any Always On availability group in the Windows Server Failover Clustering (WSFC) cluster, regardless of whether the local copy database has been joined to the availability group yet.
sys.availability_group_listener_ip_addresses	查看alwayson监听vip（注意不是wsfc的vip） Returns a row for every IP address that is associated with any Always On availability group listener in the Windows Server Failover Clustering (WSFC) cluster.
sys.availability_group_listeners	查看alwayson监听信息，跟前个视图有点类似，但是字段不完全相同 For each Always On availability group, returns either zero rows indicating that no network name is associated with the availability group, or returns a row for each availability-group listener configuration in the Windows Server Failover Clustering (WSFC) cluster.
sys.availability_groups	查看本实例中每个AlwaysOn 可用性组的元数据信息 Returns a row for each availability group for which the local instance of SQL Server hosts an availability replica. Each row contains a cached copy of the availability group metadata.
sys.availability_groups_cluster	查看 WSFC中每个 AlwaysOn 可用性组的元数据信息，该信息来自 WSFC。如果服务器中只有一个SqlServer实例，跟前个视图返回结果是一样的。 Returns a row for each Always On availability group in the Windows Server Failover Clustering (WSFC) . Each row contains the availability group metadata from the WSFC cluster.
sys.availability_read_only_routing_lists	查看本实例中的只读路由列表，没配的话结果为空 Returns a row for the read only routing list of each availability replica in an Always On availability group in the WSFC failover cluster.
sys.availability_replicas	查看AlwaysOn的可用性副本信息 Returns a row for each of the availability replicas that belong to any Always On availability group in the WSFC failover cluster.

部分视图返回结果截图

2. 系统动态管理视图

返回alwayson相关的动态信息，其中数据可能会不断变化。

System dynamic management view	Description
sys.dm_hadr_auto_page_repair	为每个自动修复的坏页返回一条记录 Returns a row for every automatic page-repair attempt on any availability database on an availability replica that is hosted for any availability group by the server instance.
sys.dm_hadr_availability_group_states	查看本机的可用性副本状态 Returns a row for each Always On availability group that possesses an availability replica on the local instance of SQL Server. Each row displays the states that define the health of a given availability group.
sys.dm_hadr_availability_replica_cluster_nodes 用处不大	查看可用性组的各个节点服务器名 Returns a row for every availability replica (regardless of join state) of the Always On availability groups in the Windows Server Failover Clustering (WSFC) cluster
sys.dm_hadr_availability_replica_cluster_states 用处不大	查看可用性组的各个节点的 join_state Returns a row for each Always On availability replica (regardless of its join state) of all Always On availability groups (regardless of replica location) in the Windows Server Failover Clustering (WSFC) cluster.
sys.dm_hadr_availability_replica_states	查看各副本实例状态，包括连接、recovery、同步情况，last error等 Returns a row for each local replica and a row for each remote replica in the same Always On availability group as a local replica. Each row contains information about the state of a given replica.
sys.dm_hadr_cluster	查看集群名及仲裁信息 Returns a row that exposes the cluster name and information about the quorum
sys.dm_hadr_cluster_members	查看集群节点及见证文件信息、投票情况 Returns a row for each of the members that constitute the quorum and the state of each of them
sys.dm_hadr_cluster_networks 用处不大	查看集群子网ip信息 Returns a row for every WSFC cluster member that is participating in an availability group's subnet configuration.
sys.dm_hadr_database_replica_cluster_states	查看每个副本中每个ag数据库的状态，包括pending，recovery_lsn，truncation lsn等 Returns a row containing information intended to provide you with insight into the health of the availability databases in the Always On availability groups in each Always On availability group on the Windows Server Failover Clustering (WSFC) cluster.
sys.dm_hadr_database_replica_states	最常用及最重要的视图。查看每个副本中每个ag数据库的状态，包括同步状态、健康情况、是否suspend及原因、各种lsn及时间、log队列大小及发送速率等 Returns a row for each database that is participating in an Always On availability group for which the local instance of SQL Server is hosting an availability replica.
sys.dm_hadr_instance_node_map 用处不大	查看可用性组的各个节点服务器名 For every instance of SQL Server that hosts an availability replica that is joined to its Always On availability group, returns the name of the Windows Server Failover Cluster (WSFC) node that hosts the server instance.
sys.dm_hadr_name_id_map 用处不大	查看ag名、ag id、资源id、wsfc组id Shows the mapping of Always On availability groups that the current instance of SQL Server has joined to three unique IDs: an availability group ID, a WSFC resource ID, and a WSFC Group ID.
sys.dm_tcp_listener_states	查看各类监听端口、类型和状态 Returns a row containing dynamic-state information for each TCP listener.

System dynamic management view

Description

sys.dm_hadr_auto_page_repair

为每个自动修复的坏页返回一条记录

Returns a row for every automatic page-repair attempt on any availability database on an availability replica that is hosted for any availability group by the server instance.

sys.dm_hadr_availability_group_states

查看本机的可用性副本状态

Returns a row for each Always On availability group that possesses an availability replica on the local instance of SQL Server. Each row displays the states that define the health of a given availability group.

sys.dm_hadr_availability_replica_cluster_nodes

用处不大

查看可用性组的各个节点服务器名

Returns a row for every availability replica (regardless of join state) of the Always On availability groups in the Windows Server Failover Clustering (WSFC) cluster

sys.dm_hadr_availability_replica_cluster_states

用处不大

查看可用性组的各个节点的 join_state

Returns a row for each Always On availability replica (regardless of its join state) of all Always On availability groups (regardless of replica location) in the Windows Server Failover Clustering (WSFC) cluster.

sys.dm_hadr_availability_replica_states

查看各副本实例状态，包括连接、recovery、同步情况，last error等

Returns a row for each local replica and a row for each remote replica in the same Always On availability group as a local replica. Each row contains information about the state of a given replica.

sys.dm_hadr_cluster

查看集群名及仲裁信息

Returns a row that exposes the cluster name and information about the quorum

sys.dm_hadr_cluster_members

查看集群节点及见证文件信息、投票情况

Returns a row for each of the members that constitute the quorum and the state of each of them

sys.dm_hadr_cluster_networks

用处不大

查看集群子网ip信息

Returns a row for every WSFC cluster member that is participating in an availability group's subnet configuration.

sys.dm_hadr_database_replica_cluster_states

查看每个副本中每个ag数据库的状态，包括pending，recovery_lsn，truncation lsn等

Returns a row containing information intended to provide you with insight into the health of the availability databases in the Always On availability groups in each Always On availability group on the Windows Server Failover Clustering (WSFC) cluster.

sys.dm_hadr_database_replica_states

最常用及最重要的视图。查看每个副本中每个ag数据库的状态，包括同步状态、健康情况、是否suspend及原因、各种lsn及时间、log队列大小及发送速率等

Returns a row for each database that is participating in an Always On availability group for which the local instance of SQL Server is hosting an availability replica.

sys.dm_hadr_instance_node_map

用处不大

查看可用性组的各个节点服务器名

For every instance of SQL Server that hosts an availability replica that is joined to its Always On availability group, returns the name of the Windows Server Failover Cluster (WSFC) node that hosts the server instance.

sys.dm_hadr_name_id_map

用处不大

查看ag名、ag id、资源id、wsfc组id

Shows the mapping of Always On availability groups that the current instance of SQL Server has joined to three unique IDs: an availability group ID, a WSFC resource ID, and a WSFC Group ID.

sys.dm_tcp_listener_states

查看各类监听端口、类型和状态

Returns a row containing dynamic-state information for each TCP listener.

dm_hadr_database_replica_states 查询截图

按不同层次再次归类上述视图

--可用性组所在Windows故障转移集群
SELECT * FROM sys.dm_hadr_cluster;
SELECT * FROM sys.dm_hadr_cluster_members ;
SELECT * FROM sys.dm_hadr_cluster_networks;
SELECT * FROM sys.dm_hadr_instance_node_map;
SELECT * FROM sys.dm_hadr_name_id_map

--可用性组
SELECT * FROM sys.availability_groups;
SELECT * FROM sys.availability_groups_cluster;
SELECT * FROM sys.dm_hadr_availability_group_states ;
SELECT * FROM sys.dm_hadr_automatic_seeding
SELECT * FROM sys.dm_hadr_physical_seeding_stats

--可用性副本
SELECT * FROM sys.availability_replicas;
SELECT * FROM sys.[availability_read_only_routing_lists]
SELECT * FROM sys.dm_hadr_availability_replica_cluster_nodes;
SELECT * FROM sys.[dm_hadr_availability_replica_cluster_states]
SELECT * FROM sys.[dm_hadr_availability_replica_states]

--可用性数据库
SELECT * FROM sys.availability_databases_cluster;
SELECT * FROM sys.dm_hadr_database_replica_cluster_states;
SELECT * FROM sys.[dm_hadr_auto_page_repair]
SELECT * FROM sys.[dm_hadr_database_replica_states]

--可用性组listener
SELECT * FROM sys.availability_group_listener_ip_addresses;
SELECT * FROM sys.availability_group_listeners;
SELECT * FROM sys.dm_tcp_listener_states;

3. 系统函数 System functions

System function	Description
sys.fn_hadr_is_primary_replica select sys.fn_hadr_is_primary_replica ( 'dbname' )	用于确定当前副本是否为主副本 Used to determine if the current replica is the primary replica.
sys.fn_hadr_backup_is_preferred_replica sys.fn_hadr_backup_is_preferred_replica ( 'dbname' )	用于确定当前副本是否为首选备份副本 Used to determine if the current replica is the preferred backup replica.
sys.fn_hadr_distributed_ag_replica select sys.fn_hadr_distributed_ag_replica( lag_Id, replica_id )	用于将分布式可用性组中的副本映射到本地可用性组 Used to map a replica in a distributed availability group to the local availability group.

System function

Description

sys.fn_hadr_is_primary_replica

select sys.fn_hadr_is_primary_replica ( 'dbname' )

用于确定当前副本是否为主副本

Used to determine if the current replica is the primary replica.

sys.fn_hadr_backup_is_preferred_replica
sys.fn_hadr_backup_is_preferred_replica ( 'dbname' )

用于确定当前副本是否为首选备份副本

Used to determine if the current replica is the preferred backup replica.

sys.fn_hadr_distributed_ag_replica

select sys.fn_hadr_distributed_ag_replica( lag_Id, replica_id )

用于将分布式可用性组中的副本映射到本地可用性组

Used to map a replica in a distributed availability group to the local availability group.

二、常用sql

查看是否已启用alwayson属性

Select serverproperty ('IsHadrEnabled')

查看主从lag延迟

--主库查：
select last_commit_time as 'primary_last_commit_tm' from sys.dm_hadr_database_replica_states where is_local=1;
select last_commit_time as 'standby_last_commit_tm' from sys.dm_hadr_database_replica_states where is_local=0;

-- primary：2017-03-21 15:31:37.930
-- standby：2017-03-21 15:31:38.377
-- delta：0.447 sec

-- 查看从库信息
select * from sys.dm_hadr_database_replica_states; 

-- 找到 replica_id
select * from sys.availability_replicas where replica_id=xxx;

查有多少日志没有经过网络传过去（transport lag）

SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server, dr_state.database_id as database_id,dr_state.log_send_queue_size, is_ag_replica_local = CASE
　　WHEN ar_state.is_local = 1 THEN N'LOCAL'
　　ELSE 'REMOTE'
　　END,
　　ag_replica_role = CASE
　　WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'
　　ELSE ar_state.role_desc
　　END
　　FROM (( sys.availability_groups AS ag JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id )
　　JOIN sys.dm_hadr_availability_replica_states AS ar_state ON ar.replica_id = ar_state.replica_id)
　　JOIN sys.dm_hadr_database_replica_states dr_state on
　　ag.group_id = dr_state.group_id and dr_state.replica_id = ar_state.replica_id;

查已经传过去的日志，还有多少没有应用（apply lag）

SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server, dr_state.database_id as database_id,dr_state.redo_queue_size, is_ag_replica_local = CASE
　　WHEN ar_state.is_local = 1 THEN N'LOCAL'
　　ELSE 'REMOTE'
　　END,
　　ag_replica_role = CASE
　　WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'
　　ELSE ar_state.role_desc
　　END
　　FROM (( sys.availability_groups AS ag JOIN sys.availability_replicas AS ar ON ag.group_id = ar.group_id )
　　JOIN sys.dm_hadr_availability_replica_states AS ar_state ON ar.replica_id = ar_state.replica_id)
　　JOIN sys.dm_hadr_database_replica_states dr_state on
　　ag.group_id = dr_state.group_id and dr_state.replica_id = ar_state.replica_id;

三、性能计数器

1. Availability Group Performance Counters

Perf Counter	AG Replica	SQL release	Explanation and Usage
Log Bytes Flushed/sec	Primary	SQL 2012+	主副本中每秒刷新的日志字节总数。它表示可在主副本中捕获并发送到辅助副本的日志量。 Total number of log bytes flushed per second in the primary replica. It represents the volume of logs that are available to be captured in primary replica and sent to secondary replica(s).
Log Pool LogWriter Pushes/sec	Primary	SQL 2016+	LogWriter模式捕获的日志字节总数。SQL Server 2016中引入了LogWriter模式，它直接从内存日志缓存而不是日志文件中捕获日志。启用LogWriter模式时，在日志刷新和日志捕获之间几乎没有延迟。 Total number of log bytes are captured by LogWriter mode. LogWriter mode was introduced in SQL Server 2016 and it directly captures logs from memory log cache instead of log file. When LogWriter mode is on, it means there is almost no lag between log flush and log capture.
Bytes Sent to Replica/sec	Primary	SQL 2012+	发送到单个辅助副本的内部消息队列的字节总数。内部 in-memory消息队列中包含捕获的和格式化的数据，这些数据将被发送到目标辅助副本。默认情况下，对于2012和2014中的同步和异步辅助副本，内部消息队列中的数据都被压缩。但对于2016，仅适用于异步辅助副本。启用压缩后，此性能计数器的数据量小于Log Bytes Flushed/sec 中的相应值。 Total number of bytes sent to the internal message Queue of a single secondary replica (for both replica and database level data). The internal in-memory message queue which holds the captured and formatted data that will be sent to the targeted secondary replica. By default, the data in the internal message queue is compressed for both sync and async secondary replicas in SQL12 and SQL14. But only for an async secondary replica from SQL16. When data compression is enabled, the data volume of this perf counter is less than the corresponding value in Log Bytes Flushed/sec.
Flow Control/sec	Primary	SQL 2012+	最近一秒启动的流量控制数量。AG具有内部节流门以控制数据流量，当辅助副本未确认的顺序消息数超过阈值时，数据流将暂停，直到从辅助副本接收到更多确认消息为止。 Number of flow control initiated in the last second. AG has internal throttling gates to control data flow. When the number of sequential messages that has not be acknowledged by secondary replica exceeds the gate value, data flow will be paused until receiving more acknowledge messages from secondary replica. More details can be found in "Flow Control Gates" section in https://msdn.microsoft.com/en-us/library/dn135338(v=sql.120).aspx with one correction: Database level gate values are 1792 for x64 and 256 for x86 environments.
Flow Control Time (ms/sec)	Primary	SQL 2012+	消息在最近一秒内等待流量控制的时间（ms） Time in milliseconds messages waited on flow control in the last second.
Bytes Sent to Transport/sec	Primary	SQL 2012+	可用性副本发送到传输（UCS）的总字节数。它表示从XmitQueue出队并发送到传输层的数据量。它的值应该非常接近 "Bytes Sent to Replica/sec" Total bytes sent to transport (UCS) for the availability replica. It represents the volume of data dequeued for XmitQueue and sent to transport layer. Its values should be very close to "Bytes Sent to Replica/sec" with a very slight lag.
Bytes Received from Replica/sec	Secondary	SQL 2012+	从主副本收到的总字节数。它表示在辅助副本中进行任何处理之前从传输（UCS）接收的数据量。它的值应非常接近主服务器上的 "Bytes Sent to Replica/sec"，主从副本之间的当前网络延迟会影响该计数器值。 Total bytes received from the primary replica. It represents the volume of data received from transport (UCS) before any processing in secondary replica. Its values should be very close to "Bytes Sent to Replica/sec" on primary, with a lag influenced by the current network latency between this primary-secondary replica pair.

In Perfmon, except "Log Bytes Flushed/sec" and "Log Pool LogWriter Pushes/sec" which is in SQLServer:Databases object, all other counters are in the SQLServer:Availability Replica object.

2. Network Performance Counters

Perf Counter	Description
Bytes Received/sec	显示通过每个网络适配器接收字节的速率，它是Network Interface\Bytes的子集。 Shows the rate at which bytes are received over each network adapter. The counted bytes include framing characters. Bytes Received/sec is a subset of Network Interface\Bytes Total/sec.
Bytes Sent/sec	显示通过每个网络适配器发送字节的速率，它也是Network Interface\Bytes的子集。 Shows the rate at which bytes are sent over each network adapter. The counted bytes include framing characters. Bytes Sent/sec is a subset of Network Interface\Bytes Total/sec.
Bytes Total/sec	显示在网络接口上发送和接收字节的速率，它等于 Network Interface\Bytes Received/sec + Network Interface\ Bytes Sent/sec. Shows the rate at which bytes are sent and received on the network interface, including framing characters. Bytes Total/sec is the sum of the values of Network Interface\Bytes Received/sec and Network Interface\ Bytes Sent/sec.
Current Bandwidth	当前网络带宽的估计值，以每秒比特数（BPS）为单位。对于带宽没有变化的接口或无法进行准确估算的接口，此值为标称带宽。 Shows an estimate of the current bandwidth of the network interface in bits per second (BPS). For interfaces that do not vary in bandwidth or for those where no accurate estimation can be made, this value is the nominal bandwidth.

当主机具有专用于SQL的网络时，主副本中网络 Bytes Sent/sec 应与AG Bytes Sent to Transport/sec 具有相同的趋势，并且前者的值稍大。辅助副本上的网络计数器 Bytes Received/sec 和AG计数器 Bytes Received from Replica/sec 之间类似。

请注意，Current Bandwidth值以BPS为单位，与 Bytes Total/sec 进行比较时，前者需要先除以8。调整单位后如果两者的值经常接近，通常说明网络带宽成为了性能瓶颈。

三、可用性组扩展事件

尽管性能计数器可以显示每个AG数据移动阶段的总体性能，但是当出现性能问题时，它们不足以找出哪个阶段较慢，因为AG中的数据流就像一个闭环系统，最慢的节点决定系统的最终吞吐量。在开始出现性能问题后尝试研究可用的性能计数器时，所有性能计数器值可能已经放慢了。

捕获扩展事件并通过公共字段将它们关联起来以跟踪整个数据流中的单个块日志移动将非常有用。它将提供详细信息来确定哪个阶段花费了最多的时间，并缩小了根本原因分析的范围。

Event Name	Replica	Description	Symptom Key & Value	Correlation Key(s)
log_flush_start	Primary	Occurs when asynchronous log write starts		log_block_id,database_id
log_flush_complete	Primary	Occurs when log write complete. It can happen after hadr_capture_log_block because of its asynchronous nature of writes.		log_block_id,database_id
hadr_log_block_compression	Primary	Occurs when log is ready to be captured and log compression is enabled. (If log compression is not enabled, this XEvent is not logged in SQL12 and SQL14, but it is always logged from SQL16. For SQL16, check Boolean value of property is_compressed.)		log_block_id,database_id
hadr_capture_log_block	Primary	Occurs right after primary has captured a log block.	mode=1	log_block_id, database_replica_id
hadr_capture_log_block	Primary	Occurs right before primary enqueues the captured log block to an internal message Queue of DbMgrPartner. A DbMgrPartner maps a database in the remote replica. No processing operation between mode 1 and 2.	mode=2	log_block_id,database_replica_id
hadr_capture_log_block	Primary	Occurs after dequeuing a log block from the internal message Queue of DbMgrPartner and before sending to transport (UCS)	mode=3	log_block_id,database_replica_id
hadr_capture_log_block	Primary	Occurs after the dequeued message reaches Replica layer and before sending to transport (UCS). Only message routing actions between mode 3 and 4.	mode=4	log_block_id,database_replica_id,availability_replica_id
hadr_transport_receive_log_block_message	Secondary	Occurs when receiving new log block message from transport (UCS)	mode=1	log_block_id,database_replica_id
hadr_transport_receive_log_block_message	Secondary	Occurs after enqueuing a new RouteMessageTask to DbMgrPartner for this new received log block. No process operations yet.	mode=2	log_block_id,database_replica_id
hadr_log_block_decompression	Secondary	Occurs after decompressing log block buffer. Boolean value of "is_compressed" means the incoming log block buffer is compressed or not.		log_block_id,database_id
log_flush_start	Secondary	Occurs when asynchronous log write starts		log_block_id,database_id
log_flush_complete	Secondary	Occurs when log write complete		log_block_id,database_id
hadr_apply_log_block	Secondary	Occurs right after log block is flushed and new redo target LSN is calculated in secondary		log_block_id,database_replica_id
hadr_send_harden_lsn_message	Secondary	Occurs when SyncLogProgressMsg with new hardened lsn is constructed and before pushing this message to the internal message Queue of the database's DbMgrPartner. At this point, the previous received log block has been flushed to disk, but the current log block may not.	mode=1	log_block_id (*Need to choose the immediate next log_block_id of this database),hadr_database_id
hadr_send_harden_lsn_message	Secondary	Occurs when SyncLogProgressMsg is dequeued from the internal message Queue and before sending to transport (UCS)	mode=2	log_block_id (*Need to choose the immediate next log_block_id of this database),hadr_database_id
hadr_send_harden_lsn_message	Secondary	Occurs after the dequeued SyncLogProgressMsg reaches Replica layer and before sending to transport (UCS). No processing operation between mode 2 and 3.	mode=3	log_block_id (*Need to choose the immediate next log_block_id of this database),hadr_database_id
hadr_receive_harden_lsn_message	Primary	Occurs when receiving new SyncLogProgressMsg which contains new hardened lsn of database in secondary replica from transport (UCS)	mode=1	log_block_id (*Need to choose the immediate next log_block_id of this database),database_replica_id
hadr_receive_harden_lsn_message	Primary	Occurs after enqueuing a new RouteSyncProgressMessageTask to DbMgrPartner for this new received SyncLogProgressMsg. No processing operations yet.	mode=2	log_block_id (*Need to choose the immediate next log_block_id of this database),database_replica_id
hadr_db_commit_mgr_harden	Primary	Occurs after minimal hardened lsn among primary replica and all synchronous-commit secondary replicas exceeds the lsn of the expected log block.		wait_log_block,database_id,ag_database_id
hadr_db_commit_mgr_harden_still_waiting	Primary	Occurs when a committed lsn in primary has not been notified for the hardening from all synchronous-commit secondary replicas for more than 2 seconds. This Xevent will be logged every 2 seconds until logging hadr_db_commit_mgr_harden.In normal cases, the harden wait time is in tens of millisecond range, and this XEvent is not logged.