HBase Notes (10): Cluster Monitoring

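1. Context monitoring implementations:

GangliaContext: pushes metrics to Ganglia

FileContext: writes metrics to a file

TimeStampingFileContext: writes metrics to a file, with timestamps

CompositeContext: fans out to multiple context implementations

NullContext: no monitoring

NullContextWithUpdateThread: no monitoring output, but starts the thread that aggregates statistics.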
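
A context is selected per metrics group in hadoop-metrics.properties. The sketch below only renders the lines for the "hbase" group. The class name and the fileName key follow the sample properties file shipped with older HBase releases, as far as I recall, so treat them as assumptions and check them against your version. The helper name metrics_context_properties is made up for this note.

```python
def metrics_context_properties(context_class, period_seconds=10, **extra):
    """Render hadoop-metrics.properties lines for the "hbase" metrics group.

    context_class: fully qualified name of the context implementation.
    extra: context-specific keys, e.g. fileName for the file contexts
           or servers for GangliaContext (assumed key names).
    """
    lines = [
        f"hbase.class={context_class}",
        f"hbase.period={period_seconds}",
    ]
    # Sort the extra keys so the output is stable between runs.
    lines += [f"hbase.{key}={value}" for key, value in sorted(extra.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    print(metrics_context_properties(
        "org.apache.hadoop.hbase.metrics.file.TimeStampingFileContext",
        fileName="/tmp/metrics_hbase.log",
    ))
```

Switching to another implementation from the list is then a matter of passing a different class name and its own keys.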


2. HMaster metrics

cluster requests: number of requests to the cluster

split time: time spent splitting write-ahead logs

split size: size of the write-ahead logs that were split


3. HRegionServer metrics

block cache: count, size, free, evicted

compaction: size, time, request size

memstore (in-memory store): size, flush queue size, flush size, flush time

stores: store files, stores, file index

I/O: fs read latency, fs write latency, fs sync latency

Other: read request count, write request count


4. RPC monitoring

RPC Processing Time

RPC Queue Time


5. JVM monitoring

Heap

GC

Thread

System event

 

6. Info monitoring

date   version  revision url  user hdfsDate  hdfsVersion  hdfsRevision  hdfsUrl  hdfsUser


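7. Ganglia architecture

gmond: collects data on every monitored node

gmetad: runs on one node and pulls the data for the whole cluster from the gmond daemons

Web page: displays the data

After installation, edit hadoop-metrics.properties or hadoop-metrics2.properties.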
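
To see what gmetad collects, it helps to look at the XML dump that the Ganglia daemons serve. Below is a minimal sketch that groups metrics by host. The SAMPLE document is hand-written for illustration, and the element and attribute names (HOST, METRIC, NAME, VAL) follow the Ganglia XML format as I understand it, so verify them against a real dump.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; a real dump contains many more attributes and metrics.
SAMPLE = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="hbase">
    <HOST NAME="rs1.example.com" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metrics_by_host(ganglia_xml):
    """Return {host name: {metric name: value as string}} from an XML dump."""
    root = ET.fromstring(ganglia_xml)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return result


if __name__ == "__main__":
    print(metrics_by_host(SAMPLE))
```

Values are kept as strings because the TYPE attribute decides how each one should be interpreted.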


8. JMX monitoring configuration:

export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote.port=10101 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HADOOP_NAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote.port=10102 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HADOOP_DATANODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote.port=10103  -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HADOOP_SECONDARYNAMENODE_OPTS"
export HBASE_MASTER_OPTS="-Dcom.sun.management.jmxremote.port=11101 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HBASE_MASTER_OPTS"
export HBASE_REGIONSERVER_OPTS="-Dcom.sun.management.jmxremote.port=11102 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HBASE_REGIONSERVER_OPTS"
export HBASE_ZOOKEEPER_OPTS="-Dcom.sun.management.jmxremote.port=11103 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HBASE_ZOOKEEPER_OPTS"

export HBASE_THRIFT_OPTS="-Dcom.sun.management.jmxremote.port=11104 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HBASE_THRIFT_OPTS"
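
The export lines above differ only in the variable name and the port. As a rough sketch, they can be generated from a mapping, which also makes it easy to check that no two daemons share a JMX port (two JVMs on the same host cannot bind the same port). The function names are made up for this note.

```python
def jmx_opts(port):
    """JVM flags that open an unauthenticated, non-SSL JMX port (as in the notes)."""
    return (
        f"-Dcom.sun.management.jmxremote.port={port} "
        "-Dcom.sun.management.jmxremote.ssl=false "
        "-Dcom.sun.management.jmxremote.authenticate=false"
    )


def export_lines(ports):
    """Build one shell export line per daemon from {variable name: port}."""
    if len(set(ports.values())) != len(ports):
        raise ValueError("each daemon needs its own JMX port")
    # "$" followed by {var} yields e.g. $HBASE_MASTER_OPTS, keeping existing options.
    return [f'export {var}="{jmx_opts(port)} ${var}"' for var, port in ports.items()]


if __name__ == "__main__":
    for line in export_lines({"HBASE_MASTER_OPTS": 11101,
                              "HBASE_REGIONSERVER_OPTS": 11102}):
        print(line)
```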


9. JVM monitoring:

ClassLoading:  LoadedClassCount,  TotalLoadedClassCount,    UnloadedClassCount   

Compilation: Name,  CompilationTimeMonitoringSupported,       TotalCompilationTime

GarbageCollector -> PS MarkSweep: Name, CollectionCount, CollectionTime, LastGcInfo, MemoryPoolNames, Valid

GarbageCollector -> PS Scavenge: Name, CollectionCount, CollectionTime, LastGcInfo, MemoryPoolNames, Valid

Memory: HeapMemoryUsage (init, max, committed, used), NonHeapMemoryUsage (init, max, committed, used), ObjectPendingFinalizationCount

MemoryManager -> CodeCacheManager:     Name,  MemoryPoolName

MemoryPool -> Code Cache: Name, Type, UsageThresholdSupported, CollectionUsageThresholdSupported, MemoryManagerNames, Usage (init, max, committed, used), PeakUsage (init, max, committed, used), UsageThreshold, UsageThresholdCount, UsageThresholdExceeded, CollectionUsage (init, max, committed, used), CollectionUsageThreshold, CollectionUsageThresholdCount, CollectionUsageThresholdExceeded

MemoryPool -> PS Eden Space: Name, Type, UsageThresholdSupported, CollectionUsageThresholdSupported, MemoryManagerNames, Usage (init, max, committed, used), PeakUsage (init, max, committed, used), UsageThreshold, UsageThresholdCount, UsageThresholdExceeded, CollectionUsage (init, max, committed, used), CollectionUsageThreshold, CollectionUsageThresholdCount, CollectionUsageThresholdExceeded

MemoryPool -> PS Survivor Space: Name, Type, UsageThresholdSupported, CollectionUsageThresholdSupported, MemoryManagerNames, Usage (init, max, committed, used), PeakUsage (init, max, committed, used), UsageThreshold, UsageThresholdCount, UsageThresholdExceeded, CollectionUsage (init, max, committed, used), CollectionUsageThreshold, CollectionUsageThresholdCount, CollectionUsageThresholdExceeded

MemoryPool -> PS Old Gen: Name, Type, UsageThresholdSupported, CollectionUsageThresholdSupported, MemoryManagerNames, Usage (init, max, committed, used), PeakUsage (init, max, committed, used), UsageThreshold, UsageThresholdCount, UsageThresholdExceeded, CollectionUsage (init, max, committed, used), CollectionUsageThreshold, CollectionUsageThresholdCount, CollectionUsageThresholdExceeded

MemoryPool -> PS Perm Gen: Name, Type, UsageThresholdSupported, CollectionUsageThresholdSupported, MemoryManagerNames, Usage (init, max, committed, used), PeakUsage (init, max, committed, used), UsageThreshold, UsageThresholdCount, UsageThresholdExceeded, CollectionUsage (init, max, committed, used), CollectionUsageThreshold, CollectionUsageThresholdCount, CollectionUsageThresholdExceeded

OperatingSystem:  Name, Arch, AvailableProcessors, CommittedVirtualMemorySize, FreePhysicalMemorySize, FreeSwapSpaceSize, MaxFileDescriptorCount,OpenFileDescriptorCount,ProcessCpuLoad,ProcessCpuTime, SystemCpuLoad, SystemLoadAverage, TotalPhysicalMemorySize, TotalSwapSpaceSize, Version

Runtime:  Name, BootClassPathSupported, BootClassPath, ClassPath, InputArguments, LibraryPath, ManagementSpecVersion, SpecName,SpecVendor,SpecVersion, StartTime,SystemProperties,Uptime,VmName,VmVendor,VmVersion

Threading: CurrentThreadCpuTimeSupported, AllThreadIds, CurrentThreadCpuTime, CurrentThreadUserTime, ObjectMonitorUsageSupported, PeakThreadCount, SynchronizerUsageSupported, ThreadAllocatedMemoryEnabled, ThreadAllocatedMemorySupported, ThreadContentionMonitoringEnabled, ThreadContentionMonitoringSupported, ThreadCount, ThreadCpuTimeEnabled, ThreadCpuTimeSupported, TotalStartedThreadCount

java.nio.BufferPool -> direct: Name, TotalCapacity, Count, MemoryUsed

java.nio.BufferPool -> mapped: Name, TotalCapacity, Count, MemoryUsed
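
The usage values (HeapMemoryUsage, Usage, PeakUsage and so on) are composite: init, used, committed and max, all in bytes. A small sketch of turning one of them into a percentage, assuming the composite has already been read into a plain dict. The function name is made up. Note that max can be -1 when the JVM reports it as undefined, in which case this sketch falls back to committed.

```python
def usage_percent(memory_usage):
    """Percentage of a memory area in use.

    memory_usage: dict with the keys init, used, committed, max (bytes).
    Falls back to committed when max is undefined (reported as -1).
    """
    limit = memory_usage["max"]
    if limit is None or limit <= 0:
        limit = memory_usage["committed"]
    if limit <= 0:
        return 0.0
    return 100.0 * memory_usage["used"] / limit


if __name__ == "__main__":
    heap = {"init": 0, "used": 512, "committed": 1024, "max": 2048}
    print(usage_percent(heap))  # 25.0
```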


10. Attributes common to all Hadoop processes

JvmMetrics: GcCount, GcCountPS MarkSweep, GcCountPS Scavenge, GcTimeMillis,GcTimeMillisPS MarkSweep,  GcTimeMillisPS Scavenge, LogError,LogFatal,  LogInfo, LogWarn, MemHeapCommittedM, MemHeapUsedM,MemMaxM, MemNonHeapCommittedM, MemNonHeapUsedM, ThreadsBlocked, ThreadsNew, ThreadsRunnable, ThreadsTerminated, ThreadsTimedWaiting, ThreadsWaiting,  tag.Context, tag.Hostname, tag.ProcessName , tag.SessionId

MetricsSystemStats :DroppedPubAll, NumActiveSinks, NumActiveSources, NumAllSinks, NumAllSources, PublishAvgTime, PublishNumOps, SnapshotAvgTime, SnapshotNumOps, tag.Context,  tag.Hostname

StartupProgress: ElapsedTime, LoadingEditsCount, LoadingEditsElapsedTime, LoadingEditsPercentComplete, LoadingEditsTotal,  LoadingFsImageCount, LoadingFsImageElapsedTime, LoadingFsImagePercentComplete, LoadingFsImageTotal,PercentComplete, SafeModeCount, SafeModeElapsedTime, SafeModePercentComplete, SafeModeTotal, SavingCheckpointCount, SavingCheckpointElapsedTime, SavingCheckpointPercentComplete, SavingCheckpointTotal, tag.Hostname

UgiMetrics (User and group):  LoginFailureAvgTime, LoginFailureNumOps, LoginSuccessAvgTime, LoginSuccessNumOps, tag.Context, tag.Hostname
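
Besides a JMX client such as jconsole, Hadoop daemons also serve these beans as JSON under the /jmx path of their web UI. The sketch below picks one bean out of such a document. The SAMPLE is a hand-written fragment with made-up values, and the bean name shown is what I would expect for a NameNode, so treat both as assumptions.

```python
import json

# Hand-written fragment in the shape of a /jmx response; values are made up.
SAMPLE = json.loads("""
{"beans": [
  {"name": "Hadoop:service=NameNode,name=JvmMetrics",
   "MemHeapUsedM": 120.5, "GcCount": 42, "ThreadsBlocked": 0}
]}
""")


def find_bean(jmx_document, name):
    """Return the bean whose "name" equals the given ObjectName, or None."""
    for bean in jmx_document.get("beans", []):
        if bean.get("name") == name:
            return bean
    return None


if __name__ == "__main__":
    jvm = find_bean(SAMPLE, "Hadoop:service=NameNode,name=JvmMetrics")
    print(jvm["GcCount"], jvm["MemHeapUsedM"])
```

The host and port to fetch the document from depend on the deployment, so they are left out here.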


11. NameNode monitoring:

FSNamesystem: BlockCapacity, BlocksTotal, CapacityRemaining, CapacityTotal,CapacityUsed,CapacityUsedNonDFS,CorruptBlocks, ExcessBlocks, ExpiredHeartbeats, FilesTotal,LastCheckpointTime, LastWrittenTransactionId, MillisSinceLastLoadedEdits, MissingBlocks, PendingDataNodeMessageCount, PendingDeletionBlocks, PendingReplicationBlocks, PostponedMisreplicatedBlocks, ScheduledReplicationBlocks, Snapshots, SnapshottableDirectories, StaleDataNodes, TotalFiles, TotalLoad, TransactionsSinceLastCheckpoint, TransactionsSinceLastLogRoll, UnderReplicatedBlocks, tag.Context, tag.HAState, tag.Hostname

FSNamesystemState: BlocksTotal, CapacityRemaining, CapacityTotal, CapacityUsed, FSState, FilesTotal, NumDeadDataNodes, NumStaleDataNodes, ScheduledReplicationBlocks, TotalLoad, UnderReplicatedBlocks

NameNodeActivity: AddBlockOps, AllowSnapshotOps, BlockReportAvgTime, BlockReportNumOps, CreateFileOps, CreateSnapshotOps, CreateSymlinkOps, DeleteFileOps,  DeleteSnapshotOps, DisallowSnapshotOps, FileInfoOps, FilesAppended, FilesCreated, FilesDeleted, FilesInGetListingOps, FilesRenamed, FsImageLoadTime, GetAdditionalDatanodeOps, GetBlockLocations, GetLinkTargetOps, GetListingOps, ListSnapshottableDirOps, RenameSnapshotOps,SafeModeTime , SnapshotDiffReportOps, SyncsAvgTime, TransactionsAvgTime, TransactionsBatchedInSync, TransactionsNumOps, tag.Context, tag.Hostname, tag.ProcessName

NameNodeInfo:BlockPoolId, BlockPoolUsedSpace, ClusterId, DeadNodes, DecomNodes, DistinctVersionCount, DistinctVersions,Free,  JournalTransactionInfo, LiveNodes, NameDirStatuses, NonDfsUsedSpace, NumberOfMissingBlocks, PercentBlockPoolUsed, PercentRemaining, PercentUsed,Safemode, Threads, Total, TotalBlocks,TotalFiles, UpgradeFinalized, Used, Version 

RpcActivityForPort9000: CallQueueLength,NumOpenConnections, ReceivedBytes,RpcAuthenticationFailures, RpcAuthenticationSuccesses, RpcAuthorizationFailures, RpcAuthorizationSuccesses, RpcProcessingTimeAvgTime,RpcProcessingTimeNumOps,   RpcQueueTimeAvgTime, RpcQueueTimeNumOps, SentBytes, tag.Context, tag.Hostname, tag.port

RpcDetailedActivityForPort9000: AddBlockAvgTime, AddBlockNumOps, BlockReceivedAndDeletedAvgTime, BlockReceivedAndDeletedNumOps, BlockReportAvgTime, BlockReportNumOps, CommitBlockSynchronizationAvgTime, CommitBlockSynchronizationNumOps, CompleteAvgTime, CompleteNumOps, CreateAvgTime, CreateNumOps, DeleteAvgTime, DeleteNumOps, FsyncAvgTime, FsyncNumOps, GetBlockLocationsAvgTime, GetBlockLocationsNumOps, GetEditLogManifestAvgTime, GetEditLogManifestNumOps, GetFileInfoAvgTime, GetFileInfoNumOps, GetListingAvgTime, GetListingNumOps, GetServerDefaultsAvgTime, GetServerDefaultsNumOps, GetTransactionIdAvgTime, GetTransactionIdNumOps, MkdirsAvgTime, MkdirsNumOps, RecoverLeaseAvgTime, RecoverLeaseNumOps, RegisterDatanodeAvgTime, RegisterDatanodeNumOps, RenameAvgTime, RenameNumOps, RenewLeaseAvgTime, RenewLeaseNumOps, RollEditLogAvgTime, RollEditLogNumOps, SendHeartbeatAvgTime, SendHeartbeatNumOps, SetSafeModeAvgTime, SetSafeModeNumOps, SetTimesAvgTime, SetTimesNumOps, UpdateBlockForPipelineAvgTime, UpdateBlockForPipelineNumOps, UpdatePipelineAvgTime, UpdatePipelineNumOps, VersionRequestAvgTime, VersionRequestNumOps, tag.Context, tag.Hostname, tag.port

JvmMetrics, MetricsSystemStats, StartupProgress, UgiMetrics (User and group): same attributes as in section 10.
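
A few FSNamesystem attributes are natural alert candidates. The following is only a sketch of such a check over the attributes read into a dict. The 80% capacity threshold and the function name are arbitrary choices for this note, not recommended values.

```python
def namenode_warnings(fs, max_used_percent=80.0):
    """Return warning strings derived from FSNamesystem attributes.

    fs: dict of attribute name -> value, as read from the FSNamesystem bean.
    max_used_percent: illustrative threshold for DFS capacity usage.
    """
    warnings = []
    # Any non-zero value here means some blocks need attention.
    for key in ("MissingBlocks", "CorruptBlocks", "UnderReplicatedBlocks"):
        if fs.get(key, 0) > 0:
            warnings.append(f"{key}={fs[key]}")
    total = fs.get("CapacityTotal", 0)
    if total > 0:
        used_percent = 100.0 * fs.get("CapacityUsed", 0) / total
        if used_percent > max_used_percent:
            warnings.append(f"CapacityUsed={used_percent:.1f}%")
    return warnings


if __name__ == "__main__":
    sample = {"MissingBlocks": 0, "CorruptBlocks": 2, "UnderReplicatedBlocks": 0,
              "CapacityTotal": 1000, "CapacityUsed": 900}
    print(namenode_warnings(sample))
```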


12. DataNode monitoring:

DataNodeActivity:BlockChecksumOpAvgTime, BlockChecksumOpNumOps,BlockReportsAvgTime,BlockReportsNumOps,BlockVerificationFailures,BlocksGetLocalPathInfo, BlocksRead, BlocksRemoved, BlocksReplicated, BlocksVerified, BlocksWritten, BytesRead,BytesWritten, CopyBlockOpAvgTime,CopyBlockOpNumOps,FlushNanosAvgTime,FlushNanosNumOps,FsyncCount,  FsyncNanosAvgTime,  FsyncNanosNumOps,  PacketAckRoundTripTimeNanosAvgTime,   PacketAckRoundTripTimeNanosNumOps, ReadBlockOpAvgTime, ReadBlockOpNumOps

DataNodeInfo:ClusterId,HttpPort,NamenodeAddresses,RpcPort,Version,VolumeInfo,XceiverCount

FSDatasetState:Capacity,DfsUsed,NumFailedVolumes,Remaining,StorageInfo

RpcActivityForPort50020:CallQueueLength,NumOpenConnections, ReceivedBytes,RpcAuthenticationFailures, RpcAuthenticationSuccesses, RpcAuthorizationFailures, RpcAuthorizationSuccesses, RpcProcessingTimeAvgTime,RpcProcessingTimeNumOps,   RpcQueueTimeAvgTime,  RpcQueueTimeNumOps, SentBytes, tag.Context, tag.Hostname,  tag.port

RpcDetailedActivityForPort50020:tag.Context, tag.Hostname, tag.port

JvmMetrics, MetricsSystemStats, StartupProgress, UgiMetrics (User and group): same attributes as in section 10.
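
Attributes such as BytesRead and BytesWritten are, as far as I know, cumulative counters since the process started, so a throughput figure needs two samples. A minimal sketch follows. The function name and the reset handling are assumptions of this note.

```python
def per_second(previous, current, seconds):
    """Average rate between two samples of a cumulative counter.

    If the counter went backwards, the process was presumably restarted,
    so only the value accumulated since the restart is counted.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = current - previous
    if delta < 0:
        delta = current
    return delta / seconds


if __name__ == "__main__":
    # BytesRead sampled 30 seconds apart.
    print(per_second(1000, 4000, 30))  # 100.0
```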


13. SecondaryNameNode monitoring:

JvmMetrics, MetricsSystemStats, StartupProgress, UgiMetrics (User and group): same attributes as in section 10.


14. HMaster monitoring:

IPC:ProcessCallTime ,QueueCallTime ,authenticationFailures,authenticationSuccesses,authorizationFailures,authorizationSuccesses,numActiveHandler,numCallsInGeneralQueue,numCallsInPriorityQueue,numCallsInReplicationQueue,numOpenConnections,queueSize,receivedBytes,sentBytes,tag.Context,tag.Hostname

AssignmentManger:Assign ,BulkAssign ,ritCount,ritCountOverThreshold,ritOldestAge,tag.Context,tag.Hostname

Balancer:BalancerCluster ,miscInvocationCount,tag.Context,tag.Hostname

FileSystem:HlogSplitSize ,HlogSplitTime ,MetaHlogSplitSize ,MetaHlogSplitTime ,tag.Context,tag.Hostname

Server:averageLoad,clusterRequests,masterActiveTime,masterStartTime,numDeadRegionServers,numRegionServers,tag.Context,tag.Hostname,tag.clusterId,tag.deadRegionServers,tag.isActiveMaster,tag.liveRegionServers,tag.serverName,tag.zookeeperQuorum

JvmMetrics, MetricsSystemStats, StartupProgress, UgiMetrics (User and group): same attributes as in section 10.


15. HRegionServer monitoring:

IPC:ProcessCallTime ,QueueCallTime ,authenticationFailures,authenticationSuccesses,authorizationFailures,authorizationSuccesses,numActiveHandler,numCallsInGeneralQueue,numCallsInPriorityQueue,numCallsInReplicationQueue,numOpenConnections,queueSize,receivedBytes,sentBytes,tag.Context,tag.Hostname

Regions:tablename_get(75th_percentile,    95th_percentile, 99th_percentile, max, mean, median, min, num_ops),  tablename_scanNext(75th_percentile,    95th_percentile, 99th_percentile, max, mean, median, min, num_ops),  coprocessorExecutionStatistics, region_appendCount,   region_compactionsCompletedCount,  region_deleteCount,  region_incrementCount,  region_memStoreSize,  region_mutateCount,  region_numBytesCompactedCount,  region_numFilesCompactedCount,  region_storeCount,  region_storeFileCount,  region_storeFileSize

Replication: tag.Context, tag.Hostname

Server:Append  ,Delete ,Get ,Increment ,Mutate ,Replay ,blockCacheCount,blockCacheEvictionCount,blockCacheExpressHitPercent,blockCacheFreeSize, blockCacheHitCount,blockCacheMissCount,blockCacheSize,blockCountHitPercent,checkMutateFailedCount,checkMutatePassedCount,compactedCellsCount,compactedCellsSize,compactionQueueLength,flushQueueLength,flushedCellsCount,flushedCellsSize,hlogFileCount,hlogFileSize,majorCompactedCellsCount,majorCompactedCellsSize,memStoreSize,mutationsWithoutWALCount,mutationsWithoutWALSize,percentFilesLocal,readRequestCount,regionCount,regionServerStartTime,slowAppendCount,slowDeleteCount,slowGetCount,slowIncrementCount,slowPutCount,staticBloomSize,staticIndexSize,storeCount,storeFileCount,storeFileIndexSize,storeFileSize,totalRequestCount,updatesBlockedTime,writeRequestCount,tag.Context,tag.Hostname,tag.clusterId, tag.serverName,tag.zookeeperQuorum

WAL: AppendSize, AppendTime, SyncTime, appendCount, slowAppendCount, tag.Context, tag.Hostname

JvmMetrics, MetricsSystemStats, StartupProgress, UgiMetrics (User and group): same attributes as in section 10.
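
The Server bean exposes blockCacheHitCount and blockCacheMissCount, from which a hit percentage can be derived. The sketch below computes it both over the whole process lifetime and for the window between two samples, which tends to say more about the current workload. It assumes both counters are cumulative, and the function names are made up for this note.

```python
def hit_percent(hits, misses):
    """Block cache hit percentage; 0.0 when there were no lookups."""
    lookups = hits + misses
    return 100.0 * hits / lookups if lookups else 0.0


def interval_hit_percent(previous, current):
    """Hit percentage for the window between two samples of the Server bean."""
    hits = current["blockCacheHitCount"] - previous["blockCacheHitCount"]
    misses = current["blockCacheMissCount"] - previous["blockCacheMissCount"]
    return hit_percent(hits, misses)


if __name__ == "__main__":
    earlier = {"blockCacheHitCount": 100, "blockCacheMissCount": 100}
    later = {"blockCacheHitCount": 190, "blockCacheMissCount": 110}
    print(hit_percent(190, 110))                 # since start
    print(interval_hit_percent(earlier, later))  # 90.0 for the window
```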


16. ZooKeeper monitoring:

ReplicatedServer_id1:Name,QuorumSize

replica.0:Name,QuorumAddress

replica.1:Name,QuorumAddress

replica.2:Name,QuorumAddress

Leader: AvgRequestLatency, ClientPort, CurrentZxid, MaxClientCnxnsPerHost, MaxRequestLatency, MaxSessionTimeout, MinRequestLatency, MinSessionTimeout, NumAliveConnections, OutstandingRequests, PacketsReceived, PacketsSent, StartTime, TickTime, Version

InMemoryDataTree:LastZxid,NodeCount,WatchCount

Connection:AvgLatency,EphemeralNodes,LastCxid,LastLatency,LastOperation,LastResponseTime,LastZxid,MaxLatency,MinLatency,OutstandingRequests, PacketsReceived,PacketsSent,SessionId,SessionTimeout,SourceIP,StartedTime


17. Thrift Server monitoring:

ThriftOne: BatchGet, BatchMutate, SlowThriftCall, ThriftCall, TimeInQueue, callQueueLen, tag.Hostname, tag.Context

ThriftTwo: same as ThriftOne

JvmMetrics, MetricsSystemStats, UgiMetrics (User and group): same attributes as in section 10.


