(适用于hadoop 2.7及以上版本)
涉及到RESTful API
-
ResourceManager REST API’s:
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html -
WebHDFS REST API:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/WebHDFS.html -
MapReduce History Server REST API’s:
https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html -
Spark Monitoring and Instrumentation
http://spark.apache.org/docs/latest/monitoring.html
1. 统计HDFS文件系统实时使用情况
-
URL
http://emr-header-1:50070/webhdfs/v1/?user.name=hadoop&op=GETCONTENTSUMMARY -
返回结果:
{ "ContentSummary": { "directoryCount": 2, "fileCount" : 1, "length" : 24930, "quota" : -1, "spaceConsumed" : 24930, "spaceQuota" : -1 } }
-
关于返回结果的说明:
{ "name" : "ContentSummary", "properties": { "ContentSummary": { "type" : "object", "properties": { "directoryCount": { "description": "The number of directories.", "type" : "integer", "required" : true }, "fileCount": { "description": "The number of files.", "type" : "integer", "required" : true }, "length": { "description": "The number of bytes used by the content.", "type" : "integer", "required" : true }, "quota": { "description": "The namespace quota of this directory.", "type" : "integer", "required" : true }, "spaceConsumed": { "description": "The disk space consumed by the content.", "type" : "integer", "required" : true }, "spaceQuota": { "description": "The disk space quota.", "type" : "integer", "required" : true } } } } }
-
注意length与spaceConsumed的关系,跟hdfs副本数有关。
-
如果要统计各个组工作目录的使用情况,使用如下请求:
http://emr-header-1:50070/webhdfs/v1/user/feed_aliyun?user.name=hadoop&op=GETCONTENTSUMMARY