【Ambari】Setting YARN queue resources to absolute values [memory=10240,vcores=12,yarn.io/gpu=4]

1. Introduction

Hadoop 3.0 and later support specifying YARN queue resources (memory, vcores, and GPU cores) as absolute values; 2.x versions can only assign each queue a percentage of the cluster's memory resources.

Because total cluster capacity can vary, resource configuration values can be expressed as percentages.
The following capacity-scheduler.xml settings split cluster resources in a 6:1:3 ratio: engineering 60%, support 10%, marketing 30%.

Property: yarn.scheduler.capacity.root.engineering.capacity
Value: 60

Property: yarn.scheduler.capacity.root.support.capacity
Value: 10

Property: yarn.scheduler.capacity.root.marketing.capacity
Value: 30

To assign an absolute amount of resources to each queue instead, configure the values as follows:

Property: yarn.scheduler.capacity.root.engineering.capacity
Value: [memory=10240,vcores=12,yarn.io/gpu=4]

Property: yarn.scheduler.capacity.root.support.capacity
Value: [memory=10240,vcores=12,yarn.io/gpu=4]

Property: yarn.scheduler.capacity.root.marketing.capacity
Value: [memory=10240,vcores=12,yarn.io/gpu=4]

If a memory or vcores value is not provided, the resource value of the parent queue is used.

To make queue resources elastic:
Specify maximum capacity as a floating-point percentage value. Maximum capacity must be set higher than or equal to each queue's absolute capacity. Setting the value to -1 sets maximum capacity to 100%. In the example below, the engineering queue's maximum capacity is set to 70%.

Property: yarn.scheduler.capacity.root.engineering.maximum-capacity
Value: 70

In addition, capacity-scheduler.xml controls how resources are calculated via yarn.scheduler.capacity.resource-calculator:

  • org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator: the default; only memory is considered during scheduling
  • org.apache.hadoop.yarn.util.resource.DominantResourceCalculator: takes both memory and CPU into account
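
Since DefaultResourceCalculator schedules on memory alone, switching to DominantResourceCalculator is generally what lets the vcores (and yarn.io/gpu) parts of the absolute values above actually influence scheduling. A minimal sketch of that setting, in the same Property/Value form used above:

Property: yarn.scheduler.capacity.resource-calculator
Value: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator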

2. API steps for modifying queue resources

2.1 Get the tag and version of the capacity-scheduler config

GET /api/v1/clusters/{clusterName}?fields=Clusters/desired_configs/{configName}

curl -X GET \
  'http://10.211.55.20:8080/api/v1/clusters/ctest?fields=Clusters/desired_configs/capacity-scheduler' \
  -H 'X-Requested-By: ambari' \
  --user 'admin:admin'

Response:
{
  "href" : "http://10.211.55.20:8080/api/v1/clusters/ctest?fields=Clusters/desired_configs/capacity-scheduler",
  "Clusters" : {
    "cluster_name" : "ctest",
    "desired_configs" : {
      "capacity-scheduler" : {
        "tag" : "version1574727396161",
        "version" : 21
      }
    }
  }
}
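
If these calls are scripted, the tag and version can be pulled out of the response with jq, for example (a sketch assuming jq is installed; host, cluster, and credentials are the same as above):

curl -s -X GET \
  'http://10.211.55.20:8080/api/v1/clusters/ctest?fields=Clusters/desired_configs/capacity-scheduler' \
  -H 'X-Requested-By: ambari' \
  --user 'admin:admin' \
  | jq -r '.Clusters.desired_configs["capacity-scheduler"].tag'
# prints e.g. version1574727396161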

2.2 Get the specified version of the capacity-scheduler config

GET /api/v1/clusters/{clusterName}/configurations?type={configName}&tag={tagVersion}

curl -X GET \
  'http://10.211.55.20:8080/api/v1/clusters/ctest/configurations?type=capacity-scheduler&tag=version1574727396161' \
  -H 'X-Requested-By: ambari' \
  --user 'admin:admin'

Response:
{
  "href" : "http://10.211.55.20:8080/api/v1/clusters/ctest/configurations?type=capacity-scheduler&tag=version1574727396161",
  "items" : [
    {
      "href" : "http://10.211.55.20:8080/api/v1/clusters/ctest/configurations?type=capacity-scheduler&tag=version1574727396161",
      "tag" : "version1574727396161",
      "type" : "capacity-scheduler",
      "version" : 21,
      "Config" : {
        "cluster_name" : "ctest",
        "stack_id" : "HDP-3.1"
      },
      "properties" : {
        "yarn.scheduler.capacity.maximum-am-resource-percent" : "0.2",
        "yarn.scheduler.capacity.maximum-applications" : "10000",
        "yarn.scheduler.capacity.node-locality-delay" : "40",
        "yarn.scheduler.capacity.queue-mappings-override.enable" : "false",
        "yarn.scheduler.capacity.resource-calculator" : "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator",
        "yarn.scheduler.capacity.root.accessible-node-labels" : "*",
        "yarn.scheduler.capacity.root.acl_administer_queue" : "*",
        "yarn.scheduler.capacity.root.acl_submit_applications" : "*",
        "yarn.scheduler.capacity.root.capacity" : "100",
        "yarn.scheduler.capacity.root.default.acl_submit_applications" : "*",
        "yarn.scheduler.capacity.root.default.capacity" : "[memory=1024,vcores=1]",
        "yarn.scheduler.capacity.root.default.maximum-capacity" : "[memory=2048,vcores=2]",
        "yarn.scheduler.capacity.root.default.priority" : "0",
        "yarn.scheduler.capacity.root.default.state" : "RUNNING",
        "yarn.scheduler.capacity.root.default.user-limit-factor" : "1",
        "yarn.scheduler.capacity.root.bj.acl_administer_queue" : "*",
        "yarn.scheduler.capacity.root.bj.acl_submit_applications" : "*",
        "yarn.scheduler.capacity.root.bj.capacity" : "[memory=1024,vcores=1]",
        "yarn.scheduler.capacity.root.bj.maximum-capacity" : "[memory=2048,vcores=2]",
        "yarn.scheduler.capacity.root.bj.minimum-user-limit-percent" : "100",
        "yarn.scheduler.capacity.root.bj.ordering-policy" : "fifo",
        "yarn.scheduler.capacity.root.bj.priority" : "0",
        "yarn.scheduler.capacity.root.bj.state" : "RUNNING",
        "yarn.scheduler.capacity.root.bj.user-limit-factor" : "1",
        "yarn.scheduler.capacity.root.hb.acl_administer_queue" : "*",
        "yarn.scheduler.capacity.root.hb.acl_submit_applications" : "*",
        "yarn.scheduler.capacity.root.hb.capacity" : "[memory=1024,vcores=1]",
        "yarn.scheduler.capacity.root.hb.maximum-capacity" : "[memory=2048,vcores=2]",
        "yarn.scheduler.capacity.root.hb.minimum-user-limit-percent" : "100",
        "yarn.scheduler.capacity.root.hb.ordering-policy" : "fifo",
        "yarn.scheduler.capacity.root.hb.priority" : "0",
        "yarn.scheduler.capacity.root.hb.state" : "RUNNING",
        "yarn.scheduler.capacity.root.hb.user-limit-factor" : "1",
        "yarn.scheduler.capacity.root.priority" : "0",
        "yarn.scheduler.capacity.root.queues" : "default,bj,hb"
      }
    }
  ]
}
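
Since the PUT in step 2.4 sends the whole property set back, it can be convenient to capture the current properties from this response, for example (a sketch assuming jq is installed):

curl -s -X GET \
  'http://10.211.55.20:8080/api/v1/clusters/ctest/configurations?type=capacity-scheduler&tag=version1574727396161' \
  -H 'X-Requested-By: ambari' \
  --user 'admin:admin' \
  | jq '.items[0].properties' > capacity-scheduler-properties.json
# edit the saved properties, then send them back with the PUT in step 2.4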

2.3 Get allocatable YARN resources via the Cluster Metrics API

That is, how much memory and how many vcores the cluster still has available to allocate.

GET /ws/v1/cluster/metrics

curl -X GET \
  http://10.211.55.20:8088/ws/v1/cluster/metrics \
  -H 'X-Requested-By: ambari'

Response:

{
    "clusterMetrics": {
        "appsSubmitted": 1,
        "appsCompleted": 0,
        "appsPending": 0,
        "appsRunning": 0,
        "appsFailed": 0,
        "appsKilled": 1,
        "reservedMB": 0,
        "availableMB": 15360,
        "allocatedMB": 0,
        "reservedVirtualCores": 0,
        "availableVirtualCores": 6,
        "allocatedVirtualCores": 0,
        "containersAllocated": 0,
        "containersReserved": 0,
        "containersPending": 0,
        "totalMB": 15360,
        "totalVirtualCores": 6,
        "totalNodes": 1,
        "lostNodes": 0,
        "unhealthyNodes": 0,
        "decommissioningNodes": 0,
        "decommissionedNodes": 0,
        "rebootedNodes": 0,
        "activeNodes": 1,
        "shutdownNodes": 0,
        "totalUsedResourcesAcrossPartition": {
            "memory": 0,
            "vCores": 0,
            "resourceInformations": {
                "resourceInformation": [
                    {
                        "maximumAllocation": 9223372036854775807,
                        "minimumAllocation": 0,
                        "name": "memory-mb",
                        "resourceType": "COUNTABLE",
                        "units": "Mi",
                        "value": 0
                    },
                    {
                        "maximumAllocation": 9223372036854775807,
                        "minimumAllocation": 0,
                        "name": "vcores",
                        "resourceType": "COUNTABLE",
                        "units": "",
                        "value": 0
                    }
                ]
            }
        },
        "totalClusterResourcesAcrossPartition": {
            "memory": 15360,
            "vCores": 6,
            "resourceInformations": {
                "resourceInformation": [
                    {
                        "maximumAllocation": 9223372036854775807,
                        "minimumAllocation": 0,
                        "name": "memory-mb",
                        "resourceType": "COUNTABLE",
                        "units": "Mi",
                        "value": 15360
                    },
                    {
                        "maximumAllocation": 9223372036854775807,
                        "minimumAllocation": 0,
                        "name": "vcores",
                        "resourceType": "COUNTABLE",
                        "units": "",
                        "value": 6
                    }
                ]
            }
        }
    }
}


The key information to pull out of this response (extraction sketched below):

  • Memory: total, used, available, and so on
  • vcores: total, used, available, and so on
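
As a sketch (assuming jq is installed), the totals and available figures can be extracted like this:

curl -s -X GET http://10.211.55.20:8088/ws/v1/cluster/metrics \
  | jq '.clusterMetrics | {totalMB, availableMB, allocatedMB, totalVirtualCores, availableVirtualCores, allocatedVirtualCores}'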

2.4 Modify the configuration

Notes:

  • The request body is sent as text/plain, not JSON.
  • The tag must not already exist; a timestamp such as date.getTime() works well,
    e.g. "tag": "version1574670680695" (see the sketch after this list).
  • The sum of all child queues' resources must equal the parent queue's resources. One way to manage this is to first assign all resources to the default queue and treat it as a resource pool: when a queue is added, carve its resources out of default; when a queue is deleted, return its resources to default. (This is just one approach to managing queue resources.)
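
A minimal sketch for generating such a tag in a shell script (the %3N millisecond suffix assumes GNU date; any scheme that never reuses an existing tag will do):

# build a tag from the current time in milliseconds, mirroring date.getTime()
TAG="version$(date +%s%3N)"
echo "$TAG"   # e.g. version1574670680695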

PUT /api/v1/clusters/{clusterName}

curl -X PUT \
  http://10.211.55.20:8080/api/v1/clusters/ctest \
  -H 'X-Requested-By: ambari' \
  --user 'admin:admin' \
  -d '{
    "Clusters": {
        "desired_config": {
            "tag": "version1574670680695",
            "type": "capacity-scheduler",
            "properties": {
                "yarn.scheduler.capacity.maximum-am-resource-percent": "0.2",
                "yarn.scheduler.capacity.maximum-applications": "10000",
                "yarn.scheduler.capacity.node-locality-delay": "40",
                "yarn.scheduler.capacity.queue-mappings-override.enable": "false",
                "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator",
                "yarn.scheduler.capacity.root.accessible-node-labels": "*",
                "yarn.scheduler.capacity.root.acl_administer_queue": "*",
                "yarn.scheduler.capacity.root.acl_submit_applications": "*",
                "yarn.scheduler.capacity.root.capacity": "[memory=10240,vcores=10]",
                "yarn.scheduler.capacity.root.default.acl_submit_applications": "*",
                "yarn.scheduler.capacity.root.default.capacity": "[memory=1024,vcores=1]",
                "yarn.scheduler.capacity.root.default.maximum-capacity": "[memory=2048,vcores=2]",
                "yarn.scheduler.capacity.root.default.priority": "0",
                "yarn.scheduler.capacity.root.default.state": "RUNNING",
                "yarn.scheduler.capacity.root.default.user-limit-factor": "1",
                "yarn.scheduler.capacity.root.bj.acl_administer_queue": "*",
                "yarn.scheduler.capacity.root.bj.acl_submit_applications": "*",
                "yarn.scheduler.capacity.root.bj.capacity": "[memory=1024,vcores=1]",
                "yarn.scheduler.capacity.root.bj.maximum-capacity": "[memory=2048,vcores=2]",
                "yarn.scheduler.capacity.root.bj.minimum-user-limit-percent": "100",
                "yarn.scheduler.capacity.root.bj.ordering-policy": "fifo",
                "yarn.scheduler.capacity.root.bj.priority": "0",
                "yarn.scheduler.capacity.root.bj.state": "RUNNING",
                "yarn.scheduler.capacity.root.bj.user-limit-factor": "1",
                "yarn.scheduler.capacity.root.hb.acl_administer_queue": "*",
                "yarn.scheduler.capacity.root.hb.acl_submit_applications": "*",
                "yarn.scheduler.capacity.root.hb.capacity": "[memory=1024,vcores=1]",
                "yarn.scheduler.capacity.root.hb.maximum-capacity": "[memory=2048,vcores=2]",
                "yarn.scheduler.capacity.root.hb.minimum-user-limit-percent": "100",
                "yarn.scheduler.capacity.root.hb.ordering-policy": "fifo",
                "yarn.scheduler.capacity.root.hb.priority": "0",
                "yarn.scheduler.capacity.root.hb.state": "RUNNING",
                "yarn.scheduler.capacity.root.hb.user-limit-factor": "1",
                "yarn.scheduler.capacity.root.priority": "0",
                "yarn.scheduler.capacity.root.queues": "default,bj,hb"
            }
        }
    }
}'
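
After the PUT succeeds, re-running the request from step 2.1 is a quick way to confirm that a new desired config was created; the tag should match the one just submitted and the version should have moved past 21 (a sketch assuming jq is installed):

curl -s -X GET \
  'http://10.211.55.20:8080/api/v1/clusters/ctest?fields=Clusters/desired_configs/capacity-scheduler' \
  -H 'X-Requested-By: ambari' \
  --user 'admin:admin' \
  | jq '.Clusters.desired_configs["capacity-scheduler"]'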

2.5 Refresh the queue configuration

Note: the request body is text/plain, not JSON.

POST /api/v1/clusters/{clusterName}/requests

curl -X POST \
  http://10.211.55.20:8080/api/v1/clusters/ctest/requests \
  -H 'X-Requested-By: ambari' \
  --user 'admin:admin' \
  -d '{
    "Requests/resource_filters": [
        {
            "service_name": "YARN",
            "hosts": "host-10-211-55-20",
            "component_name": "RESOURCEMANAGER"
        }
    ],
    "RequestInfo": {
        "parameters/forceRefreshConfigTags": "capacity-scheduler",
        "context": "Refresh YARN Capacity Scheduler",
        "command": "REFRESHQUEUES"
    }
}'
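
This runs as an asynchronous Ambari request, so the response should carry a request id and href under Requests. If a script needs to wait for the refresh to finish, polling that request is one option (a sketch; the id 123 is only an example of what the POST might return):

curl -s -X GET \
  'http://10.211.55.20:8080/api/v1/clusters/ctest/requests/123' \
  -H 'X-Requested-By: ambari' \
  --user 'admin:admin' \
  | jq -r '.Requests.request_status'
# e.g. COMPLETED once the REFRESHQUEUES command has finished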

2.6 Get the scheduler info to see each queue's usage

The scheduler resource contains information about the scheduler currently configured in the cluster. It currently supports the Fifo and Capacity Schedulers. The information returned differs depending on which scheduler is configured, so be sure to check the type field.

The Capacity Scheduler supports hierarchical queues; this request prints information about every queue and all of its child queues.
Taking capacityScheduler as an example:

GET /ws/v1/cluster/scheduler

curl -X GET \
  http://10.211.55.20:8088/ws/v1/cluster/scheduler

Response (abridged; only the default queue is kept):
{
    "scheduler": {
        "schedulerInfo": {
            "type": "capacityScheduler",
            "capacity": 100.0,
            "usedCapacity": 33.333336,
            "maxCapacity": 100.0,
            "queueName": "root",
            "queues": {
                "queue": [
                    {
                        "type": "capacitySchedulerLeafQueueInfo",
                        "capacity": 59.98047,
                        "usedCapacity": 55.573647,
                        "maxCapacity": 59.98047,
                        "absoluteCapacity": 59.98047,
                        "absoluteMaxCapacity": 59.98047,
                        "absoluteUsedCapacity": 33.333336,
                        "numApplications": 14,
                        "queueName": "default",
                        "state": "RUNNING",
                        "resourcesUsed": {
                            "memory": 5120,
                            "vCores": 2,
                            "resourceInformations": {
                                "resourceInformation": [
                                    {
                                        "maximumAllocation": 9223372036854775807,
                                        "minimumAllocation": 0,
                                        "name": "memory-mb",
                                        ...
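
Per-queue usage can then be pulled out of this response in the same way as the earlier calls, for example (a sketch assuming jq is installed):

curl -s -X GET http://10.211.55.20:8088/ws/v1/cluster/scheduler \
  | jq '.scheduler.schedulerInfo.queues.queue[] | {queueName, state, capacity, usedCapacity, resourcesUsed}'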