Auto-generating memory-related configuration for a Hadoop cluster with a Python script (Hadoop tuning)

While setting up a Hadoop cluster, I kept finding cluster processes killed for no apparent reason. It turned out that Hadoop's default settings assume 8 GB of memory per node; clusters built for learning usually run on virtual machines, and when a VM has less memory than the allocations assume, its processes get killed. Sizing the per-node memory parameters to the actual hardware is therefore essential.

Usage:

  • 1. Install a Python runtime.
  • 2. Copy the code below into a file with a .py extension (check.py in this example).
  • 3. Open a terminal and cd into the directory containing the file.
  • 4. Run python check.py -c <cores per host> -m <memory per host, in GB> -d <disks per host> -k <whether HBase is installed> (for example: python check.py -c 4 -m 4 -d 1 -k False).
  • 5. The script prints the recommended values for your hardware.
  • 6. Copy the YARN settings into yarn-site.xml (and the mapreduce.* settings into mapred-site.xml) on the cluster nodes.
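As a sanity check on step 4, the container math that the full script below performs can be condensed for the `-c 4 -m 4 -d 1 -k False` example. This is a minimal re-derivation using the same reservation table and formulas as the script, not a replacement for it:

```python
import math

GB = 1024

# Condensed version of the script's heuristic for a 4-core, 4 GB,
# 1-disk host without HBase.
cores, memory_gb, disks = 4, 4, 1
reserved = 1            # OS/daemon reservation for a 4 GB host (reservedStack[4])
min_container_mb = 256  # minimum container size for hosts with <= 4 GB

usable_mb = (memory_gb - reserved) * GB
containers = int(min(2 * cores,
                     math.ceil(1.8 * disks),
                     usable_mb / min_container_mb))
containers = max(containers, 3)          # the script floors the count at 3
container_ram = usable_mb // containers  # MB per container

print(containers, container_ram)  # number of containers, MB per container
```

With 3 GB usable, the disk term (ceil(1.8 * 1) = 2) is the binding limit, which the script then raises to the 3-container floor, giving 1024 MB containers.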

Parameter descriptions

  • yarn.scheduler.minimum-allocation-mb - minimum memory that can be requested per container, in MB
  • yarn.scheduler.maximum-allocation-mb - maximum memory that can be requested per container, in MB
  • yarn.nodemanager.resource.memory-mb - total physical memory available to the NodeManager, in MB; once set, it cannot be changed dynamically
  • mapreduce.map.memory.mb - physical memory for each map task
  • mapreduce.map.java.opts - JVM options for each map task
  • mapreduce.reduce.memory.mb - physical memory for each reduce task
  • mapreduce.reduce.java.opts - JVM options for each reduce task
  • yarn.app.mapreduce.am.resource.mb=2048 - memory allocated to the MapReduce ApplicationMaster
  • yarn.app.mapreduce.am.command-opts=-Xmx1638m - JVM options for the ApplicationMaster
  • mapreduce.task.io.sort.mb=409 - size of the map task sort buffer, in MB
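Applied to yarn-site.xml, the YARN settings for the `-c 4 -m 4 -d 1 -k False` example would look like the fragment below (the mapreduce.* keys go into mapred-site.xml instead; the values are just that example's output, not general recommendations):

```xml
<!-- yarn-site.xml: sample values from the 4-core / 4 GB / 1-disk run -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>3072</value>
</property>
```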
The code:
#!/usr/bin/env python
import optparse
from pprint import pprint
import logging
import sys
import math
import ast

''' Reserved for OS + DN + NM,  Map: Memory => Reservation '''
reservedStack = {4: 1, 8: 2, 16: 2, 24: 4, 48: 6, 64: 8, 72: 8, 96: 12,
                 128: 24, 256: 32, 512: 64}
''' Reserved for HBase. Map: Memory => Reservation '''
reservedHBase = {4: 1, 8: 1, 16: 2, 24: 4, 48: 8, 64: 8, 72: 8, 96: 16,
                 128: 24, 256: 32, 512: 64}
GB = 1024


def getMinContainerSize(memory):
    if (memory <= 4):
        return 256
    elif (memory <= 8):
        return 512
    elif (memory <= 24):
        return 1024
    else:
        return 2048


def getReservedStackMemory(memory):
    # dict.has_key() was removed in Python 3; use the `in` operator instead
    if (memory in reservedStack):
        return reservedStack[memory]
    if (memory <= 4):
        ret = 1
    elif (memory >= 512):
        ret = 64
    else:
        ret = 1
    return ret


def getReservedHBaseMem(memory):
    if (memory in reservedHBase):
        return reservedHBase[memory]
    if (memory <= 4):
        ret = 1
    elif (memory >= 512):
        ret = 64
    else:
        ret = 2
    return ret


def main():
    log = logging.getLogger(__name__)
    out_hdlr = logging.StreamHandler(sys.stdout)
    out_hdlr.setFormatter(logging.Formatter(' %(message)s'))
    out_hdlr.setLevel(logging.INFO)
    log.addHandler(out_hdlr)
    log.setLevel(logging.INFO)
    parser = optparse.OptionParser()
    memory = 0
    cores = 0
    disks = 0
    hbaseEnabled = True
    parser.add_option('-c', '--cores', default=2,
                      help='Number of cores on each host')
    parser.add_option('-m', '--memory', default=2,
                      help='Amount of Memory on each host in GB')
    parser.add_option('-d', '--disks', default=1,
                      help='Number of disks on each host')
    parser.add_option('-k', '--hbase', default="False",
                      help='True if HBase is installed, False if not')
    (options, args) = parser.parse_args()

    cores = int(options.cores)
    memory = int(options.memory)
    disks = int(options.disks)
    hbaseEnabled = ast.literal_eval(options.hbase)

    log.info("Using cores=" + str(cores) + " memory=" + str(memory) + "GB" +
             " disks=" + str(disks) + " hbase=" + str(hbaseEnabled))
    minContainerSize = getMinContainerSize(memory)
    reservedStackMemory = getReservedStackMemory(memory)
    reservedHBaseMemory = 0
    if (hbaseEnabled):
        reservedHBaseMemory = getReservedHBaseMem(memory)
    reservedMem = reservedStackMemory + reservedHBaseMemory
    usableMem = memory - reservedMem
    memory -= (reservedMem)
    if (memory < 2):
        memory = 2
        reservedMem = max(0, memory - reservedMem)

    memory *= GB

    containers = int(min(2 * cores,
                         math.ceil(1.8 * float(disks)),
                         memory // minContainerSize))
    if (containers <= 2):
        containers = 3

    log.info("Profile: cores=" + str(cores) + " memory=" + str(memory) + "MB"
             + " reserved=" + str(reservedMem) + "GB" + " usableMem="
             + str(usableMem) + "GB" + " disks=" + str(disks))

    # integer division keeps the result an int under Python 3 (abs() was a no-op)
    container_ram = memory // containers
    if (container_ram > GB):
        container_ram = int(math.floor(container_ram / 512)) * 512
    log.info("Num Container=" + str(containers))
    log.info("Container Ram=" + str(container_ram) + "MB")
    log.info("Used Ram=" + str(int(containers * container_ram / float(GB))) + "GB")
    log.info("Unused Ram=" + str(reservedMem) + "GB")
    log.info("yarn.scheduler.minimum-allocation-mb=" + str(container_ram))
    log.info("yarn.scheduler.maximum-allocation-mb=" + str(containers * container_ram))
    log.info("yarn.nodemanager.resource.memory-mb=" + str(containers * container_ram))
    map_memory = container_ram
    reduce_memory = 2 * container_ram if (container_ram <= 2048) else container_ram
    am_memory = max(map_memory, reduce_memory)
    log.info("mapreduce.map.memory.mb=" + str(map_memory))
    log.info("mapreduce.map.java.opts=-Xmx" + str(int(0.8 * map_memory)) + "m")
    log.info("mapreduce.reduce.memory.mb=" + str(reduce_memory))
    log.info("mapreduce.reduce.java.opts=-Xmx" + str(int(0.8 * reduce_memory)) + "m")
    log.info("yarn.app.mapreduce.am.resource.mb=" + str(am_memory))
    log.info("yarn.app.mapreduce.am.command-opts=-Xmx" + str(int(0.8 * am_memory)) + "m")
    log.info("mapreduce.task.io.sort.mb=" + str(int(0.4 * map_memory)))


if __name__ == '__main__':
    try:
        main()
    except(KeyboardInterrupt, EOFError):
        print("\nAborting ... Keyboard Interrupt.")
        sys.exit(1)