Hadoop Cluster Installation and Deployment: Operating System Tuning

停不下的脚步

Published 2020-09-10 16:18:25


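Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/mylittlered/article/details/108517153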


Changes to make to the operating system before deploying Hadoop:

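1. Disable access-time (atime) updates on the disks. Reads then no longer trigger a metadata write, which can significantly improve disk I/O: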

https://www.cnblogs.com/sunss/archive/2010/09/09/1822300.html
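In practice this means adding the `noatime` option to the mount-options column (the 4th field) of /etc/fstab for each data disk. A minimal sketch, assuming whitespace-separated fstab entries and a hypothetical data-disk mount point /data1; it prints the modified file instead of editing it in place, so the result can be reviewed first:

```shell
#!/usr/bin/env bash
# add_noatime FSTAB MOUNTPOINT
# Prints FSTAB with "noatime" appended to the mount options (4th field) of the
# entry whose mount point (2nd field) equals MOUNTPOINT. Entries that already
# have noatime, comments, and blank lines are printed unchanged.
add_noatime() {
  local fstab="$1" mnt="$2"
  awk -v mnt="$mnt" '
    /^[ \t]*#/ || NF == 0 { print; next }
    $2 == mnt && index("," $4 ",", ",noatime,") == 0 { $4 = $4 ",noatime" }
    { print }
  ' "$fstab"
}
```

For example, `add_noatime /etc/fstab /data1 > /tmp/fstab.new`, then copy it over /etc/fstab after checking it; `mount -o remount,noatime /data1` applies the option without a reboot.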

2. On non-OS (data) disks, do not reserve any space for the root user (ext3/ext4 reserve 5% of the blocks by default):

# Set the reserved space when creating the file system

$ mkfs.ext3 -m 0 /dev/sdb

# or tune the existing file system afterwards

$ tune2fs -m 0 /dev/sdb

Note: Apply this only to data disks, never to the OS disk.
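To confirm the change, `tune2fs -l` prints the superblock fields, including "Block count" and "Reserved block count". A small sketch, assuming those e2fsprogs field labels, that turns them into a percentage (it should print 0.00 after `-m 0`):

```shell
#!/usr/bin/env bash
# reserved_pct: read `tune2fs -l` output on stdin and print the reserved
# blocks as a percentage of all blocks, with two decimals.
reserved_pct() {
  awk -F: '
    /^Block count/          { total = $2 + 0 }      # "+ 0" forces a number
    /^Reserved block count/ { reserved = $2 + 0 }
    END { if (total > 0) printf "%.2f\n", 100 * reserved / total }
  '
}
```

Usage: `tune2fs -l /dev/sdb | reserved_pct`.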

3. Increase file-handle and process limits

# Set file handles higher (default is 1024)

$ echo hdfs - nofile 32768 >> /etc/security/limits.conf

$ echo mapred - nofile 32768 >> /etc/security/limits.conf

$ echo hbase - nofile 32768 >> /etc/security/limits.conf

# Set process limits higher

$ echo hdfs - nproc 32768 >> /etc/security/limits.conf

$ echo mapred - nproc 32768 >> /etc/security/limits.conf

$ echo hbase - nproc 32768 >> /etc/security/limits.conf
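Running the echo lines twice appends duplicate entries. A sketch of an idempotent version that writes the same six entries (users and the 32768 value taken from the commands above) only when they are missing; the file argument is an addition that allows a dry run against a scratch copy:

```shell
#!/usr/bin/env bash
# set_hadoop_limits [LIMITS_FILE]
# Appends "<user> - nofile 32768" and "<user> - nproc 32768" for the hdfs,
# mapred and hbase users, skipping any line that is already present.
set_hadoop_limits() {
  local limits_file="${1:-/etc/security/limits.conf}"
  local user item line
  for user in hdfs mapred hbase; do
    for item in nofile nproc; do
      line="$user - $item 32768"
      # -q quiet, -x whole-line match, -F fixed string (no regex)
      grep -qxF -- "$line" "$limits_file" 2>/dev/null \
        || printf '%s\n' "$line" >> "$limits_file"
    done
  done
}
```

After running `set_hadoop_limits` as root, a new login session should show the raised limits, e.g. `su -s /bin/bash hdfs -c 'ulimit -n; ulimit -u'` should report 32768 twice.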

4. Reduce swappiness

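The operating system moves inactive memory pages out to disk, which is what we usually call swapping. For HDFS and HBase this degrades performance, so we tune the kernel to use swap as little as possible.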

# Ad hoc setting, works temporarily

$ echo 1 > /proc/sys/vm/swappiness

# Persist setting across restarts

$ echo "vm.swappiness = 1" >> /etc/sysctl.conf
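/proc/sys/vm/swappiness reflects the running kernel, while /etc/sysctl.conf is what gets applied at boot (or immediately with `sysctl -p`). A sketch of a helper that reads the persistent value back from a sysctl.conf-style file so the two can be compared; it only looks at the one file given, not at /etc/sysctl.d/:

```shell
#!/usr/bin/env bash
# sysctl_conf_value FILE KEY: print the value FILE assigns to KEY.
# sysctl applies a file top to bottom, so the last assignment wins.
sysctl_conf_value() {
  awk -F= -v key="$2" '
    /^[ \t]*[#;]/ { next }                # skip comment lines
    {
      k = $1; gsub(/[ \t]/, "", k)        # strip whitespace around the key
      if (k == key) {
        v = $2
        sub(/^[ \t]+/, "", v)             # trim leading whitespace
        sub(/[ \t]+$/, "", v)             # trim trailing whitespace
        val = v
      }
    }
    END { if (val != "") print val }
  ' "$1"
}
```

Usage: compare `cat /proc/sys/vm/swappiness` with `sysctl_conf_value /etc/sysctl.conf vm.swappiness`; both should print 1.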

5. Enable time synchronization

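To keep host clocks from drifting apart over long periods of uptime, time synchronization is essential. On Linux, an NTP service is normally used to keep the clocks of different machines in sync.

Because the cluster has no Internet access, one node inside the cluster must be set up as the NTP server.

Synchronized time on all cluster nodes is critical for applications such as ZooKeeper, Kerberos, and HBase. It also matters when troubleshooting the cluster from log files, since timestamps from different nodes have to line up.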

https://blog.csdn.net/zyddj123/article/details/86560921

$ yum install ntp

$ systemctl enable ntpd

$ systemctl start ntpd
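A rough sketch of the two roles, assuming the classic ntpd and its /etc/ntp.conf: the internal server has no upstream source, so it serves its own local clock (the 127.127.1.0 reference driver), and every other node points at that server. The function name and the IP addresses below are hypothetical:

```shell
#!/usr/bin/env bash
# gen_ntp_conf ROLE UPSTREAM NET MASK  (prints an ntp.conf to stdout)
#   ROLE=server: internal NTP server, no upstream, serves subnet NET/MASK
#   ROLE=client: any other node, syncs from UPSTREAM
gen_ntp_conf() {
  local role="$1" upstream="$2" net="$3" mask="$4"
  echo "driftfile /var/lib/ntp/drift"
  echo "restrict 127.0.0.1"
  if [ "$role" = server ]; then
    # Cluster nodes may sync time but not modify the server's configuration
    echo "restrict $net mask $mask nomodify notrap"
    echo "server 127.127.1.0"                 # local clock reference driver
    echo "fudge 127.127.1.0 stratum 10"       # mark it as a low-quality source
  else
    echo "server $upstream iburst"            # iburst speeds up the first sync
  fi
}
```

For example, `gen_ntp_conf client 192.168.1.10 > /etc/ntp.conf` on each worker, then `systemctl restart ntpd`; `ntpq -p` lists the peer and its offset. Because the server uses its local clock, the nodes agree with each other but not necessarily with real-world time.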

6. Enable advanced network settings

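The maximum transmission unit (MTU) is typically 1,500 bytes.

Set the MTU to the jumbo-frame maximum of 9000 bytes; every NIC and switch on the cluster network must support jumbo frames for this to work.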

# vi /etc/sysconfig/network-scripts/ifcfg-eth0

# Add the following line

MTU="9000"

# Save, then restart the network service for the change to take effect

# service network restart

https://blog.csdn.net/scut845975092/article/details/50570949
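Editing ifcfg-eth0 only changes the local interface. A quick end-to-end check is a ping with the don't-fragment flag set and a payload sized so the whole IPv4 packet is exactly the MTU: 9000 minus the 20-byte IP header and the 8-byte ICMP header gives 8972 bytes. A minimal sketch, assuming iputils ping on Linux; the function names and the host name in the usage line are hypothetical:

```shell
#!/usr/bin/env bash
# icmp_payload_for_mtu MTU: largest ICMP payload that fits in one
# unfragmented IPv4 packet (MTU - 20-byte IP header - 8-byte ICMP header).
icmp_payload_for_mtu() {
  echo $(( $1 - 20 - 8 ))
}

# check_jumbo HOST [MTU]: send 3 full-size pings with fragmentation
# prohibited (-M do); they fail if any hop on the path has a smaller MTU.
check_jumbo() {
  local host="$1" mtu="${2:-9000}"
  ping -c 3 -M do -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}
```

`cat /sys/class/net/eth0/mtu` shows the interface's current MTU; `check_jumbo datanode02` either reports an error or loses every packet when jumbo frames are not supported along the path.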

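7. Disable IPv6

Because the Hadoop cluster sits on an internal network, where IPv4 provides more than enough addresses for all nodes, disabling IPv6 avoids the overhead of handling two address families.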

$ echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf

$ echo "net.ipv6.conf.default.disable_ipv6 = 1" >> /etc/sysctl.conf
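These lines take effect at the next boot, or immediately after reloading the file with `sysctl -p`. Each interface then reports its state in /proc/sys/net/ipv6/conf/<interface>/disable_ipv6. A minimal sketch that checks all of them; the optional root-directory argument is only there so the function can be tried against a fake tree:

```shell
#!/usr/bin/env bash
# ipv6_disabled_everywhere [PROC_ROOT]: succeed only if every entry under
# PROC_ROOT/sys/net/ipv6/conf reports disable_ipv6 = 1.
ipv6_disabled_everywhere() {
  local root="${1:-/proc}" f
  for f in "$root"/sys/net/ipv6/conf/*/disable_ipv6; do
    [ -e "$f" ] || continue        # no IPv6 entries at all: nothing to check
    if [ "$(cat "$f")" != 1 ]; then
      echo "IPv6 still enabled: $f"
      return 1
    fi
  done
  return 0
}
```

Usage: `sysctl -p && ipv6_disabled_everywhere`.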

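8. Enable the nscd service on Linux to cache DNS lookups.

Linux (glibc) does not cache DNS lookups by itself; to get a DNS cache you have to install a service such as NSCD (name service cache daemon).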

https://developer.aliyun.com/article/516528

TODO: is this needed for small and medium-sized clusters?
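nscd is installed and started the same way as ntpd (`yum install nscd`, `systemctl enable nscd`, `systemctl start nscd`). Whether host-name lookups are cached is controlled by the `enable-cache hosts` line in /etc/nscd.conf; a small sketch (hypothetical helper name) that checks it:

```shell
#!/usr/bin/env bash
# hosts_cache_enabled [CONF]: succeed if the last "enable-cache hosts ..."
# line in CONF says "yes". Commented-out lines start with "#" and are ignored.
hosts_cache_enabled() {
  awk '$1 == "enable-cache" && $2 == "hosts" { v = $3 }
       END { exit !(v == "yes") }' "${1:-/etc/nscd.conf}"
}
```

Once the daemon is running, `nscd -g` prints its statistics, including the hit rate of the hosts cache.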

9. Disable Transparent Huge Pages (THP) to reduce CPU load

https://blog.csdn.net/u010839779/article/details/78630323

vi /etc/rc.local

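and add:

# for Hadoop, disable THP

echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

(The redhat_transparent_hugepage directory exists on RHEL/CentOS 6 kernels; on RHEL/CentOS 7 and upstream kernels the path is /sys/kernel/mm/transparent_hugepage.)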
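Since the sysfs path differs between kernel versions, here is a sketch of a function for rc.local that writes `never` to whichever THP directory exists; the optional root argument only allows testing against a scratch directory. On CentOS 7, /etc/rc.d/rc.local must also be made executable (`chmod +x`) before it runs at boot:

```shell
#!/usr/bin/env bash
# disable_thp [SYS_ROOT]: write "never" to THP's enabled and defrag files,
# trying the upstream path first and the RHEL/CentOS 6 path second.
disable_thp() {
  local root="${1:-/sys}" dir f
  for dir in "$root/kernel/mm/transparent_hugepage" \
             "$root/kernel/mm/redhat_transparent_hugepage"; do
    [ -d "$dir" ] || continue
    for f in enabled defrag; do
      echo never > "$dir/$f"
    done
    return 0
  done
  echo "no THP control directory found under $root" >&2
  return 1
}
```

`cat /sys/kernel/mm/transparent_hugepage/enabled` shows the active mode in brackets, e.g. `always madvise [never]`.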

10. Disable SELinux and local firewalling

As with IPv6 above, these features are not needed on an internal cluster network, so SELinux and the host-level firewall can be turned off.

$ service iptables stop

$ chkconfig iptables off

$ service iptables status

Disable SELinux:

$ vi /etc/selinux/config

SELINUX=disabled
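The same SELinux edit as a non-interactive command, sketched with a hypothetical helper and assuming GNU sed; only the SELINUX= line changes, and SELINUXTYPE= is left alone:

```shell
#!/usr/bin/env bash
# set_selinux_disabled [CONFIG]: rewrite the SELINUX= line in place.
# The pattern needs "=" right after "SELINUX", so SELINUXTYPE= never matches.
set_selinux_disabled() {
  local conf="${1:-/etc/selinux/config}"
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}
```

The config file only takes effect after a reboot; `setenforce 0` switches the running system to permissive mode right away, and `getenforce` shows the current mode. On systemd-based releases such as CentOS 7, the host firewall is firewalld rather than the iptables service: `systemctl stop firewalld` and `systemctl disable firewalld`.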

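11. Set min.user.id for security (I don't fully understand this part yet; see page 193/220). These keys go in container-executor.cfg, the configuration file of YARN's LinuxContainerExecutor: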

# Comma-separated list of users who cannot run applications

banned.users=

# Comma-separated list of allowed system users

allowed.system.users=

# Prevent other super users

min.user.id=1000
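With min.user.id=1000, the container executor refuses to launch containers for any user whose UID is below 1000 unless that user is listed in allowed.system.users, which keeps root and other system accounts from running jobs. A sketch (hypothetical helper) that lists the local accounts this would affect, reading a passwd-format file:

```shell
#!/usr/bin/env bash
# users_below_min_uid MIN [PASSWD]: print the names of accounts whose UID
# (3rd colon-separated field) is below MIN.
users_below_min_uid() {
  awk -F: -v min="$1" '$3 < min { print $1 }' "${2:-/etc/passwd}"
}
```

Usage: `users_below_min_uid 1000`; any service account in the output that must run jobs would need to go into allowed.system.users.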

12. Use a tool such as Ansible, Puppet, or Chef to configure every node automatically.

http://www.ansible.com.cn/docs/playbooks.html

These seem rather heavyweight.

Python Fabric and Puppet MCollective can also address many nodes at once.

A simpler tool is pdsh:

https://www.cnblogs.com/liwanliangblog/p/9194146.html
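pdsh runs one command on many hosts in parallel and prefixes each output line with the host name. To illustrate the idea (a rough bash stand-in, not pdsh itself), a sketch that does the same over plain ssh; it assumes passwordless ssh to every host, and the RSH override exists only for a dry run:

```shell
#!/usr/bin/env bash
# run_on_all HOSTFILE CMD: run CMD on every host listed in HOSTFILE (one per
# line) in parallel, prefixing each output line with "host: ".
run_on_all() {
  local hostfile="$1" cmd="$2" host
  while read -r host; do
    [ -n "$host" ] || continue
    # </dev/null keeps ssh from swallowing the rest of the host list
    "${RSH:-ssh}" "$host" "$cmd" </dev/null 2>&1 | sed "s/^/$host: /" &
  done < "$hostfile"
  wait    # block until every background job has finished
}
```

With pdsh itself, the equivalent is roughly `pdsh -w node[01-10] 'cat /proc/sys/vm/swappiness' | dshbak -c`, where `dshbak -c` groups hosts that returned identical output (the node names are hypothetical).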
