Hadoop Installation: Common Issues

cruiseisme

Article link: https://blog.csdn.net/neoscruise/article/details/9410659

1. Can the NameNode and the JobTracker run on the same machine?

Typically one machine in the cluster is designated as the NameNode and another machine as the JobTracker, exclusively (the official recommendation is to keep them separate).

The real kicker is going to be memory consumption of one or both of these
services. The NN in particular uses a large amount of RAM to store the
filesystem image. I think that those who are suggesting a breakeven point of
<= 10 nodes are lowballing. In practice, unless your machines are really
thin on the RAM (e.g., 2--4 GB), I haven't seen any cases where these
services need to be separated below the 20-node mark; I've also seen several
clusters of 40 nodes running fine with these services colocated. It depends
on how many files are in HDFS and how frequently you're submitting a lot of
concurrent jobs to MapReduce.

If you're setting up a production environment that you plan to expand,
however, as a best practice you should configure the master node to have two
hostnames (e.g., "nn" and "jt") so that you can have separate hostnames in
fs.default.name and mapred.job.tracker; when the day comes that these
services are placed on different nodes, you'll then be able to just move one
of the hostnames over and not need to reconfigure all 20--40 other nodes.
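
As a rough illustration of the two-hostname advice, the sketch below writes the two properties into a scratch directory. The ports 9000 and 9001, the CONF_DIR variable, and the file layout (core-site.xml and mapred-site.xml) are assumptions made for this example, not values from the quoted answer. Both "nn" and "jt" are assumed to resolve to the same master machine for now, for example through /etc/hosts or DNS.

```shell
# Sketch only: CONF_DIR and the ports 9000/9001 are illustrative assumptions.
# "nn" and "jt" are two aliases that currently point at the same master host.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# fs.default.name refers to the NameNode alias.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://nn:9000</value>
  </property>
</configuration>
EOF

# mapred.job.tracker refers to the JobTracker alias.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jt:9001</value>
  </property>
</configuration>
EOF

echo "Wrote sample configs to $CONF_DIR"
```

When the JobTracker later moves to its own machine, only the "jt" name has to be repointed; the configuration files on the worker nodes stay unchanged.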

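(In practice, if the cluster has only a few nodes and the machines have a reasonable amount of memory, the two services can share one machine.)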

2. Every node must use the same username; otherwise the nodes cannot log in to one another to start the daemons.

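3. Passwordless SSH login from the master node to the slave nodes must be configured. The user needs a password on each slave, because copying the public key over asks for it.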
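
Points 2 and 3 can be sketched roughly as follows, run on the master as the shared Hadoop user. The host names in SLAVES and the function names are hypothetical placeholders, and the key path is only the usual OpenSSH default. Because ssh logs in with the local username unless told otherwise, the same account has to exist on every node.

```shell
# Sketch only: SLAVES and the function names are hypothetical placeholders.
SLAVES="${SLAVES:-slave1 slave2}"
KEY="${KEY:-$HOME/.ssh/id_rsa}"

# Create an RSA key pair with an empty passphrase if none exists yet.
ensure_key() {
  if [ ! -f "$KEY" ]; then
    mkdir -p "$(dirname "$KEY")"
    chmod 700 "$(dirname "$KEY")"
    ssh-keygen -q -t rsa -N "" -f "$KEY"
  fi
}

# Copy the public key to each slave, then verify the login.
distribute_key() {
  for host in $SLAVES; do
    # Asks once for the user's password on that slave.
    ssh-copy-id -i "$KEY.pub" "$host"
    # BatchMode makes ssh fail instead of prompting if key login is not working.
    ssh -o BatchMode=yes "$host" hostname
  done
}

# To run against real hosts:
#   ensure_key && distribute_key
```

The functions are only defined here, not called, so pasting the block into a shell changes nothing until the last line is run by hand.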

4. dfs.name.dir is where the NameNode stores the HDFS name table; its default is ${hadoop.tmp.dir}/dfs/name
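
Since hadoop.tmp.dir itself defaults to a directory under /tmp, the name table ends up in a location that many systems clean out on reboot, so it is common to set dfs.name.dir explicitly. A minimal sketch, assuming the property goes into hdfs-site.xml; the path /data/hadoop/dfs/name and the CONF_DIR and NAME_DIR variables are hypothetical.

```shell
# Sketch only: NAME_DIR is a hypothetical persistent path, CONF_DIR a scratch directory.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
NAME_DIR="${NAME_DIR:-/data/hadoop/dfs/name}"

# Unquoted heredoc delimiter so that $NAME_DIR is expanded into the file.
cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>$NAME_DIR</value>
  </property>
</configuration>
EOF

echo "Wrote $CONF_DIR/hdfs-site.xml with dfs.name.dir=$NAME_DIR"
```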