1. Can the NameNode and JobTracker run on the same machine?
Typically one machine in the cluster is designated as the NameNode and another machine as the JobTracker, exclusively (the official recommendation is to keep them separate).
The real kicker is going to be memory consumption of one or both of these services. The NN in particular uses a large amount of RAM to store the filesystem image. I think that those who are suggesting a breakeven point of <= 10 nodes are lowballing. In practice, unless your machines are really thin on the RAM (e.g., 2--4 GB), I haven't seen any cases where these services need to be separated below the 20-node mark; I've also seen several clusters of 40 nodes running fine with these services colocated. It depends on how many files are in HDFS and how frequently you're submitting a lot of concurrent jobs to MapReduce. If you're setting up a production environment that you plan to expand, however, as a best practice you should configure the master node to have two hostnames (e.g., "nn" and "jt") so that you can have separate hostnames in fs.default.name and mapred.job.tracker; when the day comes that these services are placed on different nodes, you'll then be able to just move one of the hostnames over and not need to reconfigure all 20--40 other nodes. (In practice, if the cluster has few nodes and the machines have plenty of memory, the two services can be colocated.)
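The two-hostname advice above could look roughly like the fragments below. This is only a sketch: the aliases "nn" and "jt" come from the quoted answer, but the port numbers 9000 and 9001 are assumed values, and it is also assumed that both aliases initially resolve (through DNS or /etc/hosts) to the same master machine.

```xml
<!-- core-site.xml (fragment): every node reaches HDFS through the "nn" alias.
     Port 9000 is an assumed value; use whatever port your cluster uses. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://nn:9000</value>
</property>

<!-- mapred-site.xml (fragment): TaskTrackers and job clients reach the
     JobTracker through the "jt" alias. Port 9001 is likewise assumed. -->
<property>
  <name>mapred.job.tracker</name>
  <value>jt:9001</value>
</property>
```

With a layout like this, moving the JobTracker to its own machine later only means repointing the "jt" name at the new host; the configuration files on the other nodes stay unchanged.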
2. All nodes must use the same username; otherwise they cannot log in to one another to start the daemons.
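3. Passwordless SSH login from the master node to the slave nodes must be configured. The user account needs a password set before the public key can be copied over.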
4. dfs.name.dir is where the HDFS name table is stored; the default is ${hadoop.tmp.dir}