When setting up a Hadoop cluster, you'll need to designate one specific node as the master node. This server will typically host the NameNode and JobTracker
daemons. It will also serve as the base station that contacts and activates the DataNode and TaskTracker daemons on all of the slave nodes.
Hadoop uses passphraseless SSH for this purpose. SSH relies on standard public key cryptography to create a pair of keys for user verification: one public,
one private. The public key is stored on every node in the cluster, while the private key remains on the master node. When the master attempts to access a remote machine, it proves possession of the private key (without ever sending it), and the target machine checks that proof against the stored public key to validate the login attempt.
1. Define a common account
This access is from a user account on one node to a user account on the target machine. For Hadoop, the accounts should have the same username on
all of the nodes (we use hadoop-user in this book), and for security purposes we recommend that it be a user-level account. This account is only for managing your
Hadoop cluster. Once the cluster daemons are up and running, you'll be able to run your actual MapReduce jobs from other accounts.
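Before moving on, you can quickly confirm on each node that the common account is in place. A minimal sketch, assuming the hadoop-user name we use in this book:

```shell
# check whether the common account exists on this node;
# 'id' succeeds only if the user is defined
if id hadoop-user >/dev/null 2>&1; then
    echo "hadoop-user exists"
else
    echo "hadoop-user missing; create it (as root) with: useradd -m hadoop-user"
fi
```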
2. Verify SSH installation
$ which ssh
$ which sshd
$ which ssh-keygen
If any of these commands is not found, install OpenSSH (for example, through your distribution's package manager).
3. Generate SSH key pair
Having verified that SSH is correctly installed on all nodes of the cluster, we use ssh-keygen on the master node to generate an RSA key pair. Be certain to
avoid entering a passphrase, or you'll have to manually enter that phrase every time the master node attempts to access another node.
$ ssh-keygen -t rsa
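If you're scripting the setup, the same key pair can be generated without any prompts: the -P option supplies the (empty) passphrase and -f names the output file. A sketch, writing to a demo path rather than the real ~/.ssh/id_rsa:

```shell
# generate an RSA key pair non-interactively with an empty passphrase;
# the private key goes to demo_rsa and the public key to demo_rsa.pub
rm -f demo_rsa demo_rsa.pub
ssh-keygen -t rsa -P "" -f demo_rsa
```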
4. Distribute public key and validate logins
Albeit a bit tedious, you'll next need to copy the public key to every slave node as well as the master node:
[hadoop-user@master]$ scp ~/.ssh/id_rsa.pub hadoop-user@target:~/master_key
Manually log in to the target node and set the master key as an authorized key (or append to the list of authorized keys if you have others defined).
[hadoop-user@target]$ mkdir ~/.ssh
[hadoop-user@target]$ chmod 700 ~/.ssh
[hadoop-user@target]$ mv ~/master_key ~/.ssh/authorized_keys
[hadoop-user@target]$ chmod 600 ~/.ssh/authorized_keys
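Repeating the copy by hand for every node doesn't scale, so the step can be scripted. A sketch, assuming a hypothetical slaves.txt file listing one slave hostname per line; it's shown as a dry run that echoes each command, and dropping the echo would perform the actual copies:

```shell
# hypothetical list of slave hostnames, one per line
printf 'slave1\nslave2\n' > slaves.txt

# dry run: print the scp command for each slave; remove 'echo' to execute
while read host; do
    echo scp ~/.ssh/id_rsa.pub "hadoop-user@${host}:~/master_key"
done < slaves.txt
```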
After distributing the public key, you can verify it's correctly defined by attempting to log in to the target node from the master:
[hadoop-user@master]$ ssh target
The authenticity of host 'target (xxx.xxx.xxx.xxx)' can’t be established.
RSA key fingerprint is 72:31:d8:1b:11:36:43:52:56:11:77:a4:ec:82:03:1d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'target' (RSA) to the list of known hosts.
Last login: Sun Jan 4 15:32:22 2009 from master
After confirming the authenticity of a target node to the master node, you won’t be prompted upon subsequent login attempts.
[hadoop-user@master]$ ssh target
Last login: Sun Jan 4 15:32:49 2009 from master
We’ve now set the groundwork for running Hadoop on your own cluster.