- Create the hadoop group: sudo addgroup hadoop
- Create the hadoop user inside that group: sudo adduser -ingroup hadoop hadoop
- Give the hadoop user sudo rights. Open /etc/sudoers: sudo gedit /etc/sudoers
- Below the line root ALL=(ALL:ALL) ALL, add: hadoop ALL=(ALL:ALL) ALL
- Install JDK 7 on Ubuntu
- Set up passwordless SSH login to the local machine
- First switch to the hadoop user: su - hadoop
- Create an SSH key; here we use RSA: ssh-keygen -t rsa -P ""
- Enter the ~/.ssh/ directory and append id_rsa.pub to the authorized_keys file (authorized_keys does not exist at first):
  cd ~/.ssh
  cat id_rsa.pub >> authorized_keys
- Log in to localhost to verify: ssh localhost
- Run the exit command to log out: exit
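The SSH steps above can be combined into one small script. setup_ssh_key is a helper name introduced here (not from the tutorial); by default it targets the current user's ~/.ssh, and SSH_DIR can point it at a scratch directory for a dry run:

```shell
# Sketch of the passwordless-SSH setup above, as one function.
setup_ssh_key() {
  local dir="$1"
  mkdir -p "$dir"
  chmod 700 "$dir"
  # -N "": empty passphrase, same as -P "" in the step above
  [ -f "$dir/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$dir/id_rsa" -q
  # Append the public key to authorized_keys (created on first append)
  cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  # sshd ignores authorized_keys that are writable by others
  chmod 600 "$dir/authorized_keys"
}
setup_ssh_key "${SSH_DIR:-$HOME/.ssh}"
```

After running it as the hadoop user, `ssh localhost` should no longer prompt for a password.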
Install Hadoop

- After downloading, extract the archive
- Configure the system environment variables: run su -, then vi /etc/profile, and append the following lines at the end
- export HADOOP_PREFIX="/opt/hadoop"
- PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
- export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
- export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
- export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
- export YARN_HOME=${HADOOP_PREFIX}
- export HADOOP_CONF_DIR="${HADOOP_PREFIX}/etc/hadoop"
Then make the variables take effect and move into the configuration directory:
$ source /etc/profile
$ cd /opt/hadoop/etc/hadoop/
- Modify hadoop-env.sh: set JAVA_HOME to the real JDK path. Do not reference ${JAVA_HOME} here; otherwise Hadoop fails at runtime with the error "JAVA_HOME is not set".
- export JAVA_HOME=/opt/jdk
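The /etc/profile additions above, condensed into one runnable snippet; the echo line lets you confirm the derived paths before logging out and back in:

```shell
# Same exports as added to /etc/profile above.
export HADOOP_PREFIX="/opt/hadoop"
export PATH="$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin"
export HADOOP_MAPRED_HOME="$HADOOP_PREFIX"
export HADOOP_COMMON_HOME="$HADOOP_PREFIX"
export HADOOP_HDFS_HOME="$HADOOP_PREFIX"
export YARN_HOME="$HADOOP_PREFIX"
export HADOOP_CONF_DIR="$HADOOP_PREFIX/etc/hadoop"
# Quick check that the derived paths look right
echo "conf dir: $HADOOP_CONF_DIR"
```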
- Modify core-site.xml. Note: create the /tmp/hadoop/hadoop-hadoop directory first.
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://localhost:9000</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/tmp/hadoop/hadoop-hadoop</value>
- </property>
- </configuration>
- Modify hdfs-site.xml
Here /home/hadoop/dfs/name and /home/hadoop/dfs/data are directories on the local filesystem and must be created beforehand.
- <configuration>
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>file:/home/hadoop/dfs/name</value>
- <description>Determines where on the local filesystem the DFS name node
- should store the name table. If this is a comma-delimited list
- of directories then the name table is replicated in all of the
- directories, for redundancy. </description>
- <final>true</final>
- </property>
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>file:/home/hadoop/dfs/data</value>
- <description>Determines where on the local filesystem a DFS data node
- should store its blocks. If this is a comma-delimited
- list of directories, then data will be stored in all named
- directories, typically on different devices.
- Directories that do not exist are ignored.
- </description>
- <final>true</final>
- </property>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- <property>
- <name>dfs.permissions</name>
- <value>false</value>
- </property>
- </configuration>
- Modify mapred-site.xml (in Hadoop 2.2 this file does not exist by default; copy it from mapred-site.xml.template)
- <configuration>
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- <property>
- <name>mapred.system.dir</name>
- <value>file:/home/hadoop/mapred/system</value>
- <final>true</final>
- </property>
- <property>
- <name>mapred.local.dir</name>
- <value>file:/home/hadoop/mapred/local</value>
- <final>true</final>
- </property>
- </configuration>
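Before formatting the NameNode, every local directory named in core-site.xml, hdfs-site.xml, and mapred-site.xml above must exist. A sketch that creates them all; BASE is an assumed prefix (not from the tutorial) so it can be dry-run anywhere, and setting BASE= (empty) as the hadoop user creates the real paths:

```shell
# Create all local directories referenced by the configs above.
# BASE is an assumed prefix for a safe dry run; BASE= creates the real paths.
BASE="${BASE-$(mktemp -d)}"
mkdir -p "$BASE/tmp/hadoop/hadoop-hadoop" \
         "$BASE/home/hadoop/dfs/name" \
         "$BASE/home/hadoop/dfs/data" \
         "$BASE/home/hadoop/mapred/system" \
         "$BASE/home/hadoop/mapred/local"
echo "created under: $BASE"
```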
- Modify yarn-site.xml
- <configuration>
- <!-- Site specific YARN configuration properties -->
- <property>
- <name>yarn.resourcemanager.resource-tracker.address</name>
- <value>localhost:8081</value>
- <description>host is the hostname of the resource manager and
- port is the port on which the NodeManagers contact the Resource Manager.
- </description>
- </property>
- <property>
- <name>yarn.resourcemanager.scheduler.address</name>
- <value>localhost:8082</value>
- <description>host is the hostname of the resourcemanager and port is the port
- on which the Applications in the cluster talk to the Resource Manager.
- </description>
- </property>
- <property>
- <name>yarn.resourcemanager.scheduler.class</name>
- <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
- <description>In case you do not want to use the default scheduler</description>
- </property>
- <property>
- <name>yarn.resourcemanager.address</name>
- <value>localhost:8083</value>
- <description>the host is the hostname of the ResourceManager and the port is the port on
- which the clients can talk to the Resource Manager. </description>
- </property>
- <property>
- <name>yarn.nodemanager.local-dirs</name>
- <value></value>
- <description>the local directories used by the nodemanager</description>
- </property>
- <property>
- <name>yarn.nodemanager.address</name>
- <value>0.0.0.0:port</value>
- <description>the nodemanagers bind to this port</description>
- </property>
- <property>
- <name>yarn.nodemanager.resource.memory-mb</name>
- <value>10240</value>
- <description>the amount of memory available to the NodeManager, in MB</description>
- </property>
- <property>
- <name>yarn.nodemanager.remote-app-log-dir</name>
- <value>/app-logs</value>
- <description>directory on hdfs where the application logs are moved to </description>
- </property>
- <property>
- <name>yarn.nodemanager.log-dirs</name>
- <value></value>
- <description>the directories used by Nodemanagers as log directories</description>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- <description>shuffle service that needs to be set for Map Reduce to run </description>
- </property>
- </configuration>
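One common Hadoop 2.2 pitfall: the aux-services value must be mapreduce_shuffle (with an underscore); the older mapreduce.shuffle spelling from pre-2.2 guides makes the NodeManager refuse to start. A small check, where check_shuffle is a helper introduced here and the default conf path is an assumption:

```shell
# check_shuffle: warn if yarn-site.xml still uses the old dotted value.
check_shuffle() {
  local conf_dir="$1"   # assumed to contain yarn-site.xml
  if grep -q '<value>mapreduce_shuffle</value>' "$conf_dir/yarn-site.xml" 2>/dev/null; then
    echo "aux-services OK"
  else
    echo "fix yarn.nodemanager.aux-services in $conf_dir/yarn-site.xml"
  fi
}
check_shuffle "${HADOOP_CONF_DIR:-/opt/hadoop/etc/hadoop}"
```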
Starting HDFS and YARN

After completing the configuration above, you can check whether everything was set up correctly.
First, format the NameNode:
$ hdfs namenode -format
Then start HDFS:
$ start-dfs.sh
or
$ hadoop-daemon.sh start namenode
$ hadoop-daemon.sh start datanode
Next, start the YARN daemons:
$ start-yarn.sh
or
$ yarn-daemon.sh start resourcemanager
$ yarn-daemon.sh start nodemanager
Once everything is up, open http://localhost:50070/dfshealth.jsp to check the HDFS status.
Setting up a Hadoop 2.2 single-node pseudo-distributed cluster on Ubuntu Server 12