Hadoop
I. Distributed Big Data Storage System
Components
HDFS daemons are NameNode, SecondaryNameNode, and DataNode.
YARN daemons are ResourceManager, NodeManager, and WebAppProxy.
If MapReduce is used, the MapReduce Job History Server will also be running.
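Once the daemons are up, a quick way to confirm which ones are running is jps, which ships with the JDK. Which names appear depends on the services you started; the PIDs below are illustrative:
$ jps
3072 NameNode
3210 DataNode
3398 SecondaryNameNode
3561 ResourceManager
3687 NodeManager
3820 JobHistoryServer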
Open firewall ports
firewall-cmd --zone=public --add-port=8089/tcp --permanent
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --reload
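The NameNode and JobHistory web UIs (default ports 50070 and 19888, see the table in section II) can be opened the same way if they must be reachable from other hosts:
firewall-cmd --zone=public --add-port=50070/tcp --permanent
firewall-cmd --zone=public --add-port=19888/tcp --permanent
firewall-cmd --reload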
II. Three Supported Deployment Modes
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
1. Local (Standalone) Mode
Runs everything as a single Java process; jobs are launched from the command line:
$ hadoop jar test.jar
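A fuller standalone run, following the official single-node guide and using the same examples jar referenced later in these notes; input and output are plain local directories, and the working directory is assumed to be the Hadoop install root:
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*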
2. Pseudo-Distributed Mode
Execution:
1) Configure core-site.xml
$ vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
2) Configure hdfs-site.xml
$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
3) Set up passphraseless SSH
$ ssh localhost
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
4) Set JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_91
5) Format the filesystem
$ $HADOOP_HOME/bin/hdfs namenode -format
6) Start the NameNode and DataNode daemons
$ $HADOOP_HOME/sbin/start-dfs.sh
7) Browse the web interface for the NameNode
http://localhost:50070/
8) Create the HDFS directories required to run MapReduce jobs
$ $HADOOP_HOME/bin/hdfs dfs -mkdir /user
$ $HADOOP_HOME/bin/hdfs dfs -mkdir /user/<username>
9) Copy local files into HDFS
$ bin/hdfs dfs -put /opt/hadoop /user
10) Run the example job
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep /user/hadoop /user/output 'dfs[a-z.]+'
11) Copy the output files from HDFS to the local filesystem and examine them
$ bin/hdfs dfs -get /user/output output
$ cat output/*
or view them directly on HDFS:
$ bin/hdfs dfs -cat /user/output/*
12) Stop the daemons
$ sbin/stop-dfs.sh
YARN on a Single Node:
1) vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
2) vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
3) Start the ResourceManager and NodeManager daemons
$ $HADOOP_HOME/sbin/start-yarn.sh
4) Browse the web interface for the ResourceManager
http://localhost:8088/
5) Run a MapReduce job (a sketch follows)
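A minimal sketch for this step, reusing the examples jar and the HDFS input from the pseudo-distributed walkthrough above; /user/output2 is a hypothetical path, chosen because the job fails if its output directory already exists:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep /user/hadoop /user/output2 'dfs[a-z.]+'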
6) Stop the daemons
$ $HADOOP_HOME/sbin/stop-yarn.sh
3. Fully-Distributed (Cluster) Mode
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
Installation directory used in these notes: /opt/hadoop-2.7.7/
Hadoop Startup
To start a Hadoop cluster you will need to start both the HDFS and YARN cluster.
The first time you bring up HDFS, it must be formatted. Format a new distributed filesystem as hdfs:
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
Start the HDFS NameNode with the following command on the designated node as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
Start an HDFS DataNode with the following command on each designated node as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the HDFS processes can be started with a utility script. As hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
Start YARN with the following command, run on the designated ResourceManager as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
Run a script to start a NodeManager on each designated host as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager
Start a standalone WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing, it should be run on each of them:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the YARN processes can be started with a utility script. As yarn:
[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh
Start the MapReduce JobHistory Server with the following command, run on the designated server as mapred:
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
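Once everything is started, dfsadmin gives a quick health summary showing overall capacity and the DataNodes that have registered with the NameNode:
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -report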
Hadoop Shutdown
Stop the NameNode with the following command, run on the designated NameNode as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
Run a script to stop a DataNode as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the HDFS processes may be stopped with a utility script. As hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
Stop the ResourceManager with the following command, run on the designated ResourceManager as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
Run a script to stop a NodeManager on a slave as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR stop nodemanager
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the YARN processes can be stopped with a utility script. As yarn:
[yarn]$ $HADOOP_PREFIX/sbin/stop-yarn.sh
Stop the WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing, it should be run on each of them:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver
Stop the MapReduce JobHistory Server with the following command, run on the designated server as mapred:
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver
Daemon                         Web Interface           Notes
NameNode                       http://nn_host:port/    Default HTTP port is 50070.
ResourceManager                http://rm_host:port/    Default HTTP port is 8088.
MapReduce JobHistory Server    http://jhs_host:port/   Default HTTP port is 19888.
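A quick liveness check of these interfaces from the shell; this sketch assumes all three daemons run on localhost with their default ports:
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/   # NameNode
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8088/    # ResourceManager
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:19888/   # JobHistory Server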
III. Common Shell Commands
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
# Overview
Common shell commands for Hadoop: creating users, setting HDFS permissions, everyday HDFS operations, etc.
# Reference
http://blog.csdn.net/swuteresa/article/details/13767169
Add a hadoop group
sudo addgroup hadoop
Add the current user larry to the hadoop group
sudo usermod -a -G hadoop larry
Grant the hadoop group sudo rights
sudo gedit /etc/sudoers
After the line root ALL=(ALL) ALL, add: %hadoop ALL=(ALL) ALL (the leading % denotes a group)
Change ownership of the hadoop directory
sudo chown -R larry:hadoop /home/larry/hadoop    # <owner>:<group> <file>
Change HDFS permissions
sudo chmod -R 755 /home/larry/hadoop
sudo bin/hadoop fs -chmod -R 755 /
sudo bin/hadoop fs -ls /
Change the owner of HDFS files
sudo bin/hadoop fs -chown -R larry /
Leave HDFS safe mode
sudo bin/hdfs dfsadmin -safemode leave
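Companion subcommands for inspecting safe mode instead of forcing an exit:
hdfs dfsadmin -safemode get    # report whether safe mode is ON or OFF
hdfs dfsadmin -safemode wait   # block until the NameNode leaves safe mode on its own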
Copy a local file to HDFS
hadoop fs -copyFromLocal <localsrc> URI
Print the contents of the specified files to stdout
hadoop fs -cat file:///file3 /user/hadoop/file4
Change the group a file belongs to
hadoop fs -chgrp [-R] GROUP URI
Change file access permissions
hadoop fs -chmod [-R] 755 URI
Change the owner of a file
hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ]
Copy an HDFS file to the local filesystem
hadoop fs -copyToLocal URI localdst
Copy HDFS files to another HDFS directory
hadoop fs -cp URI [URI …] <dest>
Show the size of every file in a directory
hadoop fs -du URI [URI …]
Merge the files under an HDFS directory into a single local file
hadoop fs -getmerge <src> <localdst> [addnl]
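A few concrete invocations of the commands above; all paths are illustrative:
hadoop fs -copyFromLocal ./data.txt /user/larry/data.txt
hadoop fs -copyToLocal /user/larry/data.txt ./data.copy.txt
hadoop fs -chown -R larry:hadoop /user/larry
hadoop fs -getmerge /user/larry/output ./merged.txt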
# Problem: Permission denied: user=root, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
# Fix: run the command as the hdfs user
# While logged in as root, create a directory in HDFS as the hdfs user
sudo -u hdfs hadoop fs -mkdir /nutch
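If the directory should be writable by the user named in the error (root here), the HDFS superuser can hand over ownership; a sketch:
sudo -u hdfs hadoop fs -chown root /nutch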
# List files
hadoop fs -ls /
Key configuration files:
core-site.xml
hdfs-site.xml
yarn-site.xml