Hadoop
I. Distributed Big Data Storage System
Components
HDFS daemons are NameNode, SecondaryNameNode, and DataNode.
YARN daemons are ResourceManager, NodeManager, and WebAppProxy.
If MapReduce is used, the MapReduce Job History Server will also be running.
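Once the daemons are up, a quick way to confirm which ones are running is jps, which ships with the JDK. Which names appear depends on the services you started; the PIDs below are illustrative:
$ jps
3072 NameNode
3210 DataNode
3398 SecondaryNameNode
3561 ResourceManager
3687 NodeManager
3820 JobHistoryServer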
Open firewall ports
firewall-cmd --zone=public --add-port=8089/tcp --permanent
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --reload
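The NameNode and JobHistory web UIs (default ports 50070 and 19888, see the table in section II) can be opened the same way if they must be reachable from other hosts:
firewall-cmd --zone=public --add-port=50070/tcp --permanent
firewall-cmd --zone=public --add-port=19888/tcp --permanent
firewall-cmd --reload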
II. Three Supported Deployment Modes
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
1. Local (Standalone) Mode
Runs everything as a single Java process; jobs are launched from the command line:
$ hadoop jar test.jar
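A fuller standalone run, following the official single-node guide and using the same examples jar referenced later in these notes; input and output are plain local directories, and the working directory is assumed to be the Hadoop install root:
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*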
2. Pseudo-Distributed Mode
Execution:
1) Configure core-site.xml
$ vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
2) Configure hdfs-site.xml
$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
3) Set up passphraseless SSH
$ ssh localhost
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
4) Set JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_91
5) Format the filesystem
$ $HADOOP_HOME/bin/hdfs namenode -format
6) Start the NameNode and DataNode daemons
$ $HADOOP_HOME/sbin/start-dfs.sh
7) Browse the web interface for the NameNode
http://localhost:50070/
8) Create the HDFS directories required to run MapReduce jobs
$ $HADOOP_HOME/bin/hdfs dfs -mkdir /user
$ $HADOOP_HOME/bin/hdfs dfs -mkdir /user/<username>
9) Copy local files into HDFS
$ bin/hdfs dfs -put /opt/hadoop /user
10) Run the example job
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep /user/hadoop /user/output 'dfs[a-z.]+'
11) Copy the output files from HDFS to the local filesystem and examine them
$ bin/hdfs dfs -get /user/output output
$ cat output/*
or view them directly on HDFS:
$ bin/hdfs dfs -cat /user/output/*
12) Stop the daemons
$ sbin/stop-dfs.sh
YARN on a Single Node:
1) vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
2) vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
3) Start the ResourceManager and NodeManager daemons
$ $HADOOP_HOME/sbin/start-yarn.sh
4) Browse the web interface for the ResourceManager
http://localhost:8088/
5) Run a MapReduce job (a sketch follows)
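A minimal sketch for this step, reusing the examples jar and the HDFS input from the pseudo-distributed walkthrough above; /user/output2 is a hypothetical path, chosen because the job fails if its output directory already exists:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar grep /user/hadoop /user/output2 'dfs[a-z.]+'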
6) Stop the daemons
$ $HADOOP_HOME/sbin/stop-yarn.sh
3. Fully-Distributed (Cluster) Mode
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html
Installation directory used in these notes: /opt/hadoop-2.7.7/
Hadoop Startup
To start a Hadoop cluster you will need to start both the HDFS and YARN cluster.
The first time you bring up HDFS, it must be formatted. Format a new distributed filesystem as hdfs:
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
Start the HDFS NameNode with the following command on the designated node as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
Start an HDFS DataNode with the following command on each designated node as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the HDFS processes can be started with a utility script. As hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
Start YARN with the following command, run on the designated ResourceManager as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
Run a script to start a NodeManager on each designated host as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager
Start a standalone WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing, it should be run on each of them:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the YARN processes can be started with a utility script. As yarn:
[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh
Start the MapReduce JobHistory Server with the following command, run on the designated server as mapred:
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
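Once everything is started, dfsadmin gives a quick health summary showing overall capacity and the DataNodes that have registered with the NameNode:
[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -report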
Hadoop Shutdown
Stop the NameNode with the following command, run on the designated NameNode as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
Run a script to stop a DataNode as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the HDFS processes may be stopped with a utility script. As hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
Stop the ResourceManager with the following command, run on the designated ResourceManager as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
Run a script to stop a NodeManager on a slave as yarn:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR stop nodemanager
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the YARN processes can be stopped with a utility script. As yarn:
[yarn]$ $HADOOP_PREFIX/sbin/stop-yarn.sh
Stop the WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing, it should be run on each of them:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver
Stop the MapReduce JobHistory Server with the following command, run on the designated server as mapred:
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver
Daemon                         Web Interface           Notes
NameNode                       http://nn_host:port/    Default HTTP port is 50070.
ResourceManager                http://rm_host:port/    Default HTTP port is 8088.
MapReduce JobHistory Server    http://jhs_host:port/   Default HTTP port is 19888.
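A quick liveness check of these interfaces from the shell; this sketch assumes all three daemons run on localhost with their default ports:
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/   # NameNode
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8088/    # ResourceManager
$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:19888/   # JobHistory Server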
III. Common Shell Commands
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
# Overview
Common shell commands for Hadoop: creating users, setting HDFS permissions, everyday HDFS operations, etc.
# Reference
http://blog.csdn.net/swuteresa/article/details/13767169
Add a hadoop group
sudo addgroup hadoop
Add the current user larry to the hadoop group
sudo usermod -a -G hadoop larry
Grant the hadoop group sudo rights
sudo gedit /etc/sudoers
After the line root ALL=(ALL) ALL, add: %hadoop ALL=(ALL) ALL (the leading % denotes a group)
Change ownership of the hadoop directory
sudo chown -R larry:hadoop /home/larry/hadoop    # <owner>:<group> <file>
Change HDFS permissions
sudo chmod -R 755 /home/larry/hadoop
sudo bin/hadoop fs -chmod -R 755 /
sudo bin/hadoop fs -ls /
Change the owner of HDFS files
sudo bin/hadoop fs -chown -R larry /
Leave HDFS safe mode
sudo bin/hdfs dfsadmin -safemode leave
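Companion subcommands for inspecting safe mode instead of forcing an exit:
hdfs dfsadmin -safemode get    # report whether safe mode is ON or OFF
hdfs dfsadmin -safemode wait   # block until the NameNode leaves safe mode on its own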
Copy a local file to HDFS
hadoop fs -copyFromLocal <localsrc> URI
Print the contents of the specified files to stdout
hadoop fs -cat file:///file3 /user/hadoop/file4
Change the group a file belongs to
hadoop fs -chgrp [-R] GROUP URI
Change file access permissions
hadoop fs -chmod [-R] 755 URI
Change the owner of a file
hadoop fs -chown [-R] [OWNER][:[GROUP]] URI [URI ]
Copy an HDFS file to the local filesystem
hadoop fs -copyToLocal URI localdst
Copy HDFS files to another HDFS directory
hadoop fs -cp URI [URI …] <dest>
Show the size of every file in a directory
hadoop fs -du URI [URI …]
Merge the files under an HDFS directory into a single local file
hadoop fs -getmerge <src> <localdst> [addnl]
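A few concrete invocations of the commands above; all paths are illustrative:
hadoop fs -copyFromLocal ./data.txt /user/larry/data.txt
hadoop fs -copyToLocal /user/larry/data.txt ./data.copy.txt
hadoop fs -chown -R larry:hadoop /user/larry
hadoop fs -getmerge /user/larry/output ./merged.txt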
# Problem: Permission denied: user=root, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
# Fix: run the command as the hdfs user
# While logged in as root, create a directory in HDFS as the hdfs user
sudo -u hdfs hadoop fs -mkdir /nutch
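If the directory should be writable by the user named in the error (root here), the HDFS superuser can hand over ownership; a sketch:
sudo -u hdfs hadoop fs -chown root /nutch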
# List files
hadoop fs -ls /
Key configuration files:
core-site.xml
hdfs-site.xml
yarn-site.xml