As the superuser, install the Sun JDK
Choose /usr/lib/jvm as the installation directory and place the installer jdk-6u21-linux-i586.bin in it
Make the installer executable:
chmod 777 jdk-6u21-linux-i586.bin
Run the installer:
./jdk-6u21-linux-i586.bin
Configure the environment variables: open /etc/profile and add the following three lines before the line "umask 022":
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_21
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
Reload /etc/profile:
source /etc/profile
Verify:
java -version
Choose /usr/local as the installation directory and place hadoop-0.20.2.tar.gz in it
As user hjwang1, add a group named hadoop and a user of the same name in that group:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
As the superuser, grant sudo privileges to the hadoop user:
chmod u+w /etc/sudoers
Add the line "hadoop ALL=(ALL) ALL" after the line "root ALL=(ALL) ALL"
chmod u-w /etc/sudoers
Install SSH
As user hadoop, generate an SSH RSA key pair (with an empty passphrase):
ssh-keygen -t rsa -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
sudo /etc/init.d/ssh reload
ssh localhost
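If ssh-keygen prompts for a passphrase, the key can also be generated non-interactively; a minimal sketch (the -P "" option supplies the empty passphrase, and the chmod calls tighten permissions, which some sshd configurations require before accepting key login):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys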
As user hadoop:
sudo tar xzf hadoop-0.20.2.tar.gz
sudo chown -R hadoop:hadoop hadoop-0.20.2
Set the JDK environment variable in conf/hadoop-env.sh:
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_21
(HADOOP_OPTS=-Djava.net.preferIPv4Stack=true) makes the daemons listen on IPv4; by default (on 64-bit systems) they bind to IPv6.
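For reference, a minimal sketch of how the two settings might look inside conf/hadoop-env.sh (the JDK path follows the installation above; add or uncomment the lines as needed):
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_21
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true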
Pseudo-distributed deployment:
Edit the configuration file conf/core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
Edit the configuration file conf/hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Edit the configuration file conf/mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Format the namenode, from HADOOP_HOME:
./bin/hadoop namenode -format
Start Hadoop, from HADOOP_HOME:
./bin/start-all.sh
Verify that startup succeeded, from HADOOP_HOME:
Method 1:
jps
Method 2:
./bin/hadoop dfsadmin -report
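For Method 1, in pseudo-distributed mode jps should list one of each Hadoop daemon on this single machine; a rough sketch of the expected listing (each name is preceded by its process id in the actual output):
NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker
Jps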
Example:
Create the file test_file1 under /home/hadoop with the following content:
Wang Hongjun good
Create the file test_file2 under /home/hadoop with the following content:
Wang Hongjun best Wang Hongjun
Create the directory test-in in the distributed file system:
./bin/hadoop dfs -mkdir test-in
Copy test_file1 and test_file2 into test-in:
./bin/hadoop dfs -copyFromLocal /home/hadoop/test_* test-in
List the contents of test-in:
./bin/hadoop dfs -ls test-in
Run the word-count job:
./bin/hadoop jar hadoop-0.20.2-examples.jar wordcount test-in test-out
List the contents of test-out:
./bin/hadoop dfs -ls test-out
View the word-count results:
./bin/hadoop dfs -cat test-out/part-r-00000
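Given the two input files above, "Wang" and "Hongjun" each appear three times in total, "best" once, and "good" once, so the output should look roughly like this (word and count separated by a tab):
Hongjun	3
Wang	3
best	1
good	1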
Passwordless SSH login within the cluster
Append the master's /home/hadoop/.ssh/id_rsa.pub to each slave's /home/hadoop/.ssh/authorized_keys
Append each slave's /home/hadoop/.ssh/id_rsa.pub to the master's /home/hadoop/.ssh/authorized_keys
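One way to do the exchange (a sketch, run as user hadoop; it assumes each host already has its own key pair and that the hostnames from the /etc/hosts entries below resolve): run ssh-copy-id from the master to each slave, and the corresponding command from each slave back to the master:
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hjwang2
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hjwang3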
Cluster deployment:
Edit /etc/hosts on the master and on each slave, adding the following entries:
10.60.10.55 hjwang1
10.60.10.75 hjwang2
10.60.10.77 hjwang3
Edit core-site.xml, hdfs-site.xml, mapred-site.xml, masters, and slaves
Edit the configuration file conf/core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hjwang1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
Edit the configuration file conf/hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
Edit the configuration file conf/mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hjwang1:9001</value>
  </property>
</configuration>
Edit the configuration file conf/masters:
hjwang1
Edit the configuration file conf/slaves:
hjwang2
hjwang3
Deploy hadoop-0.20.2 under the same directory, /usr/local, on the master and on each slave, and assign the owner user and group
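One possible way to do this (a sketch; the archive name hadoop-deploy.tar.gz is made up here, and it assumes the already-configured tree on the master is copied as-is so the slaves need no further editing, with the hadoop user able to sudo on each slave):
On the master, as hadoop:
tar czf /tmp/hadoop-deploy.tar.gz -C /usr/local hadoop-0.20.2
scp /tmp/hadoop-deploy.tar.gz hadoop@hjwang2:/tmp/
scp /tmp/hadoop-deploy.tar.gz hadoop@hjwang3:/tmp/
On each slave, as hadoop:
sudo tar xzf /tmp/hadoop-deploy.tar.gz -C /usr/local
sudo chown -R hadoop:hadoop /usr/local/hadoop-0.20.2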
As user hadoop, format the namenode, from HADOOP_HOME:
./bin/hadoop namenode -format
As user hadoop, start Hadoop, from HADOOP_HOME:
./bin/start-all.sh
Verify
The following processes run on the master (hjwang1):
org.apache.hadoop.hdfs.server.namenode.NameNode
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
org.apache.hadoop.mapred.JobTracker
The following processes run on the slave hjwang2:
org.apache.hadoop.hdfs.server.datanode.DataNode
org.apache.hadoop.mapred.TaskTracker
The following processes run on the slave hjwang3:
org.apache.hadoop.hdfs.server.datanode.DataNode
org.apache.hadoop.mapred.TaskTracker
Namenode HTTP status page:
http://10.60.10.55:50070/
Secondary namenode HTTP status page:
http://10.60.10.55:50090/
Jobtracker HTTP status page:
http://10.60.10.55:50030/
Datanode HTTP status pages:
http://10.60.10.75:50075/
http://10.60.10.77:50075/
Tasktracker HTTP status pages:
http://10.60.10.75:50060/
http://10.60.10.77:50060/
***********************************************************************************
As user hadoop:
sudo tar xzf hbase-0.20.5.tar.gz
sudo chown -R hadoop:hadoop hbase-0.20.5
Set the JDK environment variable in conf/hbase-env.sh:
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_21
Cluster deployment:
Edit hbase-site.xml and regionservers
Edit the configuration file conf/hbase-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hjwang1:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hjwang1,hjwang2,hjwang3</value>
  </property>
</configuration>
Edit the configuration file conf/regionservers:
hjwang2
hjwang3
Deploy hbase-0.20.5 under the same directory, /usr/local, on the master and on each slave, and assign the owner user and group
As user hadoop, start Hadoop, from HADOOP_HOME:
./bin/start-all.sh
As user hadoop, start HBase, from HBASE_HOME:
./bin/start-hbase.sh
Verify
The following processes run on the master (hjwang1):
org.apache.hadoop.hbase.zookeeper.HQuorumPeer
org.apache.hadoop.hbase.master.HMaster
The following processes run on the slave hjwang2:
org.apache.hadoop.hbase.zookeeper.HQuorumPeer
org.apache.hadoop.hbase.regionserver.HRegionServer
The following processes run on the slave hjwang3:
org.apache.hadoop.hbase.zookeeper.HQuorumPeer
org.apache.hadoop.hbase.regionserver.HRegionServer
HBase HTTP status pages:
http://hjwang1:60010/master.jsp
http://hjwang1:60010/zk.jsp
http://hjwang2:60030/regionserver.jsp
http://hjwang3:60030/regionserver.jsp
On the master, from HBASE_HOME, as user hadoop, run:
./bin/hbase shell    (enters the HBase console)
list    (lists the tables in HBase)
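A quick smoke test in the shell (a sketch; the table name test_table and the column family cf are made-up names):
create 'test_table', 'cf'
put 'test_table', 'row1', 'cf:col1', 'value1'
get 'test_table', 'row1'
scan 'test_table'
disable 'test_table'
drop 'test_table'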
***********************************************************************************
The configuration file named masters: one secondary namenode host per line
The Secondary NameNode is started by bin/start-dfs.sh on the nodes listed in conf/masters
***********************************************************************************