Hadoop cluster setup

Recently, to handle queries over 3+ billion rows of detail-level data, I tried solving it with Presto.

Option 1: use Deepgreen, optimize table distribution, and build indexes

Option 2: use Hadoop + Presto

Here is a recap of the Hadoop cluster setup process:

1.1 Prepare the machines

10.1.240.183    base0183
10.1.240.184    base0184
10.1.240.185    base0185
10.1.240.186    base0186

10.1.240.187    base0187

Create the user and user group (on every machine):

groupadd hadoopGroup

useradd -g hadoopGroup hadoop

passwd hadoop

1.2 Download the installation files

Pick a version from http://mirror.bit.edu.cn/apache/hadoop/common/ and download it; hadoop-2.7.5.tar.gz is used here.

Download the matching JDK from the Oracle website; jdk-8u161-linux-x64.tar.gz is used here.

 

1.3 Set up SSH

Hadoop uses SSH to start the daemons on each host listed in the slaves file.

Generate an RSA key pair (run in ~):

ssh-keygen -t rsa -P ""

Press Enter through the prompts; the key files are created in /home/hadoop/.ssh.

Add id_rsa.pub to authorized_keys: cat .ssh/id_rsa.pub >> .ssh/authorized_keys

 

Generate a key pair on each slave as well: ssh-keygen -t rsa -P ""

Copy the master's authorized_keys to each slave:

scp ~/.ssh/authorized_keys  hadoop@base0183:~/.ssh/

scp ~/.ssh/authorized_keys  hadoop@base0184:~/.ssh/

scp ~/.ssh/authorized_keys  hadoop@base0185:~/.ssh/

scp ~/.ssh/authorized_keys  hadoop@base0186:~/.ssh/

Test the SSH trust: ssh hadoop@base0183

It works if no password is required anymore.
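To verify passwordless login to every slave in one pass, a short loop can be used (hostnames taken from the list above):

for h in base0183 base0184 base0185 base0186; do
    ssh hadoop@$h hostname    # should print each hostname without asking for a password
done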

 

 

1.4 Configure node hostnames. Here only the master node was renamed to master; leaving the names unchanged also works, as long as the hostnames referenced in the later Hadoop configuration files are adjusted to match.

 hostname master

Edit /etc/hosts and add the following entries:

10.1.240.183    base0183
10.1.240.184    base0184
10.1.240.185    base0185
10.1.240.186    base0186

10.1.240.187    master

 

1.5 Install and configure the JDK

Hadoop is written in Java; building Hadoop and running MapReduce both require a JDK.

tar -zxvf jdk-8u161-linux-x64.tar.gz

[root@base0187 ~]# mkdir -p /usr/local/java

[root@base0187 ~]# chown hadoop:hadoopGroup /usr/local/java

vi .bash_profile

export JAVA_HOME=/usr/local/java/jdk1.8.0_161
export JRE_HOME=/usr/local/java/jdk1.8.0_161/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

source .bash_profile

Verify with java -version:

[hadoop@master ~]$ java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)

Copy the JDK directory to each slave, to the same location as on the master (make sure /usr/local/java exists and is writable by the hadoop user on each slave first):

scp -r /usr/local/java/jdk1.8.0_161 hadoop@base0183:/usr/local/java
scp -r /usr/local/java/jdk1.8.0_161 hadoop@base0184:/usr/local/java
scp -r /usr/local/java/jdk1.8.0_161 hadoop@base0185:/usr/local/java
scp -r /usr/local/java/jdk1.8.0_161 hadoop@base0186:/usr/local/java
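A quick sanity check that the JDK landed on every slave (same paths as above):

for h in base0183 base0184 base0185 base0186; do
    ssh hadoop@$h /usr/local/java/jdk1.8.0_161/bin/java -version
done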

1.6 Install Hadoop

Create a hadoop directory under /opt to hold the Hadoop installation.

Create three directories under /data for the HDFS configuration that follows:

mkdir -p /opt/hadoop
mkdir -p /data/hdfs/name
mkdir -p /data/hdfs/data
mkdir -p /data/hdfs/tmp

chown hadoop:hadoopGroup /opt/hadoop
chown -R hadoop:hadoopGroup /data/hdfs

Run the above commands on every node.
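If root can SSH between these hosts (an assumption; the passwordless keys above were set up for the hadoop user only), the same directory setup can be pushed to all slaves in one loop instead of logging in to each machine:

for h in base0183 base0184 base0185 base0186; do
    ssh root@$h 'mkdir -p /opt/hadoop /data/hdfs/name /data/hdfs/data /data/hdfs/tmp; chown hadoop:hadoopGroup /opt/hadoop; chown -R hadoop:hadoopGroup /data/hdfs'
done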

Unzip the installer: tar -zxvf hadoop-2.7.5.tar.gz -C /opt/hadoop

Hadoop environment variables:

 

vi .bash_profile

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.5
export JAVA_HOME=/usr/local/java/jdk1.8.0_161
export JRE_HOME=/usr/local/java/jdk1.8.0_161/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin

source .bash_profile
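A quick way to confirm the new variables took effect:

hadoop version    # should report Hadoop 2.7.5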

 

Configure etc/hadoop/hadoop-env.sh. If the default ${JAVA_HOME} is left in place, startup fails on some systems; use the absolute path instead.
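For example, the line in etc/hadoop/hadoop-env.sh would point at the JDK installed above:

export JAVA_HOME=/usr/local/java/jdk1.8.0_161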

Configuring etc/hadoop/core-site.xml:

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <!-- Address of the NameNode -->
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>

  <property>
    <!-- Base directory for files Hadoop generates at runtime -->
    <name>hadoop.tmp.dir</name>
    <value>file:/data/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>

  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>

Configure etc/hadoop/mapred-site.xml (if it does not exist, copy mapred-site.xml.template to that name):

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
</property>
</configuration>

Configuring etc/hadoop/hdfs-site.xml:

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/data/hdfs/name</value>
  <final>true</final>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/data/hdfs/data</value>
  <final>true</final>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>master:9001</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
</configuration>

dfs.namenode.name.dir determines where on the local filesystem the DFS NameNode stores the name table (fsimage).
If this is a comma-delimited list of directories, the name table is replicated to all of them for redundancy.

dfs.datanode.data.dir determines where on the local filesystem a DataNode stores its blocks.
If this is a comma-delimited list of directories, data is stored in all named directories, typically on different devices. The directories should be tagged with the corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies.
If a directory has no explicit storage type tag, the default is DISK. Directories that do not exist are created if local filesystem permissions allow.

Add the slave nodes to the slaves file:

[hadoop@master hadoop]$ cat slaves 
base0183
base0184
base0185
base0186
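The file lives at etc/hadoop/slaves under the Hadoop installation directory, one hostname per line; it can be written in one step, for example:

cat > /opt/hadoop/hadoop-2.7.5/etc/hadoop/slaves <<EOF
base0183
base0184
base0185
base0186
EOF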

Copy the Hadoop directory to each slave, to the same location as on the master:

scp -r /opt/hadoop/hadoop-2.7.5/ hadoop@base0183:/opt/hadoop/
scp -r /opt/hadoop/hadoop-2.7.5/ hadoop@base0184:/opt/hadoop/
scp -r /opt/hadoop/hadoop-2.7.5/ hadoop@base0185:/opt/hadoop/
scp -r /opt/hadoop/hadoop-2.7.5/ hadoop@base0186:/opt/hadoop/

Copy .bash_profile to each slave, to the same location as on the master (each slave then needs a re-login or a source .bash_profile for the variables to take effect):


scp .bash_profile  hadoop@base0183:~/
scp .bash_profile  hadoop@base0184:~/
scp .bash_profile  hadoop@base0185:~/
scp .bash_profile  hadoop@base0186:~/

 

1.7 Start the cluster

 

cd /opt/hadoop/hadoop-2.7.5/bin

./hdfs namenode -format    # format the HDFS filesystem

[hadoop@master ~]$ hdfs namenode -format
18/03/12 10:45:43 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/10.1.240.187
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.5
STARTUP_MSG:   classpath = /opt/hadoop/hadoop-2.7.5/etc/
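If the format succeeds, the NameNode metadata directory configured in hdfs-site.xml should now contain an initial fsimage; a quick check (file names will vary slightly):

ls /data/hdfs/name/current    # expect fsimage_*, seen_txid, VERSION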

 

cd /opt/hadoop/hadoop-2.7.5/sbin

 

./start-all.sh
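Once start-all.sh finishes, jps shows which daemons came up on each node. With this layout the master is expected to run NameNode, SecondaryNameNode and ResourceManager, and each slave DataNode and NodeManager (a rough expectation; YARN was left at its defaults here):

jps
ssh hadoop@base0183 /usr/local/java/jdk1.8.0_161/bin/jps    # full JDK path, since .bash_profile is not read for non-interactive ssh commands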

 

Check the cluster status on the NameNode:

 

hdfs dfsadmin -report

 

Open http://10.1.240.187:50070 in a browser.

 

1.8 Test the setup with the bundled wordcount example

Create an arbitrary andy_test.txt and upload it to HDFS:

hdfs dfs -mkdir /input
hdfs dfs -put andy_test.txt /input
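To confirm the upload:

hdfs dfs -ls /input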

Run the wordcount demo:

hadoop jar /opt/hadoop/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar  wordcount /input /output

It failed with the following error:

java.net.NoRouteToHostException: No route to host,

Checking the nodes showed that the firewall on base0184 had not been turned off. The relevant iptables commands:

service iptables status

service iptables start

service iptables stop

[root@base0184 ~]# service iptables status
Table: filter
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED 
2    ACCEPT     icmp --  0.0.0.0/0            0.0.0.0/0           
3    ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0           
4    ACCEPT     tcp  --  0.0.0.0/0            0.0.0.0/0           state NEW tcp dpt:22 
5    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain FORWARD (policy ACCEPT)
num  target     prot opt source               destination         
1    REJECT     all  --  0.0.0.0/0            0.0.0.0/0           reject-with icmp-host-prohibited 

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination         

[root@base0184 ~]# service iptables stop
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Unloading modules:                               [  OK  ]
[root@base0184 ~]# 
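To keep iptables from coming back after a reboot (the service commands suggest a RHEL/CentOS 6 style system), it can also be disabled at boot on each node:

chkconfig iptables off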

Execute it again; this time it succeeds. Check the result:

[hadoop@master mapreduce]$ hdfs dfs -ls /output
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2018-03-12 12:58 /output/_SUCCESS
-rw-r--r--   2 hadoop supergroup        303 2018-03-12 12:58 /output/part-r-00000
[hadoop@master mapreduce]$ hdfs dfs -cat  /output/part-r-00000
And    1
Give    3
above    1....
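One caveat when re-running the job: MapReduce refuses to start if the output directory already exists, so remove it first:

hdfs dfs -rm -r /output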

At this point, the cluster is up and running!

 

View the cluster's default configuration:

http://10.1.197.45:50070/conf
