To get a better grounding in big data, I set up a learning environment. Corrections are welcome wherever I got something wrong.
First, the hardware:
2 VMs
namenode 192.168.1.10
datanode 192.168.1.11
===============================================================================
Install VMware 10.0
Install CentOS 7 as the guest OS in VMware
================================================================================
After logging in, check the OS version:
cat /etc/redhat-release
cat /etc/centos-release
===========================================================================
Preparation
========================================================================
List the installed packages:
rpm -qa
e.g. rpm -qa | grep -i mysql
To remove a package:
yum remove <package-name>
Configure the yum mirror.
I use NetEase's Linux mirror: http://mirrors.163.com/centos
>cd /etc/yum.repos.d
>cp CentOS-Base.repo CentOS-Base.repo.bk
>vi CentOS-Base.repo
Change
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
to:
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
baseurl=http://mirrors.163.com/centos/$releasever/os/$basearch/
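The swap above can also be scripted. A minimal sketch, run here against a scratch copy so it is safe to try (on the real system the target is /etc/yum.repos.d/CentOS-Base.repo, after backing it up):

```shell
# Demo on a scratch copy; point repo= at /etc/yum.repos.d/CentOS-Base.repo
# on the real system (after making a backup).
repo=/tmp/CentOS-Base.repo.demo
cat > "$repo" <<'EOF'
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
EOF
# Comment out mirrorlist; uncomment baseurl and point it at the 163 mirror.
sed -i -e 's|^mirrorlist=|#mirrorlist=|' \
       -e 's|^#baseurl=http://mirror.centos.org/centos/|baseurl=http://mirrors.163.com/centos/|' \
       "$repo"
cat "$repo"
```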
Refresh the yum cache and update:
>yum clean all && yum clean metadata && yum clean dbcache && yum makecache && yum update
>yum search "abc"
>yum -y install wget.x86_64
>yum -y install gcc.x86_64
>yum -y install net-tools
>whereis wget
>which wget
Setting the IP
Check the current IP:
>ifconfig
Set the IP with the text-mode UI:
>yum install NetworkManager-tui
>nmtui
>nmtui-edit eno16777736      # edit the NIC configuration
>nmtui-connect eno16777736
Here you can set the IP, gateway, and DNS. (If you are unsure what DNS to use, setting it to the gateway address is safest; otherwise the datanode may fail to connect to the namenode — see the troubleshooting notes below.)
Also, unless you run your own DNS server, you need to map the hostnames locally.
The same entries are required on every node:
vi /etc/hosts
192.168.1.10 namenode
192.168.1.11 datanode
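Before touching the Hadoop configs, it is worth confirming on every node that both names map to the expected addresses. A small helper sketch (check_hosts is a hypothetical name; the demo runs against a scratch copy, and on the real nodes you would pass /etc/hosts):

```shell
# check_hosts: verify that a hosts file maps each expected IP to the
# expected hostname. Pass pairs as IP=hostname.
check_hosts() {
  file=$1; shift
  rc=0
  for pair in "$@"; do
    ip=${pair%%=*}
    host=${pair#*=}
    grep -qE "^${ip}[[:space:]].*\b${host}\b" "$file" \
      || { echo "missing: $ip $host"; rc=1; }
  done
  return $rc
}

# Demo against a scratch copy; on the real nodes run it on /etc/hosts.
printf '192.168.1.10 namenode\n192.168.1.11 datanode\n' > /tmp/hosts.demo
check_hosts /tmp/hosts.demo 192.168.1.10=namenode 192.168.1.11=datanode \
  && echo "hosts OK"
```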
With this in place, the Hadoop configuration files can refer to the master simply as namenode.
Restart the network. System services are managed with systemctl:
>systemctl restart network
>systemctl status network
or
service network restart
Verify the status of the NetworkManager service:
- $ systemctl status NetworkManager.service
Check which interfaces NetworkManager manages:
- $ nmcli dev status
======================================================================
Install Java
=================================================================
Download the installer jdk-8u72-linux-x64.rpm:
http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
rpm -ivh jdk-8u72-linux-x64.rpm
It installs to /usr/java/jdk1.8.0_72 (the Oracle RPM also creates the /usr/java/latest symlink used below).
==================================================================
Install Hadoop
===============================================================
Download Hadoop 2.6:
http://www.apache.org/dyn/closer.cgi/hadoop/common/
http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Unpack, move into place, and create a latest symlink; future upgrades then only need to repoint latest.
tar zxvf hadoop-2.6.0.tar.gz
mkdir /usr/hadoop
mv hadoop-2.6.0/ /usr/hadoop/
cd /usr/hadoop
ln -s hadoop-2.6.0/ latest
Tip: to delete the symlink, use
rm -rf latest      # note: NOT rm -rf latest/ — the trailing slash follows the link and deletes the target's contents
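The trailing-slash difference is easy to verify safely in a scratch directory:

```shell
mkdir -p /tmp/link-demo/hadoop-2.6.0
cd /tmp/link-demo
touch hadoop-2.6.0/somefile
ln -s hadoop-2.6.0 latest
rm -rf latest                    # no trailing slash: removes only the symlink
# (rm -rf latest/ would have followed the link into hadoop-2.6.0)
[ -f hadoop-2.6.0/somefile ] && echo "target intact"
```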
vi /etc/profile
export JAVA_HOME=/usr/java/latest
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/jdbc.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/hadoop/latest
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_HOME=/usr/hive/latest
export PATH=$HIVE_HOME/bin:$PATH
:wq
source /etc/profile
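To confirm the exports took effect (in particular $HADOOP_CONF_DIR, which is built from $HADOOP_HOME, so the order of the lines in /etc/profile matters), a scratch-profile sketch:

```shell
# Demo with a scratch profile; on the real box just `source /etc/profile`
# and echo the variables.
cat > /tmp/profile.demo <<'EOF'
export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/usr/hadoop/latest
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
EOF
. /tmp/profile.demo
echo "$JAVA_HOME"        # expands to /usr/java/latest
echo "$HADOOP_CONF_DIR"  # expands to /usr/hadoop/latest/etc/hadoop
```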
cd $HADOOP_HOME
vi libexec/hadoop-config.sh
Add:
export JAVA_HOME=/usr/java/latest
vi etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
vi etc/hadoop/hdfs-site.xml
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/hadoop/latest/dfs/name</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/hadoop/latest/dfs/data</value>
</property>
vi etc/hadoop/yarn-site.xml
<!-- Resource Manager Configs -->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<description>The minimum allocation for every container request at the RM,
in MBs. Memory requests lower than this will throw a
InvalidResourceRequestException.</description>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<description>The maximum allocation for every container request at the RM,
in MBs. Memory requests higher than this will throw a
InvalidResourceRequestException.</description>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
</property>
<property>
<description>Amount of physical memory, in MB, that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<description>Ratio between virtual memory to physical memory when
setting memory limits for containers. Container allocations are
expressed in terms of physical memory, and virtual memory usage
is allowed to exceed this allocation by this ratio.
</description>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<description>List of directories to store localized files in. An
application's localized file directory will be found in:
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
Individual containers' work directories, called container_${contid}, will
be subdirectories of this.
</description>
<name>yarn.nodemanager.local-dirs</name>
<value>/tmp/hadoop/nm_tempfile/</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
vi etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx820m</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>1024</value>
<description>The amount of memory to request from the scheduler for each
reduce task.
</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>namenode:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>namenode:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
SSH setup
Host namenode 192.168.1.10
Host datanode 192.168.1.11
Every host needs passwordless login to namenode.
First make sure the firewall is disabled on every host.
On namenode, run:
1. $cd ~/.ssh
2. $ssh-keygen -t rsa        # press Enter at every prompt; the defaults save the key pair to ~/.ssh/id_rsa
3. $cp id_rsa.pub authorized_keys
After this, passwordless login to the local machine should work: ssh localhost asks for no password.
On each datanode:
1. $cd ~/.ssh
2. $ssh-keygen -t rsa        # press Enter at every prompt; the defaults save the key pair to ~/.ssh/id_rsa
3. $scp id_rsa.pub root@192.168.1.10:~/.ssh/id_rsa.pub.datanode      # send the new public key to namenode (a distinct name avoids overwriting namenode's own id_rsa.pub)
Back on namenode, append the datanode's key and fix the permissions of authorized_keys:
4. $cat ~/.ssh/id_rsa.pub.datanode >> ~/.ssh/authorized_keys
5. $chmod 644 ~/.ssh/authorized_keys
Once every datanode is done, distribute the aggregated authorized_keys on namenode (it now holds every machine's public key) to all datanodes:
$scp ~/.ssh/authorized_keys root@datanode:~/.ssh/
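The key flow above can be sketched in a scratch directory (purely illustrative; the real files live in each machine's ~/.ssh, and the key strings here are fake placeholders):

```shell
dir=/tmp/ssh-demo
mkdir -p "$dir"
# namenode: its own public key seeds authorized_keys (step 3 on namenode)
echo "ssh-rsa FAKEKEY-namenode" > "$dir/id_rsa.pub"
cp "$dir/id_rsa.pub" "$dir/authorized_keys"
# each datanode's public key is scp'd over and appended (step 4)
echo "ssh-rsa FAKEKEY-datanode" >> "$dir/authorized_keys"
chmod 644 "$dir/authorized_keys"             # step 5
# the finished file holds every node's key and is scp'd back to all nodes
wc -l < "$dir/authorized_keys"
```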
=========================================================
Initialization and startup
$HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
$HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
$HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
$HADOOP_PREFIX/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
Or simply use the cluster scripts (these require etc/hadoop/slaves to list the datanodes):
$HADOOP_PREFIX/sbin/start-dfs.sh
$HADOOP_PREFIX/sbin/start-yarn.sh
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
If ssh complains on the first connection, log in once by hostname before running the scripts,
e.g. ssh namenode rather than ssh localhost.
During setup I hit the following problem:
hdfs datanode denied communication with namenode because hostname cannot be resolved
Initialization failed for Block pool BP-232943349-192.168.1.10-1417116665984
(Datanode Uuid null) service to namenode/192.168.1.10:8022
Datanode denied communication with namenode because hostname cannot be resolved
(ip=192.168.1.1, hostname=192.168.1.1): DatanodeRegistration(192.168.1.11,
datanodeUuid=49a6dc47-c988-4cb8-bd84-9fabf87807bf, infoPort=50075, ipcPort=50020,
storageInfo=lv=-56;cid=cluster24;nsid=11020533;c=0)
The fix drew on:
http://stackoverflow.com/questions/27195466/hdfs-datanode-denied-communication-with-namenode-because-hostname-cannot-be-reso
http://log.rowanto.com/why-datanode-is-denied-communication-with-namenode/
1) In nmtui, delete the DNS entry, then restart the network.
2) Add the following to $HADOOP_HOME/etc/hadoop/hdfs-site.xml:
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
3) Check /etc/resolv.conf and delete any DNS entry that remains. This largely solves the problem, but without DNS there is no external network access, so DNS has to be set again.
4) In nmtui, set DNS to 192.168.1.1 (the problematic setup had used 202.96.128.86).
Addendum: I later found the root cause! I had set the IP in nmtui but left the addressing mode as dhcp instead of manual. My static IP somehow took effect, yet DHCP kept handing out new addresses, so the datanode could never settle on a final IP.
Fix:
In nmtui, change the addressing mode from Automatic to Manual.
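For reference, a Manual configuration corresponds to the BOOTPROTO line in /etc/sysconfig/network-scripts/ifcfg-eno16777736. A sketch of the expected result, shown as a scratch file (the exact keys NetworkManager writes may vary slightly by version):

```shell
cat > /tmp/ifcfg-demo <<'EOF'
# BOOTPROTO=none corresponds to "Manual" in nmtui;
# BOOTPROTO=dhcp is what kept reassigning the address
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.11
PREFIX=24
GATEWAY=192.168.1.1
DNS1=192.168.1.1
EOF
grep '^BOOTPROTO' /tmp/ifcfg-demo
```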
======================================================================================
Install Hive and MySQL
====================================================================================
Unpack, rename, and set the environment variables.
Download Hive from http://apache.fayea.com/hive/stable/
Unpack it to /usr/hive/hive-<version>
Create the symlink latest => hive-<version>
$HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
In $HIVE_HOME/conf/, run:
cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh
MySQL installation
On recent CentOS releases the default database is MariaDB, not MySQL!
Installing from the stock repos is straightforward:
yum install mariadb mariadb-server
systemctl start mariadb              # start mariadb
systemctl enable mariadb             # start at boot
mysql_secure_installation            # set the root password etc. (press Enter at the first prompt; the initial root password is empty)
mysql -uroot -p123456                # test the login
Use MySQL as Hive's metastore.
Download the latest JDBC driver:
wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.38.tar.gz
tar zxvf mysql-connector-java-5.1.38.tar.gz
cp mysql-connector-java-5.1.38-bin.jar $JAVA_HOME/lib/
cd $JAVA_HOME/lib/
ln -s mysql-connector-java-5.1.38-bin.jar jdbc.jar
cd $HIVE_HOME/lib/
ln -s $JAVA_HOME/lib/mysql-connector-java-5.1.38-bin.jar jdbc.jar
Add the MySQL JDBC driver to $CLASSPATH:
vi /etc/profile
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/jdbc.jar
:wq
source /etc/profile      # apply the change
Edit hive-site.xml as follows:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://namenode:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Test1234</value>
</property>
Problem:
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
at jline.TerminalFactory.create(TerminalFactory.java:101)
at jline.TerminalFactory.get(TerminalFactory.java:158)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
at org.apache.hadoop.hive.cli.CliDriver.getConsoleReader(CliDriver.java:773)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:715)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Cause:
An old jline jar sits in the Hadoop tree:
/hadoop-2.6.0/share/hadoop/yarn/lib:
-rw-r--r-- 1 root root 87325 Mar 10 18:10 jline-0.9.94.jar
Fix:
cp /usr/hive/latest/lib/jline-2.12.jar /usr/hadoop/latest/share/hadoop/yarn/lib/
rm -f /usr/hadoop/latest/share/hadoop/yarn/lib/jline-0.9.94.jar
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir}/${system:user.name}
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:444)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir}/${system:user.name}
at org.apache.hadoop.fs.Path.initialize(Path.java:148)
at org.apache.hadoop.fs.Path.<init>(Path.java:126)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:487)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:430)
... 7 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir}/${system:user.name}
at java.net.URI.checkPath(URI.java:1804)
at java.net.URI.<init>(URI.java:752)
at org.apache.hadoop.fs.Path.initialize(Path.java:145)
... 10 more
Solution:
In hive-site.xml, look for configuration values containing "${system:java.io.tmpdir}/${system:user.name}".
Change those values to /tmp/hive.
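The value replacement can be done in one pass with sed. A sketch against a scratch file (hive.exec.local.scratchdir is one of the affected properties; on the real system the target is $HIVE_HOME/conf/hive-site.xml, and /tmp/hive is simply a writable path):

```shell
f=/tmp/hive-site.demo.xml
cat > "$f" <<'EOF'
<property>
<name>hive.exec.local.scratchdir</name>
<value>${system:java.io.tmpdir}/${system:user.name}</value>
</property>
EOF
# Replace every ${system:java.io.tmpdir}/${system:user.name} with /tmp/hive.
sed -i 's|\${system:java.io.tmpdir}/\${system:user.name}|/tmp/hive|g' "$f"
grep '<value>' "$f"      # -> <value>/tmp/hive</value>
```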
Start Hive — success!
Recommended Linux performance-monitoring tools: nmon, htop, glances.
Reference: http://os.51cto.com/art/201412/460698_all.htm
Mounting a USB drive
1. Connect the USB drive in VMware.
2. Find the device name, e.g. /dev/sdb0:
> fdisk -l
3. >mkdir /mnt/usb
>mount /dev/sdb0 /mnt/usb
>cd /mnt/usb
OK!
Unmounting:
First leave all directories on the USB drive:
>cd /mnt
>umount /mnt/usb