Case Study Task Progress
- Install the Linux operating system
- Install the relational database MySQL
- Install the big data processing framework Hadoop
- Install the column-family database HBase
- Install the data warehouse Hive
- Install Sqoop
- Preprocess the raw dataset (plain text files)
- Import the text-file dataset into the Hive data warehouse
- Run query analysis on the data in Hive
- Use Sqoop to export the data from Hive into MySQL
- Use Sqoop to import the data from MySQL into HBase
- Use the HBase Java API to load data from the local filesystem into HBase
- Use R to run visual analysis on the data in MySQL
Install the relational database MySQL
yum -y install wget
wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server
systemctl start mysqld.service
systemctl status mysqld.service
grep "password" /var/log/mysqld.log
mysql -uroot -p
ALTER USER 'root'@'localhost' IDENTIFIED BY '(number3)WDmysql_';
mysql> set global validate_password_policy=0;
mysql> set global validate_password_length=1;
ALTER USER 'root'@'localhost' IDENTIFIED BY ' ';
yum -y remove mysql57-community-release-el7-10.noarch
Install the data warehouse Hive
tar -zxvf ./apache-hive-1.2.1-bin.tar.gz
source /etc/profile
[xxxxcentos@xxxxcentos7 ~]$ cd /app/lib/apache-hive-1.2.1-bin/conf/
[xxxxcentos@xxxxcentos7 conf]$ mv hive-default.xml.template hive-default.xml
vim hive-site.xml
Configure MySQL
tar -zxvf mysql-connector-java-5.1.40.tar.gz #unpack the JDBC connector
cp mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar /usr/local/hive/lib
systemctl start mysqld.service
systemctl status mysqld.service
mysql -u root -p #log in to the MySQL shell
mysql> create database hive; #this "hive" database matches the hive in localhost:3306/hive from hive-site.xml; it stores Hive's metadata
Configure MySQL to allow Hive to connect:
mysql> grant all on *.* to hive@localhost identified by 'hive'; #grant every privilege on all databases and tables to the hive user; the trailing 'hive' is the connection password configured in hive-site.xml
mysql> flush privileges; #reload MySQL's privilege tables
[xxxxcentos@xxxxcentos7 apache-hive-1.2.1-bin]$ hive
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/CommandNeedRetryException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.CommandNeedRetryException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 4 more
The fix: the Hadoop configuration file hadoop-env.sh contained a line of the form
export HADOOP_CLASSPATH=…:
Written like that it overwrites whatever HADOOP_CLASSPATH already contained, so it should instead be
export HADOOP_CLASSPATH=…:$HADOOP_CLASSPATH
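The difference is easy to see with plain shell variables (a minimal sketch; /orig/lib and /extra/lib are placeholder paths, not paths from this setup):

```shell
# Overwriting: the previous value of the variable is lost.
HADOOP_CLASSPATH='/orig/lib/*'
HADOOP_CLASSPATH='/extra/lib/*'
echo "$HADOOP_CLASSPATH"        # only /extra/lib/*

# Appending: the previous value is preserved at the end.
HADOOP_CLASSPATH='/orig/lib/*'
HADOOP_CLASSPATH='/extra/lib/*':$HADOOP_CLASSPATH
echo "$HADOOP_CLASSPATH"        # /extra/lib/*:/orig/lib/*
```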
$ hive
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
The fix:
hive --service metastore &
[xxxxcentos@xxxxcentos7 apache-hive-1.2.1-bin]$ Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/lib/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/lib/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083.
Fix
Find the PID of the metastore process that is still holding the port:
$ ps aux | grep 'metastore'
Kill it:
$ kill -9 <PID>
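The two steps can be combined into one pipeline. The sketch below runs against a canned ps line so it is self-contained; the process line and PID are fabricated for illustration. The [m] in the grep pattern keeps grep from matching its own command line:

```shell
# Fake `ps aux` output standing in for a running metastore; the PID is column 2.
ps_line='hadoop   68267  0.3  2.1 214456 43056 ?  Sl  10:02  0:07 java org.apache.hadoop.hive.metastore.HiveMetaStore'
pid=$(printf '%s\n' "$ps_line" | grep '[m]etastore' | awk '{print $2}')
echo "$pid"                      # 68267
# Against a live system:  kill -9 $(ps aux | grep '[m]etastore' | awk '{print $2}')
```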
hive --service metastore &
[xxxxcentos@xxxxcentos7 apache-hive-1.2.1-bin]$ Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/lib/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/lib/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
java.lang.NoSuchMethodError: org.apache.thrift.protocol.TBinaryProtocol$Factory.<init>(ZZJJ)V
Fix: edit HADOOP_CLASSPATH in hadoop-env.sh so that …/hbase/lib/* no longer appears in it,
then restart the Hadoop cluster.
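One way to sanity-check the classpath is to split it on colons and filter out the HBase entries; the classpath value below is illustrative, loosely modeled on this machine's /app/lib layout:

```shell
cp_value='/app/lib/hadoop-2.7.3/etc/hadoop:/app/lib/hbase-1.1.5/lib/*:/app/lib/apache-hive-1.2.1-bin/lib/*'
# Split on ':', drop every entry that points into hbase, re-join with ':'.
cleaned=$(printf '%s\n' "$cp_value" | tr ':' '\n' | grep -v '/hbase' | paste -sd: -)
echo "$cleaned"
```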
hive --service metastore &
hive
The error persists:
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083.
Add the metastore URI property to hive-site.xml:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://cdh1:9083</value>
</property>
[xxxxcentos@xxxxcentos7 hadoop-2.7.3]$ hive --service metastore &
[1] 68267
[xxxxcentos@xxxxcentos7 hadoop-2.7.3]$ Starting Hive Metastore Server
javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://192.168.1.179:3306/hive?createDatabaseIfNotExist=true, username = root. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: null, message from server: "Host 'xxxxcentos7.com' is not allowed to connect to this MySQL server"
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select 'host' from user where user='root';  -- note: the quotes make 'host' a string literal, which is why the literal word host appears below
+------+
| host |
+------+
| host |
+------+
1 row in set (0.06 sec)
mysql> update user set host = '%' where user = 'root';
Query OK, 1 row affected (0.05 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select host, user from user;
+-----------+---------------+
| host | user |
+-----------+---------------+
| % | root |
| localhost | hive |
| localhost | mysql.session |
| localhost | mysql.sys |
+-----------+---------------+
4 rows in set (0.00 sec)
Also change the metastore JDBC URL in hive-site.xml to use localhost.
Running hive --service metastore & again fails with another error:
javax.jdo.JDODataStoreException: Required table missing : "`VERSION`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.autoCreateTables"
The root cause turned out to be that the Hive metastore database had never been initialized. Hive can initialize it from the command line:
Run the one-time initialization: schematool -dbType mysql -initSchema
Inspect the initialized schema afterwards: schematool -dbType mysql -info
[xxxxcentos@xxxxcentos7 ~]$ hive
Logging initialized using configuration in jar:file:/app/lib/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>
Install Sqoop
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
cd sqoop/conf/
cat sqoop-env-template.sh >> sqoop-env.sh
#make a copy of sqoop-env-template.sh named sqoop-env.sh
vim sqoop-env.sh #edit sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HBASE_HOME=/usr/local/hbase
export HIVE_HOME=/usr/local/hive
#export ZOOCFGDIR= #if you have ZooKeeper configured, set its path here as well
Edit /etc/profile:
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
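After running source /etc/profile, a quick self-contained check that the PATH append took effect (a sketch; assumes SQOOP_HOME=/usr/local/sqoop as set above):

```shell
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
# Wrap PATH in ':' so the match also works for the first and last entry.
case ":$PATH:" in
  *":$SQOOP_HOME/bin:"*) on_path=yes ;;
  *)                     on_path=no  ;;
esac
echo "$on_path"                  # yes
```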