Big Data Project "Website User Behavior Analysis": Comprehensive Lab Notes

Project Task Progress

  • Install the Linux operating system
  • Install the relational database MySQL
  • Install the big data processing framework Hadoop
  • Install the column-family database HBase
  • Install the data warehouse Hive
  • Install Sqoop
  • Preprocess the raw dataset (plain text files)
  • Import the text dataset into the Hive data warehouse
  • Run query analysis on the data in Hive
  • Use Sqoop to export data from Hive into MySQL
  • Use Sqoop to import data from MySQL into HBase
  • Use the HBase Java API to load data from the local filesystem into HBase
  • Use R to visualize and analyze the data in MySQL
Installing the Relational Database MySQL

yum -y install wget
wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server
systemctl start mysqld.service
systemctl status mysqld.service
grep "password" /var/log/mysqld.log   # find the temporary root password
mysql -uroot -p
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY '(number3)WDmysql_';
mysql> set global validate_password_policy=0;
mysql> set global validate_password_length=1;
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY ' ';
yum -y remove mysql57-community-release-el7-10.noarch
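The `grep "password"` step above recovers the temporary root password that mysqld generates on first start. A small sketch that extracts just the password field, assuming the default mysql-community-server 5.7 log format and path:

```shell
# Pull the temporary root password out of the standard MySQL 5.7
# first-start log line, which looks like:
#   2020-01-01T00:00:00.000000Z 1 [Note] A temporary password is generated for root@localhost: Abc123!xyz
# The password is the last whitespace-separated field on that line.
get_temp_password() {
  grep 'temporary password' "$1" | awk '{print $NF}'
}
```

Usage: `get_temp_password /var/log/mysqld.log`, then paste the result at the `mysql -uroot -p` prompt.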

Installing the Data Warehouse Hive

tar -zxvf ./apache-hive-1.2.1-bin.tar.gz
source /etc/profile
[xxxxcentos@xxxxcentos7 ~]$ cd /app/lib/apache-hive-1.2.1-bin/conf/
[xxxxcentos@xxxxcentos7 conf]$ mv hive-default.xml.template hive-default.xml
vim hive-site.xml
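The edit above typically points the Hive metastore at MySQL. A minimal sketch of the relevant hive-site.xml properties, assuming MySQL on localhost:3306 with the hive database and hive/hive credentials created in the steps below (the driver class name matches Connector/J 5.1):

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>
```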

Configuring MySQL

tar -zxvf mysql-connector-java-5.1.40.tar.gz   # unpack
cp mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar /usr/local/hive/lib   # copy the JDBC driver into Hive's lib directory

systemctl start mysqld.service
systemctl status mysqld.service
mysql -u root -p   # log in to the MySQL shell
mysql> create database hive;   # matches the "hive" in localhost:3306/hive in hive-site.xml; stores Hive's metadata

Configure MySQL to allow Hive to connect:

mysql> grant all on *.* to hive@localhost identified by 'hive';   # grant all privileges on all databases and tables to the hive user; 'hive' is the connection password configured in hive-site.xml
mysql> flush privileges;   # reload MySQL's privilege tables

[xxxxcentos@xxxxcentos7 apache-hive-1.2.1-bin]$ hive
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/CommandNeedRetryException
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.CommandNeedRetryException
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 4 more

Solution:
The following had been added to Hadoop's configuration file hadoop-env.sh:
export HADOOP_CLASSPATH=…:
But written that way it overwrites whatever HADOOP_CLASSPATH already contained, so it should instead be:
export HADOOP_CLASSPATH=…:$HADOOP_CLASSPATH
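The difference between the two forms is easy to check in a shell (the paths below are placeholders, not the actual jars):

```shell
# Plain assignment discards the old value; appending with
# :$HADOOP_CLASSPATH preserves it.
HADOOP_CLASSPATH='/opt/existing/lib/*'
HADOOP_CLASSPATH="/app/lib/extra/*:$HADOOP_CLASSPATH"   # append form: keeps both
echo "$HADOOP_CLASSPATH"
```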

$ hive
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Fix:

hive --service metastore &
[xxxxcentos@xxxxcentos7 apache-hive-1.2.1-bin]$ Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/app/lib/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/app/lib/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083.

Fix:
Find the PID of the running metastore:
$ ps aux | grep 'metastore'
Kill it:
$ kill -9 <PID>
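One wrinkle: a plain `grep 'metastore'` also matches the grep process itself in the ps output. A common workaround is the character-class trick (a sketch; `xargs -r` is a GNU extension):

```shell
# '[m]etastore' still matches "metastore" in the RunJar command line,
# but not grep's own command line (which contains the brackets, not
# the plain word). The PID is the second ps column.
find_metastore_pids() {
  grep '[m]etastore' | awk '{print $2}'
}
# Usage: ps aux | find_metastore_pids | xargs -r kill -9
```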

hive --service metastore &
[xxxxcentos@xxxxcentos7 apache-hive-1.2.1-bin]$ Starting Hive Metastore Server
java.lang.NoSuchMethodError: org.apache.thrift.protocol.TBinaryProtocol$Factory.<init>(ZZJJ)V

Fix: edit HADOOP_CLASSPATH in hadoop-env.sh so that it does not include …/hbase/lib/*

Restart the Hadoop cluster:
hive --service metastore &
hive
Still fails:
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083.

Add the following Hive configuration to hive-site.xml:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://cdh1:9083</value>
</property>

[xxxxcentos@xxxxcentos7 hadoop-2.7.3]$ hive --service metastore &
[1] 68267
[xxxxcentos@xxxxcentos7 hadoop-2.7.3]$ Starting Hive Metastore Server
javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://192.168.1.179:3306/hive?createDatabaseIfNotExist=true, username = root. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: null,  message from server: "Host 'xxxxcentos7.com' is not allowed to connect to this MySQL server"

mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select 'host' from user where user='root';
+------+
| host |
+------+
| host |
+------+
1 row in set (0.06 sec)

(Quoting 'host' makes it a string literal, which is why the literal text "host" comes back; select host from user where user='root'; shows the actual value.)

mysql> update user set host = '%' where user = 'root';
Query OK, 1 row affected (0.05 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select host, user from user;
+-----------+---------------+
| host      | user          |
+-----------+---------------+
| %         | root          |
| localhost | hive          |
| localhost | mysql.session |
| localhost | mysql.sys     |
+-----------+---------------+
4 rows in set (0.00 sec)

Also change the JDBC URL in hive-site.xml to use localhost.

Running hive --service metastore & again fails:

javax.jdo.JDODataStoreException: Required table missing : "`VERSION`" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.autoCreateTables"
This confirmed that the uninitialized Hive metastore database was the culprit; it turns out Hive can initialize the metastore from the command line.

On first use, initialize the schema: schematool -dbType mysql -initSchema
Check the initialized schema: schematool -dbType mysql -info

[xxxxcentos@xxxxcentos7 ~]$ hive

Logging initialized using configuration in jar:file:/app/lib/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> 

Installing Sqoop

tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
cd sqoop/conf/
cp sqoop-env-template.sh sqoop-env.sh
# copy sqoop-env-template.sh to sqoop-env.sh
vim sqoop-env.sh   # edit sqoop-env.sh
export HADOOP_COMMON_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=/usr/local/hadoop
export HBASE_HOME=/usr/local/hbase
export HIVE_HOME=/usr/local/hive
#export ZOOCFGDIR=   # if you have configured ZooKeeper, set its path here as well

Edit /etc/profile:
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
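Since /etc/profile can be sourced more than once, a guarded append avoids growing PATH with duplicate entries (a sketch using the same SQOOP_HOME as above):

```shell
export SQOOP_HOME=/usr/local/sqoop
# Append $SQOOP_HOME/bin only if it is not already on PATH, so that
# re-sourcing the profile is harmless.
case ":$PATH:" in
  *":$SQOOP_HOME/bin:"*) ;;                       # already present; do nothing
  *) export PATH="$PATH:$SQOOP_HOME/bin" ;;
esac
```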
