SparkSQL和Alluxio结合使用

1.先在hive配置文件中增加alluxio相关配置,然后再安装hive

修改配置文件 playbooks/hive/vars.yml:

#alluxio
alluxio_path: "/opt/alluxio-1.8.1"
alluxio_zookeeper_address: "192.168.6.12:2181,192.168.6.13:2181,192.168.6.14:2181"

 

# hive configration增加

hive_site_properties:
 - {
    "name":"fs.alluxio.impl",
    "value":"alluxio.hadoop.FileSystem"
  }
  - {
    "name":"hive.metastore.schema.verification",
    "value":"false"
  }
  - {
    "name":"alluxio.zookeeper.enabled",
    "value":"true"
  }
  - {
    "name":"alluxio.zookeeper.address",
    "value":"{{ alluxio_zookeeper_address }}"
  }

修改配置文件 roles/hive/templates/hive_env.sh:

 

export HIVE_AUX_JARS_PATH={{ alluxio_path }}/client/alluxio-1.8.1-client.jar:$HIVE_AUX_JARS_PATH
export CLASSPATH=$CLASSPATH.:$HIVE_HOME/lib:$HIVE_AUX_JARS_PATH

2.准备测试数据

 

https://grouplens.org/datasets/movielens/

bin/alluxio fs mkdir /ml-100k
bin/alluxio fs copyFromLocal /opt/ml-100k/u.user alluxio://192.168.6.12:19998/ml-100k

3.启动spark thriftserver

sbin/start-thriftserver.sh

4.使用beeline测试

beeline
!connect jdbc:hive2://192.168.6.12:10000/default
hadoop #username 
hadoop #password 

 

 

用hadoop连接成功后,先创建表再查询:

内部表
CREATE TABLE u_user (
userid INT,
age INT,
gender CHAR(1),
occupation STRING,
zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 'alluxio://192.168.6.12:19998/ml-100k';

查询:

select * from u_user limit 10;

统计查询:

select count(*) from u_user;

查看表结构、分组查询:

desc u_user;

select avg(age),gender from u_user group by gender;

select count(*),gender from u_user group by gender;

创建外部表

CREATE EXTERNAL TABLE u_user_e (
userid INT,
age INT,
gender CHAR(1),
occupation STRING,
zipcode STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
LOCATION 'alluxio://192.168.6.12:19998/ml-100k';

 

 

 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值