Installing and Deploying Shark 0.8.1


 

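Hive, part of the Hadoop ecosystem, lets users write SQL-like (HQL) statements, which Hive then translates into MapReduce jobs for Hadoop to execute. Spark is a distributed in-memory computing system that serves as the distributed execution layer and is faster than Hadoop MapReduce. Shark is a Spark component: an open-source, distributed, fault-tolerant, in-memory analytics system. It reuses the existing Hive client and metastore and stays compatible with Hive, but runs much faster because it translates each HQL query into many small tasks that execute on Spark.

Shark is a strong tool for near-real-time querying and analysis of big data. It supports HiveQL, Hive data formats, and UDFs, and it can also query data stored in HDFS and HBase. Shark SQL queries are reported to run up to 100 times faster than Hive, and machine-learning workloads up to 100 times faster than on Hadoop.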

The versions used here:

  •  Spark 0.8.1
  •  Shark 0.8.1
  •  Hive 0.9.0
  •  Hadoop 2.0.0-CDH4.4.0
  •  Scala 2.9.3

The download URLs are listed below; each archive can be fetched directly with wget.

Name                     Download URL
Spark 0.8.1              http://d3kbcqa49mib13.cloudfront.net/spark-0.8.1-incubating-bin-cdh4.tgz
Shark 0.8.1              https://github.com/amplab/shark/releases/download/v0.8.1/shark-0.8.1-bin-cdh4.tgz
Hive 0.9.0               https://github.com/amplab/shark/releases/download/v0.8.1/hive-0.9.0-bin.tgz
Hadoop 2.0.0-CDH4.3.0    http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.3.0.tar.gz
Scala 2.9.3              http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
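The download step can be scripted. The following is a minimal sketch, assuming the URLs above are still reachable and that wget and tar are installed. The helper names archive_name and fetch_all are made up for this illustration, and the destination directory is whatever you pass in.

```shell
#!/bin/sh
# Sketch: fetch and unpack the archives listed in the download table.
# archive_name and fetch_all are hypothetical helpers for illustration.

URLS="
http://d3kbcqa49mib13.cloudfront.net/spark-0.8.1-incubating-bin-cdh4.tgz
https://github.com/amplab/shark/releases/download/v0.8.1/shark-0.8.1-bin-cdh4.tgz
https://github.com/amplab/shark/releases/download/v0.8.1/hive-0.9.0-bin.tgz
http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.3.0.tar.gz
http://www.scala-lang.org/files/archive/scala-2.9.3.tgz
"

# Print the file name that a URL's archive is saved under.
archive_name() {
  basename "$1"
}

# Download each archive into the given directory (unless it is already
# there) and extract it in place. Stops at the first failure.
fetch_all() {
  dest="$1"
  mkdir -p "$dest" || return 1
  for url in $URLS; do
    file="$dest/$(archive_name "$url")"
    if [ ! -f "$file" ]; then
      # Remove a partial file so a rerun retries the download.
      wget -O "$file" "$url" || { rm -f "$file"; return 1; }
    fi
    tar -xzf "$file" -C "$dest" || return 1
  done
}
```

Calling fetch_all /usr/lib would place everything under the prefix that the configuration files in this article use.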

 

Spark:

$ cd $YOUR_SPARK_HOME/
$ vim conf/spark-env.sh

Edit spark-env.sh:

SCALA_HOME=/usr/lib/spark-0.8.1-incubating-bin-cdh4/scala-2.9.3
JAVA_HOME=/usr/java/jdk1.7.0_25
export CLASSPATH=/usr/java/jdk1.7.0_25/lib
SPARK_MASTER_IP=192.168.10.220
SPARK_MASTER_PORT=8081
SPARK_MASTER_WEBUI_PORT=8090
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=4g
SPARK_WORKER_PORT=8091
SPARK_WORKER_WEBUI_PORT=8092
SPARK_WORKER_INSTANCES=1
export SPARK_JAVA_OPTS="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps"
export HADOOP_HOME=/usr/lib/hadoop
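The MASTER value that Shark uses in shark-env.sh has to match SPARK_MASTER_IP and SPARK_MASTER_PORT from this file. As a rough sketch, the URL can be derived from spark-env.sh rather than typed twice. The helper name master_url is hypothetical and not part of Spark or Shark.

```shell
#!/bin/sh
# Sketch: derive the spark:// master URL from a spark-env.sh file so it can
# be compared with the MASTER value in shark-env.sh.
# master_url is a hypothetical helper for illustration.

master_url() {
  (
    # Run in a subshell so the sourced variables do not leak into the caller.
    unset SPARK_MASTER_IP SPARK_MASTER_PORT
    . "$1" || exit 1
    # 7077 is the standalone master's default port when none is configured.
    echo "spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT:-7077}"
  )
}
```

With the settings above, master_url "$YOUR_SPARK_HOME/conf/spark-env.sh" should print spark://192.168.10.220:8081. Pass a path that contains a slash, because the shell's dot command otherwise searches PATH.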

Hive:

If MySQL is used to store the Hive metastore, copy mysql-connector-java-3.1.13-bin.jar into the $HIVE_HOME/lib directory.

#cd $YOUR_HIVE_HOME
#cp conf/hive-env.sh.template conf/hive-env.sh
#cp conf/hive-default.xml.template conf/hive-site.xml
#vim conf/hive-env.sh

Edit hive-env.sh:

....
HADOOP_HOME=$YOUR_HADOOP2_HOME
....

Edit hive-site.xml:

 

.....

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/SHARK_DATABASE?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>shark</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>shark</value>
<description>password to use against metastore database</description>
</property>

......
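The metastore settings above assume that a MySQL account named shark (password shark) already exists and is allowed to create and use SHARK_DATABASE. The article does not show that step, so the following is only a sketch of one way to do it. The helper name metastore_sql is hypothetical, and the 'localhost' host part is an assumption taken from the JDBC URL.

```shell
#!/bin/sh
# Sketch: print the SQL that creates the metastore account used in
# hive-site.xml. metastore_sql is a hypothetical helper for illustration.
# Arguments: database name, user name, password.

metastore_sql() {
  db="$1"
  user="$2"
  pass="$3"
  cat <<EOF
CREATE USER '$user'@'localhost' IDENTIFIED BY '$pass';
GRANT ALL PRIVILEGES ON $db.* TO '$user'@'localhost';
FLUSH PRIVILEGES;
EOF
}
```

One way to apply it is metastore_sql SHARK_DATABASE shark shark | mysql -u root -p. The database itself does not have to be created by hand, since the connection URL sets createDatabaseIfNotExist=true.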

 

Shark:

#cd $YOUR_SHARK_HOME
#cp conf/shark-env.sh.template conf/shark-env.sh
#vim conf/shark-env.sh

Edit shark-env.sh:

......

export SCALA_HOME=/usr/lib/scala-2.9.3
export HADOOP_HOME=/usr/lib/hadoop
export HIVE_HOME=/usr/lib/shark-0.8.1-bin-cdh4/hive-0.9.0-bin/
export MASTER=spark://192.168.10.220:8081
export SPARK_HOME=/usr/lib/spark-0.8.1-incubating-bin-cdh4
export HIVE_CONF_DIR=/usr/lib/shark-0.8.1-bin-cdh4/hive-0.9.0-bin/conf

# Java options
# On EC2, change the local.dir to /mnt/tmp
SPARK_JAVA_OPTS="-Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS

......
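A mistyped path in shark-env.sh is easy to miss until Shark fails to start. As a small sanity check, the directories it names can be verified up front. This is a sketch only, and check_dirs is a hypothetical helper, not something shipped with Shark.

```shell
#!/bin/sh
# Sketch: report which of the given directories do not exist.
# check_dirs is a hypothetical helper for illustration.
# Returns 0 when every directory exists, 1 otherwise.

check_dirs() {
  missing=0
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "missing: $d"
      missing=1
    fi
  done
  return $missing
}
```

For example, after sourcing conf/shark-env.sh, run check_dirs "$SCALA_HOME" "$HADOOP_HOME" "$HIVE_HOME" "$SPARK_HOME" "$HIVE_CONF_DIR" and fix any path it reports.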
 

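Start the services and test. Formatting the NameNode erases existing HDFS metadata, so run that step only on a fresh cluster.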

#cd $YOUR_HADOOP_HOME
#bin/hadoop namenode -format
#sbin/start-all.sh

#cd $YOUR_SPARK_HOME
#bin/start-all.sh

#cd $YOUR_SHARK_HOME
#bin/shark
Starting the Shark Command Line Client
......
......
shark>show databases;
shark>create database SHARK_DB;
shark>use SHARK_DB;
shark>create table tbl_test(ID STRING);
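The same statements can be kept in a file so the check is repeatable. The sketch below only writes that file. It adds "if not exists" so reruns do not fail, and it adds a final "show tables;" to confirm the result. The helper name write_smoke_test is hypothetical.

```shell
#!/bin/sh
# Sketch: write the statements from the interactive session to an HQL file.
# write_smoke_test is a hypothetical helper for illustration.
# Argument: path of the file to create.

write_smoke_test() {
  cat > "$1" <<'EOF'
show databases;
create database if not exists SHARK_DB;
use SHARK_DB;
create table if not exists tbl_test(ID STRING);
show tables;
EOF
}
```

Assuming the Shark CLI accepts the -f flag the way the Hive CLI does (an assumption worth checking against your build), the file could then be run with bin/shark -f smoke_test.hql after calling write_smoke_test smoke_test.hql.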

 

Shark jobs run one after another in queue order. Their state on the Spark cluster looks like this:

Spark Master at spark://192.168.10.220:8081

 

  • URL: spark://192.168.10.220:8081
  • Workers: 4
  • Cores: 16 Total, 16 Used
  • Memory: 57.6 GB Total, 4.0 GB Used
  • Applications: 2 Running, 11 Completed

 

Workers
Id                                   Address       State  Cores       Memory
worker-20140518173608-CHBM220-34134  CHBM220:8081  ALIVE  4 (4 Used)  14.4 GB (1024.0 MB Used)
worker-20140518173608-CHBM221-41264  CHBM221:8081  ALIVE  4 (4 Used)  14.4 GB (1024.0 MB Used)
worker-20140518173610-CHBM223-36479  CHBM223:8081  ALIVE  4 (4 Used)  14.4 GB (1024.0 MB Used)
worker-20140518173611-CHBM224-41840  CHBM224:8081  ALIVE  4 (4 Used)  14.4 GB (1024.0 MB Used)

 

Running Applications
ID                       Name            Cores  Memory per Node  Submitted Time       User  State    Duration
app-20140519085613-0012  Shark::CHBM220  0      1024.0 MB        2014/05/19 08:56:13  root  WAITING  9.4 min
app-20140519083651-0009  Shark::CHBM220  16     1024.0 MB        2014/05/19 08:36:51  root  RUNNING  29 min

 

Completed Applications
ID                       Name            Cores  Memory per Node  Submitted Time       User  State     Duration
app-20140519083741-0011  Shark::CHBM220  0      1024.0 MB        2014/05/19 08:37:41  root  FINISHED  8.7 min
app-20140519083723-0010  Shark::CHBM220  0      1024.0 MB        2014/05/19 08:37:23  root  FINISHED  1 s

 
