Hive on Spark installation


    Hive is a data warehouse built on top of Hadoop: HDFS provides its storage, and MapReduce is the execution engine behind its SQL. Because MapReduce writes intermediate results to disk at many stages, it loses to engines such as Spark in both speed and flexibility, which is why Hive on MapReduce performance is often unsatisfying. This article walks through Hive on Spark: swapping Hive's execution engine for Spark brings a substantial speedup.

一、Environment preparation

CentOS 6.5
Hadoop 2.6 cluster (HDFS and YARN are required)
Hive 2.0.0
Spark 1.5 source code
Maven 3.5 (install it yourself)
JDK 1.8 (install it yourself)
Scala 2.10 (install it yourself)


二、Build Spark with Maven. Download the Spark 1.5 source from the official site and run the following in the source root directory:

./make-distribution.sh --name "hadoop2-without-hive" --tgz "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"

This produces spark-1.5.0-bin-hadoop2-without-hive.tgz.
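
The build can run out of memory with Maven's default JVM settings. A small, optional tweak before running make-distribution.sh, along the lines of what the Spark 1.x build docs recommend (the heap size is only a suggestion; adjust it to your machine):

export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"   # give Maven a larger heap for the Spark build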

三、Install the Hadoop 2.6 cluster
1、Set up passwordless SSH and set the hostnames
ssh-keygen -t rsa
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop
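
The key also has to be copied to every worker node, not just to the local machine. A minimal sketch, assuming a hadoop user and worker hosts h202 and h203 (the hosts listed in the slaves file later on); substitute your own hostnames:

ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@h202   # repeat for every node in the cluster
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@h203
ssh hadoop@h202 hostname                                  # should log in without a password prompt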

2、Unpack the Hadoop tarball
3、Set the environment variables
export JAVA_HOME=/usr/local/jdk1.8.0_121/
export JAVA_BIN=$JAVA_HOME/bin
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

HADOOP_HOME=/home/hadoop/apps/hadoop-2.6.0-cdh5.5.2
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME HADOOP_CONF_DIR PATH

source  .bash_profile 

4、Edit core-site.xml
 vi core-site.xml 

<configuration>
<property>
   <name>fs.defaultFS</name>
   <value>hdfs://weisc:9000</value>
   <description>NameNode URI.</description>
 </property>

 <property>
   <name>io.file.buffer.size</name>
   <value>131072</value>
   <description>Size of read/write buffer used in SequenceFiles.</description>
 </property>
</configuration>

5、Edit hdfs-site.xml
[hadoop@h201 hadoop-2.6.0]$ mkdir -p dfs/name
[hadoop@h201 hadoop-2.6.0]$ mkdir -p dfs/data
[hadoop@h201 hadoop-2.6.0]$ mkdir -p dfs/namesecondary

[hadoop@h201 hadoop]$ vi hdfs-site.xml

 <property>
   <name>dfs.namenode.secondary.http-address</name>
   <value>weisc:50090</value>
   <description>The secondary namenode http server address and port.</description>
 </property>

 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:///home/hadoop/apps/hadoop-2.6.0-cdh5.5.2/dfs/name</value>
   <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
 </property>

 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:///home/hadoop/apps/hadoop-2.6.0-cdh5.5.2/dfs/data</value>
   <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
 </property>

 <property>
   <name>dfs.namenode.checkpoint.dir</name>
   <value>file:///home/hadoop/apps/hadoop-2.6.0-cdh5.5.2/dfs/namesecondary</value>
   <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
 </property>

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>


6、Edit mapred-site.xml

[hadoop@h201 hadoop]$ cp mapred-site.xml.template mapred-site.xml

<property>
   <name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
  </property>

  <property>
   <name>mapreduce.jobhistory.address</name>
    <value>weisc:10020</value>
    <description>MapReduce JobHistoryServer IPC host:port</description>
  </property>

  <property>
   <name>mapreduce.jobhistory.webapp.address</name>
    <value>weisc:19888</value>
    <description>MapReduce JobHistoryServer Web UI host:port</description>
  </property>

*****
The property mapreduce.framework.name sets the runtime framework used to execute MapReduce jobs; the default is local and it must be changed to yarn.
*****
7、Edit yarn-site.xml
[hadoop@h201 hadoop]$ vi yarn-site.xml

<property>
   <name>yarn.resourcemanager.hostname</name>
  <value>weisc</value>
  <description>The hostname of the RM.</description>
</property>

 <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
   <description>Shuffle service that needs to be set for MapReduce applications.</description>
 </property>

8、Edit hadoop-env.sh
[hadoop@h201 hadoop]$ vi hadoop-env.sh 
export JAVA_HOME=/usr/local/jdk1.8.0_121
9、Edit the slaves file
[hadoop@h201 hadoop]$ vi slaves 
h202
h203
10、Format the NameNode
bin/hadoop namenode -format
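
After formatting, a minimal sketch for bringing the cluster up and checking the daemons, assuming HADOOP_HOME is set as above:

cd $HADOOP_HOME
sbin/start-dfs.sh    # starts NameNode, SecondaryNameNode and the DataNodes
sbin/start-yarn.sh   # starts ResourceManager and the NodeManagers
jps                  # on the master you should see NameNode, SecondaryNameNode and ResourceManager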


四、Install mysql-server
yum -y install mysql-server
service mysqld start
mysql
create user 'hive' identified by 'hive';
Remote login must be enabled for the hive user:
grant all privileges on *.* to hive@'%' identified by 'hive' with grant option;
flush privileges;
create database hive;
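
An optional check, run from another node, that the hive account really can log in remotely (the hostname weisc is the one used in the Hadoop configs above; substitute your own MySQL host):

mysql -h weisc -uhive -phive -e "show databases;"   # should succeed and list the hive database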

五、Install Hive 2.0

1、Edit conf/hive-site.xml

<property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
   <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>

<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>

  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive/local</value>
    <description>Local scratch space for Hive jobs</description>
  </property>

  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/hive/resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
2、Initialize the metastore database: bin/schematool -initSchema -dbType mysql
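
schematool needs the MySQL JDBC driver on Hive's classpath, and Hive does not ship it. A minimal sketch, assuming a downloaded mysql-connector-java 5.1.x jar (the version number is only an example):

cp mysql-connector-java-5.1.38-bin.jar $HIVE_HOME/lib/   # place the connector jar under lib/ wherever Hive is installed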



六、Install Scala 2.10.1

tar -zxvf scala-2.10.1.tgz
vi .bash_profile    # edit the environment variables and add:
export SCALA_HOME=/apps/scala-2.10.1
PATH=$HADOOP_HOME/bin:$PATH:$SCALA_HOME/bin

Check the installation with scala -version


七、Install Spark, then verify it with bin/run-example org.apache.spark.examples.SparkPi
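
Because the distribution was built with hadoop-provided, Spark does not bundle the Hadoop jars and has to be pointed at them. A minimal sketch of unpacking the tarball and wiring in the Hadoop classpath, assuming Spark lives under /apps/spark (the spark.home used in hive-site.xml below):

tar -zxvf spark-1.5.0-bin-hadoop2-without-hive.tgz -C /apps
mv /apps/spark-1.5.0-bin-hadoop2-without-hive /apps/spark
cd /apps/spark
cp conf/spark-env.sh.template conf/spark-env.sh
echo 'export SPARK_DIST_CLASSPATH=$(hadoop classpath)' >> conf/spark-env.sh   # hand Spark the Hadoop jars it was built without
bin/run-example org.apache.spark.examples.SparkPi                             # sanity check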
 
 
 Edit hive-site.xml and add the Spark engine settings:
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>hive.enable.spark.execution.engine</name>
<value>true</value>
</property>
<property>
<name>spark.home</name>
<value>/apps/spark</value>
</property>
<!-- SparkContext settings -->
<property>
<name>spark.master</name>
<value>yarn-cluster</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>2g</value>
</property>
<property>
<name>spark.driver.memory</name>
<value>1g</value>
</property>
<property>
<name>spark.executor.cores</name>
<value>2</value>
</property>
<property>
<name>spark.executor.instances</name>
<value>4</value>
</property>
<property>
<name>spark.app.name</name>
<value>myInceptor</value>
</property>
    <!-- transaction-related settings -->
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.enforce.bucketing</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>1</value>
</property>
<property>
<name>spark.executor.extraJavaOptions</name>
<value>-XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
</property>

八、Run Hive
select count(*) from test; 
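
A minimal end-to-end check, assuming a small table named test already exists (the name is only an example). With the engine switched over, a query that launches a job should report Spark stages instead of MapReduce ones:

hive
hive> set hive.execution.engine;        -- should print hive.execution.engine=spark
hive> select count(*) from test;        -- the console should show Spark job/stage progress, not MapReduce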


