Hive-on-Spark

<div id="article_content" class="article_content clearfix">
                                                <div class="article-copyright">
                        Copyright notice: this is an original article by the author; reproduction without the author's permission is prohibited.                        <a class="copy-right-url" href=" https://blog.csdn.net/zuochang_liu/article/details/82292076"> https://blog.csdn.net/zuochang_liu/article/details/82292076</a>
                    </div>
                                                    <link rel="stylesheet" href="https://csdnimg.cn/release/phoenix/template/css/ck_htmledit_views-3019150162.css">
                <div class="htmledit_views" id="content_views">
                                            <h1><a name="t0"></a>1 Hive on Spark Overview</h1>

<p style="margin-left:0pt;">Hive on Spark has little to do with Hive itself; it only reuses Hive's standards (HQL, the metastore, UDFs, and the serialization/deserialization mechanisms).</p>

<p style="margin-left:0pt;">Hive's original execution model is MapReduce, which is slow because intermediate results are written to HDFS.</p>

<p style="margin-left:0pt;">Hive on Spark instead uses RDDs (DataFrames) and runs on a Spark cluster.</p>

<p style="margin-left:0pt;">The data actually being computed lives in HDFS. The MySQL metastore stores only descriptive information about Hive tables: which databases and tables exist, how many columns each table has, each column's type, and where in HDFS each table's data is stored.</p>
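<p style="margin-left:0pt;">For illustration only, this descriptive information can be inspected directly in MySQL. The metastore database name <code>hive</code> is an assumption (it matches the JDBC URL used later in this article); <code>DBS</code>, <code>TBLS</code>, and <code>SDS</code> are standard Hive metastore schema tables.</p>

<pre class="has"><code class="language-sql">-- Sketch: inspect Hive metastore metadata in MySQL
-- (assumes the metastore database is named "hive").
USE hive;

-- Databases known to Hive, and where their data lives in HDFS
SELECT DB_ID, NAME, DB_LOCATION_URI FROM DBS;

-- Tables, joined to their HDFS storage descriptors
SELECT t.TBL_NAME, s.LOCATION
FROM TBLS t JOIN SDS s ON t.SD_ID = s.SD_ID;</code></pre>

<p style="margin-left:0pt;">Note that no row data appears here, only names, schemas, and HDFS locations, which is exactly the mapping described above.</p>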

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">What is the difference between Hive and MySQL?</p>

<p style="margin-left:0pt;">Hive is a data warehouse: it stores data and analyzes it. The volumes involved are large, so analyses typically run for a long time.</p>

<p style="margin-left:0pt;">MySQL is a relational database, built for low-latency CRUD operations on relational data.</p>

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">Does Hive's metastore hold the data to be computed?</p>

<p style="margin-left:0pt;">No. It holds only descriptive information: the warehouse's tables, their columns, and so on.</p>

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">Where is the data actually stored?</p>

<p style="margin-left:0pt;">In HDFS.</p>

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">What the Hive metastore does</p>

<p style="margin-left:0pt;">It establishes a mapping. When an HQL statement runs, the descriptive information is first looked up in the MySQL metastore; tasks are generated from that information and then dispatched to the Spark cluster for execution.</p>

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">hive&nbsp;&nbsp;on spark &nbsp;使用的仅仅是hive的标准,规范,不需要有hive数据库一样可行。</p>

<p style="margin-left:0pt;">hive&nbsp;: 元数据,是存放在mysql中,然后真正的数据是存放在hdfs中。</p>

<h1 style="margin-left:0pt;"><a name="t1"></a>2 Install MySQL</h1>

<p>The MySQL database serves as Hive's metastore.</p>

<h1><a name="t2"></a>3 Configure Hive on Spark</h1>

<p style="margin-left:0pt;">Generate Hive's metastore tables: based on Hive's configuration file, the corresponding metastore tables are created.</p>

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">spark-sql is Spark's interactive command line for writing SQL.</p>

<p style="margin-left:0pt;">If starting spark-sql directly in local mode fails with the following error:</p>

<p style="margin-left:0pt;"><img alt="" class="has" height="77" src="https://img-blog.csdn.net/20180901235524809?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3p1b2NoYW5nX2xpdQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70" width="813"></p>

<p style="margin-left:0pt;">it is caused by the configured Hadoop parameters:</p>

<p style="margin-left:0pt;"><img alt="" class="has" height="204" src="https://img-blog.csdn.net/20180901235541736?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3p1b2NoYW5nX2xpdQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70" width="813"></p>

<p style="margin-left:0pt;">Run a quick test:</p>

<pre class="has"><code class="language-sql">create table test (name string);
insert into test values("xxtest");</code></pre>

<p style="margin-left:0pt;"><img alt="" class="has" height="47" src="https://img-blog.csdn.net/20180901235849375?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3p1b2NoYW5nX2xpdQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70" width="813"></p>

<p style="margin-left:0pt;">In local mode, the Derby database is used as the default metastore and data is stored locally.</p>

<p style="margin-left:0pt;"><strong><span style="color:#ff0000;">To use Hive's standards, put Hive's configuration file into Spark's conf directory:</span></strong></p>

<pre class="has"><code class="language-shell">cd /root/apps/spark-2.2.0-bin-hadoop2.7/conf/
vi hive-site.xml</code></pre>

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">The hive-site.xml file:</p>

<pre class="has"><code class="language-xml">&lt;configuration&gt;
    &lt;property&gt;
        &lt;name&gt;javax.jdo.option.ConnectionURL&lt;/name&gt;
        &lt;value&gt;jdbc:mysql://hdp-01:3306/hive?createDatabaseIfNotExist=true&lt;/value&gt;
        &lt;description&gt;JDBC connect string for a JDBC metastore&lt;/description&gt;
    &lt;/property&gt;

    &lt;property&gt;
        &lt;name&gt;javax.jdo.option.ConnectionDriverName&lt;/name&gt;
        &lt;value&gt;com.mysql.jdbc.Driver&lt;/value&gt;
        &lt;description&gt;Driver class name for a JDBC metastore&lt;/description&gt;
    &lt;/property&gt;

    &lt;property&gt;
        &lt;name&gt;javax.jdo.option.ConnectionUserName&lt;/name&gt;
        &lt;value&gt;root&lt;/value&gt;
        &lt;description&gt;username to use against metastore database&lt;/description&gt;
    &lt;/property&gt;

    &lt;property&gt;
        &lt;name&gt;javax.jdo.option.ConnectionPassword&lt;/name&gt;
        &lt;value&gt;123456&lt;/value&gt;
        &lt;description&gt;password to use against metastore database&lt;/description&gt;
    &lt;/property&gt;
&lt;/configuration&gt;</code></pre>

<p style="margin-left:0pt;">Send this configuration file to the other nodes in the cluster:</p>

<pre class="has"><code class="language-shell">cd /root/apps/spark-2.2.0-bin-hadoop2.7/conf/
for i in 2 3; do scp hive-site.xml hdp-0$i:`pwd`; done</code></pre>

<p style="margin-left:0pt;">Stop Spark and restart it: start-all.sh</p>

<p style="margin-left:0pt;">When starting spark-sql, the error below occurs because the MySQL driver jar is missing when Spark tries to talk to MySQL.</p>

<p style="margin-left:0pt;">Solution 1: pass the MySQL jar via --jars or --driver-class-path.</p>

<p style="margin-left:0pt;">Solution 2: copy the MySQL jar into the $SPARK_HOME/jars directory.</p>

<p style="margin-left:0pt;"><img alt="" class="has" height="83" src="https://img-blog.csdn.net/20180902000211826?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3p1b2NoYW5nX2xpdQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70" width="813"></p>

<p style="margin-left:0pt;">Specify the cluster at startup (if no master is given, local mode is the default):</p>

<pre class="has"><code class="language-shell">spark-sql --master spark://hdp-01:7077 --jars /root/mysql-connector-java-5.1.38.jar</code></pre>

<p style="margin-left:0pt;">spark-sql will create a database entry in MySQL; the DB_LOCATION_URI column of the DBS table must be changed manually to an HDFS address:</p>

<p style="margin-left:0pt;"><img alt="" class="has" height="170" src="https://img-blog.csdn.net/20180902000504421?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3p1b2NoYW5nX2xpdQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70" width="813"></p>

<p style="margin-left:0pt;">hdfs://hdp-01:9000/user/hive/spark-warehouse</p>
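<p style="margin-left:0pt;">A sketch of that manual fix, run in MySQL. The metastore database name <code>hive</code> and <code>DB_ID = 1</code> (the default database) are assumptions here; inspect your own DBS rows first and adjust.</p>

<pre class="has"><code class="language-sql">-- Sketch: repoint the default Hive database at HDFS
-- (assumes metastore db "hive" and that DB_ID 1 is the row to fix).
USE hive;
SELECT DB_ID, NAME, DB_LOCATION_URI FROM DBS;  -- inspect first
UPDATE DBS
SET DB_LOCATION_URI = 'hdfs://hdp-01:9000/user/hive/spark-warehouse'
WHERE DB_ID = 1;</code></pre>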

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">Also check that the storage path of the tables you create yourself is an HDFS directory.</p>

<p style="margin-left:0pt;"><img alt="" class="has" height="122" src="https://img-blog.csdn.net/20180902000652708?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3p1b2NoYW5nX2xpdQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70" width="813"></p>

<p style="margin-left:0pt;">After a spark-sql job executes, it can be viewed in the cluster's monitoring UI:</p>

<p style="margin-left:0pt;"><img alt="" class="has" height="369" src="https://img-blog.csdn.net/2018090200071394?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3p1b2NoYW5nX2xpdQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70" width="813"></p>

<p style="margin-left:0pt;">Likewise, a SparkSubmit process will be present.</p>

<p style="margin-left:0pt;">&nbsp;</p>

<h1 style="margin-left:0pt;"><a name="t3"></a>4 Programming in IDEA</h1>

<p style="margin-left:0pt;"><strong>First enable Spark's Hive support</strong></p>

<pre class="has"><code class="language-scala">import org.apache.spark.sql.SparkSession

// To run Hive on Spark, Hive support must be enabled on the SparkSession
val session = SparkSession.builder()
  .master("local")
  .appName("xx")
  .enableHiveSupport() // enable Hive support; the supporting jar must also be added
  .getOrCreate()</code></pre>

<p style="margin-left:0pt;">Add Spark's Hive compatibility jar:</p>

<pre class="has"><code class="language-xml">&lt;!-- Spark SQL support for Hive --&gt;
&lt;dependency&gt;
    &lt;groupId&gt;org.apache.spark&lt;/groupId&gt;
    &lt;artifactId&gt;spark-hive_2.11&lt;/artifactId&gt;
    &lt;version&gt;${spark.version}&lt;/version&gt;
&lt;/dependency&gt;</code></pre>

<p style="margin-left:0pt;">To run locally, also copy the hive-site.xml file into the resources directory.</p>

<p style="margin-left:0pt;">The resources directory holds the current project's configuration files.</p>

<p style="margin-left:0pt;"><img alt="" class="has" height="209" src="https://img-blog.csdn.net/2018090200102849?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3p1b2NoYW5nX2xpdQ==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70" width="320"></p>

<p style="margin-left:0pt;">Write the code and test it in local mode:</p>

<pre class="has"><code class="language-scala">// Run a query
val query = session.sql("select * from t_access_times")
query.show()
// Release resources
session.close()</code></pre>

<p style="margin-left:0pt;">When creating tables, the client identity must be impersonated:</p>

<pre class="has"><code class="language-scala">System.setProperty("HADOOP_USER_NAME", "root") // impersonate the client as user root
// or add the JVM option -DHADOOP_USER_NAME=root</code></pre>

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">Basic operations:</p>

<pre class="has"><code class="language-scala">import org.apache.spark.sql.Dataset
import session.implicits._ // needed for createDataset and toDF

// Total amount per user per month
//    session.sql("select username,month,sum(salary) as salary from t_access_times group by username,month")

// Create a table
//    session.sql("create table t_access1(username string,month string,salary int) row format delimited fields terminated by ','")

// Drop a table
//    session.sql("drop table t_access1")

// Insert data
//    session.sql("insert into t_access1 select * from t_access_times")
//      .show()

// Overwrite data
//    session.sql("insert overwrite table t_access1 select * from t_access_times where username='A'")

// Overwrite by loading new data, e.g. a file containing:
//    C,2015-01,10
//    C,2015-01,20
//    session.sql("load data local inpath 't_access_time_log' overwrite into table t_access1")

// Truncate a table
//    session.sql("truncate table t_access1")

// Write custom data
val access: Dataset[String] = session.createDataset(List("b,2015-01,10", "c,2015-02,20"))

val accessdf = access.map({
  t =>
    val lines = t.split(",")
    (lines(0), lines(1), lines(2).toInt)
}).toDF("username", "month", "salary")

accessdf.createTempView("t_ac")
//    session.sql("insert into t_access1 select * from t_ac")

// Overwrite mode recreates the table from the given schema: SaveMode.Overwrite
// In local mode only overwrite is supported, and this config must be added on the SparkSession:
//    .config("spark.sql.warehouse.dir", "hdfs://hdp-01:9000/user/hive/warehouse")
accessdf
  .write.mode("overwrite").saveAsTable("t_access1")</code></pre>

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">集群运行:</p>

<p style="margin-left:0pt;">需要把hive-site.xml配置文件,添加到$SPARK_HOME/conf目录中去,重启spark</p>

<p style="margin-left:0pt;">上传一个mysql连接驱动(sparkSubmit也要连接MySQL,获取元数据信息)</p>

<p style="margin-left:0pt;">spark-sql --master spark://hdp-01:7077 --driver-class-path /root/mysql-connector-java-5.1.38.jar</p>

<p style="margin-left:0pt;">--class &nbsp;&nbsp;xx.jar</p>

<p style="margin-left:0pt;">&nbsp;</p>

<p style="margin-left:0pt;">然后执行代码的编写:</p>

<pre class="has" name="code"><code class="hljs sql"><ol class="hljs-ln" style="width:1249px"><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="1"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">  // 执行查询 hive的数据表</div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="2"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line">//    session.sql("<span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> t_access_times<span class="hljs-string"><span class="hljs-string">")</span></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="3"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">//      .show()</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="4"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string"></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="5"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">    // 创建表</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="6"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">//    session.sql("</span><span class="hljs-keyword">create</span> <span class="hljs-keyword">table</span> t_access1(username <span class="hljs-keyword">string</span>,<span class="hljs-keyword">month</span> <span class="hljs-keyword">string</span>,salary <span class="hljs-built_in">int</span>) <span class="hljs-keyword">row</span> <span class="hljs-keyword">format</span> <span class="hljs-keyword">delimited</span> <span class="hljs-keyword">fields</span> <span class="hljs-keyword">terminated</span> <span class="hljs-keyword">by</span> <span 
class="hljs-string">','</span><span class="hljs-string"><span class="hljs-string">")</span></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="7"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string"></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="8"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string"></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="9"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">//      session.sql("</span><span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> t_access1 <span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> t_access_times<span class="hljs-string"><span class="hljs-string">")</span></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="10"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">//    .show()</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="11"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string"></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="12"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">    // 写数据</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="13"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">    val access: Dataset[String] = session.createDataset(List("</span>b,<span class="hljs-number">2015</span><span class="hljs-number">-01</span>,<span 
class="hljs-number">10</span><span class="hljs-string">", "</span>c,<span class="hljs-number">2015</span><span class="hljs-number">-02</span>,<span class="hljs-number">20</span><span class="hljs-string"><span class="hljs-string">"))</span></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="14"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string"></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="15"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">    val accessdf = access.map({</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="16"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">      t =&gt;</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="17"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">        val lines = t.split("</span>,<span class="hljs-string"><span class="hljs-string">")</span></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="18"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">        (lines(0), lines(1), lines(2).toInt)</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="19"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">    }).toDF("</span>username<span class="hljs-string">", "</span><span class="hljs-keyword">month</span><span class="hljs-string">", "</span>salary<span class="hljs-string"><span class="hljs-string">")</span></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="20"></div></div><div 
class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string"></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="21"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string"></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="22"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">    accessdf.createTempView("</span>v_tmp<span class="hljs-string"><span class="hljs-string">")</span></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="23"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">    // 插入数据</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="24"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">//    session.sql("</span><span class="hljs-keyword">insert</span> overwrite <span class="hljs-keyword">table</span> t_access1 <span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> v_tmp<span class="hljs-string"><span class="hljs-string">")</span></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="25"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">    session.sql("</span><span class="hljs-keyword">insert</span> <span class="hljs-keyword">into</span> t_access1 <span class="hljs-keyword">select</span> * <span class="hljs-keyword">from</span> v_tmp<span class="hljs-string"><span class="hljs-string">")</span></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="26"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">//    .show()</span></div></div></li><li><div 
class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="27"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string"></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="28"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">// insertInto的api  入库</span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="29"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">accessdf.write.insertInto("</span>databaseName.tableName<span class="hljs-string"><span class="hljs-string">")</span></span></div></div></li><li><div class="hljs-ln-numbers"><div class="hljs-ln-line hljs-ln-n" data-line-number="30"></div></div><div class="hljs-ln-code"><div class="hljs-ln-line"><span class="hljs-string">    session.close()</span></div></div></li></ol></code><div class="hljs-button {2}" data-title="复制" οnclick="hljs.copyCode(event)"></div></pre>

<h1 style="margin-left:0pt;"><a name="t4"></a>5 sparksql连接方式</h1>

<h2 style="margin-left:0pt;"><a name="t5"></a>5.1 交互式的命令行</h2>

<p><img alt="" class="has" height="47" src="https://img-blog.csdnimg.cn/20190729232532323.png" width="693"></p>

<p style="margin-left:0pt;">spark-sql &nbsp;本地模式运行</p>

<p style="margin-left:0pt;">spark-sql &nbsp;--master&nbsp;spark://hdp-01:7077 集群模式运行&nbsp;</p>

<p style="margin-left:0pt;">如果没有hive-site.xml文件,spark-sql&nbsp;默认使用的是derby数据库,数据写在执行命令的当前目录(spark-warehouse)。</p>

<p style="margin-left:0pt;"><img alt="" class="has" height="87" src="https://img-blog.csdnimg.cn/20190729232631578.png" width="693"></p>

<p style="margin-left:0pt;">如果有hive-site.xml&nbsp;,才能实现,元数据用mysql管理,数据存储在HDFS中</p>

<p style="margin-left:0pt;">&nbsp;</p>

<h2 style="margin-left:0pt;"><a name="t6"></a>5.2 jdbc的连接方式</h2>

<p>在服务端修改配置文件hive-site.xml</p>

<blockquote>
<p style="margin-left:0pt;">&lt;property&gt;</p>

<p style="margin-left:0pt;">&lt;name&gt;hive.server2.thrift.bind.host&lt;/name&gt;</p>

<p style="margin-left:0pt;">&lt;value&gt;<span style="color:#ff0000;">hdp-03</span>&lt;/value&gt;</p>

<p style="margin-left:0pt;">&lt;description&gt;Bind host on which to run the HiveServer2 Thrift service.&lt;/description&gt;</p>

<p style="margin-left:0pt;">&lt;/property&gt;</p>

<p style="margin-left:0pt;">&lt;property&gt;</p>

<p style="margin-left:0pt;">&lt;name&gt;hive.server2.thrift.port&lt;/name&gt;</p>

<p style="margin-left:0pt;">&lt;value&gt;<span style="color:#ff0000;">10000</span>&lt;/value&gt;</p>

<p style="margin-left:0pt;">&lt;description&gt;Port number of HiveServer2 Thrift interface when hive.server2.transport.mo</p>

<p style="margin-left:0pt;">de is 'binary'.&lt;/description&gt;</p>

<p style="margin-left:0pt;">&lt;/property&gt;</p>
</blockquote>

<p>&nbsp;</p>

<p>启动服务端</p>

<p><img alt="" class="has" height="32" src="https://img-blog.csdnimg.cn/20190729232858663.png" width="693"></p>

<p style="margin-left:0pt;">服务端的进程是SparkSubmit:</p>

<p style="margin-left:0pt;"><img alt="" class="has" height="181" src="https://img-blog.csdnimg.cn/20190729232923760.png" width="609"></p>

<p style="margin-left:0pt;">启动客户端:</p>

<p style="margin-left:0pt;"><img alt="" class="has" height="157" src="https://img-blog.csdnimg.cn/201907292329372.png" width="693"></p>

<p style="margin-left:0pt;">&nbsp;</p>
                                    </div>
                    </div>
