Original article: http://blog.csdn.net/korder/article/details/47422025
Runtime environment for this program: Spark + HDFS + HBase + Yarn
Hadoop (HDFS + Yarn) cluster setup, see: http://blog.csdn.net/korder/article/details/46909253
Spark on Yarn, see: http://blog.csdn.net/korder/article/details/47422345
HBase cluster setup, see: http://blog.csdn.net/korder/article/details/47423247
The HBase table structure is: table name table, column family fam, column col.
Step 1: The code
object inputHbase:
<code class="hljs scala has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">package com.wylog.hbase  // matches the --class value passed to spark-submit

import org.apache.hadoop.hbase._
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by Chensy on 15-8-10.
 */
object inputHbase {

  /**
   * hbase table:table col-family:fam col:col
   */
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    // Each input line is split on commas; only the first field is used below.
    val readFile = sc.textFile(args(0)).map(x => x.split(","))
    val tableName = "table"

    readFile.foreachPartition { x =>
      // Built inside the partition so that each task opens its own HBase connection.
      val myConf = HBaseConfiguration.create()
      myConf.set("hbase.zookeeper.quorum", "172.23.27.45,172.23.27.46,172.23.27.47")
      myConf.set("hbase.zookeeper.property.clientPort", "2181")
      myConf.set("hbase.defaults.for.version.skip", "true")
      myConf.set("hbase.master", "172.23.27.39:60000")
      myConf.set("hbase.cluster.distributed", "true")
      myConf.set("hbase.rootdir", "hdfs://cdh5-test/hbase")
      val myTable = new HTable(myConf, TableName.valueOf(tableName))
      try {
        // Turn off auto-flush. If it stays on, every single Put is committed
        // immediately, which is the main cause of slow imports.
        myTable.setAutoFlush(false, false)
        // Set the write buffer size. Once the buffered data exceeds this value,
        // HBase commits it automatically. Tune the size yourself; 5 MB is
        // generally enough for large data volumes.
        myTable.setWriteBufferSize(5 * 1024 * 1024)
        x.foreach { y =>
          val p = new Put(Bytes.toBytes("row" + y(0)))
          p.add(Bytes.toBytes("fam"), Bytes.toBytes("col"), Bytes.toBytes("value" + y(0)))
          myTable.put(p)
        }
        // Call flushCommits() at the end of every partition. Without it, whatever
        // is left in the buffer below the size set above is never committed,
        // and that data is lost.
        myTable.flushCommits()
      } finally {
        // Release the table's resources even if a Put fails.
        myTable.close()
      }
    }
    // Stop the SparkContext cleanly before exiting.
    sc.stop()
    System.exit(0)
  }
}
</code>
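The three comments in the code above describe a client-side write buffer: with auto-flush off, Puts accumulate locally, they are committed once the buffer grows past the configured size, and any remainder must be committed explicitly with flushCommits(). The following is a minimal sketch of that behavior in plain Scala with no HBase dependency. `BufferedSink`, `committed`, and `commits` are hypothetical names invented for illustration; the class only models the behavior the comments describe and is not HBase's actual implementation.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a client-side write buffer; NOT the HBase implementation.
final class BufferedSink(bufferSizeBytes: Long) {
  private val pending = ArrayBuffer.empty[Array[Byte]]
  private var pendingBytes = 0L

  // Records that have actually been "sent to the server".
  val committed = ArrayBuffer.empty[Array[Byte]]
  // Number of commit round trips performed so far.
  var commits = 0

  // Buffer one record; commit automatically once the buffer exceeds its size.
  def put(record: Array[Byte]): Unit = {
    pending += record
    pendingBytes += record.length
    if (pendingBytes > bufferSizeBytes) flush()
  }

  // Commit whatever is still buffered (the role flushCommits() plays in the job).
  def flush(): Unit = {
    if (pending.nonEmpty) {
      committed ++= pending
      commits += 1
      pending.clear()
      pendingBytes = 0L
    }
  }
}

object BufferedSinkDemo {
  def main(args: Array[String]): Unit = {
    val sink = new BufferedSink(bufferSizeBytes = 10)
    // Seven 4-byte records against a 10-byte buffer.
    (1 to 7).foreach(_ => sink.put(Array.fill[Byte](4)(0)))
    println(s"committed before final flush: ${sink.committed.size}")
    sink.flush()
    println(s"committed after final flush: ${sink.committed.size}")
  }
}
```

With a 10-byte buffer and seven 4-byte records, the sketch commits automatically after the third and sixth records, so it reports 6 committed records before the final flush and 7 after it. That leftover seventh record is the case the per-partition flushCommits() call guards against.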
Step 2: Package the jar and upload it to HDFS
<code class="hljs avrasm has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"># Packaging in IDEA is not covered here; the resulting jar is inputHbase.jar
hadoop fs -put inputHbase.jar /xxx/spark/streaming
</code>
Step 3: Add the required jars
Create a shared library directory and keep all the jar files the job needs in it, so they are easy to add.
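Because spark-submit's --jars option takes a single comma-separated list, one way to make a shared directory easy to add is to generate that list from the directory's contents. The following is a minimal sketch, assuming all dependency jars sit directly in one directory; `JarsArg`, `jarsArg`, and `fromDir` are hypothetical helper names and are not part of Spark.

```scala
import java.io.File

// Hypothetical helper for building the value of spark-submit's --jars option.
object JarsArg {

  // Keep only .jar paths and join them with commas and no whitespace.
  def jarsArg(paths: Seq[String]): String =
    paths.filter(_.endsWith(".jar")).sorted.mkString(",")

  // List the regular files directly inside `dir` and build the --jars value.
  // listFiles() returns null when `dir` is not a directory, hence the Option.
  def fromDir(dir: File): String = {
    val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
    jarsArg(files.filter(_.isFile).map(_.getAbsolutePath).toSeq)
  }

  // Usage: pass the shared library directory as the first argument.
  def main(args: Array[String]): Unit =
    args.headOption.foreach(d => println(fromDir(new File(d))))
}
```

The printed string can be pasted after --jars in the submit script. Sorting is only there to keep the output stable between runs.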
Step 4: Write the submit script: submit-yarn-inputHbase.sh
<code class="hljs haml has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">[root@JXQ-23-27-38 streaming]# vim submit-yarn-inputHbase.sh
cd $SPARK_HOME
#pwd
# The --jars value must be one comma-separated argument with no spaces or line breaks.
./bin/spark-submit --name inputHbase \
--class com.wylog.hbase.inputHbase \
--master yarn-cluster \
--num-executors 8 \
--executor-memory 4g \
--executor-cores 8 \
--driver-memory 4g \
--driver-cores 4 \
--jars /root/spark/streaming/public_lib/hbase-client-0.98.6-cdh5.3.2.jar,/root/spark/streaming/public_lib/hbase-server-0.98.6-cdh5.3.2.jar,/root/spark/streaming/public_lib/hbase-protocol-0.98.6-cdh5.3.2.jar,/root/spark/streaming/public_lib/htrace-core-2.04.jar \
hdfs://cdh5-test/xxx/spark/streaming/inputHbase.jar \
hdfs://cdh5-test/data/notify-server/172.17.88.88/notify-server2_detail.log.*
</code>