Spark Streaming Project

1. Project workflow:

The name corresponding to each IP address:

2. Requirements

Count in real time how many times each category is clicked (displayed with a pie chart):


3. Analyzing and designing the project

Create a new Maven project.

pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>1711categorycount</groupId>
    <artifactId>1711categorycount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.5</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8_2.11 -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>0.98.6-hadoop2</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-server -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>0.98.6-hadoop2</version>
        </dependency>

    </dependencies>

</project>
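
Note: the consumer code in section 6 below uses the Kafka 0-10 integration classes (org.apache.spark.streaming.kafka010.*), so the build presumably also needs the matching artifact; a sketch of the extra dependency:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>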

4. Simulating real-time data

Java code that writes simulated click data into data.txt:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Random;

public class SimulateData {
    public static void main(String[] args) {
        BufferedWriter bw = null;
        try {
            bw = new BufferedWriter(new FileWriter("G:\\Scala\\实时统计每日的品类的点击次数\\data.txt"));

            Random random = new Random();
            int i = 0;
            while (i < 20000) {
                long time = System.currentTimeMillis();
                // random category id in [0, 22]
                int categoryid = random.nextInt(23);
                bw.write("ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=" + time + "&p_url=http://list.iqiyi.com/www/" + categoryid + "/---.html");
                bw.newLine();
                i++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // guard against bw being null if the FileWriter failed to open
            if (bw != null) {
                try {
                    bw.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

Sample lines from data.txt:

ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174569&p_url=http://list.iqiyi.com/www/9/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/4/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/10/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/4/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/1/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/13/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/8/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/3/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/17/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/6/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/22/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/14/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/3/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/10/---.html
ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263&c_time=1526975174570&p_url=http://list.iqiyi.com/www/14/---.html
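
Each line is one page-view event. The two fields the streaming job uses later are c_time (the click timestamp in milliseconds) and the category id embedded in p_url (the 9, 4, 10, ... path segment).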

Copy data.txt onto the Linux machine.

Simulating reading the data in real time:

Simulating writing the data into data.log in real time (the writer loop is shown in step 2 of section 6):

Reading the data from data.log in real time:

5. Setting up the Kafka and Flume clusters

Kafka notes (4): Installing Kafka

Flume notes (3): Configuring Flume

6. Sending data from Flume to Kafka

Read the real-time data from data.log into Kafka.

Step 1: Write the Flume configuration file (file2kafka.properties):

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/data.log

a1.channels.c1.type = memory

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = aura
a1.sinks.k1.brokerList = hadoop02:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 5

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

While Flume is running, data must keep being written to data.log in real time; in other words, the writer script below has to stay running.

Step 2: Keep this writer loop running (it replays data.txt into data.log line by line):

[hadoop@hadoop02 ~]$ cat data.txt | while read line
> do
> echo "$line" >> data.log
> done

Step 3: Start a Kafka console consumer to verify that the data arrives:

bin/kafka-console-consumer.sh --zookeeper hadoop02:2181 --from-beginning --topic aura

Step 4: Start the Flume agent (see the Flume website for the full options):

[hadoop@hadoop02 apache-flume-1.8.0-bin]$ bin/flume-ng agent --conf conf --conf-file /home/hadoop/apps/apache-flume-1.8.0-bin/flumetest/file2kafka.properties --name a1 -Dflume.root.logger=INFO,console
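
If the pipeline is wired up correctly, the console consumer from step 3 should start printing each log line shortly after the writer loop appends it to data.log.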


Java code that reads the data from Kafka (broker version 2.11-1.0.0):

package Category;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import java.util.*;

public class CategoryRealCount {
    public static void main(String[] args) {

        // Initialize the streaming context
        SparkConf conf = new SparkConf();
        conf.setMaster("local");
        conf.setAppName("CategoryRealCount");

        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(3));

        // Read the data: Kafka 0-10 direct stream
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "192.168.123.102:9092,192.168.123.103:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", false);

        Collection<String> topics = Arrays.asList("aura");

        JavaDStream<String> logDStream = KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
        ).map(new Function<ConsumerRecord<String, String>, String>() {
            @Override
            public String call(ConsumerRecord<String, String> record) throws Exception {
                // keep only the log line itself
                return record.value();
            }
        });
        logDStream.print();

        /* Equivalent read with the old Kafka 0-8 integration:
        HashMap<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "hadoop02:9092,hadoop03:9092,hadoop04:9092");
        HashSet<String> topics = new HashSet<>();
        topics.add("aura");
        JavaDStream<String> logDStream = KafkaUtils.createDirectStream(
                ssc,
                String.class,
                String.class,
                StringDecoder.class,
                StringDecoder.class,
                kafkaParams,
                topics
        ).map(new Function<Tuple2<String, String>, String>() {
            @Override
            public String call(Tuple2<String, String> tuple2) throws Exception {
                return tuple2._2;
            }
        });*/

        // Business logic goes here

        // Start the application
        ssc.start();
        try {
            ssc.awaitTermination();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        ssc.stop();
    }
}
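
With the writer loop and the Flume agent running, each 3-second batch should print the newly arrived log lines to the console, which confirms the Kafka side works before any counting logic is added.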

7. Real-time category counting

Count each day's clicks per category in real time and store the results in HBase. (How is the HBase table designed, and how is the rowkey designed?)

The rowkey is designed as date + category name.

For example, 2018.05.22_电影 is used as a rowkey.
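
Concretely, a log line with c_time=1526975174569 (2018-05-22 in UTC+8) and p_url=http://list.iqiyi.com/www/9/---.html (category id 9, i.e. 财经) maps to the rowkey 2018-05-22_财经; this mapping is implemented by Utils.getKey below, and every click increments the HBase counter stored under its rowkey.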

Create the HBase table (Java code):

package habase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

import java.io.IOException;

public class CreatTableTest {
    public static void main(String[] args) {

        // HBase connection settings
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "192.168.123.102");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        String tablename = "aura";
        String[] famliy = {"f"};
        try {
            HBaseAdmin hBaseAdmin = new HBaseAdmin(conf);
            // Build the table descriptor
            HTableDescriptor tableDescriptor = new HTableDescriptor(tablename);
            for (int i = 0; i < famliy.length; i++) {
                // Add the column families
                tableDescriptor.addFamily(new HColumnDescriptor(famliy[i]));
            }
            // Create the table only if it does not exist yet
            if (hBaseAdmin.tableExists(tablename)) {
                System.out.println("table already exists");
                System.exit(0);
            } else {
                hBaseAdmin.createTable(tableDescriptor);
                System.out.println("table created");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
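
For reference, the same table can also be created from the HBase shell with: create 'aura', 'f'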

The main program (Java): it consumes the Kafka stream, counts clicks per category, and writes the counts to HBase:

package Catefory1.Category;

import dao.HBaseDao;
import dao.factory.HBaseFactory;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.Time;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import scala.Tuple2;
import utils.Utils;

import java.util.*;

public class CategoryRealCount11 {
    public static String ck = "G:\\Scala\\spark1711\\day25-项目实时统计\\资料\\新建文件夹";

    public static void main(String[] args) {

        // Initialize the streaming context
        SparkConf conf = new SparkConf();
        conf.setMaster("local");
        conf.setAppName("CategoryRealCount");

        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(3));
        ssc.checkpoint(ck);

        // Read the data: the same Kafka 0-10 direct stream as before
        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "192.168.123.102:9092,192.168.123.103:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "use_a_separate_group_id_for_each_stream");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", false);

        Collection<String> topics = Arrays.asList("aura");

        JavaDStream<String> logDStream = KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
        ).map(new Function<ConsumerRecord<String, String>, String>() {
            @Override
            public String call(ConsumerRecord<String, String> record) throws Exception {
                return record.value();
            }
        });

        // Map each line to (date_categoryName, 1), sum per key within the batch,
        // then increment the counters in HBase partition by partition
        logDStream.mapToPair(new PairFunction<String, String, Long>() {
            @Override
            public Tuple2<String, Long> call(String line) throws Exception {
                return new Tuple2<String, Long>(Utils.getKey(line), 1L);
            }
        }).reduceByKey(new Function2<Long, Long, Long>() {
            @Override
            public Long call(Long aLong, Long aLong2) throws Exception {
                return aLong + aLong2;
            }
        }).foreachRDD(new VoidFunction2<JavaPairRDD<String, Long>, Time>() {
            @Override
            public void call(JavaPairRDD<String, Long> rdd, Time time) throws Exception {
                rdd.foreachPartition(new VoidFunction<Iterator<Tuple2<String, Long>>>() {
                    @Override
                    public void call(Iterator<Tuple2<String, Long>> partition) throws Exception {
                        // one HBase connection per partition, not per record
                        HBaseDao hBaseDao = HBaseFactory.getHBaseDao();
                        while (partition.hasNext()) {
                            Tuple2<String, Long> tuple = partition.next();
                            hBaseDao.save("aura", tuple._1, "f", "name", tuple._2);
                            System.out.println(tuple._1 + " " + tuple._2);
                        }
                    }
                });
            }
        });

        // Start the application
        ssc.start();
        try {
            ssc.awaitTermination();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        ssc.stop();
    }
}

Helper classes:
(bean):

package bean;

import java.io.Serializable;

public class CategoryClickCount implements Serializable {
    // category that was clicked
    private String name;

    // number of clicks
    private long count;

    public CategoryClickCount(String name, long count) {
        this.name = name;
        this.count = count;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public long getCount() {
        return count;
    }

    public void setCount(long count) {
        this.count = count;
    }
}

(Utils):

package utils;

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;

public class Utils {
    public static String getKey(String line) {
        // category id -> category name
        HashMap<String, String> map = new HashMap<String, String>();
        map.put("0", "其他");
        map.put("1", "电视剧");
        map.put("2", "电影");
        map.put("3", "综艺");
        map.put("4", "动漫");
        map.put("5", "纪录片");
        map.put("6", "游戏");
        map.put("7", "资讯");
        map.put("8", "娱乐");
        map.put("9", "财经");
        map.put("10", "网络电影");
        map.put("11", "片花");
        map.put("12", "音乐");
        map.put("13", "军事");
        map.put("14", "教育");
        map.put("15", "体育");
        map.put("16", "儿童");
        map.put("17", "旅游");
        map.put("18", "时尚");
        map.put("19", "生活");
        map.put("20", "汽车");
        map.put("21", "搞笑");
        map.put("22", "广告");
        map.put("23", "原创");

        // extract the category id from the p_url field
        String categoryid = line.split("&")[9].split("/")[4];
        // look up the category name
        String name = map.get(categoryid);
        // extract the click timestamp from the c_time field
        String stringTime = line.split("&")[8].split("=")[1];
        // turn the timestamp into a date
        String date = getDay(Long.valueOf(stringTime));
        return date + "_" + name;
    }

    public static String getDay(long time) {
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd");
        // format the passed-in timestamp (the original used new Date(), which ignored it)
        return simpleDateFormat.format(new Date(time));
    }
}
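
As a quick sanity check of Utils.getKey, the sketch below feeds it one of the sample lines from data.txt above; UtilsDemo is a hypothetical class added here for illustration only, and the exact date printed depends on the JVM's timezone:

package utils;

// Hypothetical demo class, not part of the original project
public class UtilsDemo {
    public static void main(String[] args) {
        String line = "ver=1&en=e_pv&pl=website&sdk=js&b_rst=1920*1080"
                + "&u_ud=12GH4079-223E-4A57-AC60-C1A04D8F7A2F&l=zh-CN"
                + "&u_sd=8E9559B3-DA35-44E1-AC98-85EB37D1F263"
                + "&c_time=1526975174569"
                + "&p_url=http://list.iqiyi.com/www/9/---.html";
        // category id 9 maps to 财经; in UTC+8 the timestamp falls on 2018-05-22
        System.out.println(Utils.getKey(line)); // 2018-05-22_财经
    }
}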

(dao):

package dao;

import bean.CategoryClickCount;

import java.util.List;

public interface HBaseDao {

    // insert one record into HBase (increment a counter)
    public void save(String tableName, String rowkey,
                     String family, String q, long value);

    // query records by rowkey prefix
    public List<CategoryClickCount> count(String tableName, String rowkey);

}

(dao.impl):

package dao.impl;

import bean.CategoryClickCount;
import dao.HBaseDao;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HBaseImpl implements HBaseDao {

    HConnection hatablePool = null;

    public HBaseImpl() {
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper shipped with HBase; the quorum takes hostnames,
        // the client port goes in its own property
        conf.set("hbase.zookeeper.quorum", "hadoop02");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        try {
            hatablePool = HConnectionManager.createConnection(conf);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Get a table handle by name.
     * @param tableName table name
     * @return table handle
     */
    public HTableInterface getTable(String tableName) {
        HTableInterface table = null;
        try {
            table = hatablePool.getTable(tableName);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return table;
    }

    /**
     * Insert one record into HBase by incrementing a counter,
     * e.g. rowkey 2018-12-12_电影, family f, qualifier q, value 19.
     * (updateStateByKey keeps all state in memory and so needs more of it;
     * reduceByKey per batch plus HBase increments needs less.)
     * @param tableName table name
     * @param rowkey rowkey (date_categoryName)
     * @param family column family
     * @param q column qualifier
     * @param value number of clicks to add
     */
    @Override
    public void save(String tableName, String rowkey, String family, String q, long value) {

        HTableInterface table = getTable(tableName);
        try {
            table.incrementColumnValue(rowkey.getBytes(), family.getBytes(), q.getBytes(), value);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (table != null) {
                try {
                    table.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    /**
     * Return all records whose rowkey starts with the given prefix.
     * @param tableName table name
     * @param rowkey rowkey prefix, e.g. a date
     * @return list of (category, count)
     */
    @Override
    public List<CategoryClickCount> count(String tableName, String rowkey) {
        ArrayList<CategoryClickCount> list = new ArrayList<>();
        HTableInterface table = getTable(tableName);
        // prefix (left-match) query on the rowkey
        PrefixFilter prefixFilter = new PrefixFilter(rowkey.getBytes());
        Scan scan = new Scan();
        scan.setFilter(prefixFilter);

        try {
            ResultScanner scanner = table.getScanner(scan);
            for (Result result : scanner) {
                for (Cell cell : result.rawCells()) {
                    byte[] date_name = CellUtil.cloneRow(cell);
                    String name = new String(date_name).split("_")[1];
                    byte[] value = CellUtil.cloneValue(cell);
                    long count = Bytes.toLong(value);
                    CategoryClickCount categoryClickCount = new CategoryClickCount(name, count);
                    list.add(categoryClickCount);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (table != null) {
                try {
                    table.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return list;
    }

}
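
Note that incrementColumnValue stores each counter as an 8-byte long, which is why count() can decode the cell values with Bytes.toLong.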

(dao.factory):

package dao.factory;

import dao.HBaseDao;
import dao.impl.HBaseImpl;

public class HBaseFactory {
    public static HBaseDao getHBaseDao() {
        return new HBaseImpl();
    }
}

Test class:
(Test):

package test;

import bean.CategoryClickCount;
import dao.HBaseDao;
import dao.factory.HBaseFactory;

import java.util.List;

public class Test {
    public static void main(String[] args) {

        HBaseDao hBaseDao = HBaseFactory.getHBaseDao();

        hBaseDao.save("aura", "2018-05-23_电影", "f", "name", 10L);
        hBaseDao.save("aura", "2018-05-23_电影", "f", "name", 20L);
        hBaseDao.save("aura", "2018-05-21_电视剧", "f", "name", 11L);
        hBaseDao.save("aura", "2018-05-21_电视剧", "f", "name", 24L);
        hBaseDao.save("aura", "2018-05-23_电视剧", "f", "name", 110L);
        hBaseDao.save("aura", "2018-05-23_电视剧", "f", "name", 210L);

        // query everything for 2018-05-21 by rowkey prefix
        List<CategoryClickCount> list = hBaseDao.count("aura", "2018-05-21");
        for (CategoryClickCount cc : list) {
            System.out.println(cc.getName() + " " + cc.getCount());
        }
    }
}
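
Assuming the aura table starts empty, the two increments on 2018-05-21_电视剧 accumulate, so the prefix query for 2018-05-21 should print 电视剧 35 (11 + 24).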

8. Overall workflow

1. Project architecture