1) Install HBase
https://blog.csdn.net/hailunw/article/details/119057361
2) Create the tables in HBase
[user@NewBieSlave1 hbase-2.3.5]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/user/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/user/hbase-2.3.5/lib/client-facing-thirdparty/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.3.5, rfd3fdc08d1cd43eb3432a1a70d31c3aece6ecabe, Thu Mar 25 20:50:15 UTC 2021
Took 0.0014 seconds
hbase(main):001:0> create 'course_clickcount','info'
Created table course_clickcount
Took 1.1887 seconds
=> Hbase::Table - course_clickcount
hbase(main):002:0> create 'course_search_clickcount','info'
Created table course_search_clickcount
Took 0.6424 seconds
=> Hbase::Table - course_search_clickcount
hbase(main):003:0> list
TABLE
category_clickcount
course_clickcount
course_search_clickcount
helloWorld
4 row(s)
Took 0.0186 seconds
=> ["category_clickcount", "course_clickcount", "course_search_clickcount", "helloWorld"]
hbase(main):004:0> describe 'course_clickcount'
Table course_clickcount is ENABLED
course_clickcount
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s)
Quota is disabled
Took 0.1509 seconds
hbase(main):005:0> scan 'course_clickcount'
ROW COLUMN+CELL
0 row(s)
Took 0.0960 seconds
3) Create the entity classes ClickLog, CourseClickCount, and CourseSearchClickCount
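A minimal sketch of what the three entity classes might look like, written in plain Java for illustration. Only the class names come from this step; every field name below (ip, time, courseId, statusCode, referer, dayCourse, daySearchCourse, clickCount) and the row-key layouts in the comments are assumptions.

```java
// Hypothetical field layout: only the three class names come from the notes.

/** One cleaned access-log record. */
final class ClickLog {
    final String ip;
    final String time;       // assumed to hold the normalised day, e.g. "20210724"
    final int courseId;
    final int statusCode;
    final String referer;

    ClickLog(String ip, String time, int courseId, int statusCode, String referer) {
        this.ip = ip;
        this.time = time;
        this.courseId = courseId;
        this.statusCode = statusCode;
        this.referer = referer;
    }
}

/** One row of the course_clickcount table: row key plus click counter. */
final class CourseClickCount {
    final String dayCourse;  // assumed row key: day + "_" + courseId
    final long clickCount;

    CourseClickCount(String dayCourse, long clickCount) {
        this.dayCourse = dayCourse;
        this.clickCount = clickCount;
    }
}

/** One row of the course_search_clickcount table. */
final class CourseSearchClickCount {
    final String daySearchCourse;  // assumed row key: day + "_" + referrer host + "_" + courseId
    final long clickCount;

    CourseSearchClickCount(String daySearchCourse, long clickCount) {
        this.daySearchCourse = daySearchCourse;
        this.clickCount = clickCount;
    }
}
```

Both count classes map onto the single `info` column family created in step 2; the qualifier that stores the counter is left open here.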
4) Create a date-format conversion utility class (implemented in Scala)
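The original utility is written in Scala and is not shown here. The sketch below uses Java's `java.time` API instead, which can be called from Scala unchanged. The class name `DateUtils`, the method name `toDay`, and both date patterns are assumptions chosen for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class DateUtils {
    // Assumed layout of the raw log timestamp, e.g. "2021-07-24 10:15:30".
    private static final DateTimeFormatter SOURCE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    // Assumed compact layout used inside row keys, e.g. "20210724".
    private static final DateTimeFormatter TARGET =
            DateTimeFormatter.ofPattern("yyyyMMdd");

    private DateUtils() {}

    /** Converts a raw timestamp to its day string; throws DateTimeParseException on malformed input. */
    static String toDay(String raw) {
        return LocalDateTime.parse(raw, SOURCE).format(TARGET);
    }
}
```

`DateTimeFormatter` is immutable and thread-safe, so the two formatters can be shared across Spark tasks, unlike `SimpleDateFormat`.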
5) Create the HBase DAO classes CourseClickCountDAO and CourseSearchClickCountDAO
6) Modify the Spark Streaming class that reads from the Kafka cluster to add data-cleaning logic
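The cleaning rules are not spelled out in these notes, so the following is only a sketch under an assumed log format: five tab-separated fields (IP, timestamp, quoted request line, status code, referer), with course pages served under `/class/<id>.html`. The names `LogCleaner` and `Cleaned` are hypothetical. Lines that do not match are dropped rather than failing the batch.

```java
import java.util.Optional;

final class LogCleaner {
    /** Result of cleaning one line (hypothetical shape). */
    static final class Cleaned {
        final String ip;
        final String time;
        final int courseId;
        final int statusCode;
        final String referer;

        Cleaned(String ip, String time, int courseId, int statusCode, String referer) {
            this.ip = ip;
            this.time = time;
            this.courseId = courseId;
            this.statusCode = statusCode;
            this.referer = referer;
        }
    }

    private static final String PREFIX = "/class/";
    private static final String SUFFIX = ".html";

    private LogCleaner() {}

    /** Returns the parsed record, or empty if the line is not a course-page hit. */
    static Optional<Cleaned> clean(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 5) {
            return Optional.empty();
        }
        // Assumed request field: "GET /class/128.html HTTP/1.1" (quotes included).
        String[] request = fields[2].replace("\"", "").split(" ");
        if (request.length < 2) {
            return Optional.empty();
        }
        String url = request[1];
        if (!url.startsWith(PREFIX) || !url.endsWith(SUFFIX)) {
            return Optional.empty();
        }
        try {
            String id = url.substring(PREFIX.length(), url.length() - SUFFIX.length());
            int courseId = Integer.parseInt(id);
            int statusCode = Integer.parseInt(fields[3].trim());
            return Optional.of(new Cleaned(fields[0], fields[1], courseId, statusCode, fields[4]));
        } catch (NumberFormatException e) {
            // Non-numeric course id or status code: treat as dirty data.
            return Optional.empty();
        }
    }
}
```

Inside the streaming job, a function like this would typically be applied to each Kafka message value, keeping only the records that are present.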
7) Modify the Spark Streaming class that reads from the Kafka cluster to add data cleaning and to write the aggregated counts to the database
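The counting step can be sketched in plain Java, independent of Spark and HBase. The row-key layout `day_courseId` and the names `ClickCountAggregator`, `rowKey`, and `countByRowKey` are assumptions. In the real job, the per-batch totals would presumably be computed by Spark and then added to the existing HBase counters by the DAO from step 5.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ClickCountAggregator {
    private ClickCountAggregator() {}

    /** Builds the assumed row key for course_clickcount, e.g. "20210724_128". */
    static String rowKey(String day, int courseId) {
        return day + "_" + courseId;
    }

    /** Counts how often each row key occurs in one batch, preserving first-seen order. */
    static Map<String, Long> countByRowKey(List<String> rowKeys) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String key : rowKeys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

Because each batch yields only a delta, the write to HBase should be an increment rather than an overwrite; the HBase client's `Table.incrementColumnValue` is one way to apply it.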