1. Kafka data: produce data -> consume data
Kafka installation
https://www.cnblogs.com/justuntil/p/8033792.html
1. Create a topic
kafka-topics.sh --create --zookeeper 10.101.43.54:2181 --replication-factor 1 --partitions 1 --topic Hello-Kafka3
2. List topics
kafka-topics.sh --list --zookeeper 10.101.43.54:2181
3. Start a producer and send messages
kafka-console-producer.sh --broker-list 10.101.43.54:9092 --topic Hello-Kafka3
4. Start a consumer and receive messages
kafka-console-consumer.sh --zookeeper 10.101.43.54:2181 --topic Hello-Kafka3 --from-beginning
Newer versions (use --bootstrap-server instead of --zookeeper): ./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic metrics --from-beginning
Describe a topic:
kafka-topics.sh --describe --zookeeper 10.101.43.54:2181 --topic Hello-Kafka2
5. Delete a topic
kafka-topics.sh --delete --zookeeper 10.101.43.54:2181 --topic Hello-Kafka2
Fully deleting a topic:
https://blog.csdn.net/belalds/article/details/80575751
2. Basic HBase operations
Simple tool: https://github.com/HY-ZhengWei/HBaseClient
3. spark_maven, development team: edu.uchicago.mpcs53013
http://www.mvnjar.com/edu.uchicago.mpcs53013/spark-streaming-flights-archetype/jar.html
4. Spark Streaming + Kafka in practice
Reconstructing the data for the spark-streaming-flights-archetype program
Kafka data source
topic: flights
{
"flight": "index",
"originName": "A1",
"destinationName": "B1",
"departureDelay": 10
}
./kafka-console-consumer.sh --zookeeper 10.101.43.47:2181 --topic flights --from-beginning
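To feed test messages of the shape shown above into the flights topic, a small generator can help. This is only a sketch: the field names are taken from the sample message, while the helper name gen_flights, the generated values, and the <broker-host> placeholder are assumptions. The console producer treats each input line as one message, so every JSON object is printed on a single line.

```shell
#!/bin/sh
# gen_flights N: print N one-line JSON flight messages (A1->B1, A2->B2, ...).
# Field names follow the sample message; the values are made up for testing.
gen_flights() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf '{"flight": "index", "originName": "A%d", "destinationName": "B%d", "departureDelay": %d}\n' \
            "$i" "$i" $((i * 10))
        i=$((i + 1))
    done
}

# Example (needs a running broker, so it is left commented out;
# <broker-host> is a placeholder, not a value from these notes):
# gen_flights 20 | kafka-console-producer.sh --broker-list <broker-host>:9092 --topic flights
```

Route A20 -> B20 produced this way matches the row key A20B20 used in the HBase table.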
HBase data source
tablename: weather_delays_by_route_8
row key: originName + destinationName, e.g. A20B20
Column family (with its fields)
delay(
{
"clear_flights": 0,
"clear_delays": 1,
"fog_flights": 0,
"fog_delays": 2,
"rain_flights": 0,
"rain_delays": 3,
"snow_flights": 0,
"snow_delays": 4,
"hail_flights": 0,
"hail_delays": 5,
"thunder_flights": 0,
"thunder_delays": 6,
"tornado_flights": 0,
"tornado_delays": 0
}
)
create 'weather_delays_by_route_8', 'delay'
put 'weather_delays_by_route_8','A20B20','delay:clear_flights',0
put 'weather_delays_by_route_8','A20B20','delay:clear_delays',1
put 'weather_delays_by_route_8','A20B20','delay:fog_flights',0
put 'weather_delays_by_route_8','A20B20','delay:fog_delays',2
put 'weather_delays_by_route_8','A20B20','delay:rain_flights',0
put 'weather_delays_by_route_8','A20B20','delay:rain_delays',3
put 'weather_delays_by_route_8','A20B20','delay:snow_flights',0
put 'weather_delays_by_route_8','A20B20','delay:snow_delays',5
put 'weather_delays_by_route_8','A20B20','delay:hail_flights',0
put 'weather_delays_by_route_8','A20B20','delay:hail_delays',6
put 'weather_delays_by_route_8','A20B20','delay:thunder_flights',0
put 'weather_delays_by_route_8','A20B20','delay:thunder_delays',0
put 'weather_delays_by_route_8','A20B20','delay:tornado_flights',0
put 'weather_delays_by_route_8','A20B20','delay:tornado_delays',0
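Typing fourteen put statements per route gets tedious, so the commands can be generated instead. The sketch below assumes every counter should start at 0 (the hand-written rows above use other values) and that hbase shell is fed through standard input; the helper name gen_puts and the row key A21B21 are made up for illustration.

```shell
#!/bin/sh
# gen_puts ROWKEY: print one HBase shell 'put' per counter column in the
# 'delay' column family, with every value set to 0 (an assumption).
gen_puts() {
    for weather in clear fog rain snow hail thunder tornado; do
        for kind in flights delays; do
            printf "put 'weather_delays_by_route_8','%s','delay:%s_%s',0\n" \
                "$1" "$weather" "$kind"
        done
    done
}

# Example (needs a running HBase, so it is left commented out):
# gen_puts A21B21 | hbase shell
```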
Submit the Spark Streaming program
spark-submit --master yarn --class StreamFlights --driver-memory 1G --executor-memory 512M --num-executors 3 /opt/data/uber-sparkscala-0.0.1-SNAPSHOT.jar
List running Spark Streaming programs
yarn application -list
Stop the Spark Streaming program
yarn application -kill application_1537148028090_0007
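The application ID changes on every submission, so looking it up by name saves a copy-and-paste step. This is a rough sketch under two assumptions: that yarn application -list prints tab-separated rows with the Application-Id in the first column and the Application-Name in the second, and that the job registers under the name StreamFlights (the notes only confirm that as the class name). The helper app_ids_by_name is hypothetical.

```shell
#!/bin/sh
# app_ids_by_name NAME: read `yarn application -list` output on stdin and
# print the Application-Id of every row whose Application-Name equals NAME.
# Assumes tab-separated columns: id first, name second.
app_ids_by_name() {
    awk -F'\t' -v name="$1" '
        $1 ~ /^ *application_/ {
            id = $1; app = $2
            # Strip the padding spaces around each field.
            gsub(/^ +| +$/, "", id); gsub(/^ +| +$/, "", app)
            if (app == name) print id
        }'
}

# Example (needs a YARN cluster, so it is left commented out):
# yarn application -list | app_ids_by_name StreamFlights |
#     while read -r id; do yarn application -kill "$id"; done
```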
5. Flink research
6. Kafka performance testing
https://blog.csdn.net/high2011/article/details/79526689
https://www.cnblogs.com/xiaodf/p/6023531.html (detailed reference)
7. Stress testing Spark Streaming consumption from Kafka
https://blog.csdn.net/u4110122855/article/details/75090337
8. Spark Streaming + Kafka in practice
http://dblab.xmu.edu.cn/blog/1536/
9. Spark operators
https://blog.csdn.net/fortuna_i/article/details/81170565
foreachPartition
https://blog.csdn.net/fox64194167/article/details/80777715
Spark in practice
https://gitee.com/featureai/spark-example/tree/master
10. HBase partitioning
https://blog.csdn.net/u013870094/article/details/79440312 (key reference 1)
https://www.cnblogs.com/darange/p/9386079.html (key reference 2)