- Blog (27)
- Resources (1)
[Original] A basic Sqoop data import example
Table creation script (company.sql): create database company; use company; create table company.staff(id int(4) primary key not null auto_increment, name varchar(255), sex varchar(255)); insert into company.staff(name, sex) values('Thomas', 'Male'); insert into company.st…
2020-06-17 12:07:16 254 1
[Original] Using Flume to monitor a local port and send the data to Kafka
Use Flume to monitor port 6666 on the local machine, send the data to Kafka, and start a Kafka consumer that prints the data to the console; the Kafka topic is user-defined. First start ZooKeeper and Kafka: bin/zookeeper-server-start.sh config/zookeeper.properties; bin/kafka-server-start.sh config/server.properties. Then create a topic: bin/kafka-topics.sh --create --zookeeper hadoop10…
2020-06-16 23:08:10 785
[Original] Spark SQL basics
package com.spark.week3 import org.apache.spark.sql.SparkSession object One { System.setProperty("hadoop.home.dir", "D:/soft/hadoop/hadoop-2.7.3") def main(args: Array[String]): Unit = { val spark = SparkSession.builder().master("local").appNam…
2020-06-09 21:30:57 184
[Original] Simulating road-monitoring vehicle data, SQL operations, and a Spark Streaming consumer reading the monitoring data
Simulated data: package com.chx.yuekaomoni import java.io.PrintWriter import java.text.SimpleDateFormat import java.util.{Date, Properties} import org.apache.kafka.clients.producer.{KafkaProducer, Producer, ProducerRecord} import org.apache.kafka.common.serializat…
2020-06-09 15:12:12 851
[Original] Installing Scala on Linux
1. Upload scala.zip to a directory on the CentOS VM. 2. unzip scala.zip -d <target directory>. 3. Grant permissions: chmod -R 755 scala. 4. Configure environment variables in vim /etc/profile or vim ~/.bashrc: export SCALA_HOME=/home/test/scala, and append $SCALA_HOME/bin to the PATH variable. 5. source /etc/profile or source ~/.bashrc. 6. Write a Scala program. 7. Compile it with scalac…
2020-06-08 16:41:52 114
[Original] Basic Kafka commands
Version 1: start ZooKeeper: bin/zookeeper-server-start.sh config/zookeeper.properties; start Kafka: bin/kafka-server-start.sh config/server.properties; create a topic: bin/kafka-topics.sh --create --zookeeper linux-star:2181 --replication-factor 1 --partitions 1 --topic test; …
2020-06-08 15:32:00 126
[Original] Consuming Kafka messages with Spark Streaming via the low-level direct API (Kafka 0.10)
package com.spark.streaming import org.apache.kafka.common.serialization.StringDeserializer import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.streaming.{Seconds, StreamingContext} // todo: consume Kafka messages with Spark Streaming via the low-level direct API…
2020-06-05 23:58:20 120
[Original] Word count over Kafka data with Spark Streaming, using receivers
package com.spark.streaming import org.apache.spark.streaming.dstream.DStream import org.apache.spark.streaming.kafka.KafkaUtils import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.{SparkConf, SparkContext} import scala.…
2020-06-05 23:19:03 206
[Original] Consuming Kafka messages with Spark Streaming via the low-level direct API
package com.spark.streaming import kafka.serializer.StringDecoder import org.apache.spark.streaming.dstream.{DStream, InputDStream} import org.apache.spark.streaming.kafka.KafkaUtils import org.apache.spark.streaming.{Seconds, StreamingContext} import org…
2020-06-05 23:09:47 125
[Original] Integrating Spark Streaming with Flume: push mode
package com.spark.streaming import java.net.InetSocketAddress import org.apache.spark.storage.StorageLevel import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream} import org.apache.spark.streaming.flume.{FlumeUtils, SparkFlumeEvent} imp…
2020-06-05 23:05:23 131
[Original] Integrating Spark Streaming with Flume: pull (poll) mode
package com.spark.streaming import java.net.InetSocketAddress import org.apache.spark.storage.StorageLevel import org.apache.spark.streaming.flume.FlumeUtils import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.{SparkConf…
2020-06-05 23:04:29 107
[Original] Spark Streaming window functions: counting hot words within a time window
package com.spark.streaming import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.{SparkConf, SparkContext} /** Spark Streaming window-function application: count hot words within a time window */ object SparkStreamingTCPWindowHotWords { def main(args: Array[…
2020-06-05 22:36:35 221
[Original] Spark Streaming window functions: counting word occurrences within a time window
package com.spark.streaming import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream} import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.{SparkConf, SparkContext} /** Spark Streaming window functions: count word occurrences within a…
2020-06-05 22:34:34 581
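The windowed count the two posts above describe can be sketched without Spark: below, each element of a sequence stands in for one micro-batch, and `sliding` plays the role of the DStream window (a minimal illustration with made-up names, not the posts' actual code):

```scala
// Windowed word count on plain Scala collections.
// Each inner Seq is one "micro-batch" of lines; sliding(windowSize)
// yields overlapping windows of consecutive batches, and each window
// is flattened, split into words, and counted.
object WindowWordCount {
  def countWindow(batches: Seq[Seq[String]], windowSize: Int): Seq[Map[String, Int]] =
    batches.sliding(windowSize).map { window =>
      window.flatten
        .flatMap(_.split("\\s+"))
        .groupBy(identity)
        .map { case (word, occurrences) => word -> occurrences.size }
    }.toList

  def main(args: Array[String]): Unit = {
    // Three batches, window of two batches, sliding by one batch.
    WindowWordCount.countWindow(Seq(Seq("a b"), Seq("b c"), Seq("c c")), 2).foreach(println)
  }
}
```

In real Spark Streaming the same idea is `reduceByKeyAndWindow` over a DStream; the sliding iterator here only mimics the window/slide behaviour for batch-aligned durations.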
[Original] Spark Streaming: receiving socket data, counting words, and accumulating results across batches
package com.spark.streaming import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream} import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.{SparkConf, SparkContext} /** Spark Streaming: receive socket data, count words, and…
2020-06-05 22:33:04 459
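The cross-batch accumulation the post above describes (in Spark, `updateStateByKey`) can be sketched on plain Scala collections: a mutable map plays the role of the running state, updated once per "batch" (names here are illustrative, not the post's code):

```scala
import scala.collection.mutable

// Running word count: `state` holds the totals carried from batch to batch,
// the way updateStateByKey carries keyed state between micro-batches.
object RunningWordCount {
  private val state = mutable.Map.empty[String, Int].withDefaultValue(0)

  // Process one batch of lines and return a snapshot of the running totals.
  def updateWith(batch: Seq[String]): Map[String, Int] = {
    batch.flatMap(_.split("\\s+"))
      .groupBy(identity)
      .foreach { case (word, occurrences) => state(word) += occurrences.size }
    state.toMap
  }
}
```

Spark's version additionally needs a checkpoint directory so the state survives failures; this sketch only shows the accumulation logic.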
[Original] Installing nc on Linux and what it is for
What is nc? nc is short for netcat, known as the "Swiss Army knife" of networking: small, practical, and designed to be a simple, reliable network tool. What nc can do: (1) listen on any TCP/UDP port, acting as a server over TCP or UDP; (2) scan ports, acting as a client that initiates TCP or UDP connections; (3) transfer files between machines; (4) measure network speed between machines. Install command: [star@linux-star opt]$ yum install -y nc. In one terminal, enter: [star@linux-star opt]$…
2020-06-05 22:27:42 892 1
[Original] Spark Streaming: receiving socket data and counting words
package com.spark.streaming import org.apache.spark.streaming.{Seconds, StreamingContext} import org.apache.spark.{SparkConf, SparkContext} /** Spark Streaming: receive socket data and count words */ object SparkStreamingTCP { def main(args: Array[String]): Unit…
2020-06-05 22:20:45 148
[Original] Descending sort over a Spark RDD
Test data: 1 1603A 95; 2 1603B 85; 3 1603C 75; 4 1603D 96; 5 1604F 94; 6 1604E 95; 7 1604K 91; 8 1604G 89; 9 1501A 79; 10 1502A 69; 11 1503A 59; 12 1504A 89; 13 1701A 99; 14 1702A 100; 15 1703A 65. Test results: (1702A,100) (1701A,99) (1603D,96) (1603A,95) (1604E,95) (1604F,94) (1604…
2020-06-05 15:46:50 2010
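The descending sort above can be sketched on plain Scala pairs; `sortBy` with a negated key mirrors `rdd.sortBy(_._2, ascending = false)` (a minimal illustration, not the post's actual code):

```scala
// Sort (id, score) pairs by score, highest first.
object DescendingSort {
  def byScoreDesc(records: Seq[(String, Int)]): Seq[(String, Int)] =
    records.sortBy { case (_, score) => -score }

  def main(args: Array[String]): Unit = {
    // A subset of the post's test data.
    val data = Seq(("1603A", 95), ("1702A", 100), ("1701A", 99), ("1603D", 96))
    DescendingSort.byScoreDesc(data).foreach(println)
  }
}
```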
[Original] Computing an average with a Spark UDAF
Test data: {"name":"zhangsan","age":20} {"name":"lisi","age":21} {"name":"wangwu","age":22} {"name":"zhaoliu","age":23} {"name":"tianqi","age":24}. Test result: count = 5, ageavg = 22.0. package com.spark.week3 import org.apa…
2020-06-05 15:41:51 255
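The count/average result above can be sketched as a plain fold: a UDAF's buffer accumulates partial aggregates per row, which here is a `(count, sum)` pair folded over the ages (names are illustrative, not the post's UDAF):

```scala
// Single-pass count and average, the way a UDAF buffer would accumulate:
// each element updates (count, sum); the average is computed at the end.
object AgeAverage {
  def countAndAvg(ages: Seq[Int]): (Int, Double) = {
    val (count, sum) = ages.foldLeft((0, 0L)) { case ((c, s), age) => (c + 1, s + age) }
    (count, sum.toDouble / count)
  }
}
```

A real UDAF additionally defines a `merge` step combining two buffers from different partitions; for `(count, sum)` that is simply element-wise addition.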
[Original] SQL-style processing through the DataFrame API in Spark
package com.spark.sql import org.apache.spark.sql.{DataFrame, Encoder, Row, SparkSession} import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder object DataOperation { System.setProperty("hadoop.home.dir", "D:\\soft\\hadoop\\hadoop-2.7.3")…
2020-06-05 15:33:12 482
[Original] Reading various file formats into a DataFrame and writing them back in Spark
package com.spark.sql import org.apache… import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder import org.apache.spark.sql.types.{StringType, StructField, StructType} import org.apache.spark.sql.{Encoder, Row, SaveMode, SparkSession} object DataS…
2020-06-05 15:30:07 1405
[Original] Converting between RDD, DataSet, and DataFrame in Spark
package com.spark.sql import org.apache.spark.rdd.RDD import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType} import org.apache.spark.sql._ object Rdd2DataFrame…
2020-06-05 15:24:59 508
[Original] IP address lookup in Spark
package com.spark.core import java.sql.{Connection, DriverManager, PreparedStatement} import org.apache.spark.rdd.RDD import org.apache.spark.{SparkConf, SparkContext} /** IP address lookup */ object IPLocation { System.setProperty("hadoop.home.dir", "D:\\sof…
2020-06-05 15:20:23 423
[Original] UV and PV computations in Spark
UV test data: 192.168.33.16,hunter,2017-09-16 10:30:20,/a; 192.168.33.16,jack,2017-09-16 10:30:40,/a; 192.168.33.16,jack,2017-09-16 10:30:40,/a; 192.168.33.16,jack,2017-09-16 10:30:40,/a; 192.168.33.16,jack,2017-09-16 10:30:40,/a; 192.168.33.18,polo,2017-09-16 10:3…
2020-06-05 15:16:20 283
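On log lines shaped like the post's test data (`ip,user,timestamp,url`), PV and UV can be sketched with plain collections: PV counts every request, UV counts distinct users (a minimal illustration, not the post's RDD code, which would use `map`/`distinct`/`count`):

```scala
// PV (page views) and UV (unique visitors) over comma-separated log lines.
object UvPv {
  // PV: one page view per log line.
  def pv(lines: Seq[String]): Int = lines.size

  // UV: distinct values of the second field (the user).
  def uv(lines: Seq[String]): Int =
    lines.map(_.split(",")(1)).distinct.size
}
```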
[Original] Secondary sort in Spark
package com.spark.core import org.apache.spark.sql.SparkSession import org.apache.spark.{Partitioner, SparkConf} /** Secondary sort in Spark */ object SparkSecondarySort { System.setProperty("hadoop.home.dir", "d://soft/hadoop/hadoop-2.7.3") def main(args: A…
2020-06-05 15:12:18 322
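The post above uses a custom `Partitioner`; the ordering itself can be sketched with a composite sort key on plain pairs, for example first field ascending and second field descending (illustrative names and ordering, not the post's code):

```scala
// Secondary sort: order by the first field ascending, and within equal
// first fields, by the second field descending (hence the negation).
object SecondarySort {
  def sort(records: Seq[(Int, Int)]): Seq[(Int, Int)] =
    records.sortBy { case (first, second) => (first, -second) }
}
```

In Spark the same effect at scale comes from `repartitionAndSortWithinPartitions` with a key class whose `Ordering` encodes both fields.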
[Original] Grouped TopN in Spark
Data: zhangsan chinese 80; zhangsan math 90; zhangsan english 85; lisi chinese 90; lisi math 80; lisi english 90; wangwu chinese 84; wangwu math 89; wangwu english 70; maliu chinese 82; maliu math 75; maliu english 100. Results: math: 90 89 80; chinese: 90 84 82; english: 100…
2020-06-05 15:10:37 162
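The grouped top-N above can be sketched with `groupBy` plus a sorted `take` over plain `(name, subject, score)` rows, mirroring `groupByKey` followed by a per-group sort in the RDD version (names are illustrative, not the post's code):

```scala
// Top n scores per subject: group rows by subject, then keep the n
// highest scores in each group, in descending order.
object GroupTopN {
  def topN(rows: Seq[(String, String, Int)], n: Int): Map[String, Seq[Int]] =
    rows.groupBy { case (_, subject, _) => subject }
      .map { case (subject, group) =>
        subject -> group.map(_._3).sorted(Ordering[Int].reverse).take(n)
      }
}
```

Note that sorting whole groups in memory, as here, is only safe when each group is small; at RDD scale a bounded priority queue per key avoids materializing the full group.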
[Original] TopN in Spark
package com.spark.core import org.apache.spark.{SparkConf, SparkContext} // orderid, userid, money, productid object TopN { System.setProperty("hadoop.home.dir", "D:\\soft\\hadoop\\hadoop-2.7.3") def main(args: Array[String]): Unit = { val conf = n…
2020-06-05 15:06:57 228
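Over rows shaped like the post's comment (`orderid, userid, money, productid`), a global top-N can be sketched as a descending sort on the money field followed by `take` (a plain-collections illustration; the RDD version would use `sortBy(..., ascending = false).take(n)`):

```scala
// Top n orders by money: sort descending on the third field, keep n rows.
object TopNOrders {
  def topN(orders: Seq[(String, String, Double, String)], n: Int): Seq[(String, String, Double, String)] =
    orders.sortBy { case (_, _, money, _) => -money }.take(n)
}
```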
[Original] Flume poll/push configuration and startup for Spark Streaming
poll: start Flume first, then the project; then write data into the watched file and the output appears on the console: bin/flume-ng agent -n a1 -c conf -f conf/flume-poll.conf -Dflume.root.logger=INFO,console. push (hostname/IP is the Windows machine's IP): start the IDEA project first, then Flume; then write data into the watched file and the output appears on the console: bin/flume-ng age…
2020-06-03 13:40:47 129
scala.zip — the Scala resource package for installing Scala on Linux
2020-06-08