
Original: A basic example of importing data with Sqoop

Table-creation script (company.sql):
create database company;
use company;
create table company.staff(id int(4) primary key not null auto_increment, name varchar(255), sex varchar(255));
insert into company.staff(name, sex) values('Thomas', 'Male');
insert into company.st

2020-06-17 12:07:16 254 1

Original: Using Flume to monitor a local port and send the data to Kafka

Use Flume to monitor port 6666 on the local machine and send the data to Kafka, then start a Kafka consumer that prints the data to the console; the Kafka topic is user-defined. First start ZooKeeper and Kafka:
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
Then create a topic:
bin/kafka-topics.sh --create --zookeeper hadoop10

2020-06-16 23:08:10 785

Original: Basic usage of Spark SQL

package com.spark.week3
import org.apache.spark.sql.SparkSession

object One {
  System.setProperty("hadoop.home.dir", "D:/soft/hadoop/hadoop-2.7.3")
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appNam

2020-06-09 21:30:57 184
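
The preview is cut off at the SparkSession builder. Below is a minimal, self-contained sketch of basic Spark SQL usage along these lines, not the post's actual code; the input path and query are placeholder assumptions.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlBasics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("SparkSqlBasics")
      .getOrCreate()

    // Load a DataFrame and query it with SQL (the file path is a placeholder).
    val df = spark.read.json("data/people.json")
    df.createOrReplaceTempView("people")
    spark.sql("select name, age from people where age > 20").show()

    spark.stop()
  }
}
```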

Original: Simulating road-surveillance vehicle records, running SQL over them, and consuming the monitoring data with a Spark Streaming consumer

Simulated data:
package com.chx.yuekaomoni
import java.io.PrintWriter
import java.text.SimpleDateFormat
import java.util.{Date, Properties}
import org.apache.kafka.clients.producer.{KafkaProducer, Producer, ProducerRecord}
import org.apache.kafka.common.serializat

2020-06-09 15:12:12 851
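
The preview only shows the simulator's imports. A minimal sketch of a Kafka producer emitting simulated monitoring records; the broker address, topic name, and record format are placeholder assumptions, not taken from the post.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object MonitorDataSimulator {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "hadoop10:9092")   // placeholder broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)

    // Emit a few fake "camera,plate,speed,timestamp" records to a placeholder topic.
    for (i <- 1 to 10) {
      val record = s"camera_$i,AB${1000 + i},${60 + i},${System.currentTimeMillis()}"
      producer.send(new ProducerRecord[String, String]("monitor", record))
    }
    producer.close()
  }
}
```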

Original: Installing Scala on Linux

1. Upload scala.zip to a directory on the CentOS virtual machine
2. unzip scala.zip -d <target directory>
3. Grant permissions: chmod -R 755 scala
4. Configure environment variables (vim /etc/profile or vim ~/.bashrc): export SCALA_HOME=/home/test/scala and append $SCALA_HOME/bin to the PATH variable
5. source /etc/profile or source ~/.bashrc
6. Write a Scala program
7. Compile it with scalac

2020-06-08 16:41:52 114

Original: Basic Kafka commands

Version 1:
Start ZooKeeper: bin/zookeeper-server-start.sh config/zookeeper.properties
Start Kafka: bin/kafka-server-start.sh config/server.properties
Create a topic: bin/kafka-topics.sh --create --zookeeper linux-star:2181 --replication-factor 1 --partitions 1 --topic test

2020-06-08 15:32:00 126

Original: Using Spark Streaming to consume Kafka messages with the low-level direct API (Kafka 0.10)

package com.spark.streaming
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
// todo: consume Kafka messages with Spark Streaming via the low-level direct API -- d

2020-06-05 23:58:20 120
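
The preview stops at the imports. A minimal sketch of the direct approach with the spark-streaming-kafka-0-10 integration; the broker, group id, and topic are placeholder assumptions.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object DirectKafka010 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DirectKafka010")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Broker address, group id, and topic are placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "hadoop10:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "direct-demo",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Array("test"), kafkaParams))

    stream.map(_.value()).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```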

Original: Word count by consuming Kafka data in Spark Streaming with receivers

package com.spark.streaming
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
import scala.

2020-06-05 23:19:03 206
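
A minimal sketch of the receiver-based word count using the spark-streaming-kafka-0-8 KafkaUtils.createStream API; the ZooKeeper quorum, consumer group, and topic map are placeholder assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaReceiverWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("KafkaReceiverWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))

    // ZooKeeper quorum, consumer group, and topic map are placeholders.
    val lines = KafkaUtils.createStream(ssc, "hadoop10:2181", "receiver-demo", Map("test" -> 1))

    lines.map(_._2)               // the stream yields (key, message); keep the message
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```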

Original: Using Spark Streaming to consume Kafka messages with the low-level direct API

package com.spark.streaming
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.dstream.{DStream, InputDStream}
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org

2020-06-05 23:09:47 125
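
A minimal sketch of the Kafka 0.8-style direct API (kafka.serializer.StringDecoder, as in the preview's imports); the broker list and topic set are placeholder assumptions.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectKafka08 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("DirectKafka08")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Broker list and topic set are placeholders.
    val kafkaParams = Map("metadata.broker.list" -> "hadoop10:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("test"))

    stream.map(_._2).print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```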

Original: Integrating Spark Streaming with Flume in push mode

package com.spark.streaming
import java.net.InetSocketAddress
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.flume.{FlumeUtils, SparkFlumeEvent}
imp

2020-06-05 23:05:23 131
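
A minimal sketch of push-mode integration via FlumeUtils.createStream; the host and port are placeholders and must match the avro sink in the Flume configuration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePushWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FlumePushWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Flume's avro sink pushes events to this host:port; both are placeholders.
    val flumeStream = FlumeUtils.createStream(ssc, "192.168.1.100", 8888)

    flumeStream.map(event => new String(event.event.getBody.array()))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```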

Original: Integrating Spark Streaming with Flume in pull (poll) mode

package com.spark.streaming
import java.net.InetSocketAddress
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf

2020-06-05 23:04:29 107
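
A minimal sketch of pull-mode integration via FlumeUtils.createPollingStream; the agent address and storage level are placeholder assumptions.

```scala
import java.net.InetSocketAddress
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePollWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FlumePollWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Addresses of the Flume agents running the Spark sink; placeholders.
    val addresses = Seq(new InetSocketAddress("192.168.1.100", 8888))
    val flumeStream = FlumeUtils.createPollingStream(ssc, addresses, StorageLevel.MEMORY_AND_DISK_SER_2)

    flumeStream.map(event => new String(event.event.getBody.array()))
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```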

Original: Spark Streaming window functions: counting hot words within a time window

package com.spark.streaming
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
/** Spark Streaming window functions: counting hot words within a time window */
object SparkStreamingTCPWindowHotWords {
  def main(args: Array[

2020-06-05 22:36:35 221
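
A minimal sketch of counting hot words with reduceByKeyAndWindow and sorting each window's counts; the socket source, window length, and slide interval are placeholder assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowHotWords {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("WindowHotWords")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Socket source (e.g. fed by `nc -lk 9999`); host and port are placeholders.
    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" ")).map((_, 1))

    // Count words over a 10-second window sliding every 5 seconds,
    // then sort each window's counts descending to surface the hot words.
    val windowedCounts = words.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(10), Seconds(5))
    windowedCounts.transform(rdd => rdd.sortBy(_._2, ascending = false)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```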

Original: Spark Streaming window functions: counting word occurrences within a time window

package com.spark.streaming
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
/** Spark Streaming window functions: counting how often words occur within a

2020-06-05 22:34:34 581
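
A minimal sketch of a windowed word count. This variant uses the inverse-function form of reduceByKeyAndWindow (which requires checkpointing) and may differ from the post's exact approach; the source, window, and slide values are placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("WindowWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("./window-checkpoint")   // required by the inverse-function variant below

    val pairs = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" ")).map((_, 1))

    // Incremental windowed count: add the new slide, subtract the slide that left the window.
    val counts = pairs.reduceByKeyAndWindow(_ + _, _ - _, Seconds(10), Seconds(5))
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```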

Original: Spark Streaming word count over socket data, with results accumulated across batches

package com.spark.streaming
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
/** Spark Streaming: receive socket data and implement

2020-06-05 22:33:04 459
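
A minimal sketch of accumulating word counts across batches with updateStateByKey; the socket source and checkpoint directory are placeholder assumptions.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatefulWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("./state-checkpoint")   // updateStateByKey needs a checkpoint directory

    // Merge this batch's counts for a key with the running total kept as state.
    val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
      (newValues, state) => Some(newValues.sum + state.getOrElse(0))

    ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" ")).map((_, 1))
      .updateStateByKey(updateFunc)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```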

Original: Installing nc on Linux and what it is used for

What nc is: nc is short for netcat and is often called the Swiss Army knife of networking. Small, practical, and reliable, it is designed as a simple network tool.
What nc can do:
(1) Listen on any TCP/UDP port: nc can act as a server listening on a given port over TCP or UDP
(2) Port scanning: nc can act as a client initiating TCP or UDP connections
(3) Transfer files between machines
(4) Measure network speed between machines
Installation: [star@linux-star opt]$ yum install -y nc
In one terminal, type: [star@linux-star opt]$

2020-06-05 22:27:42 892 1

Original: Spark Streaming word count over socket data

package com.spark.streaming
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
/** Spark Streaming: receive socket data and do a word count */
object SparkStreamingTCP {
  def main(args: Array[String]): Unit

2020-06-05 22:20:45 148
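
A minimal sketch of the per-batch socket word count; the host and port are placeholders (e.g. fed by nc -lk 9999), not the post's actual code.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SparkStreamingTCP {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("SparkStreamingTCP")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Count words per 5-second batch from a socket source.
    ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```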

Original: Sorting an RDD in descending order in Spark

Test data:
1 1603A 95
2 1603B 85
3 1603C 75
4 1603D 96
5 1604F 94
6 1604E 95
7 1604K 91
8 1604G 89
9 1501A 79
10 1502A 69
11 1503A 59
12 1504A 89
13 1701A 99
14 1702A 100
15 1703A 65
Test result:
(1702A,100)
(1701A,99)
(1603D,96)
(1603A,95)
(1604E,95)
(1604F,94)
(1604

2020-06-05 15:46:50 2010
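
A minimal sketch of sorting such records by score in descending order with RDD.sortBy; the input path is a placeholder and the field layout follows the test data above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddSortDesc {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("RddSortDesc"))

    // Each line looks like "1 1603A 95": id, class code, score (input path is a placeholder).
    val result = sc.textFile("data/scores.txt")
      .map(_.split("\\s+"))
      .map(fields => (fields(1), fields(2).toInt))
      .sortBy(_._2, ascending = false)   // descending by score

    result.collect().foreach(println)
    sc.stop()
  }
}
```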

Original: Computing an average with a Spark UDAF

Test data:
{"name":"zhangsan","age":20}
{"name":"lisi","age":21}
{"name":"wangwu","age":22}
{"name":"zhaoliu","age":23}
{"name":"tianqi","age":24}
Test result:
+-----+------+
|count|ageavg|
+-----+------+
|    5|  22.0|
+-----+------+
package com.spark.week3
import org.apa

2020-06-05 15:41:51 255
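
A minimal sketch of an average UDAF using the untyped UserDefinedAggregateFunction API; the class name AgeAvgUDAF and the input path are illustrative assumptions, not the post's code.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// Untyped UDAF that averages a numeric column (JSON numbers load as LongType).
class AgeAvgUDAF extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("age", LongType) :: Nil)
  def bufferSchema: StructType =
    StructType(StructField("sum", LongType) :: StructField("count", LongType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0L; buffer(1) = 0L }
  def update(buffer: MutableAggregationBuffer, input: Row): Unit =
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getLong(0) + input.getLong(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }
  def evaluate(buffer: Row): Double = buffer.getLong(0).toDouble / buffer.getLong(1)
}

object UdafAvgDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("UdafAvgDemo").getOrCreate()
    spark.udf.register("ageavg", new AgeAvgUDAF)

    // people.json holds lines like {"name":"zhangsan","age":20}; the path is a placeholder.
    spark.read.json("data/people.json").createOrReplaceTempView("people")
    spark.sql("select count(*) as count, ageavg(age) as ageavg from people").show()
    spark.stop()
  }
}
```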

Original: SQL-style processing with DataFrame operations in Spark

package com.spark.sql
import org.apache.spark.sql.{DataFrame, Encoder, Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
object DataOperation {
  System.setProperty("hadoop.home.dir","D:\\soft\\hadoop\\hadoop-2.7.3")

2020-06-05 15:33:12 482
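
A minimal sketch of expressing a query with the DataFrame DSL rather than a SQL string; the input path and columns are placeholder assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DataFrameOps {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("DataFrameOps").getOrCreate()
    import spark.implicits._

    // The same kind of query as a SQL string, written with DataFrame operations
    // (people.json is a placeholder input).
    val df = spark.read.json("data/people.json")
    df.select($"name", $"age")
      .filter($"age" > 20)
      .groupBy($"age")
      .agg(count($"name").as("cnt"))
      .show()

    spark.stop()
  }
}
```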

Original: Reading various file formats into DataFrames and writing them back out in Spark

package com.spark.sql
import org.apache
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{Encoder, Row, SaveMode, SparkSession}
object DataS

2020-06-05 15:30:07 1405
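
A minimal sketch of reading JSON, CSV, and Parquet into DataFrames and writing them back out with a SaveMode; all paths are placeholders.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object DataSourceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("DataSourceDemo").getOrCreate()

    // All paths below are placeholders.
    val jsonDf    = spark.read.json("data/people.json")
    val csvDf     = spark.read.option("header", "true").csv("data/people.csv")
    val parquetDf = spark.read.parquet("data/people.parquet")

    // Write a DataFrame back out in another format, overwriting any previous output.
    jsonDf.write.mode(SaveMode.Overwrite).parquet("out/people_parquet")
    csvDf.write.mode(SaveMode.Overwrite).json("out/people_json")

    parquetDf.show()
    spark.stop()
  }
}
```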

Original: Converting between RDD, Dataset, and DataFrame in Spark

package com.spark.sql
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql._
object Rdd2DataFrame

2020-06-05 15:24:59 508
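
A minimal sketch of the conversions between RDD, DataFrame, and Dataset using spark.implicits._; the Person case class and sample rows are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

case class Person(name: String, age: Int)

object Rdd2DataFrameDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("Rdd2DataFrameDemo").getOrCreate()
    import spark.implicits._   // brings toDF/toDS and the case-class encoders into scope

    val rdd = spark.sparkContext.parallelize(Seq(Person("zhangsan", 20), Person("lisi", 21)))

    val df        = rdd.toDF()      // RDD       -> DataFrame
    val ds        = df.as[Person]   // DataFrame -> Dataset
    val ds2       = rdd.toDS()      // RDD       -> Dataset
    val backToRdd = ds.rdd          // Dataset   -> RDD of Person
    val rowRdd    = df.rdd          // DataFrame -> RDD of Row

    df.show()
    ds.show()
    println(s"${backToRdd.count()} ${rowRdd.count()} ${ds2.count()}")
    spark.stop()
  }
}
```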

Original: IP address lookup in Spark

package com.spark.core
import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
/** IP address lookup */
object IPLocation {
  System.setProperty("hadoop.home.dir","D:\\sof

2020-06-05 15:20:23 423
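
A rough sketch of one common IP-lookup approach: convert each IP to a number, broadcast the sorted IP ranges, and binary-search them. The rule-file and log formats here are assumptions, and the JDBC write suggested by the preview's imports is omitted.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object IPLocationSketch {
  // Convert a dotted IP string to an unsigned 32-bit value for range comparison.
  def ip2Long(ip: String): Long =
    ip.split("\\.").foldLeft(0L)((acc, part) => (acc << 8) | part.toLong)

  // Binary search the sorted (startIp, endIp, region) rules for the range containing ip.
  def search(ip: Long, rules: Array[(Long, Long, String)]): Option[String] = {
    var lo = 0
    var hi = rules.length - 1
    while (lo <= hi) {
      val mid = (lo + hi) / 2
      if (ip < rules(mid)._1) hi = mid - 1
      else if (ip > rules(mid)._2) lo = mid + 1
      else return Some(rules(mid)._3)
    }
    None
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("IPLocation"))

    // Assumed rule format: startIp|endIp|region, numeric and pre-sorted; paths are placeholders.
    val rules = sc.textFile("data/ip_rules.txt")
      .map(_.split("\\|")).map(f => (f(0).toLong, f(1).toLong, f(2)))
      .collect()
    val rulesBc = sc.broadcast(rules)

    // Assumed access log with the client IP as the second comma-separated field.
    val counts = sc.textFile("data/access.log")
      .map(_.split(",")(1))
      .flatMap(ip => search(ip2Long(ip), rulesBc.value))
      .map((_, 1)).reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}
```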

Original: PV and UV computation in Spark

UV test data:
192.168.33.16,hunter,2017-09-16 10:30:20,/a
192.168.33.16,jack,2017-09-16 10:30:40,/a
192.168.33.16,jack,2017-09-16 10:30:40,/a
192.168.33.16,jack,2017-09-16 10:30:40,/a
192.168.33.16,jack,2017-09-16 10:30:40,/a
192.168.33.18,polo,2017-09-16 10:3

2020-06-05 15:16:20 283
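
A minimal sketch of PV (one count per request) and UV (distinct visitors, keyed here by IP) per URL; the input path and the choice of distinct key are assumptions, as the post may key UV on the user field instead.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PvUv {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("PvUv"))

    // Each line: ip,user,timestamp,url (the path is a placeholder).
    val lines = sc.textFile("data/access.log").map(_.split(","))

    // PV: one page view per log line, counted per URL.
    val pv = lines.map(f => (f(3), 1)).reduceByKey(_ + _)

    // UV: distinct visitors per URL, keyed by IP here.
    val uv = lines.map(f => (f(3), f(0))).distinct()
      .map { case (url, _) => (url, 1) }
      .reduceByKey(_ + _)

    pv.collect().foreach(println)
    uv.collect().foreach(println)
    sc.stop()
  }
}
```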

Original: Secondary sort in Spark

package com.spark.core
import org.apache.spark.sql.SparkSession
import org.apache.spark.{Partitioner, SparkConf}
/** Secondary sort in Spark */
object SparkSecondarySort {
  System.setProperty("hadoop.home.dir","d://soft/hadoop/hadoop-2.7.3")
  def main(args: A

2020-06-05 15:12:18 322
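
A minimal sketch of one common secondary-sort pattern: a composite key implementing Ordered plus sortByKey. The input format is an assumption, and the custom Partitioner imported in the preview is not reproduced here.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sort key: first field ascending, second field descending when the first ties.
case class SecondarySortKey(first: Int, second: Int) extends Ordered[SecondarySortKey] {
  override def compare(that: SecondarySortKey): Int =
    if (this.first != that.first) this.first.compareTo(that.first)
    else that.second.compareTo(this.second)
}

object SparkSecondarySortSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("SecondarySort"))

    // Assumed input lines: "first second", e.g. "3 5" (the path is a placeholder).
    val sorted = sc.textFile("data/pairs.txt")
      .map(_.split(" "))
      .map(f => (SecondarySortKey(f(0).toInt, f(1).toInt), s"${f(0)} ${f(1)}"))
      .sortByKey()
      .map(_._2)

    sorted.collect().foreach(println)
    sc.stop()
  }
}
```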

Original: Grouped Top-N in Spark

Data:
zhangsan chinese 80
zhangsan math 90
zhangsan english 85
lisi chinese 90
lisi math 80
lisi english 90
wangwu chinese 84
wangwu math 89
wangwu english 70
maliu chinese 82
maliu math 75
maliu english 100
Result:
math: 90 89 80
chinese: 90 84 82
english: 100

2020-06-05 15:10:37 162
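
A minimal sketch of grouped Top-N: group scores by subject and keep the three highest, matching the result shown above; the input path is a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object GroupTopN {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("GroupTopN"))

    // Input lines: "name subject score" (the path is a placeholder).
    val top3 = sc.textFile("data/scores.txt")
      .map(_.split(" "))
      .map(f => (f(1), f(2).toInt))                 // (subject, score)
      .groupByKey()
      .mapValues(_.toList.sortWith(_ > _).take(3))  // top 3 scores per subject

    top3.collect().foreach { case (subject, scores) =>
      println(s"$subject: ${scores.mkString(" ")}")
    }
    sc.stop()
  }
}
```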

Original: Top-N in Spark

package com.spark.core
import org.apache.spark.{SparkConf, SparkContext}
// orderid,userid,money,productid
object TopN {
  System.setProperty("hadoop.home.dir","D:\\soft\\hadoop\\hadoop-2.7.3")
  def main(args: Array[String]): Unit = {
    val conf = n

2020-06-05 15:06:57 228
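
A minimal sketch of a Top-N over the orderid,userid,money,productid records named in the preview, taking the largest amounts with sortBy and take; the input path and N are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TopNSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("TopN")
    val sc = new SparkContext(conf)

    // Input lines: orderid,userid,money,productid (the path is a placeholder).
    // Take the 5 orders with the highest amount.
    val top5 = sc.textFile("data/orders.txt")
      .map(_.split(","))
      .map(f => (f(0), f(2).toDouble))
      .sortBy(_._2, ascending = false)
      .take(5)

    top5.foreach(println)
    sc.stop()
  }
}
```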

Original: Flume configuration and startup commands for the Spark Streaming poll and push modes

poll mode: start Flume first, then the application; then append data to the monitored file and it is printed on the console:
bin/flume-ng agent -n a1 -c conf -f conf/flume-poll.conf -Dflume.root.logger=INFO,console
push mode (the hostname/IP here is the Windows machine's IP address): start the IDEA project first, then Flume; then append data to the monitored file and it is printed on the console:
bin/flume-ng age

2020-06-03 13:40:47 129

Resource: scala.zip, the package for installing Scala on Linux

Deploy Scala on Linux for Scala programming; environment setup: https://blog.csdn.net/star5610/article/details/106623300

2020-06-08
