spark
Spark-related posts
菩提树下的呆子
Sharing what I wish to share
Big data cluster installation 03: Spark configuration
Essential reading for Spark configuration. Of a thousand configuration rules, the network comes first: sloppy configuration means fixing bugs until you are sick of it. Keep internal and external IPs straight; configure the local machine with the internal IP, and connect using the external IP.
1. Download the upload tool rz. Install command: yum install -y lrzsz
2. Upload the Spark tarball. Upload commands:
## upload the tarball
rz
## extract it
tar -zxvf [package-name]
3. Configure Spark. (1) Edit the .bashrc file: add the following to .bashrc on every node (it can also go into the profile file):
# jdk
export JAVA_HOME=/root/jd...
Original, 2020-08-07 21:50:20
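The truncated steps above boil down to a short shell session plus an environment fragment. A minimal sketch follows; the package name and JDK path are placeholders, since the original values are cut off in the source:

```shell
# 1. install the lrzsz upload/download tools
yum install -y lrzsz
# 2. upload the Spark tarball (rz opens an interactive upload dialog), then extract it
rz
tar -zxvf [package-name]   # placeholder, as in the original post
# 3. append to ~/.bashrc on every node (or to the profile file)
# jdk -- the original path is truncated ("/root/jd..."); substitute your real JDK directory
export JAVA_HOME=<your-jdk-path>
```

After editing .bashrc, run `source ~/.bashrc` on each node so the new variables take effect in the current shell.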
Storing Spark SQL results in MySQL sorted by a field
While working with Spark SQL I found that even though the rows were already sorted in the DataFrame, they came back unordered after being stored in MySQL. The code:
import org.apache.spark.{SparkConf}
import org.apache.spark.sql.{SaveMode, SparkSession}
object timetest {
  def main(args: Array[String])...
Original, 2020-02-09 20:48:23
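The post observes that a sorted DataFrame comes back unordered from MySQL. The underlying reason is that SQL tables have no intrinsic row order, so a common fix is to materialize the sort position as its own column and ORDER BY it when reading. A plain-Python sketch of that idea (no Spark or JDBC involved; all names are illustrative):

```python
# Rows as they might sit in a DataFrame before the write.
rows = [("b", 3), ("a", 1), ("c", 2)]

# The "sortBy" step: order by the numeric field.
ordered = sorted(rows, key=lambda r: r[1])

# Materialize the position as an explicit rank column before inserting,
# since the database will not remember insert order.
with_rank = [(rank, name, score) for rank, (name, score) in enumerate(ordered)]
print(with_rank)  # [(0, 'a', 1), (1, 'c', 2), (2, 'b', 3)]
```

Readers of the table then use ORDER BY on the rank column instead of relying on the order rows happened to be written in.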
Spark RDD basic transformation operations -- sortBy
18. The sortBy operation: sort the word-count results in descending order of occurrence count.
scala> val rddData1 = sc.parallelize(Array(("dog",3),("cat",1),("hadoop",2),("spark",3),("apple",2)))
rddData1: org.apache.spark.rdd.RDD[(String, Int)] = Parall...
Original, 2020-02-01 15:06:17
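A Spark-free sketch of sortBy's semantics on the same word counts, with Python's sorted playing the role of sortBy (Spark's version additionally takes an ascending flag and a partition count):

```python
pairs = [("dog", 3), ("cat", 1), ("hadoop", 2), ("spark", 3), ("apple", 2)]
# sortBy(keyFunc, ascending = false): descending by count
by_count_desc = sorted(pairs, key=lambda kv: kv[1], reverse=True)
print(by_count_desc)  # [('dog', 3), ('spark', 3), ('hadoop', 2), ('apple', 2), ('cat', 1)]
```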
Spark RDD basic transformation operations -- zipWithUniqueId
17. The zipWithUniqueId operation: create an RDD of the letters A through E, then zip each element with its corresponding unique ID.
scala> val rddData1 = sc.parallelize(Array("A","B","C","D","E"))
rddData1: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD...
Original, 2020-02-01 14:56:16
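zipWithUniqueId assigns IDs without launching a Spark job: the k-th element of partition p (out of n partitions) gets ID k*n + p, so the IDs are unique but not necessarily consecutive. A plain-Python sketch of that formula (the partition layout is illustrative):

```python
partitions = [["A", "B"], ["C", "D", "E"]]   # pretend this is a 2-partition RDD
n = len(partitions)
result = [(x, k * n + p)                     # k-th item of partition p gets id k*n + p
          for p, part in enumerate(partitions)
          for k, x in enumerate(part)]
print(result)  # [('A', 0), ('B', 2), ('C', 1), ('D', 3), ('E', 5)]
```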
Spark RDD basic transformation operations -- zipWithIndex
16. The zipWithIndex operation: create an RDD of the letters A through E, then combine each element with its corresponding index.
scala> val rddData1 = sc.parallelize(Array("A","B","C","D","E"))
rddData1: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at para...
Original, 2020-02-01 14:48:41
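zipWithIndex pairs each element with its global position, much like Python's enumerate with the tuple flipped; unlike zipWithUniqueId it may trigger an extra Spark job to count the elements in earlier partitions. Sketch:

```python
data = ["A", "B", "C", "D", "E"]
# Spark returns (element, index), so flip what enumerate yields.
result = [(x, i) for i, x in enumerate(data)]
print(result)  # [('A', 0), ('B', 1), ('C', 2), ('D', 3), ('E', 4)]
```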
Spark RDD basic transformation operations -- zipPartitions
15. The zipPartitions operation: create an RDD of the numbers 1 to 100 with 10 partitions, then run coalesce to merge the partitions down to 5, then expand to 7, and observe the effect....
Original, 2020-01-30 22:51:22
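zipPartitions pairs up same-numbered partitions of two RDDs and applies a user function to the two partition iterators; the RDDs must have the same number of partitions, though the partitions themselves may differ in length. A plain-Python sketch (the combining function and data are illustrative):

```python
def zip_partitions(parts_a, parts_b, f):
    # Spark requires equal partition counts; per-partition element counts may differ.
    assert len(parts_a) == len(parts_b)
    return [f(iter(a), iter(b)) for a, b in zip(parts_a, parts_b)]

out = zip_partitions([[1, 2], [3]], [["a", "b"], ["c"]],
                     lambda xs, ys: [f"{x}{y}" for x, y in zip(xs, ys)])
print(out)  # [['1a', '2b'], ['3c']]
```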
Spark RDD basic transformation operations -- zip
14. The zip operation: apply a zip to an RDD of the numbers 1 to 3 and an RDD of the letters A to C, merging them into a new RDD.
scala> val rddData1 = sc.parallelize(1 to 10,5)
rddData1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[32] at parallelize at <...
Original, 2020-01-30 15:02:03
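zip pairs elements positionally, like Python's built-in zip; Spark additionally insists that both RDDs have the same number of partitions and the same number of elements per partition, otherwise the job fails. Sketch on the 1-to-3 / A-to-C example from the description:

```python
nums = [1, 2, 3]
letters = ["A", "B", "C"]
pairs = list(zip(nums, letters))   # positional pairing, same length required in Spark
print(pairs)  # [(1, 'A'), (2, 'B'), (3, 'C')]
```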
Spark RDD basic transformation operations -- glom
13. The glom operation: create an RDD of the numbers 1 to 10 with 5 partitions, then convert each partition into an array.
scala> val rddData1 = sc.parallelize(1 to 10,5)
rddData1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[32] at parallelize at <consol...
Original, 2020-01-30 11:22:08
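glom turns each partition into a single array. The sketch below first mimics how parallelize range-slices 1 to 10 into 5 partitions, which is exactly the structure glom then exposes (the slicing formula approximates Spark's even partitioning):

```python
def glom(data, num_partitions):
    # approximate Spark's even range-slicing of a parallelized collection
    n = len(data)
    return [data[i * n // num_partitions:(i + 1) * n // num_partitions]
            for i in range(num_partitions)]

print(glom(list(range(1, 11)), 5))  # [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
```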
Spark RDD basic transformation operations -- randomSplit
12. The randomSplit operation: split an RDD of the numbers 1 to 10 into 3 RDDs with randomSplit.
scala> val rddData1 = sc.parallelize(1 to 10,3)
rddData1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[28] at parallelize at <co...
Original, 2020-01-30 11:12:25
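randomSplit assigns each element to one of the output RDDs at random, with probability proportional to the given weights, so the split sizes are only approximate. A plain-Python sketch (the seed and weights are illustrative):

```python
import random

def random_split(data, weights, seed=42):
    rng = random.Random(seed)
    total = sum(weights)
    splits = [[] for _ in weights]
    for x in data:
        r, acc = rng.random(), 0.0
        idx = len(weights) - 1            # fall through to the last split
        for i, w in enumerate(weights):
            acc += w / total
            if r < acc:
                idx = i
                break
        splits[idx].append(x)
    return splits

parts = random_split(list(range(1, 11)), [1, 1, 1])
print([len(p) for p in parts])            # three roughly equal pieces covering 1..10
```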
Spark RDD basic transformation operations -- repartition
11. The repartition operation: create an RDD of the numbers 1 to 100 with 10 partitions, then repartition to merge them down to 5 partitions and expand again to 7, observing the effect.
scala> val rddData1 = sc.parallelize(1 to 100,10)
rddData1: org.apache.spark.rdd.RDD[Int] = Parall...
Original, 2020-01-30 10:56:32
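repartition always performs a full shuffle, which is why it can both shrink 10 partitions to 5 and then grow 5 to 7. A sketch emulating the redistribution with round-robin placement (Spark's actual placement is hash-based; this is illustrative):

```python
def repartition(parts, n):
    flat = [x for p in parts for x in p]
    out = [[] for _ in range(n)]
    for i, x in enumerate(flat):          # round-robin stand-in for the shuffle
        out[i % n].append(x)
    return out

ten = [list(range(i * 10 + 1, i * 10 + 11)) for i in range(10)]  # 1..100 in 10 partitions
five = repartition(ten, 5)
seven = repartition(five, 7)
print(len(five), len(seven))  # 5 7
```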
Spark RDD basic transformation operations -- coalesce
10. The coalesce operation: create an RDD of the numbers 1 to 100 with 10 partitions, then run coalesce to merge the partitions down to 5 and expand again to 7, observing the effect.
scala> val rddData1 = sc.parallelize(1 to 100,10)
rddData1: org.apache.spark.rdd.RDD[Int] = ParallelColl...
Original, 2020-01-29 23:08:58
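coalesce merges existing partitions without a shuffle, so it can only reduce the partition count; asking for more partitions than currently exist is a no-op unless shuffle=true is passed. A sketch (the i % n merge pattern approximates Spark's adjacent-merge strategy):

```python
def coalesce(parts, n):
    if n >= len(parts):                   # cannot grow without shuffle = true
        return parts
    out = [[] for _ in range(n)]
    for i, p in enumerate(parts):
        out[i % n].extend(p)
    return out

ten = [list(range(i * 10 + 1, i * 10 + 11)) for i in range(10)]
five = coalesce(ten, 5)
still_five = coalesce(five, 7)            # "expand to 7" silently keeps 5
print(len(five), len(still_five))  # 5 5
```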
Spark RDD basic transformation operations -- subtract
9. The subtract operation: compute the difference of two RDDs of numbers.
scala> val rddData1 = sc.parallelize(Array(1,1,2))
rddData1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at parallelize at <console>...
Original, 2020-01-29 22:31:21
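subtract keeps every element of the first RDD that has no match in the second; unlike a pure set difference, duplicates of the surviving elements are preserved. Sketch:

```python
a = [1, 1, 2, 3]
b = [2, 3]
diff = [x for x in a if x not in set(b)]  # duplicates of survivors are kept
print(diff)  # [1, 1] -- both copies of 1 survive
```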
Spark RDD basic transformation operations -- intersection
8. The intersection operation: intersect an RDD containing the numbers 1, 1, 2 with an RDD containing the numbers 2, 2, 3.
scala> val rddData1 = sc.parallelize(Array(1,1,2))
rddData1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at parallelize at <con...
Original, 2020-01-29 22:26:26
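intersection returns the common elements and deduplicates its output, so intersecting 1,1,2 with 2,2,3 yields just 2. Sketch:

```python
a, b = [1, 1, 2], [2, 2, 3]
common = sorted(set(a) & set(b))          # intersection dedups its result
print(common)  # [2]
```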
Spark RDD basic transformation operations -- union
7. The union operation: compute the union of an RDD of the numbers 1 to 10 and an RDD of the numbers 1 to 20.
scala> val rddData1 = sc.parallelize(1 to 10)
rddData1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at parallelize at <console>:24
sca...
Original, 2020-01-29 22:19:55
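union concatenates the two RDDs without deduplicating, so the overlapping range 1 to 10 appears twice; follow it with distinct to get a mathematical union. Sketch:

```python
a = list(range(1, 11))
b = list(range(1, 21))
u = a + b                                 # union keeps duplicates
print(len(u), len(set(u)))  # 30 20
```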
Spark RDD basic transformation operations -- mapPartitionsWithIndex
6. The mapPartitionsWithIndex operation: for every student whose exam score is above 95, concatenate and output the exam ticket number, the score, and the partition the record lives in.
scala> val rddData = sc.parallelize(Array(("20180001",83),("201800002",97),("20180003",100),("20180004",95),("20180005",...
Original, 2020-01-29 22:14:39
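mapPartitionsWithIndex hands the user function both the partition index and an iterator over that partition, which is how the post can report which partition each high score lives in. A plain-Python sketch keeping the data from the snippet (the two-partition layout is illustrative):

```python
parts = [[("20180001", 83), ("201800002", 97)],
         [("20180003", 100), ("20180004", 95), ("20180005", 87)]]

out = [f"{sid}:{score}:partition-{i}"
       for i, part in enumerate(parts)    # the index arrives with the partition
       for sid, score in part
       if score > 95]
print(out)  # ['201800002:97:partition-0', '20180003:100:partition-1']
```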
Spark RDD basic transformation operations -- mapPartitions
5. The mapPartitions operation: for every student whose exam score is above 95, concatenate and output the exam ticket number and the score.
scala> val rddData = sc.parallelize(Array(("20180001",83),("201800002",97),("20180003",100),("20180004",95),("20180005",87)),2)
rddData: org...
Original, 2020-01-29 22:00:34
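mapPartitions calls the user function once per partition with an iterator, rather than once per element as map does, which lets per-partition setup (for example one database connection per partition) be paid once. Sketch on the same exam data:

```python
parts = [[("20180001", 83), ("201800002", 97)],
         [("20180003", 100), ("20180004", 95), ("20180005", 87)]]

def keep_high(it):                        # invoked once per partition
    return [f"{sid}:{score}" for sid, score in it if score > 95]

out = [x for p in parts for x in keep_high(iter(p))]
print(out)  # ['201800002:97', '20180003:100']
```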
Spark RDD basic transformation operations -- distinct
4. The distinct operation: deduplicate the user records in the RDD by name.
scala> val rddData = sc.parallelize(Array("Alice","Nick","Alice","Kotlin","Catalina","Catalina"),3)
rddData: org.apache.spark.rdd.RDD[String] = ParallelCollectionRD...
Original, 2020-01-29 13:49:38
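distinct deduplicates the RDD; note that it involves a shuffle, so Spark gives no guarantee about the order of the survivors. Sketch (sorted here only to make the result deterministic):

```python
names = ["Alice", "Nick", "Alice", "Kotlin", "Catalina", "Catalina"]
unique = sorted(set(names))               # Spark's distinct gives no order guarantee
print(unique)  # ['Alice', 'Catalina', 'Kotlin', 'Nick']
```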
Spark RDD basic transformation operations -- filter
3. The filter operation: collect all the primes among the natural numbers 1 to 100 into a new RDD.
scala> val rddData = sc.parallelize(1 to 100)
rddData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[7] at parallelize at <console>:39
scala> import scala....
Original, 2020-01-29 13:19:03
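filter keeps the elements for which the predicate returns true; the truncated snippet imports something from the Scala standard library, presumably for the primality test. A self-contained sketch with a simple trial-division predicate (the helper below is my own, not the post's):

```python
def is_prime(n):
    # trial division up to sqrt(n); 0 and 1 are not prime
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

primes = [n for n in range(1, 101) if is_prime(n)]
print(primes[:5], len(primes))  # [2, 3, 5, 7, 11] 25
```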
Spark RDD basic transformation operations -- flatMap
2. The flatMap operation
scala> val rddData = sc.parallelize(Array("one,two.three","four,five,six","seven,eight,nine,ten"))
rddData: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at...
Original, 2020-01-29 12:49:03
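flatMap maps each element to a collection and flattens the results into one RDD. The snippet's strings mix "," and "." as separators (note the "." in "one,two.three"), so splitting on both yields a single flat word list. Sketch:

```python
import re

lines = ["one,two.three", "four,five,six", "seven,eight,nine,ten"]
# flatMap = map each line to its words, then flatten
words = [w for line in lines for w in re.split(r"[,.]", line)]
print(words[:3], len(words))  # ['one', 'two', 'three'] 10
```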
Spark RDD basic transformation operations -- map
1. The map operation
scala> val rddData = sc.parallelize(1 to 10)
rddData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24
scala> val rddData2 = rddData.map(_ * 10)
rddData2: o...
Original, 2020-01-29 12:29:46
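map applies the given function to every element one-for-one; the snippet multiplies each of 1 to 10 by 10. A list-comprehension sketch of the same transformation:

```python
nums = list(range(1, 11))
times_ten = [n * 10 for n in nums]        # equivalent of rddData.map(_ * 10)
print(times_ten)  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```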