Spark Core Notes


Introduction to the Spark Framework

Spark Configuration

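Beyond the properties file, common settings can also be set programmatically through SparkConf. A minimal sketch (the property values below are illustrative, not recommendations):

```java
import org.apache.spark.SparkConf;

public class ConfDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("conf-demo")              // name shown in the web UI
                .setMaster("local[2]")                // two local worker threads
                .set("spark.executor.memory", "1g")   // illustrative value
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        System.out.println(conf.get("spark.executor.memory"));
    }
}
```

Settings passed on the spark-submit command line or in spark-defaults.conf take effect too; programmatic `set` calls have the highest precedence.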

Starting the Shell


The local[2] Launch Mode

local[2] runs Spark in a single JVM with two worker threads; local[*] would use one thread per logical core.

Spark Components


Overview


Spark Run Modes

Spark can run locally (local), on its own standalone cluster, or on a resource manager such as YARN or Mesos.


spark shell


Architecture and Run-Mode Analysis


RDD

Five Key Characteristics

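The five characteristics usually quoted here come from the comment at the top of Spark's RDD source: an RDD has (1) a list of partitions, (2) a function for computing each split, (3) a list of dependencies on other RDDs, (4) optionally a Partitioner for key-value RDDs, and (5) optionally a list of preferred locations for each split. A minimal Java sketch of the first three (the data and partition count are arbitrary):

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddPartitions {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("rdd-demo").setMaster("local[2]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Characteristic 1: an RDD is a list of partitions (here, 3 of them)
            JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6), 3);
            System.out.println(nums.getNumPartitions()); // 3

            // Characteristic 2: a compute function is applied to each split
            JavaRDD<Integer> doubled = nums.map(x -> x * 2);

            // Characteristic 3: doubled records a (narrow) dependency on nums
            System.out.println(doubled.collect());
        }
    }
}
```

getNumPartitions() reports 3 because parallelize was asked for 3 slices; the dependency list is what lets Spark recompute a lost partition from its parents.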

Run Modes and Configuration


Spark Programs

Submitting a Program

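A sketch of a minimal submission (the class name, jar name, and paths are placeholders to adjust for your project):

```shell
spark-submit \
  --class com.example.WordCount \
  --master local[2] \
  wordcount.jar /input/path /output/path
```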

Custom Directory and Run Mode


Running on YARN with Memory and Thread Settings

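A sketch of submitting to YARN while sizing executor memory and cores (all values illustrative; class and paths are placeholders):

```shell
spark-submit \
  --class com.example.WordCount \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1g \
  --executor-memory 2g \
  --executor-cores 2 \
  --num-executors 3 \
  wordcount.jar /input/path /output/path
```

In cluster deploy mode the driver itself runs inside a YARN container; in client mode it stays on the submitting machine.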

Spark Development Template

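The template boils down to: build a SparkConf, create the context, run the job logic, and stop the context. A hedged Java skeleton (the sample computation is only a placeholder):

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class AppTemplate {
    public static void main(String[] args) {
        // 1. Configuration: app name is required; master is usually
        //    left to spark-submit rather than hard-coded
        SparkConf conf = new SparkConf().setAppName("app-template");
        // 2. Entry point for RDD operations
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // 3. Business logic: read -> transform -> act
            long count = sc.parallelize(Arrays.asList(1, 2, 3)).count();
            System.out.println("count = " + count);
        } finally {
            // 4. Always release cluster resources
            sc.stop();
        }
    }
}
```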

The WordCount Program


Writing Results to MySQL

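A common pattern is foreachPartition plus plain JDBC, so one connection is opened per partition rather than per record. A sketch under stated assumptions: the JDBC URL, table `wordcount`, and credentials are illustrative, and mysql-connector-java must be on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SaveToMysql {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("save-to-mysql").setMaster("local[2]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, Integer> counts = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>("spark", 2), new Tuple2<>("hadoop", 1)));
            // One connection per partition, not per record
            counts.foreachPartition(iter -> {
                // URL, table, and credentials below are illustrative placeholders
                try (Connection conn = DriverManager.getConnection(
                        "jdbc:mysql://localhost:3306/test", "user", "password");
                     PreparedStatement ps = conn.prepareStatement(
                             "INSERT INTO wordcount(word, cnt) VALUES (?, ?)")) {
                    while (iter.hasNext()) {
                        Tuple2<String, Integer> t = iter.next();
                        ps.setString(1, t._1);
                        ps.setInt(2, t._2);
                        ps.addBatch();
                    }
                    ps.executeBatch(); // flush once per partition
                }
            });
        }
    }
}
```

Batching the inserts and executing once per partition keeps the number of round-trips to MySQL small.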

Grouping, Sorting, and TopKey

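The straightforward approach is groupByKey, then sorting each key's values and keeping the top K. A sketch with toy data (k = 2):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class GroupTopK {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("group-topk").setMaster("local[2]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, Integer> data = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>("a", 3), new Tuple2<>("a", 9), new Tuple2<>("a", 1),
                    new Tuple2<>("b", 7), new Tuple2<>("b", 2)));
            int k = 2;
            // groupByKey gathers all values of a key, then we sort and keep the top k
            JavaPairRDD<String, List<Integer>> topK = data.groupByKey()
                    .mapValues(vals -> {
                        List<Integer> list = new ArrayList<>();
                        vals.forEach(list::add);
                        list.sort(Comparator.reverseOrder()); // descending
                        return new ArrayList<>(list.subList(0, Math.min(k, list.size())));
                    });
            topK.collect().forEach(System.out::println);
        }
    }
}
```

Note the caveat this section's improvement addresses: groupByKey ships every value of a key across the shuffle before anything is discarded.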

Grouped Sort and TopKey with an Improved map and Local Aggregation

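The improvement is to aggregate locally before the shuffle. One way to sketch it is aggregateByKey with a bounded top-K accumulator: each partition combines its own records first, so at most K values per key cross the network (the data and K = 2 are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class AggregateTopK {
    static final int K = 2;

    // Keep the accumulator bounded at K elements, largest first
    static List<Integer> insertBounded(List<Integer> acc, int value) {
        acc.add(value);
        acc.sort(Comparator.reverseOrder());
        while (acc.size() > K) acc.remove(acc.size() - 1);
        return acc;
    }

    static List<Integer> mergeBounded(List<Integer> a, List<Integer> b) {
        for (int v : b) insertBounded(a, v);
        return a;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("agg-topk").setMaster("local[2]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaPairRDD<String, Integer> data = sc.parallelizePairs(Arrays.asList(
                    new Tuple2<>("a", 3), new Tuple2<>("a", 9), new Tuple2<>("a", 1),
                    new Tuple2<>("b", 7), new Tuple2<>("b", 2)));
            // aggregateByKey combines inside each partition first (map-side,
            // local aggregation), unlike groupByKey which shuffles everything
            JavaPairRDD<String, List<Integer>> topK = data.aggregateByKey(
                    new ArrayList<Integer>(),
                    AggregateTopK::insertBounded,
                    AggregateTopK::mergeBounded);
            topK.collect().forEach(System.out::println);
        }
    }
}
```

reduceByKey gets the same map-side-combine benefit for simple aggregations such as sums; aggregateByKey is the general form when the accumulator type differs from the value type.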

Job Submission and Driver Registration

This section walks through submitting a job and registering the Driver, using the Java WordCount as the example. First, the code for the job itself:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> lines = sc.textFile(args[0]);
        JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
        JavaPairRDD<String, Integer> pairs = words.mapToPair(word -> new Tuple2<>(word, 1));
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey((a, b) -> a + b);
        counts.saveAsTextFile(args[1]);
        sc.stop();
    }
}
```

To observe the Driver registering the application, attach a custom SparkListener to the underlying SparkContext:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerApplicationEnd;
import org.apache.spark.scheduler.SparkListenerApplicationStart;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Register the listener on the underlying SparkContext so it sees
        // the application start/end events
        sc.sc().addSparkListener(new MySparkListener());
        JavaRDD<String> lines = sc.textFile(args[0]);
        JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")).iterator());
        JavaPairRDD<String, Integer> pairs = words.mapToPair(word -> new Tuple2<>(word, 1));
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey((a, b) -> a + b);
        counts.saveAsTextFile(args[1]);
        sc.stop();
    }
}

class MySparkListener extends SparkListener {
    @Override
    public void onApplicationStart(SparkListenerApplicationStart applicationStart) {
        System.out.println("Application started: " + applicationStart.appName());
    }

    @Override
    public void onApplicationEnd(SparkListenerApplicationEnd applicationEnd) {
        System.out.println("Application ended: " + applicationEnd.time());
    }
}
```

With this code, the Driver is registered automatically when the job is submitted, and the custom listener reports the application lifecycle. Note that the Spark cluster must be running before submission, and the input and output paths must be passed in through the args array.