Spark

NigelChow

Article link: https://blog.csdn.net/NigelChow/article/details/53527076

Matei Zaharia -> http://people.csail.mit.edu/matei/#opensource

Spark RDD shell operations:
1. http://www.iteblog.com/archives/1040
2. http://book.51cto.com/art/201408/448477.htm

SparkNet:
https://github.com/amplab/SparkNet

YARN ResourceManager REST API (lists the applications on the cluster, including Spark jobs): http://10.101.73.94:8088/ws/v1/cluster/apps
Spark-GPU: http://iamtrask.github.io/2014/11/22/spark-gpu/

AMPLab: https://amplab.cs.berkeley.edu/
Michael I. Jordan: http://www.cs.berkeley.edu/~jordan/
arxiv: http://arxiv.org/

gitxiv: http://gitxiv.com/

1. Template, where $1 to $6 are the class and resource settings, $7 is the application jar, $8 is the local log file, and $9 is the application argument:
nohup /spark-1.4.1/bin/spark-submit --master yarn --class "$1" --driver-cores "$2" --driver-memory "$3" --num-executors "$4" --executor-cores "$5" --executor-memory "$6" "$7" "$9" >"$8" 2>&1 &

2. The same template filled in (the HDFS path is the argument passed to WordCount; stdout and stderr go to /log/test.log):
nohup /spark-1.4.1/bin/spark-submit --master yarn --class org.apache.spark.WordCount --driver-cores 2 --driver-memory 4G --num-executors 3 --executor-cores 2 --executor-memory 4G /jar/WordCount-jar-with-dependencies.jar hdfs://hdfscluster/user/test/hs_err_pid79943.log >/log/test.log 2>&1 &

3. A larger run on Spark 1.5.1:
/spark-1.5.1/bin/spark-submit --driver-memory 8G --class org.apache.spark.WordCount --master yarn --executor-cores 4 --executor-memory 4096m --num-executors 64 /jar/WordCount-jar-with-dependencies.jar hdfs://hdfscluster/user/test/hs_err_pid79943.log >/log/test.log 2>&1 &
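
The three commands above share one shape, so the positional template in item 1 can be sketched as a small Python helper that assembles the `spark-submit` argument list from named parameters. This is only an illustration: the function name `build_submit_args` and its parameter names are made up here, while the flags and paths are the ones already used in the commands above.

```python
import shlex


def build_submit_args(spark_home, main_class, jar, app_args,
                      driver_memory, num_executors,
                      executor_cores, executor_memory,
                      driver_cores=None):
    """Assemble a spark-submit argument list for a YARN job (hypothetical helper)."""
    args = [f"{spark_home}/bin/spark-submit", "--master", "yarn",
            "--class", main_class]
    # --driver-cores is optional; command 3 above omits it.
    if driver_cores is not None:
        args += ["--driver-cores", str(driver_cores)]
    args += ["--driver-memory", driver_memory,
             "--num-executors", str(num_executors),
             "--executor-cores", str(executor_cores),
             "--executor-memory", executor_memory,
             jar]
    # Application arguments (here: the input path) come after the jar.
    args += list(app_args)
    return args


# Same values as command 2 above.
cmd = build_submit_args(
    spark_home="/spark-1.4.1",
    main_class="org.apache.spark.WordCount",
    jar="/jar/WordCount-jar-with-dependencies.jar",
    app_args=["hdfs://hdfscluster/user/test/hs_err_pid79943.log"],
    driver_memory="4G", num_executors=3,
    executor_cores=2, executor_memory="4G", driver_cores=2)

# Print a copy-pasteable command line; nohup and output redirection stay in the shell.
print(shlex.join(cmd))
```

Keeping the arguments as a list and quoting them only when printing avoids the word-splitting problems that unquoted `$1` ... `$9` can cause in a shell template.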

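The Spark UI address printed in the Spark log (for example http://30.6.87.116:4040) shows the run-time state of the application: jobs, stages, tasks, and executors.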
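For Spark Streaming in local mode: setting master to local[5] runs Spark with 5 worker threads in a single JVM. Each receiver occupies 1 thread for as long as the application runs, so with 1 receiver the other 4 threads are left for processing the received data. The n in local[n] must therefore be larger than the number of receivers, otherwise no thread is left to process the batches.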
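
As a rough check of that thread budget, the sketch below parses a `local[n]` master string and subtracts the threads pinned by receivers, assuming each receiver holds one thread for its whole lifetime. `processing_threads` is a hypothetical helper written for this note, not part of Spark.

```python
import re


def processing_threads(master, num_receivers):
    """Threads left for batch processing under a local[n] master (hypothetical helper)."""
    match = re.fullmatch(r"local\[(\d+)\]", master)
    if match is None:
        raise ValueError(f"expected local[n], got {master!r}")
    total = int(match.group(1))
    free = total - num_receivers
    if free < 1:
        # Every thread is held by a receiver, so no batch could be processed.
        raise ValueError("n must be greater than the number of receivers")
    return free


print(processing_threads("local[5]", 1))  # 4
```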
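For Spark Streaming on YARN: if 20 executors are allocated across 10 machines and YARN happens to spread them evenly, each machine runs 2 executor processes, and each executor process has its own resources, for example 2 cores and 4 GB of memory. On an executor that hosts a receiver, 1 core is held by the receiver (a long-running task that never stops and keeps receiving data) and the other core is used for computation (a computing task releases its core when it finishes). For such an executor: 1 executor process = 1 receiver thread + 1 or more computing threads.

A batch is a Spark Streaming concept. Each DStream produces one RDD per batch interval, so a batch can involve several RDDs when the application has several DStreams or uses windowed operations that span multiple intervals; the receive rate affects how much data each RDD holds rather than how many RDDs there are. One batch may correspond to several Job Ids in the Spark UI (Job Ids keep increasing), because each Job Id corresponds to exactly 1 action.

When a job contains a shuffle (for example groupBy or reduceByKey), it is split into multiple stages. Stages are split only at shuffle boundaries, and each shuffle splits the work into two stages, one before it and one after it. A stage consists of multiple tasks. 1 task processes 1 partition, and for the RDD built from received data 1 partition corresponds to 1 block. Each task runs on 1 of the cores held by an executor process.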
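
The counting rules in these notes can be written down as a few lines of arithmetic. This is a back-of-the-envelope sketch, not Spark code: the function names are invented for illustration, one receiver per executor is assumed as in the example above, each task is assumed to use one core, and the 200 ms default is Spark's documented default for `spark.streaming.blockInterval`, which may be overridden in a given deployment.

```python
def stages_in_job(num_shuffles):
    """Each shuffle boundary ends one stage and starts the next."""
    return num_shuffles + 1


def concurrent_task_slots(num_executors, cores_per_executor, num_receivers):
    """Cores left for computing tasks once each receiver pins one core."""
    return num_executors * cores_per_executor - num_receivers


def blocks_per_batch(batch_interval_ms, block_interval_ms=200):
    """Blocks (and so first-stage partitions/tasks) one receiver yields per batch."""
    return batch_interval_ms // block_interval_ms


# A job with one reduceByKey has two stages.
print(stages_in_job(1))                   # 2
# 20 executors x 2 cores, one receiver on each executor.
print(concurrent_task_slots(20, 2, 20))   # 20
# A 2-second batch with the default 200 ms block interval.
print(blocks_per_batch(2000))             # 10
```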
