Installing and deploying Spark on Linux:
https://www.cnblogs.com/tijun/p/7561718.html
https://blog.csdn.net/heartsdance/article/details/119751588
https://blog.csdn.net/weixin_43854358/article/details/90666193
(this one is good, includes a worked case)
How to troubleshoot startup failures:
https://blog.csdn.net/C_time/article/details/100023332
SparkPi test:
bin/spark-submit --master spark://10.153.110.18:8077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.4.7.jar 100 (the trailing 100 is the number of slices and can be changed)
pyspark SparkConf explained:
https://blog.csdn.net/weixin_40161254/article/details/87916880
Spark install location on b9b:
/home/disk1/software/spark-2.4.7-bin-hadoop2.7/
Spark log location on b9b:
/home/disk1/software/spark-2.4.7-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-yq01-aip-aip07a73b9b.yq01.baidu.com.out
Spark web UI on b9b (task logs etc. can be viewed here):
http://yq01-aip-aip07a73b9b.yq01.baidu.com:8078/
Troubleshooting locally submitted Spark jobs:
https://www.freesion.com/article/7551171582/
Spark statistics:
RDD statistics:
https://blog.csdn.net/liangzelei/article/details/80573015
DataFrame statistics:
https://cloud.tencent.com/developer/article/1031061
https://blog.csdn.net/suzyu12345/article/details/79673557
https://zhuanlan.zhihu.com/p/237637848
Fix for insufficient cache memory: WARN MemoryStore: Not enough space to cache rdd
https://www.playpi.org/2020012201.html
On whether cached data gets evicted:
https://www.jianshu.com/p/761fa2ee868e
Submitting pyspark jobs: spark-submit
www.jianshu.com/p/df0a189ff28b
https://www.cnblogs.com/piperck/p/10121097.html
https://zhuanlan.zhihu.com/p/101740397
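A hedged spark-submit command template for a pyspark script against a standalone master; the host, port, file names, and resource values are all placeholders, not settings from these notes:

```shell
# Command template only -- every value here is a placeholder.
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --executor-memory 2g \
  --total-executor-cores 4 \
  --py-files deps.zip \
  my_job.py arg1 arg2
```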
PySpark with YARN mode:
https://www.cnblogs.com/yanshw/p/12083488.html
On setting the master inside the Python script (to avoid the submit script; this cannot be done for cluster mode):
https://stackoverflow.com/questions/31327275/pyspark-on-yarn-cluster-mode
pyspark with yarn-cluster:
spark-submit parameter meanings:
https://www.malaoshi.top/show_1IXnhwPEDg0.html
https://xujiyou.work/%E5%A4%A7%E6%95%B0%E6%8D%AE/Spark/spark-submit%E8%AF%A6%E8%A7%A3.html
Logs on b9b for debugging Spark Python task errors (check stdout):
/home/work/hadoop-2.10.0/logs/userlogs/application_1629377733083_3679/
or
/home/work/hadoop-2.10.0/logs/userlogs
Spark custom partitioning:
https://blog.csdn.net/weixin_45102492/article/details/104726795
Spark actions, transformations, etc.:
https://spark.apache.org/docs/latest/rdd-programming-guide.html#actions
Running BMR on the in-house b9b machine:
./bin/run-example --master yarn --deploy-mode cluster --files /home/aicu-tob/software/baidu_spark_emr/output_afs_agent/conf/yarn-site.xml SparkPi
./bin/run-example --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.4.3.2-baidu.jar
yarn-site.xml configuration:
https://ifeve.com/spark-yarn-run-spark/
https://blog.csdn.net/Jerry_991/article/details/85042305
Ticket link:
https://console.cloud.baidu-int.com/ticket/new/?productId=217
Assorted issues with the in-house Spark:
spark-env.sh tutorial:
https://blog.csdn.net/u010199356/article/details/89056304
Spark join tutorial:
https://www.sohu.com/a/427258627_315839
https://zhuanlan.zhihu.com/p/317226768