Command to open a socket (run on hadoop1):
nc -lk 9999
Command to connect to the socket:
nc hadoop1 9999
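The nc pair above is the usual test feed for a Spark Streaming word count. A minimal sketch of the consuming side, assuming the classic DStream API; the app name and the 5-second batch interval are my choices, not from the notes:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# local[2]: one thread for the socket receiver, one for processing
sc = SparkContext("local[2]", "SocketWordCount")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Consume the lines typed into the `nc -lk 9999` terminal on hadoop1
lines = ssc.socketTextStream("hadoop1", 9999)
counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print each batch's (word, count) pairs

ssc.start()
ssc.awaitTermination()

Words typed into the nc -lk terminal should show up as running counts in the console once per batch.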
Commands for submitting/running Python on Spark:
/root/apps/spark-yarn/bin/spark-submit /root/apps/code/hello.py
bin/pyspark --master local
/root/apps/spark-yarn/bin/spark-submit /root/apps/code/HyperLPR/demo1.py
/root/apps/spark-yarn/bin/spark-submit --master local[2] --conf spark.pyspark.python=/usr/bin/python3.6 --conf spark.pyspark.driver.python=/usr/bin/python3.6 /root/apps/code/HyperLPR/demo1.py
(Note: spark-submit options must come before the .py file and use double hyphens; anything after the file is passed to the script as arguments.)
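The notes don't show what hello.py contains; a hypothetical minimal script that the first spark-submit command above could run end to end:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hello").getOrCreate()

# A trivial distributed computation, just to prove the submission works
total = spark.sparkContext.parallelize(range(100)).sum()
print("sum of 0..99 =", total)  # 4950

spark.stop()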
Python3 site-packages location on hadoop1 (appended to PYTHONPATH):
export PYTHONPATH=$PYTHONPATH:/root/apps/python3.6/lib/python3.6/site-packages
Commands to change the default python version:
rm -rf /usr/bin/python
ln -s /usr/bin/python3.6 /usr/bin/python
Adding and editing environment variables:
vim ~/.bashrc
vim ~/.bash_profile
source ~/.bashrc
hadoop1's python version is 3.6.8, located at /usr/bin/python3.6
Start the pyspark shell:
bin/pyspark
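A quick sanity check to run inside the pyspark shell, verifying the driver picked up the interpreter and site-packages configured above (the expected values are assumptions based on this setup):

import sys
print(sys.executable)  # expect /usr/bin/python3.6
print([p for p in sys.path if "site-packages" in p])  # should include the exported dir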
Install Kafka:
The Kafka download must match the Scala version; the number after the underscore in the artifact name (e.g. kafka_2.11-*) is the Scala version and should match Spark's Scala 2.11 build.
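Once Kafka is running, a sketch of reading a topic from PySpark via Structured Streaming; the broker address (hadoop1:9092) and topic name ("test") are assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("KafkaRead").getOrCreate()

# Subscribe to a Kafka topic; Kafka delivers bytes, so cast to strings
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "hadoop1:9092")
      .option("subscribe", "test")
      .load())
msgs = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

query = msgs.writeStream.outputMode("append").format("console").start()
query.awaitTermination()

Submit with the Kafka connector matching the Spark/Scala build, e.g. --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.1 — the same Scala-version matching rule as above.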
Web UI ports: 8080 (Spark master), 8088 (YARN ResourceManager), 18080 (Spark history server)
Startup procedure:
1. Start Hadoop: start-all.sh
2. Start Spark: sbin/start-all.sh (if the Workers fail to start, add export JAVA_HOME=/usr/local/jdk1.7.0_80 to sbin/spark-config.sh)
3. Start the Spark history server: sbin/start-history-server.sh
4. Start the Hadoop/YARN job history server: mr-jobhistory-daemon.sh start historyserver
5. Submit the program to run, e.g. the bundled SparkPi example (a PySpark equivalent follows the command below):
bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2G \
  --executor-cores 2 \
  ./examples/jars/spark-examples_2.11-2.1.1.jar \
  10
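The jar above is the Scala SparkPi; a PySpark equivalent (essentially the stock pi example shipped with Spark) that could be submitted the same way as a .py file, where the trailing argument is the partition count:

import sys
from operator import add
from random import random

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PythonPi").getOrCreate()

partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 10
n = 100000 * partitions

def inside(_):
    # Throw a dart at the unit square; count hits inside the unit circle
    x, y = random() * 2 - 1, random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0

count = spark.sparkContext.parallelize(range(n), partitions).map(inside).reduce(add)
print("Pi is roughly %f" % (4.0 * count / n))
spark.stop()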
Open question: why does the job only work when the driver runs on hadoop1, and not on the other nodes?
1. Add the job history configuration to mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>
2. If left unset, mapreduce.jobhistory.webapp.address and mapreduce.jobhistory.address default to 0.0.0.0:19888 and 0.0.0.0:10020 respectively.
3. Copy the modified configuration file to the other machines in the cluster (skip this step for a single-node Hadoop installation).
4. Restart the cluster's HDFS and YARN services.
../ means go up one directory level
Path format for reading a file from the Linux filesystem: /root/apps/code/HyperLPR/data/test2.mp4
Path format for reading a file from HDFS: hdfs://<namenode-host>:<port>/path/to/file (the host and port must match fs.defaultFS in core-site.xml)
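A sketch showing both path schemes as used from PySpark's textFile; the file names and the HDFS port (9000) are assumptions that must match the actual files and fs.defaultFS:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PathSchemes").getOrCreate()
sc = spark.sparkContext

# Local filesystem: file:// scheme (the path must exist on every executor node)
local_rdd = sc.textFile("file:///root/apps/code/words.txt")

# HDFS: hdfs://<namenode>:<port>/ scheme
hdfs_rdd = sc.textFile("hdfs://hadoop1:9000/data/words.txt")

print(local_rdd.count(), hdfs_rdd.count())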