--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -verbose:gc -XX:+UseG1GC -Xloggc:gc.log" \ --conf 'spark.driver.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -verbose:gc -XX:+UseG1GC -Xloggc:gc.log' \
1. Exception issue log:
Solution: download the matching .jar from http://search.maven.org, e.g. spark-streaming-kafka-0-8-assembly_2.11-2.4.5.jar,
and place it in the jars directory under site-packages. In my setup that path is /usr/lib/python2.7/site-packages/pyspark/jars, with Python installed at /usr/bin/python2.7.
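To confirm where the jar should go on a different machine, the pyspark jars directory can be located from the interpreter itself. A small sketch, assuming the standard pip install layout (site-packages/pyspark/jars):

```python
import os
import sysconfig

# Locate the current interpreter's site-packages directory,
# then the pyspark/jars folder inside it (standard pip layout).
site_packages = sysconfig.get_paths()["purelib"]
jar_dest = os.path.join(site_packages, "pyspark", "jars")
print(jar_dest)  # e.g. /usr/lib/python2.7/site-packages/pyspark/jars
```

Copy the downloaded assembly jar into the printed directory and restart the PySpark application.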
2. Permission denied on HDFS:
org.apache.hadoop.security.AccessControlException: Permission denied: user=angela, access=WRITE,
inode="/user/angela/checkpoint/sparkstreaming_windows_31229":hdfs:hdfs:drwxr-xr-x
Solution: ask the colleague who administers the cluster to create a /user/angela directory and grant it the appropriate permissions.
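Assuming the administrator has HDFS superuser access, the fix comes down to the standard HDFS shell commands (the user and group names below follow the error message; adjust to your cluster):

```shell
# Run as the hdfs superuser: create the user's home directory
# and hand ownership to the user named in the AccessControlException.
hdfs dfs -mkdir -p /user/angela
hdfs dfs -chown angela:angela /user/angela
hdfs dfs -chmod 755 /user/angela
```

After this, the streaming job can create its checkpoint directory under /user/angela/checkpoint.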
3. Code error:
20/04/01 14:26:30 ERROR scheduler.JobScheduler: Error generating jobs for time 1585722390000 ms
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input Pattern hdfs://CQBAK/user/etl/SHHadoopStream/BulkZip/{} matches 0 files
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
at org.apache.spark.input.StreamFile
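The failing input pattern ends in `{}`, an empty brace alternation, which Hadoop's FileInputFormat treats as matching zero files. A plausible cause (an assumption, not taken from the original code) is joining an empty file list into the glob, so the empty case should be guarded before the path reaches Spark:

```python
# Hypothetical reconstruction of how an empty "{...}" glob is built:
# joining zero file names yields the "{}" seen in the error message.
base = "hdfs://CQBAK/user/etl/SHHadoopStream/BulkZip"
names = []  # no new zip files discovered for this batch
pattern = "%s/{%s}" % (base, ",".join(names))
print(pattern)  # hdfs://CQBAK/user/etl/SHHadoopStream/BulkZip/{}

# Guard: only hand the pattern to Spark when it can match something.
if names:
    pass  # e.g. read the files with the pattern here
```

With the guard in place, batches with no new files are skipped instead of raising InvalidInputException during job generation.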