I: Spark
1. How to debug:
1. Run in local mode first, then submit to the server; this filters out simple Python syntax errors early.
2. A local run can still pull data down from HDFS, but it cannot perform HDFS-side writes such as saveAsTextFile.
3. Use yarn-client mode for debugging; it prints detailed error messages to the driver console, whereas yarn-cluster does not. For example:
Traceback (most recent call last):
File "/home/work/work/test/stability_analysis.py", line 473, in <module>
stability_analysis()
File "/home/work/work/test/stability_analysis.py", line 464, in stability_analysis
number.reduced_then_store_into_file()
File "/home/work/work/test/stability_analysis.py", line 248, in reduced_then_store_into_file
rdd_formatter.saveAsTextFile(NumberAbnormalReboot.number_output)
File "/home/work/tars/infra-client-1.1/bin/current/c3prc-hadoop-spark-pack/python/lib/pyspark.zip/pyspark/rdd.py", line 1506, in saveAsTextFile
File "/home/work/tars/infra-client-1.1/bin/current/c3prc-hadoop-spark-pack/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/home/work/tars/infra-client-1.1/bin/current/c3prc-hadoop-spark-pack/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o152.saveAsTextFile.
: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://c3prc-hadoop/user/s_miui_whetstone/Statistics/Stability/number already exists
4. Delete any existing output directory on the cluster before writing, so that the path does not already exist.
If you need to write results in several passes, union the RDDs first and write them to the file in one go.
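Point 4 above can be sketched as a small helper. This is a sketch, not code from the original job: the names `rdds`, `output_path`, and the function itself are illustrative, and any RDD-like object exposing `union()` and `saveAsTextFile()` will work.

```python
from functools import reduce

def save_as_single_output(rdds, output_path):
    """Union all partial RDDs and write them with one saveAsTextFile call.

    saveAsTextFile raises FileAlreadyExistsException if the output
    directory exists, so write once -- after removing any stale output
    (e.g. `hadoop fs -rm -r <path>`) -- instead of writing several times.
    """
    combined = reduce(lambda left, right: left.union(right), rdds)
    combined.saveAsTextFile(output_path)
    return combined
```

With a real SparkContext you would pass in the RDDs produced by each analysis pass and a fresh HDFS path.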
II: MongoDB
1. How to start MongoDB:
sudo systemctl start mongodb
Do not launch MongoDB directly; start it as a service, otherwise you will hit a pile of write-permission problems.
2. Common tools:
Client tools: mongo for interactive queries, etc.; mongostat for watching database write throughput, etc.
See help for the other common commands.
3. Slow MongoDB writes:
3.1 MongoClient("mongodb://" + ip + ":27017", maxPoolSize=200)
Pass the maxPoolSize argument when creating the MongoClient.
3.2 Use an index
self.collection.create_index([('model', DESCENDING), ('version', DESCENDING), ('bn', DESCENDING), ('imei', DESCENDING)],
name=index_name, unique=True, background=True)
3.3 Use bulk writes
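A minimal sketch of the bulk-write idea in 3.3, to pair with the maxPoolSize client from 3.1 and the compound index from 3.2. The function and `batch_size` are illustrative, not from the original code; the batching logic is plain Python and works with anything exposing pymongo's `insert_many`.

```python
def bulk_insert(collection, docs, batch_size=1000):
    """Write docs in fixed-size batches instead of one insert per document.

    One round trip per batch is the main win over per-document inserts.
    """
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            collection.insert_many(batch)  # one round trip for the batch
            batch = []
    if batch:
        collection.insert_many(batch)      # flush the remainder
```

With pymongo this would be called as `bulk_insert(client.db.collection, records)`.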
4. Debugging MongoDB:
import traceback
from pymongo import errors

try:
    bulk.execute()
except errors.BulkWriteError as bwe:
    print(bwe.details)             # per-operation write errors
    print(traceback.format_exc())
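When bwe.details gets large, it helps to pull out just the failing operations. The details dict follows MongoDB's bulk-write result shape (a writeErrors list whose entries carry index, code, and errmsg); the helper name below is ours, not pymongo's.

```python
def summarize_bulk_errors(details):
    """Return one short line per failed operation in a BulkWriteError."""
    return [
        f"op {err.get('index')} failed (code {err.get('code')}): {err.get('errmsg')}"
        for err in details.get("writeErrors", [])
    ]
```

In the except block above you could print `summarize_bulk_errors(bwe.details)` instead of the whole dict.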