Error:
Failed with exception java.io.IOException:java.io.IOException: Unable to calculate input splits: not authorized on loan to execute command { splitVector: "loan.operatorMongoModel", keyPattern: { _id: 1 }, min: {}, max: {}, force: false, maxChunkSize: 8 }
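The message shows that this is a permission problem with splitVector. To split an unsharded collection, Spark (through the MongoDB Hadoop connector) has to run the splitVector command, which the read role does not cover; by default only administrative roles include it. mongo.input.split.create_input_splits defaults to true, so the connector divides the collection into multiple InputSplits so that Hadoop can process them in parallel. In other words, Hadoop assigns one InputSplit to each mapper. The split size is passed to splitVector as maxChunkSize (8 in the command above).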
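To make the role of splitVector more concrete, here is a minimal sketch of how a list of split keys could be turned into key ranges, one per InputSplit. It is a simplified model and not the connector's actual code; the function name buildSplitRanges and the sample keys are hypothetical.

```javascript
// Simplified model: turn the split keys returned by splitVector into
// half-open _id ranges, one per InputSplit. Not the connector's real code.
function buildSplitRanges(splitKeys) {
  const ranges = [];
  let lower = null; // null stands for "no lower bound" (start of the collection)
  for (const key of splitKeys) {
    ranges.push({ min: lower, max: key });
    lower = key;
  }
  // The final range runs from the last split key to the end of the collection.
  ranges.push({ min: lower, max: null });
  return ranges;
}

// Hypothetical split keys; real ones would be _id values from splitVector.
const ranges = buildSplitRanges([100, 200, 300]);
console.log(ranges.length); // 4
```

With n split keys this yields n + 1 ranges, so with no split keys the whole collection becomes a single range.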
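Solution: create a custom role that grants only the splitVector action on the collection, then assign it to the user together with read. Run createRole in the loan database, because the role is referenced below with db: "loan". Note that db.updateUser replaces the user's entire roles array, so list every role the user should keep.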
db.createRole({role: "hadoopSplitVector",privileges: [{resource: {db: "loan",collection: "operatorMongoModel"},actions: ["splitVector"]}],roles:[]})
db.updateUser("mintq",{roles: [{role:"read",db:"loan"},{role:"hadoopSplitVector", db:"loan"}]})
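If several collections need the same treatment, the role document can be generated instead of typed by hand. The following is only a sketch: the helper name splitVectorRole is hypothetical, and it builds the same kind of document that is passed to db.createRole above.

```javascript
// Build the argument for db.createRole(): a role whose only privileges are
// the splitVector action on the listed collections of one database.
function splitVectorRole(dbName, collectionNames) {
  return {
    role: "hadoopSplitVector",
    privileges: collectionNames.map((name) => ({
      resource: { db: dbName, collection: name },
      actions: ["splitVector"]
    })),
    roles: [] // no inherited roles
  };
}

const roleDoc = splitVectorRole("loan", ["operatorMongoModel"]);
console.log(JSON.stringify(roleDoc, null, 2));
// In the mongo shell, while using the loan database:
//   db.createRole(roleDoc)
//   db.getRole("hadoopSplitVector", { showPrivileges: true })  // inspect the result
```

Calling db.getRole with showPrivileges: true afterwards is a quick way to confirm that the role carries the splitVector action before rerunning the job.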