pyspark:
AttributeError: 'NoneType' object has no attribute 'setCallSite'
Damn, it's a PySpark bug. Workaround:
from pyspark.sql.functions import col

print("Approximately joining on distance smaller than 0.6:")
# Self-join of imsi_proc_df; the third argument (1e6) is the distance threshold.
distance_min = model.approxSimilarityJoin(imsi_proc_df, imsi_proc_df, 1e6, distCol="JaccardDistance") \
    .select(col("datasetA.id").alias("idA"),
            col("datasetB.id").alias("idB"),
            col("JaccardDistance"))
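
The snippet above only shows the join whose result triggers the error. One workaround commonly reported for this AttributeError is to point the returned DataFrame back at the live session before calling an action on it. Below is a minimal sketch of that idea, not something this note confirms. It assumes `spark` is the active SparkSession, the helper name `rebind_session` is hypothetical, and it relies on private PySpark attributes (`_sc`, `_jsparkSession`, `sql_ctx`) that may differ or disappear between versions.

```python
def rebind_session(df, spark):
    """Hypothetical helper: re-attach a DataFrame to the live SparkSession.

    Uses private PySpark attributes, so treat it as a version-dependent
    workaround rather than a supported API.
    """
    # Re-attach the JVM-side session handle reachable through the
    # DataFrame's SQL context.
    df.sql_ctx.sparkSession._jsparkSession = spark._jsparkSession
    # Replace the SparkContext reference held by the DataFrame with the
    # one owned by the live session.
    df._sc = spark._sc
    return df
```

Usage would be something like `distance_min = rebind_session(distance_min, spark)` followed by `distance_min.show()`. If the error persists, it may be worth checking whether the SparkContext was stopped earlier in the same process.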