ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling
Resolutions:
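1. Increase the sampling ratio so that Spark infers the types from a sample of the whole RDD instead of only the first 100 rows, e.g.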
sqlContext.createDataFrame(rdd, samplingRatio=0.2)
2. Give Spark an explicit schema so that it does not have to infer the types at all, e.g.
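from pyspark.sql.types import StructType, StructField, StringType, IntegerType
# The third argument of StructField marks the column as nullable.
schema = StructType([
    StructField("column_1", StringType(), True),
    StructField("column_2", IntegerType(), True)
])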
df = sqlContext.createDataFrame(rdd, schema=schema)
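
To decide which columns need an explicit type, it can help to check which ones are empty at the start of the data. This error typically shows up when a column holds only None in the rows Spark inspects. The snippet below is a minimal pure-Python sketch of that check, not part of the Spark API: the helper name undetermined_columns and the sample rows are made up for illustration, and the rows are assumed to be plain tuples.

```python
def undetermined_columns(rows, column_names, limit=100):
    """Return the names of columns that are None in every one of the first `limit` rows."""
    head = rows[:limit]
    return [
        name
        for index, name in enumerate(column_names)
        if all(row[index] is None for row in head)
    ]


# Hypothetical sample data: column_2 is empty for the first 100 rows.
rows = [("a", None)] * 100 + [("b", 7)]
columns = ["column_1", "column_2"]

print(undetermined_columns(rows, columns))  # ['column_2']
```

With a real RDD of tuples, the first argument could be rdd.take(100). Any column the helper reports is a candidate for an explicit StructField in the schema.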