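A few points of interest:
1) You don't need a DataFrame to load your JSON schema. The schema is loaded and processed on the driver, so there is no reason to pay the overhead of distributing that work.
2) I parse the JSON into an Array of JColumn objects, turn each one into a StructField, and pass the resulting list to StructType to build the schema dynamically.
3) inferSchema should be false, because we define the schema explicitly.
4) I'm assuming your database table uses "null" to represent empty values.
5) To adjust the type mapping, modify typeMapping.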
import org.json4s._
import org.json4s.native.JsonMethods
import org.apache.spark.sql.types._
import scala.collection.mutable.ListBuffer
case class JColumn(trim: Boolean, name: String, nullable: Boolean, id: Option[String], position: BigInt, table: String, _type: String, primaryKey: Boolean)
val path = """your_path\schema.json"""
val input = scala.io.Source.fromFile(path)
val json = JsonMethods.parse(input.reader())
// Maps the "_type" names used in schema.json to Spark SQL data types
val typeMapping = Map(
  "double"  -> DoubleType,
  "integer" -> IntegerType,
  "string"  -> StringType,
  "date"    -> DateType,
  "bool"    -> BooleanType)
// ListBuffer is mutable, so the reference itself can be a val
val rddSchema = ListBuffer[StructField]()
implicit val formats: Formats = DefaultFormats
val schema = json.extract[Array[JColumn]]
//schema.foreach(c => println(s"name:${c.name} type:${c._type} isnullable:${c.nullable}"))
// Turn each JSON column definition into a Spark StructField
schema.foreach { c =>
  rddSchema += StructField(c.name, typeMapping(c._type), c.nullable, Metadata.empty)
}
// .csv(...) already selects the built-in CSV source, so no separate .format(...) call is needed
val in_emp = spark.read
  .schema(StructType(rddSchema.toList))
  .option("inferSchema", "false")
  .option("dateFormat", "yyyy.MM.dd")
  .option("header", "false")
  .option("delimiter", ",")
  .option("nullValue", "null")
  .option("treatEmptyValuesAsNulls", "true")
  .csv("""your_path\employee.csv""")
in_emp.printSchema()
in_emp.show()
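If you would rather avoid the mutable ListBuffer, the same schema can be built in a single expression. The following is only a minimal sketch under a few assumptions of mine: ColumnDef is a hypothetical, trimmed-down stand-in for JColumn, the columns are ordered by their position field, and an unknown type name falls back to StringType instead of throwing (the code above would throw a NoSuchElementException in that case). None of these choices come from the original question.

```scala
import org.apache.spark.sql.types._

// Hypothetical trimmed-down version of JColumn: only the fields this sketch needs.
final case class ColumnDef(name: String, position: Int, _type: String, nullable: Boolean)

object SchemaBuilder {
  // Same type names as typeMapping above; extend this map to support more types.
  val typeMapping: Map[String, DataType] = Map(
    "double"  -> DoubleType,
    "integer" -> IntegerType,
    "string"  -> StringType,
    "date"    -> DateType,
    "bool"    -> BooleanType)

  // Assumption: columns are sorted by position, and unknown type names
  // fall back to StringType rather than raising an exception.
  def toStructType(columns: Seq[ColumnDef]): StructType =
    StructType(columns.sortBy(_.position).map { c =>
      StructField(c.name, typeMapping.getOrElse(c._type, StringType), c.nullable)
    })
}
```

You would then pass SchemaBuilder.toStructType(...) to .schema(...) in place of StructType(rddSchema.toList). Whether a silent fallback to StringType is better than failing fast depends on how much you trust the schema file.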
I tested with the following schema:
[
  {
    "trim": true,
    "name": "id",
    "nullable": true,
    "id": null,
    "position": 0,
    "table": "employee",
    "_type": "integer",
    "primaryKey": true
  },
  {
    "trim": true,
    "name": "salary",
    "nullable": true,
    "id": null,
    "position": 1,
    "table": "employee",
    "_type": "double",
    "primaryKey": false
  },
  {
    "trim": true,
    "name": "dob",
    "nullable": true,
    "id": null,
    "position": 2,
    "table": "employee",
    "_type": "date",
    "primaryKey": false
  },
  {
    "trim": true,
    "name": "department",
    "nullable": true,
    "id": null,
    "position": 3,
    "table": "employee",
    "_type": "string",
    "primaryKey": false
  }
]
and the following data (employee.csv):
1211,3500.0,null,marketing
1212,3000.0,2016.12.08,IT
1213,4000.0,2017.10.20,HR
1214,3000.0,2017.10.20,finance
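To make it clearer how the dateFormat and nullValue options relate to the dob column in these rows, here is a small plain-Scala sketch that uses only the standard library. It is an illustration, not Spark's actual parsing code: CsvCellSketch and parseDate are names I made up, and returning None for a malformed date is my own simplification (in Spark the result for malformed values depends on the reader's parse mode).

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object CsvCellSketch {
  // Same pattern as the dateFormat option passed to the reader.
  private val fmt = DateTimeFormatter.ofPattern("yyyy.MM.dd")

  // A cell equal to the nullValue marker ("null" here) becomes None;
  // any other cell is parsed with the date pattern.
  // Assumption: cells that do not match the pattern also yield None.
  def parseDate(cell: String, nullValue: String = "null"): Option[LocalDate] =
    if (cell == nullValue) None
    else Try(LocalDate.parse(cell, fmt)).toOption
}
```

For the first row, CsvCellSketch.parseDate("null") gives None, which corresponds to the null dob you should see for employee 1211; for the second row, CsvCellSketch.parseDate("2016.12.08") gives a date of 8 December 2016.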