Problem 1: Spark's saveAsTable cannot write to a Parquet table that was created directly from the command line
1. Table creation statement
CREATE TABLE parquet_test (
name string,
sex string,
age int
)
STORED AS PARQUET;
2. Inspect the table structure
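One way to inspect the table from Spark itself is sketched below. It assumes a Spark build with Hive support and that `parquet_test` lives in the current database; the object name and the `describeSql` helper are hypothetical, introduced only for illustration.

```scala
import org.apache.spark.sql.SparkSession

object DescribeTableSketch {
  // Hypothetical helper: builds the statement so the table name is easy to swap.
  def describeSql(table: String): String = s"DESCRIBE FORMATTED $table"

  def main(args: Array[String]): Unit = {
    // Assumes the cluster's Hive metastore is reachable from this session.
    val spark = SparkSession.builder()
      .appName("describe-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Print every row without truncating long values such as SerDe class names.
    spark.sql(describeSql("parquet_test")).show(100, truncate = false)

    spark.stop()
  }
}
```

Depending on the Spark version, the raw `spark.sql.sources.*` table properties may only be visible from the Hive side (for example with `SHOW TBLPROPERTIES parquet_test;` in the Hive CLI), so it can be worth checking both.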
3. Save directly from code
//Main save code
sparksession.createDataFrame(rdd1).write.mode("append").saveAsTable("parquet_test")
//Spark's default format is Parquet, so adding format("parquet") or leaving it out makes no difference
//sparksession.createDataFrame(rdd1).write.format("parquet").mode("append").saveAsTable("parquet_test")
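Saving directly fails with an error. Next, change the target table name so that Spark creates the table itself, then inspect that table to see how it differs from the one above.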
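The experiment can be sketched as a small standalone program. This is only an illustration under stated assumptions: the `Person` case class, the sample row, and the target name `parquet_test_spark` are hypothetical and stand in for the original `rdd1` and whatever new table name was actually used.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical row type whose fields mirror the columns of parquet_test.
case class Person(name: String, sex: String, age: Int)

object SaveAsTableSketch {
  def main(args: Array[String]): Unit = {
    // Assumes a Spark build with Hive support and a reachable metastore.
    val spark = SparkSession.builder()
      .appName("save-as-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.createDataFrame(Seq(Person("alice", "f", 30)))

    // The target table does not exist yet, so Spark creates it on first write.
    // "parquet_test_spark" is a placeholder name, not one from the original notes.
    df.write.mode("append").saveAsTable("parquet_test_spark")

    spark.stop()
  }
}
```

Because Spark creates this table itself, it also records its own metadata for it, and that metadata is what the next step compares against the hand-created table.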
4. Inspect the structure of the table Spark created automatically
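To compare the two tables without reading the full output by eye, the storage-related rows can be pulled out side by side. The sketch below makes some assumptions: the row labels (`Provider`, `Serde Library`, `InputFormat`, `OutputFormat`) are the ones printed by recent Spark versions and may differ in older ones, and `parquet_test_spark` is a hypothetical name for the table Spark created.

```scala
import org.apache.spark.sql.SparkSession

object CompareTablesSketch {
  // Rows of DESCRIBE FORMATTED that describe how the table is stored (assumed labels).
  val keys: Set[String] = Set("Provider", "Serde Library", "InputFormat", "OutputFormat")

  // Keep only the rows of interest from (col_name, data_type) pairs.
  def storageRows(rows: Seq[(String, String)]): Map[String, String] =
    rows.collect { case (k, v) if keys.contains(k.trim) => k.trim -> v }.toMap

  // Run DESCRIBE FORMATTED and reduce it to the storage-related rows.
  def describe(spark: SparkSession, table: String): Map[String, String] = {
    val rows = spark.sql(s"DESCRIBE FORMATTED $table").collect().toSeq.map { r =>
      (Option(r.getString(0)).getOrElse(""), Option(r.getString(1)).getOrElse(""))
    }
    storageRows(rows)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("compare-tables-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val hiveMade = describe(spark, "parquet_test")
    val sparkMade = describe(spark, "parquet_test_spark") // hypothetical table name

    // Print each property as: label: hand-created value | Spark-created value
    keys.toSeq.sorted.foreach { k =>
      println(s"$k: ${hiveMade.getOrElse(k, "-")} | ${sparkMade.getOrElse(k, "-")}")
    }

    spark.stop()
  }
}
```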
5. Modify the table definition according to each error message
//Error message
Exception in thread "main" org.apache.spark.sql.AnalysisException: The format of the existing table db_src.parquet_test is `HiveFileFormat`. It doesn't match the specified format `ParquetFileFormat`.;
//Fix
ALTER TABLE parquet_test SET TBLPROPERTIES ('spark.sql.sources.provider'='parquet');
//Error message
Exception in thread "main" org.apache.spark.sql.AnalysisException: The column number of the existing table db_src.parquet_test(struct<>) doesn't match the data schema(struct<name:string,sex:string,age:int>);
//Fix
ALTER TABLE parquet_test SET TBLPROPERTIES ('spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fields\":[{\"name\":\"name\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"sex\",\"type\":\"string\",\"nullable\":true,\"metadata\":{}},{\"name\":\"age\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{}}]}');
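The schema string in the statement above does not need to be written by hand. As a minimal sketch, assuming the `spark-sql` library is on the classpath, the same kind of JSON can be produced from a `StructType` through its `json` method; the object name below is hypothetical.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaJsonSketch {
  // Schema matching the three columns of parquet_test.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("sex", StringType, nullable = true),
    StructField("age", IntegerType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    // Compact JSON form of the schema, suitable for pasting into the property value.
    println(schema.json)
  }
}
```

When a DataFrame is already at hand, `df.schema.json` gives the same representation for that DataFrame's schema, which avoids retyping the column list.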
//Error message
Exception in thread "main" org.apache.spark.sql.AnalysisException: Could not read schema from the hive metastore because it is corrupted.;
//Fix
ALTER TABLE parquet_test SET TBLPROPERT