Saving a DataFrame to MySQL, or to .csv, .json, and Parquet files

Saving a DataFrame to MySQL

import java.util.Properties

import cn.doit.sparksql.day01.utils.SparkUtils
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

/**
 * @description: Save a DataFrame to MySQL
 **/
object DFSaveMysql {

  def main(args: Array[String]): Unit = {

    val spark: SparkSession = SparkUtils.getSparkSession()
    import spark.implicits._

    // Build a DataFrame from a CSV file
    val frame: DataFrame = spark.read
      .options(Map("header" -> "true", "inferSchema" -> "true"))
      .csv("doc/stu2.csv")
    frame.printSchema()
    frame.show()

    val pro = new Properties()
    pro.setProperty("user", "root")
    pro.setProperty("password", "123456")

    /**
     * def mode(saveMode: SaveMode)
     *
     * Warning when SSL is not configured explicitly:
     *   Wed Jan 01 15:02:58 CST 2020 WARN: Establishing SSL connection without server's
     *   identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+
     *   requirements SSL connection must be established by default if explicit option isn't set.
     *   For compliance with existing applications not using SSL the verifyServerCertificate property
     *   is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false,
     *   or set useSSL=true and provide truststore for server certificate verification.
     *
     * Exception when the target table already exists and no SaveMode is set:
     *   Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view 'people' already exists. SaveMode: ErrorIfExists.;
     */
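    // The four SaveMode options (standard Spark semantics):
    //   SaveMode.ErrorIfExists (default) - throw the AnalysisException above when the table exists
    //   SaveMode.Append                  - insert these rows into the existing table
    //   SaveMode.Overwrite               - drop and recreate the table, then write the rows
    //   SaveMode.Ignore                  - silently skip the write when the table exists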
    // Append the rows to the 'people' table; useSSL=false suppresses the SSL warning quoted above
    frame.write.mode(SaveMode.Append).jdbc("jdbc:mysql://localhost:3306/bigdata?useSSL=false", "people", pro)

  }
}
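
To verify the write, the table can be read back with spark.read.jdbc. A minimal read-back sketch, assuming the same database, table name, and credentials as above (DFReadMysql is a hypothetical name, not part of the original project):

import java.util.Properties

import org.apache.spark.sql.{DataFrame, SparkSession}

object DFReadMysql {

  def main(args: Array[String]): Unit = {

    val spark: SparkSession = SparkSession.builder()
      .appName("DFReadMysql").master("local[*]").getOrCreate()

    val pro = new Properties()
    pro.setProperty("user", "root")
    pro.setProperty("password", "123456")

    // Column names and types are taken from the MySQL table metadata
    val people: DataFrame = spark.read
      .jdbc("jdbc:mysql://localhost:3306/bigdata?useSSL=false", "people", pro)
    people.printSchema()
    people.show()
  }
}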

Code for saving a DataFrame to .csv, .json, and Parquet files

import cn.doit.sparksql.day01.utils.SparkUtils
import org.apache.spark.sql.{DataFrame, SparkSession}

/**
 * Save a DataFrame as .csv, .json, and Parquet files
 **/
object DFSaveFiles {

  def main(args: Array[String]): Unit = {

    val spark: SparkSession = SparkUtils.getSparkSession()
    import spark.implicits._

    // Build a DataFrame from a CSV file
    val frame: DataFrame = spark.read
      .options(Map("header" -> "true", "inferSchema" -> "true"))
      .csv("doc/stu2.csv")
    frame.printSchema()
    frame.show()

    /**
     * Save as JSON
     * Save as CSV without a header row
     * Save as CSV with a header row
     * Save as Parquet
     * Save as text - this format supports only a single column;
     * calling frame.write.text on this DataFrame fails with:
     * Text data source supports only a single column, and you have 6 columns.;
     */
    //    frame.write.json("doc/output/json")
    //    frame.write.csv("doc/output/csv1")
    //    frame.write.option("header", true).csv("doc/output/csv2")
    //    frame.write.parquet("doc/output/parquet")

    import org.apache.spark.sql.functions._

    // def concat_ws(sep: String, exprs: Column*)
    //    frame.write.text("doc/output/text")  // fails: the DataFrame has 6 columns
    //    frame.selectExpr("concat_ws('\t', id, name, age, sex, city, score)").write.text("doc/output/text")
  }
}
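
Parquet stores the schema in the file footer, so reading it back needs no header or inferSchema options. A minimal read-back sketch, assuming doc/output/parquet was produced by the code above (DFReadParquet is a hypothetical name, not part of the original project):

import org.apache.spark.sql.{DataFrame, SparkSession}

object DFReadParquet {

  def main(args: Array[String]): Unit = {

    val spark: SparkSession = SparkSession.builder()
      .appName("DFReadParquet").master("local[*]").getOrCreate()

    // Unlike CSV, the column names and types come straight from the Parquet metadata
    val frame: DataFrame = spark.read.parquet("doc/output/parquet")
    frame.printSchema()
    frame.show()
  }
}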