- Reference: https://clickhouse.com/docs/en/engines/table-engines/special/file/
- Create the clickhouse_sink table and insert data.
SELECT * FROM clickhouse_sink
userId|score|EventDate |
------+-----+----------+
1| 1|1970-01-01|
2| 3|2022-06-26|
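The DDL for clickhouse_sink is not shown above; a minimal sketch that would reproduce this data, assuming a MergeTree engine and the same column types as the File-engine table created below:

```sql
-- Assumed source-table definition (engine and ORDER BY are guesses)
CREATE TABLE clickhouse_sink (userId UInt64, score UInt8, EventDate Date)
ENGINE = MergeTree
ORDER BY userId;

INSERT INTO clickhouse_sink VALUES (1, 1, '1970-01-01'), (2, 3, '2022-06-26');
```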
- Create a File engine table.
CREATE TABLE file_table (userId UInt64, score UInt8, EventDate Date) ENGINE = File(Parquet)
- Insert the table data into the File engine table.
INSERT INTO file_table SELECT * FROM clickhouse_sink
- Query the file_table data.
SELECT * FROM file_table
userId|score|EventDate |
------+-----+----------+
1| 1|1970-01-01|
2| 3|2022-06-26|
- Inspect the exported Parquet file.
➜ file_table pwd
/your_clickhouse_data_path/yourdatabasename/file_table
➜ file_table ls
data.Parquet
- Read the Parquet file with Spark.
import org.apache.spark.sql.SparkSession

// 1. Path to the Parquet file exported by the ClickHouse File engine
val path = "/your_clickhouse_data_path/yourdatabasename/file_table/data.Parquet"
// 2. Create a SparkSession
val spark = SparkSession
  .builder()
  .appName("SparkBatchDemo")
  .master("local")
  //.master("yarn")
  //.config("spark.some.config.option", "some-value")
  .getOrCreate()
// 3. Read the ClickHouse data from the path
val df = spark.read.parquet(path)
// 4. Display the DataFrame's content and schema on stdout
df.show()
df.printSchema()
The output is as follows:
+------+-----+---------+
|userId|score|EventDate|
+------+-----+---------+
| 1| 1| 0|
| 2| 3| 19169|
+------+-----+---------+
root
|-- userId: decimal(20,0) (nullable = true)
|-- score: short (nullable = true)
|-- EventDate: integer (nullable = true)
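The raw integers and the widened types come from Parquet's physical encoding: a ClickHouse Date is written as an int32 day count since the Unix epoch (1970-01-01), which Spark surfaces as a plain integer, and UInt64 has no signed 64-bit equivalent, so Spark widens it to decimal(20,0). A quick stdlib check that the day counts in the output match the original dates:

```python
from datetime import date, timedelta

epoch = date(1970, 1, 1)

# Parquet stores Date as days since the Unix epoch, which is why
# Spark shows EventDate as a plain integer here.
print((date(1970, 1, 1) - epoch).days)   # 0
print((date(2022, 6, 26) - epoch).days)  # 19169

# Converting a raw day count back to a calendar date:
print(epoch + timedelta(days=19169))     # 2022-06-26
```

In the Spark job itself, a proper date column could be rebuilt from the day count, e.g. with a `date_add`-style SQL expression over a 1970-01-01 literal (exact form depends on the Spark version in use).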