python spark dataframe_[Spark][Python][DataFrame][Write]DataFrame写入的例子

这篇博客详细记录了使用Python的Spark DataFrame将数据写入Parquet文件的过程,包括内存管理、任务调度、Parquet输出配置以及数据压缩等步骤。展示了Spark如何高效地处理和存储数据。
摘要由CSDN通过智能技术生成

17/10/07 00:58:18 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 65.5 KB, free 338.2 KB)

17/10/07 00:58:18 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 21.4 KB, free 359.6 KB)

17/10/07 00:58:18 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:59616 (size: 21.4 KB, free: 208.8 MB)

17/10/07 00:58:18 INFO spark.SparkContext: Created broadcast 2 from saveAsTable at NativeMethodAccessorImpl.java:-2

17/10/07 00:58:18 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 251.1 KB, free 610.7 KB)

17/10/07 00:58:18 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 21.6 KB, free 632.4 KB)

17/10/07 00:58:18 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:59616 (size: 21.6 KB, free: 208.7 MB)

17/10/07 00:58:18 INFO spark.SparkContext: Created broadcast 3 from saveAsTable at NativeMethodAccessorImpl.java:-2

17/10/07 00:58:19 INFO parquet.ParquetRelation: Using default output committer for Parquet: parquet.hadoop.ParquetOutputCommitter

17/10/07 00:58:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

17/10/07 00:58:19 INFO datasources.DynamicPartitionWriterContainer: Using user defined output committer class parquet.hadoop.ParquetOutputCommitter

17/10/07 00:58:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

17/10/07 00:58:19 INFO mapred.FileInputFormat: Total input paths to process : 1

17/10/07 00:58:19 INFO spark.SparkContext: Starting job: saveAsTable at NativeMethodAccessorImpl.java:-2

17/10/07 00:58:19 INFO scheduler.DAGScheduler: Got job 1 (saveAsTable at NativeMethodAccessorImpl.java:-2) with 1 output partitions

17/10/07 00:58:19 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (saveAsTable at NativeMethodAccessorImpl.java:-2)

17/10/07 00:58:19 INFO scheduler.DAGScheduler: Parents of final stage: List()

17/10/07 00:58:19 INFO scheduler.DAGScheduler: Missing parents: List()

17/10/07 00:58:19 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[7] at saveAsTable at NativeMethodAccessorImpl.java:-2), which has no missing parents

17/10/07 00:58:19 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 72.7 KB, free 705.0 KB)

17/10/07 00:58:20 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 26.4 KB, free 731.4 KB)

17/10/07 00:58:20 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:59616 (size: 26.4 KB, free: 208.7 MB)

17/10/07 00:58:20 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006

17/10/07 00:58:20 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[7] at saveAsTable at NativeMethodAccessorImpl.java:-2)

17/10/07 00:58:20 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks

17/10/07 00:58:20 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,PROCESS_LOCAL, 2149 bytes)

17/10/07 00:58:20 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)

17/10/07 00:58:20 INFO rdd.HadoopRDD: Input split: hdfs://localhost:8020/user/training/people.json:0+179

17/10/07 00:58:20 INFO codegen.GenerateUnsafeProjectio

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值