# Duplicate IDs were found, so we may need to assign each row a new, unique ID to identify it
# Add a new column
df.withColumn('new_id', fn.monotonically_increasing_id()).show()
# withColumn adds a new column
# monotonically_increasing_id generates a unique, monotonically increasing ID
+---+------+------+---+------+-------------+
| id|weight|height|age|gender| new_id|
+---+------+------+---+------+-------------+
| 5| 133.2| 5.7| 54| F| 25769803776|
| 4| 144.5| 5.9| 33| M| 171798691840|
| 2| 167.2| 5.4| 45| M| 592705486848|
| 3| 124.1| 5.2| 23| F|1236950581248|
| 5| 129.2| 5.3| 42| M|1365799600128|
+---+------+------+---+------+-------------+
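Note that the generated IDs are increasing but not consecutive: per the Spark documentation, `monotonically_increasing_id()` packs the partition index into the upper 31 bits and the per-partition row number into the lower 33 bits. A minimal pure-Python sketch of that bit layout (the `monotonic_id` helper is hypothetical, for illustration only):

```python
def monotonic_id(partition_id: int, row_index: int) -> int:
    # Upper 31 bits: partition index; lower 33 bits: row number within the partition.
    return (partition_id << 33) + row_index

# Partition 3, first row -> 3 * 2**33, matching 25769803776 in the output above.
print(monotonic_id(3, 0))
```

This explains why the values in `new_id` jump by large amounts: rows in different partitions start from different multiples of 2**33.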