Spark SQL data type conversion: Spark DataFrame type conversion

Latest recommended article published on 2023-04-04 06:00:00

weixin_39692623

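Tags: spark sql data type conversion

Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_39692623/article/details/111481859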

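This article shows how to convert DataFrame column types in Spark, in particular to prepare data for a binarization feature transform. Calling `cast()` on a DataFrame column converts its type directly, which replaces the earlier, more cumbersome approach of iterating over an RDD and then building a new DataFrame from it. The examples cast the `age` column to DoubleType and, in the older approach, map the values of the `race` column to numeric codes.

Summary generated by CSDN using AI technology.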

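The task: read a table and apply a binarization feature transform to it. Binarization, however, requires the input column to be of type double, so how do we convert the type?

A Spark Column can do the conversion directly: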

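    // Assumes: import static org.apache.spark.sql.types.DataTypes.DoubleType;
    DataFrame dataset = hive.sql("select age,sex,race from hive_race_sex_bucktizer");

    // Type conversion: cast age to double; sex and race pass through unchanged
    dataset = dataset.select(
        dataset.col("age").cast(DoubleType).as("age"),
        dataset.col("sex"),
        dataset.col("race"));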

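Simple, isn't it? Compare that with how I used to convert types: iterate over the rows to build another RDD that meets the type requirements, then create a DataFrame from that RDD. So complicated!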

    // Assumes static imports of LongType, IntegerType, DoubleType and
    // createStructField from org.apache.spark.sql.types.DataTypes.
    JavaRDD<Row> parseDataset = dataset.toJavaRDD().map(new Function<Row, Row>() {
        @Override
        public Row call(Row row) throws Exception {
            System.out.println(row); // debug output; printed on the executors
            long age = row.getLong(row.fieldIndex("age"));
            String sex = row.getAs("sex");
            String race = row.getAs("race");

            // Map each race label to a numeric code; unknown labels become 0
            double raceV;
            if ("white".equalsIgnoreCase(race)) {
                raceV = 1;
            } else if ("black".equalsIgnoreCase(race)) {
                raceV = 2;
            } else if ("yellow".equalsIgnoreCase(race)) {
                raceV = 3;
            } else if ("Asian-Pac-Islander".equalsIgnoreCase(race)) {
                raceV = 4;
            } else if ("Amer-Indian-Eskimo".equalsIgnoreCase(race)) {
                raceV = 3; // note: same code as "yellow" in the original
            } else {
                raceV = 0;
            }

            // Encode sex as 1 (male) or 0 (anything else)
            return RowFactory.create(age, ("male".equalsIgnoreCase(sex) ? 1 : 0), raceV);
        }
    });

    // Schema describing the rows produced above
    StructType schema = new StructType(new StructField[]{
        createStructField("_age", LongType, false),
        createStructField("_sex", IntegerType, false),
        createStructField("_race", DoubleType, false)
    });

    DataFrame df = hive.createDataFrame(parseDataset, schema);
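The label-to-code mapping inside `call()` does not depend on Spark at all, so it can be pulled out and checked in isolation. Below is a minimal plain-Java sketch of that idea, using a case-insensitive lookup table in place of the if/else chain. The class name `RaceCodes` and the methods `encodeRace` and `encodeSex` are hypothetical names introduced for illustration; the codes are copied from the snippet above, including the shared value 3.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original post or of Spark.
class RaceCodes {
    // Case-insensitive keys mirror the equalsIgnoreCase checks in the original code.
    private static final Map<String, Double> CODES =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    static {
        CODES.put("white", 1.0);
        CODES.put("black", 2.0);
        CODES.put("yellow", 3.0);
        CODES.put("Asian-Pac-Islander", 4.0);
        CODES.put("Amer-Indian-Eskimo", 3.0); // same value as in the original code
    }

    // Returns the numeric code for a race label; null or unknown labels map to 0.
    static double encodeRace(String race) {
        if (race == null) {
            return 0.0;
        }
        return CODES.getOrDefault(race, 0.0);
    }

    // Returns 1 for "male" (any letter case) and 0 for anything else.
    static int encodeSex(String sex) {
        return "male".equalsIgnoreCase(sex) ? 1 : 0;
    }
}
```

With a helper like this, the body of `call()` could shrink to reading the three fields and returning `RowFactory.create(age, RaceCodes.encodeSex(sex), RaceCodes.encodeRace(race))`, and adding a new label would become a one-line change to the table.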

Keep exploring, keep experimenting!
