Start the interactive shell
[root@hdp-1 bin]# ./spark-shell --master spark://hdp-2:7077 --executor-memory 500m --total-executor-cores 1
--master spark://hdp-2:7077      the address of the Spark master node
--executor-memory 500m           memory to allocate per executor
--total-executor-cores 1         total number of CPU cores across all executors
(spark-shell has already created the SparkContext and SQLContext objects for you.)
Load the data
vi datajson.txt
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
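The file is read from HDFS in the next step, so it first has to be uploaded there. A minimal sketch, assuming the NameNode address and the /spark directory match the path used in the read.json call:

```shell
# Create the target directory on HDFS (path assumed from the read.json call)
hdfs dfs -mkdir -p hdfs://hdp-0:9000/spark
# Upload the local JSON file
hdfs dfs -put datajson.txt hdfs://hdp-0:9000/spark/
# Verify the upload
hdfs dfs -ls hdfs://hdp-0:9000/spark
```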
scala> val jsondf=spark.sqlContext.read.json("hdfs://hdp-0:9000/spark/datajson.txt")
jsondf: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
scala>
scala> jsondf.show()
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+
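Note in the output above that Spark inferred the column types from the JSON itself (age as bigint, with null for the missing value). Besides show(), the inferred schema can be inspected directly; a short sketch, assuming the same jsondf from the session above:

```scala
// Print the schema that Spark inferred from the JSON file
jsondf.printSchema()
// root
//  |-- age: long (nullable = true)
//  |-- name: string (nullable = true)
```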
Using the DataFrame
A DataFrame is a distributed collection of data organized into named columns.
scala> jsondf.select(jsondf("name"),jsondf("age")).show()
+-------+----+
|   name| age|
+-------+----+
|Michael|null|
|   Andy|  30|
| Justin|  19|
+-------+----+
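Beyond select(), the same named-column style supports filtering and aggregation. A sketch of two common operations, assuming the jsondf created earlier in the session:

```scala
// Keep only rows where age is greater than 20 (rows with null age are dropped)
jsondf.filter(jsondf("age") > 20).show()

// Count the number of rows for each distinct age value
jsondf.groupBy("age").count().show()
```

Both calls return a new DataFrame; nothing is computed until an action such as show() is invoked.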