Converting between RDD and Dataset (DataFrame) in Spark with Java code

1. Import the Maven dependencies

<properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <java.version>1.8</java.version>
        <spark.version>2.1.0</spark.version>
        <scala.version>2.11</scala.version>
</properties>
<dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.30</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.11</version>
        </dependency>
  </dependencies>

2. Converting a Dataset to an RDD

import java.util.List;

import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSql {
    public static void main(String[] args) {
        // 1. Create the SparkSession
        SparkSession session = SparkSession.builder().master("local[1]").appName("Spark Sql").getOrCreate();
        // 2. Load the data source as a DataFrame (Dataset<Row>)
        Dataset<Row> json = session.read().json("C:\\Users\\spectre\\Desktop\\stu");
        // 3. Query: either through the DataFrame API...
        //Dataset<Row> se = json.select("stuname","stuage","stusex").where("stuage>20");
        //se.show();
        // ...or through SQL against a temporary view
        json.createOrReplaceTempView("stus");
        Dataset<Row> dataset = session.sql("select stuname,stuage,stusex from stus");
        // Convert the Dataset to a JavaRDD and map each Row to a String
        List<String> collect = dataset.javaRDD().map(new Function<Row, String>() {
            @Override
            public String call(Row row) throws Exception {
                return row.getString(0) + row.getLong(1) + row.getString(2);
            }
        }).collect();
        for (String s : collect) {
            System.out.println(s);
        }
    }
}
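
Because the build targets Java 8, the anonymous Function above can be collapsed into a lambda. A minimal sketch of the same map call inside main, with a comma separator added purely for readable output:

// Same Dataset-to-RDD conversion, written as a Java 8 lambda
List<String> rows = dataset.javaRDD()
        .map(row -> row.getString(0) + "," + row.getLong(1) + "," + row.getString(2))
        .collect();
rows.forEach(System.out::println);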

3. Converting an RDD to a Dataset

(Via the reflection mechanism)
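
The reflection route lets Spark derive the schema from a JavaBean, as the snippet below shows. The original post does not include the Student class the snippet depends on; a minimal sketch of the bean it assumes (Spark reads the columns through the getters, and the bean should be serializable):

import java.io.Serializable;

// Plain JavaBean; Spark's reflection-based createDataFrame derives the columns from these accessors
public class Student implements Serializable {
    private String sid;
    private String sname;
    private int sage;

    public String getSid() { return sid; }
    public void setSid(String sid) { this.sid = sid; }
    public String getSname() { return sname; }
    public void setSname(String sname) { this.sname = sname; }
    public int getSage() { return sage; }
    public void setSage(int sage) { this.sage = sage; }
}

With the bean in place, the conversion itself: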

SparkSession spark = SparkSession.builder().master("local[*]").appName("Spark").getOrCreate();
JavaRDD<String> source = spark.read().textFile("stuInfo.txt").javaRDD();

// Parse each comma-separated line into a Student bean
JavaRDD<Student> stuRDD = source.map(new Function<String, Student>() {
    public Student call(String line) throws Exception {
        String[] parts = line.split(",");
        Student stu = new Student();
        stu.setSid(parts[0]);
        stu.setSname(parts[1]);
        stu.setSage(Integer.valueOf(parts[2]));
        return stu;
    }
});

// Let Spark infer the schema from the Student bean via reflection
Dataset<Row> df = spark.createDataFrame(stuRDD, Student.class);
df.select("sid", "sname", "sage").coalesce(1).write().mode(SaveMode.Append).parquet("parquet.res");
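
The same RDD of beans can also become a typed Dataset<Student> through a bean encoder, rather than a generic Dataset<Row>. A minimal sketch, assuming the spark session and stuRDD from the snippet above plus an extra import of org.apache.spark.sql.Encoders:

// Alternative: a typed Dataset<Student> built with a bean Encoder
Dataset<Student> ds = spark.createDataset(stuRDD.rdd(), Encoders.bean(Student.class));
ds.printSchema();
ds.show();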

Using an explicitly constructed schema

SparkSession spark = SparkSession.builder().master("local[*]").appName("Spark").getOrCreate();
JavaRDD<String> source = spark.read().textFile("stuInfo.txt").javaRDD();

// Parse each comma-separated line into a generic Row
JavaRDD<Row> rowRDD = source.map(new Function<String, Row>() {
    public Row call(String line) throws Exception {
        String[] parts = line.split(",");
        String sid = parts[0];
        String sname = parts[1];
        int sage = Integer.parseInt(parts[2]);

        return RowFactory.create(sid, sname, sage);
    }
});

// Describe the columns explicitly instead of relying on reflection
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("sid", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("sname", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("sage", DataTypes.IntegerType, true));
StructType schema = DataTypes.createStructType(fields);

// Bind the rows to the schema
Dataset<Row> df = spark.createDataFrame(rowRDD, schema);
df.coalesce(1).write().mode(SaveMode.Append).parquet("parquet.res1");
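
To confirm that the explicit schema survived the write, the parquet output can be read back with the same session; a short check:

// Read the parquet result back and inspect its schema and rows
Dataset<Row> check = spark.read().parquet("parquet.res1");
check.printSchema();
check.show();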