Java multi-column statistics: given a list of column names, how do I select multiple columns of a dataset?

How can I select multiple columns of dataset ds in Spark 2.3 Java by passing a list argument?

For example, this works fine:

ds.select("col1","col2","col3").show();

However, this fails:

List<String> columns = Arrays.asList("col1","col2","col3");

ds.select(columns.toString()).show();

Solution

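With Spark 2.4.0, you can convert the List to a Scala Seq and pass it to selectExpr, as described in the Spark documentation.

If you want to use select instead, remove the first column from your list and pass it as the separate first argument, because this overload of select takes one String followed by a Seq of the remaining column names.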
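To make the difference concrete without a Spark dependency, here is a minimal plain-Java sketch. The methods selectExpr and select in it are hypothetical stand-ins that only mirror the argument shapes of the Dataset methods (many names, or one name plus the rest); they are not Spark's API. The class name ColumnListSketch and the helpers toArray and tail are also made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnListSketch {

    // Hypothetical stand-in with the shape "many column names";
    // it only reports which names it received.
    public static List<String> selectExpr(String... exprs) {
        return Arrays.asList(exprs);
    }

    // Hypothetical stand-in with the shape "first column, then the rest".
    public static List<String> select(String col, String... cols) {
        List<String> all = new ArrayList<>();
        all.add(col);
        all.addAll(Arrays.asList(cols));
        return all;
    }

    // The whole list as a String[], suitable for a varargs parameter.
    public static String[] toArray(List<String> columns) {
        return columns.toArray(new String[0]);
    }

    // Everything after the first element, as a String[].
    public static String[] tail(List<String> columns) {
        return columns.subList(1, columns.size()).toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("col1", "col2", "col3");

        // toString() collapses the list into ONE string, "[col1, col2, col3]",
        // so the callee sees a single (nonexistent) column name.
        System.out.println(selectExpr(columns.toString()).size()); // prints 1

        // Passing the elements individually delivers three names.
        System.out.println(selectExpr(toArray(columns)));          // prints [col1, col2, col3]

        // Head plus tail: the split that a (first, rest) signature needs.
        System.out.println(select(columns.get(0), tail(columns))); // prints [col1, col2, col3]
    }
}
```

The point of the sketch is only that the list has to reach the method as separate elements, either all together or as a first element plus the remainder; whether you hand them over as an array or as a Scala Seq depends on which overload you call.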

Both versions are shown below.

Suppose you have the following .csv file:

InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom

You can use this code to solve your issue:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import scala.collection.JavaConverters;
import scala.collection.Seq;

public class SparkJavaTest {

    public static SparkSession spark = SparkSession
            .builder()
            .appName("JavaSparkTest")
            .master("local")
            .getOrCreate();

    // Converts a Java List into a Scala Seq, which the Seq-based overloads accept.
    public static Seq<String> convertListToSeq(List<String> inputList) {
        return JavaConverters.asScalaIteratorConverter(inputList.iterator()).asScala().toSeq();
    }

    public static void main(String[] args) {
        // Read the sample file, treating the first row as column names.
        Dataset<Row> ds = spark.read().option("header", true).csv("spark-file.csv");

        List<String> columns = Arrays.asList("InvoiceNo", "StockCode", "Description");

        // Using selectExpr: pass the whole list as a Seq.
        ds.selectExpr(convertListToSeq(columns)).show(false);

        // Using select: the first column is passed separately, the rest as a Seq.
        List<String> columns2 = Arrays.asList("StockCode", "Description");
        ds.select("InvoiceNo", convertListToSeq(columns2)).show(false);
    }
}

Hope it helps :)
