I have a CSV file with the following data:
1,2,5
2,4
2,3
I want to load it into a DataFrame with a single column whose type is an array of strings.
The output should look like this:
[1, 2, 5]
[2, 4]
[2, 3]
I want to do this in Java.
Please help.
Solution
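Below is sample code in Java. Read the file as plain text with the spark.read().text(String path) method, which puts each line into a single string column named value, and then apply the split function to that column to get an array of strings.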
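import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.split;

public class SparkSample {
    public static void main(String[] args) {
        // Start a local session that uses all available cores
        SparkSession spark = SparkSession
                .builder()
                .appName("SparkSample")
                .master("local[*]")
                .getOrCreate();

        // Read the file as text: one row per line, in a string column named "value"
        Dataset<Row> ds = spark.read().text("c://tmp//sample.csv").toDF("value");
        ds.show(false);

        // Split each line on commas, producing an array<string> column
        Dataset<Row> ds1 = ds.select(split(ds.col("value"), ",")).toDF("new_value");
        ds1.show(false);
        ds1.printSchema();

        spark.stop();
    }
}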
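
To check what the split step does to each line without starting Spark, here is a minimal plain-Java sketch of the same per-line logic. The class SplitSketch and the helper splitLines are hypothetical names made up for this illustration and are not part of Spark. The -1 limit is an assumption meant to mirror Spark's split, which, as far as I know, keeps trailing empty strings by default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitSketch {
    // Hypothetical helper (not a Spark API): splits every raw line on commas,
    // roughly what split(col("value"), ",") does for each row.
    static List<List<String>> splitLines(List<String> lines) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines) {
            // The delimiter is a regular expression; limit -1 keeps trailing empty fields
            rows.add(Arrays.asList(line.split(",", -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Same three lines as the sample file in the question
        List<String> lines = Arrays.asList("1,2,5", "2,4", "2,3");
        for (List<String> row : splitLines(lines)) {
            System.out.println(row);
        }
    }
}
```

For the sample data this prints [1, 2, 5], [2, 4] and [2, 3], one per line. Because the delimiter is treated as a regular expression both here and in Spark's split, a delimiter such as | or . would need escaping.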