Spark SQL Summary

1. Create a JavaSparkContext:
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf().setAppName(appName).setMaster(master);
    JavaSparkContext sc = new JavaSparkContext(conf);

2. Create an RDD:
   (1) Parallelize an existing Collection:
    List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
    JavaRDD<Integer> distData = sc.parallelize(data);

   (2) Load a text file:
    JavaRDD<String> distFile = sc.textFile("data.txt");
    This reads data.txt; each line of the file becomes one element of the RDD.
  
3. RDD methods. Reference: http://blog.csdn.net/lxxc11/article/details/51333088
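The referenced article lists the RDD methods. As a quick local illustration of how the most common ones chain together, here is a minimal sketch that swaps in plain java.util.stream as a stand-in for filter, map, collect and reduce on the same data as step 2. It is an analog only, not Spark code: the class and method names are hypothetical, and Spark would run the same lambdas across partitions on executors rather than in one JVM.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class: a local, single-JVM analog of common RDD operations.
class RddAnalog {
    // Analog of rdd.filter(x -> x % 2 == 0).map(x -> x * x).collect()
    static List<Integer> evenSquares(List<Integer> data) {
        return data.stream()
                .filter(x -> x % 2 == 0)        // keep even numbers, like rdd.filter(...)
                .map(x -> x * x)                // square each one, like rdd.map(...)
                .collect(Collectors.toList());  // gather results, like rdd.collect()
    }

    // Analog of rdd.reduce((a, b) -> a + b); note that Streams needs an
    // identity value (0) here, whereas RDD.reduce takes only the function.
    static int sum(List<Integer> data) {
        return data.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(evenSquares(data)); // [4, 16]
        System.out.println(sum(data));         // 15
    }
}
```

In Spark the filter and map calls are lazy transformations, and nothing is computed until an action such as collect or reduce runs.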


4. Create a SQLContext:
    JavaSparkContext sc = ...; // An existing JavaSparkContext.
    SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

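5. Create a DataFrame:
(1) From a JSON file:
    DataFrame df = sqlContext.jsonFile("examples/src/main/resources/people.json");
    (jsonFile is deprecated since Spark 1.4; the replacement is sqlContext.read().json(path).)
(2) From a SQL query:
    // "people" stands for any table registered beforehand, e.g. with df.registerTempTable("people").
    DataFrame df = sqlContext.sql("SELECT * FROM people");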

6. Convert an RDD to a DataFrame:
(1) Infer the schema by reflection from a JavaBean class (Person):
    // Person is a JavaBean with name/age properties and getters/setters.
    JavaRDD<Person> people = sc.textFile("examples/src/main/resources/people.txt").map(
        new Function<String, Person>() {
            public Person call(String line) throws Exception {
                String[] parts = line.split(",");

                Person person = new Person();
                person.setName(parts[0]);
                person.setAge(Integer.parseInt(parts[1].trim()));

                return person;
            }
        });

    DataFrame schemaPeople = sqlContext.createDataFrame(people, Person.class);
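The reflection approach above depends on Person being a proper JavaBean, but the original notes never show that class. Below is a minimal sketch, assuming a Person with only the name and age properties used by the setName/setAge calls, together with the per-line parsing logic pulled out into a hypothetical PersonParser.parse helper so it can be checked without a SparkContext. In a real project Person would be a public top-level class in its own Person.java file; it is package-private here only to keep the snippet self-contained.

```java
import java.io.Serializable;

// Assumed JavaBean: public no-arg constructor plus getter/setter pairs, which
// is what bean reflection looks for. Serializable lets Spark ship instances
// between the driver and executors.
class Person implements Serializable {
    private String name;
    private int age;

    public Person() {}

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}

// Hypothetical helper: the same logic as the anonymous Function<String, Person>.
class PersonParser {
    // Parses a line of the form "name, age" into a Person.
    static Person parse(String line) {
        String[] parts = line.split(",");
        Person person = new Person();
        person.setName(parts[0]);
        person.setAge(Integer.parseInt(parts[1].trim())); // trim the space after the comma
        return person;
    }

    public static void main(String[] args) {
        Person p = parse("Michael, 29"); // sample line, assumed format
        System.out.println(p.getName() + " is " + p.getAge());
    }
}
```

Keeping the parsing in a static method also lets the Spark Function simply delegate to it, which keeps the anonymous class short.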


(2) Specify the schema programmatically:
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.types.*;

    JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");
    // The schema is encoded in a string
    String schemaString = "name age";

    // Generate the schema based on the string of schema
    List<StructField> fields = new ArrayList<StructField>();
    for (String fieldName : schemaString.split(" ")) {
        fields.add(DataTypes.createStructField(fieldName, DataTypes.StringType, true));
    }
    StructType schema = DataTypes.createStructType(fields);

    // Convert records of the RDD (people) to Rows.
    JavaRDD<Row> rowRDD = people.map(
        new Function<String, Row>() {
            public Row call(String record) throws Exception {
                String[] fields = record.split(",");
                return RowFactory.create(fields[0], fields[1].trim());
            }
        });

    // Apply the schema to the RDD.
    DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);
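The programmatic approach has two halves that must agree: the field names parsed from schemaString and the values each record is split into. The following plain-Java sketch mirrors both halves outside Spark so the pairing can be checked locally. It is an illustration under assumptions: a "field" is represented only by its name (not a real StructField), a row by a List<String> (not a Spark Row), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class mirroring the schema-string and record-splitting logic.
class SchemaSketch {
    // Mirrors the loop over schemaString.split(" "): one field per name.
    static List<String> fieldNames(String schemaString) {
        return new ArrayList<>(Arrays.asList(schemaString.split(" ")));
    }

    // Mirrors the Function<String, Row>: split on commas, trim the second value.
    // Both values stay strings, matching the all-StringType schema.
    static List<String> toRowValues(String record) {
        String[] fields = record.split(",");
        return Arrays.asList(fields[0], fields[1].trim());
    }

    public static void main(String[] args) {
        List<String> names = fieldNames("name age");
        List<String> values = toRowValues("Andy, 30"); // sample record, assumed format
        // Pair each field name with the value at the same position.
        for (int i = 0; i < names.size(); i++) {
            System.out.println(names.get(i) + " = " + values.get(i));
        }
    }
}
```

Because the schema declares age as a string, it stays "30" rather than 30; declaring it as DataTypes.IntegerType would also require converting the value with Integer.parseInt before building the Row.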


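(3) From an RDD of JSON strings (one JSON object per element):
    List<String> jsonData = Arrays.asList(
        "{\"name\":\"Yin\",\"address\":{\"city\":\"Columbus\",\"state\":\"Ohio\"}}");
    JavaRDD<String> anotherPeopleRDD = sc.parallelize(jsonData);
    DataFrame anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD);
    (jsonRDD is deprecated since Spark 1.4; the replacement is sqlContext.read().json(rdd).)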


