sparkSQL


 

Configure Scala:

 vi  /etc/profile

   export SCALA_HOME=/home/bigdata/scala

   export PATH=$PATH:$SCALA_HOME/bin

   source /etc/profile

#scala -version

  Scala code runner version 2.9.3 -- Copyright 2002-2011, LAMP/EPFL

Error:

WARN component.AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@655f272d: java.net.BindException: Address already in use

java.net.BindException: Address already in use

Fix: kill the leftover (defunct) spark-submit process that is still holding the port. Find its PID with ps -ef | grep spark-submit, then run kill -9 <pid>.
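Before restarting, it can help to confirm that the port is actually free again. The following is a minimal sketch that can be pasted into the Scala REPL; the helper name isPortFree is hypothetical, and port 4040 (the default Spark web UI port) is only an assumption about which port the Jetty server in the log was trying to bind.

```scala
import java.io.IOException
import java.net.ServerSocket

// Hypothetical helper: returns true if a server socket can be bound
// to the given port right now, false if the bind fails.
def isPortFree(port: Int): Boolean =
  try {
    val socket = new ServerSocket(port)
    socket.close()
    true
  } catch {
    // java.net.BindException ("Address already in use") is an IOException
    case _: IOException => false
  }

// Assumption: 4040 is the port in question; substitute the one from your log.
println("port 4040 free: " + isPortFree(4040))
```

If the check still reports the port as busy after the kill, another process is probably holding it.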

Program to verify that spark-shell runs correctly:

object HelloWorld {
  // Entry point: prints a greeting to confirm that Scala code runs.
  def main(args: Array[String]): Unit = {
    println("HelloWorld")
  }
}

Run spark-shell:

# ./bin/spark-shell --master yarn-client

 

Press Ctrl+D to stop spark-shell.

The installed package is spark-1.0.0-bin-cdh4.tgz.

Edit spark-env.sh:

export HADOOP_HOME=/usr/lib/hadoop

export HADOOP_CONF_DIR=/etc/hadoop/conf

export SPARK_EXECUTOR_INSTANCES=3

export SPARK_EXECUTOR_CORES=1

export SPARK_EXECUTOR_MEMORY=500m

export SPARK_DRIVER_MEMORY=512m

export SPARK_YARN_APP_NAME=Spark

export SPARK_YARN_QUEUE=default

export SCALA_HOME=/usr/lib/spark/scala-2.9.3

Edit spark-defaults.conf:

spark.yarn.applicationMaster.waitTries     10

spark.yarn.submit.file.replication           3

spark.yarn.preserve.staging.files            true

spark.yarn.scheduler.heartbeat.interval-ms   5000

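# The value must be an integer. The documented default is 2 * numExecutors, i.e. 6 with the 3 executors configured above.
spark.yarn.max.executor.failures         6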

spark.yarn.historyServer.address          192.168.10.224
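The file uses a whitespace-separated key/value layout that java.util.Properties can read. As a quick sanity check, the sketch below parses a few of the entries above from an inlined string and converts them to typed values. The variable names are illustrative only, and the sketch does not claim to reproduce how Spark itself loads the file.

```scala
import java.io.StringReader
import java.util.Properties

// A few entries copied from the configuration above, inlined for the check.
val confText =
  """spark.yarn.applicationMaster.waitTries     10
    |spark.yarn.submit.file.replication           3
    |spark.yarn.preserve.staging.files            true
    |""".stripMargin

// Properties.load splits each line into key and value at the first whitespace.
val props = new Properties()
props.load(new StringReader(confText))

// Convert to typed values; a non-numeric value would fail here with
// NumberFormatException, which is a cheap way to catch typos.
val waitTries = props.getProperty("spark.yarn.applicationMaster.waitTries").trim.toInt
val replication = props.getProperty("spark.yarn.submit.file.replication").trim.toInt
val preserve = props.getProperty("spark.yarn.preserve.staging.files").trim.toBoolean

println("waitTries=" + waitTries + ", replication=" + replication + ", preserve=" + preserve)
```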


 

SparkSQL

1. Prepare the data file employee.txt

1001,sophia,1

1002,cindy,2

1003,angela,3

1004,kimi,4

1005,tiny,5

Put the data into HDFS:

# hdfs dfs -put employee.txt /user/

Display the data: hdfs dfs -cat /user/employee.txt

2. Start the Spark shell

3. Write the script
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
 
case class Employee(employeeId: Int, name: String, departmentId: Int)
 
// Create an RDD of Employee objects and register it as a table.
val employees = sc.textFile("hdfs://product/user/employee.txt").map(_.split(",")).map(p => Employee(p(0).trim.toInt, p(1), p(2).trim.toInt))
employees.registerAsTable("employee")
 
// SQL statements can be run by using the sql methods provided by sqlContext.
val fsis = sql("SELECT name FROM employee WHERE departmentId = 1")
 
// The results of SQL queries are SchemaRDDs and support all the normal RDD operations.
// The columns of a row in the result can be accessed by ordinal.
fsis.map(t => "Name: " + t(0)).collect().foreach(println)
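To see what the query computes without a cluster, here is a minimal plain-Scala sketch of the same logic. It assumes the five sample rows of employee.txt are inlined as strings, and it uses ordinary Scala collections in place of the RDD and the SQL engine, so it illustrates the parse-and-filter steps only, not Spark SQL itself.

```scala
case class Employee(employeeId: Int, name: String, departmentId: Int)

// The sample rows from employee.txt, inlined so no HDFS access is needed.
val lines = Seq(
  "1001,sophia,1",
  "1002,cindy,2",
  "1003,angela,3",
  "1004,kimi,4",
  "1005,tiny,5"
)

// Split each row on commas and convert both numeric fields with toInt.
val employees = lines
  .map(_.split(","))
  .map(p => Employee(p(0).trim.toInt, p(1).trim, p(2).trim.toInt))

// Collection equivalent of: SELECT name FROM employee WHERE departmentId = 1
val names = employees.filter(_.departmentId == 1).map(_.name)

names.map("Name: " + _).foreach(println) // prints "Name: sophia"
```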

4. Output

Took 0.268434462

Name: sophia
