Getting Started with Spark (for Beginners)

  • Step 1: Download Spark from http://mirrors.shu.edu.cn/apache/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
  • Step 2: Transfer the downloaded archive to your Linux machine (e.g. with the rz command; it is best to create a directory for it first, e.g. mkdir /opt, and put the archive there)
  • Step 3: cd to /opt and extract the archive:
  • tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz
  • Step 4: cd to /etc, open the profile file with vi, and append the lines below (note: adjust the paths if your install location differs)
  • # Spark environment
    export SPARK_HOME=/opt/spark/spark-2.3.0-bin-hadoop2.7/
    export PATH="$SPARK_HOME/bin:$PATH"
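The profile change only takes effect in new login shells; in the current session, either run `source /etc/profile` or export the variables by hand. A minimal sketch of the manual route, assuming the install location from Step 3 (adjust the path if yours differs):

```shell
# Assumed install location from Step 3; change this if your path differs
export SPARK_HOME=/opt/spark/spark-2.3.0-bin-hadoop2.7/
export PATH="$SPARK_HOME/bin:$PATH"

# Sanity check: SPARK_HOME is set and its bin directory is on PATH
echo "$SPARK_HOME"
echo "$PATH" | grep -q "$SPARK_HOME/bin" && echo "PATH ok"
```

Once this is in place, spark-shell and spark-submit can be run from any directory instead of only from Spark's bin directory.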
  • Step 5: Create a spark_file_test folder under /opt/spark/
  • mkdir spark_file_test
  • Step 6: Create a file in /opt/spark/spark_file_test
    touch hello_spark
  • Step 7: Edit the hello_spark file and enter some test data
  • vi hello_spark
    hello spark!
    hello spark!
    hello spark!
    hello spark!
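Steps 5 through 7 can also be done non-interactively; a sketch using a heredoc instead of vi (paths assumed from the steps above):

```shell
# Create the test folder and write the sample file in one go
mkdir -p /opt/spark/spark_file_test
cat > /opt/spark/spark_file_test/hello_spark <<'EOF'
hello spark!
hello spark!
hello spark!
hello spark!
EOF

# Sanity check: the file should contain 4 lines
wc -l /opt/spark/spark_file_test/hello_spark
```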
  • Step 8: Go back to the /opt/spark/spark-2.3.0-bin-hadoop2.7/bin directory (cd /opt/spark/spark-2.3.0-bin-hadoop2.7/bin)
  • Step 9: Run spark-shell; output like the following indicates success
  • 2018-04-30 09:35:53 WARN  Utils:66 - Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.159.128 instead (on interface eth0)
    2018-04-30 09:35:53 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
    2018-04-30 09:35:57 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Setting default log level to "WARN".
    Spark context Web UI available at http://192.168.159.128:4040
    Spark context available as 'sc' (master = local[*], app id = local-1524847005612).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
          /_/
             
    Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.8.0_171)
    Type in expressions to have them evaluated.
    Type :help for more information
  • Step 10: Read the file, which returns an RDD
  • scala> var lines = sc.textFile("../../spark_file_test/hello_spark")
    2018-04-27 09:40:53 WARN  SizeEstimator:66 - Failed to check whether UseCompressedOops is set; assuming yes
    lines: org.apache.spark.rdd.RDD[String] = ../../spark_file_test/hello_spark MapPartitionsRDD[1] at textFile at <console>:24
    
  • Step 11: As a test, get the file's line count and its first line
  • scala> lines.count()
    res0: Long = 5                                                                  
    
    scala> lines.first
    res1: String = Hello Spark!
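With `lines` in hand, the classic next exercise is a word count. A sketch you can paste into the same spark-shell session (it assumes `lines` is still defined from Step 10):

```scala
// Split each line into words, pair each word with a count of 1,
// then sum the counts per word across all partitions
val counts = lines.flatMap(_.split(" "))
                  .map(word => (word, 1))
                  .reduceByKey(_ + _)

// collect() brings the result back to the driver; fine for tiny test
// files like this one, but avoid it on large datasets
counts.collect().foreach(println)
```

Note that `reduceByKey` triggers a shuffle, so this is also a first look at how Spark distributes work across partitions.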

    !!!!!!!!!!!!!!!!!!!!!!!!SUCCESSFUL!!!!!!!!!!!!!!!!!!!!!!!!


