Deploying a Python 3 Environment on a CDH Cluster and Running PySpark Jobs

Anaconda and Python version correspondence table

Anaconda2/3    Python 3    Python 2
5.2.0          3.6.5       2.7.14
5.1.0          3.6.4       2.7.14
5.0.1          3.6.3       2.7.14
5.0.0          3.6.2       2.7.13
4.4.0          3.6.1       2.7.13
4.3.1          3.6.0       2.7.13
4.3.0          3.6.0       2.7.13
4.2.0          3.5.2       2.7.12
4.1.1          3.5.2       2.7.12
4.1.0          3.5.1       2.7.11
4.0.0          3.5.1       2.7.11

  1. Download the Anaconda installer

    wget https://repo.continuum.io/archive/Anaconda3-4.4.0-Linux-x86_64.sh
    
  2. Install Anaconda

    bash Anaconda3-4.4.0-Linux-x86_64.sh

    Press ENTER when prompted:

    [root@node00 ~]# bash Anaconda3-4.4.0-Linux-x86_64.sh 
    
    Welcome to Anaconda3 4.4.0 (by Continuum Analytics, Inc.)
    
    In order to continue the installation process, please review the license
    agreement.
    Please, press ENTER to continue
    >>>                                               # (press ENTER)
    ===================================
    Anaconda End User License Agreement
    ===================================
    

    Type yes to accept the license terms:

    Copyright 2017, Continuum Analytics, Inc.
    ...                                               # (output omitted)
    kerberos (krb5, non-Windows platforms)
    A network authentication protocol designed to provide strong authentication
    for client/server applications by using secret-key cryptography.
    
    cryptography
    A Python library which exposes cryptographic recipes and primitives.
    
    Do you approve the license terms? [yes|no]
    >>> yes                                           # type yes
    Anaconda3 will now be installed into this location:
    /root/anaconda3
    

    Enter the installation path /opt/cloudera/anaconda3.
    If the installer reports "tar (child): bzip2: Cannot exec: No such file or directory", install bzip2 first: sudo yum -y install bzip2

      - Press ENTER to confirm the location
      - Press CTRL-C to abort the installation
      - Or specify a different location below
    
    [/root/anaconda3] >>> /opt/cloudera/anaconda3         # enter the installation path /opt/cloudera/anaconda3
    PREFIX=/opt/cloudera/anaconda3
    installing: python-3.6.1-2 ...
    installing: _license-1.1-py36_1 ...
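
The bzip2 error mentioned above can be caught before the installer is started. The following is a minimal preflight sketch, not part of the original procedure; the list of tools checked (`bzip2`, `tar`, `bash`) and the helper name `missing_tools` are assumptions made for illustration.

```python
import shutil


def missing_tools(tools=("bzip2", "tar", "bash"), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the function can be exercised without
    touching the real system; by default it is shutil.which.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        # On the CentOS/RHEL hosts used in this post, bzip2 would be
        # installed with: sudo yum -y install bzip2
        print("Install these before running the Anaconda installer:", missing)
    else:
        print("All required tools are available.")
```

Running this on each node before step 2 reports any missing tool instead of letting the installer fail while unpacking.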
    

    Set the PATH for Anaconda:
    To make sure PySpark jobs use Python 3 once they are submitted, answer no at this prompt and set PATH manually afterwards.

    installing: alabaster-0.7.10-py36_0 ...
    ...                                               # (output omitted)
    installing: zlib-1.2.8-3 ...
    installing: anaconda-4.4.0-np112py36_0 ...
    installing: conda-4.3.21-py36_0 ...
    installing: conda-env-2.6.0-0 ...
    Python 3.6.1 :: Continuum Analytics, Inc.
    creating default environment...
    installation finished.
    Do you wish the installer to prepend the Anaconda3 install location
    to PATH in your /root/.bashrc ? [yes|no]
    [no] >>> no                                       # type no
    
    You may wish to edit your .bashrc or prepend the Anaconda3 install location:
    
    $ export PATH=/opt/cloudera/anaconda3/bin:$PATH
    
    Thank you for installing Anaconda3!
    
    Share your notebooks and packages on Anaconda Cloud!
    Sign up for free: https://anaconda.org
    
    
  3. Set the Anaconda3 environment variable

    [root@node00 ~]# echo 'export PATH=/opt/cloudera/anaconda3/bin:$PATH' >> /etc/profile
    [root@node00 ~]# source /etc/profile
    [root@node00 ~]# env |grep PATH
    PATH=/opt/cloudera/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
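
Whether the Anaconda directory really comes first on PATH can also be checked programmatically, which is convenient when the same setup is repeated on several nodes. This is a minimal sketch assuming the install prefix from step 2; the helper names are hypothetical and not part of any Anaconda or Spark tooling.

```python
import os
import shutil
import sys

# Assumption: the install prefix chosen in step 2.
ANACONDA_BIN = "/opt/cloudera/anaconda3/bin"


def anaconda_comes_first(path_value, anaconda_bin=ANACONDA_BIN):
    """True when anaconda_bin is the first non-empty entry of a PATH string."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    if not entries:
        return False
    return os.path.normpath(entries[0]) == os.path.normpath(anaconda_bin)


def first_python_on_path(path_value):
    """Return the full path of the first `python` found on path_value, or None."""
    return shutil.which("python", path=path_value)


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    print("Anaconda first on PATH:", anaconda_comes_first(current))
    print("python resolves to:", first_python_on_path(current))
    print("running interpreter:", sys.executable)
```

If the first line prints False, a later entry in /etc/profile or a user's shell startup file is probably placing another directory ahead of the Anaconda one.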
    
  4. Verify the Python version

    [root@node00 ~]# python
    Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:09:58) 
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    

    [root@node00 ~]# python -V
    Python 3.6.1 :: Anaconda 4.4.0 (64-bit)
    
  5. Configure Spark's Python environment in Cloudera Manager (CM)

    export PYSPARK_PYTHON=/opt/cloudera/anaconda3/bin/python
    export PYSPARK_DRIVER_PYTHON=/opt/cloudera/anaconda3/bin/python
    

    Restart the affected services.
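
Once the two variables are in place, a short check run on each node can confirm that they point at usable interpreters. The sketch below is an illustration under assumptions: the function name `check_pyspark_python` is hypothetical, and the version comparison reflects PySpark's usual requirement that driver and worker Python share the same major.minor version.

```python
import os
import subprocess
import sys

VARIABLES = ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")
# One-liner executed by each interpreter to report its major.minor version.
VERSION_SNIPPET = "import sys; print('%d.%d' % sys.version_info[:2])"


def interpreter_version(path):
    """Ask the interpreter at `path` for its major.minor version string."""
    output = subprocess.check_output([path, "-c", VERSION_SNIPPET],
                                     universal_newlines=True)
    return output.strip()


def check_pyspark_python(env=None):
    """Return a list of problems with the PySpark interpreter settings."""
    env = os.environ if env is None else env
    problems = []
    versions = {}
    for name in VARIABLES:
        path = env.get(name)
        if not path:
            problems.append("%s is not set" % name)
        elif not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append("%s=%s is not an executable file" % (name, path))
        else:
            versions[name] = interpreter_version(path)
    # Only compare versions when both interpreters could be queried.
    if len(versions) == 2 and len(set(versions.values())) > 1:
        problems.append("driver and worker Python versions differ: %r" % versions)
    return problems


if __name__ == "__main__":
    found = check_pyspark_python()
    print("OK" if not found else "\n".join(found))
    print("checked with interpreter:", sys.executable)
```

An empty result means both variables are set, both files are executable, and both report the same Python version.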

  6. Test with the pyspark shell

    x = sc.parallelize([1,2,3])
    y = x.flatMap(lambda x: (x, 100*x, x**2))
    print(x.collect())
    print(y.collect())
    
    root@bigdata-dev-41:/home/charles# pyspark
    Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:09:58) 
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel).
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 1.6.0
          /_/
    
    Using Python version 3.6.1 (default, May 11 2017 13:09:58)
    SparkContext available as sc, HiveContext available as sqlContext.
    >>> x = sc.parallelize([1,2,3])
    >>> y = x.flatMap(lambda x: (x, 100*x, x**2))
    >>> print(x.collect())
    [1, 2, 3]                                                                       
    >>> print(y.collect())
    [1, 100, 1, 2, 200, 4, 3, 300, 9]                                               
    >>> 
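
The result of `y.collect()` is a flat list rather than a list of tuples because `flatMap` flattens each returned tuple by one level. The following plain-Python sketch mimics that behaviour locally without Spark; `flat_map` and `plain_map` are hypothetical stand-ins for illustration, not PySpark APIs.

```python
from itertools import chain


def plain_map(func, items):
    """Local stand-in for RDD.map: one output element per input element."""
    return [func(item) for item in items]


def flat_map(func, items):
    """Local stand-in for RDD.flatMap: apply func, then flatten one level."""
    return list(chain.from_iterable(func(item) for item in items))


def expand(v):
    """Same expansion as the lambda used in the pyspark session."""
    return (v, 100 * v, v ** 2)


data = [1, 2, 3]
print(plain_map(expand, data))  # [(1, 100, 1), (2, 200, 4), (3, 300, 9)]
print(flat_map(expand, data))   # [1, 100, 1, 2, 200, 4, 3, 300, 9]
```

The second line matches the output of the pyspark session, which confirms that the job ran the lambda under Python 3 and returned the expected nine values.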
    
    

    Reference:
    https://mp.weixin.qq.com/s?__biz=MzI4OTY3MTUyNg==&mid=2247496668&idx=1&sn=4461854378270ea0741e91047a541b9b&chksm=ec2923d5db5eaac30108f19e44ea763ea6e06f26089437ef9b6a1f44204ed1e259efd2fc59ef&scene=21#wechat_redirect
