Installing Python 3 on CDH 6.3.2

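Background: I need to use PySpark or plain Python to read remote files automatically, but the CDH cluster ships with Python 2.7.5. Python 3 is where things are heading, so I decided to install Python 3 myself. The steps below follow guides found online, and I ran every one of them myself.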

1.1 System version information

[root@cdh06 soft]# lsb_release -a
LSB Version:	:core-4.1-amd64:core-4.1-noarch
Distributor ID:	CentOS
Description:	CentOS Linux release 7.6.1810 (Core) 
Release:	7.6.1810
Codename:	Core

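1.2 Spark and Python information
The environment is configured on the CDH platform. Only one Spark version is installed, which the system reports as 2.4.0, and the Python that ships with the system is 2.7.5.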

[root@cdh06 soft]# pyspark
Python 2.7.5 (default, Jun 28 2022, 15:30:04) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.3.2
      /_/

Using Python version 2.7.5 (default, Jun 28 2022 15:30:04)
SparkSession available as 'spark'.
>>>
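  2. Install the Python 3.6 environment
    As far as I could find, PySpark on this release supports up to Python 3.6, so this guide installs Python 3.6.
    These steps must be performed on both the master and the slave nodes.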

2.1 Install yum-utils
yum-utils is a set of extension utilities for yum.
This assumes yum is already installed on the machine.
sudo yum -y install yum-utils

[root@cdh06 soft]# sudo yum -y install yum-utils
已加载插件:fastestmirror, langpacks
Repository cloudera-manager is listed more than once in the configuration
Determining fastest mirrors
 * base: mirrors.cn99.com
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
base                                                                                                                                            | 3.6 kB  00:00:00     
cloudera-manager                                                                                                                                | 2.9 kB  00:00:00     
extras                                                                                                                                          | 2.9 kB  00:00:00     
updates                                                                                                                                         | 2.9 kB  00:00:00     
(1/2): extras/7/x86_64/primary_db                                                                                                               | 250 kB  00:00:00     
(2/2): updates/7/x86_64/primary_db                                                                                                              |  17 MB  00:00:01     
正在解决依赖关系
--> 正在检查事务
---> 软件包 yum-utils.noarch.0.1.1.31-50.el7 将被 升级
---> 软件包 yum-utils.noarch.0.1.1.31-54.el7_8 将被 更新
--> 解决依赖关系完成

依赖关系解决

=======================================================================================================================================================================
 Package                                 架构                                 版本                                            源                                  大小
=======================================================================================================================================================================
正在更新:
 yum-utils                               noarch                               1.1.31-54.el7_8                                 base                               122 k

事务概要
=======================================================================================================================================================================
升级  1 软件包

总下载量:122 k
Downloading packages:
No Presto metadata available for base
yum-utils-1.1.31-54.el7_8.noarch.rpm                                                                                                            | 122 kB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  正在更新    : yum-utils-1.1.31-54.el7_8.noarch                                                                                                                   1/2 
  清理        : yum-utils-1.1.31-50.el7.noarch                                                                                                                     2/2 
  验证中      : yum-utils-1.1.31-54.el7_8.noarch                                                                                                                   1/2 
  验证中      : yum-utils-1.1.31-50.el7.noarch                                                                                                                     2/2 

更新完毕:
  yum-utils.noarch 0:1.1.31-54.el7_8                                                                                                                                   

完毕!

2.2 Install the CentOS development tools
This package group provides the tools needed to compile code.

sudo yum -y groupinstall development

[root@cdh06 soft]# sudo yum -y groupinstall development
已加载插件:fastestmirror, langpacks
Repository cloudera-manager is listed more than once in the configuration
没有安装组信息文件
Maybe run: yum groups mark convert (see man yum)
Loading mirror speeds from cached hostfile
 * base: mirrors.cn99.com
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
正在解决依赖关系
--> 正在检查事务
---> 软件包 autoconf.noarch.0.2.69-11.el7 将被 安装
--> 正在处理依赖关系 perl(Data::Dumper),它被软件包 autoconf-2.69-11.el7.noarch 需要
---> 软件包 automake.noarch.0.1.13.4-3.el7 将被 安装
--> 正在处理依赖关系 perl(Thread::Queue),它被软件包 automake-1.13.4-3.el7.noarch 需要
--> 正在处理依赖关系 perl(TAP::Parser),它被软件包 automake-1.13.4-3.el7.noarch 需要
---> 软件包 bison.x86_64.0.3.0.4-2.el7 将被 安装
---> 软件包 byacc.x86_64.0.1.9.20130304-3.el7 将被 安装
---> 软件包 cscope.x86_64.0.15.8-10.el7 将被 安装
---> 软件包 ctags.x86_64.0.5.8-13.el7 将被 安装
---> 软件包 diffstat.x86_64.0.1.57-4.el7 将被 安装
---> 软件包 doxygen.x86_64.1.1.8.5-4.el7 将被 安装
---> 软件包 flex.x86_64.0.2.5.37-6.el7 将被 安装
---> 软件包 gcc.x86_64.0.4.8.5-44.el7 将被 安装
--> 正在处理依赖关系 libgomp = 4.8.5-44.el7,它被软件包 gcc-4.8.5-44.el7.x86_64 需要
--> 正在处理依赖关系 cpp = 4.8.5-44.el7,它被软件包 gcc-4.8.5-44.el7.x86_64 需要
--> 正在处理依赖关系 libgcc >= 4.8.5-44.el7,它被软件包 gcc-4.8.5-44.el7.x86_64 需要
--> 正在处理依赖关系 glibc-devel >= 2.2.90-12,它被软件包 gcc-4.8.5-44.el7.x86_64 需要
---> 软件包 gcc-c++.x86_64.0.4.8.5-44.el7 将被 安装
--> 正在处理依赖关系 libstdc++-devel = 4.8.5-44.el7,它被软件包 gcc-c++-4.8.5-44.el7.x86_64 需要
--> 正在处理依赖关系 libstdc++ = 4.8.5-44.el7,它被软件包 gcc-c++-4.8.5-44.el7.x86_64 需要
---> 软件包 gcc-gfortran.x86_64.0.4.8.5-44.el7 将被 安装
--> 正在处理依赖关系 libquadmath-devel = 4.8.5-44.el7,它被软件包 gcc-gfortran-4.8.5-44.el7.x86_64 需要
--> 正在处理依赖关系 libquadmath = 4.8.5-44.el7,它被软件包 gcc-gfortran-4.8.5-44.el7.x86_64 需要
--> 正在处理依赖关系 libgfortran = 4.8.5-44.el7,它被软件包 gcc-gfortran-4.8.5-44.el7.x86_64 需要
--> 正在处理依赖关系 libgfortran.so.3()(64bit),它被软件包 gcc-gfortran-4.8.5-44.el7.x86_64 需要
---> 软件包 git.x86_64.0.1.8.3.1-23.el7_8 将被 安装
--> 正在处理依赖关系 perl-Git = 1.8.3.1-23.el7_8,它被软件包 git-1.8.3.1-23.el7_8.x86_64 需要
--> 正在处理依赖关系 perl(Term::ReadKey),它被软件包 git-1.8.3.1-23.el7_8.x86_64 需要
--> 正在处理依赖关系 perl(Git),它被软件包 git-1.8.3.1-23.el7_8.x86_64 需要
--> 正在处理依赖关系 perl(Error),它被软件包 git-1.8.3.1-23.el7_8.x86_64 需要
---> 软件包 indent.x86_64.0.2.2.11-13.el7 将被 安装
---> 软件包 intltool.noarch.0.0.50.2-7.el7 将被 安装
--> 正在处理依赖关系 perl(XML::Parser),它被软件包 intltool-0.50.2-7.el7.noarch 需要
--> 正在处理依赖关系 gettext-devel,它被软件包 intltool-0.50.2-7.el7.noarch 需要
---> 软件包 libtool.x86_64.0.2.4.2-22.el7_3 将被 安装
........
// 24 packages need to be downloaded in total; 10 packages need to be updated

................

已安装:
  autoconf.noarch 0:2.69-11.el7         automake.noarch 0:1.13.4-3.el7        bison.x86_64 0:3.0.4-2.el7                         byacc.x86_64 0:1.9.20130304-3.el7     
  cscope.x86_64 0:15.8-10.el7           ctags.x86_64 0:5.8-13.el7             diffstat.x86_64 0:1.57-4.el7                       doxygen.x86_64 1:1.8.5-4.el7          
  flex.x86_64 0:2.5.37-6.el7            gcc.x86_64 0:4.8.5-44.el7             gcc-c++.x86_64 0:4.8.5-44.el7                      gcc-gfortran.x86_64 0:4.8.5-44.el7    
  git.x86_64 0:1.8.3.1-23.el7_8         indent.x86_64 0:2.2.11-13.el7         intltool.noarch 0:0.50.2-7.el7                     libtool.x86_64 0:2.4.2-22.el7_3       
  patchutils.x86_64 0:0.3.3-5.el7_9     rcs.x86_64 0:5.9.0-7.el7              redhat-rpm-config.noarch 0:9.1.0-88.el7.centos     rpm-build.x86_64 0:4.11.3-48.el7_9    
  rpm-sign.x86_64 0:4.11.3-48.el7_9     subversion.x86_64 0:1.7.14-16.el7     swig.x86_64 0:2.0.10-5.el7                         systemtap.x86_64 0:4.0-13.el7         

作为依赖被安装:
  cpp.x86_64 0:4.8.5-44.el7                                 dwz.x86_64 0:0.11-3.el7                               gettext-common-devel.noarch 0:0.19.8.1-3.el7         
  gettext-devel.x86_64 0:0.19.8.1-3.el7                     glibc-devel.x86_64 0:2.17-326.el7_9                   glibc-headers.x86_64 0:2.17-326.el7_9                
  kernel-debug-devel.x86_64 0:3.10.0-1160.76.1.el7          kernel-headers.x86_64 0:3.10.0-1160.76.1.el7          libgfortran.x86_64 0:4.8.5-44.el7                    
  libquadmath.x86_64 0:4.8.5-44.el7                         libquadmath-devel.x86_64 0:4.8.5-44.el7               libstdc++-devel.x86_64 0:4.8.5-44.el7                
  perl-Data-Dumper.x86_64 0:2.145-3.el7                     perl-Error.noarch 1:0.17020-2.el7                     perl-Git.noarch 0:1.8.3.1-23.el7_8                   
  perl-TermReadKey.x86_64 0:2.30-20.el7                     perl-Test-Harness.noarch 0:3.28-3.el7                 perl-Thread-Queue.noarch 0:3.02-2.el7                
  perl-XML-Parser.x86_64 0:2.41-10.el7                      perl-srpm-macros.noarch 0:1-8.el7                     subversion-libs.x86_64 0:1.7.14-16.el7               
  systemtap-client.x86_64 0:4.0-13.el7                      systemtap-devel.x86_64 0:4.0-13.el7                  

作为依赖被升级:
  gettext.x86_64 0:0.19.8.1-3.el7         gettext-libs.x86_64 0:0.19.8.1-3.el7      libgcc.x86_64 0:4.8.5-44.el7                libgomp.x86_64 0:4.8.5-44.el7        
  libstdc++.x86_64 0:4.8.5-44.el7         rpm.x86_64 0:4.11.3-48.el7_9              rpm-build-libs.x86_64 0:4.11.3-48.el7_9     rpm-libs.x86_64 0:4.11.3-48.el7_9    
  rpm-python.x86_64 0:4.11.3-48.el7_9     systemtap-runtime.x86_64 0:4.0-13.el7    

完毕!

2.3 Install the IUS third-party repository
Installing this package lets yum pick up newer software versions when installing software.
sudo yum -y install https://repo.ius.io/ius-release-el7.rpm

[root@cdh06 soft]# sudo yum -y install https://repo.ius.io/ius-release-el7.rpm
已加载插件:fastestmirror, langpacks
Repository cloudera-manager is listed more than once in the configuration
ius-release-el7.rpm                                                                                                                             | 8.2 kB  00:00:00     
正在检查 /var/tmp/yum-root-udneDv/ius-release-el7.rpm: ius-release-2-1.el7.ius.noarch
/var/tmp/yum-root-udneDv/ius-release-el7.rpm 将被安装
正在解决依赖关系
--> 正在检查事务
---> 软件包 ius-release.noarch.0.2-1.el7.ius 将被 安装
--> 正在处理依赖关系 epel-release = 7,它被软件包 ius-release-2-1.el7.ius.noarch 需要
Loading mirror speeds from cached hostfile
 * base: mirrors.cn99.com
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
--> 正在检查事务
---> 软件包 epel-release.noarch.0.7-11 将被 安装
--> 解决依赖关系完成

依赖关系解决

=======================================================================================================================================================================
 Package                                 架构                              版本                                      源                                           大小
=======================================================================================================================================================================
正在安装:
 ius-release                             noarch                            2-1.el7.ius                               /ius-release-el7                            4.5 k
为依赖而安装:
 epel-release                            noarch                            7-11                                      extras                                       15 k

事务概要
=======================================================================================================================================================================
安装  1 软件包 (+1 依赖软件包)

总计:19 k
总下载量:15 k
安装大小:29 k
Downloading packages:
epel-release-7-11.noarch.rpm                                                                                                                    |  15 kB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  正在安装    : epel-release-7-11.noarch                                                                                                                           1/2 
  正在安装    : ius-release-2-1.el7.ius.noarch                                                                                                                     2/2 
  验证中      : ius-release-2-1.el7.ius.noarch                                                                                                                     1/2 
  验证中      : epel-release-7-11.noarch                                                                                                                           2/2 

已安装:
  ius-release.noarch 0:2-1.el7.ius                                                                                                                                     

作为依赖被安装:
  epel-release.noarch 0:7-11                                                                                                                                           

完毕!

2.4 Install Python

1. Install Python
sudo yum -y install python36u

[root@cdh06 soft]# sudo yum -y install python36u
已加载插件:fastestmirror, langpacks
Repository cloudera-manager is listed more than once in the configuration
Loading mirror speeds from cached hostfile
epel/x86_64/metalink                                                                                                                            | 6.1 kB  00:00:00     
 * base: mirrors.cn99.com
 * epel: ftp.riken.jp
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
epel                                                                                                                                            | 4.7 kB  00:00:00     
ius                                                                                                                                             | 1.3 kB  00:00:00     
epel/x86_64/primary_db         FAILED                                          
https://mirror.misakamikoto.network/fedora-epel/7/x86_64/repodata/7e09d0257e4d6d597cc84629bac5836c3789baf6aff6a46a7e0e6f1404a260b6-primary.sqlite.bz2: [Errno 14] HTTPS Error 404 - Not Found
正在尝试其它镜像。
To address this issue please refer to the below wiki article 

https://wiki.centos.org/yum-errors

If above article doesn't help to resolve this issue please use https://bugs.centos.org/.

(1/4): epel/x86_64/group_gz                                                                                                                     |  97 kB  00:00:00     
(2/4): ius/x86_64/primary                                                                                                                       |  55 kB  00:00:00     
(3/4): epel/x86_64/updateinfo                                                                                                                   | 1.1 MB  00:00:01     
(4/4): epel/x86_64/primary_db                                                                                                                   | 7.0 MB  00:00:03     
ius                                                                                                                                                            217/217
软件包 python36 已经被 python3 取代,改为尝试安装 python3-3.6.8-18.el7.x86_64
正在解决依赖关系
--> 正在检查事务
---> 软件包 python3.x86_64.0.3.6.8-18.el7 将被 安装
--> 正在处理依赖关系 python3-libs(x86-64) = 3.6.8-18.el7,它被软件包 python3-3.6.8-18.el7.x86_64 需要
--> 正在处理依赖关系 python3-setuptools,它被软件包 python3-3.6.8-18.el7.x86_64 需要
--> 正在处理依赖关系 python3-pip,它被软件包 python3-3.6.8-18.el7.x86_64 需要
--> 正在处理依赖关系 libpython3.6m.so.1.0()(64bit),它被软件包 python3-3.6.8-18.el7.x86_64 需要
--> 正在检查事务
---> 软件包 python3-libs.x86_64.0.3.6.8-18.el7 将被 安装
---> 软件包 python3-pip.noarch.0.9.0.3-8.el7 将被 安装
---> 软件包 python3-setuptools.noarch.0.39.2.0-10.el7 将被 安装
--> 解决依赖关系完成

依赖关系解决

=======================================================================================================================================================================
 Package                                        架构                               版本                                      源                                   大小
=======================================================================================================================================================================
正在安装:
 python3                                        x86_64                             3.6.8-18.el7                              updates                              70 k
为依赖而安装:
 python3-libs                                   x86_64                             3.6.8-18.el7                              updates                             6.9 M
 python3-pip                                    noarch                             9.0.3-8.el7                               base                                1.6 M
 python3-setuptools                             noarch                             39.2.0-10.el7                             base                                629 k

事务概要
=======================================================================================================================================================================
安装  1 软件包 (+3 依赖软件包)

总下载量:9.3 M
安装大小:47 M
Downloading packages:
(1/4): python3-3.6.8-18.el7.x86_64.rpm                                                                                                          |  70 kB  00:00:00     
(2/4): python3-pip-9.0.3-8.el7.noarch.rpm                                                                                                       | 1.6 MB  00:00:00     
(3/4): python3-setuptools-39.2.0-10.el7.noarch.rpm                                                                                              | 629 kB  00:00:00     
(4/4): python3-libs-3.6.8-18.el7.x86_64.rpm                                                                                                     | 6.9 MB  00:00:02     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
总计                                                                                                                                   3.4 MB/s | 9.3 MB  00:00:02     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  正在安装    : python3-libs-3.6.8-18.el7.x86_64                                                                                                                   1/4 
  正在安装    : python3-3.6.8-18.el7.x86_64                                                                                                                        2/4 
  正在安装    : python3-setuptools-39.2.0-10.el7.noarch                                                                                                            3/4 
  正在安装    : python3-pip-9.0.3-8.el7.noarch                                                                                                                     4/4 
  验证中      : python3-setuptools-39.2.0-10.el7.noarch                                                                                                            1/4 
  验证中      : python3-libs-3.6.8-18.el7.x86_64                                                                                                                   2/4 
  验证中      : python3-3.6.8-18.el7.x86_64                                                                                                                        3/4 
  验证中      : python3-pip-9.0.3-8.el7.noarch                                                                                                                     4/4 

已安装:
  python3.x86_64 0:3.6.8-18.el7                                                                                                                                        

作为依赖被安装:
  python3-libs.x86_64 0:3.6.8-18.el7                   python3-pip.noarch 0:9.0.3-8.el7                   python3-setuptools.noarch 0:39.2.0-10.el7                  

完毕!

2. Check the Python version

[root@cdh06 soft]# python3.6 -V
Python 3.6.8
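
The same check can be scripted, which helps when it has to be repeated on every node. The following is a minimal sketch, assuming the version string has the `Python X.Y.Z` form shown above; the function names are illustrative and not part of any tool used in this post.

```python
import re


def parse_python_version(output):
    """Parse the text printed by `python3.6 -V` into a tuple of ints."""
    match = re.match(r"Python (\d+)\.(\d+)\.(\d+)", output.strip())
    if match is None:
        raise ValueError("unexpected version string: %r" % output)
    return tuple(int(part) for part in match.groups())


def is_expected_minor(output, expected=(3, 6)):
    """Return True when the major.minor part equals the expected pair."""
    return parse_python_version(output)[:2] == expected


print(is_expected_minor("Python 3.6.8"))  # the newly installed interpreter
print(is_expected_minor("Python 2.7.5"))  # the interpreter bundled with the system
```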

3. Next, install python36u-devel, which provides the libraries and header files for Python 3.
sudo yum -y install python36u-devel

[root@cdh06 soft]# sudo yum -y install python36u-devel
已加载插件:fastestmirror, langpacks
Repository cloudera-manager is listed more than once in the configuration
Loading mirror speeds from cached hostfile
 * base: mirrors.cn99.com
 * epel: ftp.riken.jp
 * extras: ftp.sjtu.edu.cn
 * updates: mirrors.aliyun.com
软件包 python36-devel 已经被 python3-devel 取代,改为尝试安装 python3-devel-3.6.8-18.el7.x86_64
正在解决依赖关系
--> 正在检查事务
---> 软件包 python3-devel.x86_64.0.3.6.8-18.el7 将被 安装
--> 正在处理依赖关系 python3-rpm-macros,它被软件包 python3-devel-3.6.8-18.el7.x86_64 需要
--> 正在处理依赖关系 python3-rpm-generators,它被软件包 python3-devel-3.6.8-18.el7.x86_64 需要
--> 正在检查事务
---> 软件包 python3-rpm-generators.noarch.0.6-2.el7 将被 安装
---> 软件包 python3-rpm-macros.noarch.0.3-34.el7 将被 安装
--> 解决依赖关系完成

依赖关系解决

=======================================================================================================================================================================
 Package                                           架构                              版本                                     源                                  大小
=======================================================================================================================================================================
正在安装:
 python3-devel                                     x86_64                            3.6.8-18.el7                             updates                            217 k
为依赖而安装:
 python3-rpm-generators                            noarch                            6-2.el7                                  base                                20 k
 python3-rpm-macros                                noarch                            3-34.el7                                 base                               8.1 k

事务概要
=======================================================================================================================================================================
安装  1 软件包 (+2 依赖软件包)

总下载量:244 k
安装大小:678 k
Downloading packages:
(1/3): python3-rpm-macros-3-34.el7.noarch.rpm                                                                                                   | 8.1 kB  00:00:00     
(2/3): python3-devel-3.6.8-18.el7.x86_64.rpm                                                                                                    | 217 kB  00:00:00     
(3/3): python3-rpm-generators-6-2.el7.noarch.rpm                                                                                                |  20 kB  00:00:00     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
总计                                                                                                                                   1.1 MB/s | 244 kB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  正在安装    : python3-rpm-generators-6-2.el7.noarch                                                                                                              1/3 
  正在安装    : python3-rpm-macros-3-34.el7.noarch                                                                                                                 2/3 
  正在安装    : python3-devel-3.6.8-18.el7.x86_64                                                                                                                  3/3 
  验证中      : python3-devel-3.6.8-18.el7.x86_64                                                                                                                  1/3 
  验证中      : python3-rpm-macros-3-34.el7.noarch                                                                                                                 2/3 
  验证中      : python3-rpm-generators-6-2.el7.noarch                                                                                                              3/3 

已安装:
  python3-devel.x86_64 0:3.6.8-18.el7                                                                                                                                  

作为依赖被安装:
  python3-rpm-generators.noarch 0:6-2.el7                                             python3-rpm-macros.noarch 0:3-34.el7                                            

完毕!
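
Because steps 2.1 to 2.4 have to be repeated on every master and slave node, it can help to generate the command lines once and review them before running anything. The sketch below only builds the strings and does not execute them. The node list and the use of passwordless ssh as root are assumptions made for illustration.

```python
import shlex

# Commands taken from steps 2.1 to 2.4 of this post, in order.
INSTALL_STEPS = [
    "sudo yum -y install yum-utils",
    "sudo yum -y groupinstall development",
    "sudo yum -y install https://repo.ius.io/ius-release-el7.rpm",
    "sudo yum -y install python36u",
    "sudo yum -y install python36u-devel",
]


def build_ssh_commands(hosts, steps=INSTALL_STEPS, user="root"):
    """Return one ssh command line per (host, step) pair without running it."""
    commands = []
    for host in hosts:
        for step in steps:
            commands.append("ssh %s@%s %s" % (user, host, shlex.quote(step)))
    return commands


# Hypothetical node list; replace it with the real master and slave hostnames.
for line in build_ssh_commands(["cdh05", "cdh06"]):
    print(line)
```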

2.5 Configure the Python environment (important!)

  1. Virtual environment (recommended)
    Use the venv module
[root@cdh06 soft]# python3.6 -m venv py3
[root@cdh06 soft]# source py3/bin/activate
(py3) [root@cdh06 soft]# python -V
Python 3.6.8
(py3) [root@cdh06 soft]# deactivate
[root@cdh06 soft]# 
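
The same environment can also be created from Python through the standard-library `venv` module. This is a sketch rather than a replacement for the command above: it passes `with_pip=False`, which is an assumption made here so that it runs without network access, whereas `python3.6 -m venv py3` installs pip by default.

```python
import os
import tempfile
import venv


def create_env(target_dir):
    """Create a virtual environment in target_dir and return its pyvenv.cfg path."""
    # with_pip=False skips bootstrapping pip; drop it to mirror the CLI default.
    builder = venv.EnvBuilder(with_pip=False)
    builder.create(target_dir)
    return os.path.join(target_dir, "pyvenv.cfg")


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(os.path.join(tmp, "py3"))
    print("created:", os.path.exists(cfg))
```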

Modify the environment variables:

[root@cdh06 soft]# vim /etc/profile
## Add the following line
export PYSPARK_PYTHON=python3
[root@cdh06 soft]# source /etc/profile
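
Before launching pyspark, it may be worth confirming that the name stored in PYSPARK_PYTHON actually resolves to an executable on the node. A minimal sketch using only the standard library follows; the helper name `resolve_pyspark_python` is hypothetical.

```python
import os
import shutil


def resolve_pyspark_python(environ=None):
    """Return the absolute path that PYSPARK_PYTHON resolves to, or None."""
    environ = os.environ if environ is None else environ
    name = environ.get("PYSPARK_PYTHON")
    if not name:
        return None
    # shutil.which searches PATH for the name, much like the shell does.
    return shutil.which(name, path=environ.get("PATH"))


print(resolve_pyspark_python())
```

A result of `None` means either the variable is not set in the current shell or the interpreter it names is not on PATH.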

2. Verify pyspark:

[root@cdh06 soft]# pyspark
Python 3.6.8 (default, Nov 16 2020, 16:55:22) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/09/23 17:15:27 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.3.2
      /_/

Using Python version 3.6.8 (default, Nov 16 2020 16:55:22)
SparkSession available as 'spark'.
>>> 

Success!

3. Configure the Python environment variable in CM

1. Set the path of the python command via export:
1.1 First look up the python3 path:

[root@cdh05 bin]# whereis python3
python3: /usr/bin/python3 /usr/bin/python3.6 /usr/bin/python3.6m /usr/bin/python3.6-config /usr/bin/python3.6m-config /usr/bin/python3.6m-x86_64-config /usr/lib/python3.6 /usr/lib64/python3.6 /usr/include/python3.6m /usr/share/man/man1/python3.1.gz

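1.2 Modify spark-env in the configuration so that it points to the python3 path found above.
After the change, return to the CM home page and restart the affected services as prompted.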

4. Test with the pyspark command
1. Obtain Kerberos credentials (omitted)
2. Test with the pyspark command

[root@cdh05 ~]# pyspark
Python 3.6.8 (default, Nov 16 2020, 16:55:22) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0-cdh6.3.2
      /_/

Using Python version 3.6.8 (default, Nov 16 2020 16:55:22)
SparkSession available as 'spark'.
>>> x = sc.parallelize([1,2,3])
>>> y = x.flatMap(lambda x: (x, 100*x, x**2))
>>> print(x.collect())
[1, 2, 3]                                                                       
>>> print(y.collect())
[1, 100, 1, 2, 200, 4, 3, 300, 9]                                               
>>> 
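
To see why `y.collect()` returns one flat list rather than three tuples, the flatMap step can be imitated in plain Python. This is only an illustration of the semantics on a local list, not how Spark executes it.

```python
from itertools import chain


def flat_map(func, items):
    """Apply func to each item and concatenate the resulting iterables in order."""
    return list(chain.from_iterable(func(item) for item in items))


x = [1, 2, 3]
y = flat_map(lambda v: (v, 100 * v, v ** 2), x)
print(y)  # [1, 100, 1, 2, 200, 4, 3, 300, 9]
```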

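5. Submit a PySpark job with spark-submit
This demo uses spark-submit to submit a PySpark job that reads data from HDFS, converts it to a DataFrame, registers it as a table, runs a SQL query with a filter condition, and writes the query result back to HDFS.

1. Create a test directory under /tmp on HDFS and upload the test data to /tmp/test/
Use the put command to upload the file
Use the cat command to view it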

[root@cdh02 ~]# hadoop fs -mkdir /tmp/test/
[root@cdh02 ~]# hadoop fs -put people.txt /tmp/test
[root@cdh02 ~]# hadoop fs -cat /tmp/test/people.txt
anand,14
oner,19
carol,14
job,17
mary,20
divid,20
Eric,16
Faerl,28
rice,25
kumar,30
zhuli,16
marfer,23
rakie,19

2. Upload the PySpark program to a node in the CDH cluster that has the Spark Gateway role and Python 3 deployed

PySparkTest_to_HDFS.py is in the pysparktest directory, with the following content:

# Initialize the SQLContext
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext, Row

conf = SparkConf().setAppName('PySparkTest_to_HDFS')
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load the text file and convert each line to a Row.
lines = sc.textFile("/tmp/test/people.txt")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

# Build a DataFrame from the rows and register it as a temporary view.
# createOrReplaceTempView replaces registerTempTable, deprecated since Spark 2.0.
schemaPeople = sqlContext.createDataFrame(people)
schemaPeople.createOrReplaceTempView("people")

# Run the SQL query: select people aged 13 to 19 inclusive.
teenagers = sqlContext.sql("SELECT name,age FROM people WHERE age >= 13 AND age <= 19")

# Save the query result to HDFS (the default output format is Parquet unless configured otherwise).
teenagers.write.save("/tmp/test/teenagers")
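
To know what the job should produce before submitting it, the parse and filter steps can be replayed on the sample data in plain Python. This sketch needs no Spark installation. It embeds the contents of people.txt shown earlier, and its helper names are illustrative only.

```python
# Contents of people.txt as listed by `hadoop fs -cat` above.
SAMPLE = """anand,14
oner,19
carol,14
job,17
mary,20
divid,20
Eric,16
Faerl,28
rice,25
kumar,30
zhuli,16
marfer,23
rakie,19"""


def parse_people(text):
    """Split each 'name,age' line into a (name, int age) pair."""
    people = []
    for line in text.splitlines():
        name, age = line.split(",")
        people.append((name, int(age)))
    return people


def select_teenagers(people, low=13, high=19):
    """Keep the rows whose age lies in [low, high], like the SQL WHERE clause."""
    return [(name, age) for name, age in people if low <= age <= high]


for name, age in select_teenagers(parse_people(SAMPLE)):
    print(name, age)
```

With this sample, seven of the thirteen rows fall inside the 13 to 19 range, so the saved result should contain the same seven names.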

3. Submit the PySpark job to the cluster with spark-submit

[root@cdh02 ~]# spark-submit pyspark_to_hdfs.py 
22/09/23 19:01:29 INFO spark.SparkContext: Running Spark version 2.4.0-cdh6.3.2
22/09/23 19:01:29 INFO logging.DriverLogger: Added a local log appender at: /tmp/spark-effcdcd5-0278-4188-9baf-5c43b5d97666/__driver_logs__/driver.log
22/09/23 19:01:29 INFO spark.SparkContext: Submitted application: PySparkTest_to_HDFS
22/09/23 19:01:29 INFO spark.SecurityManager: Changing view acls to: root
22/09/23 19:01:29 INFO spark.SecurityManager: Changing modify acls to: root
22/09/23 19:01:29 INFO spark.SecurityManager: Changing view acls groups to: 
22/09/23 19:01:29 INFO spark.SecurityManager: Changing modify acls groups to: 
22/09/23 19:01:29 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/09/23 19:01:29 INFO util.Utils: Successfully started service 'sparkDriver' on port 39084.
22/09/23 19:01:29 INFO spark.SparkEnv: Registering MapOutputTracker
22/09/23 19:01:29 INFO spark.SparkEnv: Registering BlockManagerMaster
22/09/23 19:01:29 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/09/23 19:01:29 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/09/23 19:01:29 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-c25c60ee-4275-4735-af0c-b1f1d78f9a9d
22/09/23 19:01:29 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB
22/09/23 19:01:29 INFO spark.SparkEnv: Registering OutputCommitCoordinator
22/09/23 19:01:29 INFO util.log: Logging initialized @1640ms
22/09/23 19:01:29 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: 2018-09-05T05:11:46+08:00, git hash: 3ce520221d0240229c862b122d2b06c12a625732
22/09/23 19:01:30 INFO server.Server: Started @1701ms
22/09/23 19:01:30 INFO server.AbstractConnector: Started ServerConnector@4b6333cc{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/09/23 19:01:30 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@26a1432a{/jobs,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d4a97e9{/jobs/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5f614298{/jobs/job,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@237cfc5{/jobs/job/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1bdbe5b5{/stages,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4e7b11d4{/stages/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2aafbfca{/stages/stage,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@448bf3a6{/stages/stage/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@138d4512{/stages/pool,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@55344422{/stages/pool/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@42a707c7{/storage,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@59aadaf6{/storage/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2076a104{/storage/rdd,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5794e47c{/storage/rdd/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2d775876{/environment,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@22055a44{/environment/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37eeb181{/executors,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@47290404{/executors/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@508a507f{/executors/threadDump,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@58d5f750{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1af53772{/static,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@44594929{/,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3d9423fd{/api,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5a6bdfe7{/jobs/job/kill,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c7f6901{/stages/stage/kill,null,AVAILABLE,@Spark}
22/09/23 19:01:30 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://cdh02:4040
22/09/23 19:01:30 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
22/09/23 19:01:30 INFO util.Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
22/09/23 19:01:30 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
22/09/23 19:01:30 INFO conf.Configuration: resource-types.xml not found
22/09/23 19:01:30 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/09/23 19:01:30 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (41127 MB per container)
22/09/23 19:01:30 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
22/09/23 19:01:30 INFO yarn.Client: Setting up container launch context for our AM
22/09/23 19:01:30 INFO yarn.Client: Setting up the launch environment for our AM container
22/09/23 19:01:30 INFO yarn.Client: Preparing resources for our AM container
22/09/23 19:01:30 INFO yarn.Client: Uploading resource file:/tmp/spark-effcdcd5-0278-4188-9baf-5c43b5d97666/__spark_conf__2487934838763565949.zip -> hdfs://nameservice1/user/root/.sparkStaging/application_1660017172277_0243/__spark_conf__.zip
22/09/23 19:01:31 INFO spark.SecurityManager: Changing view acls to: root
22/09/23 19:01:31 INFO spark.SecurityManager: Changing modify acls to: root
22/09/23 19:01:31 INFO spark.SecurityManager: Changing view acls groups to: 
22/09/23 19:01:31 INFO spark.SecurityManager: Changing modify acls groups to: 
22/09/23 19:01:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/09/23 19:01:31 INFO yarn.Client: Submitting application application_1660017172277_0243 to ResourceManager
22/09/23 19:01:31 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
22/09/23 19:01:31 INFO impl.YarnClientImpl: Submitted application application_1660017172277_0243
22/09/23 19:01:32 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
22/09/23 19:01:32 INFO yarn.Client: Application report for application_1660017172277_0243 (state: ACCEPTED)
22/09/23 19:01:32 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: AM container is launched, waiting for AM container to Register with RM
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: root.users.root
	 start time: 1663930891063
	 final status: UNDEFINED
	 tracking URL: http://cdh02:8088/proxy/application_1660017172277_0243/
	 user: root
22/09/23 19:01:33 INFO yarn.SparkRackResolver: Got an error when resolving hostNames. Falling back to /default-rack for all
22/09/23 19:01:33 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> cdh02,cdh03, PROXY_URI_BASES -> http://cdh02:8088/proxy/application_1660017172277_0243,http://cdh03:8088/proxy/application_1660017172277_0243, RM_HA_URLS -> cdh02:8088,cdh03:8088), /proxy/application_1660017172277_0243
22/09/23 19:01:33 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
22/09/23 19:01:33 INFO yarn.Client: Application report for application_1660017172277_0243 (state: RUNNING)
22/09/23 19:01:33 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: 10.110.17.36
	 ApplicationMaster RPC port: -1
	 queue: root.users.root
	 start time: 1663930891063
	 final status: UNDEFINED
	 tracking URL: http://cdh02:8088/proxy/application_1660017172277_0243/
	 user: root
22/09/23 19:01:33 INFO cluster.YarnClientSchedulerBackend: Application application_1660017172277_0243 has started running.
........

22/09/23 19:01:41 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks
22/09/23 19:01:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, cdh06, executor 1, partition 0, NODE_LOCAL, 7910 bytes)
22/09/23 19:01:41 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on cdh06:39890 (size: 78.0 KB, free: 398.6 MB)
22/09/23 19:01:42 INFO spark.ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 2)
22/09/23 19:01:42 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, cdh06, executor 1, partition 1, NODE_LOCAL, 7910 bytes)
22/09/23 19:01:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 1009 ms on cdh06 (executor 1) (1/2)
22/09/23 19:01:42 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 112 ms on cdh06 (executor 1) (2/2)
22/09/23 19:01:42 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 
22/09/23 19:01:42 INFO scheduler.DAGScheduler: ResultStage 1 (save at NativeMethodAccessorImpl.java:0) finished in 1.134 s
22/09/23 19:01:42 INFO scheduler.DAGScheduler: Job 1 finished: save at NativeMethodAccessorImpl.java:0, took 1.139997 s
22/09/23 19:01:42 INFO datasources.FileFormatWriter: Write Job 71c5176c-9db0-4399-b3bc-71a64b666fe3 committed.
22/09/23 19:01:42 INFO datasources.FileFormatWriter: Finished processing stats for write job 71c5176c-9db0-4399-b3bc-71a64b666fe3.
22/09/23 19:01:42 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/09/23 19:01:42 INFO server.AbstractConnector: Stopped Spark@4b6333cc{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/09/23 19:01:42 INFO ui.SparkUI: Stopped Spark web UI at http://cdh02:4040
22/09/23 19:01:42 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
22/09/23 19:01:42 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
22/09/23 19:01:42 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/09/23 19:01:42 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
22/09/23 19:01:42 INFO cluster.YarnClientSchedulerBackend: Stopped
22/09/23 19:01:42 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/09/23 19:01:42 INFO memory.MemoryStore: MemoryStore cleared
22/09/23 19:01:42 INFO storage.BlockManager: BlockManager stopped
22/09/23 19:01:42 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/09/23 19:01:42 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/09/23 19:01:42 INFO spark.SparkContext: Successfully stopped SparkContext
22/09/23 19:01:42 INFO util.ShutdownHookManager: Shutdown hook called
22/09/23 19:01:42 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-effcdcd5-0278-4188-9baf-5c43b5d97666/pyspark-f00c77c5-73c2-4775-b09d-94cbee658a49
22/09/23 19:01:42 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-effcdcd5-0278-4188-9baf-5c43b5d97666
22/09/23 19:01:42 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-0a5b450a-a378-41cf-b64e-5d37a7a5237d

4. The job ran successfully
The YARN web UI also lists the application as finished.

Together with the log above, this confirms that the job completed successfully.

5. View the generated files:

[root@cdh02 ~]# hadoop fs -ls /tmp/test/teenagers
Found 3 items
-rw-r--r--   3 root supergroup          0 2022-09-23 19:01 /tmp/test/teenagers/_SUCCESS
-rw-r--r--   3 root supergroup        703 2022-09-23 19:01 /tmp/test/teenagers/part-00000-54158861-54c1-4e3e-b8c5-36bff50444bf-c000.snappy.parquet
-rw-r--r--   3 root supergroup        656 2022-09-23 19:01 /tmp/test/teenagers/part-00001-54158861-54c1-4e3e-b8c5-36bff50444bf-c000.snappy.parquet
[root@cdh02 ~]# 

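The output is Parquet, a binary format that cannot be inspected directly with commands such as cat, so we verify the file contents in pyspark instead.

The job submitted with spark-submit above filtered rows by age in its SQL query, so the rows read back in pyspark should fall within that range. Note that the rows shown below span ages 14 to 25, which matches the 13 to 29 condition used in the script in section 6 rather than a 13 to 19 condition.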

>>> parquetFile = sqlContext.read.parquet("/tmp/test/teenagers")
>>> parquetFile.registerTempTable("parquetTable")                               
>>> sqlContext.sql("select * from parquetTable").show()
+------+---+                                                                    
|  name|age|
+------+---+
| anand| 14|
|  oner| 19|
| carol| 14|
|   job| 17|
|  mary| 20|
| divid| 20|
|  Eric| 16|
|  rice| 25|
| zhuli| 16|
|marfer| 23|
| rakie| 19|
+------+---+
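
The range check can also be done programmatically instead of by eye. The following is a minimal sketch, not the author's method: the helper names rows_outside_range and check_parquet_ages are hypothetical, and it assumes the Parquet output has the name and age columns shown above. The pyspark import is kept inside the function so the pure-Python helper can be tried without a cluster.

```python
def rows_outside_range(rows, low, high):
    """Return the (name, age) pairs whose age is not within [low, high]."""
    return [(name, age) for name, age in rows if not low <= age <= high]


def check_parquet_ages(path, low, high):
    """Read the Parquet output and list the rows that violate the age range.

    Requires pyspark; the import is local so rows_outside_range can be
    used on a machine without Spark installed.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("check_teenagers").getOrCreate()
    df = spark.read.parquet(path)
    rows = [(r["name"], r["age"]) for r in df.select("name", "age").collect()]
    return rows_outside_range(rows, low, high)


# Hypothetical invocation on the cluster; an empty list means every row
# is inside the range:
# print(check_parquet_ages("/tmp/test/teenagers", 13, 29))
```

Passing the bounds as arguments makes it easy to test the same output against whichever condition the submitted job actually used.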

6. Writing data to MySQL with PySpark
1. Extend the job above with the following code (saved here as pyspark_to_mysql.py)

# Initialize the SparkContext and SQLContext
from pyspark import SparkConf,SparkContext
from pyspark.sql import SQLContext, Row
conf=(SparkConf().setAppName('pyspark_to_mysql'))
sc=SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# Load the text file and convert each line into a Row
lines = sc.textFile("/tmp/test/people.txt")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))

# Create a DataFrame from the Rows and register it as a temporary table
schemaPeople = sqlContext.createDataFrame(people)
schemaPeople.registerTempTable("people")

# Run the SQL query: select people aged 13 to 29 inclusive
teenagers = sqlContext.sql("SELECT name,age FROM people WHERE age >= 13 AND age <= 29")

url = "jdbc:mysql://10.110.17.37:3306/mes_gd"
table = "teenagers"
prop = {"user":"xxx","password":"xxx@xx96"}

teenagers.write.jdbc(url, table, "append", prop)
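
The connection settings above can be factored into a small helper. This is only a sketch under stated assumptions: mysql_jdbc_target and write_teenagers are hypothetical names, and the explicit "driver" entry (com.mysql.jdbc.Driver, the driver class shipped with MySQL Connector/J 5.1) is an addition that the original script does not set. DataFrameWriter.jdbc accepts url, table, mode and properties as keyword arguments.

```python
def mysql_jdbc_target(host, port, database, user, password):
    """Build the JDBC URL and connection properties for a MySQL database."""
    url = "jdbc:mysql://{}:{}/{}".format(host, port, database)
    props = {
        "user": user,
        "password": password,
        # Assumption: name the Connector/J 5.1 driver class explicitly.
        "driver": "com.mysql.jdbc.Driver",
    }
    return url, props


def write_teenagers(df, url, props, table="teenagers"):
    """Append the DataFrame to the given MySQL table over JDBC."""
    df.write.jdbc(url=url, table=table, mode="append", properties=props)


# Hypothetical usage inside the job, with placeholder credentials:
# url, props = mysql_jdbc_target("10.110.17.37", 3306, "mes_gd", "xxx", "xxx")
# write_teenagers(teenagers, url, props)
```

Keeping the URL construction separate from the write call makes it possible to read the credentials from a config file or environment variable rather than hard-coding them in the script.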

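2. Make the MySQL driver jar available to Spark from the command line, then run the job
A MySQL driver jar happened to be available locally, so the commands below add it to Spark's classpath.
First copy the driver jar into the /opt/cloudera/parcels/CDH/lib/spark/jars directory.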

[root@cdh02 ~]# cp mysql-connector-java-5.1.47.jar /opt/cloudera/parcels/CDH/lib/spark/jars
[root@cdh02 ~]# export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cloudera/parcels/CDH/lib/spark/jars/mysql-connector-java-5.1.47.jar
[root@cdh02 ~]# spark-submit pyspark_to_mysql.py
22/09/23 19:24:36 INFO spark.SparkContext: Running Spark version 2.4.0-cdh6.3.2
22/09/23 19:24:36 INFO logging.DriverLogger: Added a local log appender at: /tmp/spark-f34e3512-a30a-4137-ba20-5d59e4772966/__driver_logs__/driver.log
22/09/23 19:24:36 INFO spark.SparkContext: Submitted application: pyspark_to_mysql
22/09/23 19:24:36 INFO spark.SecurityManager: Changing view acls to: root
22/09/23 19:24:36 INFO spark.SecurityManager: Changing modify acls to: root
22/09/23 19:24:36 INFO spark.SecurityManager: Changing view acls groups to: 
22/09/23 19:24:36 INFO spark.SecurityManager: Changing modify acls groups to: 
22/09/23 19:24:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/09/23 19:24:36 INFO util.Utils: Successfully started service 'sparkDriver' on port 45020.

........
22/09/23 19:24:52 INFO spark.SparkContext: Invoking stop() from shutdown hook
22/09/23 19:24:52 INFO server.AbstractConnector: Stopped Spark@58529c8c{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/09/23 19:24:52 INFO ui.SparkUI: Stopped Spark web UI at http://cdh02:4040
22/09/23 19:24:52 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
22/09/23 19:24:52 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
22/09/23 19:24:52 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
22/09/23 19:24:52 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
 services=List(),
 started=false)
22/09/23 19:24:52 INFO cluster.YarnClientSchedulerBackend: Stopped
22/09/23 19:24:52 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/09/23 19:24:52 INFO memory.MemoryStore: MemoryStore cleared
22/09/23 19:24:52 INFO storage.BlockManager: BlockManager stopped
22/09/23 19:24:52 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
22/09/23 19:24:52 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/09/23 19:24:52 INFO spark.SparkContext: Successfully stopped SparkContext
22/09/23 19:24:52 INFO util.ShutdownHookManager: Shutdown hook called
22/09/23 19:24:52 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-f34e3512-a30a-4137-ba20-5d59e4772966/pyspark-d8c43dbb-9758-46a7-8359-7269d20f61f3
22/09/23 19:24:52 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-f34e3512-a30a-4137-ba20-5d59e4772966
22/09/23 19:24:52 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-9940c301-63b4-417b-a9b9-aefa1ea7dff8
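
As an alternative to exporting SPARK_CLASSPATH, the driver jar can be passed on the spark-submit command line. The sketch below is not what was run in this walkthrough: --jars and --driver-class-path are standard spark-submit options, while the helper names spark_submit_command and run_job are hypothetical.

```python
import subprocess


def spark_submit_command(script, jar):
    """Build a spark-submit argv that ships the JDBC driver jar with the job."""
    return [
        "spark-submit",
        "--jars", jar,               # distribute the jar to the executors
        "--driver-class-path", jar,  # put the jar on the driver's classpath
        script,
    ]


def run_job(script, jar):
    """Run spark-submit and return its exit code (0 means success)."""
    return subprocess.call(spark_submit_command(script, jar))


# Hypothetical usage on the gateway node:
# run_job("pyspark_to_mysql.py",
#         "/opt/cloudera/parcels/CDH/lib/spark/jars/mysql-connector-java-5.1.47.jar")
```

Building the command as a list avoids shell quoting problems when a path contains spaces.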

The job ran successfully.

3. Use YARN to check whether the job succeeded
4. Verify that the MySQL table contains data

mysql> use mes_gd;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+--------------------+
| Tables_in_mes_gd   |
+--------------------+
| ae_order_equipment |
| ae_order_materials |
| ae_order_physical  |
| clicks             |
| letter             |
| student            |
| teenagers          |
| tinvbill_daishuyun |
| tinvbill_kongming  |
| tinvbill_shulan    |
+--------------------+
10 rows in set (0.00 sec)

mysql> select * from teenagers;
+------+---+                                                                    
|  name|age|
+------+---+
| anand| 14|
|  oner| 19|
| carol| 14|
|   job| 17|
|  mary| 20|
| divid| 20|
|  Eric| 16|
|  rice| 25|
| zhuli| 16|
|marfer| 23|
| rakie| 19|
+------+---+

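This completes the verification.
Note: to write data to MySQL, the MySQL JDBC driver jar must be on Spark's classpath. The MySQL table does not need to exist beforehand; pyspark creates it automatically when writing the data.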
