Hadoop Project
1. Using a Huawei Cloud Server
- Purchase a VPS on Huawei Cloud and connect to it with XShell (choose the Hong Kong (China) region so that GitHub is reachable).
2. Building and Installing Hadoop
- Install GCC
  - Steps 2 and 3 can be skipped; in practice, the system's default yum repositories worked without errors.
- In step 5 of the Protobuf installation, `patch -p1 < protoc.patch` fails with `-bash: patch: command not found`.
  Fix: `yum -y install patch`
- In step 7 of the Protobuf installation, an error occurred (unresolved; so far it appears to have no effect).
- Build succeeded:

  ```
  ...
  [INFO] Apache Hadoop Azure support ........................ SUCCESS [ 10.346 s]
  [INFO] Apache Hadoop Aliyun OSS support ................... SUCCESS [ 5.318 s]
  [INFO] Apache Hadoop Client Aggregator .................... SUCCESS [ 2.450 s]
  [INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [ 1.571 s]
  [INFO] Apache Hadoop Resource Estimator Service ........... SUCCESS [ 6.384 s]
  [INFO] Apache Hadoop Azure Data Lake support .............. SUCCESS [ 11.636 s]
  [INFO] Apache Hadoop Image Generation Tool ................ SUCCESS [ 0.393 s]
  [INFO] Apache Hadoop Tools Dist ........................... SUCCESS [ 8.816 s]
  [INFO] Apache Hadoop Tools ................................ SUCCESS [ 0.024 s]
  [INFO] Apache Hadoop Client API ........................... SUCCESS [01:40 min]
  [INFO] Apache Hadoop Client Runtime ....................... SUCCESS [01:13 min]
  [INFO] Apache Hadoop Client Packaging Invariants .......... SUCCESS [ 0.982 s]
  [INFO] Apache Hadoop Client Test Minicluster .............. SUCCESS [02:25 min]
  [INFO] Apache Hadoop Client Packaging Invariants for Test . SUCCESS [ 0.149 s]
  [INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [ 0.105 s]
  [INFO] Apache Hadoop Distribution ......................... SUCCESS [ 21.004 s]
  [INFO] Apache Hadoop Client Modules ....................... SUCCESS [ 0.025 s]
  [INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [ 0.697 s]
  [INFO] Apache Hadoop Cloud Storage Project 3.1.1 .......... SUCCESS [ 0.025 s]
  [INFO] ------------------------------------------------------------------------
  [INFO] BUILD SUCCESS
  [INFO] ------------------------------------------------------------------------
  [INFO] Total time: 33:13 min
  [INFO] Finished at: 2022-05-27T22:35:15+08:00
  [INFO] ------------------------------------------------------------------------
  ```
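For reference, a full-distribution build like this is normally launched with the command below from Hadoop's BUILDING.txt; the exact flags used for this run were not recorded in these notes, so treat it as a sketch:

```bash
# Standard binary-distribution build with native libraries (from BUILDING.txt):
# -Pdist,native builds the distribution plus native code, -Dtar produces the
# tarball under hadoop-dist/target/, and -DskipTests skips the test suite.
mvn package -Pdist,native -DskipTests -Dtar
```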
3. Test-Running Hadoop
- Trial run of Hadoop:

  ```
  [root@hadoop02 hadoop-3.1.1]# pwd
  /root/hadoop-3.1.1-src/hadoop-dist/target/hadoop-3.1.1
  [root@hadoop02 hadoop-3.1.1]# bin/hadoop
  Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
   or    hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
    where CLASSNAME is a user-provided Java class

    OPTIONS is none or any of:

  buildpaths                       attempt to add class files from build tree
  --config dir                     Hadoop config directory
  --debug                          turn on shell script debug mode
  --help                           usage information
  hostnames list[,of,host,names]   hosts to use in slave mode
  hosts filename                   list of hosts to use in slave mode
  loglevel level                   set the log4j level for this command
  workers                          turn on worker mode

    SUBCOMMAND is one of:

      Admin Commands:

  daemonlog     get/set the log level for each daemon

      Client Commands:

  archive       create a Hadoop archive
  checknative   check native Hadoop and compression libraries availability
  classpath     prints the class path needed to get the Hadoop jar and the required libraries
  conftest      validate configuration
  ...
  key           manage keys via the KeyProvider
  rumenfolder   scale a rumen input trace
  rumentrace    convert logs into a rumen trace
  s3guard       manage metadata on S3
  trace         view and modify Hadoop tracing settings
  version       print the version

      Daemon Commands:

  kms           run KMS, the Key Management Server

  SUBCOMMAND may print help when invoked w/o parameters or with -h.
  [root@hadoop02 hadoop-3.1.1]#
  ```
- Single-node installation guide:
  https://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-common/SingleCluster.html
- Standalone operation:

  ```
  [root@hadoop02 hadoop-3.1.1]# ls
  bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
  [root@hadoop02 hadoop-3.1.1]# mkdir input
  [root@hadoop02 hadoop-3.1.1]# cp etc/hadoop/*.xml input
  [root@hadoop02 hadoop-3.1.1]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar grep input output 'dfs[a-z.]+'
  2022-05-28 00:02:27,848 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
  2022-05-28 00:02:27,896 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
  2022-05-28 00:02:27,896 INFO impl.MetricsSystemImpl: JobTracker metrics system started
  2022-05-28 00:02:28,002 INFO input.FileInputFormat: Total input files to process : 9
  2022-05-28 00:02:28,022 INFO mapreduce.JobSubmitter: number of splits:9
  2022-05-28 00:02:28,116 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local146364211_0001
  2022-05-28 00:02:28,117 INFO mapreduce.JobSubmitter: Executing with tokens: []
  2022-05-28 00:02:28,212 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
  2022-05-28 00:02:28,214 INFO mapreduce.Job: Running job: job_local146364211_0001
  2022-05-28 00:02:28,215 INFO mapred.LocalJobRunner: OutputCommitter set in config null
  2022-05-28 00:02:28,221 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
  ...
  2022-05-28 00:02:30,322 INFO mapreduce.Job: Counters: 30
      File System Counters
          FILE: Number of bytes read=1336966
          FILE: Number of bytes written=3249308
          FILE: Number of read operations=0
          FILE: Number of large read operations=0
          FILE: Number of write operations=0
      Map-Reduce Framework
          Map input records=1
          Map output records=1
          Map output bytes=17
          Map output materialized bytes=25
          Input split bytes=157
          Combine input records=0
          Combine output records=0
          Reduce input groups=1
          Reduce shuffle bytes=25
          Reduce input records=1
          Reduce output records=1
          Spilled Records=2
          Shuffled Maps =1
          Failed Shuffles=0
          Merged Map outputs=1
          GC time elapsed (ms)=0
          Total committed heap usage (bytes)=2527068160
      Shuffle Errors
          BAD_ID=0
          CONNECTION=0
          IO_ERROR=0
          WRONG_LENGTH=0
          WRONG_MAP=0
          WRONG_REDUCE=0
      File Input Format Counters
          Bytes Read=123
      File Output Format Counters
          Bytes Written=23
  [root@hadoop02 hadoop-3.1.1]# cat output/*
  1	dfsadmin
  [root@hadoop02 hadoop-3.1.1]#
  ```
- Pseudo-distributed operation (errors). Running `sbin/start-dfs.sh` (step 2 of the guide) as root fails twice, first on missing HDFS user variables, then on SSH authentication:

  ```
  [root@hadoop02 hadoop-3.1.1]# sbin/start-dfs.sh
  Starting namenodes on [localhost]
  ERROR: Attempting to operate on hdfs namenode as root
  ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
  Starting datanodes
  ERROR: Attempting to operate on hdfs datanode as root
  ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
  Starting secondary namenodes [hadoop02]
  ERROR: Attempting to operate on hdfs secondarynamenode as root
  ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
  [root@hadoop02 hadoop-3.1.1]#
  ```

  ```
  [root@hadoop02 hadoop-3.1.1]# sbin/start-dfs.sh
  Starting namenodes on [localhost]
  Last login: Mon May 30 17:58:02 CST 2022 from ::1 on pts/2
  localhost:
  localhost: Authorized users only. All activities may be monitored and reported.
  localhost: root@localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
  Starting datanodes
  Last login: Mon May 30 18:00:15 CST 2022 on pts/2
  localhost:
  localhost: Authorized users only. All activities may be monitored and reported.
  localhost: root@localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
  Starting secondary namenodes [hadoop02]
  Last login: Mon May 30 18:00:15 CST 2022 on pts/2
  hadoop02: Warning: Permanently added 'hadoop02' (ECDSA) to the list of known hosts.
  hadoop02:
  hadoop02: Authorized users only. All activities may be monitored and reported.
  hadoop02: root@hadoop02: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
  [root@hadoop02 hadoop-3.1.1]#
  ```
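  A commonly cited fix for the two failures above (an assumption here, not something verified in these notes): declare which user each HDFS daemon runs as, and set up passwordless SSH to localhost, which `start-dfs.sh` depends on. The `ssh-keygen` steps are the ones given in the SingleCluster guide linked above.

  ```bash
  # 1) Tell the start scripts which user runs each HDFS daemon
  #    (append to etc/hadoop/hadoop-env.sh; root matches the logs above).
  export HDFS_NAMENODE_USER=root
  export HDFS_DATANODE_USER=root
  export HDFS_SECONDARYNAMENODE_USER=root

  # 2) Passwordless SSH to localhost, required by start-dfs.sh.
  ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 0600 ~/.ssh/authorized_keys
  ```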
- Cluster installation (multi-machine deployment)
4. Performance Tuning
- Kunpeng Porting Advisor (code migration tool)
  Quick Start - Kunpeng Porting Advisor - Kunpeng DevKit - Documentation Home - Kunpeng Community (hikunpeng.com)
- Download the package:

  ```
  [root@hadoop02 ~]# wget https://mirror.iscas.ac.cn/kunpeng/archive/Porting_Dependency/Packages/Porting-advisor_2.3.0_linux-Kunpeng.tar.gz
  ```
- Extract:

  ```
  [root@hadoop02 ~]# tar --no-same-owner -zxvf Porting-advisor_2.3.0_linux-Kunpeng.tar.gz
  ```
- Run the runtime_env_check script to check the dependencies of the code migration tool:

  ```
  [root@hadoop02 Porting-advisor_2.3.0_linux-Kunpeng]# bash runtime_env_check.sh
  ```
- Install:

  ```
  [root@hadoop02 Porting-advisor_2.3.0_linux-Kunpeng]# ./install web
  Porting Web console is now running, go to: https://192.168.0.250:8084/porting/#/login
  Successfully installed the Kunpeng Porting Advisor in /opt/portadv/.
  ```
- The web console could not be opened.
  Fix: [Software porting] The Kunpeng code migration tool cannot open its web page - Kunpeng Forum - Huawei Cloud Forum (huaweicloud.com)

  ```
  [root@hadoop02 package]# chmod 777 hadoop-3.1.1.tar.gz  # grant all users read/write/execute permission on the file (r=4, w=2, x=1)
  ```
  Username for the code migration tool: portadmin
- Kunpeng performance analysis tool
- Get the package: from the Kunpeng official site or from the VS Code prompt
- Extract
- Install: `./install.sh`
- Username for the performance analysis tool: tunadmin
5. Pushing Source Code to Gitee with Git
```
git config --global user.name "yjr"
git config --global user.email "yjr2609538400@163.com"  # global Git identity

mkdir xxx        # create a folder
cd xxx           # enter it
git init         # turn the folder into a local Git repository (creates a .git directory inside)
git add xxx      # stage files for commit; "git add ." stages everything in the directory
git commit -m main   # commit the staged files to the local repository; "-m" sets the commit message (here, "main")
git remote add origin https://gitee.com/Marches7/hadoop_test.git  # remote repository URL, copied from the Clone/Download button
git push -u origin main   # push the local main branch to the remote main branch
git pull origin main      # fetch the remote main branch and merge it into the local one

# GitHub requires token-based login; to be resolved (one workaround is sketched below)
```
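One commonly used workaround for the GitHub token requirement (a sketch; `<TOKEN>`, `<user>`, and `<repo>` are placeholders, and the token is assumed to have been generated under the GitHub account's developer settings):

```bash
# Enter the token in place of the account password when git prompts for one,
# or embed it directly in the remote URL:
git remote set-url origin https://<TOKEN>@github.com/<user>/<repo>.git
# Optionally cache credentials so the token only needs to be entered once:
git config --global credential.helper store
```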
6. Hands-On Hadoop
7. Notes
During the build, at "Step 6: go back to the protobuf-2.5.0 root directory, then build and install to the system default location" of the Protobuf installation, configure with:

```
./configure CFLAGS='-fsigned-char'
```

(On AArch64, `char` is unsigned by default, so `-fsigned-char` restores the signed-`char` behavior that x86-oriented code assumes.)
8. Generating and Applying Patches
For open-source software packages on Linux, once we have modified the source it is convenient to capture those modifications as a patch. A patch makes later updates and rollbacks easy: anyone who receives it can apply it directly instead of repeating the same source edits by hand.
- Generating a patch

  ```
  diff -uNr src_path modify_path > patch_file
  ```

  `diff` compares two files (or directory trees) line by line and prints the differences.
  Parameters:
  - `-u`: output in the unified format, which is more compact than the default
  - `-N`: treat missing files as empty, so the patch correctly handles newly created and deleted files
  - `-r`: recurse into directories; every corresponding file in the two source trees is compared, including files in subdirectories
  - `src_path`: path to the original source
  - `modify_path`: path to the modified source
  - `patch_file`: name of the generated patch file, conventionally with a `.patch` suffix
  - `>` is the shell redirection operator: it writes diff's output into `patch_file` (without it, the result goes to standard output)

  A small worked example follows this list.
- Applying a patch

  ```
  patch -pN < xxx.patch
  ```

  Parameters:
  - `-pN`: strip the first N directory levels from the file paths recorded in the patch (see the sketch below)
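  Continuing the hypothetical example above: the paths inside the patch begin with a directory name (`protobuf-2.5.0/...`), so applying it from inside a source tree requires `-p1` to drop that leading component.

  ```bash
  cd protobuf-2.5.0.orig            # a fresh copy of the unmodified tree
  patch -p1 < ../signed-char.patch  # apply the captured edits
  ```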