Hadoop HA deployment plan
| Hostname | IP address | Processes |
| --- | --- | --- |
| master | (use your own) | NameNode, DataNode, DFSZKFailoverController, QuorumPeerMain, JournalNode, ResourceManager, NodeManager |
| slave1 | (use your own) | NameNode, DataNode, DFSZKFailoverController, QuorumPeerMain, JournalNode, ResourceManager, NodeManager |
| slave2 | (use your own) | DataNode, NodeManager, QuorumPeerMain, JournalNode |
1. Preparation
1.1. The three packages: hadoop-3.1.3.tar.gz, jdk-8u212-linux-x64.tar.gz, and apache-zookeeper-3.5.7-bin.tar.gz (extraction code: k5y6)
2. Unpack the installers and configure environment variables
tar -zxf <tarball> -C <target directory>
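For example, assuming the tarballs were downloaded to /opt/software and everything is installed under /opt/module (the layout used in the rest of this guide):

```bash
# The source directory /opt/software is an assumption; adjust to your download location
tar -zxf /opt/software/hadoop-3.1.3.tar.gz -C /opt/module
tar -zxf /opt/software/jdk-8u212-linux-x64.tar.gz -C /opt/module
tar -zxf /opt/software/apache-zookeeper-3.5.7-bin.tar.gz -C /opt/module
```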
After unpacking, the name apache-zookeeper-3.5.7-bin is long and awkward; you can rename the directory with mv, or create a symlink with ln -s.
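Either option works; the target name zookeeper matches the paths used later in this guide:

```bash
# Option 1: rename the unpacked directory
mv /opt/module/apache-zookeeper-3.5.7-bin /opt/module/zookeeper
# Option 2: keep the original name and add a symlink
ln -s /opt/module/apache-zookeeper-3.5.7-bin /opt/module/zookeeper
```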
Edit /etc/profile with vim to configure the environment variables, then run source /etc/profile to make them take effect.
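A minimal sketch of the additions, assuming the /opt/module install paths used in this guide (the JDK directory name matches the scp commands in step 5):

```bash
# Appended to /etc/profile; adjust the paths to your own layout
export JAVA_HOME=/opt/module/jdk1.8.0_212
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export ZOOKEEPER_HOME=/opt/module/zookeeper
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
```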
Verify:
```bash
hadoop version
java -version
```
3. Name the three nodes master, slave1, and slave2, and set up passwordless SSH
Change the hostname, then disconnect and reconnect:

```bash
hostnamectl set-hostname <hostname>
```

Passwordless login was covered in the earlier fully distributed Hadoop setup post and is not repeated in detail here; a minimal reminder follows.
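As a sketch, assuming the three hostnames are already resolvable (e.g. via /etc/hosts), run on each node:

```bash
# Generate a key pair (the options below skip the interactive prompts)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Push the public key to every node, including this one
for host in master slave1 slave2; do
  ssh-copy-id "$host"
done
```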
4. Set up the ZooKeeper cluster
```bash
cd /opt/module/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
```
Edit zoo.cfg and add the cluster configuration; a typical version is sketched below.
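These lines are consistent with the zkdata/zkdatalog directories and the myid values 1–3 used below; 2888 and 3888 are the conventional peer-communication and leader-election ports, so adjust if yours differ:

```properties
# Override the sample dataDir and add a transaction-log directory
dataDir=/opt/module/zookeeper/zkdata
dataLogDir=/opt/module/zookeeper/zkdatalog
# One entry per ensemble member: server.<myid>=<host>:<peer-port>:<election-port>
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
```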
Create the zkdata and zkdatalog directories at the paths you configured. Then, inside zkdata, create a file named myid, either by touching the file and editing it or by writing the value directly with echo; it should contain 1 on master, and 2 and 3 on slave1 and slave2 respectively.
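On master, that is:

```bash
mkdir -p /opt/module/zookeeper/zkdata /opt/module/zookeeper/zkdatalog
# master's id is 1; slave1 and slave2 will get 2 and 3 after distribution
echo 1 > /opt/module/zookeeper/zkdata/myid
```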
5. Distribute the unpacked JDK, /etc/profile, and ZooKeeper, and change myid to 2 and 3 on the slaves
```bash
scp -r /opt/module/jdk1.8.0_212/ slave1:/opt/module/
scp -r /opt/module/jdk1.8.0_212/ slave2:/opt/module/
scp /etc/profile slave1:/etc/profile
scp /etc/profile slave2:/etc/profile   # remember to source /etc/profile on each slave
scp -r /opt/module/zookeeper/ slave1:/opt/module/
scp -r /opt/module/zookeeper/ slave2:/opt/module/
```
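The copied zkdata/myid still contains master's id on both slaves; one way to update it remotely:

```bash
ssh slave1 'echo 2 > /opt/module/zookeeper/zkdata/myid'
ssh slave2 'echo 3 > /opt/module/zookeeper/zkdata/myid'
```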
6. Start ZooKeeper
Start it on each node:

```bash
zkServer.sh start
```

Check the status:

```bash
zkServer.sh status
```
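To check all three from master in one go, a sketch (one node should report leader mode and the other two follower):

```bash
# If java is not found through a non-interactive ssh session,
# run zkServer.sh status locally on each node instead
for host in master slave1 slave2; do
  ssh "$host" /opt/module/zookeeper/bin/zkServer.sh status
done
```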
With ZooKeeper running, move on to the Hadoop configuration files.

```bash
cd /opt/module/hadoop-3.1.3/etc/hadoop
vim core-site.xml
```
```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cluster</value>
  <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/module/hadoop-3.1.3/tmpdir</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>master:2181,slave1:2181,slave2:2181</value>
  <description>
    A list of ZooKeeper server addresses, separated by commas, that are
    to be used by the ZKFailoverController in automatic failover.
  </description>
</property>
```
```bash
vim hdfs-site.xml
```
```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
  <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.
  </description>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>cluster</value>
  <description>
    Comma-separated list of nameservices.
  </description>
</property>
<property>
  <name>dfs.ha.namenodes.cluster</name>
  <value>nn1,nn2</value>
  <description>
    The prefix for a given nameservice, contains a comma-separated
    list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
    Unique identifiers for each NameNode in the nameservice, delimited by
    commas. This will be used by DataNodes to determine all the NameNodes
    in the cluster. For example, if you used "mycluster" as the nameservice
    ID previously, and you wanted to use "nn1" and "nn2" as the individual
    IDs of the NameNodes, you would configure a property
    dfs.ha.namenodes.mycluster, and its value "nn1,nn2".
  </description>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster.nn1</name>
  <value>master:8020</value>
  <description>
    The fully-qualified RPC address for the NameNode nn1 to listen on.
  </description>
</property>
<property>
  <name>dfs.namenode.rpc-address.cluster.nn2</name>
  <value>slave1:8020</value>
  <description>
    The fully-qualified RPC address for the NameNode nn2 to listen on.
  </description>
</property>
<property>
  <name>dfs.namenode.http-address.cluster.nn1</name>
  <value>master:9870</value>
  <description>
    The address and the base port where the dfs namenode web ui will listen on.
  </description>
</property>
<property>
  <name>dfs.namenode.http-address.cluster.nn2</name>
  <value>slave1:9870</value>
  <description>
    The address and the base port where the dfs namenode web ui will listen on.
  </description>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master:8485;slave1:8485;slave2:8485/cluster</value>
  <description>A directory on shared storage between the multiple namenodes
    in an HA cluster. This directory will be written by the active and read
    by the standby in order to keep the namespaces synchronized. This directory
    does not need to be listed in dfs.namenode.edits.dir above. It should be
    left empty in a non-HA cluster.
  </description>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.cluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  <description>
    The prefix (plus a required nameservice ID) for the class name of the
    configured Failover proxy provider for the host. For more detailed
    information, please consult the "Configuration Details" section of
    the HDFS High Availability documentation.
  </description>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
  <description>
    Whether automatic failover is enabled. See the HDFS High
    Availability documentation for details on automatic HA
    configuration.
  </description>
</property>
```

7.3. Start the journalnode process on each of master, slave1, and slave2.
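Using the same hadoop-daemon.sh script as the NameNode step below, this would typically be:

```bash
# Run on master, slave1, and slave2; assumes $HADOOP_HOME/sbin is on PATH
hadoop-daemon.sh start journalnode
```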
Start the NameNode on master by itself:

```bash
hadoop-daemon.sh start namenode
```

Then, on the other VM that is to run a NameNode, sync its metadata from the one just started:

```bash
hdfs namenode -bootstrapStandby
```

8. On the master node, check the state of the nn2 and rm2 services:

```bash
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm2
```