```xml
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/cluster/hadoop/data/nn</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/cluster/hadoop/data/dn</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/cluster/hadoop/data/jn</value>
</property>
<property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
</property>
<property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>hadoop001,hadoop002</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ns1.hadoop001</name>
    <value>hadoop001:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address.ns1.hadoop001</name>
    <value>hadoop001:9870</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.ns1.hadoop002</name>
    <value>hadoop002:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address.ns1.hadoop002</name>
    <value>hadoop002:9870</value>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled.ns1</name>
    <value>true</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.blocksize</name>
    <!-- the value was lost in the original; 134217728 matches the 128MB in the description -->
    <value>134217728</value>
    <description>HDFS blocksize of 128MB for large file-systems</description>
</property>
<property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
    <description>More NameNode server threads to handle RPCs from large number of DataNodes.</description>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop001:8485;hadoop002:8485;hadoop003:8485/ns1</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>
```
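One way to sanity-check that HDFS actually resolves these values is the stock `hdfs getconf` tool; a minimal check, assuming the file is already in `$HADOOP_CONF_DIR`:

```bash
# Ask HDFS which values it resolved from hdfs-site.xml
hdfs getconf -confKey dfs.nameservices        # should print: ns1
hdfs getconf -confKey dfs.ha.namenodes.ns1    # should print: hadoop001,hadoop002
hdfs getconf -namenodes                       # should list both NameNode hosts
```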
mapred-site.xml
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>mapreduce.framework.name
yarn
Execution framework set to Hadoop YARN.
mapreduce.map.memory.mb
4096
Larger resource limit for maps.
mapreduce.map.java.opts
-Xmx4096M
Larger heap-size for child jvms of maps.
mapreduce.reduce.memory.mb
4096
Larger resource limit for reduces.
mapreduce.reduce.java.opts
-Xmx4096M
Larger heap-size for child jvms of reduces.
mapreduce.task.io.sort.mb
2040
Higher memory-limit while sorting data for efficiency.
mapreduce.task.io.sort.factor
400
More streams merged at once while sorting files.
mapreduce.reduce.shuffle.parallelcopies
200
Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.
mapreduce.jobhistory.address
hadoop001:10020
MapReduce JobHistory Server host:port.Default port is 10020
mapreduce.jobhistory.webapp.address
hadoop001:19888
MapReduce JobHistory Server Web UI host:port.Default port is 19888.
mapreduce.jobhistory.intermediate-done-dir
/tmp/mr-history/tmp
Directory where history files are written by MapReduce jobs.
mapreduce.jobhistory.done-dir
/tmp/mr-history/done
Directory where history files are managed by the MR JobHistory Server.
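Note that the JobHistory Server configured above is not started by `start-all.sh`; on Hadoop 3.x it is started by hand on hadoop001:

```bash
# Start the MapReduce JobHistory Server, then browse to
# http://hadoop001:19888 (mapreduce.jobhistory.webapp.address)
mapred --daemon start historyserver
```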
yarn-site.xml
<?xml version="1.0"?>yarn.resourcemanager.ha.enabled
true
yarn.resourcemanager.ha.automatic-failover.enabled
true
yarn.resourcemanager.ha.automatic-failover.embedded
true
yarn.resourcemanager.cluster-id
yarn-rm-cluster
yarn.resourcemanager.ha.rm-ids
rm1,rm2
yarn.resourcemanager.hostname.rm1
hadoop001
yarn.resourcemanager.hostname.rm2
hadoop002
yarn.resourcemanager.recovery.enabled
true
yarn.resourcemanager.zk.state-store.address
hadoop001:2181,hadoop002:2181,hadoop003:2181
yarn.resourcemanager.zk-address
hadoop001:2181,hadoop002:2181,hadoop003:2181
yarn.resourcemanager.address.rm1
hadoop001:8032
yarn.resourcemanager.address.rm2
hadoop002:8032
yarn.resourcemanager.scheduler.address.rm1
hadoop001:8034
yarn.resourcemanager.webapp.address.rm1
hadoop001:8088
yarn.resourcemanager.scheduler.address.rm2
hadoop002:8034
yarn.resourcemanager.webapp.address.rm2
hadoop002:8088
yarn.acl.enable
true
Enable ACLs? Defaults to false.
yarn.admin.acl
*
yarn.log-aggregation-enable
false
Configuration to enable or disable log aggregation
yarn.resourcemanager.hostname
hadoop001
host Single hostname that can be set in place of setting all yarn.resourcemanager*address resources. Results in default ports for ResourceManager components.
yarn.scheduler.minimum-allocation-mb
1024
saprk调度时一个container能够申请的最小资源,默认值为1024MB
yarn.scheduler.maximum-allocation-mb
28672
saprk调度时一个container能够申请的最大资源,默认值为8192MB
yarn.nodemanager.resource.memory-mb
28672
nodemanager能够申请的最大内存,默认值为8192MB
yarn.app.mapreduce.am.resource.mb
28672
AM能够申请的最大内存,默认值为1536MB
yarn.nodemanager.log.retain-seconds
10800
yarn.nodemanager.log-dirs
/home/cluster/yarn/log/1,/home/cluster/yarn/log/2,/home/cluster/yarn/log/3
yarn.nodemanager.aux-services
mapreduce_shuffle
Shuffle service that needs to be set for Map Reduce applications.
yarn.log-aggregation.retain-seconds
-1
yarn.log-aggregation.retain-check-interval-seconds
-1
yarn.app.mapreduce.am.staging-dir
hdfs://ns1/tmp/hadoop-yarn/staging
The staging dir used while submitting jobs.
yarn.application.classpath
/usr/local/hadoop/hadoop/etc/hadoop:/usr/local/hadoop/hadoop/share/hadoop/common/lib/:/usr/local/hadoop/hadoop/share/hadoop/common/:/usr/local/hadoop/hadoop/share/hadoop/hdfs:/usr/local/hadoop/hadoop/share/hadoop/hdfs/lib/:/usr/local/hadoop/hadoop/share/hadoop/hdfs/:/usr/local/hadoop/hadoop/share/hadoop/mapreduce/:/usr/local/hadoop/hadoop/share/hadoop/yarn:/usr/local/hadoop/hadoop/share/hadoop/yarn/lib/:/usr/local/hadoop/hadoop/share/hadoop/yarn/*
Linux上打 hadoop classpath 找到的所有路径
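If your installation lives somewhere other than /usr/local/hadoop/hadoop, regenerate the value rather than copying it:

```bash
# Prints the colon-separated classpath for this installation;
# paste the output into yarn.application.classpath
hadoop classpath
```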
V. Initialize the Cluster
1. Start ZooKeeper

Hadoop's HA mechanism depends on ZooKeeper, so start the ZooKeeper cluster first.

If the ZooKeeper cluster is not set up yet, see: 大数据高可用技术之zookeeper3.4.5安装配置_qq262593421的博客-CSDN博客

```bash
zkServer.sh start
zkServer.sh status
```
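Since `zkServer.sh` only manages the local node, the commands have to be run on each of hadoop001, hadoop002 and hadoop003. A minimal convenience loop, assuming passwordless SSH between the nodes and `zkServer.sh` on each host's PATH:

```bash
# Start ZooKeeper on every node, then report each node's role (leader/follower)
for host in hadoop001 hadoop002 hadoop003; do
  ssh "$host" "zkServer.sh start && zkServer.sh status"
done
```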
2. Initialize the HA metadata in ZooKeeper

```bash
hdfs zkfc -formatZK
```
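To confirm the format took effect you can look for the HA znode it creates, using ZooKeeper's own CLI (assuming `zkCli.sh` is on the PATH):

```bash
# A successful 'hdfs zkfc -formatZK' creates a znode for ns1 under /hadoop-ha
zkCli.sh -server hadoop001:2181 ls /hadoop-ha
```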
3. Start the ZKFC

Run this on each NameNode host (hadoop001 and hadoop002):

```bash
hdfs --daemon start zkfc
```
4. Start the JournalNodes

The JournalNodes must be started before the NameNode is formatted; otherwise the format fails.

Three JournalNode nodes are configured here: hadoop001, hadoop002 and hadoop003.

```bash
hdfs --daemon start journalnode
```
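Run the command above on all three JournalNode hosts. One way to verify each JournalNode came up, assuming the JDK's `jps` and iproute2's `ss` are installed:

```bash
jps | grep JournalNode   # the JournalNode JVM should be listed
ss -ltn | grep 8485      # port 8485 from dfs.namenode.shared.edits.dir should be listening
```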
5. Format the NameNode

Run this on the first NameNode node:

```bash
hdfs namenode -format
```
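A successful format writes a fresh namespace into `dfs.namenode.name.dir` from hdfs-site.xml above; inspecting it is a cheap way to confirm before moving on:

```bash
# A freshly formatted NameNode directory contains current/VERSION
# with the new clusterID and namespaceID
cat /home/cluster/hadoop/data/nn/current/VERSION
```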
6. Start HDFS

```bash
start-all.sh
```
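`start-all.sh` brings up both the HDFS and YARN daemons. Two quick checks after it returns (the exact daemon list per host depends on your workers file):

```bash
jps                     # expect NameNode/DFSZKFailoverController on hadoop001-002, etc.
hdfs dfsadmin -report   # DataNodes should have registered with the active NameNode
```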
7. Bootstrap the standby NameNode

After HDFS initialization has finished (about 20 seconds), run this on the other NameNode:

```bash
hdfs namenode -bootstrapStandby
```
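Once both NameNodes are running, you can confirm the HA roles; `hdfs haadmin -getServiceState` takes the NameNode IDs from `dfs.ha.namenodes.ns1`:

```bash
hdfs haadmin -getServiceState hadoop001   # expected: active
hdfs haadmin -getServiceState hadoop002   # expected: standby
```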
If formatting fails or reports an error, delete the data directories on the affected node and format again.
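The directories to delete are the ones set in hdfs-site.xml at the top of this section; a cleanup sketch using those paths, destructive and only for a failed first-time format:

```bash
# DANGER: wipes HDFS metadata/data on this node; never run this on a cluster
# that already holds data. Paths come from dfs.namenode.name.dir,
# dfs.datanode.data.dir and dfs.journalnode.edits.dir above.
rm -rf /home/cluster/hadoop/data/nn/* \
       /home/cluster/hadoop/data/dn/* \
       /home/cluster/hadoop/data/jn/*
hdfs namenode -format   # then format again
```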