SSH permission issues
----------------
1. ~/.ssh/authorized_keys permissions: 644
2. ~/.ssh permissions: 700
3. root
Configure SSH
-------------
Generate a key pair: $>ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Add the authentication file: $>cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
Set permissions: neither the directory nor the file may be writable by anyone other than the owner.
$>chmod 700 ~/.ssh
$>chmod 644 ~/.ssh/authorized_keys
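Passwordless login is needed from the master to every node, so the public key also has to land in authorized_keys on each worker. A minimal sketch, assuming the hosts are named s201-s204 as in the fully distributed setup below, the login user is centos, and ssh-copy-id is available:
#!/bin/bash
# push the master's public key to every node (including the master itself)
for host in s201 s202 s203 s204; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub centos@$host
done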
scp
----------
Remote copy.
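For example, copying a directory tree to another node (the /soft path and the centos user are only illustrative):
$>scp -r /soft/hadoop/etc/hadoop centos@s202:/soft/hadoop/etc/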
rsync
---------
Remote synchronization; supports symbolic links.
rsync -lr xxx xxx //-l: copy symlinks as symlinks, -r: recurse into directories
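A concrete form of the command above, syncing a local directory to a worker while preserving symbolic links (path and user are illustrative):
$>rsync -lr /soft/hadoop centos@s202:/soft/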
Fully distributed mode
---------------
1. Configuration files (a sketch of the XML form follows this list)
[core-site.xml]
fs.defaultFS=hdfs://s201:8020/
[hdfs-site.xml]
dfs.replication=1 //pseudo-distributed
dfs.replication=3 //fully distributed
[mapred-site.xml] //choose the execution framework
mapreduce.framework.name=yarn
[yarn-site.xml]
yarn.resourcemanager.hostname=s201 //ResourceManager hostname
[slaves] //distributed to every node; lists the hostnames of the data nodes
s202
s203
s204
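The key=value lines above are shorthand; in the real files each setting is wrapped in a <property> element. A minimal sketch for core-site.xml written via a heredoc, assuming Hadoop lives under /soft/hadoop (the path is illustrative):
$>cat > /soft/hadoop/etc/hadoop/core-site.xml << 'EOF'
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://s201:8020/</value>
    </property>
</configuration>
EOF
The other files (hdfs-site.xml, mapred-site.xml, yarn-site.xml) follow the same <name>/<value> pattern.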
2. Distribute the files
a)ssh
openssh-server //sshd
openssh-clients //ssh
openssh //ssh-keygen
b)scp/rsync
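A sketch of step 2b, pushing the configured Hadoop directory from s201 to the data nodes with rsync (user and path are the same illustrative ones as above):
#!/bin/bash
# distribute the Hadoop installation/config to every worker
for host in s202 s203 s204; do
    rsync -lr /soft/hadoop centos@$host:/soft/
done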
3. Format the file system
$>hadoop namenode -format //the newer equivalent is: hdfs namenode -format
4. Start all Hadoop processes
//start-dfs.sh + start-yarn.sh
$>start-all.sh
5. xcall.sh jps //use a helper script to check the processes on every host (a sketch of xcall.sh follows step 6)
/usr/local/bin/jps //create a symbolic link here so the script can find the jps executable
/usr/local/bin/java //same as above
6. Check the jps processes
$>xcall.sh jps
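xcall.sh is a local helper, not something shipped with Hadoop. A minimal sketch, assuming passwordless ssh is already set up and the hosts are named as above:
#!/bin/bash
# xcall.sh: run the given command on every host in the cluster
# usage: xcall.sh jps
if [ $# -lt 1 ]; then
    echo "usage: xcall.sh <command...>"
    exit 1
fi
for host in s201 s202 s203 s204; do
    echo "------------ $host ------------"
    ssh $host "$@"
done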
7. Stop the CentOS firewall
//the trailing d stands for daemon
$>sudo service iptables stop //CentOS 6.x (start/stop/status/restart)
$>sudo systemctl stop firewalld //CentOS 7 and later (start/stop/status/restart)
$>sudo systemctl disable firewalld //do not start at boot
$>sudo systemctl enable firewalld //start at boot
8. Finally, check through the web UI
//8020 is the NameNode RPC port, used for remote communication; HTTP is what the web UI uses
//50010 is the DataNode data-transfer port
http://s201:50070/
Symbolic links
----------------
1. Change the owner of a symbolic link
$>chown -h centos:centos xxx //-h: act on the link itself, not the file it points to
2. Replace a symbolic link
$>ln -sfT index.html index //overwrite the existing link (-f force, -T treat the link name as a normal file)
Hadoop modules
-------------------
common //shared utilities used by the other modules
hdfs //distributed file system (storage)
mapreduce //distributed computation framework
yarn //cluster resource management and scheduling
Processes
------------------
[hdfs] //script: start-dfs.sh
NameNode NN
DataNode DN
SecondaryNameNode 2NN
[yarn] //script: start-yarn.sh
ResourceManager RM
NodeManager NM
Script analysis
-------------------
Referencing a variable: $var == ${var} == "${var}"
Literal output (no expansion): '$var' //single quotes suppress variable expansion
if [ $a -eq $b ]   ==   //integer comparison
if [ $a -ne $b ]   !=
if [ $a -ge $b ]   >=
if [ $a -gt $b ]   >
if [ $a -le $b ]   <=
if [ $a -lt $b ]   <
if [ $a = $b ]     =    //as an assignment, = must have no spaces around it; as a test inside [ ], the spaces are required
if [ $a != $b ]    !=   //same rule as above
if [ -n $str ]     string is non-empty
if [ -z $str ]     string is empty
if [ $str ]        string is non-empty, similar to -n
File test expressions
if [ -f file ]   regular file exists
if [ -e file ]   file (or directory) exists
if [ -d dir ]    is a directory
if [ -s file ]   file exists and is non-empty
if [ -S file ]   file is a socket (capital S)
if [ -r file ]   file is readable
if [ -w file ]   file is writable
if [ -x file ]   file is executable
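A small self-contained sketch exercising a few of these tests (the values and the /etc/hosts path are only for the demo):
#!/bin/bash
a=3
b=5
if [ $a -lt $b ]; then
    echo "$a is less than $b"
fi

str=""
if [ -z "$str" ]; then
    echo "str is empty"
fi

f=/etc/hosts
if [ -f "$f" ] && [ -r "$f" ]; then
    echo "$f exists and is readable"
fi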
sbin/start-all.sh
--------------
libexec/hadoop-config.sh
start-dfs.sh
start-yarn.sh
#!/usr/bin/env bash //shebang: locate bash through env rather than a hard-coded path
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Start all hadoop daemons. Run this on master node.
echo "This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh" //打印,使用start-dfs.sh and start-yarn.sh代替start-all.sh
bin=`dirname "${BASH_SOURCE-$0}"` //取出目录
bin=`cd "$bin"; pwd` //取出完整目录
DEFAULT_LIBEXEC_DIR="$bin"/../libexec //取出bin的上级目录的子目录libexec,就是bin的同级目录中的libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR} //三元运算符(:-),如果存在HADOOP_LIBEXEC_DIR取出变量将其赋值给HADOOP_LIBEXEC_DIR,反之将DEFAULT
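The same expansion in isolation (variable names here are just for the demo):
#!/bin/bash
# ${VAR:-default} expands to the default when VAR is unset or empty
unset DIR
echo "${DIR:-/tmp/fallback}"   # prints /tmp/fallback
DIR=/opt/real
echo "${DIR:-/tmp/fallback}"   # prints /opt/real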