Big Data Interview 01: Linux & Shell

藏在云层

Published 2023-05-19 21:04:34

Tags: linux, big data interview

Article link: https://blog.csdn.net/qq_61843271/article/details/130774030

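1.1 Common Advanced Linux Commands

No.	Command	Description
1	top	Show memory usage (top also shows per-process CPU usage)
2	df -h	Show disk space usage
3	iotop	Show disk I/O reads and writes (install with yum install iotop)
4	iotop -o	Show only the processes that are currently doing disk I/O
5	netstat -tunlp | grep <port>	Check which process is using a port
6	uptime	Show how long the system has been running and its load averages
7	ps -aux	List processes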

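1.2 Common Shell Tools and Scripts You Have Written

1) awk, sed, cut, sort

2) Which scripts have you written in Shell?

(1) Cluster start/stop scripts and the file distribution script

(2) Import and export between the data warehouse and MySQL

(3) Loading data between layers inside the data warehouse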

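1.3 You submitted a script from the shell and no longer know its process ID, but you need to kill the process. What do you do?

Find the process by name, drop the grep process itself from the list, take the PID column, and pass it to kill ($i is the target host, typically a loop variable):

ssh $i "ps -ef | grep file-flume-kafka | grep -v grep |awk '{print \$2}' | xargs kill"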
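The backslash in `\$2` matters because the whole remote command sits inside double quotes, so the local shell would otherwise expand `$2` before ssh sends anything. The sketch below only builds and prints the command string so the quoting can be inspected; `build_kill_cmd` and `build_broken_cmd` are hypothetical helper names, not part of the original script.

```shell
#!/bin/bash
# build_kill_cmd is a hypothetical helper: it only builds the command string
# that would be handed to ssh, so nothing is killed when this sketch runs.
build_kill_cmd() {
    local pattern=$1
    # \$2 keeps a literal $2 for the remote awk; $pattern is expanded locally.
    echo "ps -ef | grep $pattern | grep -v grep | awk '{print \$2}' | xargs kill"
}

# Same string without the backslash: the local shell replaces $2
# (the function's second argument, empty here) before ssh would run.
build_broken_cmd() {
    local pattern=$1
    echo "ps -ef | grep $pattern | grep -v grep | awk '{print $2}' | xargs kill"
}

build_kill_cmd file-flume-kafka
build_broken_cmd file-flume-kafka
```

The second line of output ends up with `awk '{print }'`, which would print whole `ps` lines instead of the PID column.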

1.4 Difference Between Single and Double Quotes in Shell

1) Create a file named test.sh in /home/atguigu/bin

[atguigu@hadoop102 bin]$ vim test.sh

Add the following content to the file

#!/bin/bash
do_date=$1

echo '$do_date'
echo "$do_date"
echo "'$do_date'"
echo '"$do_date"'
echo `date`

2) Check the output

[atguigu@hadoop102 bin]$ test.sh 2019-02-10

$do_date
2019-02-10
'2019-02-10'
"$do_date"
2019年 05月 02日 星期四 21:02:08 CST

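3) Summary:

Single quotes do not expand variables
Double quotes expand variables
Backticks ` execute the command inside them
Single quotes nested inside double quotes: the variable is expanded
Double quotes nested inside single quotes: the variable is not expanded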

1.5 Common Interview Questions

1. A file named file1 has the following content
张三 40
李四 50
王五 60

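	Question (1): Use a Linux command to find the line numbers of the blank lines in file1
	awk '/^$/{print NR}' file1                Note: ^ matches the start of a line and $ the end; with nothing in between, the pattern matches an empty line
	Question (2): Use a Linux command to compute and print the sum of the second column
	awk '{sum+=$2} END{print sum}' file1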
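Both answers can be tried on a throwaway file. The sketch below assumes a made-up copy of file1 with two blank lines added (the listing above has none) and stores the results in the illustrative variables `blank_lines` and `total`.

```shell
#!/bin/bash
# Demo data is an assumption: the three file1 rows plus two blank lines.
demo_file=$(mktemp)
printf '张三 40\n\n李四 50\n王五 60\n\n' > "$demo_file"

# Question (1): NR is awk's current line number; /^$/ matches empty lines.
blank_lines=$(awk '/^$/{print NR}' "$demo_file")

# Question (2): add up column 2; blank lines contribute 0 to the sum.
total=$(awk '{sum+=$2} END{print sum}' "$demo_file")

echo "blank lines:" $blank_lines
echo "sum: $total"
rm -f "$demo_file"
```

awk splits fields on whitespace by default, so the `-F " "` option is not needed for this file.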

2. How do you check in a Shell script whether a file exists? What do you do if it does not?
#!/bin/bash
if [ -e "$1" ]
then
	echo "$1 exists!"
else
	echo "$1 does not exist!"
fi
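The script above only reports the result. One possible way to handle a missing file, sketched below, is to create it together with its parent directory; `ensure_file` is a hypothetical helper name, and creating an empty file is an assumption about what "handling" should mean here.

```shell
#!/bin/bash
# ensure_file is a hypothetical helper: report an existing file,
# otherwise create it (and its parent directory) as an empty file.
ensure_file() {
    local path=$1
    if [ -e "$path" ]; then
        echo "$path exists"
    else
        mkdir -p "$(dirname "$path")" && touch "$path"
        echo "$path created"
    fi
}

demo_dir=$(mktemp -d)
ensure_file "$demo_dir/logs/app.log"   # first call creates the file
ensure_file "$demo_dir/logs/app.log"   # second call finds it
rm -rf "$demo_dir"
```

Quoting `"$path"` keeps the test correct for file names that contain spaces.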

3. Use a Linux command (Shell script) to sort the unordered numbers in the first column of a text file, then sum them and print the result
vim test.txt
9
3
2
5
10

sort -n test.txt | awk '{sum+=$0} END{print sum}'
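The one-liner prints only the total. If the sorted numbers should be shown as well, awk can echo each line while it accumulates. A minimal sketch, with `sort_and_sum` as a hypothetical function name:

```shell
#!/bin/bash
# sort_and_sum is a hypothetical helper: print the numbers in ascending
# order, then their total on a final line.
sort_and_sum() {
    sort -n "$1" | awk '{print; sum+=$1} END{print "sum=" sum}'
}

demo_file=$(mktemp)
printf '9\n3\n2\n5\n10\n' > "$demo_file"
sort_and_sum "$demo_file"
rm -f "$demo_file"
```

`sort -n` compares numerically, so 10 lands after 9 instead of after 1 as it would with a plain text sort.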

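4. Write a shell command that lists the names of all text files under the current folder (/home) whose content contains the string "shen"
grep -rl "shen" /home
# Without -l, cut the file name out of each matching line (names repeat once per match): grep -r "shen" /home | cut -d ":" -f 1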
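The question asks for file names, and grep's `-l` option prints each matching file once, whereas cutting the name out of `grep -r` output repeats it for every matching line. A small sketch on a temporary directory; `find_files_with` is a hypothetical wrapper name.

```shell
#!/bin/bash
# find_files_with is a hypothetical wrapper: list files under a directory
# whose content contains the given string, one name per file.
find_files_with() {
    # -r recurses into subdirectories; -l prints only the file names.
    grep -rl -- "$1" "$2" | sort
}

demo_dir=$(mktemp -d)
printf 'hello shen\nshen again\n' > "$demo_dir/a.txt"   # two matching lines
printf 'nothing here\n' > "$demo_dir/b.txt"             # no match
mkdir "$demo_dir/sub"
printf 'shen\n' > "$demo_dir/sub/c.txt"

find_files_with shen "$demo_dir"   # lists a.txt and sub/c.txt, each once
rm -rf "$demo_dir"
```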

5. Check whether a file is a character device file; if it is, copy it to the /dev directory
#!/bin/bash
read -p "Input file name:" file_name
if [ -c "$file_name" ]
then
	cp "$file_name" /dev
else
	echo "This file is not a character device file!"
fi

6. Add a new group named class1, then add 30 users to this group, with user names of the form stdxx, where xx runs from 01 to 30
#!/bin/bash
groupadd class1
for ((i=1;i<31;i++))
do
	if [ $i -lt 10 ]
	then
		useradd -g class1 std0$i
	else
		useradd -g class1 std$i
	fi
done
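The if/else exists only to pad single digits with a zero. As an alternative sketch, `printf '%02d'` does the padding directly. The function name `std_names` is hypothetical, and the `useradd` call is deliberately left out so the example runs without root privileges.

```shell
#!/bin/bash
# std_names is a hypothetical helper that prints std01 .. std30.
# In a real script each name would be passed to: useradd -g class1 <name>
std_names() {
    local i=1
    while [ "$i" -le 30 ]; do
        printf 'std%02d\n' "$i"   # %02d pads to two digits: 1 -> 01
        i=$((i + 1))
    done
}

std_names | sed -n '1p;9p;10p;30p'   # spot-check a few names
```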

7. Write a shell program that automatically deletes 50 accounts named stud1 through stud50
#!/bin/bash
for ((i=1;i<=50;i++))
do
	userdel -r stud$i
done

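8. Write a sed command that modifies the content of /tmp/input.txt
Requirements: delete all blank lines.
On any line that contains "11111", insert "AAA" before "11111" and "BBB" after it; for example, the line 0000111112222 becomes 0000AAA11111BBB2222
The content of input.txt is as follows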
000011111222

000011111222222
11111000000222


111111111111122222222222
2211111111
112222222
1122
sed -e '/^$/d' -e 's#\(11111\)#AAA\1BBB#g' /tmp/input.txt
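The command above prints the result to standard output and leaves the file untouched. The sketch below runs an equivalent command on a few sample lines; it uses `&` (the whole matched text) in place of the `\(11111\)` and `\1` capture group, and `insert_markers` is a hypothetical wrapper name.

```shell
#!/bin/bash
# insert_markers is a hypothetical wrapper around the sed answer.
# In the replacement, & stands for the matched text, so no group is needed.
insert_markers() {
    sed -e '/^$/d' -e 's/11111/AAA&BBB/g'
}

printf '000011111222\n\n11111000000222\n1122\n' | insert_markers
```

The blank line disappears, the line `1122` passes through unchanged, and the other two lines gain the AAA/BBB markers around `11111`.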

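9. Single quotes (''), double quotes (""), and backticks (``) in Linux explained
Differences:
Single quotes: what you see is what you get; the content is printed exactly as written

Double quotes: variables and special characters inside are interpreted
Inside double quotes, "$", the backslash (\), and backticks (``) keep their special meaning:
$ references the value of a variable;
\ is the escape character;
`` performs command substitution.

Backticks: the content inside backticks is a command and is executed first
Backticks are used for command substitution: the command inside them runs first, and its output is then inserted into the original command line

Examples: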
[root@alice ~]# echo '$UID'
$UID
[root@alice ~]# echo "$UID"
0
[root@alice ~]# echo `date`
Wed May 15 09:28:43 CST 2019

[zhushouqing@hadoop177 ~]$ echo "date: `date`"
date: 2023年 01月 09日 星期一 19:57:56 CST

echo "Here 'this is a string' is a string"
>>> Here 'this is a string' is a string
echo "Here \"this is a string\" is a string"
>>> Here "this is a string" is a string

Nesting single and double quotes: the outermost pair decides
[zhushouqing@hadoop177 ~]$ export a=10
[zhushouqing@hadoop177 ~]$ echo '"$a"'
"$a"
[zhushouqing@hadoop177 ~]$ echo "'$a'"
'10'

Common scripts:
	1. Start/stop scripts (must know)
	The basic logic is:
	#!/bin/bash
	if [ $# -lt 1 ]
	then
		echo "No Args,Please Input!"
		exit;
	fi
	case $1 in
	"start")
		for host in hadoop177 hadoop178 hadoop179
		do
			ssh $host "absolute path of the start command"
		done
	;;
	"stop")
		for host in hadoop177 hadoop178 hadoop179
		do
			ssh $host "absolute path of the stop command"
		done
	;;
	*)
		echo "Input Args Error..."
	;;
	esac
	
Script examples:
1. jpsall --> show the jps processes on every node of the cluster

#!/bin/bash
for host in hadoop102 hadoop103 hadoop104
do
        echo =============== $host ===============
        ssh $host jps
done

2. myhadoop.sh --> Hadoop start/stop script
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit ;
fi
case $1 in
"start")
        echo " =================== Starting the hadoop cluster ==================="
        echo " --------------- Starting hdfs ---------------"
        ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
        echo " --------------- Starting yarn ---------------"
        ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
        echo " --------------- Starting historyserver ---------------"
        ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
        echo " =================== Stopping the hadoop cluster ==================="
        echo " --------------- Stopping historyserver ---------------"
        ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
        echo " --------------- Stopping yarn ---------------"
        ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
        echo " --------------- Stopping hdfs ---------------"
        ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
    echo "Input Args Error..."
;;
esac

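3. xsync --> distribution script (enough to understand it; no need to write it by hand)

#!/bin/bash
#1. Check the number of arguments
if [ $# -lt 1 ]
then
  echo Not Enough Arguments!
  exit;
fi
#2. Loop over every machine in the cluster
for host in hadoop102 hadoop103 hadoop104
do
  echo ====================  $host  ====================
  #3. Loop over every file or directory and send each one
  for file in "$@"
  do
    #4. Check whether the file exists
    if [ -e $file ]
    then
      #5. Get the absolute path of the parent directory
      pdir=$(cd -P $(dirname $file); pwd)
      #6. Get the name of the current file
      fname=$(basename $file)
      ssh $host "mkdir -p $pdir"
      rsync -av $pdir/$fname $host:$pdir
    else
      echo $file does not exist!
    fi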
  done
done
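Steps 5 and 6 are what let xsync accept relative paths and symlinked directories: `cd -P` moves to the physical directory and `pwd` then prints its absolute path, so the same path can be created on the remote host. The sketch below isolates that step; `abs_parent` is a hypothetical helper name and the demo file is made up.

```shell
#!/bin/bash
# abs_parent is a hypothetical helper that isolates step 5 of xsync.
abs_parent() {
    # cd -P resolves symlinks to the physical directory; the ( ) subshell
    # keeps the caller's working directory unchanged.
    (cd -P "$(dirname "$1")" && pwd)
}

demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/conf"
touch "$demo_dir/conf/app.properties"

# Call it with a relative path, the way xsync is usually invoked.
(
    cd "$demo_dir" || exit 1
    echo "parent: $(abs_parent conf/app.properties)"
    echo "name:   $(basename conf/app.properties)"
)
rm -rf "$demo_dir"
```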

4. zk.sh --> ZooKeeper start/stop script
#!/bin/bash
if [ $# -lt 1 ]
then
    echo "No Args Input..."
    exit ;
fi

case $1 in
"start")
    for i in hadoop102 hadoop103 hadoop104
    do
        echo "=====================  $i  ======================="
        ssh $i "source /etc/profile && /opt/module/zookeeper-3.5.7/bin/zkServer.sh start"
    done
;;
"stop")
    for i in hadoop102 hadoop103 hadoop104
    do
        echo "=====================  $i  ======================="
        ssh $i "source /etc/profile && /opt/module/zookeeper-3.5.7/bin/zkServer.sh stop"
    done
;;
"status")
    for i in hadoop102 hadoop103 hadoop104
    do
        echo "=====================  $i  ======================="
        ssh $i "source /etc/profile && /opt/module/zookeeper-3.5.7/bin/zkServer.sh status"
    done
;;
*)
    echo "Input Args Error..."
;;
esac

5. kf.sh --> Kafka start/stop script
#!/bin/bash

case $1 in
"start"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo " --------Starting Kafka on $i-------"
        ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties "
    done
};;
"stop"){
    for i in hadoop102 hadoop103 hadoop104
    do
        echo " --------Stopping Kafka on $i-------"
        ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh"
    done
};;
esac

To make the scripts usable from any directory, put them in a directory that is listed in the PATH environment variable
[atguigu@hadoop102 ~]$ echo $PATH
/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin

Placing the scripts in /home/atguigu/bin makes them available from any directory.
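The lookup can be tried without touching the real bin directory. In the sketch below a temporary directory stands in for /home/atguigu/bin, and `hello_cluster` is a made-up script name; the PATH change only lasts for the current shell session.

```shell
#!/bin/bash
# demo_bin stands in for /home/atguigu/bin; hello_cluster is a made-up script.
demo_bin=$(mktemp -d)
cat > "$demo_bin/hello_cluster" <<'EOF'
#!/bin/bash
echo "hello from $(basename "$0")"
EOF
chmod +x "$demo_bin/hello_cluster"   # the script must be executable

# Put the directory at the front of PATH for this shell session only.
PATH="$demo_bin:$PATH"

# Run it by name from an unrelated directory; the shell finds it via PATH.
(cd / && hello_cluster)
```

A script that lacks the execute bit is not found through PATH, which is why `chmod +x` is part of the setup.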