MIT - The Missing Semester of Your CS Education, Lecture 2: Shell Tools and Scripting (Course Notes and Exercise Solutions)

Latest recommended article published 2024-05-21 22:12:28

羊驼鲸鲸


Article link: https://blog.csdn.net/2201_75758106/article/details/136165673


Shell Tools and Scripting · Missing Semester (mit.edu)

These notes use bash as the scripting language.

Shell Scripting

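In many situations you will want to run a series of commands and use control flow expressions such as conditionals or loops.

Most shells have their own scripting language with variables, control flow, and its own syntax. What sets shell scripting apart from other scripting languages is that it is optimized for shell-related tasks. Creating command pipelines, saving results to files, and reading from standard input are primitives in shell scripting, which makes it easier to use for these tasks than a general-purpose scripting language.

These notes focus on bash scripting, since it is the most common.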

foo=bar
echo "$foo"
# prints bar
echo '$foo'
# prints $foo

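To assign a variable in bash, use the syntax foo=bar (note that there are no spaces around =; foo = bar would instead run a command named foo with the arguments = and bar), and access the variable's value with $foo.

Strings in bash can be delimited with ' or ", but the two are not equivalent. Strings delimited with ' are literal and do not substitute variable values, whereas strings delimited with " do.

Like most programming languages, bash supports control flow constructs including if, case, while, and for. Below is an example of a function that creates a directory and cds into it.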

mcd () {
    mkdir -p "$1"
    cd "$1"
}

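Here $1 is the first argument to the script/function. Bash uses a number of special variables like this; some of the most common are listed below.

$0 - Name of the script
$1 to $9 - Arguments to the script; $1 is the first argument, and so on
$@ - All the arguments
$# - Number of arguments
$? - Return code of the previous command
$$ - Process identification number (PID) of the current script
!! - Entire last command, including arguments
$_ - Last argument of the last command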
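
To make the list above concrete, here is a minimal sketch that prints a few of these variables from inside a function. The function name show_args is hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical helper, not part of the lecture.
show_args() {
    echo "count: $#"
    echo "first: $1"
    # Quoting "$@" keeps each argument intact, even if it contains spaces
    for arg in "$@"; do
        echo "arg: $arg"
    done
}

show_args "hello world" foo
# count: 2
# first: hello world
# arg: hello world
# arg: foo

false
echo "exit status of false: $?"
# exit status of false: 1
```

Note that inside a function, $1, $@, and $# refer to the function's arguments rather than the script's.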

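Exit codes can be used to conditionally execute commands with && (the and operator) and || (the or operator), both of which are short-circuiting: && runs the right-hand command only if the left-hand one succeeded (exit code 0), and || runs it only if the left-hand one failed. Commands can also be separated on the same line with a semicolon ;.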

false || echo "Oops, fail"
# Oops, fail

true || echo "Will not be printed"
#

true && echo "Things went well"
# Things went well

false && echo "Will not be printed"
#

true ; echo "This will always run"
# This will always run

false ; echo "This will always run"
# This will always run

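Another common pattern is capturing the output of a command in a variable. This is done with command substitution: whenever you write $( CMD ), the shell executes CMD, takes its output, and substitutes it in place. For example, with for file in $(ls), the shell first calls ls and then iterates over the resulting values.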
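
As a small illustration of command substitution, the sketch below stores a command's output in a variable and uses substitution inside a loop. The scratch directory and file names are made up for the example.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)          # hypothetical scratch directory
touch "$workdir/a.txt" "$workdir/b.txt"

# Capture the output of a pipeline in a variable
file_count=$(ls "$workdir" | wc -l)
echo "There are $file_count files"

# Iterating over $(ls) splits on whitespace, so names containing
# spaces break apart; a glob is the safer way to loop over files.
for file in "$workdir"/*; do
    echo "found: $(basename "$file")"
done

rm -r "$workdir"
```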

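A lesser-known similar feature is process substitution: <( CMD ) executes CMD, places the output in a temporary file, and replaces <( CMD ) with that file's name. This is useful when a command expects values to be passed by file rather than by STDIN. For example, diff <(ls foo) <(ls bar) shows the differences between the files in directories foo and bar.

Let's look at an example.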

#!/bin/bash

echo "Starting program at $(date)" # Date will be substituted

echo "Running program $0 with $# arguments with pid $$"

for file in "$@"; do
    grep foobar "$file" > /dev/null 2> /dev/null
    # When pattern is not found, grep has exit status 1
    # We redirect STDOUT and STDERR to a null register since we do not care about them
    if [[ $? -ne 0 ]]; then
        echo "File $file does not have any foobar, adding one"
        echo "# foobar" >> "$file"
    fi
done

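This script iterates over the list of files passed to it as arguments ($@). For each file, it uses grep to search the file for the string "foobar".

The line grep foobar "$file" > /dev/null 2> /dev/null redirects grep's output to the null device (/dev/null), so nothing is shown in the terminal: > /dev/null redirects standard output, and 2> /dev/null redirects standard error.

Next, $? is used to check grep's exit status. If the pattern "foobar" is not found in the file, grep exits with status 1 (and with status 2 if an error occurred, for example when the file cannot be read). $? -ne 0 checks whether the exit status is nonzero, i.e. whether the pattern was not found. When doing comparisons in bash, prefer double brackets [[ ]] over single brackets [ ]; the lecture notes link to a more detailed explanation.

If the pattern is not found, the script prints the message "File $file does not have any foobar, adding one" and appends "# foobar" to the end of the file.

In short, the script checks whether each file contains the string "foobar" and, if it does not, adds it to the file as a comment.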
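
The same check can also be written without inspecting $? at all, since if can branch directly on a command's exit status. The sketch below is one possible variant, not the lecture's version; the function name ensure_foobar is hypothetical, and grep's -q flag (quiet) replaces the redirection of standard output.

```shell
#!/usr/bin/env bash
# ensure_foobar is a hypothetical name wrapping the loop from the script above.
ensure_foobar() {
    local file
    for file in "$@"; do
        # -q suppresses output; `if !` branches on grep's exit status directly
        if ! grep -q foobar "$file" 2> /dev/null; then
            echo "File $file does not have any foobar, adding one"
            echo "# foobar" >> "$file"
        fi
    done
}

tmp=$(mktemp)
echo "hello" > "$tmp"
ensure_foobar "$tmp"   # appends "# foobar"
ensure_foobar "$tmp"   # second run finds the pattern and does nothing
rm "$tmp"
```

Branching on the command itself avoids the risk of another command running between grep and the test and overwriting $?.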

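When launching scripts, you will often want to provide arguments that are similar. Bash has ways of making this easier, expanding expressions by carrying out filename expansion. These techniques are often referred to as shell globbing.

Wildcards - Whenever you want to perform some sort of wildcard matching, you can use ? and * to match one or any number of characters respectively. For instance, given the files foo, foo1, foo2, foo10, and bar, the command rm foo? deletes foo1 and foo2, whereas rm foo* deletes everything except bar.
Curly braces {} - Whenever you have a common substring in a series of commands, you can use curly braces for bash to expand it automatically. This comes in very handy when moving or converting files.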

convert image.{png,jpg}
# Will expand to
convert image.png image.jpg

cp /path/to/project/{foo,bar,baz}.sh /newpath
# Will expand to
cp /path/to/project/foo.sh /path/to/project/bar.sh /path/to/project/baz.sh /newpath

# Globbing techniques can also be combined
mv *{.py,.sh} folder
# Will move all *.py and *.sh files


mkdir foo bar
# This creates files foo/a, foo/b, ... foo/h, bar/a, bar/b, ... bar/h
touch {foo,bar}/{a..h}
touch foo/x bar/y
# Show differences between files in foo and bar
diff <(ls foo) <(ls bar)
# Outputs
# < x
# ---
# > y

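shellcheck can help you find errors in your sh/bash scripts.

Scripts called from the terminal do not have to be written in bash. For instance, here is a Python script that outputs its arguments in reverse order: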

#!/usr/local/bin/python
import sys
for arg in reversed(sys.argv[1:]):
    print(arg)

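The kernel knows to execute this script with a Python interpreter instead of a shell because we included a shebang line at the top of the script. It is good practice to write shebang lines using the env command, which resolves the interpreter to wherever it lives on the system, making the script more portable. To resolve the location, env uses the PATH environment variable. For this example, the shebang line would be #!/usr/bin/env python.

Some differences between functions and scripts to keep in mind:

Functions have to be in the same language as the shell, while scripts can be written in any language. This is why including a shebang line in scripts is important.
Functions are loaded once, when their definition is read. Scripts are loaded every time they are executed. This makes functions slightly faster to load, but whenever you change them you have to reload their definition.
Functions are executed in the current shell environment, whereas scripts execute in their own process. Thus functions can modify the environment, e.g. change the current directory, whereas scripts cannot. Scripts are passed, by value, the environment variables that have been exported with export.
As with any programming language, functions are a powerful construct for achieving modularity, code reuse, and clarity of shell code. Shell scripts will often include their own function definitions.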
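
The point about the current shell environment can be checked with a short experiment. In this sketch, go_there and go_there.sh are hypothetical names: the script's cd is lost when its process exits, while the function's cd persists.

```shell
#!/usr/bin/env bash
# go_there and go_there.sh are hypothetical names used only for this demo.
start_dir=$PWD
scratch=$(mktemp -d)

# A script runs in its own process, so its cd is lost when it exits.
printf '#!/usr/bin/env bash\ncd "$1"\n' > "$scratch/go_there.sh"
bash "$scratch/go_there.sh" "$scratch"
after_script=$PWD
echo "after script:   $after_script"    # unchanged

# A function runs in the current shell, so its cd persists.
go_there() {
    cd "$1"
}
go_there "$scratch"
after_function=$PWD
echo "after function: $after_function"  # now the scratch directory

cd "$start_dir"
rm -r "$scratch"
```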

Shell Tools

Finding how to use commands

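Given a command, how do you find out what it does and what options it takes?

The first approach is to call the command with the -h or --help flag. A more detailed approach is the man command; man is short for manual. For ncurses-based interactive tools, help can often be accessed within the program using the :help command or by typing ?.

Man pages can be overly detailed; TLDR pages are a complementary solution that focuses on giving example use cases of a command.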

Finding files

All UNIX-like systems come packaged with find, a great shell tool for finding files. find recursively searches for files matching some criteria.

# Find all directories named src
find . -name src -type d
# Find all python files that have a folder named test in their path
find . -path '*/test/*.py' -type f
# Find all files modified in the last day
find . -mtime -1
# Find all tar.gz archives with size in range 500k to 10M
find . -size +500k -size -10M -name '*.tar.gz'

Beyond listing files, find can perform actions on the files it matches, which helps simplify tedious tasks.

# Delete all files with .tmp extension
find . -name '*.tmp' -exec rm {} \;
# Find all PNG files and convert them to JPG
find . -name '*.png' -exec convert {} {}.jpg \;
# `\;` marks the end of the -exec command
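
Besides `\;`, find also accepts `+` as a terminator for -exec, which batches file names into as few invocations as possible. The sketch below counts invocations of echo to show the difference; the scratch directory and file names are made up.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)                      # hypothetical scratch directory
touch "$scratch/a.tmp" "$scratch/b.tmp" "$scratch/c.tmp"

# `\;` runs the command once per matching file: three separate echo calls,
# each printing one line.
per_file=$(find "$scratch" -name '*.tmp' -exec echo {} \; | wc -l)

# `+` appends as many file names as fit to a single invocation: one echo call,
# printing all three names on one line.
batched=$(find "$scratch" -name '*.tmp' -exec echo {} + | wc -l)

echo "per-file runs: $per_file, batched runs: $batched"
rm -r "$scratch"
```

For commands such as rm that accept many file arguments, the batched form avoids starting one process per file.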

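find's syntax can be hard to remember. For instance, to simply find files that match some pattern PATTERN you have to run find -name '*PATTERN*' (or -iname for case-insensitive matching). fd is a simple, fast, and user-friendly alternative to find, with a more intuitive syntax: to find files matching PATTERN, you run fd PATTERN.

locate uses a database that is updated by updatedb. On most systems, updatedb is run daily via cron. Moreover, find and similar tools can also find files using attributes such as file size, modification time, or file permissions, while locate just uses the file name.

A more detailed comparison:

find is a very old and widely used tool with a powerful expression syntax and many ways of selecting files, for example by name, size, owner, timestamp, or permissions. It searches the specified directories and can act on the files it finds. It walks the file system at the time you run it, so its output is always up to date.

locate is a newer tool than find whose advantage is speed. It searches a prebuilt database instead of the file system, so it finds matching file names quickly, but because the database may have been updated hours or days ago, its output can be stale or incomplete.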

Finding code

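Often you want to search based on file content. A common scenario is wanting to find all files that contain some pattern, along with where in those files the pattern occurs. Most UNIX-like systems provide grep for this.

grep has many flags. Two that are used often are -C, for getting context around the matching line, and -v, for inverting the match, i.e. printing all lines that do not match the pattern. For example, grep -C 5 prints 5 lines before and after each match. When you need to search through many files quickly, use -R, which recursively descends into directories and looks for matches in the files it finds.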
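
The following sketch exercises the three flags on a small made-up file. The directory layout and the search word needle are assumptions for the example; -l (print only the names of matching files) is an additional grep flag not discussed above.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)                      # hypothetical scratch directory
mkdir "$scratch/sub"
printf 'one\ntwo\nneedle\nfour\nfive\n' > "$scratch/sub/notes.txt"

# -C 1: the matching line plus one line of context on each side
context=$(grep -C 1 needle "$scratch/sub/notes.txt")
echo "$context"
# two
# needle
# four

# -v: every line that does NOT match
inverted=$(grep -v needle "$scratch/sub/notes.txt")
echo "$inverted"
# one
# two
# four
# five

# -R: descend into directories; -l prints only the names of matching files
matching_files=$(grep -R -l needle "$scratch")
echo "$matching_files"

rm -r "$scratch"
```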

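grep -R can be improved in many ways, such as ignoring .git folders or using multiple CPUs. Many grep alternatives have been developed, including ack, ag, and rg. For now I am using ripgrep (rg), because it is fast and intuitive. Some examples: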

# Find all python files where I used the requests library
rg -t py 'import requests'
# Find all files (including hidden files) without a shebang line
rg -u --files-without-match "^#\!"
# Find all matches of foo and print the following 5 lines
rg foo -A 5
# Print statistics of matches (# of matched lines and files )
rg --stats PATTERN

Finding shell commands

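Pressing the up arrow brings back your last command, and if you keep pressing it you will step back through your shell history.

The history command lets you access your shell history programmatically. It prints your shell history to standard output. To search it, pipe that output to grep and search for a pattern: history | grep find prints the commands that contain the substring "find".

In most shells, you can use Ctrl+R to search backwards through your history. After pressing Ctrl+R, type the substring you want to match against commands in your history. As you keep pressing Ctrl+R, you cycle through the matches. In zsh, this can also be enabled with the UP/DOWN arrows. A nice addition on top of Ctrl+R is the fzf bindings; fzf is a general-purpose fuzzy finder.

Another trick I like is history-based autosuggestions. First introduced by the fish shell, this feature dynamically autocompletes your current command with the most recent command you typed that shares its prefix. It can be enabled in zsh.

You can also modify your shell's history behavior, for example to keep commands with a leading space out of the history. This comes in handy when you type commands containing passwords or other sensitive information. To do so, add HISTCONTROL=ignorespace to your .bashrc or setopt HIST_IGNORE_SPACE to your .zshrc.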

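Directory Navigation

How do you navigate directories quickly? There are many simple ways to do this, such as writing shell aliases or creating symlinks with ln -s.

Writing shell aliases: you can define aliases in your shell configuration file (such as .bashrc or .zshrc) to jump to frequently used directories. For example: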

alias proj='cd /path/to/project'

Whenever you type proj in the terminal, it runs cd /path/to/project.

Creating symlinks with ln -s: you can create a symbolic link in a location you can reach quickly and point it at the target directory. For example:

ln -s /path/to/project ~/proj

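You can then jump to that directory by typing cd ~/proj.

Tools such as fasd and autojump find frequent and/or recent files and directories. fasd ranks files and directories by frecency, that is, by both frequency and recency. By default, fasd adds a z command that you can use to quickly cd using a substring of a frecent directory. For example, if you often go to /home/user/files/cool_project, you can simply use z cool to jump there. With autojump, the same change of directory is done with j cool.

More complex tools exist for getting a quick overview of a directory structure: tree, broot, or even full-fledged file managers such as nnn and ranger.

I tried tree: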

daisy@Daisy:/tmp/missing$ tree
Command 'tree' not found, but can be installed with:
sudo apt install tree

daisy@Daisy:/tmp/missing$ sudo apt install tree
E: Unable to locate package tree

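The install failed with an error. Following the CSDN blog post on fixing "E: Unable to locate package XXX", refreshing the package index first resolves it: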

daisy@Daisy:/tmp/missing$ sudo apt-get update
daisy@Daisy:/tmp/missing$ sudo apt install tree
daisy@Daisy:/tmp/missing$ tree
.
├── marco.sh
└── semester
0 directories, 2 files

Exercises

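Reference solutions: Solution - Shell Tools and Scripting · the missing semester of your cs education (missing-semester-cn.github.io)

1. Read man ls and write an ls command that lists files in the following manner:

Includes all files, including hidden files
Sizes are listed in human-readable format (e.g. 454M instead of 454279954)
Files are ordered by recency
Output is colorized

Sample output: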

 -rw-r--r--   1 user group 1.1M Jan 14 09:53 baz
 drwxr-xr-x   5 user group  160 Jan 14 09:53 .
 -rw-r--r--   1 user group  514 Jan 14 06:42 bar
 -rw-r--r--   1 user group 106M Jan 13 12:12 foo
 drwx------+ 47 user group 1.5K Jan 12 18:08 ..

Include all files, including hidden files:

-a, --all
	do not ignore entries starting with .

List sizes in human-readable format (e.g. 454M):

--block-size=SIZE
	with -l, scale sizes by SIZE when printing them; e.g.,
	'--block-size=M'; see SIZE format below

-h, --human-readable
	with -l and -s, print sizes like 1K 234M 2G etc.

Order files by recency:

-c     with -lt: sort by, and show, ctime (time of last change of
       file status information); with -l: show ctime and sort by
       name; otherwise: sort by ctime, newest first

-t     sort by time, newest first; see --time

Colorize the output:

--color[=WHEN]
	color the output WHEN; more info below
	   Using color to distinguish file types is disabled both by default
       and with --color=never.  With --color=auto, ls emits color codes
       only when standard output is connected to a terminal.  The
       LS_COLORS environment variable can change the settings.  Use the
       dircolors(1) command to set it.

daisy@Daisy:$ ls -a --block-size=M -c --color=auto
.motd_shown    .         last-modified.txt          .bash_logout  .profile.bash_history  .lesshst  .sudo_as_admin_successful  .bashrc       ..

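This is wrong: both --block-size=SIZE and -c are meant to be combined with -l here. Without -l, no sizes are printed at all, so --block-size has no visible effect, and the output is not in the long format of the sample.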

daisy@Daisy:~$ ls -a -l --block-size=M -c --color=auto
total 1M
drwxr-x--- 2 daisy daisy 1M Jan 30 20:13 .
drwxr-xr-x 3 root  root  1M Jan 30 14:28 ..
-rw------- 1 daisy daisy 1M Jan 30 20:32 .bash_history
-rw-r--r-- 1 daisy daisy 1M Jan 30 14:28 .bash_logout
-rw-r--r-- 1 daisy daisy 1M Jan 30 14:28 .bashrc
-rw------- 1 daisy daisy 1M Jan 30 20:13 .lesshst
-rw-r--r-- 1 daisy daisy 0M Feb  4 14:44 .motd_shown
-rw-r--r-- 1 daisy daisy 1M Jan 30 14:28 .profile
-rw-r--r-- 1 daisy daisy 0M Jan 30 15:02 .sudo_as_admin_successful
-rw-r--r-- 1 daisy daisy 1M Jan 30 16:33 last-modified.txt

This output is not sorted by recency: combined with -l alone, -c shows the ctime but still sorts by name. Sorting by time requires -t.

daisy@Daisy:~$ ls -a -l --block-size=M -t --color=auto
total 1M
-rw-r--r-- 1 daisy daisy 0M Feb  4 14:44 .motd_shown
-rw------- 1 daisy daisy 1M Jan 30 20:32 .bash_history
drwxr-x--- 2 daisy daisy 1M Jan 30 20:13 .
-rw------- 1 daisy daisy 1M Jan 30 20:13 .lesshst
-rw-r--r-- 1 daisy daisy 1M Jan 30 16:33 last-modified.txt
-rw-r--r-- 1 daisy daisy 0M Jan 30 15:02 .sudo_as_admin_successful
-rw-r--r-- 1 daisy daisy 1M Jan 30 14:28 .bash_logout
-rw-r--r-- 1 daisy daisy 1M Jan 30 14:28 .bashrc
-rw-r--r-- 1 daisy daisy 1M Jan 30 14:28 .profile
drwxr-xr-x 3 root  root  1M Jan 30 14:28 ..

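This command sorts by recency as required. Note, however, that --block-size=M rounds every size up to whole megabytes instead of choosing a readable unit per file. The reference answer uses -a -h -t --color=auto; since -h only takes effect together with -l (or -s), the full command is ls -a -l -h -t --color=auto.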
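
To check the ordering without relying on whatever happens to be in the home directory, the sketch below builds a scratch directory with known modification times. It assumes GNU coreutils (touch -d and ls --color are GNU options); the file names are made up.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)                      # hypothetical scratch directory
touch -d '2024-01-01 00:00' "$scratch/old"
touch -d '2024-01-02 00:00' "$scratch/.hidden_newer"

# -l long format, -a include dotfiles, -h human-readable sizes,
# -t newest first; with --color=auto, color codes are only emitted
# when the output goes to a terminal.
listing=$(ls -l -a -h -t --color=auto "$scratch")
echo "$listing"

rm -r "$scratch"
```

In the listing, .hidden_newer appears above old, and both appear below the . and .. entries, which were modified more recently.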

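2. Write bash functions marco and polo that do the following. Whenever you execute marco, the current working directory should be saved in some manner; then when you execute polo, no matter what directory you are in, polo should cd you back to the directory where you executed marco. For ease of debugging you can write the code in a file marco.sh and (re)load the definitions into your shell by executing source marco.sh.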

#!/bin/bash
marco() {
    # Save the current directory to a file so it survives across shells
    pwd > "$HOME/marco_history.log"
    echo "save pwd $(pwd)"
}
polo() {
    cd "$(cat "$HOME/marco_history.log")"
}

Alternatively:

#!/bin/bash
marco() {
    export MARCO="$(pwd)"
}
polo() {
    cd "$MARCO"
}

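Because paths can contain spaces, which bash would otherwise split into separate arguments, the expansions are wrapped in quotes. Single quotes produce a literal string, while double quotes still expand the variables and command substitutions inside them. $() runs the command inside the parentheses and substitutes its output.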
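
A quick way to try the functions out is shown below. This sketch repeats the environment-variable variant so that it is self-contained; start_dir is a helper variable introduced only for the demonstration.

```shell
#!/usr/bin/env bash
marco() {
    export MARCO="$(pwd)"
}
polo() {
    cd "$MARCO"
}

start_dir=$(pwd)   # helper variable for this demo only
marco              # remember the current directory
cd /               # wander off somewhere else
polo               # jump back
echo "back in: $(pwd)"
```

The functions must be defined in (or sourced into) the current shell; running marco.sh as a script would change directory only inside the child process.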

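3. Say you have a command that fails rarely. In order to debug it you need to capture its output, but it can be time consuming to get a failing run. Write a bash script that runs the following script until it fails, captures its standard output and error streams to files, and prints everything at the end. Bonus points if you can also report how many runs it took for the script to fail.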

#!/usr/bin/env bash

n=$(( RANDOM % 100 ))

if [[ n -eq 42 ]]; then
   echo "Something went wrong"
   >&2 echo "The error was using magic numbers"
   exit 1
fi

echo "Everything went according to plan"

daisy@Daisy:/tmp/missing$ vim buggy.sh
daisy@Daisy:/tmp/missing$ cat buggy.sh
 #!/usr/bin/env bash

 n=$(( RANDOM % 100 ))

 if [[ n -eq 42 ]]; then
    echo "Something went wrong"
    >&2 echo "The error was using magic numbers"
    exit 1
 fi

 echo "Everything went according to plan"

daisy@Daisy:/tmp/missing$ touch debug.sh
daisy@Daisy:/tmp/missing$ vim debug.sh
daisy@Daisy:/tmp/missing$ cat debug.sh
#!/usr/bin/env bash
count=0
output_file="output.txt"

while true; do
  ((count++))
  echo "Running script, attempt: $count"

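  # Run the script, saving stdout and stderr to the output file
  # (overwritten on every run, so only the last run is kept)
  ./buggy.sh > "$output_file" 2>&1

  # Check the script's exit status
  if [[ $? -ne 0 ]]; then
    echo "Script failed after $count attempts"
    break
  fi
done

# Print the captured output
echo "Script output:"
cat "$output_file"

# Print the number of runs that succeeded before the failure
echo "Number of successful attempts: $((count-1))"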

daisy@Daisy:/tmp/missing$ ./debug.sh
-bash: ./debug.sh: Permission denied

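The scripts do not have execute permission yet (see the CSDN blog post on fixing "Permission denied" errors for .sh files). Adding it with chmod +x resolves the error: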

daisy@Daisy:/tmp/missing$ ls -l
total 16
-rw-r--r-- 1 daisy daisy 210 Feb  8 14:50 buggy.sh
-rw-r--r-- 1 daisy daisy 498 Feb  8 14:52 debug.sh
daisy@Daisy:/tmp/missing$ chmod +x {debug,buggy}.sh
daisy@Daisy:/tmp/missing$ ls -l
total 16
-rwxr-xr-x 1 daisy daisy 210 Feb  8 14:50 buggy.sh
-rwxr-xr-x 1 daisy daisy 498 Feb  8 14:52 debug.sh
daisy@Daisy:/tmp/missing$ ./debug.sh
Running script, attempt: 1
Running script, attempt: 2
......
Running script, attempt: 162
Script failed after 162 attempts
Script output:
Something went wrong
The error was using magic numbers
Number of successful attempts: 161

Reference answer: Solution - Shell Tools and Scripting · the missing semester of your cs education (missing-semester-cn.github.io).
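
A more compact variant puts the command directly in the loop condition and appends to the log, so the output of every run is kept. This is a sketch rather than the reference answer: flaky is a hypothetical stand-in for ./buggy.sh that fails on its fifth call, so that the example finishes deterministically.

```shell
#!/usr/bin/env bash
# flaky is a hypothetical stand-in for ./buggy.sh that fails on its 5th call.
calls=0
flaky() {
    calls=$((calls + 1))
    if [ "$calls" -eq 5 ]; then
        echo "Something went wrong"
        echo "The error was using magic numbers" >&2
        return 1
    fi
    echo "Everything went according to plan"
}

log=$(mktemp)
count=0
# Keep running until the command fails; >> appends, so every run is kept.
while flaky >> "$log" 2>&1; do
    count=$((count + 1))
done

echo "failed after $count successful runs"
cat "$log"
rm "$log"
```

To use it on the real script, replace flaky in the while condition with ./buggy.sh.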

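4. As covered in the lecture, find's -exec can be very powerful for performing operations over the files we are searching for. But what if we want to do something with all the files, like creating a zip file? As you have seen so far, commands take input from both arguments and STDIN. When piping commands, we connect STDOUT to STDIN, but some commands like tar take inputs from arguments. To bridge this gap there is the xargs command, which executes a command using STDIN as arguments. For example, ls | xargs rm deletes the files in the current directory.

Your task is to write a command that recursively finds all HTML files in a folder and makes a zip with them. Note that your command should work even if the files have spaces in their names (hint: check the -d flag for xargs).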

find /path/to/folder -type f -name "*.html" -print0 | xargs -0 zip html_files.zip

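find /path/to/folder: searches recursively under /path/to/folder.
-type f: matches only regular files, not directories.
-name "*.html": matches only files whose names end in .html.
-print0: prints each file name followed by a null character, so names containing spaces or other special characters are handled correctly.
|: pipes find's output to the next command.
xargs -0 zip html_files.zip: xargs reads the null-separated names and passes them as arguments to zip, which packs all the HTML files into html_files.zip.

The -print0 and -0 options work as a pair: together they ensure that spaces and special characters in file names are handled correctly.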
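
The effect of the null separator can be observed by counting how many arguments xargs produces. In this sketch the scratch directory and file names are made up, and echo stands in for zip so that nothing needs to be installed.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)                      # hypothetical scratch directory
touch "$scratch/my page.html" "$scratch/index.html"

# Default xargs splits on whitespace: "my page.html" becomes two arguments,
# so three arguments are passed in total.
naive=$(find "$scratch" -name '*.html' | xargs -n 1 echo | wc -l)

# NUL-separated names survive intact: two arguments, one per file.
safe=$(find "$scratch" -name '*.html' -print0 | xargs -0 -n 1 echo | wc -l)

echo "naive: $naive arguments, safe: $safe arguments"
rm -r "$scratch"
```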

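Reference answer:

# for Linux
find . -type f -name "*.html" | xargs -d '\n'  tar -cvzf html.zip

find . -type f -name "*.html": uses find to locate all files ending in .html in the current directory and its subdirectories.
xargs -d '\n': sets newline as the delimiter, so file names containing spaces are passed intact (names containing newlines would still break; -print0 with -0 handles those as well).
tar -cvzf html.zip: creates an archive named html.zip containing the given files. -c creates a new archive, -v lists files verbosely, -z compresses with gzip, and -f names the archive file. Note that despite the .zip name, the result is a gzip-compressed tar archive, not a zip file.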

  a html_root/3.html
  a html_root/html/xxxx.html
  a html_root/2.html
  a html_root/1.html

This output is tar's verbose listing, showing each file as it is added to the archive (the leading a stands for "added").

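5. (Advanced) Write a command or script to recursively find the most recently modified file in a directory. More generally, can you list all files by recency?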

find . -type f -mmin -60 -print0 | xargs -0 ls -lt | head -10

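find . -type f -mmin -60 -print0: searches the current directory and its subdirectories for regular files (-type f) modified within the last 60 minutes (-mmin -60). -print0 separates the names with null characters so that spaces and other special characters are handled correctly. Note that the 60-minute window means older files are never listed.
xargs -0 ls -lt: -0 tells xargs to split its input on null characters, matching find's output. ls -lt lists the files sorted by modification time, newest first.
head -10: shows the first 10 lines of the output, i.e. the 10 most recently modified files.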
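
An alternative that is not limited to a time window is to let find print each file's modification time and sort on it. The sketch below assumes GNU find (the -printf action is a GNU extension) and GNU touch for setting known timestamps; the scratch directory and file names are made up, and file names containing newlines would not be handled.

```shell
#!/usr/bin/env bash
scratch=$(mktemp -d)                      # hypothetical scratch directory
mkdir "$scratch/sub"
touch -d '2024-01-01 00:00' "$scratch/oldest"
touch -d '2024-01-03 00:00' "$scratch/sub/newest"
touch -d '2024-01-02 00:00' "$scratch/middle"

# %T@ is the modification time in seconds since the epoch, %p the path.
# Sort numerically in reverse, then strip the timestamp column.
by_recency=$(find "$scratch" -type f -printf '%T@ %p\n' | sort -rn | cut -d' ' -f2-)
echo "$by_recency"

# The first line is the most recently modified file.
most_recent=$(printf '%s\n' "$by_recency" | head -n 1)
echo "most recent: $most_recent"
rm -r "$scratch"
```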