Implementing the cat Command in Python, and Applying Regular Expressions

企鹅与蟒蛇

Article link: https://blog.csdn.net/ikkyphoenix/article/details/125820977

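1. Implementing the cat command
2. Filtering out English letters and digits with a regular expression
3. How to design a log analysis system

1. Implementing the cat command

Requirement: support viewing file contents and the -n option (print line numbers); nothing more is needed.

Requirements analysis: the cat command takes at least one file argument and prints that file's contents to standard output. Given several files, it prints their contents to standard output one after another. Given a directory, it reports an error.

Implementation:

Input files may be given as relative or absolute paths, so the pathlib module is used to handle paths and path operations. Parsing the command-line options requires the argparse and sys modules.

The full code is as follows: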

import sys
import argparse
import pathlib


'''
Simulate the Linux cat command: with the -n option, line numbers are printed along with the file contents.
The -n option is optional; without it, only the file contents are printed.

This version requires at least one input file and accepts several. With several input files,
their contents are printed one after another; if -n is given, numbering restarts at 1 for each file.
'''


def parse_arguments(args_lst):
    parser = argparse.ArgumentParser(prog='mycat', description='arguments for mycat programm')
    # store_true: args.number is False by default and becomes True when -n is given
    parser.add_argument('-n', '--number', action='store_true', help='display line number')
    # nargs='+' already requires at least one file, so no default is needed
    parser.add_argument('files', metavar='F', type=str, nargs='+')

    args = parser.parse_args(args_lst)
    return args


def my_cat(args):
    files_lst = args.files
    for f in files_lst:
        f_pth = pathlib.Path(f)
        try:
            with open(f_pth, encoding='utf-8') as inp_f:
                cont_lst = inp_f.readlines()
                print(cont_lst)           # for debug: the raw list of lines
                if args.number:
                    # -n given: prefix each line with its 1-based line number
                    for idx, cont in enumerate(cont_lst):
                        print(idx+1, cont)
                else:
                    for cont in cont_lst:
                        print(cont)

        # A missing file, a directory, or a decoding problem all end up here
        except Exception as e:
            print('Catch Exception: {}'.format(e))


if __name__ == '__main__':
    stdin_args_lst = sys.argv[1:]
    # print(stdin_args_lst)           # for debug: ['-n']
    ret_args = parse_arguments(stdin_args_lst)
    # print(ret_args.files)           # for debug
    my_cat(ret_args)

Running the code above gives the following results:

With no arguments at all

$ python my_cat.py
usage: mycat [-h] [-n] F [F ...]
mycat: error: the following arguments are required: F
An error is reported along with its cause: the positional argument F is missing.

With only the -h option

$ python my_cat.py -h
usage: mycat [-h] [-n] F [F ...]

arguments for mycat programm

positional arguments:
F

optional arguments:
-h, --help    show this help message and exit
-n, --number  display line number

No error is raised here; the help message is printed instead.

With both the -h option and a file that exists

$ python my_cat.py -h test1.txt
usage: mycat [-h] [-n] F [F ...]

arguments for mycat programm

positional arguments:
F

optional arguments:
-h, --help    show this help message and exit
-n, --number  display line number

Or in this form:

$ python my_cat.py test1.txt -h
usage: mycat [-h] [-n] F [F ...]

arguments for mycat programm

positional arguments:
F

optional arguments:
-h, --help    show this help message and exit
-n, --number  display line number

Both forms print the help message directly without printing the file contents, which is the expected behavior of the -h option.
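
This behavior comes from argparse itself: the built-in help action prints the usage text and exits before the rest of the program runs, wherever -h appears on the command line. A minimal sketch of that, assuming a hypothetical build_parser helper that mirrors the parser used in the script:

```python
import argparse


def build_parser():
    """Build a parser like the one in my_cat.py (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog='mycat')
    parser.add_argument('-n', '--number', action='store_true',
                        help='display line number')
    parser.add_argument('files', metavar='F', nargs='+')
    return parser


# Options may come before or after the positional file arguments
args = build_parser().parse_args(['test1.txt', '-n'])
print(args.number, args.files)

# -h prints the help text and raises SystemExit, so no file is ever opened
try:
    build_parser().parse_args(['test1.txt', '-h'])
except SystemExit as exc:
    print('exit code:', exc.code)
```

Because the exit happens inside parse_args, my_cat is never called when -h is present.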

When one existing file is given

$ python my_cat.py test1.txt
['Hello World\n', 'This is Python\n', 'Welcome to my world\n']
Hello World

This is Python

Welcome to my world

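The file contents are printed line by line. The first line of the output is the debug print of the list returned by readlines(); the blank lines between the text lines appear because each line already ends with a newline and print adds another. Adding the -n option prefixes each printed line with its line number, as shown below: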

$ python my_cat.py test1.txt -n
['Hello World\n', 'This is Python\n', 'Welcome to my world\n']
1 Hello World

2 This is Python

3 Welcome to my world
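
To drop the extra blank lines (print appends a newline to lines that already end with one), each line can be written out without an additional newline. A minimal sketch, assuming a hypothetical helper named format_lines that is not part of the script above; the six-character number column followed by a tab imitates the layout GNU cat uses for -n:

```python
import io


def format_lines(lines, number=False):
    """Return the text to emit for `lines`, optionally with line numbers.

    Hypothetical helper. Each line keeps its own trailing newline,
    so no extra newline is added here.
    """
    out = io.StringIO()
    for idx, line in enumerate(lines, start=1):
        if number:
            # Right-align the number in 6 columns, then a tab, then the line
            out.write(f'{idx:>6}\t{line}')
        else:
            out.write(line)
    return out.getvalue()


lines = ['Hello World\n', 'This is Python\n', 'Welcome to my world\n']
# end='' stops print from adding a second newline after the last line
print(format_lines(lines, number=True), end='')
```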

When two existing files are given

$ python my_cat.py test1.txt test2.txt
['Hello World\n', 'This is Python\n', 'Welcome to my world\n']
Hello World

This is Python

Welcome to my world

['Life is short, you need Python\n', 'Python is a snake, but not only a snake\n']
Life is short, you need Python

Python is a snake, but not only a snake

The contents of the two files are printed one after the other. Adding the -n option prefixes each printed line with its line number, as shown below:

$ python my_cat.py test1.txt test2.txt -n
['Hello World\n', 'This is Python\n', 'Welcome to my world\n']
1 Hello World

2 This is Python

3 Welcome to my world

['Life is short, you need Python\n', 'Python is a snake, but not only a snake\n']
1 Life is short, you need Python

2 Python is a snake, but not only a snake

As the output shows, line numbers are counted separately for each file.
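
GNU cat, as far as I am aware, instead keeps a single running counter across all files when -n is given. A sketch of that variant, assuming a hypothetical generator named number_across_files that works on lists of lines already read from each file:

```python
from itertools import chain


def number_across_files(files_lines):
    """Yield (line_number, line) pairs using one counter shared by all files.

    `files_lines` is an iterable of line lists, one list per file
    (hypothetical helper, not part of the original script).
    """
    yield from enumerate(chain.from_iterable(files_lines), start=1)


file1 = ['Hello World\n', 'This is Python\n', 'Welcome to my world\n']
file2 = ['Life is short, you need Python\n',
         'Python is a snake, but not only a snake\n']
for num, line in number_across_files([file1, file2]):
    print(num, line, end='')
```

With this variant the first line of the second file is numbered 4 rather than 1.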

When the given file does not exist

$ python my_cat.py test3.txt
Catch Exception: [Errno 2] No such file or directory: 'test3.txt'
A file error is reported: the file does not exist.

When a directory is given as the argument

$ python my_cat.py .
Catch Exception: [Errno 13] Permission denied: '.'

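A permission error is reported. This particular message is what Windows typically produces when a directory is opened as a file; on Linux the same call usually raises IsADirectoryError ([Errno 21] Is a directory) instead.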
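
Since the exact exception depends on the platform, a directory can also be detected explicitly before the file is opened. A minimal sketch using pathlib, assuming a hypothetical check_input helper and message wording of my own choosing:

```python
import pathlib


def check_input(path_str):
    """Return an error message for an unusable path, or None if it looks readable.

    Hypothetical helper; the message text is illustrative only.
    """
    pth = pathlib.Path(path_str)
    if not pth.exists():
        return f'mycat: {path_str}: No such file or directory'
    if pth.is_dir():
        return f'mycat: {path_str}: Is a directory'
    return None


# The current directory always exists and is a directory
print(check_input('.'))
```

Calling such a check at the top of the loop in my_cat would give the same message on every platform.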

That completes a basic implementation of a simple cat-like command.

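2. Filtering out English letters and digits with a regular expression

Requirement: given the string "not 404 found 张三 99 深圳", use a regular expression to filter out the English letters and digits, ending up with "张三深圳".

The solution proceeds as follows:

The approach below first removes the digits, then removes the English letters and spaces from that result, and finally joins the remaining pieces with the str.join method: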

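import re


ori_str = 'not 404 found 张三 99 深圳'
# Remove the digits and English letters from the string above, leaving '张三 深圳'

# Step 1: collect every run of non-digit characters
comp1 = re.compile(r'[^0-9]+')
print(comp1.findall(ori_str))

# Runs of characters that are neither lowercase letters nor spaces
comp2 = re.compile(r'[^a-z ]+')
print(comp2.findall(ori_str))

# Step 2: glue the non-digit runs together, then drop letters and spaces
res1 = ''.join(comp1.findall(ori_str))
print(comp2.findall(res1))
fin_res = ' '.join(comp2.findall(res1))
print(fin_res, type(fin_res))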

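The output of the code above is shown below. Because the pieces are joined with ' '.join, the final string keeps a single space between the two words; joining with ''.join instead gives exactly 张三深圳.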

['not ', ' found 张三 ', ' 深圳']
['404', '张三', '99', '深圳']
['张三', '深圳']
张三 深圳 <class 'str'>

The process above reaches the result in two steps. Merging the two steps into a single pattern gives:

import re


ori_str = 'not 404 found 张三 99 深圳'
# Combine the two steps above: match runs of characters that are
# not digits, lowercase letters, or spaces
comp = re.compile(r'[^0-9a-z ]+')
res = comp.findall(ori_str)
fin_res = ' '.join(res)
print(fin_res, type(fin_res))
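
The same filtering can also be written as a removal rather than an extraction. A minimal sketch using re.sub, with a hypothetical function name strip_ascii_alnum; unlike the patterns above, this pattern also removes uppercase letters and any whitespace, so the result has no space in it:

```python
import re


def strip_ascii_alnum(text):
    """Remove ASCII letters, digits, and whitespace from `text`."""
    return re.sub(r'[A-Za-z0-9\s]+', '', text)


ori_str = 'not 404 found 张三 99 深圳'
print(strip_ascii_alnum(ori_str))
```

This is a deletion of unwanted characters, so no join step is needed afterwards.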