Py Journey 2: Sharing Some Simple Examples


这周末在做梦


Column: py征途 | Tags: python, strings

Article link: https://blog.csdn.net/weixin_46203060/article/details/112552648


Fair warning: I'm still very much a beginner.

I. Case conversion of Python source code
II. Simple web scraping

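I. Case conversion of Python source code

1 Problem description and analysis

(1) Problem

Write a program that reads a Python source file and converts every lowercase letter in it to uppercase, except those in reserved words (keywords). The generated file must still run correctly under the Python interpreter.

(2) Approach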

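Most solutions I found online use straightforward string replacement, but that cannot stop a keyword from being replaced inside another word (for example, import and or are both keywords, yet import contains or as a substring). So I took a different route: uppercase the whole file first, then selectively turn the keywords back into their correct form.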
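To make the substring problem concrete, here is a minimal sketch using Python's standard `re` module; the sample line is made up for illustration. A plain `str.replace` also rewrites the `OR` inside `IMPORTED`, while a regular expression with `\b` word boundaries only matches `OR` when it stands as a whole word.

```python
import re

line = "X = A OR IMPORTED"  # illustrative sample, not taken from 1.py

# Naive replacement: the OR inside IMPORTED is rewritten as well
naive = line.replace("OR", "or")
print(naive)  # X = A or IMPorTED

# \b requires a word boundary on both sides, so only the standalone OR matches
bounded = re.sub(r"\bOR\b", "or", line)
print(bounded)  # X = A or IMPORTED
```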

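(3) A gripe about the problem

The requirement that "the generated file must run correctly under the Python interpreter" never made sense to me. Python is case-sensitive for every name, not just for keywords: built-in functions, methods, module names and so on all have to be written in the right case. So a file converted the way the problem asks will fail as soon as it uses any such name, which is true of almost every real program. (If my complaint is wrong, please point it out in the comments. Thanks to anyone who can set me straight.)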
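A minimal check of this point, using only built-ins: after `.upper()`, `print` becomes `PRINT`, which is not defined, so running the code raises `NameError`. The one-line sample program is an assumption chosen for illustration; 1.py itself would likewise fail on its very first line, since `from math import sqrt` turns into `from MATH import SQRT` even after the keywords are restored.

```python
source = 'print(len("abc"))'
upper_source = source.upper()  # 'PRINT(LEN("ABC"))'

error = None
try:
    # Run in a fresh namespace; built-ins such as print are still available there
    exec(upper_source, {})
except NameError as exc:
    error = exc  # keep a reference, because 'exc' is cleared when the except block ends

print(repr(error))
```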

2 Solution code

(1) Contents of 1.py

1.py defines an isPrime function:

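from math import sqrt
def isPrime(a):
    # Only integers are checked; any other type is reported as an error
    if type(a) != type(1):
        print("Error: a must be an integer")
        return None
    # 0, 1 and negative numbers are not prime (sqrt would also fail on negatives)
    if a < 2:
        return False
    for i in range(2, int(sqrt(a) + 1)):
        if a % i == 0:
            return False
    return True

# eval lets non-integers such as 2.5 reach isPrime; only use it on trusted input
a = eval(input("Enter a: "))
print(isPrime(a))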

(2) Contents of 转python源文件为大写字母.py ("convert a Python source file to uppercase")

def replace_key():
    # Read the whole of 1.py into str1
    f = open('1.py', 'r', encoding="utf-8")
    str1 = f.read()
    f.close()
    keywords = ['False', 'class', 'from', 'or', 'None', 'continue', 'global', 'pass', 'True', 'def', 'if', 'raise',
                'and', 'del', 'import', 'return', 'as', 'elif', 'in', 'try', 'assert', 'else', 'is', 'while', 'async',
                'except', 'lambda', 'with', 'await', 'finally', 'nonlocal', 'yield', 'break', 'for', 'not']
    for kw in keywords:
        location = str1.find(kw.upper())  # first uppercase occurrence of this keyword
        end = location + len(kw)
        # Accept it only if the next character is ' ', '\n' or ':' (or the file ends there).
        # This rules out many hits inside longer words, but the preceding character is not
        # checked, so e.g. the OR in "FOR " still passes; other cases may be missed too.
        if location != -1 and (end == len(str1) or str1[end] in [' ', '\n', ':']):
            # Replace just this first occurrence, so other words containing it are left alone
            str1 = str1.replace(kw.upper(), kw, 1)
    # Overwrite 1.py with the partially restored text
    f = open('1.py', 'w', encoding="utf-8")
    f.write(str1)
    f.close()
    return 0


if __name__ == '__main__':  # works without the __name__ check as well
    # Apart from counting lines, these steps only uppercase the whole file
    f = open('1.py', 'r', encoding="utf-8")
    str1 = f.read()
    str1 = str1.upper()
    # Count the lines: each replace_key() pass restores at most one occurrence of each keyword,
    # so one pass per line is meant to catch them all (assuming no keyword occurs more often than that)
    f.seek(0)
    lines = len(f.readlines())
    f.close()

    f = open('1.py', 'w', encoding="utf-8")
    f.write(str1)
    f.close()
    for i in range(lines):
        replace_key()
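For comparison, here is a sketch of a different technique (not the method above): uppercase everything, then restore every whole-word keyword in a single pass with a regular expression. It takes the keyword list from the standard `keyword` module's `kwlist` instead of a hand-typed list, and `\b` word boundaries on both sides keep it from touching `OR` inside `FOR` or `IMPORT`. The function name `upper_except_keywords` is hypothetical. Like the original, it treats strings and comments as plain text.

```python
import keyword
import re


def upper_except_keywords(source: str) -> str:
    """Uppercase the whole text, then restore keywords that appear as whole words.

    Sketch only: keywords that happen to appear as whole words inside strings
    or comments are restored too.
    """
    upper = source.upper()
    # Map e.g. 'FALSE' -> 'False', 'IMPORT' -> 'import'
    restore = {kw.upper(): kw for kw in keyword.kwlist}
    # \b on both sides: only whole words match, so IMPORTANT or FOR never yield OR
    pattern = r"\b(" + "|".join(restore) + r")\b"
    return re.sub(pattern, lambda m: restore[m.group(1)], upper)


print(upper_except_keywords("from math import sqrt\nif x or y:\n    print(True)\n"))
```

To apply it to 1.py, read the file, pass the text through the function, and write the result back (preferably to a new file, so the original survives). Because every occurrence is handled in one `re.sub` call, there is no need to count lines and repeat passes.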

(3) Output

[Screenshot of the program output]

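3 Why I'm sharing this

I'm not entirely happy with the other approaches to this problem, but to be clear, the mainstream replacement-based method still has plenty worth learning from.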

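II. Simple web scraping

1 Problem description

(1) Problem

Pick a website and write a program that extracts something from it that interests you.

(2) Approach

Top-down, then bottom-up. It sounds grand, but in practice it just means designing a few functions that pull substrings out of the page. I chose one web page and extracted the http links (and hence the domains) that appear in its HTML: in practice, every href value that contains "http".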
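The code below keeps whole URLs. If only the domains are wanted, one possible extra step is to reduce each URL to its host name with the standard `urllib.parse.urlparse`. The sketch below assumes its input is a list of URL strings, such as the list returned by `extractImageUrls`; the function name `extract_domains` is hypothetical.

```python
from urllib.parse import urlparse


def extract_domains(urls):
    """Return the distinct host names of http/https URLs, in first-seen order."""
    domains = []
    for url in urls:
        parts = urlparse(url)
        host = parts.hostname  # lowercased, without port or user info; None for relative links
        if parts.scheme in ("http", "https") and host and host not in domains:
            domains.append(host)
    return domains


print(extract_domains(["https://ca01h.top/about/", "http://example.com/a", "/relative/link"]))
```

A list is used instead of a set so the domains come out in the order they first appear on the page.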

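2 Solution code

(1) The site and its HTML

The HTML is over 600 lines long, so I won't reproduce it here. The page I scraped is the blog of someone far more skilled than me; the script reads it from a local copy saved as ca01h.top.html.
Domain: https://ca01h.top/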
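Since the script reads a local file, the page has to be saved first (for example from the browser). One way to do this with the standard library is `urllib.request`, as sketched below. The function name `save_page` is hypothetical, and sending a browser-like User-Agent is an assumption, since some servers reject urllib's default one.

```python
import urllib.request


def save_page(url, path, encoding="utf-8"):
    """Download url and save its text to path; returns the number of lines saved."""
    # Assumption: a browser-like User-Agent, because some sites refuse urllib's default
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        text = resp.read().decode(encoding, errors="replace")
    # newline="" writes the page's line endings unchanged
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return len(text.splitlines())
```

Usage would be something like `save_page("https://ca01h.top/", "ca01h.top.html")`, after which the script below can read the saved file.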

(2) Contents of herfextract.py

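def getHTMLlines(htmlpath):
    # Read the saved HTML page and return it as a list of lines
    with open(htmlpath, 'r', encoding='utf-8') as f:
        return f.readlines()


def extractImageUrls(htmlist):
    # Despite the name, this collects every href link (not just images) whose value contains 'http'
    urls = []
    for line in htmlist:
        # A line may contain several href attributes, so check each one, not just the last
        for part in line.split('href=')[1:]:
            quote = part[:1]
            if quote not in ('"', "'"):
                continue  # skip unquoted attribute values
            url = part[1:].split(quote)[0]
            if 'http' in url:
                urls.append(url)
    return urls


def showResults(urls):
    # Print the URLs with a 1-based index
    for count, url in enumerate(urls, 1):
        print('URL {:2}: {}'.format(count, url))


def saveResults(filepath, urls):
    # Write one URL per line
    with open(filepath, 'w', encoding='utf-8') as f:
        for url in urls:
            f.write(url + '\n')


def main():
    inputfile = "ca01h.top.html"  # local copy of the page
    outputfile = "ca01h.top-urls.html"
    htmllines = getHTMLlines(inputfile)
    imageUrls = extractImageUrls(htmllines)
    showResults(imageUrls)
    saveResults(outputfile, imageUrls)


if __name__ == '__main__':
    main()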
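Splitting strings on `href=` depends on how the HTML happens to be laid out (quoting style, spacing around `=`). As an alternative technique, not the one used above, the standard library's `html.parser.HTMLParser` hands over every tag's attributes already parsed. The class and function names below are hypothetical, and unlike the original filter (`'http' in url`) this sketch keeps only values that start with `http://` or `https://`.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every start tag, wherever it sits in the markup."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.urls.append(value)


def extract_hrefs(html_text):
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return parser.urls
```

Self-closing tags such as `<link ... />` are covered too, because `HTMLParser` passes them to `handle_starttag` by default. Applied to the saved page, the call would be `extract_hrefs(open("ca01h.top.html", encoding="utf-8").read())`.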