Python: getting the value of an HTML input box by id, and cleaning up an escaped URL

Latest recommended article published on 2022-07-08 14:07:07

big maom~~

Article link: https://blog.csdn.net/weixin_28972355/article/details/113983798

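This article shows how to get the value of an input element with a given id from HTML in Python, first with regular expressions (re) and then with the BeautifulSoup library. It compares the two approaches and shows that BeautifulSoup makes parsing and extracting the value more convenient.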

1. Getting the value of the input with a given id in Python

1.1 Import the regex module

    import re

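Note: re is part of the Python standard library, so it does not need to be installed. Third-party modules do need to be installed with pip, for example BeautifulSoup (package name beautifulsoup4):

    # Linux
    sudo pip install beautifulsoup4
    # Windows
    pip install beautifulsoup4

1.2 Core

    # content is the HTML text; id_name is the id of the target input box
    def get_id_tag(content, id_name):
        id_name = id_name.strip()
        # Match the whole <input ...> tag that carries the given id;
        # re.escape keeps special characters in the id from being read as regex syntax
        patt_id_tag = """<input[^>]*id=['"]?""" + re.escape(id_name) + """['" ][^>]*>"""
        id_tag = re.findall(patt_id_tag, content, re.DOTALL | re.IGNORECASE)
        if id_tag:
            id_tag = id_tag[0]
        return id_tag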

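1.3 Usage

    response2 = sess.post(url=urlFriend, data=formdata, headers=headers)
    # Note: this URL must point to an HTML page, not JSON or other data
    # On Python 3, pass response2.text (str); response2.content is bytes
    # and cannot be searched with a str pattern
    idValue = get_id_tag(response2.text, "XXXXID")
    print(idValue)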

1.4 Result

    # idValue
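To make the regex approach concrete, here is a minimal self-contained sketch run against a made-up HTML snippet. The sample HTML and the `get_value_from_tag` helper are assumptions for illustration and are not part of the original code. The helper shows the extra string-extraction step needed to get from the matched tag to its value.

```python
import re


def get_id_tag(content, id_name):
    """Return the first <input ...> tag whose id matches id_name, or None."""
    pattern = r"""<input[^>]*\bid=['"]?""" + re.escape(id_name.strip()) + r"""['" ][^>]*>"""
    matches = re.findall(pattern, content, re.DOTALL | re.IGNORECASE)
    return matches[0] if matches else None


def get_value_from_tag(tag):
    """Hypothetical helper: pull the value attribute out of a tag string."""
    match = re.search(r"""\bvalue=['"]([^'"]*)['"]""", tag, re.IGNORECASE)
    return match.group(1) if match else None


# Made-up HTML for illustration only.
sample_html = '<form><input type="hidden" id="web_csrf_token" value="abc123"></form>'

tag = get_id_tag(sample_html, "web_csrf_token")
print(tag)                      # the whole <input ...> element
print(get_value_from_tag(tag))  # only the value attribute
```

The first call returns the entire tag, and only the second regex yields the value. That two-step shape is why a real HTML parser is usually the more comfortable choice.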

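However, this only extracts the input element itself and does not give the value attribute. With this approach you would still have to pull the value out of the string, which is tedious. A different approach gets the value directly.

2.1 Import

    from bs4 import BeautifulSoup

2.2 Core

    # Create the BeautifulSoup object
    soup = BeautifulSoup(response2.content, "html.parser")
    # print(soup.prettify())
    # Call find_all on the soup object itself (prettify() returns a plain string)
    idValue = soup.find_all(id="web_csrf_token")[0]["value"]
    print(idValue)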

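2.3 Notes

Always pass "html.parser" explicitly. If it is omitted, BeautifulSoup prints the warning below. This is a UserWarning, not an exception, so the code still runs, but the parser chosen may differ from one system to another.

soup.prettify() returns the content of the soup object as a formatted string.

find_all must be indexed with [0] because it returns every match and only one is needed.

    UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line 90 of the file weiboyi.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.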

2.4 Result

    # idValue
    5c10810eba673
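A slightly more defensive variant of the BeautifulSoup lookup can be sketched as follows. It assumes the `beautifulsoup4` package is installed. The function name `get_input_value` and the sample HTML are made up for illustration. Using `find()` instead of `find_all(...)[0]` avoids an `IndexError` when the id is not on the page.

```python
from bs4 import BeautifulSoup


def get_input_value(html, input_id):
    """Return the value attribute of the element with the given id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=input_id)  # find() returns the first match or None
    if element is None:
        return None
    return element.get("value")  # None if the element has no value attribute


# Made-up HTML for illustration only.
sample_html = '<form><input type="hidden" id="web_csrf_token" value="abc123"></form>'
print(get_input_value(sample_html, "web_csrf_token"))
```

Returning `None` for a missing element or attribute lets the caller decide how to handle a page that does not contain the expected token.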

Cleaning up a link in Python

Our link arrives in this form:

    {"url":"http:\/\/img.XXX.com\/images\/captcha\/812c30d5756d153a1be1971d9208e970.png"}

The result we want:

    http://img.XXX.com/images/captcha/812c30d5756d153a1be1971d9208e970.png

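Code:

    import re

    # content is the text that contains the escaped URL
    def tiquPng(content):
        # Non-greedy match up to the first ".png"; the dot is escaped so it matches literally
        pattern = re.compile(r"http:.*?\.png")
        result = pattern.findall(content)
        # Remove the backslashes and any spaces
        return result[0].replace("\\", "").replace(" ", "")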
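Because the input shown above is JSON, an alternative to the regex is to let the standard-library `json` module do the unescaping. This is a different technique from the original code, sketched here under the assumption that the whole response body is valid JSON with a `url` key. The function name `extract_url` is made up for illustration.

```python
import json


def extract_url(content):
    """Parse the JSON text and return its "url" field."""
    data = json.loads(content)  # the JSON parser turns \/ back into /
    return data["url"]


# The sample payload from the article, written as a raw string
# so the backslashes are kept as-is.
raw = r'{"url":"http:\/\/img.XXX.com\/images\/captcha\/812c30d5756d153a1be1971d9208e970.png"}'
print(extract_url(raw))
```

This avoids hand-written patterns entirely, and it also works for URLs that do not end in `.png`.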