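1. Getting the value of an input with a given id in Python

1.1 Import the regex module

    import re

Note: re is part of the Python standard library, so there is nothing to install. Third-party modules such as BeautifulSoup do have to be installed first:

    linux:   sudo pip install beautifulsoup4
    windows: pip install beautifulsoup4

1.2 Core code

    # content is the HTML text; id_name is the id of the target input
    def get_id_tag(content, id_name):
        id_name = id_name.strip()
        # Match a whole tag whose id attribute equals id_name
        patt_id_tag = """<[^>]*id=['"]?""" + id_name + """['" ][^>]*>"""
        id_tag = re.findall(patt_id_tag, content, re.DOTALL|re.IGNORECASE)
        if id_tag:
            id_tag = id_tag[0]
        return id_tag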
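1.3 Usage

    # Note: this URL must point to an HTML page, not JSON or some other data
    response2 = sess.post(url=urlFriend, data=formdata, headers=headers)
    # On Python 3 the regex needs a str, so pass response2.text rather than the
    # bytes in response2.content (this assumes sess is a requests session)
    idValue = get_id_tag(response2.text, "XXXXID")
    print(idValue)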
1.4 Result

    # idValue
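To make the return value concrete, here is a minimal self-contained sketch run against a made-up HTML fragment. The markup and the id demo_token are assumptions for illustration only. This variant also differs slightly from the function above: it escapes the id with re.escape and returns None when nothing matches.

```python
import re

def get_id_tag(content, id_name):
    """Return the first tag whose id attribute equals id_name, or None."""
    pattern = (
        r"""<[^>]*\bid=['"]?"""
        + re.escape(id_name.strip())  # treat the id as literal text
        + r"""['" ][^>]*>"""
    )
    matches = re.findall(pattern, content, re.DOTALL | re.IGNORECASE)
    return matches[0] if matches else None

# Hypothetical page fragment, invented for this example.
sample_html = '<form><input type="hidden" id="demo_token" value="abc123"></form>'

tag = get_id_tag(sample_html, "demo_token")
print(tag)  # the whole <input ...> tag, not just its value
```

The function hands back the complete tag as a string, so the value attribute would still have to be cut out of it in a second step.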
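However, this only extracts the input element itself; it does not print the value. With this approach we would still have to pull the value out of the string, which is tedious. So let's switch to another approach and read the value directly.

2.1 Import

    from bs4 import BeautifulSoup

2.2 Core code

    # Create the BeautifulSoup object
    soup = BeautifulSoup(response2.content, "html.parser")
    # print(soup.prettify())
    # Call find_all on the soup object itself; prettify() returns a plain string
    idVal = soup.find_all(id="web_csrf_token")[0]['value']
    print(idVal)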
2.3 Notes

Always pass "html.parser" explicitly; otherwise BeautifulSoup emits the warning shown below.
soup.prettify() returns the content of the soup object as a formatted string.
find_all returns every match, so index it with [0] when you only want one element.

    UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently. The code that caused this warning is on line 90 of the file weiboyi.py. To get rid of this warning, pass the additional argument 'features="html.parser"' to the BeautifulSoup constructor.
2.4 Result

    # idValue
    5c10810eba673
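If installing BeautifulSoup is not an option, the same lookup can be sketched with the standard library's html.parser module. This is an alternative technique, not the approach used above. The class name InputValueFinder, the helper get_input_value, and the sample markup are all hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser

class InputValueFinder(HTMLParser):
    """Remember the value attribute of the first <input> with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attr_map = dict(attrs)
        if (
            tag == "input"
            and self.value is None
            and attr_map.get("id") == self.target_id
        ):
            self.value = attr_map.get("value")

def get_input_value(html, target_id):
    """Return the value of the matching input, or None if there is none."""
    finder = InputValueFinder(target_id)
    finder.feed(html)
    return finder.value

# Hypothetical page fragment, invented for this example.
sample_html = '<form><input type="hidden" id="web_csrf_token" value="5c10810eba673"></form>'
print(get_input_value(sample_html, "web_csrf_token"))
```

Unlike find_all(...)[0], this sketch returns None instead of raising when the id is missing, which may be easier to handle when the page layout changes.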
Cleaning up a link in Python

Our link looks like this:

    {"url":"http:\/\/img.XXX.com\/images\/captcha\/812c30d5756d153a1be1971d9208e970.png"}

The result we want:

    http://img.XXX.com/images/captcha/812c30d5756d153a1be1971d9208e970.png
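Code:

    import re

    # content is the raw string that contains the URL
    def tiquPng(content):
        # Escape the dot so that only a literal ".png" ends the match
        pattern = re.compile(r"http:.*\.png")
        result = pattern.findall(content)
        # Drop the escape backslashes and any stray spaces
        return result[0].replace("\\", "").replace(" ", "")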
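Since the input shown above is a JSON object, and \/ is simply JSON's escaped form of /, another option is to let the json module do the unescaping. A minimal sketch, assuming the payload is always valid JSON with a "url" key (extract_url is a hypothetical helper name):

```python
import json

def extract_url(raw):
    """Parse the JSON payload and return its "url" field."""
    return json.loads(raw)["url"]

# Same payload as in the example above, written as a raw string
# so the backslashes reach json.loads unchanged.
raw = r'{"url":"http:\/\/img.XXX.com\/images\/captcha\/812c30d5756d153a1be1971d9208e970.png"}'
print(extract_url(raw))
```

This avoids the regex entirely, but it raises an exception if the response is not JSON or has no "url" key, so the regex version may still suit looser input.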