Python crawler: scraped output differs from the page source

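I'm trying to crawl the 移民家园 (yiminjiayuan.com) forum, but I can't get any useful content. Why is that, and how can I get the actual thread content? (The attached image shows what the code below returns.)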

import urllib.request

url = "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost"

# Browser-like headers; no cookies are sent with this request.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36",
    "Referer": "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost"
}

req = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(req)
html = response.read().decode("utf-8")
print(html)

[Attached image: output of the code above (89e5958c0b5de84e456472cb369a3cd2.png)]

Answer

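The site only returns data when the request carries cookies. Use a single session and request the page twice: the first request picks up the cookies, and the second one sends them back and gets the real content.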

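import requests

url = "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36",
    "Referer": "https://www.yiminjiayuan.com/forum.php?mod=forumdisplay&fid=189&filter=lastpost&orderby=lastpost",
    # Not needed when a session is used; the session stores the cookies itself.
    #"Cookie": "agZD_b1dd_saltkey=s88c1OTO; agZD_b1dd_lastrequest=da9fBUNoIWsWCDoenEkJt1v2UMl1NFvuWruxtrWGzzWv%2FGdOzvGY",
}

# One session keeps cookies between requests.
s = requests.Session()
# First request: the response body is discarded, only the cookies matter.
content = s.get(url=url, headers=headers).content
# Second request: the session sends the stored cookies back.
content = s.get(url=url, headers=headers).content
# print is a function in Python 3; 'ignore' skips bytes that are not valid GBK.
print(content.decode('gbk', 'ignore'))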
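
The same idea can be sketched with the standard library only, staying close to the urllib.request code from the question. This is a minimal sketch, not a tested recipe for this particular site: the helper name fetch_with_cookies and its attempts and encoding parameters are made up for illustration, and it assumes the site behaves as described above (cookies set on the first response, real content on the second).

```python
import http.cookiejar
import urllib.request


def fetch_with_cookies(url, headers=None, attempts=2, encoding="gbk"):
    """Request `url` `attempts` times through one cookie-aware opener.

    Hypothetical helper: cookies set by earlier responses are stored in
    the jar and sent automatically on later requests. Returns the decoded
    body of the last response.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    body = b""
    for _ in range(attempts):
        req = urllib.request.Request(url, headers=headers or {})
        with opener.open(req, timeout=10) as response:
            body = response.read()
    # 'ignore' drops bytes that are invalid in the chosen encoding.
    return body.decode(encoding, "ignore")
```

Calling fetch_with_cookies(url, headers) with the url and headers from the question should then behave like the two s.get() calls above. If the output still looks wrong, try encoding="utf-8", since the question and the answer assume different encodings.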
