Next, analyze the page structure and design the following extraction rule:
pattern = 'data-type="png" data-src="(.+?)"'
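To see what this rule captures, here is a minimal sketch that applies it to a small HTML fragment. The fragment and the example.com URLs are made up for illustration and are not taken from the actual page. The non-greedy group `(.+?)` stops at the first closing quote, so only the value of `data-src` is captured, and only for tags whose `data-type` is exactly `png`.

```python
from re import findall

# Hypothetical HTML fragment (an assumption, not the real page source)
sample = (
    '<img data-type="png" data-src="https://example.com/a.png" class="x">'
    '<img data-type="jpeg" data-src="https://example.com/b.jpg">'
    '<img data-type="png" data-src="https://example.com/c.png">'
)

# Same extraction rule as above
pattern = 'data-type="png" data-src="(.+?)"'

# findall returns the contents of the single capture group for every match
links = findall(pattern, sample)
print(links)
```

Note that the rule depends on `data-type` appearing immediately before `data-src`, separated by a single space. If the page changes its attribute order, the pattern will no longer match.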
Finally, write the crawler code as follows:
from re import findall
from urllib.request import urlopen
url = 'https://mp.weixin.qq.com/s?__biz=MzI4MzM2MDgyMQ==&tempkey=OTMwX1pNbk5ETmVxTkkwdXpGaWo1RC1GZThwaHlzeHRRb2dfcjRFZmpFc2cyVDhBME82dl82dHVWdks5UDc2SFZtWTN3M2VQQ1BFalRpblpfZUFrdHpEbzBpUDR5OXZRS3N0VzE2WXp4Ym5iNWZmLXVMeDFBeThfZFpKa3VxNHpIT21hNnBTc244THRCQm1leTVSendVRk5zSnNIWldFaHUxRzRJaFU3OGd%2Bfg%3D%3D&chksm=6b8aad835cfd249522b213148affa25de442377adfb83afec75e3321fc6059ff26d2ddd11e04#rd'
with urlopen(url) as fp:
    content = fp.read().decode()
pattern = 'data-type="png" data-src="(.+?)"'
result = findall(pattern, content)
for index, item in enumerate(result