Python Web Scraping [1]: Scraping Links and Titles

Latest recommended article published on 2020-12-10 14:01:24

鹿鸣悠悠


Category: python. Tags: web scraping, URL, title, link

Article link: https://blog.csdn.net/weixin_41665637/article/details/90637175

Preface:
Scraping a page by its link:
1. Grab its image URLs
2. Grab its title text

Target URL: https://bh.sb/post/44622
Script:

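import requests
from lxml import etree

# Download the raw HTML of the target post
r = requests.get('https://bh.sb/post/44622/').content

# Parse the HTML into a tree that supports XPath queries
topic = etree.HTML(r)
# Text of every paragraph inside the article body
html = topic.xpath('/html/body/section/div/div/article/p/text()')
# src attribute of every image inside the article body
img = topic.xpath('//article/p/a/img/@src')

# xpath() returns a list, so print the text fragments one after another
for x in html:
    print(x, end="")

for i in img:
    print(i)

# Uncomment to inspect the raw lists:
# print(html)
# print(img)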

1. How to get the XPath:
In Google Chrome, right-click the title -> Inspect -> Copy -> Copy XPath
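As a small illustration of how such a path behaves in code, the sketch below runs an absolute path (the same shape as the one in the script above) and a shorter relative path against one document. The HTML snippet and the example.com image URL are invented for this example and only mimic the nesting the script assumes; a real page may be structured differently.

```python
from lxml import etree

# Hypothetical page, invented for this sketch
SAMPLE_HTML = """
<html><body><section><div><div><article>
<p>First paragraph.</p>
<p><a href="#"><img src="https://example.com/a.jpg"></a></p>
<p>Second paragraph.</p>
</article></div></div></section></body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# Absolute path: every step from the document root is spelled out
absolute = tree.xpath('/html/body/section/div/div/article/p/text()')
# Relative path: // matches <article> at any depth in the document
relative = tree.xpath('//article/p/text()')
# @src selects the attribute value instead of the element
images = tree.xpath('//article/p/a/img/@src')

print(absolute)
print(relative)
print(images)
```

The absolute form breaks as soon as the page adds or removes a wrapper element, so the shorter relative form is often the more forgiving choice.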

2. How to turn the returned list into an ordinary string: look it up on 菜鸟教程 (Runoob).
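One common way to do this is `str.join`. The sketch below uses a hand-written list in place of a real `xpath()` result (the fragments are made up), which works the same way because lxml returns text results as string objects.

```python
# Hypothetical fragments, shaped like the list that xpath('.../text()') returns
fragments = ['First paragraph. ', '\n', '  Second paragraph.']

# Concatenate everything exactly as-is
raw_text = "".join(fragments)

# Drop whitespace-only pieces and separate the rest with single spaces
clean_text = " ".join(part.strip() for part in fragments if part.strip())

print(repr(raw_text))
print(clean_text)
```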
3. Summary

A scraper like this is not well suited to a beginner like me. I just want a simple template that I can use directly.

4. A scraper that beginners can learn

4.1 Install the Chrome extension XPath Helper

Download: https://chrome.google.com/webstore/detail/xpath-helper/hgimnogjllphhhkhlmebbmlgjoejdpjl
Usage guide: https://zhaoolee.com/ChromeAppHeroes/page/015_xpath_helper.html

1. Target URL: https://cuiqingcai.com/category/technique/python
2. Steps
Step 1:

Step 2: Tune the XPath expression

Step 3: Get it to match a single item

Step 4: Get it to match multiple items

Result:
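The difference between matching one item and matching many can also be reproduced without the browser. The sketch below assumes a listing page nested like the XPath used in the template of section 4.2; the HTML and the post titles are invented for illustration. Adding a position predicate such as `article[1]` narrows the match to one entry, and removing it matches every entry.

```python
from lxml import etree

# Hypothetical listing page, invented for this sketch
SAMPLE_HTML = """
<html><body><section>
<div>banner</div>
<div><div>
<article><header><h2><a href="/post/1">Post one</a></h2></header></article>
<article><header><h2><a href="/post/2">Post two</a></h2></header></article>
<article><header><h2><a href="/post/3">Post three</a></h2></header></article>
</div></div>
</section></body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# With article[1], only the first <article> is matched
first = tree.xpath('/html/body/section/div[2]/div/article[1]/header/h2/a/text()')
# Without the predicate, every <article> at that level is matched
every = tree.xpath('/html/body/section/div[2]/div/article/header/h2/a/text()')

print(first)
print(every)
```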

4.2 The code

Note: install the requests and lxml libraries beforehand (etree is a module of lxml). Installation guide: https://blog.csdn.net/weixin_41665637/article/details/99292935
Code template:

import requests
from lxml import etree

# Download the listing page
r = requests.get('https://cuiqingcai.com/category/technique/python').content
topic = etree.HTML(r)
# Title text of every article on the page
title = topic.xpath('/html/body/section/div[2]/div/article/header/h2/a/text()')
# Thumbnail URL of every article on the page
img = topic.xpath('/html/body/section/div[2]/div/article/div/a/img/@src')

for x in title:
    print(x)

for i in img:
    print(i)
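Querying titles and images as two separate lists can leave them misaligned when an article has no thumbnail. A possible variation, sketched below, walks each `<article>` and queries inside it, so every title stays paired with its own image. The function names `fetch_html` and `extract_posts`, the 10-second timeout, and the sample HTML are assumptions made for this sketch, not part of the original template.

```python
import requests
from lxml import etree


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.content


def extract_posts(html):
    """Return a list of (title, image_url) pairs, one per <article>."""
    tree = etree.HTML(html)
    posts = []
    for article in tree.xpath('/html/body/section/div[2]/div/article'):
        # Paths starting with ./ are evaluated relative to this article
        titles = article.xpath('./header/h2/a/text()')
        images = article.xpath('./div/a/img/@src')
        title = titles[0].strip() if titles else None
        image = images[0] if images else None
        posts.append((title, image))
    return posts


# Hypothetical listing page; the second article has no thumbnail
SAMPLE_HTML = """
<html><body><section>
<div>banner</div>
<div><div>
<article><header><h2><a href="/post/1">Post one</a></h2></header>
<div><a href="/post/1"><img src="/img/1.jpg"></a></div></article>
<article><header><h2><a href="/post/2">Post two</a></h2></header>
<div>No thumbnail</div></article>
</div></div>
</section></body></html>
"""

for title, image in extract_posts(SAMPLE_HTML):
    print(title, image)
```

To try it on the live page, pass the result of `fetch_html('https://cuiqingcai.com/category/technique/python')` to `extract_posts`. Whether the XPath still matches depends on the site's current layout.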