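Scraping Runoob (菜鸟教程 | 菜鸟笔记). As someone who enjoys writing crawlers, I didn't want to copy the content by hand, but I needed it, so I wrote a spider.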

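This is the data I want: the table on the page linked below.
The code can be run as is, but the packages it needs must be installed first (requests, pandas, beautifulsoup4, lxml, and an Excel writer such as openpyxl). The data ends up in an Excel spreadsheet.
The URL scraped is https://www.runoob.com/python/python-exceptions.html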

import requests  # not aliased to "re", which would shadow the standard regex module name
import pandas as pd
import bs4

# Browser-like request headers. The session-specific Cookie, Host, and Referer
# entries are not needed for this public page, and 'br' is left out of
# Accept-Encoding because requests can only decode it with an extra package.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.9',
}
url1 = 'https://www.runoob.com/python/python-exceptions.html'

# Actually send the headers (they were defined but never passed before).
resp = requests.get(url=url1, headers=headers, timeout=10)
resp.raise_for_status()
a = resp.content.decode('utf-8')
# print(a)

html = bs4.BeautifulSoup(a, 'lxml')
# All cells of the first table on the page, assumed to have two columns.
s = html.table.find_all('td')

# get_text() also works for cells that contain nested tags such as links;
# calling .replace() on a child tag would fail.
lists = [td.get_text(strip=True).replace('\r\n', '') for td in s]

a1 = lists[0::2]  # even positions: first column
a2 = lists[1::2]  # odd positions: second column

# First column becomes the index, second column the data.
aa2 = pd.DataFrame(a2, index=a1)

aa2.to_excel(excel_writer=r'2.xlsx')
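
To try the cell-pairing step without hitting the network, here is a minimal offline sketch. It is not the code used above: it swaps BeautifulSoup for the standard library's `html.parser` so it has no dependencies, and `CellCollector`, `table_rows`, and `SAMPLE_HTML` are hypothetical names with made-up sample data, not content taken from the real page.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the stripped text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._buf = None  # None while the parser is outside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) still arrives here.
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            self.cells.append("".join(self._buf).strip())
            self._buf = None


def table_rows(html):
    """Pair a flat run of cells into (first column, second column) tuples."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    cells = parser.cells
    if len(cells) % 2 != 0:
        raise ValueError("expected a two-column table, got an odd cell count")
    return list(zip(cells[0::2], cells[1::2]))


# Made-up sample, only shaped like a two-column table.
SAMPLE_HTML = """
<table>
<tr><td>BaseException</td><td>Base class of all exceptions</td></tr>
<tr><td><a href="#">KeyError</a></td><td>Key not found in mapping</td></tr>
</table>
"""

rows = table_rows(SAMPLE_HTML)
for name, description in rows:
    print(name, "|", description)
```

The odd-count check is a design choice: if the table on the page ever gains a third column or a spanning cell, the even/odd split would silently misalign names and descriptions, so failing early seems safer.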