How to scrape the Blog Star leaderboard (selenium + requests)

The page to scrape and its URL

https://bss.csdn.net/m/topic/blog_star2020



Method 1 (selenium)

The scraper is written with selenium, and the xpath_helper browser extension is used to work out the XPath expressions.



Locating elements with XPath

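To see how the XPath expressions used by the selenium code map onto the list markup, here is a minimal offline sketch using Python's standard `xml.etree.ElementTree`, which supports a subset of XPath, including `*[@id='blogList']` and positional predicates such as `div[2]`. The markup in `SAMPLE_PAGE` is hypothetical: it is reconstructed only from those four XPath expressions, with values taken from the first output row of this method, and the live page's real HTML may differ. `extract_entries` is likewise an illustrative name, not part of the original code.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, reverse-engineered from the four XPath expressions;
# the live page's actual HTML may be different.
SAMPLE_PAGE = """
<html><body>
<ul id="blogList">
  <li>
    <a href="https://bss.csdn.net/m/topic/blog_star2020/detail?username=hollis_chuang">
      <span>070</span>
      <div class="avatar"></div>
      <div>Hollis在csdn</div>
      <div class="info"></div>
      <div class="vote"><p>votes</p><p><em>3893</em></p></div>
    </a>
  </li>
</ul>
</body></html>
"""


def extract_entries(page_source):
    """Apply the selenium code's paths to a static XHTML string (hypothetical helper)."""
    root = ET.fromstring(page_source)
    entries = []
    # .//*[@id='blogList']/li/a mirrors //*[@id="blogList"]/li/a
    for a in root.findall(".//*[@id='blogList']/li/a"):
        entries.append({
            "index": a.find("span").text,                   # li/a/span
            "name": a.find("div[2]").text,                  # li/a/div[2]: second <div> child
            "number": int(a.find("div[4]/p[2]/em").text),   # li/a/div[4]/p[2]/em
            "url": a.get("href"),                           # href attribute of the <a> tag
        })
    return entries


print(extract_entries(SAMPLE_PAGE))
```

Note that XPath positions start at 1, so `div[2]` is the second `<div>` under the link, not the third.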



Full code
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(10)  # when locating elements, keep retrying for up to 10 s until they appear
url = "https://bss.csdn.net/m/topic/blog_star2020"
driver.get(url)

# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which was removed in Selenium 4.3
indexs = driver.find_elements(By.XPATH, '//*[@id="blogList"]/li/a/span')               # blogger's vote number
names = driver.find_elements(By.XPATH, '//*[@id="blogList"]/li/a/div[2]')              # blogger name
numbers = driver.find_elements(By.XPATH, '//*[@id="blogList"]/li/a/div[4]/p[2]/em')    # vote count
urls = driver.find_elements(By.XPATH, '//*[@id="blogList"]/li/a')                      # voting link

data = []
for index, name, number, link in zip(indexs, names, numbers, urls):
    data.append({
        'index': index.text,                 # .text returns the element's visible text
        'name': name.text,
        'number': int(number.text),
        'url': link.get_attribute('href'),   # href attribute of the <a> tag
    })

data = sorted(data, key=lambda x: x['number'], reverse=True)  # sort by vote count, highest first

# Print each record as an HTML table row: one <tr> with a <td> per column
for rank, i in enumerate(data, start=1):
    print("<tr>")
    print("<td>{}</td>".format(rank))                                   # rank
    print("<td>{}</td>".format(i['name']))                              # blogger
    print("<td>{}</td>".format(i['index']))                             # vote number
    print("<td>{}</td>".format(i['number']))                            # vote count
    print("<td><a href='{}'>{}</a></td>".format(i['url'], i['url']))    # voting link
    print("</tr>")
driver.quit()  # close the browser and end the WebDriver session
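The loop above scrapes, sorts and prints in a single pass, so none of it can be checked without launching a browser. A minimal sketch that moves the sorting and HTML rendering into a pure function follows; `to_html_rows` and the two-record `sample` (values copied from the sample output that follows) are illustrative and not part of the original code. It also runs names and URLs through `html.escape`, since a blogger name containing `'`, `<` or `&` (the leaderboard includes `JasonLee\'blog`) would otherwise break the generated markup.

```python
from html import escape


def to_html_rows(records):
    """Sort records by vote count and render one <tr> row per record (hypothetical helper)."""
    ranked = sorted(records, key=lambda r: r["number"], reverse=True)
    rows = []
    for rank, r in enumerate(ranked, start=1):
        link = escape(r["url"])  # escape() also converts ' so the single-quoted href stays valid
        cells = [
            str(rank),                              # rank
            escape(r["name"]),                      # blogger
            escape(r["index"]),                     # vote number
            str(r["number"]),                       # vote count
            "<a href='{0}'>{0}</a>".format(link),   # voting link
        ]
        rows.append("<tr>" + "".join("<td>{}</td>".format(c) for c in cells) + "</tr>")
    return rows


# Two records copied from the sample output, deliberately out of order.
sample = [
    {"index": "124", "name": "帅地", "number": 3454,
     "url": "https://bss.csdn.net/m/topic/blog_star2020/detail?username=m0_37907797"},
    {"index": "070", "name": "Hollis在csdn", "number": 3893,
     "url": "https://bss.csdn.net/m/topic/blog_star2020/detail?username=hollis_chuang"},
]

for row in to_html_rows(sample):
    print(row)
```

In the selenium script, the same helper could be fed the `data` list built from the scraped elements.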

Output (the list is long, so only the first three rows are copied here):
<tr>
<td>1</td>
<td>Hollis在csdn</td>
<td>070</td>
<td>3893</td>
<td><a href='https://bss.csdn.net/m/topic/blog_star2020/detail?username=hollis_chuang'>https://bss.csdn.net/m/topic/blog_star2020/detail?username=hollis_chuang</a></td>
</tr>
<tr>
<td>2</td>
<td>帅地</td>
<td>124</td>
<td>3454</td>
<td><a href='https://bss.csdn.net/m/topic/blog_star2020/detail?username=m0_37907797'>https://bss.csdn.net/m/topic/blog_star2020/detail?username=m0_37907797</a></td>
</tr>
<tr>
<td>3</td>
<td>敖 丙</td>
<td>014</td>
<td>3300</td>
<td><a href='https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_35190492'>https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_35190492</a></td>
</tr>



Method 2 (requests)

Requesting the data



Full code
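import json
import requests  # if the package is missing, install it with: pip install requests

# Request URL
url = 'https://bss.csdn.net/m/topic/blog_star2020/getUsers'
# Request headers
headers = {'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Mobile Safari/537.36'}
# Form parameters to send
data = {'number': ''}
# Pass headers and data as keyword arguments: the 2nd and 3rd positional
# parameters of requests.post are data and json, not headers and data
response = requests.post(url, headers=headers, data=data)
users = json.loads(response.text)['data']
users = sorted(users, key=lambda x: x['vote_num'], reverse=True)  # most votes first
dataCsdn = list()
for j, i in enumerate(users, start=1):
    d = {}
    d['rank'] = j
    d['blogger'] = i['nick_name']
    d['votes'] = i['vote_num']
    d['vote_url'] = i['url']
    d['vote_number'] = i['number']
    d['blog_level'] = i['level']
    d['code_age'] = i['codeLevel']
    d['original_posts'] = i['brief']  # number of original posts, sent as a string such as "166"
    dataCsdn.append(d)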
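The response handling can also be written as a small function that checks the `code` field before trusting `data`, which makes it testable without hitting the network. This is a sketch under assumptions: `parse_users` is a hypothetical helper, the `code`/`data` layout is inferred from the single response sample shown in this post, and `SAMPLE_BODY` is a hand-built body in that shape whose values are copied from the selenium output rows.

```python
import json


def parse_users(text):
    """Parse a getUsers response body into ranked records (hypothetical helper)."""
    payload = json.loads(text)  # leading whitespace such as the body's first newline is allowed
    if payload.get("code") != 200:
        raise ValueError("unexpected code in response: {!r}".format(payload.get("code")))
    users = sorted(payload["data"], key=lambda u: u["vote_num"], reverse=True)
    return [
        {
            "rank": rank,
            "blogger": u["nick_name"],
            "votes": u["vote_num"],
            "vote_url": u["url"],
            "vote_number": u["number"],
        }
        for rank, u in enumerate(users, start=1)
    ]


# Hand-built body in the same shape as the real response (values from the selenium output).
SAMPLE_BODY = (
    '\n{"code":200,"msg":"ok","data":['
    '{"nick_name":"帅地","vote_num":3454,"number":"124",'
    '"url":"https:\\/\\/bss.csdn.net\\/m\\/topic\\/blog_star2020\\/detail?username=m0_37907797"},'
    '{"nick_name":"Hollis在csdn","vote_num":3893,"number":"070",'
    '"url":"https:\\/\\/bss.csdn.net\\/m\\/topic\\/blog_star2020\\/detail?username=hollis_chuang"}]}'
)

for record in parse_users(SAMPLE_BODY):
    print(record)
```

With a live request, the same function would be called as `parse_users(response.text)`.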
    

""" 解释 """
#  post请求
ret = requests.post(url,headers,data)
print(ret) # 输出:<Response [200]>(请求成功,返回200)

print(ret.text)
# 输出(内容太长,复制部分):
'\n{"code":200,"msg":"ok","data":[{"id":"3260","title":"qq_26525215","vote_num":2263,"url":"https:\\/\\/bss.csdn.net\\/m\\/topic\\/blog_star2020\\/detail?username=qq_26525215","img":"","brief":"166","class_id":"95","logs":true,"level":8,"codeLevel":6,"nick_name":"\\u8c19\\u5fc6","avatar":"https:\\/\\/profile.csdnimg.cn\\/F\\/7\\/6\\/1_qq_26525215","article_count":166,"nameWords":null,"number":"001"}]}

#  使用json.loads将数据转回原类型
print(json.loads(ret.text))
#  输出(得票字段vote_num, 博主字段nick_name, 原创文章字段brief,码龄字段codeLevel,投票地址字段url,序号字段number 等等):	
{'data': [{'article_count': 166,
   'avatar': 'https://profile.csdnimg.cn/F/7/6/1_qq_26525215',
   'brief': '166',
   'class_id': '95',
   'codeLevel': 6,
   'id': '3260',
   'img': '',
   'level': 8,
   'logs': True,
   'nameWords': None,
   'nick_name': '谙忆',
   'number': '001',
   'title': 'qq_26525215',
   'url': 'https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_26525215',
   'vote_num': 2263}]}
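`json.loads` does more than split the text into fields: it decodes the JSON escapes (`\/` becomes `/`, `\u8c19\u5fc6` becomes 谙忆) and maps JSON types onto Python types. The short check here runs it offline on the single-record response sample from this section. Note that `brief` and `number` arrive as strings while `vote_num` and `article_count` are integers, so `brief` needs `int()` before any arithmetic.

```python
import json

# Body copied from the print(ret.text) output in this section (one record only).
RAW = ('\n{"code":200,"msg":"ok","data":[{"id":"3260","title":"qq_26525215",'
       '"vote_num":2263,"url":"https:\\/\\/bss.csdn.net\\/m\\/topic\\/blog_star2020'
       '\\/detail?username=qq_26525215","img":"","brief":"166","class_id":"95",'
       '"logs":true,"level":8,"codeLevel":6,"nick_name":"\\u8c19\\u5fc6",'
       '"avatar":"https:\\/\\/profile.csdnimg.cn\\/F\\/7\\/6\\/1_qq_26525215",'
       '"article_count":166,"nameWords":null,"number":"001"}]}')

user = json.loads(RAW)["data"][0]

# Escapes are decoded: \/ becomes / and \u8c19\u5fc6 becomes the blogger's name.
print(user["nick_name"], user["url"])
# JSON true/null/numbers become Python True/None/int.
print(type(user["vote_num"]).__name__, user["logs"], user["nameWords"])
# Numeric-looking strings stay strings until converted explicitly.
print(repr(user["brief"]), int(user["brief"]) == user["article_count"])
```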





Pasting the output into the post




Leaderboard

Last updated 1-25; voting has closed.

Rank | Blogger | Vote no. | Votes | Vote link
1敖 丙1210275https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_35190492
2谷哥的小弟617856https://bss.csdn.net/m/topic/blog_star2020/detail?username=lfdfhl
3帅地1326791https://bss.csdn.net/m/topic/blog_star2020/detail?username=m0_37907797
4沉默王二276293https://bss.csdn.net/m/topic/blog_star2020/detail?username=qing_gee
5Hollis在csdn646182https://bss.csdn.net/m/topic/blog_star2020/detail?username=hollis_chuang
6小傅哥1735962https://bss.csdn.net/m/topic/blog_star2020/detail?username=yao__shun__yu
7一个处女座的程序猿1805652https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_41185868
8李锐博恩945640https://bss.csdn.net/m/topic/blog_star2020/detail?username=reborn_lee
9小林coding1775572https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_34827674
10ThinkWon1435515https://bss.csdn.net/m/topic/blog_star2020/detail?username=thinkwon
11谙忆15283https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_26525215
12中间件兴趣圈1935215https://bss.csdn.net/m/topic/blog_star2020/detail?username=prestigeding
131_bit1825165https://bss.csdn.net/m/topic/blog_star2020/detail?username=a757291228
14qq26480087261174712https://bss.csdn.net/m/topic/blog_star2020/detail?username=u012325865
15Jack-Cui774670https://bss.csdn.net/m/topic/blog_star2020/detail?username=c406495762
16第三女神程忆难514615https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_40881680
17TrueDei1384612https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_17623363
18lovelife110894383https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_33873431
19LaoYuanPython934310https://bss.csdn.net/m/topic/blog_star2020/detail?username=laoyuanpython
20单片机菜鸟哥494242https://bss.csdn.net/m/topic/blog_star2020/detail?username=dpjcn1990
21程序猿小亮453728https://bss.csdn.net/m/topic/blog_star2020/detail?username=jiuqiyuliang
22柔若寒1213244https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_19734597
23小山猪的沙塔1602673https://bss.csdn.net/m/topic/blog_star2020/detail?username=u012039040
24ReCclay1202649https://bss.csdn.net/m/topic/blog_star2020/detail?username=recclay
25艺博东1832546https://bss.csdn.net/m/topic/blog_star2020/detail?username=hyd696
26JasonLee\'blog781771https://bss.csdn.net/m/topic/blog_star2020/detail?username=xianpanjia4616
27Alice菌71687https://bss.csdn.net/m/topic/blog_star2020/detail?username=weixin_44318830
28记得诚791343https://bss.csdn.net/m/topic/blog_star2020/detail?username=albert992
29公众号-JavaEdge601329https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_33589510
30小麦大叔1631286https://bss.csdn.net/m/topic/blog_star2020/detail?username=u010632165
31carl-zhao371171https://bss.csdn.net/m/topic/blog_star2020/detail?username=u012410733
32牧小农991158https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_14996421
33考古学家lx841084https://bss.csdn.net/m/topic/blog_star2020/detail?username=weixin_43582101
34riemann_1191051https://bss.csdn.net/m/topic/blog_star2020/detail?username=riemann_
35Engineer-Bruce_Yang541048https://bss.csdn.net/m/topic/blog_star2020/detail?username=morixinguan
36herosunly70981https://bss.csdn.net/m/topic/blog_star2020/detail?username=herosunly
37SoWhat1412129969https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_31821675
38许进进164963https://bss.csdn.net/m/topic/blog_star2020/detail?username=lucasxu01
39Data-Mining52947https://bss.csdn.net/m/topic/blog_star2020/detail?username=liuzehn
40刘炫32097926https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_35082030
41AI 菌6913https://bss.csdn.net/m/topic/blog_star2020/detail?username=wjinjie
42刘一哥GIS95906https://bss.csdn.net/m/topic/blog_star2020/detail?username=lucky51222
43梦想橡皮擦103889https://bss.csdn.net/m/topic/blog_star2020/detail?username=hihell
44cutercorley38887https://bss.csdn.net/m/topic/blog_star2020/detail?username=cufeecr
45webmote149864https://bss.csdn.net/m/topic/blog_star2020/detail?username=webmote
46Bubbliiiing23863https://bss.csdn.net/m/topic/blog_star2020/detail?username=weixin_44791964
47江南、董少76852https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_41453285
48xcbeyond178848https://bss.csdn.net/m/topic/blog_star2020/detail?username=xcbeyond
49xindoo166843https://bss.csdn.net/m/topic/blog_star2020/detail?username=xindoo
50象在舞161839https://bss.csdn.net/m/topic/blog_star2020/detail?username=gdkyxy2013
51ZhuJiangs191833https://bss.csdn.net/m/topic/blog_star2020/detail?username=haojiagou
52白玉梁16793https://bss.csdn.net/m/topic/blog_star2020/detail?username=baiyuliang2013
53ztenv196679https://bss.csdn.net/m/topic/blog_star2020/detail?username=lianshaohua
54xiangzhihong8171639https://bss.csdn.net/m/topic/blog_star2020/detail?username=xiangzhihong8
55源码兴趣圈187628https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_37781649
56一颗小树x181625https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_41204464
57刘早起88620https://bss.csdn.net/m/topic/blog_star2020/detail?username=weixin_41846769
58小王曾是少年157600https://bss.csdn.net/m/topic/blog_star2020/detail?username=hnu_csee_wjw
59雪松研究所165598https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_33487044
60恬静的小魔龙134585https://bss.csdn.net/m/topic/blog_star2020/detail?username=q764424567
61小小鱼儿小小林175583https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_27471405
62满天星._104560https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_32146369
63L-Java87556https://bss.csdn.net/m/topic/blog_star2020/detail?username=weixin_43767015
64段智华50540https://bss.csdn.net/m/topic/blog_star2020/detail?username=duan_zhihua
65三钻126540https://bss.csdn.net/m/topic/blog_star2020/detail?username=tridiamond6
66技术大咖秀73519https://bss.csdn.net/m/topic/blog_star2020/detail?username=shipfei_csdn
67王义凯_Rick153512https://bss.csdn.net/m/topic/blog_star2020/detail?username=wsdc0521
68花狗Fdog_65507https://bss.csdn.net/m/topic/blog_star2020/detail?username=fdog_
69灰小猿62485https://bss.csdn.net/m/topic/blog_star2020/detail?username=weixin_44985880
70Winter_world154483https://bss.csdn.net/m/topic/blog_star2020/detail?username=w464960660
71TRHX • 鲍勃140447https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_36759224
72善良勤劳勇敢而又聪明的老杨125445https://bss.csdn.net/m/topic/blog_star2020/detail?username=yy339452689
73_陈哈哈200439https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_39390545
74小宋是呢158419https://bss.csdn.net/m/topic/blog_star2020/detail?username=xiaosongshine
75nineheaded_bird109412https://bss.csdn.net/m/topic/blog_star2020/detail?username=tengweitw
76程序员cxuan34407https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_36894974
77向彪-blockchain174398https://bss.csdn.net/m/topic/blog_star2020/detail?username=ws327443752
78✎ℳ๓₯㎕...雲淡風輕2392https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_34361283
79Heartsuit66389https://bss.csdn.net/m/topic/blog_star2020/detail?username=u013810234
80Albert Yang9385https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_23853743
81阿华田51211376https://bss.csdn.net/m/topic/blog_star2020/detail?username=aa518189
82Trent1985142376https://bss.csdn.net/m/topic/blog_star2020/detail?username=trent1985
83科皮子菊86365https://bss.csdn.net/m/topic/blog_star2020/detail?username=meiqi0538
84_江南一点雨199363https://bss.csdn.net/m/topic/blog_star2020/detail?username=u012702547
85Charzous41362https://bss.csdn.net/m/topic/blog_star2020/detail?username=charzous
86bigbirdit21353https://bss.csdn.net/m/topic/blog_star2020/detail?username=zpcandzhj
87戴着眼镜看不清53328https://bss.csdn.net/m/topic/blog_star2020/detail?username=lyztyycode
88beyondma20315https://bss.csdn.net/m/topic/blog_star2020/detail?username=beyondma
89tyyj90137314https://bss.csdn.net/m/topic/blog_star2020/detail?username=tyyj90
90码农飞哥106313https://bss.csdn.net/m/topic/blog_star2020/detail?username=u014534808
91后端技术漫谈71301https://bss.csdn.net/m/topic/blog_star2020/detail?username=qqxx6661
92anlian52310298https://bss.csdn.net/m/topic/blog_star2020/detail?username=anlian523
93AlbertS5292https://bss.csdn.net/m/topic/blog_star2020/detail?username=shihengzhen101
94我是橙子va155286https://bss.csdn.net/m/topic/blog_star2020/detail?username=weixin_38239050
95程序员爱酸奶(QuellanAn)46283https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_27790011
96华为云68268https://bss.csdn.net/m/topic/blog_star2020/detail?username=devcloud
97Mr.郑先生_102266https://bss.csdn.net/m/topic/blog_star2020/detail?username=zbp_12138
98云 祁186263https://bss.csdn.net/m/topic/blog_star2020/detail?username=beiisbei
99半颗心脏19257https://bss.csdn.net/m/topic/blog_star2020/detail?username=xh870189248
100cv调包侠42249https://bss.csdn.net/m/topic/blog_star2020/detail?username=qq_46098574



