02_A simple Python crawler (Panda TV LOL streamers: who is the strongest?)

 
  

 Disclaimer:

 This article is for Python practice only and involves no malicious attack of any kind!


# Import the request module
from urllib import request
# Import the re module
import re


class Spider():
    # URLs start with http or https
    # Page to crawl: Panda TV, LOL category (scrape streamer names and viewer counts)
    url_to_run = r'https://www.panda.tv/cate/lol'
    htmls = None  # Holds the fetched HTML content
    # Non-greedy match up to the nearest </div>; this is the parent tag of the
    # streamer-name and viewer-count tags
    root_pattern = '<div class="video-info">(.*?)</div>'
    # Non-greedy match from </i> to the nearest </span>: the streamer name
    name_pattern = '</i>(.*?)</span>'
    # Non-greedy match up to the nearest </span>: the viewer count
    number_pattern = '<span class="video-number">(.*?)</span>'
    # Final results; each element is {'name': streamer name, 'number': viewer count}
    result_list = []

    @classmethod
    def fetch_content(cls):
        """
        Act as a browser: request the page from the server and
        store the returned HTML, as a string, in Spider.htmls
        :return: None
        """
        # request.urlopen wraps the server's reply in a file-like object (an HTTPResponse instance)
        result = request.urlopen(cls.url_to_run)
        # print(result.getcode())  # HTTP status code; 200 means the page was fetched
        # print(result.geturl())   # URL actually fetched; shows whether a redirect occurred
        cls.htmls = result.read()  # Raw HTML content, bytes
        cls.htmls = str(cls.htmls, encoding='utf-8')  # Decode the bytes into a str

    @classmethod
    def analysis(cls):
        """
        Parse the HTML saved in Spider.htmls for
        1) the streamer name
        2) the viewer count
        and append one dict per streamer to cls.result_list
        :return: None
        """
        # root_pattern captures a group, so the results no longer include the outer video-info tag
        video_info_lst = re.findall(cls.root_pattern, cls.htmls, flags=re.S)
        for video in video_info_lst:
            up_host = re.findall(cls.name_pattern, video, flags=re.S)
            video_number = re.findall(cls.number_pattern, video, flags=re.S)
            # Keep the first name match and strip the newlines and spaces around it
            up_host = up_host[0].strip()
            # Take video_number out of its list
            video_number = video_number[0]
            # Combine name and viewer count into a dict and add it to the results
            dic = {'name': up_host, 'number': video_number}
            cls.result_list.append(dic)

    @classmethod
    def sort_seed(cls, item):
        """
        The elements of result_list are dicts, which cannot be compared directly.
        sorted() passes each dict here, and the number parsed from
        item['number'] is used as the comparison key.
        :return: the viewer count as a float
        """
        # Match the decimal part too: with r'\d+' alone, '1.9万' and '1.6万' would both parse as 1
        r = re.findall(r'\d+(?:\.\d+)?', item['number'])
        number = float(r[0])
        # Convert counts given in units of 万 (10,000)
        if '万' in item['number']:
            number *= 10000
        return number

    @classmethod
    def sort_result(cls):
        """
        Sort the elements of cls.result_list by viewer count
        :return: None
        """
        # sorted(iterable, key=None, reverse=False)
        cls.result_list = sorted(cls.result_list, key=cls.sort_seed, reverse=True)

    @classmethod
    def show(cls):
        print("Total Uphost: " + str(len(cls.result_list)))
        print('=' * 45)
        # enumerate gives the rank directly, without searching the list for each item
        for rank, item in enumerate(cls.result_list, start=1):
            print('Uphost:' + item['name'] + " ," + "Rank: " + str(rank)
                  + ' Video Watched: ' + item['number'])

    @classmethod
    def go(cls):
        cls.fetch_content()
        cls.analysis()
        cls.sort_result()
        cls.show()


# Test run of the class
Spider.go()
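To see how the three non-greedy patterns work together without touching the network, here is a minimal sketch that runs them against a hand-written HTML fragment. The fragment is an assumption: it is shaped only so that the patterns from the class match it, and the `video-nickname` and `icon` class names, the `streamer_a`/`streamer_b` names, and the `extract` helper are all made up for illustration. It is not a capture of the real page.

```python
import re

# Hypothetical fragment (assumption): written by hand to fit the patterns below.
SAMPLE_HTML = """
<div class="video-info">
    <span class="video-nickname"><i class="icon"></i>
        streamer_a
    </span>
    <span class="video-number">1.2万</span>
</div>
<div class="video-info">
    <span class="video-nickname"><i class="icon"></i>
        streamer_b
    </span>
    <span class="video-number">9494</span>
</div>
"""

# The same three patterns used by the Spider class.
ROOT_PATTERN = r'<div class="video-info">(.*?)</div>'
NAME_PATTERN = r'</i>(.*?)</span>'
NUMBER_PATTERN = r'<span class="video-number">(.*?)</span>'


def extract(html):
    """Return a list of {'name': ..., 'number': ...} dicts found in html."""
    results = []
    # re.S lets '.' match newlines, so a block may span several lines.
    for block in re.findall(ROOT_PATTERN, html, flags=re.S):
        names = re.findall(NAME_PATTERN, block, flags=re.S)
        numbers = re.findall(NUMBER_PATTERN, block, flags=re.S)
        if not names or not numbers:
            continue  # skip blocks that lack either field
        results.append({'name': names[0].strip(), 'number': numbers[0].strip()})
    return results


print(extract(SAMPLE_HTML))
```

Because each pattern stops at the nearest closing tag, the outer match yields one block per streamer and the inner matches stay inside that block. Regular expressions like these depend on the exact page markup, so they need adjusting whenever the markup changes.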

 

Partial results from an actual test run:

Total Uphost: 118
=============================================
Uphost:即将拥有人鱼线的PDD ,Rank: 1 Video Watched: 283.7万
Uphost:RNG丶MLXG ,Rank: 2 Video Watched: 23.5万
Uphost:熊猫伏念 ,Rank: 3 Video Watched: 9.7万
Uphost:药水哥s ,Rank: 4 Video Watched: 9.3万
Uphost:WE丶Mystic丶 ,Rank: 5 Video Watched: 8.0万
Uphost:叫我官人 ,Rank: 6 Video Watched: 5.5万
Uphost:冠军锐雯 ,Rank: 7 Video Watched: 4.5万
Uphost:熊猫丶蛮神 ,Rank: 8 Video Watched: 2.3万
Uphost:起飛的辛德浪 ,Rank: 9 Video Watched: 1.6万
Uphost:善言_ ,Rank: 10 Video Watched: 1.9万
Uphost:左手QAQ ,Rank: 11 Video Watched: 1.3万
Uphost:S7全球总决赛 ,Rank: 12 Video Watched: 1.2万
Uphost:Pino一米八 ,Rank: 13 Video Watched: 1.2万
Uphost:金三炮o金三岁 ,Rank: 14 Video Watched: 9494
Uphost:挽神z ,Rank: 15 Video Watched: 7025
Uphost:易小埋l ,Rank: 16 Video Watched: 6228
Uphost:主播毕老实 ,Rank: 17 Video Watched: 5941
Uphost:一剑西来QAQ ,Rank: 18 Video Watched: 5897
Uphost:英雄联盟活动直播间 ,Rank: 19 Video Watched: 4239
Uphost:超级提莫丶牛腩君 ,Rank: 20 Video Watched: 4125
Uphost:mid六安王 ,Rank: 21 Video Watched: 3555
Uphost:熊猫丶乐鱼阿卡丽 ,Rank: 22 Video Watched: 3184
Uphost:熊猫TV一休哥 ,Rank: 23 Video Watched: 3120
Uphost:小黑胖砸 ,Rank: 24 Video Watched: 2415
Uphost:或许这就是离岛吧 ,Rank: 25 Video Watched: 2341
Uphost:第一最寂寞1u ,Rank: 26 Video Watched: 2203
Uphost:李阿特 ,Rank: 27 Video Watched: 2081
Uphost:LOL日常活动直播间 ,Rank: 28 Video Watched: 2028
Uphost:LPL熊猫官方直播 ,Rank: 29 Video Watched: 2003
Uphost:熊猫TV丶小青龙 ,Rank: 30 Video Watched: 1996
Uphost:熊猫TV灬小豆豆 ,Rank: 31 Video Watched: 1957
Uphost:小啊雅大大大 ,Rank: 32 Video Watched: 1613
Uphost:小凯南zz ,Rank: 33 Video Watched: 1483
Uphost:拿铁不加糖 ,Rank: 34 Video Watched: 1401
Uphost:金克喵的猫珥朵丶 ,Rank: 35 Video Watched: 1351
Uphost:炽天使z1 ,Rank: 36 Video Watched: 1164
Uphost:小小小女人丶 ,Rank: 37 Video Watched: 1111
Uphost:東東東 ,Rank: 38 Video Watched: 1081
Uphost:纯纯小流_氓 ,Rank: 39 Video Watched: 1077
Uphost:熊猫tv芭比公主 ,Rank: 40 Video Watched: 1070
Uphost:big火鸡 ,Rank: 41 Video Watched: 979
Uphost:机器猫mmm ,Rank: 42 Video Watched: 944
Uphost:大家都叫我冷爷丶 ,Rank: 43 Video Watched: 915
Uphost:栗子菌i ,Rank: 44 Video Watched: 879
Uphost:星矢魔术 ,Rank: 45 Video Watched: 845
Uphost:唐人leo ,Rank: 46 Video Watched: 842
Uphost:十级浪 ,Rank: 47 Video Watched: 829
Uphost:筱兮QAQ ,Rank: 48 Video Watched: 829
Uphost:酥软迷妹小慢慢Zz ,Rank: 49 Video Watched: 817
Uphost:小凡Aaaaaa ,Rank: 50 Video Watched: 804
Uphost:小丸子爱吃樱桃丶 ,Rank: 51 Video Watched: 803
Uphost:爱流血的兔斯基 ,Rank: 52 Video Watched: 803
Uphost:凶残的喵绵绵 ,Rank: 53 Video Watched: 800
Uphost:别叫凯隐叫隐神 ,Rank: 54 Video Watched: 799
Uphost:Panda初心2018 ,Rank: 55 Video Watched: 793
Uphost:熊猫丶大风6 ,Rank: 56 Video Watched: 792
Uphost:顽皮ssssssssssss ,Rank: 57 Video Watched: 790
Uphost:大表哥响尾蛇 ,Rank: 58 Video Watched: 789
Uphost:告白White ,Rank: 59 Video Watched: 788
Uphost:牌面之王丶火影劫 ,Rank: 60 Video Watched: 775
Uphost:西湖仙境 ,Rank: 61 Video Watched: 775
Uphost:飞不起来1 ,Rank: 62 Video Watched: 774
Uphost:逗了个蛋 ,Rank: 63 Video Watched: 773
Uphost:瓜皮球球 ,Rank: 64 Video Watched: 770
Uphost:竹蜻蜓呀 ,Rank: 65 Video Watched: 761
Uphost:少年阿超和阿斌 ,Rank: 66 Video Watched: 760
Uphost:刚出土的i帕帕 ,Rank: 67 Video Watched: 753
Uphost:小主播安旭 ,Rank: 68 Video Watched: 747
Uphost:西决哟 ,Rank: 69 Video Watched: 737
Uphost:Panda丶夏木 ,Rank: 70 Video Watched: 733
Uphost:冰雪丶狐狸 ,Rank: 71 Video Watched: 730
Uphost:夜魅丝 ,Rank: 72 Video Watched: 730
Uphost:熊猫丶皮皮瓜 ,Rank: 73 Video Watched: 725
Uphost:Panda灬刀刀 ,Rank: 74 Video Watched: 721
Uphost:莫莫莫夏夏夏 ,Rank: 75 Video Watched: 694
Uphost:皮皮翔i ,Rank: 76 Video Watched: 646
Uphost:南表妹QAQ ,Rank: 77 Video Watched: 644
Uphost:青蛙OB ,Rank: 78 Video Watched: 633
Uphost:_Infi_ ,Rank: 79 Video Watched: 631
Uphost:暴躁茹阿姨 ,Rank: 80 Video Watched: 627
Uphost:整天打碟的DJ胖丶 ,Rank: 81 Video Watched: 625
Uphost:熊猫丶一百 ,Rank: 82 Video Watched: 623
Uphost:全蛋狮子喵 ,Rank: 83 Video Watched: 622
Uphost:熊猫TV丶小66 ,Rank: 84 Video Watched: 620
Uphost:电竞张全蛋长长 ,Rank: 85 Video Watched: 596
Uphost:熊猫第一不亏哥 ,Rank: 86 Video Watched: 536
Uphost:叫我东邪 ,Rank: 87 Video Watched: 513
Uphost:熊猫TV丶一手绝 ,Rank: 88 Video Watched: 499
Uphost:熊猫TV丶别勉强 ,Rank: 89 Video Watched: 485
Uphost:提莫的小女朋友 ,Rank: 90 Video Watched: 480
Uphost:王者蕾 ,Rank: 91 Video Watched: 471
Uphost:日暮哟 ,Rank: 92 Video Watched: 470
Uphost:颖妹er超甜的 ,Rank: 93 Video Watched: 464
Uphost:熊猫TV丶成小七 ,Rank: 94 Video Watched: 441
Uphost:熊猫tv丶马小越 ,Rank: 95 Video Watched: 405
Uphost:柒柒天 ,Rank: 96 Video Watched: 397
Uphost:Panda电竞白子画 ,Rank: 97 Video Watched: 395
Uphost:熊猫TV_苏璞 ,Rank: 98 Video Watched: 388
Uphost:你的小老虎哥哥 ,Rank: 99 Video Watched: 362
Uphost:门徒zzzz ,Rank: 100 Video Watched: 359
Uphost:李易钧 ,Rank: 101 Video Watched: 352
Uphost:熊猫TV丶农药术士 ,Rank: 102 Video Watched: 346
Uphost:熊猫贝乐 ,Rank: 103 Video Watched: 320
Uphost:李小青盲僧 ,Rank: 104 Video Watched: 309
Uphost:刘慕宸 ,Rank: 105 Video Watched: 307
Uphost:寒风强袭 ,Rank: 106 Video Watched: 305
Uphost:会蛙泳的饼干0 ,Rank: 107 Video Watched: 300
Uphost:阿四德莱文丶 ,Rank: 108 Video Watched: 275
Uphost:知道神龙摆尾吗 ,Rank: 109 Video Watched: 275
Uphost:瓦罗兰的未来丶尨 ,Rank: 110 Video Watched: 260
Uphost:JO丶欣欣 ,Rank: 111 Video Watched: 253
Uphost:123ivan456 ,Rank: 112 Video Watched: 250
Uphost:only丶提莫 ,Rank: 113 Video Watched: 240
Uphost:情话好听但不暖心 ,Rank: 114 Video Watched: 230
Uphost:小丸子真好吃 ,Rank: 115 Video Watched: 218
Uphost:一只提莫送你回家 ,Rank: 116 Video Watched: 213
Uphost:请叫我大腿岩丶 ,Rank: 117 Video Watched: 188
Uphost:伊人芳泽瑞尔心i ,Rank: 118 Video Watched: 183
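
In the listing above, rank 9 (1.6万) sits ahead of rank 10 (1.9万). That is what a sort key produces when it reads only the integer part of the count, since both then compare as equal. The following sketch shows one way to parse the decimal part as well. The `view_count` helper name is hypothetical, and the three rows are copied from the listing purely as sample input.

```python
import re


def view_count(text):
    """Convert a display string such as '283.7万' or '9494' to a float."""
    # Digits with an optional decimal part, e.g. '283.7' or '9494'.
    match = re.search(r'\d+(?:\.\d+)?', text)
    if match is None:
        return 0.0  # assumption: treat unparseable counts as zero
    value = float(match.group())
    if '万' in text:  # 万 means ten thousand
        value *= 10000
    return value


rows = [
    {'name': '起飛的辛德浪', 'number': '1.6万'},
    {'name': '善言_', 'number': '1.9万'},
    {'name': '金三炮o金三岁', 'number': '9494'},
]
ranked = sorted(rows, key=lambda row: view_count(row['number']), reverse=True)
for rank, row in enumerate(ranked, start=1):
    print(rank, row['name'], row['number'])
```

With this key, the 1.9万 entry is ranked above the 1.6万 entry, and plain counts such as 9494 still sort below every 万 value.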

 

Reposted from: https://www.cnblogs.com/shay-zhangjin/p/7863539.html
