A Python Crawler for Scraping All Movies on Tencent Video: Source Code [Must-Learn Hands-On Project]

Latest recommended article published on 2024-08-22 17:32:24


Article tags: python, web development, data mining

Article link: https://blog.csdn.net/RRRJ97699/article/details/105939877


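This article shows how to write a crawler in Python, walks through the code in detail, and scrapes all movie listings from the Tencent Video platform. It is a hands-on tutorial for learning Python crawling and data mining.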

Summary generated by CSDN using AI technology.

A crawler, implemented in Python, that scrapes all movies on Tencent Video

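# -*- coding: utf-8 -*-
import re
import string, time

try:
    import urllib2  # Python 2
except ImportError:
    # Python 3: urllib.request provides the same Request/urlopen names
    import urllib.request as urllib2

from bs4 import BeautifulSoup
import pymongo

NUM = 0         # global: movie count
m_type = u''    # global: movie category
m_site = u'qq'  # global: movie site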

# Fetch the page content for the given URL
def gethtml(url):
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    html = response.read()  # raw response body
    return html
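The fetch step above can also be sketched for Python 3, where `urllib2` no longer exists. This is only a minimal illustration, not the author's code: the name `fetch_html`, the `timeout` and `encoding` parameters, and the browser-like `User-Agent` header are all assumptions added here.

```python
import urllib.request


def fetch_html(url, timeout=10.0, encoding="utf-8"):
    """Fetch `url` and return the response body decoded as text.

    Hypothetical Python 3 counterpart of gethtml(); the parameters and
    the User-Agent header are assumptions, not part of the original.
    """
    # Some sites reject the default urllib User-Agent, so send a browser-like one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # The timeout keeps the crawler from hanging forever on a stalled connection.
    with urllib.request.urlopen(req, timeout=timeout) as response:
        raw = response.read()
    # Undecodable bytes are replaced rather than raising an exception.
    return raw.decode(encoding, errors="replace")
```

Calling `fetch_html(url)` returns a `str`, whereas the original `gethtml` returns the raw response body; either can be passed to `BeautifulSoup`.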


# Extract the movie categories from the category list page
def gettags(html):
    global m_type
    # Name the parser explicitly; the block holding the categories is filtered out below
    soup = BeautifulSoup(html, 'html.parser')
    # print(soup)
    # <ul class="clearfix _group" gname="mi_type" gtype="1">
    tags_all = soup.find_all('ul', {'class