Web Crawlers
dreams512
Scraping university announcements with BeautifulSoup
    # -*- coding: utf-8 -*-
    import re
    import urllib2
    from bs4 import BeautifulSoup

    def print_zh(key):
        s = "u'%s'" % key
        s = eval(s)
        print(s)

    keyList = [u'项目', u'交流']
    keyResult = []
    url = 'http://urp…
Original · 2017-01-03 17:34:50 · 457 views · 0 comments
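BeautifulSoup4 study notes
Brief notes on learning beautifulsoup4. Install with pip: pip install beautifulsoup4, or with easy_install: easy_install beautifulsoup4, or from source: python setup.py install. Besides the built-in HTML parser, BeautifulSoup also supports third-party parsers such as html5lib and lxml; once one is installed, it can be selected when initializing the BeautifulSoup object…
Original · 2016-12-20 11:54:52 · 4530 views · 0 comments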
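As a rough illustration of the parser choice mentioned in these notes, here is a minimal sketch that picks a third-party parser if one is installed and otherwise falls back to the built-in one. The helper names `choose_parser` and `first_link_text` are hypothetical and not taken from the original post; the parser names `"lxml"`, `"html5lib"`, and `"html.parser"` are the strings BeautifulSoup accepts as its second argument.

```python
from importlib.util import find_spec


def choose_parser(preferred=("lxml", "html5lib")):
    """Return the first installed third-party parser, else the built-in one.

    Hypothetical helper for illustration; the preference order is an assumption.
    """
    for name in preferred:
        # find_spec returns None when the top-level package is not installed.
        if find_spec(name) is not None:
            return name
    return "html.parser"  # ships with Python, no extra install needed


def first_link_text(html):
    """Parse html and return the text of the first <a> tag, or None."""
    # Imported here so choose_parser stays usable without beautifulsoup4.
    from bs4 import BeautifulSoup  # requires: pip install beautifulsoup4

    soup = BeautifulSoup(html, choose_parser())
    link = soup.find("a")
    return link.get_text(strip=True) if link is not None else None
```

With beautifulsoup4 installed, calling `first_link_text('<p><a href="/x"> News </a></p>')` would parse the fragment with whichever parser `choose_parser()` selects. Naming the parser explicitly, rather than letting BeautifulSoup guess, keeps results consistent across machines with different packages installed.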
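Scraping Douban movie information and saving it to Excel
Source URL: https://www.douban.com/doulist/3936288/?start=0 Scrapes the Douban Movie Top 250 and records each film's title, director, cast, release date, and other details in an Excel file.
    import re
    import openpyxl
    import requests
    from bs4 import BeautifulSoup

    class Movie(object):
        def __in…
Original · 2017-01-13 22:23:08 · 1375 views · 0 comments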
Multithreaded scraping of kx1d images
    # -*- coding: utf-8 -*-
    import os
    import shutil
    import threading
    import lxml.html
    import requests

    list_href = []

    class Download(object):
        current_num = 0

        def __init__(self, output, hf_list):
            s…
Original · 2017-01-18 22:04:01 · 558 views · 0 comments
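The preview above is cut off, so the author's `Download` class is not reproduced here. As a separate, minimal sketch of the general idea (several threads pulling URLs from a shared list), the following uses only the standard `threading` module. The names `download_all`, `fetch`, and `num_threads` are hypothetical, and the fetch function is passed in by the caller rather than hard-coded.

```python
import threading


def download_all(hrefs, fetch, num_threads=4):
    """Fetch every URL in hrefs with num_threads worker threads.

    fetch is a caller-supplied function (url -> data); returns {url: data}.
    Illustrative sketch only, not the original post's implementation.
    """
    results = {}
    lock = threading.Lock()
    next_index = 0  # position of the next unclaimed URL in hrefs

    def worker():
        nonlocal next_index
        while True:
            with lock:  # claim exactly one URL at a time
                if next_index >= len(hrefs):
                    return
                url = hrefs[next_index]
                next_index += 1
            data = fetch(url)  # slow I/O runs outside the lock
            with lock:
                results[url] = data

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has drained the list
    return results
```

For real downloads, `fetch` could be something like `lambda url: requests.get(url, timeout=10).content`, assuming the `requests` package is installed. Holding the lock only while claiming a URL or storing a result lets the network requests themselves overlap.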