Crawlers
wzwz96
[Python Crawler] 1. Douban Movie Top250

```python
# Douban Movie Top250
import requests
from bs4 import BeautifulSoup

for page in range(10):
    page = page * 25
    url = "https://movie.douban.com/top250?start={}".format(page)
    response = requests.get(url).text
    # (preview truncated)
```

Original · 2017-06-24 21:32:46 · 580 views · 0 comments
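The preview cuts off right after the page is fetched. A minimal sketch of how the rest might continue, assuming the usual Top250 markup where each film title sits in a `<span class="title">` element (the post uses BeautifulSoup; this dependency-free illustration parses with the standard library's `html.parser` instead):

```python
from html.parser import HTMLParser

def page_urls(pages=10):
    # Top250 lists 25 movies per page, paginated via the start= parameter
    return ["https://movie.douban.com/top250?start={}".format(p * 25)
            for p in range(pages)]

class TitleParser(HTMLParser):
    """Collects the text of every <span class="title"> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "title") in attrs:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles.append(data.strip())

# demo on an inline sample instead of the live page
sample = '<div><span class="title">肖申克的救赎</span><span class="other">x</span></div>'
parser = TitleParser()
parser.feed(sample)
print(parser.titles)  # the titles found in the sample snippet
```

In the real scraper, the loop above would feed each `requests.get(url).text` into the parser instead of the sample string.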
[Python Crawler] 2. Douban Books Top250

```python
# Douban Books Top250
import requests
from bs4 import BeautifulSoup

for page in range(10):
    url = 'https://book.douban.com/top250?start={}'.format(page * 25)
    r = requests.get(url).text
    bsObj = BeautifulSoup(  # (preview truncated)
```

Original · 2017-06-24 21:34:03 · 458 views · 0 comments
[Python Crawler] 3. National Geographic: Photo of the Day

```python
# v1.0
import requests
from bs4 import BeautifulSoup
import re

url = 'http://www.nationalgeographic.com.cn/photography/photo_of_the_day/'
r = requests.get(url)
r.encoding = r.apparent_encoding
bsObj =  # (preview truncated)
```

Original · 2017-06-24 21:35:52 · 1198 views · 1 comment
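The preview stops before the parsing step. One hedged guess at what follows, given that the snippet imports `re`: pull the image sources out of the fetched page with a regular expression (the actual markup of the National Geographic page, and whether the post used `re` or BeautifulSoup for this step, are assumptions):

```python
import re

def extract_image_urls(html):
    # grab the src attribute of every <img> tag; the real page layout may differ
    return re.findall(r'<img[^>]+src="([^"]+)"', html)

# demo on an inline sample instead of the live page
sample = '<img alt="pod" src="http://image.ngchina.com.cn/2017/0624/small.jpg">'
print(extract_image_urls(sample))
```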
[Python Crawler] 4. National Geographic v2.0

```python
# v2.0: the code is wrapped into functions
import requests
from bs4 import BeautifulSoup
import re

def getHTMLContent(url):
    try:
        r = requests.get(url)
        r.raise_for_status()
        r.encoding = r.apparent_encoding  # (preview truncated)
```

Original · 2017-06-24 21:37:21 · 627 views · 1 comment
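The point of v2.0 is wrapping the fetch in a checked helper that returns cleanly on failure. A sketch of that same pattern, using only the standard library's urllib so it runs without extra packages (the post itself uses requests with `raise_for_status()` and `apparent_encoding`; the function name follows the preview):

```python
from urllib.request import urlopen
from urllib.error import URLError

def getHTMLContent(url, timeout=10):
    """Fetch a page, returning its decoded text, or '' on any error."""
    try:
        with urlopen(url, timeout=timeout) as r:
            if r.status != 200:   # rough analogue of raise_for_status()
                return ""
            return r.read().decode("utf-8", errors="replace")
    except (URLError, ValueError):
        return ""

# a malformed URL falls into the error branch and yields ''
print(repr(getHTMLContent("not-a-url")))
```

Returning an empty string instead of letting the exception escape is what lets the caller loop over many URLs without crashing on one bad page.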
[Python Crawler] 5. National Geographic v3.0

```python
# v3.0: optimized version
# The previous two versions downloaded the small thumbnails from the
# home page; this version downloads the full-resolution images.
import requests
from bs4 import BeautifulSoup
import re

def getHTMLText(url):  # fetch a page
    try:
        r = requests.get(url)
        r.raise_for_status()  # (preview truncated)
```

Original · 2017-06-24 21:39:09 · 886 views · 1 comment
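v3.0's change is saving the full-size image rather than the thumbnail. The save step itself reduces to writing the response bytes to disk; a sketch of that piece, assuming the image is named after the last segment of its URL (the naming scheme and output directory are assumptions, not taken from the post):

```python
import os

def save_image(content, url, out_dir="images"):
    """Write raw image bytes to out_dir, named after the last URL segment."""
    os.makedirs(out_dir, exist_ok=True)
    name = url.rsplit("/", 1)[-1] or "image.jpg"  # fallback if the URL ends in '/'
    path = os.path.join(out_dir, name)
    with open(path, "wb") as f:
        f.write(content)
    return path

# demo with fake bytes standing in for requests.get(url).content
p = save_image(b"\xff\xd8fake-jpeg", "http://example.com/photos/big.jpg", out_dir="demo_images")
print(p)
```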
Python Crawler Data Storage with MySQL [2]: Simulating a Web Login

```python
# -*- coding:utf-8 -*-
import requests

loginUrl = 'the page the POST data is submitted to'
afterUrl = 'the page actually being scraped'
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
}
values = {"name": "testname", "pas  # (preview truncated)
```

Original · 2017-08-25 21:36:41 · 476 views · 0 comments
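The preview shows the typical shape of a form login: a headers dict and a form-values dict, POSTed to the login page so that the follow-up request carries the session cookie. A dependency-free sketch of that flow using urllib with a `CookieJar`, which plays the role a `requests.Session` would in the post; URLs, the `password` field name, and its value are placeholders, and only the payload encoding is exercised live:

```python
from urllib.request import build_opener, HTTPCookieProcessor
from urllib.parse import urlencode
from http.cookiejar import CookieJar

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
values = {"name": "testname", "password": "placeholder"}  # field names are guesses

def login_then_fetch(login_url, after_url):
    """POST the credentials, then fetch the real page with the session cookie."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))  # keeps cookies across requests
    opener.addheaders = list(header.items())
    opener.open(login_url, data=urlencode(values).encode()).read()
    return opener.open(after_url).read().decode("utf-8", errors="replace")

# the form body that would be POSTed:
print(urlencode(values))
```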
Python Crawler Data Storage with MySQL [3]: Scraping the Data

```python
# -*- coding: UTF-8 -*-
# crawler
from bs4 import BeautifulSoup

bsObj = BeautifulSoup(response, 'html.parser')

# element ids whose contents we want to match
idList = ["MEMBER_NAME", "IDENTITY_CARD", "MEMBER_CODE", "BANKNUMBER", "ACCO  # (preview truncated)
```

Original · 2017-08-26 02:03:42 · 1230 views · 0 comments
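The preview suggests the overall pattern: for each element id in `idList`, grab that element's text, then store the resulting row in MySQL. A dependency-free sketch of both halves; the ids come from the preview (its list is truncated, so only the complete ones are used here), the table name is an assumption, and the post itself presumably pairs BeautifulSoup with a MySQL driver such as pymysql:

```python
import re

ID_LIST = ["MEMBER_NAME", "IDENTITY_CARD", "MEMBER_CODE", "BANKNUMBER"]  # truncated in the post

def extract_by_ids(html, ids):
    """Return {id: inner text} for elements whose id attribute is in ids."""
    row = {}
    for elem_id in ids:
        m = re.search(r'id="{}"[^>]*>([^<]*)<'.format(elem_id), html)
        row[elem_id] = m.group(1).strip() if m else None
    return row

def insert_sql(table, row):
    """Build a parameterized INSERT for a DB-API driver like pymysql."""
    cols = ", ".join(row)
    marks = ", ".join(["%s"] * len(row))
    return "INSERT INTO {} ({}) VALUES ({})".format(table, cols, marks), list(row.values())

# demo on an inline sample instead of the scraped page
sample = '<td id="MEMBER_NAME">Alice</td><td id="MEMBER_CODE">42</td>'
row = extract_by_ids(sample, ID_LIST)
print(insert_sql("members", row))
```

With a real driver, the statement and values would go to `cursor.execute(sql, params)`, letting the driver handle escaping.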