Python
Ar1ess
Unable to see the true face of Mount Lu, able only to draw the bow and shoot the great eagle
Simple Python Crawler 1
Scraping the short comments on The Little Prince on Douban (beginner version)
#coding = 'utf-8'
import re
import requests
page = 'https://book.douban.com/subject/1084336/comments/hot?p=1'
url = requests.get(page).text
# regular expression
p1 = '(?<=<span class="s...
Original post · 2019-01-07 15:41:54 · 133 views · 0 comments
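The excerpt above is cut off in the middle of its pattern, so the author's actual regular expression cannot be recovered. As a rough sketch of the lookbehind approach it begins to show, the following runs against an inline HTML string. The class name `comment-text` and the sample markup are assumptions made for illustration, not Douban's real markup.

```python
import re

# Hypothetical markup: the class name "comment-text" is an assumption,
# not taken from the truncated pattern in the post.
SAMPLE_HTML = (
    '<li><span class="comment-text">A gentle story.</span></li>'
    '<li><span class="comment-text">Read it twice.</span></li>'
)

# (?<=...) is a fixed-width lookbehind: it requires the opening tag to
# precede the match without including it in the result.
# (?=...) is a lookahead that stops the match before the closing tag.
COMMENT_RE = re.compile(r'(?<=<span class="comment-text">).*?(?=</span>)')


def extract_comments(html):
    """Return the text between each opening span and its closing tag."""
    return COMMENT_RE.findall(html)


if __name__ == "__main__":
    for comment in extract_comments(SAMPLE_HTML):
        print(comment)
```

Regular expressions are brittle against nested or reordered attributes, which is probably why the later posts in this list move to BeautifulSoup and XPath.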
Simple Python Crawler 4
Scraping JD.com's best-selling books ranking with BeautifulSoup and writing it to a MongoDB database
import requests
from bs4 import BeautifulSoup
import pymongo as pm
import os
import json
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) A...
Original post · 2019-01-12 15:57:53 · 159 views · 0 comments
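Simple Python Crawler 2
Scraping the books a Douban user has read, turning pages automatically by following how the page URL changes, though it looks like my IP got banned
import urllib.request
import http.cookiejar
import requests
from bs4 import BeautifulSoup
import re
# file save location
#filename = 'cookies.txt'
# create an instance object to store cookies
#cooki...
Original post · 2019-01-08 13:12:29 · 129 views · 0 comments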
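This post mentions turning pages by following how the URL changes. One minimal way to sketch that idea is to generate each page URL from an offset. The base URL, the `start` parameter name, and the page size of 15 below are all hypothetical, since the excerpt does not show them.

```python
from urllib.parse import urlencode

# Hypothetical values: the excerpt does not show the real URL or its
# query parameters, so these are placeholders for illustration only.
BASE_URL = "https://example.com/people/some-user/collect"
PAGE_SIZE = 15


def page_urls(base_url, total_items, page_size=PAGE_SIZE):
    """Yield one URL per page, advancing the 'start' offset each time."""
    for start in range(0, total_items, page_size):
        yield f"{base_url}?{urlencode({'start': start})}"


if __name__ == "__main__":
    for url in page_urls(BASE_URL, total_items=40):
        print(url)
        # A real crawler would fetch the URL here and sleep briefly
        # (for example time.sleep(2)) before requesting the next page.
```

Pausing between requests is a common courtesy toward the server and may lower the chance of the kind of IP ban the author describes, though that depends on the site's own limits.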
Simple Python Crawler 5
Scraping the Douban Movie Top 250 ranking with XPath and writing it to a file
from lxml import etree
import requests
import os
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.9...
Original post · 2019-01-14 13:02:55 · 136 views · 0 comments
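The post uses `lxml` for its XPath queries, but the excerpt stops before any query appears. The sketch below illustrates the same select-then-write flow with the standard library's `xml.etree.ElementTree` instead, which supports only a subset of XPath and needs well-formed markup. The sample markup, class names, and output file name are assumptions for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; the real page structure is
# not shown in the excerpt.
SAMPLE = """
<ol>
  <li><span class="title">Film A</span><span class="rating">9.1</span></li>
  <li><span class="title">Film B</span><span class="rating">8.7</span></li>
</ol>
"""


def extract_rows(markup):
    """Return (title, rating) pairs using ElementTree's XPath subset."""
    root = ET.fromstring(markup)
    rows = []
    for item in root.findall("./li"):
        title = item.find("./span[@class='title']").text
        rating = item.find("./span[@class='rating']").text
        rows.append((title, rating))
    return rows


def write_rows(rows, path):
    """Write one tab-separated line per row, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        for title, rating in rows:
            handle.write(f"{title}\t{rating}\n")


if __name__ == "__main__":
    out_path = os.path.join(tempfile.gettempdir(), "films.txt")
    write_rows(extract_rows(SAMPLE), out_path)
    print(f"wrote {out_path}")
```

Real-world HTML is rarely well-formed XML, so a tolerant HTML parser such as the one in `lxml` is the more practical choice for live pages.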
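Simple Python Crawler 6
Scraping information on all Harbin bus routes with XPath and storing it in a MongoDB database, using each bus route as the collection name
from lxml import etree
import requests
import os
import pymongo as py
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...
Original post · 2019-01-15 19:01:41 · 321 views · 0 comments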
Simple Python Crawler 3
Scraping Weibo's trending searches by tag type with BeautifulSoup and writing them to a file
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0....
Original post · 2019-01-10 11:32:17 · 168 views · 0 comments