- Blog posts (14)
[Original] BeautifulSoup practice: Douban events
from bs4 import BeautifulSoup
import requests
import mysql_test

def space_strip(tag, css):
    r = tag.select(css)[0].text.replace('\n', '').strip()
    return r

url = 'https://beijing.douban.c...
2018-08-23 19:35:37 130
[Original] BeautifulSoup practice: Xueqiu Q&A
import requests
from bs4 import BeautifulSoup

url = 'https://xueqiu.com/ask/square'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome...
2018-08-23 19:28:59 192
[Original] Multiprocess scraping of Ximalaya music
import requests
from lxml import etree
import os
import re
import time
import multiprocessing
from multiprocessing import Pool, Queue

def download(audio_url_title):
    (audio_url, title) = audio_u...
2018-08-23 19:27:45 359
[Original] Scraping Tencent job postings
import requests, re, time
from lxml import etree
from urllib import parse
from mysql_test import mysql_connect

def txzhaopin(num):
    for page in range(num):
        page *= 10
        mc = mysql_c...
2018-08-20 08:26:08 216
[Original] Scraping movie information from Dianying Tiantang (dytt8.net)
import requests
import re, os, time

def dytt(num):
    for page in range(1, num + 1):
        url = 'http://www.dytt8.net/html/gndy/dyzz/list_23_%d.html' % page
        print(url)
        response = r...
2018-08-20 08:24:27 544
[Original] 5i5j (我爱我家) rental listings
import requests, re, time
from lxml import etree
from urllib import parse
from mysql_test import mysql_connect

def txzhaopin(num):
    for page in range(num):
        page *= 10
        mc = mysql_c...
2018-08-20 08:23:30 844
[Original] Scraping street-snap images from Toutiao
import requests
import re, json, os

def ttjp(num):
    for offset in range(0, num * 20 + 1, 20):
        url = 'https://www.toutiao.com/search_content/?offset={}&format=json&keyword=%E8%A1%97%E6...
2018-08-16 22:09:19 210
[Original] Using requests
import requests

# Define the URL directly
url = 'http://httpbin.org/get?key2=value2&key1=value1'
r = requests.get(url)

# Custom parameters
# payload = {'key1': 'value1', 'key2': 'value2'}
# r = requests.get('http://httpbin...
2018-08-15 22:47:54 117
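The requests post above contrasts a hard-coded query string with a parameter dict. As a rough sketch of what such a dict turns into, the standard library's `urllib.parse.urlencode` builds the same kind of query string without any network access. `build_url` is a hypothetical helper introduced here for illustration, not something taken from the post.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_url(base, params):
    """Append URL-encoded query parameters to a base URL (hypothetical helper)."""
    return base + '?' + urlencode(params)


payload = {'key1': 'value1', 'key2': 'value2'}
url = build_url('http://httpbin.org/get', payload)
print(url)

# Parse the query string back to confirm the round trip.
query = parse_qs(urlsplit(url).query)
print(query)
```

With requests itself, passing the dict as `params=payload` to `requests.get` performs this encoding for you.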
[Original] Scraping real-estate news from Xueqiu
xueqiu.py

import requests
import json
import mysql_test

def xueqiu_urllib(num):
    max_id = -1
    count = 10
    mc = mysql_test.mysql_connect()
    for i in range(num):
        url = 'https://xu...
2018-08-15 21:40:14 276
[Original] Youdao Translate
import test
import time
import random
import hashlib
import json

def my_md5(str):
    s = str.encode('utf-8')
    m = hashlib.md5()
    m.update(s)
    return m.hexdigest()

def youdaofanyi(kw): ...
2018-08-14 22:31:40 162
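The `my_md5` helper in the preview above hashes a string with `hashlib.md5`. A compact sketch of the same idea follows. The name `md5_hex` is hypothetical and chosen so the parameter does not shadow the built-in `str`. How the post uses the digest afterwards is cut off in the preview, so that part is not shown.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


digest = md5_hex('hello')
print(digest)  # 5d41402abc4b2a76b9719d911017c592
```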
[Original] Logging in to Renren with a username and password
from urllib import request, parse
import json
from http import cookiejar

# Save cookies through an object
cookie_object = cookiejar.CookieJar()
# A handler corresponds to one operation
handler = request.HTTPCookieProcessor(cookie_object)
# o...
2018-08-14 19:54:20 780
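The preview above stops right after creating the `CookieJar` and the `HTTPCookieProcessor`. One common next step, shown here as a sketch under that assumption and not as the post's actual code, is to build an opener from the handler so that cookies set by a login response are kept and sent again on later requests. `make_cookie_opener` is a hypothetical name, and no request is sent.

```python
from http import cookiejar
from urllib import request


def make_cookie_opener():
    """Build an opener that stores cookies from responses in a CookieJar."""
    jar = cookiejar.CookieJar()
    handler = request.HTTPCookieProcessor(jar)
    opener = request.build_opener(handler)
    return opener, jar


opener, jar = make_cookie_opener()
# No request has been sent yet, so the jar starts empty.
print(len(jar))
```

After a successful `opener.open(...)` call against a login URL, the jar would typically hold the session cookies, and further calls through the same opener would send them automatically.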
[Original] Logging in to Renren with a cookie
from urllib import request

url = 'http://www.renren.com/967453172'
headers = {
    # 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    # 'Accept...
2018-08-14 19:47:50 232
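The cookie-based login above works by sending a previously obtained cookie in the request headers. A minimal sketch of that technique follows; the cookie value is a placeholder, `build_request` is a hypothetical helper, and the request is only constructed, never sent.

```python
from urllib import request

# Placeholder value: a real cookie string would be copied from a logged-in browser session.
COOKIE = 'name=value'


def build_request(url, cookie):
    """Create a Request that carries a Cookie header (nothing is sent here)."""
    headers = {
        'User-Agent': 'Mozilla/5.0',
        'Cookie': cookie,
    }
    return request.Request(url, headers=headers)


req = build_request('http://www.renren.com/967453172', COOKIE)
print(req.get_header('Cookie'))
```

Passing `req` to `request.urlopen` would then send the header along with the request. Whether the server accepts it depends on the cookie still being valid.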
[Original] tuozhan.py
from urllib import request, parse
from urllib.error import HTTPError, URLError
import json
from http import cookiejar

class session():
    def __init__(self):
        cookie_object = cookiejar.Cook...
2018-08-13 21:27:08 129
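[Original] Differences between Session and Cookie
Both cookies and sessions store user information while a user visits a website. On the next visit, the site can read the data from the cookie or session created earlier, so the user does not have to enter it again, which makes browsing faster and more convenient. The difference is where the data lives: cookie data is stored in the user's browser, whereas session data is stored on the website's server. ...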
2018-08-13 20:49:13 122
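The distinction drawn in the Session and Cookie post can be illustrated with a toy in-memory model. This is only a sketch: `SESSION_STORE`, `login`, and `current_user` are hypothetical names, and real web frameworks add expiry, signing, and persistent storage on top of this idea.

```python
import secrets

# Server side: session data lives here, keyed by session ID.
SESSION_STORE = {}


def login(username):
    """Create a server-side session and return the cookie the browser would keep."""
    session_id = secrets.token_hex(16)
    SESSION_STORE[session_id] = {'username': username}
    # Only the opaque ID travels to the client; the user data stays on the server.
    return {'session_id': session_id}


def current_user(cookie):
    """Look up the user from the session ID carried in the cookie."""
    session = SESSION_STORE.get(cookie.get('session_id'))
    return session['username'] if session else None


browser_cookie = login('alice')
print(current_user(browser_cookie))  # alice
```

In this model the browser holds only the session ID in its cookie, while the user record stays in the server's store, which matches the browser-side versus server-side split described in the post.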