Web scraping
qestion_yz_10086
Python multithreaded spider (Original, 2020-09-18)

```python
import requests
from urllib.parse import urlencode
from multiprocessing.pool import ThreadPool
import os
from hashlib import md5

def get_page(offset):
    url='https://www.toutiao.com/api/search/content/?aid=24&app_name=web_search&offs...
```
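The preview cuts off mid-function, but the imports point at the pattern: a `ThreadPool` maps `get_page` over a list of paging offsets, and md5 hashes of the downloaded bytes name the saved files. A minimal sketch of that pattern with the network call left out (`build_url` and `unique_name` are illustrative names, not the post's):

```python
from urllib.parse import urlencode
from multiprocessing.pool import ThreadPool
from hashlib import md5

def build_url(offset):
    # Build a paged search URL the way the truncated get_page() starts to:
    # fixed query parameters plus a varying offset.
    params = {'aid': 24, 'app_name': 'web_search', 'offset': offset}
    return 'https://www.toutiao.com/api/search/content/?' + urlencode(params)

def unique_name(data):
    # An md5 digest of the payload gives a stable, collision-resistant file
    # name, which is what `from hashlib import md5` is typically used for.
    return md5(data).hexdigest() + '.jpg'

if __name__ == '__main__':
    # ThreadPool.map runs build_url concurrently over offsets 0, 20, 40, ...
    with ThreadPool(4) as pool:
        urls = pool.map(build_url, [i * 20 for i in range(4)])
    print(urls[1])
```

In the real spider the mapped function would call `requests.get` on each URL; the thread pool pays off because the work is I/O-bound.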
The difference between requests.text and requests.content (Original, 2020-08-12)

When scraping data over the network with the requests library, we often run into encoding problems. After fetching a response with requests' get method, there are two output forms, response.text and response.content:

response.content is the data exactly as fetched from the network, with no decoding applied, so it is of type bytes. In fact, strings stored on disk or transmitted over the network are always bytes. Therefore, when printing with response.content, we can use reso...
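The distinction can be shown without touching the network by building a `Response` object by hand; note that assigning `_content` pokes at a private attribute purely for illustration, whereas in real use the object comes back from `requests.get(url)`:

```python
import requests

# Construct a Response manually (illustration only; normally requests.get
# fills these fields in from the wire).
resp = requests.Response()
resp._content = '编码'.encode('utf-8')  # raw bytes, as fetched off the wire
resp.encoding = 'utf-8'                # tells requests how to decode .text

print(type(resp.content))  # bytes: the undecoded payload
print(type(resp.text))     # str: decoded using resp.encoding
assert resp.content.decode('utf-8') == resp.text
```

When the site's declared charset is wrong, setting `resp.encoding` yourself before reading `.text` (or decoding `.content` manually) avoids mojibake.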
Hexun research-report site spider (Original, 2020-07-14)

```python
import requests
from selenium import webdriver
import pandas as pd
import re

if __name__ == '__main__':
    url='http://yanbao.stock.hexun.com/ybsj5_{}.shtml'
    chrome_options=webdriver.ChromeOptions()
    chrome_options.add_argument('--headless...
```
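The script is truncated above, but the `re` import suggests the report list is pulled out of the rendered page source with a regular expression. A sketch of that parsing step on a made-up HTML fragment (the fragment and field names are assumptions, not Hexun's real markup):

```python
import re

# Hypothetical page fragment; the real markup on yanbao.stock.hexun.com differs.
html = '''
<div class="content"><a href="/2020/report1.html">Report on steel sector</a>
<span class="date">2020-07-13</span></div>
<div class="content"><a href="/2020/report2.html">Report on banks</a>
<span class="date">2020-07-14</span></div>
'''

# Pair each report link/title with its date using named groups.
pattern = re.compile(
    r'<a href="(?P<href>[^"]+)">(?P<title>[^<]+)</a>\s*'
    r'<span class="date">(?P<date>[^<]+)</span>'
)
reports = [m.groupdict() for m in pattern.finditer(html)]
for r in reports:
    print(r['date'], r['title'], r['href'])
```

Headless Chrome (`--headless`) is what renders the JavaScript-built page in the first place; once `driver.page_source` is in hand, a regex or parser extracts the rows, and pandas can collect them into a DataFrame.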
Tiantian Fund site spider (Original, 2020-07-14)

```python
import requests
import os
import pandas as pd
from pandas import DataFrame,Series
import json
import numpy as np
import time

def get_excel(url,path,page1,page2):
    Headers = {
        "Cookie": "waptgshowtime=2020714; qgqp_b_id=080aa9f400733f8391...
```
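`get_excel` takes a page range, and the `json` import hints that the fund API returns JSON wrapped in a JavaScript assignment that has to be stripped before parsing. A sketch of both steps; the URL, the payload shape, and `page_urls` are placeholders, not the site's real endpoint or the post's code:

```python
import json

def page_urls(url_template, page1, page2):
    # get_excel() above takes a page range; presumably it iterates it like this.
    return [url_template.format(p) for p in range(page1, page2 + 1)]

urls = page_urls('http://example.com/funds?page={}', 2, 4)
print(urls)

# Many fund endpoints return JSONP-style text; slice out the JSON object
# between the first '{' and the last '}' before handing it to json.loads.
payload = 'var rankData = {"datas": ["fundA", "fundB"], "allPages": 4};'  # hypothetical shape
body = payload[payload.index('{'): payload.rindex('}') + 1]
data = json.loads(body)
print(data['allPages'])
```

The truncated `Cookie` header above would be sent verbatim with each `requests.get`, since the site ties the ranking data to a session.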
Biquge spider, optimized (Original, 2020-06-03)

```python
import requests
import time
from bs4 import BeautifulSoup
import os
from multiprocessing.dummy import Pool as ThreadPool
from multiprocessing import Pool
from threading import Thread
import pandas as pd
from pandas import DataFrame,Series
import nu...
```
Biquge spider (Original, 2020-06-02)

```python
import requests
import time
from bs4 import BeautifulSoup
import os

def Get_content(url):
    urls2=[]
    res=requests.get(url).content.decode('gbk')
    soup=BeautifulSoup(res,"html.parser")
    contents=soup.find_all("div",attrs={"class":"nav"})...
```
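Note the `content.decode('gbk')` in `Get_content`: Biquge serves GBK-encoded pages, so the script decodes the raw bytes explicitly rather than trusting `response.text`, whose charset guess can be wrong. The effect can be shown on local bytes, no network needed:

```python
# What requests.get(url).content would hold for a GBK page: raw bytes.
raw = '第一章 风雪夜'.encode('gbk')

# Explicit decode with the right codec recovers the chapter title.
title = raw.decode('gbk')
print(title)

# Decoding with the wrong codec mangles it; errors='replace' keeps it
# printable instead of raising UnicodeDecodeError.
print(raw.decode('utf-8', errors='replace'))
```

The decoded string is then handed to `BeautifulSoup(..., "html.parser")` as in the snippet above.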
Scraping epidemic data (Original, 2020-05-07)

```python
import numpy as np
from pandas import Series, DataFrame
import os
import requests
from bs4 import BeautifulSoup

def get_urls(url):
    response = requests.get(url).text
    bs = BeautifulSoup(respo...
```
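`get_urls` is cut off just as it parses the page, but the pattern is the standard one: parse the fetched HTML with BeautifulSoup and collect the links. Sketched here on a made-up fragment, since the real page's markup is not shown in the excerpt:

```python
from bs4 import BeautifulSoup

# Hypothetical listing fragment standing in for the fetched page.
html = '''
<ul class="list">
  <li><a href="/news/1.html">2020-05-06 daily update</a></li>
  <li><a href="/news/2.html">2020-05-07 daily update</a></li>
</ul>
'''

# get_urls() above would do this on requests.get(url).text.
bs = BeautifulSoup(html, 'html.parser')
links = [a['href'] for a in bs.find_all('a')]
print(links)  # ['/news/1.html', '/news/2.html']
```

Each collected link would then be fetched in turn and its tables loaded into a pandas DataFrame for the actual case counts.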