Python Web Scraping: Moving Beyond the Basics (Getting Started)

我yi癫狂

Article link: https://blog.csdn.net/weixin_43560272/article/details/90113902

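This post covers scraping images and HTML with a crawler. It only uses basic library features; later posts will build on it toward smarter, more complete Python code.
Overall outline

Goals:
Download an image
Create a folder and put files in it
Batch-download images into a folder
Filter data
Batch-filter specific data into a folder
Import data into an Excel spreadsheet
Plot data as charts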
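1. Scraping an image

import urllib.request

# Fetch the image bytes from the remote server
response = urllib.request.urlopen('http://sc3.hao123img.com/data/fd5166d33dba874e15d4f8fb43be485d')
cat_img = response.read()

# Write the bytes to a local file in binary mode
with open('cat_20_300.jpg', 'wb') as f:
    f.write(cat_img)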
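
Some servers reject urllib's default User-Agent or never answer, so a slightly more defensive version of this download could look like the sketch below. The helper names `build_request` and `download_image`, the User-Agent string, and the 10-second timeout are illustrative assumptions, not part of the original code.

```python
import urllib.request

# Hypothetical User-Agent string; some servers reject urllib's default one.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper)"


def build_request(url):
    """Wrap the URL in a Request that carries a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def download_image(url, filename, timeout=10):
    """Download url to filename; raises urllib.error.URLError on failure."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        data = response.read()
    with open(filename, "wb") as f:
        f.write(data)
    return len(data)  # Number of bytes written
```

Calling `download_image(url, 'cat_20_300.jpg')` with the image URL used above would then replace the three-step version, provided the server is reachable.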

1+. Fetching HTML content

import urllib.request

# Download the page and decode the bytes into text (assumes the page is UTF-8)
response = urllib.request.urlopen('http://www.fishc.com')
html = response.read().decode("utf-8")
print(html)
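
Hard-coding "utf-8" fails on pages served in another encoding such as GBK. One possible workaround, sketched here, is to read the charset from the response's Content-Type header and fall back to UTF-8 when none is declared. The helpers `pick_charset` and `decode_body` are hypothetical names, and the demo builds a header object by hand so that it runs offline.

```python
from email.message import Message


def pick_charset(headers, default="utf-8"):
    """Return the charset named in the Content-Type header, or the default."""
    return headers.get_content_charset() or default


def decode_body(raw, headers):
    """Decode raw bytes with the declared charset, replacing undecodable bytes."""
    return raw.decode(pick_charset(headers), errors="replace")


# Offline demonstration with a hand-built header object.
demo_headers = Message()
demo_headers["Content-Type"] = "text/html; charset=gbk"
demo_text = decode_body("你好".encode("gbk"), demo_headers)
```

With a live response, the call would be `decode_body(response.read(), response.headers)`, because `response.headers` is a subclass of `email.message.Message`.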

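1++. Batch-downloading images into a folder

import urllib.request
import os

path = 'images'
# Create the folder; exist_ok=True avoids an error if it already exists
os.makedirs(path, exist_ok=True)

for i in range(1, 10):
    j = i * 100
    # URL of the image on the network
    img_src = 'http://placekitten.com/' + str(j) + '/' + str(j)
    # Download the remote data; the second argument is the local file name
    urllib.request.urlretrieve(img_src, os.path.join(path, str(i) + '.jpg'))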
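
A single failed request stops the whole loop. The sketch below separates planning (which URL goes to which file) from downloading, and records failures instead of aborting. `plan_downloads` and `download_all` are hypothetical helper names; the URL pattern is the placekitten one used in this section.

```python
import os
import urllib.request


def plan_downloads(folder, count=9, step=100):
    """Return (url, local_path) pairs for square placekitten images."""
    plan = []
    for i in range(1, count + 1):
        size = i * step
        url = f"http://placekitten.com/{size}/{size}"
        plan.append((url, os.path.join(folder, f"{i}.jpg")))
    return plan


def download_all(folder="images"):
    """Create the folder if needed, fetch every planned image, list failures."""
    os.makedirs(folder, exist_ok=True)
    failed = []
    for url, local_path in plan_downloads(folder):
        try:
            urllib.request.urlretrieve(url, local_path)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            failed.append((url, str(exc)))
    return failed
```

Keeping the plan as plain data makes it easy to print and check the URLs before any network traffic happens.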

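1+++. Another way to download (requests and PIL)

import requests
from PIL import Image
from io import BytesIO

img_src = 'https://img-my.csdn.net/uploads/201212/25/1356422284_1112.jpg'
response = requests.get(img_src)
# Stop here if the server returned an error status
response.raise_for_status()
# Open the downloaded bytes as an image and save it locally
image = Image.open(BytesIO(response.content))
image.save('D:/9.jpg')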

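2. Filtering tags

import urllib.request
import re

# Collects the addresses of all matching images on the current page
response = urllib.request.urlopen('http://pic.hao123.com/meinv')
html = response.read()
html = html.decode("utf-8")

# Capture the src value of each <img> tag that has exactly these attributes
par = r'<img src="(.*?)" alt="" style="width: 180px;"/>'
img_urls = re.findall(par, html)

for each in img_urls:
    print(each)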
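
The pattern above only matches tags whose attributes appear in exactly that order and with exactly that style value. A looser pattern, sketched below on made-up sample markup, accepts any attributes before `src`. The names `IMG_SRC` and `find_image_urls` are illustrative, and a regular expression remains a rough tool for HTML.

```python
import re

# Match <img ...> tags and capture a double-quoted src value,
# regardless of which attributes precede it.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc="([^"]+)"', re.IGNORECASE)


def find_image_urls(html):
    """Return every captured src value in document order."""
    return IMG_SRC.findall(html)


# Made-up sample markup for demonstration only.
sample = (
    '<div><img src="http://example.com/a.jpg" alt="" style="width: 180px;"/>'
    '<img alt="b" src="http://example.com/b.png"></div>'
)
found = find_image_urls(sample)
```

Here `found` holds both sample URLs, including the one whose `alt` attribute comes before `src`.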

3. Filtering body text

import urllib.request

html = urllib.request.urlopen("https://www.douban.com/").read().decode("utf-8")

# Printing the whole HTML is too much, so save it to a file and inspect it there
of = open("E:/编程/python/网络爬图1/db_index.html", "w", encoding="utf-8")
of.write(html)
of.close()
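
The code in this section saves the page but does not yet separate text from markup. One way to do that with only the standard library is an `html.parser.HTMLParser` subclass that keeps text nodes and skips script and style contents. This is a sketch on made-up markup; `TextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gather visible text, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the text nodes of html, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Made-up sample page for demonstration only.
sample_page = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><h1>Hello</h1><p>World</p><script>var x = 1;</script></body></html>"
)
body_text = extract_text(sample_page)
```

For the sample page, `body_text` contains the title, the heading, and the paragraph text, with the CSS and JavaScript left out.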

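4. Writing to a file

# Open the file for writing ("w" overwrites any existing content)
f = open("E:/编程/python/网络爬图1/file.txt", "w")
constant = "i love you"
f.write(constant)
f.close()

# The file can be written to any drive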

5. Reading a file

# Open the file for reading and print its content
f = open("E:/编程/python/网络爬图1/file.txt", "r")
constant = f.read()
print(constant)
f.close()
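
The write and read steps can also be done with `with` blocks, which close the file even if an exception occurs, and with an explicit encoding so the result does not depend on the platform default. The sketch below uses a temporary directory instead of the `E:/` path, and the helper names `write_text` and `read_text` are illustrative.

```python
import os
import tempfile


def write_text(path, text):
    """Write text to path as UTF-8, overwriting any existing file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path):
    """Read the whole file at path as UTF-8 text."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Round trip in a temporary directory so the demo leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "file.txt")
    write_text(demo_path, "i love you")
    round_trip = read_text(demo_path)
```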

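6. Putting it together: saving HTML to a file

import urllib.request

response = urllib.request.urlopen('http://www.fishc.com')
html = response.read().decode("utf-8")
print(html)

# Save the HTML to a local file; the explicit encoding avoids errors
# when the platform default cannot represent some characters
f = open("E:/编程/python/网络爬图1/file.html", "w", encoding="utf-8")
f.write(html)
f.close()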

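7. Creating a folder

import os

path = 'D'
# exist_ok=True avoids an error when the folder already exists
os.makedirs(path, exist_ok=True)

# Confirm that the folder was created
print(os.path.exists(path))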
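
`os.makedirs` also creates missing parent folders, and with `exist_ok=True` it can be called repeatedly without raising. A minimal sketch, using a temporary directory and a hypothetical helper named `ensure_folder`:

```python
import os
import tempfile


def ensure_folder(path):
    """Create path (and parents) if missing; return True if it now exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "images", "cats")
    first = ensure_folder(target)   # Creates both "images" and "cats"
    second = ensure_folder(target)  # Calling again does not raise
```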

8. Importing an Excel spreadsheet

import pandas as pd

# Read the spreadsheet into a DataFrame
j = pd.read_excel("E:/wps/账单/4月份账单.xlsx")

print(j)
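
Going the other way, from data in Python to an Excel file, can be sketched with `DataFrame.to_excel`. The records below are made up and stand in for scraped data, `save_to_excel` is a hypothetical helper, and writing `.xlsx` files assumes an Excel engine such as openpyxl is installed.

```python
import pandas as pd

# Made-up records standing in for scraped data.
records = [
    {"name": "1.jpg", "width": 100, "height": 100},
    {"name": "2.jpg", "width": 200, "height": 200},
]
frame = pd.DataFrame(records)


def save_to_excel(df, path):
    """Write the DataFrame to an .xlsx file without the index column."""
    # Requires an Excel writer engine such as openpyxl to be installed.
    df.to_excel(path, index=False)
```

A call such as `save_to_excel(frame, "images.xlsx")` would then produce a sheet with one row per record.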

9. Plotting data

import matplotlib.pyplot as pyl

x = [1, 2, 3, 4, 5]
y = [8, 6, 4, 3, 1]

pyl.plot(x, y)       # Draw the data as a line
pyl.plot(x, y, 'o')  # Mark each point
pyl.show()           # Display the figure

9+. Drawing a histogram

import matplotlib.pyplot as pyl

data = [8, 6, 9, 413, 49, 45, 41, 6]
pyl.hist(data)  # Group the values into bins and draw the counts
pyl.show()
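
With this data the single large value 413 stretches the axis, so most values land in the first bin. The sketch below uses `numpy.histogram` to show the counts behind the default 10 equal-width bins, and then the counts when the range is limited to 0 through 50. The choice of 5 bins and that range is an illustrative assumption.

```python
import numpy as npy

data = [8, 6, 9, 413, 49, 45, 41, 6]

# Counts for 10 equal-width bins spanning min(data) to max(data).
counts, edges = npy.histogram(data, bins=10)

# Counts for 5 bins over 0..50 only; the value 413 falls outside and is ignored.
zoomed, zoomed_edges = npy.histogram(data, bins=5, range=(0, 50))
```

`pyl.hist` accepts the same `bins` and `range` arguments, so `pyl.hist(data, bins=5, range=(0, 50))` would draw the zoomed version.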
