Learning Python Web Crawling (1): A Simple Crawler Exercise


梦想周游全国的孩子


Article link: https://blog.csdn.net/qq_37163479/article/details/79185221


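Crawlers

Put simply, a crawler fetches the information you see on a web page, which is where the name comes from. In principle, any page a browser can load can also be fetched by a program. For this first installment we will try a simple crawler, so that you get an initial feel for how crawling works and what the source code looks like.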

Platform and prerequisites

My setup is:
macOS
PyCharm 2016
Python 3.6

You will also need a grasp of basic Python syntax, plus some familiarity with HTML tags and CSS selectors.
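To make the HTML prerequisite concrete, here is a minimal sketch of pulling one piece of information out of markup. It uses only the standard library's `html.parser` and matches tags by name rather than by CSS selector. `TitleParser`, `extract_title`, and the sample markup are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self.in_title:
            self.title += data


def extract_title(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


# Made-up sample markup, standing in for a downloaded page.
sample = (
    "<html><head><title>Example page</title></head>"
    "<body><p class='intro'>Hello</p></body></html>"
)
print(extract_title(sample))
```

Knowing which tag holds the data you want is most of the work; the fetching step is comparatively short.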

A basic crawler in practice

#!/usr/local/bin/python3
# -*- coding: UTF-8 -*-
__author__ = 'Gary'

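# A simple Python crawler (Python 3.6, macOS)

import urllib.request

# Target URL
url = "http://www.baidu.com"

# Build the request
request = urllib.request.Request(url)

# Send the request and get the response
response = urllib.request.urlopen(request)

# Read the response body as raw bytes
data = response.read()

# Decode the bytes into text as UTF-8
data = data.decode('utf-8')

# Print the result
print(data)

# Print assorted information about the fetched page
# print(type(response))
# print(response.geturl())   # the URL that was actually retrieved
# print(response.info())     # the response headers
# print(response.getcode())  # the HTTP status code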


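# Save the decoded page to a local file
# (the original wrote str(type(response)) here, not the page content)
with open('./baidu.txt', 'w', encoding='utf-8') as fp:
    fp.write(data)
print('Saved locally')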
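The script above assumes that the request succeeds and that the page is UTF-8. As a hedged sketch of how those two assumptions could be relaxed, the helper below sends a User-Agent header, reads the charset from the response's Content-Type header when one is declared, and returns `None` on failure. `build_request`, `fetch_text`, the User-Agent string, and the 10-second timeout are all assumptions made for this illustration, not part of the original code.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0 (example crawler)"):
    """Create a Request carrying a User-Agent header (the value is an assumption)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url, default_charset="utf-8", timeout=10):
    """Return the decoded body of url, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            raw = response.read()
            # Prefer the charset declared in the Content-Type header, if any.
            charset = response.headers.get_content_charset() or default_charset
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print("Request failed:", exc)
        return None
    # errors="replace" substitutes a marker for bytes invalid in the chosen charset.
    return raw.decode(charset, errors="replace")
```

With this in place, `fetch_text("http://www.baidu.com")` would stand in for the request, read, and decode steps of the script, and the caller only needs to check for `None` before printing or saving the text.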