Implementing a Web Crawler in Python: My First Crawler, Scraping Code Snippets from CodeSnippet

Latest recommended article published 2021-02-12 09:22:24

weixin_39962675


Tags: implementing a web crawler in Python

CodeSnippet: scraping code snippets

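Goal

Grab the code snippet displayed on the CodeSnippet home page.

Analysis

With its markup stripped, the home page reads:

Code
- Post a snippet
- Snippet list
If a single thread is individualistic heroism, then multithreading is collectivism: you are no longer a lone ranger but a conductor.
15106 code snippets in total
京ICP备13038605号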

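The content we want sits inside <li class="con-code bbor">, so we use BeautifulSoup's find() method to locate that tag and then call get_text() to extract its text. Note that find() returns only the first matching element, or None if nothing matches.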
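To make the find() + get_text() step concrete without a third-party package, here is a Python 3 sketch built on the standard library's html.parser instead of BeautifulSoup (a swapped-in technique, not the author's method). SnippetExtractor and extract_snippet are hypothetical names. The sketch assumes every <li> has a closing </li> tag and matches an <li> whose class attribute contains both con-code and bbor, in any order. Like find(), it keeps only the first match and returns None when there is none.

```python
from html.parser import HTMLParser


class SnippetExtractor(HTMLParser):
    """Collect the text of the first <li> whose class list contains all wanted classes."""

    def __init__(self, wanted_classes=("con-code", "bbor")):
        super().__init__()  # convert_charrefs=True by default, so &quot; etc. are decoded
        self.wanted = set(wanted_classes)
        self.depth = 0      # > 0 while we are inside the target <li>
        self.done = False   # True once the first matching <li> has been closed
        self.parts = []     # text fragments found inside the target <li>

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "li":
                self.depth += 1  # a nested <li> inside the target
            return
        if tag == "li":
            classes = set((dict(attrs).get("class") or "").split())
            if self.wanted <= classes:
                self.depth = 1   # entered the target <li>

    def handle_endtag(self, tag):
        if self.depth and tag == "li":
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # ignore any later matches, like find()

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_snippet(html):
    """Return the text of the first matching <li>, or None if there is none."""
    parser = SnippetExtractor()
    parser.feed(html)
    parser.close()
    if parser.depth or parser.done:
        return "".join(parser.parts)
    return None
```

For example, extract_snippet('<li class="con-code bbor">print(1)</li>') returns "print(1)". For real pages, BeautifulSoup is far more forgiving of broken markup, which is why the post uses it.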

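Preparation

Import the two modules the crawler needs. The code in this post targets Python 2.7; BeautifulSoup is installed with pip install beautifulsoup4.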

```python
# Python 2.7 (in Python 3, urlopen lives in urllib.request)
from urllib2 import urlopen
from bs4 import BeautifulSoup
```
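In Python 3, urllib2 no longer exists and urlopen moved to urllib.request. A minimal Python 3 sketch of the download step follows; fetch_html is a hypothetical name, and the 10-second timeout and the UTF-8 fallback (used when the server declares no charset) are assumptions, not part of the original post.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download url and return the body decoded to str (Python 3 sketch)."""
    with urlopen(url, timeout=timeout) as resp:
        body = resp.read()
        # Prefer the charset from the Content-Type header; fall back to UTF-8 (assumption)
        charset = resp.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going on a few bad bytes instead of raising
    return body.decode(charset, errors="replace")
```

The returned string can be passed to BeautifulSoup(html, 'html.parser') in place of the response object used in the original code.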

Writing the scraping code

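```python
# Grab the code snippet from http://www.codesnippet.cn/index.html
def GrapIndex():
    html = "http://www.codesnippet.cn/index.html"
    bsObj = BeautifulSoup(urlopen(html), 'html.parser')
    # find() returns the first matching <li>, or None if the page has none
    snippet = bsObj.find("li", {"class": "con-code bbor"})
    return snippet.get_text() if snippet is not None else u""
```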

Once we have the data, the usual next step is to store it in a database. Since what we are scraping here is simple, writing it to a file is enough.

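```python
def SaveResult():
    # "a" opens code.txt in append mode, so each call adds to the end
    with open("code.txt", "a") as codeFile:
        # Write the whole snippet at once instead of looping over it
        # character by character
        codeFile.write(GrapIndex())
```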
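If you later take the database route instead of a plain file, Python's built-in sqlite3 module is enough for a first step. This is a minimal Python 3 sketch; the table name snippets, its columns, and the function save_to_db are hypothetical.

```python
import sqlite3


def save_to_db(conn, text):
    """Insert one scraped snippet into a hypothetical 'snippets' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snippets ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
    )
    # The ? placeholder lets sqlite3 handle quoting; str values are stored as Unicode text
    conn.execute("INSERT INTO snippets (body) VALUES (?)", (text,))
    conn.commit()
```

Usage: conn = sqlite3.connect("snippets.db"), then call save_to_db(conn, snippet_text) once per scraped snippet.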

Writing the file, however, failed with the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u751f' in position 0: ordinal not in range(128)

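Analysis

In Python 2.7 the default encoding (sys.getdefaultencoding()) is ASCII. GrapIndex() returns a unicode string, and writing a unicode string to a file opened with the built-in open() makes Python encode it implicitly with that default ASCII codec. The first character outside the ASCII range (code point 128 or above), here u'\u751f' (生), raises the exception.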
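The same failure can be reproduced in Python 3 by calling the ASCII codec explicitly; Python 3 never converts between str and bytes implicitly, so this is only an illustration of what Python 2 did behind the scenes. ascii_encode_error is a hypothetical helper.

```python
def ascii_encode_error(text):
    """Return the message the ASCII codec raises for text, or None if it encodes fine."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# 生 is U+751F, far above 127, so ASCII cannot represent it
print(hex(ord("\u751f")))           # 0x751f
print(ascii_encode_error("\u751f"))  # 'ascii' codec can't encode character '\u751f' in position 0: ordinal not in range(128)
# UTF-8 can encode it, using three bytes
print("\u751f".encode("utf-8"))      # b'\xe7\x94\x9f'
```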

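Solution

A common Python 2 workaround is to switch the interpreter's default encoding to UTF-8 at the top of the script: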

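```python
import sys

# Python 2 only. site.py deletes sys.setdefaultencoding at startup,
# so reload(sys) is needed to get it back before calling it.
reload(sys)
sys.setdefaultencoding('utf-8')
```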
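A cleaner alternative to changing the interpreter-wide default is to give the file itself an encoding. The sketch below is Python 3; in Python 2.7 the same call is available as io.open(path, "a", encoding="utf-8"), which accepts unicode strings such as the one GrapIndex() returns. save_result is a hypothetical name mirroring the post's SaveResult().

```python
def save_result(text, path="code.txt"):
    """Append text to path, encoding it explicitly as UTF-8."""
    # An explicit encoding avoids Python 2's ASCII default and
    # Python 3's platform-dependent default encoding for open()
    with open(path, "a", encoding="utf-8") as code_file:
        code_file.write(text)
```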

Full code

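```python
# Python 2.7
from urllib2 import urlopen
from bs4 import BeautifulSoup
import sys

# Workaround for the UnicodeEncodeError: make UTF-8 the default encoding
reload(sys)
sys.setdefaultencoding('utf-8')


def GrapIndex():
    html = "http://www.codesnippet.cn/index.html"
    bsObj = BeautifulSoup(urlopen(html), 'html.parser')
    # find() returns the first matching <li>, or None if the page has none
    snippet = bsObj.find("li", {"class": "con-code bbor"})
    return snippet.get_text() if snippet is not None else u""


def SaveResult():
    # "a" opens code.txt in append mode, so each call adds to the end
    with open("code.txt", "a") as codeFile:
        codeFile.write(GrapIndex())


if __name__ == '__main__':
    # Request the home page 9 times, appending the snippet from each response
    for i in range(9):
        SaveResult()
```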
