可爱的Python: End-of-Chapter Exercises (3)

辉蛋儿

Article link: https://blog.csdn.net/chen861201/article/details/7714477

Exercise 1. Detect the character encoding of a blog page:

#!/usr/bin/python
#coding=utf-8
#filename:codingTest.py
import sys
import urllib2
import chardet

def blog_detect(blogurl):
    # Download the page; report the failure and return 0 if that fails.
    try:
        fp = urllib2.urlopen(blogurl)
    except Exception as e:
        print e
        print 'download exception %s' % blogurl
        return 0
    blog = fp.read()
    fp.close()
    # chardet.detect() returns a dict; "encoding" holds its best guess.
    codedetect = chardet.detect(blog)["encoding"]
    print '%s------>%s' % (blogurl, codedetect)
    return 1

if __name__ == '__main__':
    if len(sys.argv) == 1:
        print 'usage:\n\t python codingTest.py http://'
    else:
        blog_detect(sys.argv[1])

Test results:

root@zhou:/home/zhouqian/python# python codingTest.py http://www.baidu.com
http://www.baidu.com------>GB2312
root@zhou:/home/zhouqian/python# python codingTest.py http://www.google.cn
http://www.google.cn------>utf-8

Notes on what I ran into:

Opening a URL is done with the urllib2 module. This was my first time using it, so I took the chance to learn it.
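For readers on Python 3, where urllib2 no longer exists, the same download step can be sketched with urllib.request. The helper name fetch_bytes and the timeout value are illustrative assumptions and do not come from the original script.

```python
import urllib.request


def fetch_bytes(url, timeout=10):
    """Return the raw response body as bytes (hypothetical helper)."""
    # urlopen returns a file-like response; the with-block closes it for us.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # A data: URL keeps the demo offline; a real run would pass an http:// URL.
    print(fetch_bytes("data:text/plain;charset=utf-8,hello"))
```

The function returns bytes rather than text, which is exactly what an encoding detector needs as input.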

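The other new module is chardet, which detects character encodings. Calling chardet.detect(blog)["encoding"] returns the name of the detected encoding, as the test runs above show.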
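To get a feel for what "detecting an encoding" means, here is a much cruder stand-in written for Python 3: it simply tries a few candidate encodings in order and returns the first one that decodes cleanly. This is not how chardet works internally (chardet uses statistical models and also reports a confidence value); the function name guess_encoding and the candidate list are assumptions made for this sketch.

```python
def guess_encoding(data, candidates=("ascii", "utf-8", "gb2312")):
    """Return the first candidate that decodes data without error, else None."""
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    print(guess_encoding("编码".encode("gb2312")))
```

The order of the candidates matters: pure ASCII bytes are also valid UTF-8 and GB2312, so the most restrictive encoding is tried first.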

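Then there is the matter of text encodings and how to convert between them. Python 2 assumes ASCII by default and its plain str type holds raw bytes; Python 3 assumes UTF-8 for source files and its str type holds Unicode text.

In rough terms, bytes read from a file or the network are first decoded into Unicode for internal handling, and on output the text is encoded back into bytes in whichever encoding is wanted.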
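The decode-then-encode round trip can be sketched in a few lines of Python 3, where bytes and str are separate types. The function name to_utf8 and the sample string are assumptions for illustration only.

```python
def to_utf8(raw, source_encoding):
    """Decode bytes from source_encoding, then re-encode them as UTF-8."""
    text = raw.decode(source_encoding)  # bytes -> str (Unicode text)
    return text.encode("utf-8")         # str -> bytes in the target encoding


if __name__ == "__main__":
    gb_bytes = "中文".encode("gb2312")   # stand-in for a downloaded GB2312 page
    utf8_bytes = to_utf8(gb_bytes, "gb2312")
    print(utf8_bytes.decode("utf-8"))
```

In Python 2 the same two steps are written unicode(data, encoding) followed by .encode('utf-8').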

Exercise 2. Convert a page that is not UTF-8 into UTF-8:

#!/usr/bin/python
#coding=utf-8
#filename:codingConvert.py
import sys
import urllib2
import chardet

def blog_detect(blogurl):
    # Download the page; report the failure and return 0 if that fails.
    try:
        fp = urllib2.urlopen(blogurl)
    except Exception as e:
        print e
        print 'download exception %s' % blogurl
        return 0
    blog = fp.read()
    fp.close()
    codedetect = chardet.detect(blog)["encoding"]  # guess the page encoding
    print '%s------>%s' % (blogurl, codedetect)
    print '########进行转换#####'  # banner: "converting"
    if codedetect != 'utf-8':
        try:
            # Core step: decode the bytes to unicode, then encode as UTF-8.
            blog = unicode(blog, codedetect)
            blog = blog.encode('utf-8')
        except (UnicodeError, LookupError, TypeError):
            # Bad bytes, an unknown codec name, or no detected encoding (None).
            print 'bad unicode encode try'
            print 'failed convert'
            return 0
    # Name the file after the URL; [7:] strips a leading "http://".
    filename = '%s_utf-8' % blogurl[7:]
    filename = filename.replace('/', '_')
    with open(filename, 'wb') as out:
        out.write(blog)
    print 'save to file %s' % filename
    return 1

if __name__ == '__main__':
    if len(sys.argv) == 1:
        print 'usage:\n\t python codingConvert.py http://'
    else:
        blog_detect(sys.argv[1])

Test results:

zhouqian@zhou:~/python$ python codingConvert.py http://www.baidu.com
http://www.baidu.com------>GB2312
########进行转换#####
save to file www.baidu.com_utf-8

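This produces the file named above. Opened in an editor it shows readable Chinese characters, whereas the page was garbled before the conversion.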
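The script builds the file name with blogurl[7:], which only works when the URL starts with "http://". A slightly more forgiving variant for Python 3 could lean on urllib.parse instead. The function name utf8_filename is hypothetical, and unlike the original it also drops a trailing slash before replacing the remaining ones.

```python
from urllib.parse import urlsplit


def utf8_filename(url):
    """Build a flat file name such as 'www.baidu.com_utf-8' from a URL."""
    parts = urlsplit(url)
    # netloc + path drops the scheme whether it is http:// or https://
    name = (parts.netloc + parts.path).rstrip("/")
    return name.replace("/", "_") + "_utf-8"


if __name__ == "__main__":
    print(utf8_filename("http://www.baidu.com"))
```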

The issues I ran into were much the same as in exercise 1; this exercise just combines detection with conversion.
