Notes on the Differences Between Python 3.x and Python 2.x

Article link: https://blog.csdn.net/luzuiwutong/article/details/42127547

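Python 3.x changes a number of things relative to Python 2.x. As a result, when you work through the classic tutorials (most of which were written for Python 2) with a current Python 3 interpreter, the examples often fail with errors. The new version does improve on the old one, though, and programming in it is clearly where things are heading, so the notes below summarize the common differences.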

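1. The open(filename, 'w') function

In Python 2: open(filename, 'w').write(content)

In Python 3 the same line fails when content is a bytes object, for example the result of urllib.request.urlopen(url).read() for a downloaded HTML file. A file opened in text mode accepts only str, so write() raises a TypeError and the file does not end up with readable content.

Change it to: open(filename, 'wb').write(content)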
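The difference between the two modes can be sketched as follows. This is only an illustration: the payload and the file names are made up, and the bytes object stands in for what urlopen(url).read() would return.

```python
import os
import tempfile

# Hypothetical payload standing in for the bytes returned by urlopen(url).read().
content = "<html><body>你好</body></html>".encode("utf-8")

workdir = tempfile.mkdtemp()
binary_path = os.path.join(workdir, "page_binary.html")
text_path = os.path.join(workdir, "page_text.html")
bad_path = os.path.join(workdir, "page_bad.html")

# Option 1: keep the data as bytes and open the file in binary mode.
with open(binary_path, "wb") as f:
    f.write(content)

# Option 2: decode to str first, then write in text mode with an explicit encoding.
with open(text_path, "w", encoding="utf-8") as f:
    f.write(content.decode("utf-8"))

# Passing bytes to a text-mode file raises TypeError in Python 3.
try:
    with open(bad_path, "w", encoding="utf-8") as f:
        f.write(content)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc
```

Both options produce the same bytes on disk here. Which one fits depends on whether the rest of the program wants to work with bytes or with str.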

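More on file modes:

"r": open for reading only; raises an exception if the file does not exist.

"w": open for writing only; creates the file if it does not exist, and truncates it first if it does.

"rb": open in binary mode for reading only; raises an exception if the file does not exist.

"wb": open in binary mode for writing only; creates the file if it does not exist, and truncates it first if it does.

"rt": open in text mode for reading only; raises an exception if the file does not exist. Text is the default, so this is the same as "r".

"wt": open in text mode for writing only; creates the file if it does not exist, and truncates it first if it does. This is the same as "w".

"rb+": open an existing file in binary mode for both reading and writing; raises an exception if the file does not exist.

"wb+": open in binary mode for both reading and writing; creates the file if it does not exist, and truncates it first if it does.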
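A few of these modes can be checked with a short experiment. The sketch below uses a temporary directory and a made-up file name, and only covers "r", "w" and "rb+".

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

# "r" on a missing file raises FileNotFoundError.
try:
    open(path, "r")
    missing_raises = False
except FileNotFoundError:
    missing_raises = True

# "w" creates the file; opening it with "w" again truncates the old content.
with open(path, "w", encoding="utf-8") as f:
    f.write("first version, fairly long")
with open(path, "w", encoding="utf-8") as f:
    f.write("second")

with open(path, "r", encoding="utf-8") as f:
    after_rewrite = f.read()

# "rb+" keeps the existing bytes and allows both reading and writing.
with open(path, "rb+") as f:
    head = f.read(3)   # read the first three bytes
    f.seek(0)          # go back to the start
    f.write(b"SEC")    # overwrite in place, without truncating

with open(path, "rb") as f:
    final_bytes = f.read()
```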

2. Python 2 has xrange; in Python 3, replace xrange with range.

For example:

import urllib.request

for i in range(0, 30):
    url = 'http://cn.bing.com/HPImageArchive.aspx?format=js&idx=' + str(i) + '&n=1&nc=1361089515117&FORM=HYLH1'
    html = urllib.request.urlopen(url).read()
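As a small illustration of what changed, the sketch below shows that range in Python 3 is a lazy sequence object (much like the old xrange) rather than a list. The variable names are only for demonstration.

```python
import builtins

r = range(0, 30)

is_list = isinstance(r, list)        # range() no longer builds a list
length = len(r)                      # the length is known without creating the items
first_three = list(r[:3])            # slicing a range gives another range
as_list = list(range(5))             # wrap in list() when a real list is needed
has_xrange = hasattr(builtins, "xrange")  # the xrange name is gone in Python 3
```

Code that relied on Python 2's range returning a list (for example, appending to it) needs the explicit list(...) call.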

3. Importing BeautifulSoup.

The Python 2-era import of BeautifulSoup 3 (from BeautifulSoup import BeautifulSoup) should be replaced in Python 3 with the bs4 package.

For example:

from bs4 import BeautifulSoup

doc = ['<html><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']
# Name the parser explicitly; bs4 warns when none is given.
soup = BeautifulSoup(''.join(doc), 'html.parser')

print(soup.prettify())

4. In Python 3, print is a function: write print(), with everything to be output inside the parentheses.
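Because print is now an ordinary function, it takes keyword arguments. The following sketch captures the output in an in-memory buffer (an assumption made here only so the result can be inspected) and shows the sep, end and file parameters.

```python
import io

buffer = io.StringIO()  # in-memory stream used to capture the output

# sep controls what goes between the arguments.
print("a", "b", "c", sep="-", file=buffer)
# end replaces the default trailing newline.
print("no newline", end="", file=buffer)

captured = buffer.getvalue()
print(captured)
```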

5. In Python 3, decode bytes to str explicitly: text = text.decode('utf-8')

Code example: fetching Bing's background images

import os
import re
import sys
import urllib.request


def get_bing_backphoto():
    # Create the output directory on first run.
    if not os.path.exists('photos'):
        os.mkdir('photos')
    for i in range(0, 30):
        url = 'http://cn.bing.com/HPImageArchive.aspx?format=js&idx=' + str(i) + '&n=1&nc=1361089515117&FORM=HYLH1'
        html = urllib.request.urlopen(url).read()
        # read() returns bytes in Python 3, so compare against a bytes literal.
        if html == b'null':
            print('open & read bing error!')
            sys.exit(-1)
        html = html.decode('utf-8')
        reg = re.compile('"url":"(.*?)","urlbase"', re.S)
        text = re.findall(reg, html)
        # e.g. http://s.cn.bing.net/az/hprichbg/rb/LongJi_ZH-CN8658435963_1366x768.jpg
        for imgurl in text:
            # Keep only the part after the last '/' as the file name.
            right = imgurl.rindex('/')
            name = imgurl.replace(imgurl[:right+1], '')
            savepath = 'photos/' + name
            urllib.request.urlretrieve(imgurl, savepath)
            print(name + ' save success!')


get_bing_backphoto()
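The decode step in the script above can be looked at in isolation. In this sketch the sample bytes are invented and simply stand in for a response body. It shows the round trip between bytes and str, plus one way to tolerate bytes that are not valid UTF-8.

```python
# Invented sample standing in for the bytes returned by urlopen(url).read().
raw = "必应 Bing".encode("utf-8")

text = raw.decode("utf-8")         # bytes -> str
round_trip = text.encode("utf-8")  # str -> bytes

# Bytes that are not valid UTF-8 raise UnicodeDecodeError by default;
# errors="replace" substitutes U+FFFD instead of failing.
broken = b"abc\xff"
lenient = broken.decode("utf-8", errors="replace")
```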

6. Python 2's urllib and urllib2 are merged into the urllib package in Python 3.

To use it: import urllib.request
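As a rough guide to where things ended up: the request-related functions live in urllib.request, URL handling in urllib.parse, and the exceptions in urllib.error. The sketch below builds a request without sending it. The query parameters, the User-Agent value and the fetch helper are illustrative assumptions, not part of the original script.

```python
import urllib.error
import urllib.parse
import urllib.request

base = "http://cn.bing.com/HPImageArchive.aspx"
params = {"format": "js", "idx": 0, "n": 1}  # illustrative parameters

# urllib.parse.urlencode replaces Python 2's urllib.urlencode.
query = urllib.parse.urlencode(params)
url = base + "?" + query

# urllib.request.Request replaces Python 2's urllib2.Request.
request = urllib.request.Request(url, headers={"User-Agent": "example-client"})

# urllib.parse.urlparse replaces Python 2's urlparse.urlparse.
parts = urllib.parse.urlparse(url)


def fetch(req, timeout=10):
    """Return the response body as bytes, or None if urllib raises URLError
    (HTTPError is a subclass of URLError)."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.URLError:
        return None
```

Calling fetch(request) would perform the actual network request; it is left uncalled here so the example runs offline.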

7. Python 2's raw_input() becomes input() in Python 3.
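Python 3's input() always returns a str, as raw_input() did, so numeric input has to be converted explicitly. In the sketch below, ask_count and its prompt_fn parameter are hypothetical names, introduced only so that the example can run without a keyboard.

```python
def ask_count(prompt_fn=input):
    """Read one line and convert it to int; prompt_fn is injectable for testing."""
    raw = prompt_fn("How many images? ")  # always a str in Python 3
    return int(raw.strip())


# Simulated user input; in a real script, call ask_count() with no arguments.
count = ask_count(lambda prompt: " 30 ")
```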

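8. When scraping web pages, Python 2-style code run under Python 3 can fail with the following error:

TypeError: can't use a string pattern on a bytes-like object

The pattern can be written as a bytes literal with the b'' prefix so that its type matches the data, for example:

url = "http://www.baidu.com"

html = urllib.request.urlopen(url).read()

matches = re.findall(b'/song/dt', html, re.M)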
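There are two ways around this error: use a bytes pattern on the bytes, or decode the bytes and use a str pattern. The sketch below shows both on an invented HTML fragment that stands in for the downloaded page.

```python
import re

# Invented fragment standing in for urlopen(url).read().
html = b'<a href="/song/dt?id=1">one</a>\n<a href="/song/dt?id=2">two</a>'

# Option 1: bytes pattern on bytes data; the matches are bytes.
byte_matches = re.findall(rb'/song/dt\?id=\d+', html)

# Option 2: decode first, then use an ordinary str pattern; the matches are str.
text_matches = re.findall(r'/song/dt\?id=\d+', html.decode('utf-8'))

# Mixing a str pattern with bytes data raises the TypeError described above.
try:
    re.findall(r'/song/dt', html)
    mixed_error = None
except TypeError as exc:
    mixed_error = str(exc)
```

Decoding first is usually more convenient when the matches are later combined with other strings, for instance to build file names or URLs.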