Examples of using urllib in Python 3.x

zsuguangh

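In Python 2.x you can simply use import urllib, but in Python 3.x the equivalent functionality lives in urllib.request, so you use import urllib.request instead. Here are some simple examples: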

===============================================================================

# get code of given URL as html text string
# Python3 uses urllib.request.urlopen()
# instead of Python2's urllib.urlopen() or urllib2.urlopen()

import urllib.request

fp = urllib.request.urlopen("http://www.python.org")

mybytes = fp.read()
# note that in Python 3 read() returns the HTML as bytes, not as a str;
# decode the bytes to get a string
mystr = mybytes.decode("utf8")

fp.close()

print(mystr)
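
The example above hardcodes "utf8" and closes the response by hand. As a minimal sketch of one alternative, the version below reads the charset from the Content-Type response header and uses a with statement so the response is closed automatically. The helper names decode_body and fetch_text are hypothetical, and the UTF-8 fallback is an assumption, not something the server guarantees.

```python
import urllib.request


def decode_body(raw, charset=None, default="utf-8"):
    """Decode raw bytes with the declared charset, falling back to default."""
    try:
        return raw.decode(charset or default)
    except (LookupError, UnicodeDecodeError):
        # unknown codec name, or bytes that do not match the declared charset
        return raw.decode(default, errors="replace")


def fetch_text(url, default="utf-8"):
    """Fetch url and return its body as a str (hypothetical helper)."""
    # the with statement closes the response even if read() raises
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # charset parameter of the Content-Type header, or None if absent
        charset = response.headers.get_content_charset()
    return decode_body(raw, charset, default)
```

With this in place, fetch_text("http://www.python.org") would return the page as a str, provided the declared charset (or the assumed default) matches the bytes actually sent.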

================================================================================

# get the code of a given URL as html text string
# Python3 uses urllib.request.urlopen()
# get the encoding used first
# tested with Python 3.1 with the Editra IDE

import urllib.request

def extract(text, sub1, sub2):
    """
    extract a substring from text between first
    occurrences of substrings sub1 and sub2;
    return an empty string if sub1 is not found
    """
    if sub1 not in text:
        return ""
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]

fp = urllib.request.urlopen("http://www.python.org")

mybytes = fp.read()

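# decode as ASCII (dropping other bytes) for the search;
# str(mybytes) would give the repr "b'...'" rather than the text
encoding = extract(mybytes.decode("ascii", "ignore").lower(), 'charset=', '"')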
print('-'*50)
print( "Encoding type = %s" % encoding )
print('-'*50)

if encoding:
    # note that in Python 3 read() returns the HTML as bytes, not as a str;
    # decode the bytes to get a string
    mystr = mybytes.decode(encoding)
    print(mystr)
else:
    print("Encoding type not found!")

fp.close()
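
Splitting on 'charset=' and '"' only works when the charset value is directly followed by a closing double quote, as in the older content="text/html; charset=utf-8" form. For the HTML5 form <meta charset="utf-8"> it yields an empty string. A possible alternative, sketched here with a regular expression, handles both forms. The function name sniff_charset and the choice to inspect only the first 4096 bytes are assumptions made for illustration.

```python
import re

# "charset=" followed by an optional quote and then the encoding name
_META_CHARSET = re.compile(
    rb"""charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def sniff_charset(raw, default=None):
    """Return the charset declared near the start of raw HTML bytes."""
    # only look at the beginning of the document, where meta tags live
    match = _META_CHARSET.search(raw[:4096])
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()
```

The result can be passed to mybytes.decode() in place of the value returned by extract(); when it is None, no charset declaration was found.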

==================================================================================

Reference: http://www.daniweb.com/forums/thread213221.html