Small web crawler program, part 01
01.py: working code, tested under Python 3.x
#!/usr/bin/env python
# -*- coding:gbk -*-
# import importlib
# importlib.reload(sys)
import os
import re
import urllib.request

import pymysql
from bs4 import BeautifulSoup

# Page to fetch
url1 = "http://www.doyouhike.net/dest/hongkongtamendao-camping"
fp = urllib.request.urlopen(url1)  # open the URL
s = fp.read()  # read the response body (bytes) into s
soup = BeautifulSoup(s)  # parse s with BeautifulSoup
polist = soup.findAll('span')  # find every <span> tag
print(polist[0].contents[0])  # print the first child of the first <span>
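To make the role of `findAll` and `.contents[0]` concrete without hitting the network, here is a minimal sketch that runs the same two steps on a small inline HTML string. The sample markup and the helper name `first_span_text` are made up for illustration, and `"html.parser"` (Python's built-in parser) is passed explicitly here, whereas the script above lets BeautifulSoup pick a parser on its own.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
  <span>first</span>
  <span><b>bold</b> second</span>
</body></html>
"""


def first_span_text(html):
    """Return the first child of the first <span>, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    spans = soup.find_all("span")  # find_all is the newer name for findAll
    if not spans or not spans[0].contents:
        return None
    # .contents is the list of a tag's direct children; for
    # <span>first</span> it holds a single text node.
    return spans[0].contents[0]


result = first_span_text(SAMPLE_HTML)
print(result)
```

The guard for an empty result is there because indexing `polist[0]` directly raises an IndexError when the page has no `<span>` at all.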
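Run output (two runs: the first fails because of a typo on the print line, the second is after the fix and prints 目的地, "destination", the text of the first <span>):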
D:\SparkCollection\NetWorkSpark>python 01.py
C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\bs4\__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 15 of the file 01.py. To get rid of this warning, change code that looks like this:
 BeautifulSoup(YOUR_MARKUP})
to this:
 BeautifulSoup(YOUR_MARKUP, "lxml")
  markup_type=markup_type))
Traceback (most recent call last):
  File "01.py", line 18, in <module>
    print (polist[0]/contents[0]) # print the content of the first <span> tag
NameError: name 'contents' is not defined
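The traceback comes from a one-character typo: with `/` in place of `.`, Python reads the expression as a division and has to look up a separate variable named `contents`, which does not exist. A small reproduction with a plain list (the name `items` is only a stand-in for `polist`):

```python
items = ["first"]  # stand-in for polist

try:
    # Typo: '/' instead of '.', so 'contents' is treated as a variable name
    # and looked up before any division is attempted.
    items[0] / contents[0]
except NameError as exc:
    message = str(exc)

print(message)
```

Replacing `/` with `.` turns the expression back into an attribute access on the first element, which is what the script intends.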
D:\SparkCollection\NetWorkSpark>python 01.py
C:\Users\Administrator\AppData\Local\Programs\Python\Python35\lib\site-packages\bs4\__init__.py:181: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
The code that caused this warning is on line 15 of the file 01.py. To get rid of this warning, change code that looks like this:
 BeautifulSoup(YOUR_MARKUP})
to this:
 BeautifulSoup(YOUR_MARKUP, "lxml")
  markup_type=markup_type))
目的地
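The UserWarning in both runs goes away once the parser is named in the constructor, as the message itself suggests. A minimal sketch, assuming a made-up HTML snippet in place of the real page and using the standard-library `"html.parser"` so that it runs without lxml installed; where lxml is available, `"lxml"` can be passed the same way.

```python
import warnings

from bs4 import BeautifulSoup

html = "<span>demo</span>"  # hypothetical stand-in for the fetched page

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning raised in this block
    soup = BeautifulSoup(html, "html.parser")  # parser named explicitly

# Keep only UserWarning (the category shown in the log) and its subclasses.
parser_warnings = [w for w in caught if issubclass(w.category, UserWarning)]
print(len(parser_warnings))
print(soup.span.contents[0])
```

Naming the parser also keeps the script's behavior the same across machines, since different parsers can build slightly different trees from the same broken HTML.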
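[Note]: This code follows a book whose source code is written for Python versions earlier than 3.x. I am running Python 3.5.x, so I ran into a few problems and made some changes along the way.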