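I searched online for a long time and found nothing about timing a method inside a class with timeit, so I worked it out myself by trial and error.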
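As an example, take a crawler that fetches a Baidu search results page, extracts the text of every result link, and measures how long that takes. For timing methods in general, see my other blog post.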
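By default, timeit measures wall-clock time, not CPU time. In Python 3 the default timer is time.perf_counter. For a crawler, most of the measured time is therefore spent waiting on the network. If CPU time is what you actually want, you can pass timer=time.process_time (Python 3.3+).

Below is a minimal Python 3 sketch of the difference. The function sleepy_work is a hypothetical stand-in for a network request.

```python
import time
import timeit

def sleepy_work():
    # Hypothetical stand-in for a network request: it waits but uses almost no CPU
    time.sleep(0.05)

# Default timer (time.perf_counter in Python 3): wall-clock time, waiting included
wall = timeit.timeit(sleepy_work, number=1)

# time.process_time: CPU time of this process only, sleeping is not counted
cpu = timeit.timeit(sleepy_work, timer=time.process_time, number=1)

print("wall-clock: %.3f s, CPU: %.3f s" % (wall, cpu))
```

For a crawler, the wall-clock figure is usually the one that reflects how long you actually wait.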
# -*- coding:utf-8 -*-
import io
import re
import timeit
import urllib
import urllib2
import cookielib

class BaiduSearch:
    def __init__(self, search_keyword):
        self.search_keyword = search_keyword
        self.user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0'
        self.headers = {'User-Agent': self.user_agent}
        self.cookies = cookielib.CookieJar()
        # self.tool = Tool()  # Tool is a helper class from another script and is not used here

    def getBaiduLinkTopic(self):
        basicurl = 'https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=1&rsv_idx=1&tn=baidu&wd='
        # URL-encode the UTF-8 keyword before appending it to the query string
        url = basicurl + urllib.quote(self.search_keyword)
        try:
            request = urllib2.Request(url, headers=self.headers)
            response = urllib2.urlopen(request)
            pageCode = response.read().decode('utf-8', 'ignore')
            print 'decode successfully'
        except urllib2.URLError, e:
            if hasattr(e, "code"):
                print "Failed to connect to Baidu, error code:", e.code
            if hasattr(e, "reason"):
                print "Failed to connect to Baidu, reason:", e.reason
            return  # pageCode was never assigned, so stop here
        # Capture the title in data-tools='{"title":"...","url":...}', excluding its closing quote
        pattern1 = re.compile('data-tools=\'{"title":"(.*?)","url"', re.S)
        result1 = re.findall(pattern1, pageCode)
        print 'Baidu result link titles:'
        # io.open with an encoding accepts the unicode strings that findall returns
        with io.open('LinkTopic.txt', 'a', encoding='utf-8') as f:
            for item in result1:
                print item
                f.write(item + u'\n')

    def start(self):
        self.getBaiduLinkTopic()

# Only crawl when the file is run directly, not when another script imports it
if __name__ == '__main__':
    bdtb = BaiduSearch("影音电器")
    bdtb.start()
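You can check the title regex without sending requests to Baidu by running it on a hand-written snippet. The HTML below is invented. It only imitates the data-tools attribute the crawler matches, and Baidu's real markup may differ or may have changed since.

The example also shows why the quote before ,"url" matters. Without it, the lazy group stops at the comma, so every captured title keeps its closing quote.

```python
import re

# Invented fragment that only imitates the data-tools attribute the crawler matches
sample_html = (
    '<div data-tools=\'{"title":"Sample title one","url":"http://example.com/1"}\'></div>'
    '<div data-tools=\'{"title":"Sample title two","url":"http://example.com/2"}\'></div>'
)

# No quote before ,"url": the group runs up to the comma and includes the closing quote
loose_pattern = re.compile('data-tools=\'{"title":"(.*?),"url"', re.S)
# Quote before ,"url": the group ends exactly where the title ends
fixed_pattern = re.compile('data-tools=\'{"title":"(.*?)","url"', re.S)

loose = loose_pattern.findall(sample_html)
titles = fixed_pattern.findall(sample_html)

for title in titles:
    print(title)
```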
1. First I appended this timing code to the end of the file:
print timeit.timeit('getBaiduLinkTopic()', 'from __main__ import BaiduSearch.getBaiduLinkTopic', number = 1)
Running it raised:
File "<timeit-src>", line 1
from __main__ import BaiduSearch.getBaiduLinkTopic
^
SyntaxError: invalid syntax
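Attempt 1 fails for two reasons:

- A from ... import statement can only import plain names such as modules, classes, or functions. It cannot import a dotted attribute like BaiduSearch.getBaiduLinkTopic.
- getBaiduLinkTopic is an instance method, so it needs an instance in any case.

Below is a Python 3 sketch of two forms that do work. The BaiduSearch class here is a hypothetical offline stand-in that keeps the method name but builds a list instead of downloading a page. The globals= argument requires Python 3.5+.

```python
import timeit

class BaiduSearch(object):
    """Hypothetical offline stand-in: same method name, no network access."""
    def __init__(self, search_keyword):
        self.search_keyword = search_keyword

    def getBaiduLinkTopic(self):
        return [self.search_keyword + str(i) for i in range(1000)]

# 1) String form: setup creates the instance, stmt calls the method on it.
#    globals=globals() lets both strings see the names defined in this module;
#    in a script run directly, setup may instead start with 'from __main__ import BaiduSearch'.
t1 = timeit.timeit('s.getBaiduLinkTopic()',
                   setup='s = BaiduSearch("keyword")',
                   globals=globals(), number=10)

# 2) Callable form: pass the bound method itself (not its result), no strings needed.
s = BaiduSearch("keyword")
t2 = timeit.timeit(s.getBaiduLinkTopic, number=10)

print("string form: %.6f s, callable form: %.6f s" % (t1, t2))
```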
2. Next I wrote a separate file that imports the class and times the method from there:
# -*- coding:utf-8 -*-
import timeit
import Variety
littleV = Variety.Variety("影音电器")
func1_test = littleV.BaiduSearch().getBaiduLinkTopic()
print timeit.timeit(func1_test, 'from Variety import BaiduSearch("影音电器").getBaiduLinkTopic()',number = 100)
Running it raised:
AttributeError: 'module' object has no attribute 'Variety'
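Attempt 2 has several problems:

- Variety is the module (Variety.py), and the class inside it is BaiduSearch, so Variety.Variety does not exist.
- littleV.BaiduSearch().getBaiduLinkTopic() calls the method immediately and stores its return value (None), which leaves nothing to time.
- The setup string from Variety import BaiduSearch("影音电器").getBaiduLinkTopic() is not valid Python.
- Any unguarded module-level code in Variety.py, such as bdtb.start(), also runs as soon as the file is imported.

The Python 3 sketch below writes a small stand-in Variety.py into a temporary directory so that it runs on its own. The class body is a hypothetical placeholder, not the real crawler.

```python
import importlib
import os
import sys
import tempfile
import timeit

# Write a tiny stand-in for Variety.py into a temporary directory (hypothetical class body).
module_src = '''
class BaiduSearch(object):
    def __init__(self, search_keyword):
        self.search_keyword = search_keyword

    def getBaiduLinkTopic(self):
        return [self.search_keyword] * 100

if __name__ == "__main__":
    # Runs only when Variety.py is executed directly, not when it is imported
    BaiduSearch("keyword").getBaiduLinkTopic()
'''
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "Variety.py"), "w") as f:
    f.write(module_src)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import Variety  # the module; the class lives at Variety.BaiduSearch

# String form: import the class and create the instance in setup.
t_str = timeit.timeit('s.getBaiduLinkTopic()',
                      setup='from Variety import BaiduSearch; s = BaiduSearch("keyword")',
                      number=5)

# Callable form: pass the bound method without calling it.
searcher = Variety.BaiduSearch("keyword")
t_call = timeit.timeit(searcher.getBaiduLinkTopic, number=5)

print("string form: %.6f s, callable form: %.6f s" % (t_str, t_call))
```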
3. I also tried changing the import to from Variety import *, which raised:
NameError: name 'Variety' is not defined
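Attempt 3 fails because from Variety import * copies the names defined in Variety.py (here BaiduSearch) into the current namespace. It never binds the name Variety itself, so Variety.Variety(...) raises NameError. After a star import you would write BaiduSearch("影音电器") directly.

The same behavior is shown below with the standard math module, so it runs anywhere:

```python
# "from module import *" copies the module's public names into this namespace,
# but it does not create a variable named after the module itself.
from math import *

print(sqrt(16.0))  # sqrt is available directly

try:
    math.sqrt(16.0)  # 'math' was never bound by the star import
    module_name_bound = True
except NameError:
    module_name_bound = False

print(module_name_bound)
```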
4. Finally, I added a method to the class in the original file:
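    def localtimer(self):
        # Setup imports the class; the timed statement creates a new instance and calls the method.
        # 'from __main__ import' only works when this file is run directly as a script.
        print 'execution time for getBaiduLinkTopic():', timeit.timeit('BaiduSearch("影音电器").getBaiduLinkTopic()', 'from __main__ import BaiduSearch', number=1)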
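Then call it from start() with self.localtimer(), and it works. The setup string now imports a name that really exists (the class BaiduSearch), and the timed statement creates an instance before calling getBaiduLinkTopic().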
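The localtimer() approach works, but it builds a second BaiduSearch inside the timed statement and depends on the file being run as __main__. An alternative is to pass the instance's own bound method to timeit.timeit, which accepts callables since Python 2.6. That times the existing object and needs no import string.

The Python 3 sketch below shows this. Its class is a hypothetical offline stand-in, and the calls counter exists only to show that the method runs exactly once.

```python
import timeit

class BaiduSearch(object):
    """Hypothetical offline stand-in; the real class downloads and parses a page."""
    def __init__(self, search_keyword):
        self.search_keyword = search_keyword
        self.calls = 0  # counts how often the method ran, for illustration only

    def getBaiduLinkTopic(self):
        self.calls += 1
        return [self.search_keyword] * 100

    def localtimer(self):
        # Time this instance's own method: no second instance, no setup string
        elapsed = timeit.timeit(self.getBaiduLinkTopic, number=1)
        print('execution time for getBaiduLinkTopic(): %.6f s' % elapsed)
        return elapsed

    def start(self):
        self.localtimer()

if __name__ == '__main__':
    bdtb = BaiduSearch("keyword")
    bdtb.start()
```

Because timeit calls the bound method itself, the page is fetched (here, the list is built) only once per timing run.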