python 写excel saves报错_python 读写excel

最近老大让从网站上获取数据,手动太慢,网上找了点python,用脚本操作。

1 importos2 importre3

4 importxlrd5 importrequests6 importxlwt7 from bs4 importBeautifulSoup8 from xlutils.copy importcopy9 from xlwt import *

10

11

12 defread_excel(path):13 #打开文件

14 workbook =xlrd.open_workbook(path)15 #获取所有sheet

16

17 #根据sheet索引或者名称获取sheet内容

18 sheet1 = workbook.sheet_by_index(0) #sheet索引从0开始

19

20 #sheet的名称,行数,列数

21 i =022 for sheet1_values insheet1._cell_values:23

24 str =sheet1_values[0]25 str.replace('\'','')26 print(str,i)27 response =get_responseHtml(str)28 soup =get_beautifulSoup(response)29 pattern1 = '^https://ews-aln-core.cisco.com/applmgmt/view-appl/+[0-9]*$'

30 pattern2 = '^https://ews-aln-core.cisco.com/applmgmt/view-endpoint/+[0-9]*$'

31 pattern3 = '^https://ews-aln-core.cisco.com/applmgmt/view-appl/by-name/'

32 if pattern_match(str,pattern1) orpattern_match(str,pattern3):33 priority = soup.find("table", class_="main_table_layout").find("tr", class_="centered sub_section_header").find_next("tr",34 align="center").find_all(35 "td")36 elifpattern_match(str,pattern2):37 priority = soup.find("table", class_="main_table_layout").find("tr",38 class_="centered").find_next(39 "tr",40 align="center").find_all(41 "td")42 else:43 print("no pattern")44 try:45 priorityNumble ='P' +get_last_td(priority)46

47 exceptException:48 print("没有找到"+str)49 priorityNumble = 'P' +get_last_td(priority)50 write_excel(path,i,1,priorityNumble)51 i = i + 1

52 defwrite_excel(path,row,col,value):53 oldwb =xlrd.open_workbook(path)54 wb =copy(oldwb)55 ws =wb.get_sheet(0)56 ws.write(row,col,value)57 wb.save(path)58 defget_last_td(result):59 for idx inrange(len(result)):60 returnResult =result[idx].contents[0]61 returnreturnResult62 defget_beautifulSoup(request):63 soup = BeautifulSoup(request, 'html.parser', from_encoding='utf-8', exclude_encodings='utf-8')64 returnsoup65 defget_responseHtml(url):66 headers ={67 'User-Agent': 'User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}68 response = requests.get(url, auth=(userName, passWord),headers=headers).content69 returnresponse70 def pattern_match(str,pattern,flags =0):71 pattern =re.compile(pattern)72 returnre.match(pattern,str,flags)73

74 if __name__ == '__main__':75 userName = '*';76 passWord = '*'

77 path = r'*'

78 read_excel(path)

这里面坑可是不少

1.刚开始xlsx格式文件,save后不能打开,把excel格式改为xls才正确。

2.header网上找的,这样不会被认为是网络爬虫而报错:http.client.RemoteDisconnected: Remote end closed connection without response.

3.copy的参数要为workbook而不是xls的fileName,否则报错:AttributeError: ‘str’ object has no attribute ‘datemode’.

5.刚开始想往新的文件里save,用了新的路径,发现不可行,因为在for循环中每次都是从源excel中copy,所以实际结果只插入了一行。

6.python中beautiful soup的用法,很全的文档:Beautiful Soup 4.2.0 文档

8.从没写过python,第一次写,花了半天时间,还有很多可以改进的地方。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值