My manager recently asked me to pull some data from a website. Doing it by hand was too slow, so I found some Python snippets online and put together a script.
```python
import re

import xlrd
import requests
from bs4 import BeautifulSoup
from xlutils.copy import copy


def read_excel(path):
    # Open the workbook and take the first sheet (sheet indexes start at 0)
    workbook = xlrd.open_workbook(path)
    sheet1 = workbook.sheet_by_index(0)

    pattern1 = '^https://ews-aln-core.cisco.com/applmgmt/view-appl/+[0-9]*$'
    pattern2 = '^https://ews-aln-core.cisco.com/applmgmt/view-endpoint/+[0-9]*$'
    pattern3 = '^https://ews-aln-core.cisco.com/applmgmt/view-appl/by-name/'

    for i in range(sheet1.nrows):
        # The first column holds the URL; strip stray quote characters
        url = sheet1.cell_value(i, 0).replace('\'', '')
        print(url, i)
        response = get_responseHtml(url)
        soup = get_beautifulSoup(response)
        if pattern_match(url, pattern1) or pattern_match(url, pattern3):
            priority = (soup.find("table", class_="main_table_layout")
                            .find("tr", class_="centered sub_section_header")
                            .find_next("tr", align="center")
                            .find_all("td"))
        elif pattern_match(url, pattern2):
            priority = (soup.find("table", class_="main_table_layout")
                            .find("tr", class_="centered")
                            .find_next("tr", align="center")
                            .find_all("td"))
        else:
            print("no pattern")
            continue
        try:
            priorityNumber = 'P' + get_last_td(priority)
        except Exception:
            print("not found: " + url)
            continue
        write_excel(path, i, 1, priorityNumber)


def write_excel(path, row, col, value):
    # xlutils.copy needs an xlrd workbook object, not the xls file name
    oldwb = xlrd.open_workbook(path)
    wb = copy(oldwb)
    ws = wb.get_sheet(0)
    ws.write(row, col, value)
    wb.save(path)


def get_last_td(result):
    # Text of the last <td> in the matched row
    return result[-1].contents[0]


def get_beautifulSoup(markup):
    return BeautifulSoup(markup, 'html.parser', from_encoding='utf-8')


def get_responseHtml(url):
    # A browser-like User-Agent keeps the server from closing the connection
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/56.0.2924.87 Safari/537.36'}
    return requests.get(url, auth=(userName, passWord), headers=headers).content


def pattern_match(s, pattern, flags=0):
    return re.match(re.compile(pattern), s, flags)


if __name__ == '__main__':
    userName = '*'
    passWord = '*'
    path = r'*'
    read_excel(path)
```
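The URL classification can be exercised without touching the network or the site itself. A minimal sketch of the `pattern_match` logic, using made-up IDs in the example URLs (the IDs are hypothetical, only the patterns come from the script):

```python
import re

def pattern_match(s, pattern, flags=0):
    # re.match anchors at the start of the string; patterns that also
    # end with $ must then match the whole URL.
    return re.match(re.compile(pattern), s, flags)

pattern1 = '^https://ews-aln-core.cisco.com/applmgmt/view-appl/+[0-9]*$'
pattern2 = '^https://ews-aln-core.cisco.com/applmgmt/view-endpoint/+[0-9]*$'
pattern3 = '^https://ews-aln-core.cisco.com/applmgmt/view-appl/by-name/'

# Hypothetical IDs, for illustration only
print(bool(pattern_match('https://ews-aln-core.cisco.com/applmgmt/view-appl/123', pattern1)))          # True
print(bool(pattern_match('https://ews-aln-core.cisco.com/applmgmt/view-endpoint/456', pattern2)))      # True
print(bool(pattern_match('https://ews-aln-core.cisco.com/applmgmt/view-appl/by-name/foo', pattern3)))  # True
print(bool(pattern_match('https://ews-aln-core.cisco.com/other/789', pattern1)))                       # False
```

Note that `pattern3` has no trailing `$`, so it matches any URL that merely starts with the `by-name/` prefix.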
There were quite a few pitfalls along the way:
1. At first I used an .xlsx file, and after save the file wouldn't open; changing the Excel format to .xls fixed it (xlwt only writes the old .xls format).
2. The header came from the web; with it the request is not treated as a crawler, which otherwise fails with: http.client.RemoteDisconnected: Remote end closed connection without response.
3. The argument to copy must be a workbook object, not the xls file name, otherwise: AttributeError: 'str' object has no attribute 'datemode'.
4. At first I wanted to save to a new file under a new path, but that doesn't work: every pass of the for loop copies from the source Excel again, so in the end only one row was actually written.
5. For Beautiful Soup usage in Python, a very complete reference: the Beautiful Soup 4.2.0 documentation.
6. I had never written Python before; this first attempt took half a day, and there is still plenty that could be improved.
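Pitfall 2 can be checked without hitting the real site. The script uses requests, but the same browser-style User-Agent can be attached with the standard-library urllib as well; the sketch below only builds the request object and never sends it (example.com is a placeholder URL):

```python
from urllib.request import Request

# Browser-like User-Agent from the script; Python's default client string
# is what triggers the RemoteDisconnected error on some servers.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/56.0.2924.87 Safari/537.36'
}

req = Request('https://example.com/page', headers=headers)
# urllib normalizes header names via str.capitalize(), hence 'User-agent'
print(req.get_header('User-agent'))
```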