Processing Excel with Python: working with Excel 2007 files in Python

Using Python, I need to be able to perform the following operations on an Excel 2007 workbook:

delete rows

sort a worksheet

get distinct values from a column

I am looking into openpyxl; however, it seems to have limited capabilities.

Can anyone please recommend a library that can do the above tasks?

Solution

I want to preface this by saying that this is a Windows-only solution. If you are on Windows, I would recommend win32com, which ships with the pywin32 package. This module gives Python programmatic access to any Microsoft Office application (including Excel) and exposes many of the same methods used in VBA. Typically you record a macro (or recall from memory) how to do something in VBA and then call the same functions from Python.

To start, we connect to Excel and get access to the first sheet as an example:

# First we need to import the module that lets us connect to Excel

import win32com.client

# Next we want to create a variable that represents Excel

app = win32com.client.Dispatch("Excel.Application")

# Lastly we will assume that the workbook is active and get the first sheet

wbk = app.ActiveWorkbook

sheet = wbk.Sheets(1)

At this point we have a variable named sheet that represents the Excel worksheet we will be working with. There are of course several ways to get at the sheet; this is the way I usually demo win32com with Excel because it is very intuitive.

Now assume the first sheet holds the following values; I will go through what you asked one item at a time:

A

1 "d"

2 "c"

3 "b"

4 "a"

5 "c"

Delete Rows:

Let's assume that you want to delete the first row of your active sheet.

sheet.Rows(1).Delete()

This creates:

A

1 "c"

2 "b"

3 "a"

4 "c"

Next, let's sort the cells in ascending order (although I would recommend extracting the values into Python, sorting them in a list, and sending the values back):

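rang = sheet.Range("A1","A4")

# Excel sorts by the fields in Sort.SortFields, so register column A as the key first

sheet.Sort.SortFields.Clear()

sheet.Sort.SortFields.Add(Key=rang)

sheet.Sort.SetRange(rang)

sheet.Sort.Apply()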

This creates:

A

1 "a"

2 "b"

3 "c"

4 "c"
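The parenthetical above suggests sorting in Python rather than in Excel. A minimal sketch of that round trip follows. The helper name sort_column_values is hypothetical, and the commented usage lines assume the sheet object from earlier and a column with no empty cells (an empty cell comes back as None, which Python 3 will not compare with strings).

```python
def sort_column_values(range_value):
    """Sort a single-column block as returned by Range(...).Value.

    For a multi-cell range, win32com returns a tuple of row tuples,
    e.g. (("c",), ("b",), ("a",), ("c",)). The result keeps that shape
    so it can be assigned straight back to the same range.
    """
    flat = [row[0] for row in range_value]  # unwrap each one-cell row
    return tuple((value,) for value in sorted(flat))


# Usage against a live sheet (needs Excel on Windows, so left as comments):
# rang = sheet.Range("A1", "A4")
# rang.Value = sort_column_values(rang.Value)
```

Doing the sort on the Python side keeps the COM traffic to one read and one write, and lets you use any key function that sorted() accepts.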

And now we will get the distinct values from the column. The main thing to take away here is how to extract values from cells. You can either read many cells at once with sheet.Range("A1","A4") or read them one by one with sheet.Cells(row,col). Range is orders of magnitude faster because each Cells call is a separate COM round trip, but Cells is slightly easier to debug.

# Get a list of all values using Range (each row comes back as a one-item tuple)

valLstRange = [val[0] for val in sheet.Range("A1","A4").Value]

# Get a list of all values using Cells (range(1,5) covers rows 1 to 4)

valLstCells = [sheet.Cells(row,1).Value for row in range(1,5)]

# valLstCells and valLstRange are both ["a","b","c","c"]
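The lists above still contain the duplicate "c", so one more step is needed to get distinct values. Here is a small sketch in plain Python, with no Excel involved. The name distinct_values is hypothetical, and the input list stands in for valLstRange.

```python
def distinct_values(values):
    """Return the distinct items of values, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(values))


column_values = ["a", "b", "c", "c"]  # stand-in for valLstRange
unique = distinct_values(column_values)
```

If the order does not matter, set(column_values) does the same job in one call.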

Lastly, you will want to save the workbook, which you can do with the following:

wbk.SaveAs("C:/savedWorkbook.xlsx")

And you are done!

INFO ABOUT COM

If you have worked with Excel from VBA, .NET, VBScript or any other language, many of these Excel methods will look familiar. That is because they all use the same library provided by Microsoft. This library uses COM, which is Microsoft's way of providing language-agnostic APIs to programmers. COM itself is an older technology and can be tricky to debug. If you want more information on Python and COM, I highly recommend Python Programming on Win32 by Mark Hammond. He is the guy who gets a shoutout after you install Python on Windows with the official .msi installer.

ALTERNATIVES TO WIN32COM

I should also point out that there are several fantastic open source alternatives that can be faster than COM in most situations and work on any OS (Mac, Linux, Windows, etc.). These tools all parse the zipped files that make up a .xlsx. If you did not know that a .xlsx file is a .zip archive, just change the extension to .zip and explore the contents (kind of interesting to do at least once in your career). Of these I recommend openpyxl, which I have used for parsing and creating Excel files on a server where performance was critical. Never use win32com for server work: it drives an out-of-process instance of excel.exe, and those instances can be left running and leak resources.
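To peek inside a workbook without renaming it, Python's standard zipfile module can open the .xlsx directly, since it looks at the file contents rather than the extension. This is a minimal sketch; list_xlsx_parts is a hypothetical helper name and "report.xlsx" is a placeholder path.

```python
import zipfile


def list_xlsx_parts(path):
    """Return the names of the files stored inside an .xlsx archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Hypothetical usage; replace the placeholder path with a real workbook:
# for name in list_xlsx_parts("report.xlsx"):
#     print(name)
```

In a typical workbook you should see entries such as [Content_Types].xml, xl/workbook.xml and xl/worksheets/sheet1.xml.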

RECOMMENDATION

I would recommend win32com for users who work closely with individual data sets (analysts, financial services, researchers, accountants, business operations, etc.) and are doing data discovery, as it works great with open workbooks. However, developers or users who need to run very large tasks with a small footprint, perform extremely large manipulations, or process in parallel should use a package such as openpyxl.
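For the openpyxl route, the three tasks from the question might look roughly like the sketch below. The function names and file paths are hypothetical. It assumes the data sits in column A of the first sheet with no empty cells, and that your installed openpyxl version provides load_workbook, Worksheet.delete_rows, Worksheet.cell and Workbook.save. openpyxl has no built-in sort for cell data as far as I know, so the sorting is done in Python and written back.

```python
def sort_and_dedupe(values):
    """Return (values sorted ascending, distinct values in sorted order)."""
    ordered = sorted(values)
    return ordered, list(dict.fromkeys(ordered))


def tidy_workbook(src_path, dst_path):
    """Delete row 1, sort column A, save a copy, return the distinct values."""
    # Imported here so the pure helper above works without openpyxl installed
    from openpyxl import load_workbook

    workbook = load_workbook(src_path)
    sheet = workbook.worksheets[0]

    sheet.delete_rows(1)  # remove the first row; rows below shift up

    values = [cell.value for cell in sheet["A"]]  # read column A
    ordered, distinct = sort_and_dedupe(values)

    # Write the sorted values back into column A, starting at row 1
    for row_index, value in enumerate(ordered, start=1):
        sheet.cell(row=row_index, column=1, value=value)

    workbook.save(dst_path)
    return distinct


# Hypothetical usage:
# print(tidy_workbook("input.xlsx", "output.xlsx"))
```

Nothing here starts Excel, which is what makes this approach suitable for servers and parallel jobs.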
