Python处理Excel文件

最新推荐文章于 2023-09-21 16:32:36 发布

hujun8610

最新推荐文章于 2023-09-21 16:32:36 发布

阅读量2k

点赞数

分类专栏： python 文章标签： python excel encoding linux

本文链接：https://blog.csdn.net/hujun8610/article/details/36640959

版权

python 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

在数据统计分析中，有很多需求需要在Linux环境下生成excel报表文件。在Linux中，我们可以使用Python来读取、创建、修改excel文件。

使用Python读取excel文件主要有两种方式，分别是使用xlrd、xlrt和openpyxl。其中xlrd、xlrt主要用于excel2003格式的文件的读写，而openpyxl主要用于读写excel2007/2010格式的文件。

xlrd及xlwt的使用

xlrd：用于excel文件的读取。下载xlrd 0.9.3源文件，https://pypi.python.org/packages/source/x/xlrd/xlrd-0.9.3.tar.gz，解压下载的源文件，运行python setup.py install，安装xlrd包。

xlwt：用于excel文件的写入（包括修改）。下载xlwt-0.7.5源文件，https://pypi.python.org/packages/source/x/xlwt/xlwt-0.7.5.tar.gz，解压之后，进入文件目录，运行python setup install，安装xlwt包。

使用xlrd读取excel文件

导入 import xlrd

打开excel文件：

file=xlrd.open_workbook('test.xls')

查看包含的sheet的名称

file.sheet_names()

得到工作表或者通过索引及名称获取指定的sheet

table=file.sheets()[0]

table=file.sheet_by_index(0)

table=file.sheet_by_name('Sheet1')

获取行数及列数 table.nrows，table.ncols

获取整行和整列的值（结果为数组）：table.row_values(i),table.col_values(i)

循环得到行及列的索引列表

for rownum in range(table.nrows)

for colnum in range(table.ncols)

单元格：

cell1=table.cell(0,0).value 注意此时的行与列从0开始，及（0,0）代表A1单元格

分别使用行列的索引： cell1=table.row(0)[0].value A1单元格 cell2=table.col(1)[0].value A2单元格

ctype代表类型，0 empty,1 string,2 number,3 date,4 boolean,5 error

以下是一个简单的demo

# -*- coding: utf-8 -*-
import xdrlib ,sys
import xlrd
def open_excel(file= 'file.xls'):
    try:
        data = xlrd.open_workbook(file)
        return data
    except Exception,e:
        print str(e)
#根据索引获取Excel表格中的数据   参数:file：Excel文件路径     colnameindex：表头列名所在行的所以，by_index：表的索引
def excel_table_byindex(file= 'file.xls',colnameindex=0,by_index=0):
    data = open_excel(file)
    table = data.sheets()[by_index]
    nrows = table.nrows #行数
    ncols = table.ncols #列数
    colnames = table.row_values(colnameindex) #某一行数据
    list =[]
    for rownum in range(1,nrows):

         row = table.row_values(rownum)
         if row:
             app = {}
             for i in range(len(colnames)):
                app[colnames[i]] = row[i]
             list.append(app)
    return list

#根据名称获取Excel表格中的数据   参数:file：Excel文件路径     colnameindex：表头列名所在行的所以，by_name：Sheet1名称
def excel_table_byname(file= 'file.xls',colnameindex=0,by_name=u'Sheet1'):
    data = open_excel(file)
    table = data.sheet_by_name(by_name)
    nrows = table.nrows #行数
    colnames = table.row_values(colnameindex) #某一行数据
    list =[]
    for rownum in range(1,nrows):
         row = table.row_values(rownum)
         if row:
             app = {}
             for i in range(len(colnames)):
                app[colnames[i]] = row[i]
             list.append(app)
    return list

def main():
   tables = excel_table_byindex()
   for row in tables:
       print row

   tables = excel_table_byname()
   for row in tables:
       print row

if __name__=="__main__":
    main()

使用xlwt写excel文件

导入 import xlwt

新建一个excel文件： file=xlwt.Workbook(),如果写入的数据中包含中文，则需要加上encoding='UTF-8'

新建一个sheet：table=file.add_sheet('sheet_name')

写入数据：table.write(0,0,'test')

注意：如果对一个单元格进行重复操作，会出现return error,出现异常，此时需要在创建sheet时，添加cell_overwrite_ok=True来解决。eg：table=file.add_sheet('sheetname',cell_overwrite_ok=True )

保存文件：file.save('test.xls')

使用style：

style=xlwt.XFStyle

font=style.Font()

font.name = 'Times New Roman'

font.bold = True

style.font = font #为样式设置字体

table.write(0, 0, 'some bold Times text', style) # 使用样式

实例：

from xlwt import Workbook,Style

def getRows():
    rows = []
    row1 = {'a':'6876890809890890098', 'b':'中过人', 'c':''}
    row2 = {'a':'687235890890098', 'b':'中国人','c':'98734978329'}
    row3 = {'a':'68768343890098', 'b':'English', 'c':''}
    row4 = {'a':'34534534643653452', 'b':'American','c':'3234'}
    rows.append(row1)
    rows.append(row2)
    rows.append(row3)
    rows.append(row4)
    return rows

def getFieldnames():
    return {'a':'银行帐号', 'b':'国籍','c':'电话'}

def convDict2SortedList(dictRow):

    listRow = []
    keys = dictRow.keys()
    keys.sort()
    for key in keys:
        listRow.append(dictRow[key])
    return listRow

def write2Excel(filename, fieldnames, rows):

    wb = Workbook(encoding='UTF-8')
    ws = wb.add_sheet('表格',cell_overwrite_ok=True)

    #得到第0行,写入标题
    row = ws.row(0)
    listRow = convDict2SortedList(fieldnames)
    for j in range(len(listRow)):
        row.write(j, listRow[j])
        print listRow[j]

    #从第i行开始写
    i_row = 1
    for i in range(i_row, len(rows) + i_row):
        #得到工作表的第i行
        row = ws.row(i)
        #将字典行转换为经过key排序的list,保证不错行
        listRow = convDict2SortedList(rows[i - i_row])
        #逐个单元格写入
        for j in range(len(rows[i - i_row])):
            #如果转换为字符串后大于15的长度,是数字的话会被
            #损失精度,后面转换为0,这里用文本的方式写入
            if len(str(listRow[j])) > 15:
                row.set_cell_text(j, listRow[j])
            else:
                row.write(j, listRow[j])
    wb.save(filename)

if __name__ == '__main__':
    filename = '/home/xiaobing/workspace/python/excel/myexcel.xls'
    rows = getRows()
    fieldnames = getFieldnames()
    write2Excel(filename, fieldnames, rows)

需要注意的是使用xlwt时，最多只能写入65535行，则是excel2003格式的一个限制，因此如果需要存储的数据真的特别多话，还是推荐使用openpyxl读写xlsx文件，下面就将介绍使用openpyxl读取excel2010格式的文件。

openpyxl的使用

openpyxl可以同时读取及写入数据，其主要用于读写xlsx文件。

下载安装文件，https://pypi.python.org/packages/source/o/openpyxl/openpyxl-2.0.4.tar.gz，解压之后运行，python setup.py install

导入：import openpyxl

读取xlsx文件：

from openpyxl import load_workbook

wb=load_workbook(filename='demo.xlsx')

ws=wb.get_sheet_by_name(name='test')

print ws['A1'].value

读取大文件：

wb = load_workbook(filename = 'large_file.xlsx', use_iterators = True)

ws = wb.get_sheet_by_name(name = 'big_data') # ws is now an IterableWorksheet

for row in ws.iter_rows(): # it brings a new method: iter_rows()

for cell in row:

print cell.value

写xlsx文件：

wb=Workbook()

当写入的数据中存在中文时，可能会包异常，下面是一种解决异常的方案：

reload(sys)
sys.setdefaultencoding("utf-8")
file=Workbook(encoding='utf-8') #以utf-8的格式创建文件

创建worksheet

ws1 = wb.create_sheet() # insert at the end (default)
ws2 = wb.create_sheet(0) # insert at first position

设置title： ws.title='Testtitle'

获取sheet:get_sheet_by_name('sheet_name')

cell的操作

cell1=ws['A4'] #与excel中描述单元格的方式相同
ws['A4']=4 设置单元格的值
cell2=ws.cell('A4')
cell3=ws.cell(row=4,column=2)
ws.cell(row=4,column=2).value='test' #设置单元格的值
ws.cell('%s%s'%('A',4)).value='test' #设置单元格值

保存文件：

wb=Workbook()

wb.save(filename='demo.xlsx')

需要注意的是在openpyxl中，通过index访问excel文件，index的初值为1，这与xlrd及xlwt是不一样的。

实例

from openpyxl import Workbook
from openpyxl.compat import range
from openpyxl.cell import get_column_letter

wb = Workbook()
dest_filename = r'empty_book.xlsx'
ws = wb.active
ws.title = "range names"
for col_idx in range(1, 40):
    col = get_column_letter(col_idx)
    for row in range(1, 600):
        ws.cell('%s%s'%(col, row)).value = '%s%s' % (col, row)

ws = wb.create_sheet(
ws.title = 'Pi'
ws['F5'] = 3.14
wb.save(filename = dest_filename)

以上就是在使用python处理excel的几种方式，比较推荐第二种使用openpyxl处理excel文件的方式，主要是因为xlsx相对于xls有很多优点，比如相同的数据，占用的空间更小，并且支持的最大的数据量也比xlsx大很多。

参考网页：

http://www.cnblogs.com/lhj588/archive/2012/01/06/2314181.html

http://zhaojing366.blog.163.com/blog/static/952384020124311071392/

http://blog.csdn.net/xiaobing_blog/article/details/14057097

http://pythonhosted.org//openpyxl/index.html

hujun8610

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
Python处理Excel文件

在数据统计分析中，有很多需求需要在Linux环境下生成excel报表文件。在Linux中，我们可以使用Python来读取、创建、修改excel文件。使用Python读取excel文件主要有两种方式，分别是使用xlrd、xlrt和openpyxl。其中xlrd、xlrt主要用于excel2003格式的文件的读写，而openpyxl主要用于读写excel2007/2010格式的文件。
复制链接

扫一扫