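Python can help with Excel and make grunt work easier: a script for working with Excel from Python

This post shows how to use Python's openpyxl library to convert a large number of .dat files to .xlsx format. It installs openpyxl and imports the required modules, then defines a function that reads a .dat file, splits the data, and stores it in a list. Next it creates an .xlsx file in which each .dat file gets its own worksheet, named after its directory. Finally the data is written to the worksheets and the file is saved. The whole process handles multiple files automatically and replaces the manual work.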


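First, a concept: Python has many mature "packages", that is, modules for all kinds of operations. In the code of a given project we can call on these packages to do particular jobs for us. I can't help admiring the people who developed them.

Before using these packages we have to install them. This can be done in PyCharm: Preferences -> Project: Projectname -> Project Interpreter -> "+" sign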

[Figure: the package installation window]

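After reading a few blog posts by more experienced people, I found that the packages commonly used for working with Excel from Python include openpyxl, xlwt, and xlrd. Going by their recommendations, openpyxl seems to have the most complete feature set, and a single package is enough.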

[Figure: feature comparison of the Python packages for working with Excel]

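Its drawback seems to be that it is comparatively slow at processing data, but my data volume is not that large, so I picked openpyxl to tinker with.

First, to be clear about the goal of this script: it reads the .dat files downloaded from the server. The format of such a file looks roughly like this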

[Figure: what the .dat file looks like (some readers may already have recognized it as a Bader output file)]

A file like this is easier to inspect and edit once it is converted to an Excel file. The result we want looks roughly like this

[Figure: what it should look like as an .xlsx file]

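We have a whole pile of these files, each sitting in a directory named with a regular number (these are simply the folders the jobs were submitted from). So we need a script that opens each .dat file in turn and writes them all into one workbook. Each .dat file gets a worksheet of its own, and the worksheet is named after the directory the .dat file was read from. That is the point of this script.

The steps are as follows:

1. Read the .dat file

Use Python's built-in open to open the file and read the data line by line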

def ReadTxt(file):
    ls = list()
    with open(file, 'r', encoding='utf-8-sig') as f:

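Next, split each line that was read into columns. The data has 7 columns in total, so the line is split into 7 pieces here. This is written a bit clumsily, so please go easy on me; I'll try to make it more concise later.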

        for line in f.readlines():
            # To check if the given data can be split
            try:
                # Split the line on whitespace; maxsplit=7 gives at most 8 pieces
                lists = line.split(None, 7)
                co1 = lists[0]
                co2 = lists[1]
                co3 = lists[2]
                co4 = lists[3]
                co5 = lists[4]
                co6 = lists[5]
                co7 = lists[6]
                # Convert the 7 columns to floats and store them as one row
                ls.append((float(co1), float(co2), float(co3), float(co4), float(co5), float(co6), float(co7)))
            except (ValueError, IndexError):
                # Too few columns, or a field that is not a number
                print('Wrong format!')

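This gives us a list holding the split data, one tuple per row (the list feels like a 2D array in C, though I'm not sure I have that right). Then we just need to write this list into the .xlsx file.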
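As a possible tidier alternative to the seven co1 to co7 variables, the sketch below converts the first seven whitespace-separated fields in one step. The function name `parse_rows`, the `n_cols` parameter, and the sample lines (with invented values) are assumptions for illustration and are not part of the original script. It takes any iterable of lines, so it can be tried without a real .dat file.

```python
def parse_rows(lines, n_cols=7):
    """Return a list of n_cols-tuples of floats; lines that do not fit are skipped."""
    rows = []
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if len(fields) < n_cols:
            continue  # too few columns, e.g. a separator line of dashes
        try:
            rows.append(tuple(float(x) for x in fields[:n_cols]))
        except ValueError:
            continue  # non-numeric text such as the title line
    return rows


# Invented sample lines, only meant to resemble the layout described above
sample = [
    "    #         X           Y           Z       CHARGE      MIN DIST   ATOMIC VOL",
    " ----------------------------------------",
    "    1    0.0000    0.0000    1.5000    6.2500    1.1000   20.5000",
]
print(parse_rows(sample))
```

Unlike the version above, this one skips bad lines silently; a message could be printed in the two `continue` branches if that is preferred.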

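2. Write the .xlsx file

a. First install openpyxl with pip3

The openpyxl module seems a bit special: apparently jdcal has to be installed first, otherwise the installation fails? pip normally installs a package's declared dependencies automatically, so this separate step is probably only needed when that automatic step fails. If anyone knows more about this, I'd appreciate an explanation.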

pip3 install jdcal

Once that reports success, install openpyxl the same way

pip3 install openpyxl

b. With the installation done, we can start writing code that uses the package

The main thing we use from openpyxl here is load_workbook

from openpyxl import Workbook, load_workbook

After that we can write

wb = load_workbook(path)

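With this we can open the .xlsx file at the given path. Note that load_workbook only opens a file that already exists; a new workbook is created with Workbook(), which starts out with one automatically created worksheet named Sheet, just like a freshly created blank .xlsx file.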
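To make the difference concrete, here is a small sketch, assuming openpyxl is installed. `Workbook()` builds a new workbook in memory with one default sheet, while `load_workbook` opens a file that is already on disk. The file name `demo.xlsx` and the temporary directory are placeholders for illustration only.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xlsx")  # hypothetical file name

    wb = Workbook()                 # new workbook, exists only in memory so far
    default_names = wb.sheetnames   # names of the automatically created sheets
    wb.save(path)                   # the file appears on disk only after save()

    reopened = load_workbook(path)  # works because the file now exists
    reopened_names = reopened.sheetnames
    reopened.close()

print(default_names, reopened_names)
```

This is why the full script further down first creates and saves an empty workbook before any call to load_workbook.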

Once the file exists, we can write to it and read from it. The first step is to create a worksheet named after the directory the .dat file was read from

sheet = wb.create_sheet(sheet_name)

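Then write the list from before into the sheet and save the file. Remember to save after writing; otherwise it is like choosing "Don't save" when closing a document, and all the work is lost.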

for row in value:
    # Write the list into the worksheet row by row
    sheet.append(row)
# Save the workbook to the same path
wb.save(path)

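3. Tidy up and add some prompts

Add prompts for lines that fail to parse, and have them printed so they end up in the log file. That makes problems easier to track down later, although of course it is best if there are none.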
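One way to get such messages into a log file is the standard `logging` module instead of `print`. The following is only a sketch: the log file name `convert.log`, the logger name `dat2xlsx`, and the message format are assumptions, and the full script in this post keeps using `print`.

```python
import logging
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "convert.log")  # hypothetical log file name

    logger = logging.getLogger("dat2xlsx")       # hypothetical logger name
    logger.setLevel(logging.WARNING)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)

    # The same kind of prompt as in the script, but written to the file
    logger.warning("Wrong format in line %d of %s", 2, "ACF.dat")

    # Detach and close the handler so the file is flushed and released
    logger.removeHandler(handler)
    handler.close()

    with open(log_path, encoding="utf-8") as f:
        log_text = f.read()

print(log_text)
```

In a real run the log file would live next to the output workbook rather than in a temporary directory.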

The final code looks like this

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# A script that reads text (.dat) files and writes them into an .xlsx file

from openpyxl import Workbook, load_workbook


# Read from text (.dat)
def ReadTxt(file):
    ls = list()
    with open(file, 'r', encoding='utf-8-sig') as f:
        # Line counter, used in the error messages
        num = 1
        for line in f.readlines():
            # To check if the given data can be split
            try:
                # Split the line on whitespace; maxsplit=7 gives at most 8 pieces
                lists = line.split(None, 7)
                co1 = lists[0]
                co2 = lists[1]
                co3 = lists[2]
                co4 = lists[3]
                co5 = lists[4]
                co6 = lists[5]
                co7 = lists[6]
                if co1 != "#":
                    # Skip the title line; store every other row as floats
                    ls.append((float(co1), float(co2), float(co3), float(co4), float(co5), float(co6), float(co7)))
                num = num + 1
            except (ValueError, IndexError):
                # Too few columns, or a field that is not a number
                print('Wrong format in line ' + str(num) + '!')
                num = num + 1
    # Return as a list
    return ls


# Write to xlsx
def Write_Excel(path, sheet_name, value):
    # Open the existing workbook at the given path (load_workbook does not create files)
    wb = load_workbook(path)
    # Create a new worksheet in this workbook with the given name
    sheet = wb.create_sheet(sheet_name)
    # Set cell format, width, height...
    #sheet.column_dimensions['B'].width = 115
    for row in value:
        # Write the list into the worksheet row by row
        sheet.append(row)
    # Save the workbook to the same path
    wb.save(path)
    print("Current txt " + sheet_name + " has been written, Tadaaaaaa!")


# Remove empty sheet
def Remove_empty(path):
    wb = load_workbook(path)
    ws = wb['Sheet']
    wb.remove(ws)
    wb.save(path)
    print('Empty sheet has been removed successfully')


# Main func
if __name__ == '__main__':
    # Path of the .xlsx file we want to generate
    book_name_xlsx = r'/path/sum.xlsx'
    # Create and save an empty workbook at that path
    wb = Workbook()
    wb.save(book_name_xlsx)
    # 12 to 20 is the range of interlayer distances of interest
    for name in range(12, 21):
        # Use the target directory name as the sheet name
        sheet_name_xlsx = str(name)
        # Call ReadTxt; /path/ is the parent folder of the task directories
        art = ReadTxt(r'/path/' + sheet_name_xlsx + '/ACF.dat')
        # Insert the title row
        art.insert(0, ('#', 'X', 'Y', 'Z', 'CHARGE', 'MIN DIST', 'ATOMIC VOL'))
        # Call the write function
        Write_Excel(book_name_xlsx, sheet_name_xlsx, art)
    # Remove the default empty sheet
    Remove_empty(book_name_xlsx)

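This is the semi-finished result. More features can be added, but this is the overall framework: it brings in the openpyxl package and uses it to create an .xlsx file and write to it. I hope it helps anyone stuck with mechanical, repetitive work.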
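One possible refinement, sketched here under the assumption that openpyxl is installed: the script reopens and re-saves the workbook once per .dat file, but the workbook could instead be kept in memory, filled sheet by sheet, and saved a single time. The function name `write_all`, the `tables` dictionary, and the sample row (invented values) are illustrative and not part of the original script.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

HEADER = ('#', 'X', 'Y', 'Z', 'CHARGE', 'MIN DIST', 'ATOMIC VOL')


def write_all(path, tables):
    """Write {sheet_name: rows} into one workbook and save it once.

    tables must not be empty: a workbook needs at least one sheet to be saved.
    """
    wb = Workbook()
    wb.remove(wb.active)  # drop the default empty sheet up front
    for sheet_name, rows in tables.items():
        ws = wb.create_sheet(sheet_name)
        ws.append(HEADER)  # title row first
        for row in rows:
            ws.append(row)
    wb.save(path)


# Invented sample data standing in for the parsed .dat files
tables = {"12": [(1.0, 0.0, 0.0, 1.5, 6.25, 1.1, 20.5)], "13": []}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sum.xlsx")
    write_all(path, tables)

    # Read the file back to check what was written
    wb = load_workbook(path)
    names = wb.sheetnames
    first_rows = list(wb["12"].iter_rows(values_only=True))
    wb.close()

print(names, first_rows)
```

With this layout the separate step of removing the empty sheet afterwards is no longer needed, and the file is read and written far fewer times.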

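The range() function works like this:

range(start, stop[, step]) takes a start, a stop, and an optional step. With the default step of 1, the values actually run from start to stop-1.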
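A quick illustration of this behaviour, using the same 12 to 20 range as the script. The variable names are made up for the example.

```python
# range(12, 21) stops before 21, so it covers the directories 12 through 20
dirs = [str(n) for n in range(12, 21)]
print(dirs)

# With a step of 4, only every fourth value is produced
stepped = list(range(12, 21, 4))
print(stepped)
```

This is why the loop in the script uses 21 as the stop value even though the last directory is named 20.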