Python Path Conversion: A Python Application for Converting Clinical Pathway Formats

Latest recommended article published on 2022-11-23 15:03:15

weixin_39814378

Article tags: Python path conversion

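The requirement: there are nearly 1,000 clinical pathways, supplied as .doc documents. Each one contains a prose description of the inpatient process and a clinical pathway form laid out as a table. The tables need to be saved in Excel format according to a template and then imported into the hospital's HIS.

The table in the .doc file looks like this:

The Excel template looks like this:

The time periods are numbered 1, 2, 3, 4, 5, and the rows 主要诊疗工作 (main diagnosis and treatment work), 重点医嘱 (key medical orders), and 主要护理工作 (main nursing work) are numbered 1, 2, 3. The two numbers are combined into cell markers of the form 1-1, 1-2, 1-3, 2-1.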
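
As a small illustration of the numbering scheme just described, the sketch below generates the cell markers in order. The function name `cell_markers` and its parameters are hypothetical and not part of the original script; it simply assumes five time periods and three work categories, as in the example.

```python
def cell_markers(periods=5, categories=3):
    """Return markers such as '1-1', '1-2', '1-3', '2-1', ... in order."""
    return [
        "{}-{}".format(period, category)
        for period in range(1, periods + 1)
        for category in range(1, categories + 1)
    ]


markers = cell_markers()
print(markers[:4])  # ['1-1', '1-2', '1-3', '2-1']
```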

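Doing one or two by hand is no problem, but with this many it gets far too tedious, so let's find a way to batch-process them.

First, how to read the tables in a Word document. After searching online and hitting a few pitfalls, here is what I found:

The environment is Python 3.5. The win32com and docx approaches that turn up first in searches did not work well; the module to use is python-docx. The thread "When import docx in python3.3 I have error ImportError: No module named 'exceptions'" explains that python-docx is compatible with Python 3.x. I also found that python-docx cannot read .doc files, so a .doc file has to be saved as .docx before it can be processed.

For python-docx, the only import needed is:

from docx import Document

The main steps are as follows:

Convert .doc to .docx: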

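if filename.endswith('.doc'):
    # python-docx cannot read .doc, so let Word save a .docx copy first
    word = wc.Dispatch('word.application')
    doc = word.Documents.Open(filename)
    docxfilename = filename + 'x'
    # 12 is the file format code for .docx (wdFormatXMLDocument)
    doc.SaveAs(docxfilename, 12)
    doc.Close()
    word.Application.Quit()
    d = Document(docxfilename)
else:
    d = Document(filename)
# str.strip('doc') removes characters from both ends, not a suffix,
# so build the output name with os.path.splitext instead
excelfilename = os.path.splitext(filename)[0] + '.xls'
print(excelfilename)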
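
A note on building the output file name: `str.strip('doc')` removes any of the characters `d`, `o`, `c` from both ends of the string rather than removing a suffix, so a file name that begins or ends with those letters is silently damaged. A minimal sketch of a safer alternative using `os.path.splitext` follows; the helper name `excel_name_for` and the sample file name are hypothetical.

```python
import os


def excel_name_for(filename):
    """Swap the extension of a .doc/.docx path for .xls."""
    root, _extension = os.path.splitext(filename)
    return root + ".xls"


# str.strip treats its argument as a set of characters, not a suffix,
# so the leading 'doc' is stripped as well as the extension.
print("doc_cod.doc".strip("doc"))      # _cod.
print(excel_name_for("doc_cod.doc"))   # doc_cod.xls
```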

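Create the spreadsheet:

# Lines to filter out: blanks and the order-type headings
declude = ['', '长期医嘱：', '临时医嘱：', '出院医嘱：', '长期医嘱:', '临时医嘱:', '出院医嘱:']
# Single characters of the row headings 主要诊疗工作 and 重点医嘱; a line made of
# one such character presumably means a heading leaked into the cell text
misorder = ['主','要','诊','疗','工','作','重','点','医','嘱']
# Running counter, used as the first half of the cell marker
linenumber = 0
wbk = xlwt.Workbook(encoding='utf-8', style_compression=0)
# Create the sheet and write the header row
sheet = wbk.add_sheet('数据', cell_overwrite_ok=True)  # sheet name: "data"
sheet.write(0, 0, '项目名称')  # item name
sheet.write(0, 1, '归属路径')  # owning pathway
sheet.write(0, 2, '单元标记')  # cell marker
excellinenumber = 1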

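Check that every table has 7 rows; if not, skip all further processing:

# Collect the row count of every table in tablelength
tablelength = []
for t in d.tables:
    tablelength.append(len(t.rows))
print('debug:tablelength=', tablelength)
# A standard table has 7 rows; return 1 to flag a non-standard document
for i in tablelength:
    if i != 7:
        return 1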
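
The same row-count check can be expressed as a small predicate. This is only a sketch: `is_standard` and `STANDARD_ROWS` are hypothetical names, and the function works on plain row counts so that it can be tried without a Word file. With python-docx the counts would come from `len(t.rows)` for each `t` in `d.tables`.

```python
STANDARD_ROWS = 7  # row count of a standard pathway form, as stated above


def is_standard(row_counts, expected=STANDARD_ROWS):
    """Return True only when every table has the expected number of rows."""
    return all(count == expected for count in row_counts)


print(is_standard([7, 7]))  # True
print(is_standard([7, 9]))  # False
```

Note that a document with no tables at all passes this check (`all` of an empty sequence is true), which matches the behavior of the loop it mirrors.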

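In the Word table, the text in the first column, the first row, and the fifth row onward is not needed and is skipped. Each column supplies the major number and each large row supplies the minor number. The cell content is split into lines, and each line is written to the Excel sheet together with the owning pathway and the cell marker:

# Walk through every table
for t in d.tables:
    # Walk through every column
    for columnnumber, columnelement in enumerate(t.columns):
        # The first column is not needed
        if columnnumber > 0:
            # One more time period: advance the major number
            linenumber += 1
            for j, cellelement in enumerate(columnelement.cells):
                # Take only the second, third and fourth rows
                if 0 < j < 4:
                    # Build the cell marker
                    danyuanbiaoji = str(linenumber) + '-' + str(j)
                    # Split the cell text into lines
                    textindex = cellelement.text.splitlines()
                    for line in textindex:
                        # A stray heading character marks a non-standard table
                        if line in misorder:
                            return 1
                        # Drop the '**医嘱：' headings and blank lines
                        if line not in declude:
                            print(line)
                            # Drop the leading '□' checkbox
                            if '□' in line:
                                newline = line.strip()
                                newline = newline[1:]
                                newline = newline.strip()
                                # print(newline, danyuanbiaoji)
                                sheet.write(excellinenumber, 0, newline)
                            else:
                                line = line.strip()
                                # print(line, danyuanbiaoji)
                                sheet.write(excellinenumber, 0, line)
                            sheet.write(excellinenumber, 2, danyuanbiaoji)
                            excellinenumber += 1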
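
To try the text cleanup in isolation, the filtering can be pulled out into a standalone function. This is a sketch rather than the original code: `clean_cell_text` and `SKIP_LINES` are hypothetical names, the sample text is made up for illustration, and the function strips whitespace before comparing against the skip list, which differs slightly from the loop above.

```python
SKIP_LINES = {'', '长期医嘱：', '临时医嘱：', '出院医嘱：',
              '长期医嘱:', '临时医嘱:', '出院医嘱:'}


def clean_cell_text(text):
    """Split a cell's text into items, dropping headings, blanks and '□'."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if line in SKIP_LINES:
            continue
        # Remove a leading checkbox character, then tidy the whitespace.
        item = line.lstrip('□').strip()
        if item:
            items.append(item)
    return items


# Made-up sample cell content, for illustration only.
sample = "长期医嘱：\n□ 骨科护理常规\n□二级护理\n\n临时医嘱：\n□ 血常规"
print(clean_cell_text(sample))
```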

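The result of a run is shown below. The printed log lists the content extracted for the spreadsheet, and the generated file matches the template above:

Batch processing: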

def getdirfiles(self, dirname):
    self.cleartempfiles(dirname)
    # shortname -> 0 (processed, or .xls already exists) or 1 (non-standard)
    dicts = {}
    for root, dirs, files in os.walk(dirname):
        for file in files:
            filename = os.path.join(root, file)
            shortname, extension = os.path.splitext(file)
            excelfilename = os.path.join(root, shortname + '.xls')
            print('excelfilename:', excelfilename)
            # Skip documents that already have an .xls next to them
            if os.path.exists(excelfilename):
                dicts[shortname] = 0
            if shortname not in dicts:
                # Test the bare file name: the joined path never starts with '~$'
                if not file.startswith('~$'):
                    if filename.endswith('.doc') or filename.endswith('.docx'):
                        print('process:', filename)
                        dicts[shortname] = self.getdocxexcel(filename)
    print('Processed files:')
    filename1 = os.path.join(dirname, '已处理文件.txt')  # "processed files"
    filename2 = os.path.join(dirname, '非标准、未处理文件.txt')  # "non-standard, unprocessed files"
    with open(filename1, 'w') as f1:
        for k, v in dicts.items():
            if v == 0:
                print(k, v)
                f1.write(k + '\n')
    print('Unprocessed files in a non-standard format:')
    with open(filename2, 'w') as f2:
        for k, v in dicts.items():
            if v == 1:
                print(k, v)
                f2.write(k + '\n')
    self.cleartempfiles(dirname)
    return
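
The file-selection rules of the batch step (skip Word's `~$` lock files, skip anything that is not .doc or .docx, skip documents that already have an .xls beside them) can be sketched as a generator. This is a simplified illustration, not the original method: the name `pending_documents` is hypothetical, and it neither calls the converter nor records results.

```python
import os


def pending_documents(dirname):
    """Yield paths of .doc/.docx files under dirname that still need converting."""
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            # Word's temporary lock files start with '~$'
            if name.startswith('~$'):
                continue
            shortname, extension = os.path.splitext(name)
            if extension.lower() not in ('.doc', '.docx'):
                continue
            # Already converted: an .xls with the same base name exists
            if os.path.exists(os.path.join(root, shortname + '.xls')):
                continue
            yield os.path.join(root, name)
```

It could be used as `for path in pending_documents(dirname): ...`, passing each path to the conversion method.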

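Once a custom pathway number is filled into the 归属路径 (owning pathway) column, the generated Excel file can be uploaded to the HIS to build the clinical pathway.

The rest of the code, apart from the two main methods: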

import os
from docx import Document
from win32com import client as wc
import xlwt

class Solution:
    def getdirfiles(self, dirname):
        ……

    def cleartempfiles(self, dirname):
        # Clean up temporary files (names starting with '~$')
        count = 0
        for root, dirs, files in os.walk(dirname):
            for file in files:
                filename = os.path.join(root, file)
                if file.startswith('~$'):
                    count += 1
                    os.remove(filename)
        print('Number of files cleaned up:', count)

    def getdocxexcel(self, filename):
        ……

test = Solution()
test.getdocxexcel("D:\\临床路径\\胫骨平台骨折.doc")
# test.getdirfiles('D:\\临床路径')
