How to batch-read files in Python: Python 3 automation, batch file processing (text, PDF, Excel; reading, filtering, exporting)...

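Taking advantage of Python 3's concise scripting syntax and the rich libraries of a high-level language, I quickly wrote a few "scripts" that read, filter, and export files.

Here is a brief summary of the key functions.

Reading an ini configuration file

Check whether the ini file exists; check whether the entered key is defined in the ini file.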

    import configparser
    import os


    def getConfigInfo(_ini_nm):
        # Open Ini File; config.read() returns the list of files it could parse
        config = configparser.ConfigParser()
        if not config.read(os.path.join(os.getcwd(), _ini_nm + r'.ini')):
            printLog('E', 'Read Ini file fail.')

        while True:
            sysCode = input(r'Please input the system code : (Press [Enter] to quit):').strip()
            if 0 == len(sysCode):
                exit()

            # Init ConnectionSettings
            if sysCode in config.sections():
                return dict(config[sysCode])
            else:
                print('Ini info of System [%s] is blank.\n' % sysCode)
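The two checks described above (does the ini file exist, is the entered key defined) can also be written without the interactive loop, which makes them easier to reuse. This is a minimal sketch, not the original script: the helper name `load_section`, the section name `SYS1`, and the sample keys are all assumptions made for illustration.

```python
import configparser
import os
import tempfile


def load_section(ini_path, section):
    """Return the section as a dict, or None if the file or section is missing.

    load_section is a hypothetical helper, not part of the original script.
    """
    config = configparser.ConfigParser()
    # config.read() returns the list of files it managed to parse,
    # so an empty list means the ini file was missing or unreadable.
    if not config.read(ini_path, encoding='utf-8'):
        return None
    if not config.has_section(section):
        return None
    return dict(config[section])


# Demo with a throwaway ini file (section and keys are made-up sample data)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.ini')
    with open(demo_path, 'w', encoding='utf-8') as f:
        f.write('[SYS1]\nhost = localhost\nport = 5432\n')

    found = load_section(demo_path, 'SYS1')
    missing_section = load_section(demo_path, 'SYS2')
    missing_file = load_section(os.path.join(tmp_dir, 'none.ini'), 'SYS1')

print(found)
```

Note that configparser returns every value as a string, so a port read this way would still need converting with `int()` if a number is required.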

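Getting multi-argument input

Check the number of arguments; validate the arguments (length, whether the path is a directory); check whether the keyword consists entirely of Chinese characters.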

    import os


    def _main():
        path = ''
        keyWord = ''

        while True:
            para = input(r'Please input the PDF directory and Key Word: (Press [Enter] to quit):').strip().split()

            # Empty input quits
            if 0 == len(para):
                exit()

            # Exactly two arguments are required
            if 2 != len(para):
                print('Two parameters -> [PDF directory and Key Word] are needed.' + '\n')
                continue

            path = para[0]
            keyWord = para[1]

            # The first argument must be an existing directory
            if not os.path.isdir(path):
                print('The input path is not an existing directory.' + '\n')
                continue

            # Every character must fall in U+4E00..U+9FA5 (both ends included)
            flg = True
            for char in keyWord.strip():
                if char < u'\u4e00' or char > u'\u9fa5':
                    flg = False
                    break
            if not flg:
                print('Please input the Chinese Key Word for search.(Such as \'物流\').' + '\n')
                continue

            break
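The "entirely Chinese characters" check can be pulled out into a small function so it can be tested on its own. The sketch below is an illustration only: `is_all_chinese` is a hypothetical name, and it treats both ends of the range U+4E00 to U+9FA5 as valid characters. That range covers the common CJK ideographs but not every Chinese character in Unicode.

```python
def is_all_chinese(text):
    """Return True if text is non-empty and every character is in U+4E00..U+9FA5.

    is_all_chinese is a hypothetical helper name used for illustration.
    """
    return len(text) > 0 and all('\u4e00' <= ch <= '\u9fa5' for ch in text)


print(is_all_chinese('物流'))     # True
print(is_all_chinese('物流abc'))  # False
print(is_all_chinese(''))         # False
```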

PostgreSQL database handling

Try to connect to the database using the connection information defined in the ini file; execute an SQL statement.

    import traceback

    import psycopg2


    def connDB(_cfg):
        try:
            conn = psycopg2.connect(database=_cfg['servicename'],
                                    user=_cfg['dbuser'],
                                    password=_cfg['dbpw'],
                                    host=_cfg['host'],
                                    port=_cfg['port'])
            return conn
        except Exception:
            printLog('E', 'Exception occur at DB Connection.' + '\n' + traceback.format_exc())


    def executeSql(_cfg, _sql):
        # Initialise both names so the finally block is safe if connecting fails
        conn = None
        cur = None
        try:
            conn = connDB(_cfg)
            cur = conn.cursor()
            cur.execute(_sql)

            # Return only the first column of each row
            results = cur.fetchall()
            return list(map(lambda x: x[0], results))
        except Exception:
            printLog('E', 'Exception occur at Execute SQL.' + '\n' + traceback.format_exc())
        finally:
            if cur is not None:
                cur.close()
            if conn is not None:
                conn.rollback()
                conn.close()

Log handling

Define the log output levels; at the error level, processing terminates.

    import logging
    from datetime import datetime

    # One log file per day, e.g. log_20200427.txt
    logging.basicConfig(filename='log_' + datetime.now().strftime('%Y%m%d') + '.txt',
                        level=logging.INFO,
                        format='%(asctime)s - %(levelname)s - %(message)s')

    logLevel = {'D': logging.DEBUG,
                'I': logging.INFO,
                'W': logging.WARNING,
                'E': logging.ERROR,
                'C': logging.CRITICAL}


    def printLog(_lvl, _msg):
        logging.log(logLevel[_lvl], _msg)
        # At the error level, show the message on the console and stop
        if logging.ERROR == logLevel[_lvl]:
            print(_msg)
            exit()


    # Example calls (srcPath is defined elsewhere in the script);
    # the 'E' call prints the message and exits
    printLog('E', 'srcpath is not an existing path.')
    printLog('I', 'Get Src Path : %s' % srcPath)
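One variation on the level-code idea is to stop on ERROR and on anything more severe, so that a CRITICAL message also ends processing. The sketch below is an assumption-laden illustration rather than the original script: `make_logger`, `log_and_maybe_exit`, and `ListHandler` are hypothetical names, and the in-memory handler exists only so the behaviour can be observed without writing a log file.

```python
import logging
import sys

LOG_LEVELS = {'D': logging.DEBUG,
              'I': logging.INFO,
              'W': logging.WARNING,
              'E': logging.ERROR,
              'C': logging.CRITICAL}


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps messages in a list (for the demo only)."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


def make_logger(name, handler):
    """Create a logger that passes INFO and above to the given handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep demo output away from the root logger
    return logger


def log_and_maybe_exit(logger, level_code, message):
    """Log the message; terminate for ERROR and anything more severe."""
    level = LOG_LEVELS[level_code]
    logger.log(level, message)
    if level >= logging.ERROR:
        print(message)
        sys.exit(1)


handler = ListHandler()
demo_logger = make_logger('demo', handler)
log_and_maybe_exit(demo_logger, 'I', 'Get Src Path : %s' % './src')
log_and_maybe_exit(demo_logger, 'D', 'hidden')  # below INFO, so it is dropped

try:
    log_and_maybe_exit(demo_logger, 'C', 'fatal problem')
    exited = False
except SystemExit:
    exited = True
```

Using `sys.exit()` instead of the interactive `exit()` helper is a deliberate choice in this sketch: `exit()` is provided by the `site` module and is not guaranteed to exist in every environment.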

Using the map function

Process list elements in bulk: truncate each string at its second underscore character.

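    def getPreOfNm(x):
        # Keep the part before the second underscore, if there are at least two
        if 1 < x.count('_'):
            return x[0:x.find('_', x.find('_') + 1)]
        else:
            return x


    # Get prefix of CRUD object name (lstTb is a list of names defined elsewhere)
    prefixObjNm = list(set(map(getPreOfNm, lstTb)))
    prefixObjNm.sort()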
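To make the truncation rule concrete, here is a small worked example. It is a sketch under stated assumptions: `get_prefix` is a hypothetical rewrite of the same rule, and the table names in `sample_tables` are made-up sample data.

```python
def get_prefix(name):
    """Return the part of name before its second underscore, or name unchanged."""
    first = name.find('_')
    second = name.find('_', first + 1) if first != -1 else -1
    return name[:second] if second != -1 else name


# Made-up sample names: two share a prefix, two have fewer than two underscores
sample_tables = ['t_order_detail', 't_order_head', 'm_user', 'log']

# map() applies the function to every element; set() removes duplicates
prefixes = sorted(set(map(get_prefix, sample_tables)))
print(prefixes)  # ['log', 'm_user', 't_order']
```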

Directory handling

Distinguish directories from files; split a directory path; get the file name from a full path.

    import os

    # Check the srcPath
    # (srcPath and the compiled regular expression fileNmReg are defined elsewhere)
    fullFilePaths = []
    if os.path.isdir(srcPath):
        for folderName, subFolders, fileNames in os.walk(srcPath):
            # Skip files that sit directly in a folder named 'tcs' or 'doc'
            if os.path.split(folderName)[1] in ['tcs', 'doc']:
                continue
            for fn in fileNames:
                # Get src file
                mObj = fileNmReg.search(fn)
                if mObj:
                    fullFilePaths.append(os.path.join(folderName, fn))
    elif os.path.isfile(srcPath):
        # Get src file
        fn = os.path.basename(os.path.realpath(srcPath))
        mObj = fileNmReg.search(fn)
        if mObj:
            fullFilePaths.append(srcPath)
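If the intent is to ignore everything below the excluded folders, including their subfolders, `os.walk` allows the list of subfolders to be edited in place. The following is a sketch of that variation, not the original behaviour: `collect_files` is a hypothetical helper, and the file names in the demo are made up.

```python
import os
import re
import tempfile


def collect_files(src_path, name_pattern, skip_dirs=('tcs', 'doc')):
    """Collect paths under src_path whose file name matches name_pattern.

    collect_files is a hypothetical helper written for illustration.
    """
    matches = []
    if os.path.isdir(src_path):
        for folder, sub_folders, file_names in os.walk(src_path):
            # Editing sub_folders in place stops os.walk from descending into them
            sub_folders[:] = [d for d in sub_folders if d not in skip_dirs]
            for fn in file_names:
                if name_pattern.search(fn):
                    matches.append(os.path.join(folder, fn))
    elif os.path.isfile(src_path):
        if name_pattern.search(os.path.basename(src_path)):
            matches.append(src_path)
    return matches


# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as root:
    for rel in ('a.sql', os.path.join('doc', 'b.sql'),
                os.path.join('sub', 'c.sql'), os.path.join('sub', 'd.txt')):
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, 'w').close()

    pattern = re.compile(r'\.sql$')
    found = collect_files(root, pattern)
    found_names = sorted(os.path.relpath(p, root) for p in found)
    single = collect_files(os.path.join(root, 'a.sql'), pattern)
    single_names = [os.path.basename(p) for p in single]

print(found_names)
```

In the demo, `doc/b.sql` is left out because the `doc` folder is pruned, and `sub/d.txt` is left out because its name does not match the pattern.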

Reading PDF files

Source: https://www.cnblogs.com/alexzhang92/p/11488949.html

    from io import StringIO

    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFResourceManager, process_pdf
    import os


    def read_pdf(pdf):
        # pdf is a PDF file object opened in binary mode
        # resource manager
        rsrcmgr = PDFResourceManager()
        retstr = StringIO()
        laparams = LAParams()
        # device
        device = TextConverter(rsrcmgr, retstr, laparams=laparams)
        process_pdf(rsrcmgr, device, pdf)
        device.close()
        content = retstr.getvalue()
        retstr.close()
        # Get all lines
        contents = str(content).split("\n")

        return contents
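Once the PDF text is available as a list of lines, the filtering step can be a plain substring search. The sketch below is an assumption about how such a filter might look, not code from the original scripts: `filter_lines` is a hypothetical helper, and `sample_lines` stands in for the list that `read_pdf` would return.

```python
def filter_lines(lines, keyword):
    """Return (line_number, stripped_line) pairs for lines containing keyword.

    filter_lines is a hypothetical helper; line numbers start at 1.
    """
    return [(no, line.strip())
            for no, line in enumerate(lines, start=1)
            if keyword in line]


# Made-up sample standing in for the lines extracted from a PDF
sample_lines = ['第一章 概述', '  物流系统的设计  ', '', '仓储与物流']
hits = filter_lines(sample_lines, '物流')
print(hits)  # [(2, '物流系统的设计'), (4, '仓储与物流')]
```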

Exporting CSV files

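    import csv
    import os
    from datetime import datetime

    # Init result file (srcPath is defined elsewhere); the name carries a timestamp
    rstFile = open(os.path.join(srcPath, '[CRUD]' + datetime.now().strftime('%Y%m%d%H%M%S') + '.csv'), 'w', newline='')
    # Columns are separated by tabs, rows by '\n'
    rstWtr = csv.writer(rstFile, delimiter='\t', lineterminator='\n')
    # Write head
    rstWtr.writerow(['TYPE', 'CI', 'ENCODE', 'LINE NUM', 'CRUD', 'TABLE NM', 'FULL PATH'])
    # Call rstFile.close() once all rows have been written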
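Wrapping the export in a `with` block closes the file even if writing a row fails. This is a minimal sketch under stated assumptions: `export_rows` is a hypothetical helper, the explicit `encoding='utf-8'` is my addition, and the header and row in the demo are a shortened, made-up sample.

```python
import csv
import os
import tempfile
from datetime import datetime


def export_rows(dir_path, header, rows):
    """Write header and rows to a timestamped tab-separated file; return its path.

    export_rows is a hypothetical helper written for illustration.
    """
    file_name = '[CRUD]' + datetime.now().strftime('%Y%m%d%H%M%S') + '.csv'
    file_path = os.path.join(dir_path, file_name)
    # newline='' lets the csv module control line endings itself
    with open(file_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        writer.writerow(header)
        writer.writerows(rows)
    return file_path


# Demo in a throwaway directory, then read the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = export_rows(tmp_dir,
                           ['TYPE', 'CRUD', 'TABLE NM'],
                           [['sql', 'R', 't_order']])
    out_name = os.path.basename(out_path)
    with open(out_path, encoding='utf-8', newline='') as f:
        written = f.read()

print(repr(written))
```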

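Reading and writing Excel files

Use openpyxl to read and write xlsx files; xls is not supported.

Get the workbook, the worksheet, and cells (by direct address and by relative position).

Assign values to cells.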

    import os, openpyxl

    # Init file path
    srcPath = r'.\newDocs'

    # Start Search
    for folderName, subFolders, fileNames in os.walk(srcPath):

        for fileName in fileNames:

            filePath = os.path.join(folderName, fileName)

            try:
                wb_WorkBook = openpyxl.load_workbook(filePath)
            except openpyxl.utils.exceptions.InvalidFileException:
                # Loading failed, so there is no workbook to close
                print(fileName + '\t' + 'xls read failed')
                continue

            # _履歴 (history) sheet
            # Indexing a missing sheet raises KeyError, so test membership first
            if '_履歴' not in wb_WorkBook.sheetnames:
                print(fileName + '\t' + '_履歴 does not exist')
                wb_WorkBook.close()
                continue
            st_Rireki = wb_WorkBook['_履歴']

            # Start at AQ10 and move down 5 rows at a time to the first empty cell
            cl_Version = st_Rireki['AQ10']
            while True:
                if cl_Version.value is None or \
                        0 == len(str(cl_Version.value).strip()):
                    cl_Version.value = '1.0.2.0'

                    # The cell one row above the version cell holds the job number
                    cl_JobNo = st_Rireki.cell(row=cl_Version.row - 1, column=cl_Version.column)
                    cl_JobNo.value = '123'

                    wb_WorkBook.save(filePath)

                    break

                else:
                    cl_Version = st_Rireki.cell(row=cl_Version.row + 5, column=cl_Version.column)

            wb_WorkBook.close()

Please credit the original link when reposting. Thank you.

Original link: https://www.cnblogs.com/soulxj/p/12788250.html
