How to batch-read files in Python: Python 3 automation and batch file processing (text, PDF, Excel; reading, filtering, exporting)...

Latest recommended article published on 2023-08-30 18:43:38

weixin_39714763


Tags: how to batch-read files in Python

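Using the concise syntax of Python 3 as a scripting language and the rich libraries of a high-level language, I quickly wrote a few "scripts" for reading, filtering, and exporting files.

This post briefly summarizes the key techniques.

Reading an ini configuration file

Check that the ini file exists; check that the key entered by the user is defined in the ini file.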

import configparser
import os


def getConfigInfo(_ini_nm):
    # Open Ini File. config.read() returns the list of files it could parse,
    # so an empty result means the file is missing or unreadable.
    config = configparser.ConfigParser()
    if not config.read(os.path.join(os.getcwd(), _ini_nm + '.ini')):
        printLog('E', 'Read Ini file fail.')

    while True:
        sysCode = input('Please input the system code : (Press [Enter] to quit):').strip()
        if 0 == len(sysCode):
            exit()

        # Init ConnectionSettings: return the requested section as a dict
        if sysCode in config.sections():
            return dict(config[sysCode])
        else:
            print('Ini info of System [%s] is blank.\n' % sysCode)
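The section lookup above can be tried without a file on disk. The following is a minimal sketch, not the author's script: `read_string` stands in for `config.read`, and the section name `SYS01`, the helper `get_section`, and all values are hypothetical. The key names mirror those used in the PostgreSQL section of this post.

```python
import configparser

# Hypothetical ini content; the section name and values are made up for illustration.
SAMPLE_INI = """
[SYS01]
servicename = sampledb
dbuser = sampleuser
host = localhost
port = 5432
"""


def get_section(ini_text, section):
    """Return the section as a dict, or None when the section is not defined."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    if section in config.sections():
        return dict(config[section])
    return None


settings = get_section(SAMPLE_INI, 'SYS01')
missing = get_section(SAMPLE_INI, 'SYS99')
```

configparser returns every value as a string, so `port` comes back as `'5432'` and has to be converted by the caller if a number is needed.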

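Reading multi-parameter input

Check the number of parameters; check that the parameters are valid (length, whether the path is a directory); check that the keyword consists entirely of Chinese characters.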

import os


def _main():
    path = ''
    keyWord = ''

    while True:
        para = input('Please input the PDF directory and Key Word: (Press [Enter] to quit):').strip().split()

        # Empty input: quit
        if 0 == len(para):
            exit()

        if 2 != len(para):
            print('Two parameters -> [PDF directory and Key Word] are needed.' + '\n')
            continue

        path = para[0]
        keyWord = para[1]

        # The first parameter must be an existing directory
        if not os.path.isdir(path):
            print('The input path is not an existing directory.' + '\n')
            continue

        # Every character must lie in U+4E00..U+9FA5; both boundary characters are valid
        flg = True
        for char in keyWord.strip():
            if char < '\u4e00' or char > '\u9fa5':
                flg = False
                break
        if not flg:
            print('Please input the Chinese Key Word for search.(Such as \'物流\').' + '\n')
            continue

        break
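The checks are easier to test when separated from the `input` loop. The following is a small sketch under that assumption; the helper names `is_all_chinese` and `parse_args` are hypothetical and do not appear in the original script.

```python
def is_all_chinese(text):
    """Return True when text is non-empty and every character is in U+4E00..U+9FA5."""
    return bool(text) and all('\u4e00' <= ch <= '\u9fa5' for ch in text)


def parse_args(line):
    """Split an input line into (path, keyword); return None unless there are exactly two tokens."""
    parts = line.strip().split()
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


parsed = parse_args('  ./pdfs 物流 ')
```

Because the line is split on whitespace, a directory whose path contains spaces cannot be passed this way.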

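PostgreSQL database handling

Using the connection settings defined in the ini file, try to connect to the database, then execute an SQL statement.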

import psycopg2
import traceback


def connDB(_cfg):
    try:
        conn = psycopg2.connect(database=_cfg['servicename'],
                                user=_cfg['dbuser'],
                                password=_cfg['dbpw'],
                                host=_cfg['host'],
                                port=_cfg['port'])
        return conn
    except Exception:
        printLog('E', 'Exception occur at DB Connection.' + '\n' + traceback.format_exc())


def executeSql(_cfg, _sql):
    # Initialise both names so the finally block is safe even if connecting fails
    conn = None
    cur = None
    try:
        conn = connDB(_cfg)
        cur = conn.cursor()
        cur.execute(_sql)

        # Return the first column of every row
        results = cur.fetchall()
        return list(map(lambda x: x[0], results))
    except Exception:
        printLog('E', 'Exception occur at Execute SQL.' + '\n' + traceback.format_exc())
    finally:
        if cur is not None:
            cur.close()
        if conn is not None:
            conn.rollback()
            conn.close()
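The connect, query, and clean-up pattern above can be tried without a PostgreSQL server. The sketch below swaps in the standard-library `sqlite3` module, which follows the same DB-API conventions as psycopg2; the table, the rows, and the helper `first_column` are hypothetical.

```python
import sqlite3
from contextlib import closing


def first_column(conn, sql, params=()):
    """Run a query and return the first column of every row."""
    # closing() guarantees cur.close() even if execute() raises
    with closing(conn.cursor()) as cur:
        cur.execute(sql, params)
        return [row[0] for row in cur.fetchall()]


# An in-memory sqlite3 database, used only as a stand-in for PostgreSQL
with closing(sqlite3.connect(':memory:')) as conn:
    conn.execute('CREATE TABLE tb (nm TEXT, kind TEXT)')
    conn.executemany('INSERT INTO tb VALUES (?, ?)',
                     [('t_order_head', 'T'), ('v_order', 'V'), ('t_stock', 'T')])
    names = first_column(conn, 'SELECT nm FROM tb WHERE kind = ? ORDER BY nm', ('T',))
```

With psycopg2 the placeholder is `%s` rather than `?`; in both libraries, passing values as parameters is safer than building the SQL string by hand.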

Log handling

Define the levels at which log messages are written; when an error-level message is logged, processing stops.

import logging
from datetime import datetime

# One log file per day, e.g. log_20230830.txt
logging.basicConfig(filename='log_' + datetime.now().strftime('%Y%m%d') + '.txt',
                    level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s')

logLevel = {'D': logging.DEBUG,
            'I': logging.INFO,
            'W': logging.WARNING,
            'E': logging.ERROR,
            'C': logging.CRITICAL}


def printLog(_lvl, _msg):
    logging.log(logLevel[_lvl], _msg)
    # An error-level message is also printed, then the script stops
    if logging.ERROR == logLevel[_lvl]:
        print(_msg)
        exit()


# Example calls, taken from different places in the scripts
printLog('E', 'srcpath is not a exists path.')
printLog('I', 'Get Src Path : %s' % srcPath)
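To see which messages are recorded and when the script stops, the same idea can be run against an in-memory handler. This is a sketch with two deliberate differences from the code above, both assumptions of mine: it stops on ERROR and anything more severe (so CRITICAL stops too), and it raises `SystemExit` directly instead of calling `exit()`. The logger name, `ListHandler`, and `print_log` are hypothetical.

```python
import logging

LOG_LEVEL = {'D': logging.DEBUG, 'I': logging.INFO, 'W': logging.WARNING,
             'E': logging.ERROR, 'C': logging.CRITICAL}

logger = logging.getLogger('batch_demo')  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False


class ListHandler(logging.Handler):
    """Collect formatted messages in memory instead of writing a file."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


handler = ListHandler()
handler.setFormatter(logging.Formatter('%(levelname)s - %(message)s'))
logger.addHandler(handler)


def print_log(lvl, msg):
    """Log msg; stop the script for ERROR and anything more severe."""
    level = LOG_LEVEL[lvl]
    logger.log(level, msg)
    if level >= logging.ERROR:
        raise SystemExit(msg)


print_log('D', 'not recorded: below INFO')
print_log('I', 'Get Src Path : ./src')
try:
    print_log('C', 'fatal problem')
    stopped = False
except SystemExit:
    stopped = True
```

The DEBUG message is dropped because the logger level is INFO, which matches the `level=logging.INFO` setting used with `basicConfig`.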

Using the map function

Process the elements of a list in bulk, cutting each string at its second underscore.

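def getPreOfNm(x):
    # Keep the part before the second underscore; names with fewer underscores are kept whole
    if 1 < x.count('_'):
        return x[0:x.find('_', x.find('_') + 1)]
    else:
        return x


# Get prefix of CRUD object name (lstTb is the list of table names built elsewhere)
prefixObjNm = list(set(map(getPreOfNm, lstTb)))
prefixObjNm.sort()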
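A short worked example shows what the truncation does to concrete names. The table names below are hypothetical, and the function is restated in snake_case only so the block runs on its own.

```python
def get_pre_of_nm(name):
    """Cut name at its second underscore; names with fewer underscores are kept whole."""
    if name.count('_') > 1:
        return name[:name.find('_', name.find('_') + 1)]
    return name


# Hypothetical table names
lst_tb = ['t_order_head', 't_order_detail', 't_stock', 'm_item_price_hist', 'users']

# set() removes the duplicate 't_order'; sorted() gives a stable order
prefixes = sorted(set(map(get_pre_of_nm, lst_tb)))
# prefixes == ['m_item', 't_order', 't_stock', 'users']
```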

Directory handling

Telling directories from files; splitting a directory path; getting the file name from a full path.

# Check the srcPath (srcPath and the compiled pattern fileNmReg are defined elsewhere)
fullFilePaths = []
if os.path.isdir(srcPath):
    for folderName, subFolders, fileNames in os.walk(srcPath):
        # Skip files that sit directly in a folder named 'tcs' or 'doc'
        if os.path.split(folderName)[1] in ['tcs', 'doc']:
            continue
        for fn in fileNames:
            # Get src file
            mObj = fileNmReg.search(fn)
            if mObj:
                fullFilePaths.append(os.path.join(folderName, fn))
elif os.path.isfile(srcPath):
    # Get src file
    fn = os.path.basename(os.path.realpath(srcPath))
    mObj = fileNmReg.search(fn)
    if mObj:
        fullFilePaths.append(srcPath)
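The walk can be exercised against a temporary directory tree. The sketch below is a variation, not the original logic: it prunes `tcs` and `doc` by editing the sub-folder list in place, so `os.walk` skips those folders and everything beneath them, whereas the code above only skips files directly inside them. The file-name pattern and the helper `collect_files` are hypothetical.

```python
import os
import re
import tempfile

file_nm_reg = re.compile(r'\.(sql|java)$', re.IGNORECASE)  # hypothetical pattern
SKIP_DIRS = {'tcs', 'doc'}


def collect_files(src_path):
    """Return matching file paths under src_path, or src_path itself if it is a matching file."""
    found = []
    if os.path.isdir(src_path):
        for folder, sub_folders, file_names in os.walk(src_path):
            # Editing sub_folders in place stops os.walk from descending into them
            sub_folders[:] = [d for d in sub_folders if d not in SKIP_DIRS]
            for fn in file_names:
                if file_nm_reg.search(fn):
                    found.append(os.path.join(folder, fn))
    elif os.path.isfile(src_path) and file_nm_reg.search(os.path.basename(src_path)):
        found.append(src_path)
    return sorted(found)


with tempfile.TemporaryDirectory() as root:
    sample_files = ['a.sql', 'readme.txt', os.path.join('sub', 'B.JAVA'),
                    os.path.join('doc', 'c.sql'), os.path.join('doc', 'deep', 'd.sql')]
    for rel in sample_files:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, 'w').close()
    relative = [os.path.relpath(p, root) for p in collect_files(root)]
```

Here `relative` contains only `a.sql` and `sub/B.JAVA`: the text file does not match the pattern, and both files under `doc` are skipped.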

Reading PDF files

Source: https://www.cnblogs.com/alexzhang92/p/11488949.html

# Note: process_pdf is imported here as in the source; as far as I know it is
# provided by the pdfminer3k port and is not exported by pdfminer.six.
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFResourceManager, process_pdf
from io import StringIO
import os


def read_pdf(pdf):
    # pdf is a PDF file object opened in binary mode, e.g. open(path, 'rb')
    # resource manager
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    laparams = LAParams()
    # device
    device = TextConverter(rsrcmgr, retstr, laparams=laparams)
    process_pdf(rsrcmgr, device, pdf)
    device.close()
    content = retstr.getvalue()
    retstr.close()
    # Split the extracted text into lines
    contents = str(content).split("\n")

    return contents
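Once the text is available as a list of lines, filtering by keyword needs only plain Python. This is a sketch of one way it could be done; the helper `find_keyword_lines` and the sample lines are hypothetical, and the sample list stands in for the value returned by `read_pdf`.

```python
def find_keyword_lines(lines, keyword):
    """Return (line_number, text) pairs for lines containing keyword; numbering starts at 1."""
    hits = []
    for no, line in enumerate(lines, start=1):
        text = line.strip()
        if keyword in text:
            hits.append((no, text))
    return hits


# Hypothetical lines standing in for the list returned by read_pdf
sample_lines = ['第一章 概要', '  物流系统的构成 ', '', '仓库与物流中心']
hits = find_keyword_lines(sample_lines, '物流')
# hits == [(2, '物流系统的构成'), (4, '仓库与物流中心')]
```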

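Exporting CSV files

import csv
import os
from datetime import datetime

# Init result file: a tab-separated file named [CRUD]<timestamp>.csv under srcPath.
# newline='' lets the csv module control line endings itself.
rstFile = open(os.path.join(srcPath, '[CRUD]' + datetime.now().strftime('%Y%m%d%H%M%S') + '.csv'), 'w', newline='')
rstWtr = csv.writer(rstFile, delimiter='\t', lineterminator='\n')
# Write head
rstWtr.writerow(['TYPE', 'CI', 'ENCODE', 'LINE NUM', 'CRUD', 'TABLE NM', 'FULL PATH'])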
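The writer settings can be checked against an in-memory buffer before pointing them at a real file. In this sketch, `io.StringIO` stands in for the opened file, and the helper `write_rows` and the sample row are hypothetical.

```python
import csv
import io

HEADER = ['TYPE', 'CI', 'ENCODE', 'LINE NUM', 'CRUD', 'TABLE NM', 'FULL PATH']


def write_rows(stream, rows):
    """Write the header and the rows as tab-separated text to an open text stream."""
    writer = csv.writer(stream, delimiter='\t', lineterminator='\n')
    writer.writerow(HEADER)
    writer.writerows(rows)


# Hypothetical result row; io.StringIO stands in for a file opened with newline=''
buf = io.StringIO()
write_rows(buf, [['java', 'ci01', 'utf-8', 12, 'R', 't_order', 'src/Order.java']])
output = buf.getvalue()
```

For a real file, wrapping the call as `with open(path, 'w', newline='') as f: write_rows(f, rows)` closes the file even if writing fails.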

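Reading and writing Excel files

Use openpyxl to read and write xlsx files; xls is not supported.

Get the workbook, a worksheet, and cells (by direct address and by relative position).

Assign values to cells.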

import os
import openpyxl
from openpyxl.utils.exceptions import InvalidFileException

# Init file path
srcPath = r'.\newDocs'

# Start Search
for folderName, subFolders, fileNames in os.walk(srcPath):
    for fileName in fileNames:
        filePath = os.path.join(folderName, fileName)

        try:
            wb_WorkBook = openpyxl.load_workbook(filePath)
        except InvalidFileException:
            # No workbook was opened, so there is nothing to close here
            print(fileName + '\t' + 'xls read failed')
            continue

        # _履歴 (history sheet): indexing a missing sheet raises KeyError,
        # so test the sheet name first
        if '_履歴' not in wb_WorkBook.sheetnames:
            print(fileName + '\t' + '_履歴 does not exist')
            wb_WorkBook.close()
            continue
        st_Rireki = wb_WorkBook['_履歴']

        # Start at AQ10 and step down 5 rows at a time to the first empty version cell
        cl_Version = st_Rireki['AQ10']
        while True:
            if cl_Version.value is None or 0 == len(str(cl_Version.value).strip()):
                cl_Version.value = '1.0.2.0'

                # The job number goes in the cell directly above the version cell
                cl_JobNo = st_Rireki.cell(row=cl_Version.row - 1, column=cl_Version.column)
                cl_JobNo.value = '123'

                wb_WorkBook.save(filePath)
                break
            else:
                cl_Version = st_Rireki.cell(row=cl_Version.row + 5, column=cl_Version.column)

        wb_WorkBook.close()

Please cite the original link when reposting. Thank you.

Original link: https://www.cnblogs.com/soulxj/p/12788250.html