利用Python批量读取文件

近日,从欧盟统计局下载了一些数据,下载的数据表中的指标、地区等都是用的编码,而编码具体代表的含义则需要参照对应的字典文件。然而字典文件有500多个,为了查看全部字典文件的信息,于是想用Python把这些字典文件合并成一个文件。
在这里插入图片描述
在这里插入图片描述
基本思路是利用os.listdir读取文件夹下的文件列表,然后依次读取并追加每个文件的内容,最终保存成csv文件,具体代码如下。

import pandas as pd
import numpy as np
import os

path='data/EUROSTAT/all_dic/'  ##所在文件夹目录

dicnum=[]  ##记录文件编号
dicname=[] ##记录文件名称
code=[]    ##记录每个文件中编码字段
detail=[]  ##记录每个文件中名称字段
i=0
j=0   
for name in os.listdir(r'data/EUROSTAT/all_dic'):  ##利用os.listdir读取文件夹下的文件列表
    i+=1
    file=path+name   
    print(i,file)   ##打印文件的编号与文件名
    temp=pd.read_csv(file,encoding='utf-8',sep=' ',delimiter="\t")   
    print(len(temp))   ##打印每个文件的记录数
    j+=len(temp)
    dicnum.extend([i]*len(temp))
    dicname.extend([name]*len(temp))
    code.extend(temp.ix[:,0].values)
    detail.extend(temp.ix[:,1].values)
#print(code)
print(j)

data=pd.DataFrame()
#print (data)
data["dicnum"]=dicnum
data["dicname"]=dicname
data["code"]=code
data['detail']=detail
print(data.head())
data.to_csv('data/EUROSTAT/EuroStat_dic.csv',encoding='utf-8',index=1)

在这里插入图片描述

ps:初衷是通过撰写博文记录自己所学所用,实现知识的梳理与积累;将其分享,希望能够帮到面临同样困惑的小伙伴儿。如发现博文中存在问题,欢迎随时交流~~

% Known encoding formats are the following FDSN codes: % 0: ASCII % 1: 16-bit integer % 2: 24-bit integer (untested) % 3: 32-bit integer % 4: IEEE float32 % 5: IEEE float64 % 10: Steim-1 % 11: Steim-2 % 12: GEOSCOPE 24-bit (untested) % 13: GEOSCOPE 16/3-bit gain ranged % 14: GEOSCOPE 16/4-bit gain ranged (untested) % 19: Steim-3 (alpha and untested) % % See also MKMSEED to export data in miniSEED format. % % % Author: Franois Beauducel % Institut de Physique du Globe de Paris % Created: 2010-09-17 % Updated: 2012-04-21 % % Acknowledgments: % Ljupco Jordanovski, Jean-Marie Saurel, Mohamed Boubacar, Jonathan Berger, % Shahid Ullah. % % References: % IRIS (2010), SEED Reference Manual: SEED Format Version 2.4, May 2010, % IFDSN/IRIS/USGS, http://www.iris.edu % Trabant C. (2010), libmseed: the Mini-SEED library, IRIS DMC. % Steim J.M. (1994), 'Steim' Compression, Quanterra Inc. % History: % [2012-04-21] % - Correct bug with Steim + little-endian coding % (thanks to Shahid Ullah) % [2012-03-21] % - Adds IDs for warning messages % [2011-11-10] % - Correct bug with multiple channel name length (thanks to % Jonathan Berger) % [2011-10-27] % - Add LocationIdentifier to X.ChannelFullName % [2011-10-24] % - Validation of IEEE double encoding (with PQL) % - Import/plot data even with file integrity problem (like PQL) % [2011-07-21] % - Validation of ASCII encoding format (logs) % - Blockettes are now stored in substructures below a single % field X.BLOCKETTES % - Add import of blockettes 500 and 2000 % - Accept multi-channel files with various data coding % [2010-10-16] % - Alpha-version of Steim-3 decoding... % - Extend output parameters with channel detection % - Add gaps and overlaps on plots % - Add possibility to force the plot % [2010-10-02] % - Add the input formats for GEOSCOPE multiplexed old data files % - Additional output argument with gap and overlap analysis % - Create a plot when no output argument are specified % - Optimize script coding (30 times faster STEIM decoding!) % % [2010-09-28] % - Correction of a problem with STEIM-1 nibble 3 decoding (one % 32-bit difference) % - Add reading of files without blockette 1000 with additional % input arguments (like Seismic Handler output files). % - Uses warning() function instead of fprintf().
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值