Python: batch-reading CSV files into a DataFrame (reading multiple CSV files into a Python Pandas DataFrame)

The general use case behind this question is reading multiple CSV log files from a target directory into a single pandas DataFrame for quick-turnaround statistical analysis and charting. The reason for using pandas rather than MySQL is that this import (or append) and the analysis are run periodically throughout the day.

The script below attempts to read all of the CSV files (which share the same layout) into a single pandas DataFrame and adds a year column for each file read.

The problem is that the script now reads only the very last file in the directory, rather than every file in the target directory as intended.

# Assemble all of the data files into a single DataFrame & add a year field
# 2010 is the last available year
years = range(1880, 2011)

for year in years:
    path ='C:\\Documents and Settings\\Foo\\My Documents\\pydata-book\\pydata-book-master`\\ch02\\names\\yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    pieces.append(frame)

# Concatenates everything into a single DataFrame
names = pd.concat(pieces, ignore_index=True)

# Expected row total should be 1690784
names

Int64Index: 33838 entries, 0 to 33837
Data columns:
name      33838  non-null values
sex       33838  non-null values
births    33838  non-null values
year      33838  non-null values
dtypes: int64(2), object(2)

# Start aggregating the data at the year & gender level using groupby or pivot
total_births = names.pivot_table('births', rows='year', cols='sex', aggfunc=sum)

# Prints pivot table
total_births.tail()

Out[35]:
sex         F        M
year
2010  1759010  1898382
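
The `rows=` and `cols=` keywords in the pivot call above belong to an old pandas API; they were renamed `index=` and `columns=` in pandas 0.14, and current releases no longer accept the old names. As a minimal sketch of the same aggregation on a recent pandas, using a handful of made-up rows rather than the real data set:

```python
import pandas as pd

# Made-up rows in the same layout as the names DataFrame (illustrative only).
names = pd.DataFrame({
    "name": ["Mary", "John", "Anna", "James"],
    "sex": ["F", "M", "F", "M"],
    "births": [10, 12, 8, 9],
    "year": [1880, 1880, 1881, 1881],
})

# index= replaces rows=, columns= replaces cols=.
total_births = names.pivot_table(
    values="births", index="year", columns="sex", aggfunc="sum"
)
print(total_births)
```

The result has one row per year and one column per sex, which is the shape the question's `total_births.tail()` output shows.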

Solution

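The append method of a DataFrame does not behave like the append method of a list. DataFrame.append() does not operate in place; it returns a new object, so the result has to be assigned back to a name. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop below only runs on older releases.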
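
The difference between mutating in place and returning a new object can be seen in a few lines. This is only an illustrative sketch: the two tiny frames hold made-up values, and `pd.concat` stands in for the DataFrame-level operation because it also returns a new object rather than modifying its inputs.

```python
import pandas as pd

# list.append mutates the list in place and returns None.
pieces = []
result = pieces.append("first")
print(result, pieces)

# pd.concat never modifies its inputs; it returns a new DataFrame,
# so the result has to be bound to a name to be kept.
a = pd.DataFrame({"name": ["Mary"], "births": [10]})  # made-up values
b = pd.DataFrame({"name": ["Anna"], "births": [8]})   # made-up values

pd.concat([a, b], ignore_index=True)             # result discarded, a is unchanged
combined = pd.concat([a, b], ignore_index=True)  # result kept

print(len(a), len(combined))
```

A loop that calls a returning-style method without rebinding the result therefore accumulates nothing, no matter how many files it reads.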

years = range(1880, 2011)
names = pd.DataFrame()

for year in years:
    path ='C:\\Documents and Settings\\Foo\\My Documents\\pydata-book\\pydata-book-master`\\ch02\\names\\yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    # append returns a new DataFrame, so rebind names to the result
    names = names.append(frame, ignore_index=True)

Alternatively, use concat:

years = range(1880, 2011)
names = pd.DataFrame()

for year in years:
    path ='C:\\Documents and Settings\\Foo\\My Documents\\pydata-book\\pydata-book-master`\\ch02\\names\\yob%d.txt' % year
    frame = pd.read_csv(path, names=columns)
    frame['year'] = year
    # concat takes a list (or other iterable) of frames as its first argument
    names = pd.concat([names, frame], ignore_index=True)
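
Growing a DataFrame inside the loop copies the accumulated data on every iteration. A common alternative, sketched here under some assumptions, is to collect the per-file frames in a plain list and concatenate once at the end. The helper name `read_yearly_files`, the `COLUMNS` list, and the throwaway demo files are hypothetical and only serve the illustration; the `yob%d.txt` naming follows the question.

```python
import tempfile
from pathlib import Path

import pandas as pd

COLUMNS = ["name", "sex", "births"]  # assumed layout of each yobYYYY.txt file


def read_yearly_files(directory, years, columns=COLUMNS):
    """Read one headerless CSV per year and return a single DataFrame."""
    pieces = []  # a plain list, so append() works in place
    for year in years:
        path = Path(directory) / f"yob{year}.txt"
        frame = pd.read_csv(path, names=columns)
        frame["year"] = year
        pieces.append(frame)
    # A single concat at the end avoids re-copying the growing frame each time.
    return pd.concat(pieces, ignore_index=True)


# Small demo with throwaway files (the rows below are made up).
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "yob1880.txt").write_text("Mary,F,10\nJohn,M,12\n")
(demo_dir / "yob1881.txt").write_text("Anna,F,8\n")

names = read_yearly_files(demo_dir, range(1880, 1882))
print(names)
```

With the real data, the call would take the target directory and `range(1880, 2011)`, and the length of the result can be checked against the expected row total from the question.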
