Python Pandas: Creating a DataFrame by Reading the Contents of a txt File

1. Sample txt file contents

Alabama[edit]

Auburn (Auburn University)[1]

Florence (University of North Alabama)

Jacksonville (Jacksonville State University)[2]

Livingston (University of West Alabama)[2]

Montevallo (University of Montevallo)[2]

Troy (Troy University)[2]

Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3][4]

Tuskegee (Tuskegee University)[5]

Alaska[edit]

Fairbanks (University of Alaska Fairbanks)[2]

Arizona[edit]

Flagstaff (Northern Arizona University)[6]

Tempe (Arizona State University)

Tucson (University of Arizona)

Arkansas[edit]

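2. Creating the DataFrame with pd.DataFrame.from_records()

import pandas as pd
from collections import namedtuple

Item = namedtuple('Item', 'state area')
items = []

with open('unis.txt') as f:
    for line in f:
        text = line.rstrip('\n')
        if text.endswith('[edit]'):
            # State header line: slice off the '[edit]' suffix.
            # (str.rstrip('[edit]') would strip a set of characters, not the suffix.)
            state = text[:-len('[edit]')]
        else:
            # Region line: keep only the text before ' ('
            i = text.index(' (')
            area = text[:i]
            items.append(Item(state, area))

df = pd.DataFrame.from_records(items, columns=['State', 'Area'])
print(df)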

Output:

      State          Area
0   Alabama        Auburn
1   Alabama      Florence
2   Alabama  Jacksonville
3   Alabama    Livingston
4   Alabama    Montevallo
5   Alabama          Troy
6   Alabama    Tuscaloosa
7   Alabama      Tuskegee
8    Alaska     Fairbanks
9   Arizona     Flagstaff
10  Arizona         Tempe
11  Arizona        Tucson
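When removing the `[edit]` suffix from a state line, note that `str.rstrip('[edit]')` strips any trailing characters from the set `[`, `e`, `d`, `i`, `t`, `]`, not the literal suffix. The states in the sample happen to survive this, but a name ending in one of those letters would be shortened. A minimal sketch of the difference follows; `strip_edit_suffix` is a hypothetical helper name, and `'Vermont[edit]'` is an illustrative line that does not appear in the sample file.

```python
def strip_edit_suffix(line, suffix='[edit]'):
    """Return line without the trailing suffix, if present (hypothetical helper)."""
    if line.endswith(suffix):
        return line[:-len(suffix)]
    return line

# rstrip treats its argument as a set of characters to remove from the right
print('Alabama[edit]'.rstrip('[edit]'))    # Alabama
print('Vermont[edit]'.rstrip('[edit]'))    # Vermon
print(strip_edit_suffix('Vermont[edit]'))  # Vermont
```

On Python 3.9 or later, `str.removesuffix('[edit]')` gives the same result as the helper.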

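3. Creating the DataFrame with pd.read_csv()

# ';' does not occur in the data, so each line is read into a single column
df = pd.read_csv('filename.txt', sep=';', names=['Region Name'])
# Lines ending in '[edit]' are state headers; forward-fill the state onto the rows below
df.insert(0, 'State', df['Region Name'].str.extract(r'(.*)\[edit\]', expand=False).ffill())
# Remove everything from ' (' onward; regex=True is needed on pandas 2.0+
df['Region Name'] = df['Region Name'].str.replace(r' \(.+$', '', regex=True)
# Drop the state header rows themselves
df = df[~df['Region Name'].str.contains(r'\[edit\]')].reset_index(drop=True)
print(df)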

Output:

      State   Region Name
0   Alabama        Auburn
1   Alabama      Florence
2   Alabama  Jacksonville
3   Alabama    Livingston
4   Alabama    Montevallo
5   Alabama          Troy
6   Alabama    Tuscaloosa
7   Alabama      Tuskegee
8    Alaska     Fairbanks
9   Arizona     Flagstaff
10  Arizona         Tempe
11  Arizona        Tucson
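The `str.extract(...).ffill()` step is easier to follow when the intermediate column is printed. The sketch below is an illustration under stated assumptions: it uses an in-memory string (via `io.StringIO`) holding a few lines from the sample instead of a file on disk, and the variable names `extracted` and `filled` are introduced here only for the demonstration.

```python
import io
import pandas as pd

# In-memory stand-in for the txt file, using a few lines from the sample
sample = (
    "Alabama[edit]\n"
    "Auburn (Auburn University)[1]\n"
    "Florence (University of North Alabama)\n"
    "Alaska[edit]\n"
    "Fairbanks (University of Alaska Fairbanks)[2]\n"
)

df = pd.read_csv(io.StringIO(sample), sep=';', names=['Region Name'])

# Only the state header lines match the pattern; every other row is missing
extracted = df['Region Name'].str.extract(r'(.*)\[edit\]', expand=False)
print(extracted.isna().tolist())  # [False, True, True, False, True]

# ffill() copies the last seen state name down over the missing rows
filled = extracted.ffill()
print(filled.tolist())  # ['Alabama', 'Alabama', 'Alabama', 'Alaska', 'Alaska']
```

After this step every row carries its state, so the header rows can be filtered out without losing that information.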

Alternatively, to keep the full region text, omit the str.replace step:

df = pd.read_csv('filename.txt', sep=';', names=['Region Name'])
df.insert(0, 'State', df['Region Name'].str.extract(r'(.*)\[edit\]', expand=False).ffill())
df = df[~df['Region Name'].str.contains(r'\[edit\]')].reset_index(drop=True)
print(df)

Output:

      State                                        Region Name
0   Alabama                      Auburn (Auburn University)[1]
1   Alabama             Florence (University of North Alabama)
2   Alabama    Jacksonville (Jacksonville State University)[2]
3   Alabama         Livingston (University of West Alabama)[2]
4   Alabama           Montevallo (University of Montevallo)[2]
5   Alabama                          Troy (Troy University)[2]
6   Alabama  Tuscaloosa (University of Alabama, Stillman Co...
7   Alabama                  Tuskegee (Tuskegee University)[5]
8    Alaska      Fairbanks (University of Alaska Fairbanks)[2]
9   Arizona         Flagstaff (Northern Arizona University)[6]
10  Arizona                   Tempe (Arizona State University)
11  Arizona                     Tucson (University of Arizona)
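Keeping the full text leaves room to split it later. As a rough sketch, assuming every region line has the form `Name (schools)` optionally followed by footnote markers, `str.extract` with named groups can separate the two parts. The column names `Town` and `Schools` are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Two rows in the shape produced by the variant that keeps the full text
df = pd.DataFrame({
    'State': ['Alabama', 'Arizona'],
    'Region Name': [
        'Auburn (Auburn University)[1]',
        'Tempe (Arizona State University)',
    ],
})

# Named groups become column names:
# Town is the text before ' (', Schools is the text inside the parentheses
parts = df['Region Name'].str.extract(r'^(?P<Town>.*?) \((?P<Schools>.*)\)')
print(parts)
```

The trailing footnote markers such as `[1]` fall outside both groups and are dropped.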
