How to read a txt file in Python: extracting words from a txt file with Python

I want to extract all the words that are between single quotation marks from a text file. The text file looks like this:

u'MMA': 10,

u'acrylic': 19,

u'acting lessons': 2,

u'aerobic': 141,

u'alto': 24,

u'art therapy': 4,

u'ballet': 939,

u'ballroom': 234,

u'banjo': 38,

And ideally, my output would look like this:

MMA,

acrylic,

acting lessons,

...

From browsing posts, it seems like I should use some combination of NLTK / regex for python to accomplish this. I've tried the following:

import re

file = open('artsplus_categories.txt', 'r').readlines()

for line in file:
    list = re.search('^''$', file)

file.close()

And get the following error:

File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 142, in search

return _compile(pattern, flags).search(string)

TypeError: expected string or buffer

I think the error might be caused by how I'm looking for the pattern. My logic is that I search for everything inside of the '....'.

What's tripping up re.py?

Thanks!

--------------------------------

Following Ashwini's comment:

import re

file = open('artsplus_categories.txt', 'r').readlines()

for line in file:
    list = re.search('^''$', line)

print list

#file.close()

But the output contains nothing:

Samuel-Finegolds-MacBook-Pro:~ samuelfinegold$ /var/folders/jv/9_sy0bn10mbdft1bk9t14qz40000gn/T/Cleanup\ At\ Startup/artsplus_categories_clean-393952531.278.py.command ; exit;

None

logout

@Rasco: here's the error I'm getting:

File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 177, in findall

return _compile(pattern, flags).findall(string)

TypeError: expected string or buffer

logout

I'm using this code:

file2 = open('artsplus_categories.txt', 'r').readlines()

list = re.findall("'[^']*'", file2)

for x in list:
    print (x)

Solution

Instead of passing each line to the regex, you passed it the whole list (file). You should pass line to re.search, not file.

for line in file:
    lis = re.search('^''$', line) # line not file
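Passing line removes the TypeError, but the pattern still finds nothing: Python concatenates adjacent string literals, so '^''$' is the pattern '^$', which matches only an empty line. A minimal sketch of both points, using two made-up lines in place of the real file (the names lines, empty_only, list_error and first_word are illustrative, not from the question):

```python
import re

# Two lines standing in for the result of readlines() on the real file.
lines = ["u'MMA': 10,\n", "u'acrylic': 19,\n"]

# '^''$' is two adjacent literals, '^' and '$', joined into '^$'.
# That pattern matches only an empty string, so a non-empty line gives None.
empty_only = re.search('^''$', lines[0])

# Passing the whole list instead of a single string raises TypeError.
try:
    re.search(r"'[^']*'", lines)
    list_error = None
except TypeError as exc:
    list_error = exc

# Passing one line with a capturing group returns the text between the quotes.
match = re.search(r"'([^']*)'", lines[0])
first_word = match.group(1)
print(first_word)
```

The capturing group in parentheses is what lets group(1) return the word without the surrounding quotes.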

Don't use list or file as variable names. They shadow Python built-ins (in Python 2, both are built-in types).
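To see why the naming advice matters, here is a small sketch of what shadowing does. The function name shadowed is hypothetical and exists only to keep the rebinding local:

```python
def shadowed():
    # Rebinding `list` inside this function hides the built-in type here.
    list = ["u'MMA': 10,"]
    try:
        list("abc")          # tries to call a list object, not the type
    except TypeError as exc:
        return exc
    return None

shadow_error = shadowed()

# Outside the function the built-in is untouched and works as usual.
restored = list("abc")
print(restored)
```

In a script, the rebinding happens at module level, so every later call to list(...) in that file would fail the same way.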

Update:

with open('artsplus_categories.txt') as f:
    for line in f:
        print re.search(r"'(.*)'", line).group(1)

...

MMA

acrylic

acting lessons

aerobic

alto

art therapy

ballet

ballroom

banjo
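The findall attempt in the question fails for the same reason as the first search: readlines() returns a list, while re.findall expects one string. A possible variant, assuming the whole file fits in memory, is to read it with read() and run findall once. The sketch below uses an inline sample string (the name text is illustrative) so it runs without the file:

```python
import re

# Sample text mirroring the question's file. With the real file this would be
# open('artsplus_categories.txt').read(), which returns a single string.
text = """u'MMA': 10,
u'acrylic': 19,
u'acting lessons': 2,

u'art therapy': 4,
"""

# [^']* stops at the next quote, so each quoted word is captured separately;
# lines without quotes (such as the blank one) simply contribute nothing.
words = re.findall(r"'([^']*)'", text)

for word in words:
    print(word + ",")
```

Unlike calling group(1) on every line, this does not fail on lines that contain no quoted word.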
