How to read a txt file in Python: extracting words from a txt file with Python

Latest recommended article published on 2023-04-13 12:52:55

龚伟(William)


I want to extract all the words that are between single quotation marks from a text file. The text file looks like this:

u'MMA': 10,
u'acrylic': 19,
u'acting lessons': 2,
u'aerobic': 141,
u'alto': 24,
u'art therapy': 4,
u'ballet': 939,
u'ballroom': 234,
u'banjo': 38,

And ideally, my output would look like this:

MMA,

acrylic,

acting lessons,

...

From browsing posts, it seems like I should use some combination of NLTK / regex for python to accomplish this. I've tried the following:

import re

file = open('artsplus_categories.txt', 'r').readlines()

for line in file:
    list = re.search('^''$', file)

file.close()

And get the following error:

File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 142, in search

return _compile(pattern, flags).search(string)

TypeError: expected string or buffer

I think the error might be caused by how I'm looking for the pattern. My logic is that I search for everything inside of the '....'.

What's tripping up re.py?

Thanks!

--------------------------------

Following Ashwini's comment:

import re

file = open('artsplus_categories.txt', 'r').readlines()

for line in file:
    list = re.search('^''$', line)

print list

#file.close()

But the output contains nothing:

Samuel-Finegolds-MacBook-Pro:~ samuelfinegold$ /var/folders/jv/9_sy0bn10mbdft1bk9t14qz40000gn/T/Cleanup\ At\ Startup/artsplus_categories_clean-393952531.278.py.command ; exit;

None

logout

@Rasco: here's the error I'm getting:

File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 177, in findall

return _compile(pattern, flags).findall(string)

TypeError: expected string or buffer

logout

I'm using this code:

file2 = open('artsplus_categories.txt', 'r').readlines()

list = re.findall("'[^']*'", file2)

for x in list:
    print (x)

Solution

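Instead of passing each line to the regex, you passed it the whole list (file). readlines() returns a list of strings, while re.search expects a single string, which is why you get "TypeError: expected string or buffer". Pass line to re.search, not file.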

for line in file:
    lis = re.search('^''$', line)  # line, not file
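
To make the difference concrete, here is a minimal sketch. The `lines` list is a hypothetical stand-in for what readlines() returns, and the pattern is the one used in the update further down. It uses print() calls with a single argument, which run on both Python 2.7 and Python 3.

```python
import re

# Hypothetical stand-in for open(...).readlines():
# a list with one string per line of the file.
lines = ["u'MMA': 10,\n", "u'acrylic': 19,\n"]

# Passing the whole list raises TypeError, because re.search needs a string.
try:
    re.search(r"'(.*)'", lines)
    list_error = None
except TypeError as exc:
    list_error = exc

# Passing one line at a time works.
first_match = re.search(r"'(.*)'", lines[0])

print(type(list_error).__name__)
print(first_match.group(1))
```

The same applies to re.findall, which is why the findall attempt in the question fails with the same TypeError.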

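Don't use list or file as variable names. They are built-in names in Python 2 (list is a built-in type, and so is file), and assigning to them hides the built-ins for the rest of the script.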
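
Even with line passed in, the pattern '^''$' is unlikely to do what the question intends, which probably explains the None output. Python joins adjacent string literals, so the pattern is really '^$', and that only matches an empty line. A small sketch to check this, where `sample_line` is one line copied from the question's data:

```python
import re

# Adjacent string literals are concatenated, so this is the pattern '^$'.
pattern = '^''$'

sample_line = "u'ballet': 939,\n"  # sample line taken from the question

print(pattern == '^$')                  # the two spellings are the same string
print(re.search(pattern, sample_line))  # no match on a non-empty line
print(re.search(pattern, "") is not None)  # an empty string does match
```

To capture the text between the quotes, the quotes have to be part of the pattern itself and the part you want has to sit in a group.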

Update:

with open('artsplus_categories.txt') as f:
    for line in f:
        print re.search(r"'(.*)'", line).group(1)

...

MMA

acrylic

acting lessons

aerobic

alto

art therapy

ballet

ballroom

banjo
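
As an alternative to searching line by line, the findall approach attempted in the question can also work once it is given a string instead of a list. The sketch below is one possible way to do it, not the method used above. The helper name `extract_quoted` is made up for illustration, and the sample text is copied from the question so that the example runs without the file.

```python
import re

def extract_quoted(text):
    """Return every substring found between a pair of single quotes."""
    # [^']* stops at the next quote, so several quoted words on one line
    # come back separately instead of being merged by a greedy .*
    return re.findall(r"'([^']*)'", text)

# Sample data from the question; with the real file, the text could be
# read as one string with open('artsplus_categories.txt').read().
sample = "u'MMA': 10,\nu'acrylic': 19,\nu'acting lessons': 2,\n"

words = extract_quoted(sample)
print(",\n".join(words))
```

Unlike calling .group(1) on the result of re.search, findall simply returns an empty list when nothing matches, so lines without quotes do not raise an error.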
