I'm working on a word cloud program in Python and I'm stuck on a word-replace function. I am trying to replace a set of numbers in an HTML file (so I'm working with a string) with words from an ordered list. So 000 would be replaced with the first word in the list, 001 with the second, etc.
Below, I have it selecting the number to replace properly, but I can't get it to replace that number with the matching word from the list. Any help is appreciated. Thanks!
def replace_all():
    text = '000 001 002 003 '
    word = ['foo', 'bar', 'that', 'these']
    for a in word:
        y = -1
        for w in text:
            y = y + 1
            x = "00"+str(y)
            w = {x:a}
            for i, j in w.iteritems():
                text = text.replace(i, j)
    print text
Solution
This is actually a really simple list comprehension:
>>> text = '000 001 002 003 '
>>> words = ['foo', 'bar', 'that', 'these']
>>> [words[int(item)] for item in text.split()]
['foo', 'bar', 'that', 'these']
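If the end goal is a string rather than a list, the looked-up words can be joined back together. This is only a minimal sketch: it assumes the numbers are separated by whitespace and that a single space between words is acceptable in the output (the name `result` is just for illustration).

```python
text = '000 001 002 003 '
words = ['foo', 'bar', 'that', 'these']

# Look up each number in the list, then glue the words back together.
# Note: split() discards the original spacing, including the trailing space.
result = ' '.join(words[int(item)] for item in text.split())
print(result)
```

Because `split()` throws away the original whitespace, this would not preserve the layout of a real HTML file; it only illustrates the lookup-and-join idea.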
Edit: If you need other values to be left alone, this can be catered for:
def get(seq, item):
    try:
        return seq[int(item)]
    except ValueError:
        return item
Then simply use something like [get(words, item) for item in text.split()] - naturally, more checks might be required in get() if there will be other numbers in the string that could get accidentally replaced. (End of edit)
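As a rough idea of what such extra checks might look like, here is a sketch of a stricter variant. The name `get_safe` and the rule "only tokens of exactly three digits count as placeholders" are assumptions made for illustration, not something from the question; adjust them to whatever the real file contains.

```python
def get_safe(seq, item):
    """Return seq[int(item)] for three-digit tokens, else item unchanged."""
    # Assumption: only exactly three ASCII digits (e.g. '001') are placeholders.
    if len(item) == 3 and all(ch in '0123456789' for ch in item):
        index = int(item)
        # Leave the token alone if there is no word at that position.
        if index < len(seq):
            return seq[index]
    return item

words = ['foo', 'bar', 'that', 'these']
text = '000 hello 2013 003 999'
replaced = [get_safe(words, item) for item in text.split()]
print(replaced)
```

Here 'hello' is not numeric, '2013' has four digits, and '999' has no matching word, so all three are passed through untouched.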
What we do is split the text into the individual numbers, then convert them to integers and use them to index the list you have given to find words.
As to why your code doesn't work, the main issue is that you are looping over the string, which gives you individual characters, not words. Even with that fixed, though, building a replacement for every position in the text is not a great way of solving the task.
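To see the difference, compare what iterating over the string produces with what split() produces. This is a small illustrative sketch; the names `chars` and `tokens` are mine.

```python
text = '000 001 002 003 '

# Iterating over a string yields one character at a time.
chars = [w for w in text]
print(chars[:4])   # the first four characters: '0', '0', '0', ' '

# split() yields the whitespace-separated tokens instead.
tokens = text.split()
print(tokens)
```

So a `for w in text` loop runs once per character (16 times for this string), not once per number.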
It's also worth a quick note that when you are looping over values and want indices to go with them, you should use the enumerate() builtin rather than using a counting variable.
E.g: Instead of:
y = -1
for w in text:
    y = y + 1
    ...
Use:
for y, w in enumerate(text):
    ...
This is much more readable and Pythonic.
Another thing with your existing code is this:
w = {x:a}
for i, j in w.iteritems():
    text = text.replace(i, j)
Which, if you think about it, simplifies down to:
text = text.replace(x, a)
You are setting w to be a dictionary of one item, then looping over it, but you know it will only ever contain one item.
A solution that follows your method more closely would be something like this:
words_dict = {"{0:03d}".format(index): value for index, value in enumerate(words)}
for key, value in words_dict.items():
    text = text.replace(key, value)
We create a dictionary mapping each zero-padded number string (built with str.format()) to its word, then do one replace per item. Note that as you are using 2.x, you'll want dict.iteritems() rather than dict.items(), and if you are on a version before 2.7, use the dict() builtin on a generator of tuples, as dict comprehensions don't exist there.
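For reference, the dict()-on-a-generator form might look something like the sketch below. It swaps str.format() for the older `'%03d' % index` formatting, which is my own substitution so the snippet also runs on very old interpreters; both produce the same zero-padded keys. It uses items() and print() with a single argument so that it runs unchanged on 2.x and 3.x.

```python
text = '000 001 002 003 '
words = ['foo', 'bar', 'that', 'these']

# dict() accepts any iterable of (key, value) pairs, so no dict
# comprehension is needed. '%03d' zero-pads the index to three digits.
words_dict = dict(('%03d' % index, value) for index, value in enumerate(words))

# Replace every occurrence of each key with its word.
for key, value in words_dict.items():
    text = text.replace(key, value)
print(text)
```

Unlike the split()-based version, str.replace() leaves the surrounding whitespace alone, but it will also replace a key wherever it appears inside a longer run of digits.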