I need to loop over a list of French words and find each asterisk, because I want to concatenate the word before the asterisk with the word after it every time an asterisk appears, and then continue to the next one.
For example, in the sequence:
['les','engage', '*', 'ment', 'de','la']
I want to concatenate 'engage' and 'ment', and the result (engagement) should be checked against a dictionary. If it is in the dictionary, it should be appended to a list.
With my code I only get the asterisk:
import nltk
from nltk.tokenize import word_tokenize
import re

with open('text-test.txt') as tx:
    text = word_tokenize(tx.read().lower())
with open('Fr-dictionary.txt') as fr:
    dic = word_tokenize(fr.read().lower())

ast = re.compile(r'[\*]+')
regex = list(filter(ast.match, text))
valid_words = []
invalid_words = []
last = None
for w in text:
    if w in regex:
        last = w
        a = last + w[+1]
        break
    if a in dic:
        valid_words.append(a)
    else:
        continue
Solution
I wondered how to manage a list (nonsense) like this:
words = ['Bien', '*', 'venue', 'pour', 'les','engage', '*', 'ment', 'trop', 'de', 'YIELD', 'peut','être','contre', '*', 'productif' ]
So I came up with a method like this:
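def join_asterisk(ary):
    i, size = 0, len(ary)
    # Look ahead one item; stop while a full "word * word" triple can still fit.
    while i < size - 2:
        if ary[i+1] == '*':
            # Merge the words on both sides of the asterisk.
            yield ary[i] + ary[i+2]
            i += 2
        else:
            yield ary[i]
        i += 1
    # Up to two items can remain here; yield all of them, not just the first.
    while i < size:
        yield ary[i]
        i += 1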
Which returns:
print(list(join_asterisk(words)))
#=> ['Bienvenue', 'pour', 'les', 'engagement', 'trop', 'de', 'YIELD', 'peut', 'être', 'contreproductif']
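The question also asks for each joined word to be checked against a dictionary, which the generator above does not do. Here is a minimal sketch of that step, assuming the dictionary has been loaded into a set; the function name `split_joined_words` and the tiny sample dictionary are hypothetical stand-ins, not part of the original code or of Fr-dictionary.txt.

```python
def split_joined_words(tokens, dictionary):
    """Join the words around each '*' and sort the results by dictionary membership."""
    valid_words, invalid_words = [], []
    # An asterisk can only be joined if it has a neighbour on both sides,
    # so the first and last positions are skipped.
    for i in range(1, len(tokens) - 1):
        if tokens[i] == '*':
            joined = tokens[i - 1] + tokens[i + 1]
            if joined in dictionary:
                valid_words.append(joined)
            else:
                invalid_words.append(joined)
    return valid_words, invalid_words


# Hypothetical sample data standing in for the tokenized text and dictionary files.
tokens = ['les', 'engage', '*', 'ment', 'de', 'la', 'contre', '*', 'xyz']
dictionary = {'engagement', 'bienvenue'}

valid, invalid = split_joined_words(tokens, dictionary)
print(valid)    # ['engagement']
print(invalid)  # ['contrexyz']
```

Using a set rather than a list for the dictionary keeps each membership test fast, which may matter if the dictionary file is large.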