After running the code below, I found that the newly added great column of products had only been computed for the first 100 values.
def great_count(word_count_vector):
    # Return the count for 'great', or 0 if the word is absent
    if 'great' in word_count_vector:
        return word_count_vector['great']
    else:
        return 0

products['great'] = products['word_count'].apply(great_count)
products['great'].show()
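After looking into it, the cause turned out to be how apply is called: when its second parameter, dtype, is not set, only the first 100 values are computed. (Assuming products is a GraphLab Create / Turi Create SFrame, the SArray.apply documentation states that when dtype is None, the first 100 elements are used to guess the output type, which is likely related.) So dtype should be passed explicitly when calling apply, as follows: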
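def great_count(word_count_vector):
    # Return the count for 'great', or 0 if the word is absent
    if 'great' in word_count_vector:
        return word_count_vector['great']
    else:
        return 0

# Pass the output type (int) explicitly as the second argument, dtype
products['great'] = products['word_count'].apply(great_count, int)
products['great'].show()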
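To sanity-check what the great column should contain, the same logic can be tried on ordinary Python dictionaries without any SFrame. This is only a minimal sketch: the word_counts list is hypothetical sample data standing in for products['word_count'], and the list comprehension is a plain-Python stand-in for apply(great_count, int), not the library call itself. It also uses dict.get as a shorter way to write the if/else.

```python
def great_count(word_count_vector):
    # dict.get returns the default (0) when 'great' is not a key
    return word_count_vector.get('great', 0)

# Hypothetical sample data standing in for products['word_count']
word_counts = [
    {'great': 2, 'product': 1},
    {'bad': 1},
    {'great': 1},
]

# Plain-Python stand-in for .apply(great_count, int):
# call the function on every row and cast the result to int
great = [int(great_count(wc)) for wc in word_counts]
print(great)  # [2, 0, 1]
```

Every row gets a value here, including 0 for rows where 'great' does not appear, which is what the great column should look like once dtype is passed.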