Python stock histogram code: the most efficient histogram code in Python

I've seen a number of questions about building histograms in clean one-liners, but I haven't yet found anyone trying to build them as efficiently as possible. I'm currently creating a lot of tf-idf vectors for a search algorithm, which involves building many histograms. My current code is short and readable, but it is not as fast as I would like, and the other methods I've tried turned out far slower. Can you do it faster? cleanStringVector is a list of strings (all lowercase, no punctuation), and masterWordList is a list of words that should contain every word in cleanStringVector.

from collections import Counter

def tfidfVector(cleanStringVector, masterWordList):
    frequencyHistogram = Counter(cleanStringVector)
    featureVector = [frequencyHistogram[word] for word in masterWordList]
    return featureVector

It is worth noting that a Counter object returns zero for missing keys instead of raising a KeyError. That is a serious plus, and most of the histogram methods in other questions fail this test.
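To make the missing-key behaviour concrete, here is a minimal sketch using only the standard library. The word list and the variable names (`words`, `counts`, `plain`) are made up for illustration and are not part of the original code.

```python
from collections import Counter

# Illustrative data only
words = ["apple", "orange", "apple"]

counts = Counter(words)
print(counts["apple"])      # word that is present
print(counts["cucumber"])   # missing word: Counter returns 0, no KeyError

# A hand-rolled histogram in a plain dict does not have this property.
plain = {}
for w in words:
    plain[w] = plain.get(w, 0) + 1

try:
    plain["cucumber"]
    raised = False
except KeyError:
    raised = True
print("plain dict raised KeyError:", raised)

# dict.get with a default restores the zero-for-missing behaviour.
print(plain.get("cucumber", 0))
```

Looking up a missing word in a Counter also does not insert it, so the histogram does not grow as a side effect of the lookups.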

Example: If I have the following data:

["apple", "orange", "tomato", "apple", "apple"]

["tomato", "tomato", "orange"]

["apple", "apple", "apple", "cucumber"]

["tomato", "orange", "apple", "apple", "tomato", "orange"]

["orange", "cucumber", "orange", "cucumber", "tomato"]

And a master wordlist of:

["apple", "orange", "tomato", "cucumber"]

I would like a return of the following from each test case respectively:

[3, 1, 1, 0]

[0, 1, 2, 0]

[3, 0, 0, 1]

[2, 2, 2, 0]

[0, 2, 1, 2]

I hope that helps.
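As a sanity check, the example can be run through the Counter-based function from the question. This is only a sketch: the `documents`, `expected`, and `results` names are introduced here for illustration.

```python
from collections import Counter

def tfidfVector(cleanStringVector, masterWordList):
    # Counter-based version from the question
    frequencyHistogram = Counter(cleanStringVector)
    return [frequencyHistogram[word] for word in masterWordList]

masterWordList = ["apple", "orange", "tomato", "cucumber"]

# The five test cases from the example, in order
documents = [
    ["apple", "orange", "tomato", "apple", "apple"],
    ["tomato", "tomato", "orange"],
    ["apple", "apple", "apple", "cucumber"],
    ["tomato", "orange", "apple", "apple", "tomato", "orange"],
    ["orange", "cucumber", "orange", "cucumber", "tomato"],
]

# The desired outputs listed in the example
expected = [
    [3, 1, 1, 0],
    [0, 1, 2, 0],
    [3, 0, 0, 1],
    [2, 2, 2, 0],
    [0, 2, 1, 2],
]

results = [tfidfVector(doc, masterWordList) for doc in documents]
for doc, vec in zip(documents, results):
    print(doc, "->", vec)

print("all match:", results == expected)
```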

Approximate final results:

Original Method: 3.213

OrderedDict: 5.529

UnorderedDict: 0.190

Solution

In my (unrepresentative) micro-benchmark on Python 3, this improves the runtime by one order of magnitude:

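# Build the word -> position lookup once, outside the function.
mapping = dict((w, i) for i, w in enumerate(masterWordList))

def tfidfVector(cleanStringVector, masterWordList):
    # Start from all zeros so words absent from the document count as 0.
    featureVector = [0] * len(masterWordList)
    for w in cleanStringVector:
        # Raises KeyError if w is missing from masterWordList.
        featureVector[mapping[w]] += 1
    return featureVector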
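The following sketch compares the two approaches on one of the example documents and times them with `timeit`. The function names `counter_version` and `index_version` are hypothetical, and the `index.get` lookup is an addition that is not in the answer: it skips words outside the master list instead of raising a KeyError. Timings depend heavily on document length and vocabulary size, so treat the printed numbers as indicative only.

```python
import timeit
from collections import Counter

masterWordList = ["apple", "orange", "tomato", "cucumber"]
doc = ["tomato", "orange", "apple", "apple", "tomato", "orange"]

def counter_version(words, master):
    # Approach from the question: count everything, then look up each master word
    hist = Counter(words)
    return [hist[w] for w in master]

# Word -> position lookup, built once and reused for every document
index = {w: i for i, w in enumerate(masterWordList)}

def index_version(words, index):
    # Approach from the answer, with an assumed tolerance for unknown words
    vec = [0] * len(index)
    for w in words:
        i = index.get(w)  # None when w is not in the master list
        if i is not None:
            vec[i] += 1
    return vec

# Both approaches should agree before any timing is meaningful
assert counter_version(doc, masterWordList) == index_version(doc, index)

for name, fn in [
    ("counter_version", lambda: counter_version(doc, masterWordList)),
    ("index_version", lambda: index_version(doc, index)),
]:
    seconds = timeit.timeit(fn, number=20_000)
    print(f"{name}: {seconds:.3f} s for 20,000 calls")
```

The index-based loop only touches the words in the document, while the Counter version also iterates over the whole master list. The gap between the two should therefore grow with the size of the vocabulary relative to the document, though that is worth measuring on real data.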
