I have a dataset of 370k records stored in a pandas DataFrame, and I need to compute the sentiment polarity of every row. I tried multiprocessing, threading, Cython, and loop unrolling, but none of them helped and the estimated computation time was 22 hours. The task is as follows:
%matplotlib inline
from numba import jit, autojit
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
with open('data/full_text.txt', encoding="ISO-8859-1") as f:
    strdata = f.readlines()

data = []
for string in strdata:
    data.append(string.split('\t'))

df = pd.DataFrame(data, columns=["uname", "date", "UT", "lat", "long", "msg"])
df = df.drop('UT', axis=1)
df[['lat', 'long']] = df[['lat', 'long']].apply(pd.to_numeric)
from textblob import TextBlob
from tqdm import tqdm
df['polarity']=np.zeros(len(df))
Threading:
from queue import Queue
from threading import Thread
from time import time
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='(%(threadName)-10s) %(message)s',
)

class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # Get the work from the queue and expand the tuple
            lowIndex, highIndex = self.queue.get()
            for i in range(lowIndex, highIndex - 1):
                df['polarity'][i] = TextBlob(df['msg'][i]).sentiment.polarity
            self.queue.task_done()

def main():
    ts = time()
    # Create a queue to communicate with the worker threads
    queue = Queue()
    # Create 8 worker threads
    for x in range(8):
        worker = DownloadWorker(queue)
        worker.daemon = True
        worker.start()
    # Put the tasks into the queue as a tuple
    for i in tqdm(range(0, len(df) - 1, 62936)):
        logging.debug('Queueing')
        queue.put((i, i + 62936))
    queue.join()
    print('Took {}'.format(time() - ts))

main()
Multiprocessing with loop unrolling:
import multiprocessing

def assign_polarity(df):
    for i in tqdm(range(0, len(df), 5)):
        df['polarity'][i] = TextBlob(df['msg'][i]).sentiment.polarity
        df['polarity'][i + 1] = TextBlob(df['msg'][i + 1]).sentiment.polarity
        df['polarity'][i + 2] = TextBlob(df['msg'][i + 2]).sentiment.polarity
        df['polarity'][i + 3] = TextBlob(df['msg'][i + 3]).sentiment.polarity
        df['polarity'][i + 4] = TextBlob(df['msg'][i + 4]).sentiment.polarity

pool = multiprocessing.Pool(processes=2)
r = pool.map(assign_polarity, df)
pool.close()
How can I increase the speed of this computation, or store the results in the DataFrame in a faster way? My laptop configuration:
RAM: 8 GB
Physical cores: 2
Logical cores: 8
Windows 10
Multiprocessing gave me a higher computation time.
Threading was executed sequentially (I think because of the GIL).
Loop unrolling gave me the same computation speed.
Cython was giving me errors while importing libraries.
Solution:
ASD -- I noticed that storing something in a DataFrame iteratively is VERY slow. I'd try storing your TextBlob polarities in a list (or another structure) and then converting that list into a column of the DataFrame.
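For example, a minimal sketch of that approach, assuming df and TextBlob are already set up as in the question (the polarities variable name is just illustrative):

from textblob import TextBlob
from tqdm import tqdm

# Collect the polarity scores in a plain Python list; appending to a list is cheap,
# whereas writing df['polarity'][i] row by row repeats a chained-indexing write on every iteration.
polarities = []
for msg in tqdm(df['msg']):
    polarities.append(TextBlob(msg).sentiment.polarity)

# Assign the whole column in one step instead of 370k individual writes.
df['polarity'] = polarities

The TextBlob call itself still dominates the runtime, but the single column assignment at the end removes the per-row DataFrame writes, which is usually what makes the iterative version so slow.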