I read a lot of blog posts about this problem, and none of them explained the root cause clearly, so I was left only half understanding it. Below is the failing code, a simple word count.
# dask: word count over HDFS
from hdfs import Client
from distributed import Client as Cl
from collections import defaultdict

hdfs = Client('http://192.168.175.139:9870')  # connect to HDFS
print(hdfs.list('/'))

client = Cl('172.26.244.71:8786')  # connect to the Dask distributed scheduler
print(client.ncores())  # show the compute resources available in the Dask cluster

filenames = hdfs.list('/test/input1')
print(filenames)

def count_words(fn):
    fn = '/test/input1/' + fn
    word_counts = defaultdict(int)
    with hdfs.read(fn) as f:  # uses the global hdfs client -- this is the problem
        for line in f.readlines():
            for word in line.split():
                word_counts[word] += 1
    return word_counts

# counts = count_words(filenames[0])
# print(counts)

future = client.submit(count_words, filenames[0])
counts = future.result()
When the program raises
TypeError: can't pickle _thread.lock objects
the cause, in my case, was how I used Dask distributed. In plain terms: the code treats the HDFS connection as a global variable. When client.submit ships count_words to the cluster, Dask must serialize (pickle) the function together with the globals it references, including the hdfs client. That client object holds open connections and thread locks, which cannot be pickled, hence the error. Even if it could be pickled, the other machines in the cluster would not share that connection and could not reach HDFS through it. The fix is to open the HDFS connection inside the function, so whichever worker in the cluster runs the task creates its own connection, and the problem goes away. This took me two days to figure out; the root cause was not understanding well enough how Python functions and their global references get shipped around. Lesson learned!
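To see the essence of the error in isolation, here is a minimal sketch using only the standard library (no HDFS, no Dask). Pickling any object that contains a _thread.lock fails with exactly this TypeError; this is presumably what happens when Dask tries to pickle the global hdfs client, since it keeps such objects internally for its connections. The Holder class is just a made-up example, and the exact error wording varies slightly across Python versions.

import pickle
import threading

class Holder:
    def __init__(self):
        self.lock = threading.Lock()  # an unpicklable _thread.lock lives inside the object

try:
    pickle.dumps(Holder())
except TypeError as e:
    # Python 3.7 prints "can't pickle _thread.lock objects",
    # newer versions print "cannot pickle '_thread.lock' object"
    print(e)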
The corrected code:
# dask: word count over HDFS
from hdfs import Client
from distributed import Client as Cl
from collections import defaultdict

hdfs = Client('http://192.168.175.139:9870')  # connect to HDFS (used only on the driver)
print(hdfs.list('/'))

client = Cl('172.26.244.71:8786')  # connect to the Dask distributed scheduler
print(client.ncores())  # show the compute resources available in the Dask cluster

filenames = hdfs.list('/test/input1')
print(filenames)

def count_words(fn):
    # open the HDFS connection inside the function, so the worker that runs
    # this task creates its own client instead of receiving a pickled one
    hdfs = Client('http://192.168.175.139:9870')
    fn = '/test/input1/' + fn
    word_counts = defaultdict(int)
    with hdfs.read(fn) as f:
        for line in f.readlines():
            for word in line.split():
                word_counts[word] += 1
    return word_counts

# counts = count_words(filenames[0])
# print(counts)

future = client.submit(count_words, filenames[0])
counts = future.result()
print(counts)
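Once a single file works, the same pattern extends to every file in the directory with client.map and client.gather. The sketch below assumes the corrected count_words above is already defined, and that every Dask worker has the hdfs package installed and can reach the NameNode; merging the per-file dictionaries on the driver is just one illustrative choice.

# count words in all files under /test/input1 in parallel
futures = client.map(count_words, filenames)   # one task per file
per_file_counts = client.gather(futures)       # list of dicts, one per file

total = defaultdict(int)
for wc in per_file_counts:                     # merge the partial counts on the driver
    for word, n in wc.items():
        total[word] += n

print(sorted(total.items(), key=lambda kv: kv[1], reverse=True)[:10])  # ten most frequent words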