Python crawler pitfall notes: _pickle.PicklingError: Can't pickle &lt;class&gt;

For a course project, the instructor asked me to run a crawler program for their group. After downloading the source and running it under Anaconda, I hit a strange error.

Traceback (most recent call last):
  File "ccf_crawler.py", line 118, in <module>
    save_dblp_papers()
  File "ccf_crawler.py", line 102, in save_dblp_papers
    watcher_process.start()
  File "E:\jupter\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "E:\jupter\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\jupter\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "E:\jupter\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "E:\jupter\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'paper_crawler.crawler_manager.CrawlerManager'>: it's not the same object as paper_crawler.crawler_manager.CrawlerManager
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\jupter\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "E:\jupter\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

This is a Python pickle error. pickle is Python's built-in serialization module: it converts an object into a byte stream so it can be saved to a file or sent to another process, and later reconstructed. But why does this fail only on Windows? Some searching turned up the connection between pickle and multiprocessing.
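A quick sketch of what pickle does, and of the kind of object it refuses. Plain data round-trips fine; OS-level resources such as sockets cannot be serialized:

```python
import pickle
import socket

# A plain object round-trips through pickle without trouble.
data = {"title": "ccf_crawler", "pages": [1, 2, 3]}
blob = pickle.dumps(data)          # bytes that could be written to a file
print(pickle.loads(blob) == data)  # True

# A socket does not: pickle refuses OS-level resources.
s = socket.socket()
try:
    pickle.dumps(s)
except TypeError as e:
    print("cannot pickle:", e)
finally:
    s.close()
```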

It turns out this crawler is written with multiprocessing. On Windows, child processes are started with the spawn method: everything handed to the child process has to be pickled first, and some objects (sockets, and in this case the CrawlerManager class itself) cannot be. On Linux, child processes are created with fork, so the child inherits the parent's memory directly and nothing needs to be pickled.
So there are two solutions:

  1. Refactor the code to use threads instead of processes. For this workload the performance impact should be small, and it will then run on Windows.
  2. Just run it on a Linux system.
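Option 1 can be sketched like this. `watcher` and the dict standing in for the manager are placeholders, not the crawler's real code; the point is that `threading.Thread` shares memory with the parent, so nothing is pickled when the worker starts:

```python
import threading

def watcher(state):
    # Placeholder for the real watcher logic; the actual crawler's
    # CrawlerManager does far more than bump a counter.
    state["checked"] += 1

if __name__ == "__main__":
    state = {"checked": 0}  # stands in for the real manager object
    t = threading.Thread(target=watcher, args=(state,))
    t.start()   # no pickling happens: threads share the parent's memory
    t.join()
    print(state["checked"])  # 1
```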

So I fired up an Ubuntu virtual machine, set up the environment, installed MongoDB, ran the code, and it worked!
Out of memory now, sigh.

Aside: a MongoDB install error on Ubuntu 14.04
I installed MongoDB following an online tutorial, then hit this error:

 mongod --dbpath data/db
Mon Dec  7 19:03:14.143 [initandlisten] MongoDB starting : pid=129519 port=27017 dbpath=data/db 64-bit host=ubuntu
Mon Dec  7 19:03:14.143 [initandlisten] db version v2.4.9
Mon Dec  7 19:03:14.143 [initandlisten] git version: nogitversion
Mon Dec  7 19:03:14.143 [initandlisten] build info: Linux orlo 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3 17:37:58 UTC 2013 x86_64 BOOST_LIB_VERSION=1_54
Mon Dec  7 19:03:14.143 [initandlisten] allocator: tcmalloc
Mon Dec  7 19:03:14.143 [initandlisten] options: { dbpath: "data/db" }
Mon Dec  7 19:03:14.149 [initandlisten] journal dir=data/db/journal
Mon Dec  7 19:03:14.149 [initandlisten] recover : no journal files present, no recovery needed
Mon Dec  7 19:03:14.149 [initandlisten] 
Mon Dec  7 19:03:14.149 [initandlisten] ERROR: Insufficient free space for journal files
Mon Dec  7 19:03:14.149 [initandlisten] Please make at least 3379MB available in data/db/journal or use --smallfiles
Mon Dec  7 19:03:14.149 [initandlisten] 
Mon Dec  7 19:03:14.150 [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
Mon Dec  7 19:03:14.150 dbexit: 
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: going to close listening sockets...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: going to flush diaglog...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: going to close sockets...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: waiting for fs preallocator...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: lock for final commit...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: final commit...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: closing all files...
Mon Dec  7 19:03:14.150 [initandlisten] closeAllFiles() finished
Mon Dec  7 19:03:14.150 [initandlisten] journalCleanup...
Mon Dec  7 19:03:14.150 [initandlisten] removeJournalFiles
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: removing fs lock...
Mon Dec  7 19:03:14.150 dbexit: really exiting now

The cause is insufficient free disk space, not RAM: mongod 2.4 preallocates journal files and wants about 3379MB free under data/db/journal, and my VM's disk is just too small, so the only option is the --smallfiles workaround.
Mind the spacing and exact format of the flag (I have stepped on so many whitespace landmines these past few days, exhausting; details matter).
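You can confirm it really is disk space with df (run it from the directory that contains your dbpath):

```shell
# Show free space on the filesystem holding the data directory.
# mongod 2.4 preallocates ~3.4GB of journal files unless --smallfiles is given.
df -h .
```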
Run the command:

 mongod --dbpath data/db --smallfiles


Success!

Mon Dec  7 19:03:59.144 [initandlisten] MongoDB starting : pid=129886 port=27017 dbpath=data/db 64-bit host=ubuntu
Mon Dec  7 19:03:59.145 [initandlisten] db version v2.4.9
Mon Dec  7 19:03:59.145 [initandlisten] git version: nogitversion
Mon Dec  7 19:03:59.145 [initandlisten] build info: Linux orlo 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3 17:37:58 UTC 2013 x86_64 BOOST_LIB_VERSION=1_54
Mon Dec  7 19:03:59.145 [initandlisten] allocator: tcmalloc
Mon Dec  7 19:03:59.145 [initandlisten] options: { dbpath: "data/db", smallfiles: true }
Mon Dec  7 19:03:59.148 [initandlisten] journal dir=data/db/journal
Mon Dec  7 19:03:59.148 [initandlisten] recover : no journal files present, no recovery needed
Mon Dec  7 19:03:59.215 [FileAllocator] allocating new datafile data/db/local.ns, filling with zeroes...
Mon Dec  7 19:03:59.215 [FileAllocator] creating directory data/db/_tmp
Mon Dec  7 19:03:59.217 [FileAllocator] done allocating datafile data/db/local.ns, size: 16MB,  took 0 secs
Mon Dec  7 19:03:59.218 [FileAllocator] allocating new datafile data/db/local.0, filling with zeroes...
Mon Dec  7 19:03:59.219 [FileAllocator] done allocating datafile data/db/local.0, size: 16MB,  took 0 secs
Mon Dec  7 19:03:59.223 [initandlisten] waiting for connections on port 27017
Mon Dec  7 19:03:59.225 [websvr] admin web console waiting for connections on port 28017

Mon Dec  7 19:07:59.266 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 8ms
Mon Dec  7 19:48:17.641 [PeriodicTask::Runner] task: WriteBackManager::cleaner took: 5ms


