Python crawler pitfalls: _pickle.PicklingError: Can't pickle <class>

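For a course project, the instructor asked me to help another group run their crawler. I downloaded the source, ran it under Anaconda, and got a strange error.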

Traceback (most recent call last):
  File "ccf_crawler.py", line 118, in <module>
    save_dblp_papers()
  File "ccf_crawler.py", line 102, in save_dblp_papers
    watcher_process.start()
  File "E:\jupter\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "E:\jupter\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "E:\jupter\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "E:\jupter\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "E:\jupter\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'paper_crawler.crawler_manager.CrawlerManager'>: it's not the same object as paper_crawler.crawler_manager.CrawlerManager
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\jupter\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "E:\jupter\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

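The error comes from Python's pickle module. pickle serializes an object into a byte stream, which can then be saved to a file or sent to another process. So why does the program fail on Windows? After some reading, it turned out to be about how pickle and processes interact.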
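As a minimal illustration of what pickle does, an object can be turned into bytes and restored again. The dictionary here is made-up sample data, not something taken from the crawler:

```python
import pickle

# Hypothetical sample record; any picklable object would do.
paper = {"title": "An example paper", "year": 2020, "authors": ["A", "B"]}

# Serialize to bytes; pickle.dump(obj, file) would write them to a file instead.
payload = pickle.dumps(paper)

# Deserialize into a new object that compares equal to the original.
restored = pickle.loads(payload)

print(type(payload).__name__)
print(restored == paper)
```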

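It turns out the crawler is written with multiprocessing. On Windows, multiprocessing starts a child with the spawn method: a fresh interpreter is launched, and the Process object, together with its target and arguments, is pickled and sent to the child (the reduction.dump call in the traceback), so everything attached to it must be picklable. On Linux the default start method has traditionally been fork: the child gets a copy of the parent's memory, so nothing needs to be pickled at startup. The message "it's not the same object as ..." means pickle looked up paper_crawler.crawler_manager.CrawlerManager by name and found a different class object from the one the instance was created from, which typically happens when the name has been rebound (for example by a decorator that replaces the class) or the module has been loaded twice. The second traceback (EOFError: Ran out of input) is only a consequence: the child was waiting for data the parent never finished sending.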
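One common way to get this exact kind of PicklingError can be reproduced in a few lines. This is only a sketch of the mechanism, not a diagnosis of the crawler's code: the module name fake_crawler_manager and the helpers define_manager_class and try_pickle are hypothetical and exist only to keep the example self-contained.

```python
import pickle
import sys
import types

# Hypothetical stand-in module so the sketch does not need real files.
fake_module = types.ModuleType("fake_crawler_manager")
sys.modules["fake_crawler_manager"] = fake_module


def define_manager_class():
    """Create a fresh class that claims to live in fake_crawler_manager."""
    return type("CrawlerManager", (), {"__module__": "fake_crawler_manager"})


fake_module.CrawlerManager = define_manager_class()
manager = fake_module.CrawlerManager()  # instance of the first class

# Rebind the module-level name to a different class object.
fake_module.CrawlerManager = define_manager_class()


def try_pickle(obj):
    """Return the PicklingError raised for obj, or None if pickling works."""
    try:
        pickle.dumps(obj)
        return None
    except pickle.PicklingError as exc:
        return exc


# pickle stores classes by name; the name now resolves to another object,
# so the old instance can no longer be pickled.
error = try_pickle(manager)
print(error)
```

An instance created after the rebinding pickles fine, which is why this kind of bug only shows up on a code path that actually pickles the old object, such as starting a process with spawn.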
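So there are two solutions:

  1. Refactor the code to use threads instead of processes. Since a crawler spends most of its time waiting on the network, this should not hurt performance much, and the program will then run on Windows.
  2. Simply run it on Linux.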
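The first option could look roughly like the sketch below. The watcher function, the queue of URLs, and run_watcher_in_thread are hypothetical stand-ins, since the crawler's real watcher code is not shown here; the point is only that threading.Thread accepts the same target/args shape as multiprocessing.Process without pickling anything.

```python
import queue
import threading


def watcher(tasks, results):
    """Hypothetical worker: consume URLs until a None sentinel arrives."""
    while True:
        url = tasks.get()
        if url is None:
            break
        # A real crawler would download and parse the page here.
        results.append(f"fetched {url}")


def run_watcher_in_thread(urls):
    tasks = queue.Queue()
    results = []
    # The thread shares the parent's memory, so neither the target nor its
    # arguments have to be picklable.
    watcher_thread = threading.Thread(target=watcher, args=(tasks, results))
    watcher_thread.start()
    for url in urls:
        tasks.put(url)
    tasks.put(None)  # tell the watcher to stop
    watcher_thread.join()
    return results
```

Calling run_watcher_in_thread(["a", "b"]) returns the processed items in order, because a single worker drains the queue first in, first out.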

So I started an Ubuntu virtual machine, set up the environment, installed MongoDB, ran the code, and it worked!
Out of space, though. Sigh.

Aside: an error when installing MongoDB on Ubuntu 14.04
I installed MongoDB following an online tutorial, and starting it produced this error:

 mongod --dbpath data/db
Mon Dec  7 19:03:14.143 [initandlisten] MongoDB starting : pid=129519 port=27017 dbpath=data/db 64-bit host=ubuntu
Mon Dec  7 19:03:14.143 [initandlisten] db version v2.4.9
Mon Dec  7 19:03:14.143 [initandlisten] git version: nogitversion
Mon Dec  7 19:03:14.143 [initandlisten] build info: Linux orlo 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3 17:37:58 UTC 2013 x86_64 BOOST_LIB_VERSION=1_54
Mon Dec  7 19:03:14.143 [initandlisten] allocator: tcmalloc
Mon Dec  7 19:03:14.143 [initandlisten] options: { dbpath: "data/db" }
Mon Dec  7 19:03:14.149 [initandlisten] journal dir=data/db/journal
Mon Dec  7 19:03:14.149 [initandlisten] recover : no journal files present, no recovery needed
Mon Dec  7 19:03:14.149 [initandlisten] 
Mon Dec  7 19:03:14.149 [initandlisten] ERROR: Insufficient free space for journal files
Mon Dec  7 19:03:14.149 [initandlisten] Please make at least 3379MB available in data/db/journal or use --smallfiles
Mon Dec  7 19:03:14.149 [initandlisten] 
Mon Dec  7 19:03:14.150 [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
Mon Dec  7 19:03:14.150 dbexit: 
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: going to close listening sockets...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: going to flush diaglog...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: going to close sockets...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: waiting for fs preallocator...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: lock for final commit...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: final commit...
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: closing all files...
Mon Dec  7 19:03:14.150 [initandlisten] closeAllFiles() finished
Mon Dec  7 19:03:14.150 [initandlisten] journalCleanup...
Mon Dec  7 19:03:14.150 [initandlisten] removeJournalFiles
Mon Dec  7 19:03:14.150 [initandlisten] shutdown: removing fs lock...
Mon Dec  7 19:03:14.150 dbexit: really exiting now

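The cause is too little free disk space (not RAM): the journal needs at least 3379MB available in data/db/journal, and the virtual machine's disk is too small. Nothing to be done about that, orz, so the only choice is to start mongod with the --smallfiles option, as the log suggests.
Mind the spaces and other formatting details (spaces have tripped me up a lot these past few days; details matter).
Run the command: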

 mongod --dbpath data/db --smallfiles


Success!

Mon Dec  7 19:03:59.144 [initandlisten] MongoDB starting : pid=129886 port=27017 dbpath=data/db 64-bit host=ubuntu
Mon Dec  7 19:03:59.145 [initandlisten] db version v2.4.9
Mon Dec  7 19:03:59.145 [initandlisten] git version: nogitversion
Mon Dec  7 19:03:59.145 [initandlisten] build info: Linux orlo 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3 17:37:58 UTC 2013 x86_64 BOOST_LIB_VERSION=1_54
Mon Dec  7 19:03:59.145 [initandlisten] allocator: tcmalloc
Mon Dec  7 19:03:59.145 [initandlisten] options: { dbpath: "data/db", smallfiles: true }
Mon Dec  7 19:03:59.148 [initandlisten] journal dir=data/db/journal
Mon Dec  7 19:03:59.148 [initandlisten] recover : no journal files present, no recovery needed
Mon Dec  7 19:03:59.215 [FileAllocator] allocating new datafile data/db/local.ns, filling with zeroes...
Mon Dec  7 19:03:59.215 [FileAllocator] creating directory data/db/_tmp
Mon Dec  7 19:03:59.217 [FileAllocator] done allocating datafile data/db/local.ns, size: 16MB,  took 0 secs
Mon Dec  7 19:03:59.218 [FileAllocator] allocating new datafile data/db/local.0, filling with zeroes...
Mon Dec  7 19:03:59.219 [FileAllocator] done allocating datafile data/db/local.0, size: 16MB,  took 0 secs
Mon Dec  7 19:03:59.223 [initandlisten] waiting for connections on port 27017
Mon Dec  7 19:03:59.225 [websvr] admin web console waiting for connections on port 28017

Mon Dec  7 19:07:59.266 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 8ms
Mon Dec  7 19:48:17.641 [PeriodicTask::Runner] task: WriteBackManager::cleaner took: 5ms
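To catch the space problem before starting mongod, the free space under the data directory can be checked first. This is a small sketch using the standard library; the function names are made up for illustration, and the 3379 default is simply the figure mongod printed in the failed run above, not a general constant.

```python
import shutil


def free_megabytes(path):
    """Free space, in MB, on the filesystem that contains path."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def has_room_for_journal(path, required_mb=3379):
    """True if path has at least required_mb MB free.

    required_mb defaults to the value reported in the mongod log above;
    treat it as an assumption for other versions or configurations.
    """
    return free_megabytes(path) >= required_mb


print(free_megabytes("."))
```

If has_room_for_journal("data/db") is false on a small virtual machine, that matches the situation above where --smallfiles was needed.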

