pyspider crawler study notes: documentation translation of Working-with-Results.md

Working with Results
====================
Downloading and viewing your data from the WebUI is convenient, but may not be suitable for processing by programs.

Working with ResultDB
---------------------
resultdb is designed only for previewing results and is not suitable for large-scale storage. If you still want to grab data from resultdb, the following snippet uses the database API to connect and select the data.

```
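from pyspider.database import connect_database

# Replace the placeholder with your own resultdb connection URL.
resultdb = connect_database("<your resultdb connection url>")
# Walk every project, then every stored result of that project.
for project in resultdb.projects:
    for result in resultdb.select(project):
        assert result['taskid']
        assert result['url']
        assert result['result']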
```
`result['result']` is the object submitted by the `return` statement in your script.
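
As a minimal sketch of getting data out of resultdb in a machine-friendly form, the loop above can be wrapped in a helper that writes one JSON object per line. The helper name `export_results` and the output layout are assumptions made for illustration and are not part of pyspider. It relies only on the `projects` attribute and the `select(project)` method shown above.

```python
import json
from typing import IO, Any


def export_results(resultdb: Any, out: IO[str]) -> int:
    """Write every result in `resultdb` to `out`, one JSON object per line.

    `resultdb` is any object exposing `.projects` and `.select(project)`,
    as in the snippet above. Returns the number of records written.
    """
    count = 0
    for project in resultdb.projects:
        for row in resultdb.select(project):
            record = {
                "project": project,
                "taskid": row["taskid"],
                "url": row["url"],
                "result": row["result"],
            }
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hypothetical usage with a real resultdb (requires pyspider):
#   from pyspider.database import connect_database
#   resultdb = connect_database("<your resultdb connection url>")
#   with open("results.jsonl", "w", encoding="utf-8") as f:
#       export_results(resultdb, f)
```

JSON Lines is used here only because it streams well. Any format your downstream tools accept would do.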

Working with ResultWorker
-------------------------
In a production environment, you may want to connect pyspider to your own system or post-processing pipeline rather than store results in resultdb. It's highly recommended to override ResultWorker.

```
from pyspider.result import ResultWorker

class MyResultWorker(ResultWorker):
    def on_result(self, task, result):
        assert task['taskid']
        assert task['project']
        assert task['url']
        assert result
        # your processing code goes here
```
`result` is the object submitted by the `return` statement in your script.

You can put this script (e.g., `my_result_worker.py`) in the folder where you launch pyspider, then add an argument to the `result_worker` subcommand:

`pyspider result_worker --result-cls=my_result_worker.MyResultWorker`

Or

```
{
  ...
  "result_worker": {
    "result_cls": "my_result_worker.MyResultWorker"
  }
  ...
}
```
Use the second form if you are using a config file; see [Deployment](/Deployment) for details.

Design Your Own Database Schema
-------------------------------
The results stored in the database are encoded as JSON for compatibility. It's highly recommended to design your own database schema and override the ResultWorker as described above.
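
A minimal sketch of what such a schema could look like, using SQLite from the Python standard library. The table name, the column names, and the helpers `open_store` and `store_result` are all hypothetical. The idea is that an overridden `on_result(self, task, result)` would call `store_result(conn, task, result)` with the `task` fields shown earlier (`project`, `taskid`, `url`).

```python
import json
import sqlite3
import time

# Hypothetical schema: one row per (project, taskid), payload kept as JSON text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    project    TEXT NOT NULL,
    taskid     TEXT NOT NULL,
    url        TEXT NOT NULL,
    result     TEXT NOT NULL,
    updated_at REAL NOT NULL,
    PRIMARY KEY (project, taskid)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the SQLite database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_result(conn, task, result):
    """Insert a result, replacing any earlier row with the same project and taskid."""
    conn.execute(
        "INSERT OR REPLACE INTO results "
        "(project, taskid, url, result, updated_at) VALUES (?, ?, ?, ?, ?)",
        (task["project"], task["taskid"], task["url"],
         json.dumps(result), time.time()),
    )
    conn.commit()
```

Keying on `(project, taskid)` means a later result for the same task overwrites the earlier one. If you need a history of results, a different primary key would be the place to change that.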

TIPS about Results
-------------------
#### Want to return more than one result in callback?

Because resultdb de-duplicates results by taskid (url), the latest result overwrites the previous ones.

One workaround is to use the `send_message` API to make a `fake` taskid for each result.

```
def detail_page(self, response):
    for li in response.doc('li').items():
        self.send_message(self.project_name, {
            ...
        }, url=response.url+"#"+li('a.product-sku').text())

def on_message(self, project, msg):
    return msg
```
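
To see why appending a fragment to the URL helps, here is a small standalone sketch. It assumes the taskid is the MD5 hex digest of the URL, which is my understanding of pyspider's default behaviour. Treat that as an assumption and check your handler if it overrides taskid generation. The function names and the example URL are hypothetical.

```python
import hashlib


def default_taskid(url: str) -> str:
    """Assumed default: the taskid is the MD5 hex digest of the URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def fake_urls(page_url, skus):
    """Build one distinct URL per item by appending '#<sku>' to the page URL."""
    return [page_url + "#" + sku for sku in skus]


page = "http://example.com/list"  # hypothetical page URL
ids = {url: default_taskid(url) for url in fake_urls(page, ["sku-1", "sku-2"])}

# Each fake URL hashes to its own taskid, so the results no longer
# overwrite each other or the result stored for the page itself.
for url, taskid in ids.items():
    print(taskid, url)
```

Note that the SKU text must be unique within a page for this to work. Two items with the same text would produce the same fake URL and still overwrite each other.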

See Also: [apis/self.send_message](/apis/self.send_message)

Reposted from: https://my.oschina.net/sijinge/blog/1530053
