
ABOUT

This is the base implementation of a full crawler that uses a spacetime cache server to receive requests.

CONFIGURATION

Step 1: Install dependencies

If you do not have Python 3.6+, install it first from https://www.python.org/downloads/.

Check if pip is installed by opening a terminal/command prompt and typing the command python3 -m pip. This should show the help menu listing all of pip's commands. If it does not, install pip by following the instructions at https://pip.pypa.io/en/stable/installing/
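If you prefer a one-line confirmation over the full help menu, you can also query pip's version (the output below is illustrative; your version and path will differ):

python3 -m pip --version
pip 21.2.4 from /usr/lib/python3.9/site-packages/pip (python 3.9)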

To install the dependencies for this project, run the following two commands after ensuring pip is installed for the version of Python you are using. Admin privileges might be required to execute the commands. Also make sure that the terminal is at the root folder of this project.

python -m pip install packages/spacetime-2.1.1-py3-none-any.whl

python -m pip install -r packages/requirements.txt

Use this command to install BeautifulSoup4 with pip:

MAC: sudo python3 -m pip install bs4
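After these installs, a quick sanity check is to confirm that the packages import cleanly. The sketch below assumes the wheel above installs a package importable as spacetime; adjust the names if your environment differs.

# check_install.py - quick sanity check that the dependencies are importable
# (assumes the spacetime wheel installs a package named "spacetime")
import importlib

for name in ("spacetime", "bs4"):
    try:
        module = importlib.import_module(name)
        version = getattr(module, "__version__", "version unknown")
        print(f"{name}: OK ({version})")
    except ImportError as error:
        print(f"{name}: NOT INSTALLED ({error})")

Run it with python3 check_install.py; any line reporting NOT INSTALLED means the corresponding pip command above needs to be rerun.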

Step 2: Configuring config.ini

Set the options in the config.ini file. The following configurations exist; an example file is sketched after the list.

USERAGENT: Set the useragent to IR F19 uci-id1,uci-id2,uci-id3. It is important to set the useragent appropriately to get credit for hitting our cache.

HOST: This is the host name of our caching server. Please set it as per spec.

PORT: This is the port number of our caching server. Please set it as per spec.

SEEDURL: The starting URL from which the crawler begins downloading.

POLITENESS: The time delay each thread must wait after each download.

SAVE: The file used to save the crawler's progress.
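Putting the options together, a filled-in config.ini might look like the sketch below. The section names and the placeholder values are illustrative only; keep the sections already present in the provided config.ini and use the host, port, and seed URL given in the spec.

[IDENTIFICATION]
# Use your own UCI IDs here
USERAGENT = IR F19 uci-id1,uci-id2,uci-id3

[CONNECTION]
# Host and port of the caching server, as given in the spec
HOST = <cache-host-from-spec>
PORT = <cache-port-from-spec>

[CRAWLER]
# Seed URL to start from and per-thread politeness delay (seconds)
SEEDURL = <seed-url-from-spec>
POLITENESS = 0.5

[LOCAL PROPERTIES]
# File where crawler progress is saved
SAVE = frontier.shelve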
