Deploying scheduled tasks and crawlers with crontab

Latest recommended article published on 2024-06-28 15:39:28

风雪3


Tags: linux

Article link: https://blog.csdn.net/weixin_44122586/article/details/132795559


1 Deploying script files

Linux's built-in cron daemon runs commands on a schedule. Paired with shell scripts, cron can handle even very complex jobs.

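1.1 Using `crontab`

crontab [-u username]    // without -u, the command operates on the current user's crontab
    -e      (edit the crontab)
    -l      (list the entries in the crontab)
    -r      (remove the whole crontab)

Running crontab -e opens the current user's crontab in an editor, usually the familiar vim interface. Each line is one entry.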
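
Because crontab -r removes every entry at once, it can help to save a copy before editing. The following is a minimal sketch, assuming a cron implementation where `crontab FILE` installs a crontab from a file (the common Linux behaviour); the function names `backup_crontab` and `restore_crontab` are hypothetical helpers, not part of cron.

```shell
# backup_crontab FILE
# Save the current user's crontab to FILE (hypothetical helper).
# If the user has no crontab yet, crontab -l fails and FILE is left empty.
backup_crontab() {
    crontab -l > "$1"
}

# restore_crontab FILE
# Replace the current user's crontab with the contents of FILE (hypothetical helper).
restore_crontab() {
    crontab "$1"
}
```

A typical use would be `backup_crontab ~/crontab.bak` before running crontab -e, and `restore_crontab ~/crontab.bak` if an edit goes wrong.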

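A crontab entry consists of a schedule followed by an action. The schedule has five fields: minute, hour, day of month, month, and day of week. The operators are:

* any value in the field's range
/ a step: every N values
- a range from X to Z
, a list of discrete values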

Meaning	Minute	Hour	Day of month	Month	Day of week	Command
Range	0~59	0~23	1~31	1~12	0~6 (0 = Sunday)	the command to run
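
To make the operators concrete, here is a small sketch that expands a single schedule field into the values it matches. `expand_field` is a hypothetical helper written for illustration; cron ships no such command, and the sketch ignores extras such as month or weekday names.

```shell
# expand_field FIELD MIN MAX
# Print the values that one cron field matches, separated by spaces.
# The body runs in a subshell (note the parentheses), so set -f and IFS stay local.
expand_field() (
    field=$1 min=$2 max=$3 out=
    set -f                      # keep "*" from being expanded as a filename glob
    IFS=,                       # "," separates the items of a list
    for part in $field; do
        step=1 range=$part
        case $part in
            */*) step=${part##*/}; range=${part%%/*} ;;   # "/" introduces a step
        esac
        case $range in
            '*') lo=$min hi=$max ;;                       # "*" is the whole range
            *-*) lo=${range%%-*} hi=${range##*-} ;;       # "-" is an inclusive range
            *)   lo=$range hi=$range ;;                   # a single number
        esac
        v=$lo
        while [ "$v" -le "$hi" ]; do
            out="$out${out:+ }$v"
            v=$((v + step))
        done
    done
    echo "$out"
)
```

For example, `expand_field '*/15' 0 59` prints `0 15 30 45`, and `expand_field '3,15' 0 59` prints `3 15`.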

1.2 Creating `cron` jobs for the current user

Run crontab -e to edit the crontab file.

Examples:

* * * * *      # runs every minute

Example 1: append a line to a file every 5 minutes

*/5 * * * * echo 'hello world' >> /home/tuling/Documents/Xl/demo.txt

Example 2: run at minute 3 and minute 15 of every hour

3,15 * * * *  echo 'hello world' >> /home/xxoo/demo/demo.log

Example 3: run at 8:20 every morning

20 8 * * * myCommand
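
Entries like these can also be installed without opening an editor, which is handy in deployment scripts. The sketch below assumes a cron implementation where `crontab -` reads the new crontab from standard input (true of the common Linux implementations); `add_cron_entry` is a hypothetical helper name.

```shell
# add_cron_entry ENTRY
# Append ENTRY to the current user's crontab unless an identical line already exists.
add_cron_entry() {
    entry=$1
    current=$(crontab -l 2>/dev/null)   # empty if the user has no crontab yet
    if printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
        return 0                        # already installed, nothing to do
    fi
    # "crontab -" replaces the whole crontab with what it reads from stdin,
    # so the existing entries are written back together with the new one.
    { [ -n "$current" ] && printf '%s\n' "$current"; printf '%s\n' "$entry"; } | crontab -
}
```

For instance, `add_cron_entry '20 8 * * * myCommand'` would install Example 3, and running it a second time leaves the crontab unchanged.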

1.3 Deploying a single script as a scheduled task

1.3.1 Uploading a local file to the server

scp /path/filename username@servername:/path
scp mians.py tuling@192.168.70.206:/home/tuling/Documents/Xl
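
The upload can be wrapped so that a missing local file is caught early and the result is confirmed on the server. This is only a sketch: `upload_script` is a hypothetical helper, and it assumes ssh access to the same account that scp uses and paths without spaces.

```shell
# upload_script LOCAL_FILE USER@HOST REMOTE_DIR
# Copy LOCAL_FILE to REMOTE_DIR on the server, then list it there to confirm.
upload_script() {
    local_file=$1 target=$2 remote_dir=$3
    [ -f "$local_file" ] || { echo "no such file: $local_file" >&2; return 1; }
    scp "$local_file" "$target:$remote_dir" || return 1
    # List the uploaded file on the server to confirm that it arrived.
    ssh "$target" ls -l "$remote_dir/$(basename "$local_file")"
}
```

With the values from this article, the call would be `upload_script mians.py tuling@192.168.70.206 /home/tuling/Documents/Xl`.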

1.3.2 Scheduled collection task

1. Use a scheduled task to collect data every 30 minutes

*/30 * * * *  python /home/tuling/Documents/Xl/mians.py spider >> /home/tuling/Documents/Xl/demo.log
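
Note that `>>` appends only standard output to the log, so Python tracebacks written to standard error do not end up there. One possible approach, sketched below, is a small wrapper that logs both streams plus a timestamp and the exit status; `run_logged` is a hypothetical helper name.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Run COMMAND, appending its stdout and stderr to LOGFILE together with
# a start timestamp and the exit status. Returns COMMAND's exit status.
run_logged() {
    log=$1
    shift
    status=0
    {
        echo "[$(date '+%Y-%m-%d %H:%M:%S')] start: $*"
        "$@" || status=$?
        echo "[$(date '+%Y-%m-%d %H:%M:%S')] exit status: $status"
    } >> "$log" 2>&1
    return "$status"
}
```

Saved in a script that ends with `run_logged /home/tuling/Documents/Xl/demo.log python /home/tuling/Documents/Xl/mians.py spider`, the crontab entry would then only need to call that script.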

Scheduled task with scrapyd

Write a shell script

#!/bin/bash
source /home/spiders/.venv36/bin/activate
cd /home/spiders/YouMei
curl http://47.98.134.164:6800/schedule.json -d project=YouMei -d spider=CrawlSpider

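After the shebang, the source line activates the virtual environment.
The cd line enters the project directory.
The curl line asks scrapyd to start the spider; the project name is YouMei and the spider name is CrawlSpider.

Edit the crontab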
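
As written, the script succeeds even if scrapyd rejects the request. A hedged variant of the curl step is sketched below: it relies on scrapyd's schedule.json answering with a JSON body whose status is "ok" on success, and checks for that with a simple pattern match rather than a real JSON parser. `schedule_spider` is a hypothetical helper name.

```shell
# schedule_spider HOST:PORT PROJECT SPIDER
# Ask scrapyd to schedule SPIDER from PROJECT; print the response and
# return non-zero if the request fails or the status is not "ok".
schedule_spider() {
    host=$1 project=$2 spider=$3
    response=$(curl --silent --show-error --fail \
        "http://$host/schedule.json" \
        -d "project=$project" -d "spider=$spider") || return 1
    printf '%s\n' "$response"
    case $response in
        *'"status"'*'"ok"'*) return 0 ;;   # rough check, not a JSON parser
        *) return 1 ;;
    esac
}
```

In the script above, the curl line could then become `schedule_spider 47.98.134.164:6800 YouMei CrawlSpider`, so that failures show up in the cron log.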

crontab -e

Add the following entry

0 * * * * /home/shll_file/spider_youmei.sh >> /home/shll_file/crontab_youmei.log

This runs the shell script once an hour, at minute 0.
