Running Scrapy with Docker
1. Install Python
- Configure the yum repositories
For example: 163.repo ali.repo bak epel.repo local.repo
- Configure the pip index
mkdir ~/.pip
vim ~/.pip/pip.conf
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
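The pip.conf shown above can also be generated from a script, which helps when the same mirror has to be set on several machines. This is only a minimal sketch: the helper name render_pip_conf is hypothetical, and the script prints the file content instead of writing to ~/.pip/pip.conf.

```python
import configparser
import io

# Mirror URL taken from the notes above.
TSINGHUA_INDEX = "https://pypi.tuna.tsinghua.edu.cn/simple"


def render_pip_conf(index_url=TSINGHUA_INDEX):
    """Return the text of a pip.conf whose [global] section sets index-url."""
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print instead of writing, so nothing under ~/.pip is modified.
    print(render_pip_conf(), end="")
```

To use it, redirect the output into ~/.pip/pip.conf after creating the directory.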
On Windows, see: https://blog.csdn.net/kan2016/article/details/81203465
- Install the Python build dependencies
yum install gcc openssl-devel libffi-devel sqlite* -y
- Build Python 3.7 from source
tar -zxvf Python-3.7.0.tgz
cd Python-3.7.0
./configure --prefix=/usr/local/python3/ --enable-loadable-sqlite-extensions --enable-shared
make && make install
ln -s /usr/local/python3/bin/python3 /usr/bin/python3
ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3
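Problem: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory
Cause: with --enable-shared, libpython3.7m.so.1.0 is installed under /usr/local/python3/lib, which is not on the dynamic linker's search path. For the fix, see: https://blog.csdn.net/Hearthougan/article/details/83091205
- Other
Install ipython: pip3 install ipython
Remove python3: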
rpm -qa|grep python3|xargs rpm -ev --allmatches --nodeps
whereis python3 |xargs rm -frv
whereis python
2. Install Scrapy
- Install the Scrapy dependencies
yum install libxslt-devel pyOpenSSL python-lxml python-devel -y
pip3 install twisted
- Install Scrapy
pip3 install --upgrade pip
pip3 install scrapy --default-timeout=100
ln -s /usr/local/python3/bin/scrapy /usr/bin/scrapy
- Test
scrapy shell www.baidu.com
- Common Scrapy commands
scrapy shell url
scrapy startproject project_name
scrapy genspider name allowed_domains
scrapy crawl name
scrapy crawl name -o items.csv
scrapy parse --spider=basic url    # for debugging a URL that raises errors
scrapy crawl aitaotu -o info.csv -s CLOSESPIDER_ITEMCOUNT=200
- Launching Scrapy from an IDE (problematic)
if __name__ == '__main__':
    from scrapy import cmdline
    cmdline.execute("scrapy crawl basic".split())
- Other
The rar command:
wget http://www.rarlab.com/rar/rarlinux-x64-5.3.0.tar.gz
tar -zxvf rarlinux-x64-5.3.0.tar.gz
cd rar
make
unrar x ScrapySplashTest.rar
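The scrapy crawl invocations listed earlier in this section can also be assembled and started from a small script, which is one possible alternative to cmdline.execute when launching from an IDE. This is a sketch under assumptions: crawl_command and run_crawl are hypothetical helper names, scrapy is assumed to be on PATH, and the script is assumed to be started from the project root (the directory containing scrapy.cfg).

```python
import subprocess


def crawl_command(spider, output=None, settings=None):
    """Build the argv for `scrapy crawl <spider>` with optional -o and -s flags."""
    argv = ["scrapy", "crawl", spider]
    if output:
        argv += ["-o", output]
    for key, value in (settings or {}).items():
        argv += ["-s", f"{key}={value}"]
    return argv


def run_crawl(spider, output=None, settings=None):
    """Run the crawl in a child process and return its exit code."""
    return subprocess.run(crawl_command(spider, output, settings)).returncode


if __name__ == "__main__":
    # Only print the command here; call run_crawl(...) to actually start it.
    command = crawl_command("aitaotu", output="info.csv",
                            settings={"CLOSESPIDER_ITEMCOUNT": 200})
    print(" ".join(command))
```

Running the crawl in a child process keeps the IDE's own interpreter alive after the crawl finishes, whereas cmdline.execute exits the process when the command completes.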
3. Install Docker
- Reference: https://www.runoob.com/docker/centos-docker-install.html
Install the required packages:
yum install -y yum-utils device-mapper-persistent-data lvm2
Set up the stable repository:
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
Install the latest version:
yum install docker-ce -y
#yum install docker-ce docker-ce-cli containerd.io -y    (the dependencies are already installed)
Or install a specific version:
yum list docker-ce --showduplicates | sort -r
yum install docker-ce-<VERSION_STRING> -y
Start and test:
systemctl start docker
systemctl enable docker
docker version
Problem: WARNING: bridge-nf-call-iptables is disabled
https://yq.aliyun.com/articles/278801
- Configure a Docker registry mirror
Mirror site: https://www.daocloud.io/mirror#accelerator-doc
curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://f1361db2.m.daocloud.io
vim /etc/docker/daemon.json
{"registry-mirrors":["http://hub-mirror.c.163.com"]}
systemctl daemon-reload
systemctl restart docker
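Because daemon.json must be strict JSON, it can be safer to generate it than to type it by hand: text copied from a web page often carries typographic quotes, and such a file will not parse. The sketch below uses only the standard json module; render_daemon_json and is_valid_json are hypothetical helper names, and the script prints the content instead of writing to /etc/docker/daemon.json.

```python
import json


def render_daemon_json(mirrors):
    """Return daemon.json text containing only a registry-mirrors list."""
    return json.dumps({"registry-mirrors": list(mirrors)}, indent=2) + "\n"


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


if __name__ == "__main__":
    # Mirror URL taken from the notes above.
    print(render_daemon_json(["http://hub-mirror.c.163.com"]), end="")
```

Checking an existing file with is_valid_json before restarting Docker is a cheap way to catch quoting mistakes.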
- Common Docker commands
docker images
docker pull ubuntu:latest
docker rmi hello-world
docker ps -a
docker stop/start id
docker rm -f 93c482e50098
docker rm $(docker ps -aq)
View a container's output:
docker logs -f bf08b7f2cd89
Run a container from an image with a port mapping:
docker run -p 0.0.0.0:5555:5555 proxypool:latest
docker run -itd --name ubuntu-test ubuntu /bin/bash
docker exec -it ubuntu-test /bin/bash
docker search python
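The value passed to docker run -p reads as ip:hostPort:containerPort, with the ip part optional; when it is omitted, Docker binds to all interfaces by default, so 0.0.0.0:5555:5555 and 5555:5555 normally behave the same. The sketch below only illustrates how the value splits. parse_publish is a hypothetical helper, and it deliberately ignores IPv6 addresses, port ranges, and protocol suffixes such as /udp.

```python
def parse_publish(spec):
    """Split a `docker run -p` value into (host_ip, host_port, container_port).

    Supports only `hostPort:containerPort` and `ip:hostPort:containerPort`
    with an IPv4 address; host_ip is None when the ip part is omitted.
    """
    parts = spec.split(":")
    if len(parts) == 2:
        host_ip = None
        host_port, container_port = parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported port spec: {spec!r}")
    return host_ip, int(host_port), int(container_port)


if __name__ == "__main__":
    print(parse_publish("0.0.0.0:5555:5555"))
    print(parse_publish("5555:5555"))
```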
4. Build the Docker image
- Create requirements.txt and Dockerfile in the project root
[root@localhost ScrapySplashTest]# cat requirements.txt
scrapy
pymongo
scrapy-splash
[root@localhost ScrapySplashTest]# cat Dockerfile
FROM python
ENV PATH /usr/local/bin:$PATH
ADD . /code
WORKDIR /code
RUN pip3 install -r requirements.txt --default-timeout=100
CMD scrapy crawl taobao
- Build the image
docker build -t name:version .
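Since the build depends on both requirements.txt and Dockerfile being present in the project root, a quick pre-check can save a failed build. The following is a rough sketch, not part of the original workflow: missing_build_files and build_command are hypothetical helpers, and the image name scrapysplashtest and the tag latest are placeholders (Docker requires lowercase repository names).

```python
import os

# Files the notes above place in the project root before building.
REQUIRED_FILES = ("requirements.txt", "Dockerfile")


def missing_build_files(project_dir):
    """Return the required files that are absent from project_dir."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(project_dir, name))]


def build_command(name, version, context="."):
    """Build the argv for `docker build -t name:version <context>`."""
    return ["docker", "build", "-t", f"{name}:{version}", context]


if __name__ == "__main__":
    missing = missing_build_files(".")
    if missing:
        print("missing before build:", ", ".join(missing))
    else:
        # Placeholder image name and tag; print the command rather than run it.
        print(" ".join(build_command("scrapysplashtest", "latest")))
```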