1. Project setup
Project directory: spider_baidu.py, requirements.txt, Dockerfile, venv, .dockerignore
spider_baidu.py: opens the Baidu homepage in a browser via Playwright and returns its HTML
from playwright.sync_api import sync_playwright
import platform

system_platform = platform.system().lower()  # name of the current OS, e.g. "linux", "windows", "darwin"


def get_baidu_homepage():
    with sync_playwright() as p:
        # None means Playwright's own Chromium (installed with `playwright install chromium`)
        executable_path = None
        if system_platform == "linux":
            # Inside the Docker image, use the Chrome installed by the Dockerfile
            executable_path = "/usr/bin/google-chrome-stable"
        # Chromium is used here, but p.firefox or p.webkit would also work
        browser = p.chromium.launch(executable_path=executable_path)
        page = browser.new_page()
        page.goto('https://www.baidu.com')
        # Get the page's HTML content
        html_content = page.content()
        # Close the browser
        browser.close()
        return html_content


if __name__ == '__main__':
    content = get_baidu_homepage()
    print(content)
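Hard-coding /usr/bin/google-chrome-stable makes the launch fail on a Linux machine where Chrome is not installed. One possible variation, sketched below under the assumption that falling back to Playwright's bundled Chromium (installed with playwright install chromium) is acceptable, looks the binary up on PATH instead. resolve_executable_path is a hypothetical helper name, not part of Playwright.

```python
import platform
import shutil
from typing import Callable, Optional


def resolve_executable_path(system_name: Optional[str] = None,
                            which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Pick the browser binary to pass to p.chromium.launch(executable_path=...).

    Hypothetical helper: on Linux, return google-chrome-stable if it is on PATH
    (as in the Docker image); otherwise return None so Playwright uses its own Chromium.
    """
    name = (system_name if system_name is not None else platform.system()).lower()
    if name != "linux":
        return None
    # shutil.which returns the full path, or None if the command is not found
    return which("google-chrome-stable")
```

Inside get_baidu_homepage this would be used as browser = p.chromium.launch(executable_path=resolve_executable_path()); the which parameter exists only so the lookup can be replaced in tests.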
requirements.txt: playwright
Dockerfile:
FROM python:3.11-slim-bookworm
# Install Google Chrome and fonts for CJK and other scripts
RUN apt-get update \
&& apt-get install -y wget gnupg \
&& wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | gpg --dearmor -o /usr/share/keyrings/googlechrome-linux-keyring.gpg \
&& sh -c 'echo "deb [arch=amd64 signed-by=/usr/share/keyrings/googlechrome-linux-keyring.gpg] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
&& apt-get update \
&& apt-get install -y google-chrome-stable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-khmeros fonts-kacst fonts-freefont-ttf libxss1 \
--no-install-recommends \
&& rm -rf /var/lib/apt/lists/*
# Set the working directory
WORKDIR /app
# Install the Python dependencies
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
# Copy the files in the current directory into /app in the container
COPY . /app/
# Command run when the container starts (exec form, so python runs as PID 1 and receives stop signals)
CMD ["python", "spider_baidu.py"]
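The Dockerfile copies requirements.txt by name, so if the file in the project is named differently (for example requirement.txt), the build fails at that COPY step. A small pre-build check can catch this. The sketch below is a hypothetical helper, not a Docker feature, and it only understands single-line, shell-form COPY instructions without wildcards; multi-stage COPY --from=... lines are skipped.

```python
from pathlib import Path
from typing import List


def missing_copy_sources(dockerfile: Path, context: Path) -> List[str]:
    """List COPY sources in `dockerfile` that do not exist in the build `context`.

    Hypothetical helper: handles only single-line, shell-form COPY instructions
    without wildcards.
    """
    missing = []
    for line in dockerfile.read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if not parts or parts[0].upper() != "COPY":
            continue
        flags = [p for p in parts[1:] if p.startswith("--")]
        args = [p for p in parts[1:] if not p.startswith("--")]
        # Sources of COPY --from come from another build stage, not the context
        if any(f.startswith("--from") for f in flags):
            continue
        # Every argument except the last is a source; the last is the destination
        for src in args[:-1]:
            if not (context / src).exists():
                missing.append(src)
    return missing
```

Run from the project directory as missing_copy_sources(Path("Dockerfile"), Path(".")); an empty list means every COPY source was found.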
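.dockerignore: lists files and directories (such as the local venv directory) that Docker should ignore, so they are not included in the build context used to build the image.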
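The .dockerignore file itself is plain text with one pattern per line. For this project the most important entry is venv, since the image installs its own dependencies from requirements.txt. As a minimal sketch, the helper below writes such a file; write_dockerignore is a hypothetical name and the pattern list is an assumption about the project layout. Patterns starting with **/ match at any depth of the context.

```python
from pathlib import Path
from typing import Iterable, Union

# Assumed entries for this project; adjust them to your own layout.
DEFAULT_PATTERNS = (
    "venv",            # local virtual environment; the image installs its own packages
    ".git",            # version-control history is not needed in the image
    "**/__pycache__",  # bytecode caches in any subdirectory
    "**/*.pyc",
    ".dockerignore",   # still read by Docker, but not copied by COPY . /app/
    "Dockerfile",
)


def write_dockerignore(context_dir: Union[str, Path],
                       patterns: Iterable[str] = DEFAULT_PATTERNS) -> Path:
    """Write one pattern per line to <context_dir>/.dockerignore and return its path."""
    path = Path(context_dir) / ".dockerignore"
    path.write_text("\n".join(patterns) + "\n", encoding="utf-8")
    return path
```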
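2. Building the Docker image
Commands. When building on macOS, pass --platform linux/amd64. Apple Silicon Macs use the arm64 architecture, so without this flag the image is built for arm64 and will not run on ordinary x86-64 (amd64) machines; the Chrome apt source in the Dockerfile is also restricted to arch=amd64.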
Windows: docker build -t [images_name] .
macOS: docker build --platform linux/amd64 -t [images_name] .
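The platform rule above can be expressed as a small function that assembles the build command. This is a sketch under the assumption that any host whose platform.machine() is not x86_64/AMD64 (for example "arm64" on Apple Silicon) should target linux/amd64; build_command is a hypothetical helper name.

```python
import platform
from typing import List, Optional


def build_command(image_name: str, machine: Optional[str] = None) -> List[str]:
    """Return the docker build argv, targeting linux/amd64 on non-x86-64 hosts."""
    # platform.machine() reports e.g. "x86_64" (Linux, Intel Mac), "AMD64" (Windows),
    # "arm64" (Apple Silicon Mac)
    arch = (machine if machine is not None else platform.machine()).lower()
    cmd = ["docker", "build"]
    if arch not in ("x86_64", "amd64"):
        cmd += ["--platform", "linux/amd64"]
    cmd += ["-t", image_name, "."]
    return cmd
```

From the project directory, subprocess.run(build_command("spider_baidu"), check=True) would run the build ("spider_baidu" being a placeholder image name). Passing --platform linux/amd64 unconditionally also works and keeps the command identical on every host.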
3. Pushing to an image registry
docker push [images_name]
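docker push uploads to the registry named in the image reference, so a locally built image is usually re-tagged first (docker tag) with the registry address and namespace, and docker login must already have been run for that registry. The sketch below only assembles those two commands; registry.example.com and myteam are placeholder values, and image_reference and push_commands are hypothetical helper names.

```python
from typing import List


def image_reference(registry: str, namespace: str, name: str, tag: str = "latest") -> str:
    """Compose a full image reference such as registry/namespace/name:tag."""
    return f"{registry}/{namespace}/{name}:{tag}"


def push_commands(local_name: str, registry: str, namespace: str,
                  tag: str = "latest") -> List[List[str]]:
    """Return the docker tag and docker push argv lists for a locally built image."""
    remote = image_reference(registry, namespace, local_name, tag)
    return [
        ["docker", "tag", local_name, remote],  # give the local image its registry name
        ["docker", "push", remote],             # upload it (requires a prior docker login)
    ]
```

Each returned list can be passed to subprocess.run(..., check=True) in order.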