python实现爬虫所需要安装好的包
结合最近所查资料来展示一下在开始编写代码前需要安装的包。
常用的5个包
农场:Requests
炖菜:Beautiful Soup 4
色拉:lxml
餐厅:Selenium
厨师:Scrapy
具体解释附上原作者链接: https://learnku.com/articles/37861
具体安装方法
1.首先需要将包放在Python的安装目录下,找到Scripts的目标文件夹,复制文件路径,这里是D:\Python\python3.8\Scripts。

2.Windows+R输入cmd打开命令控制窗口。
进入到上述目标文件夹下


3.接下来进入包的安装
附上我看的原作者发布的相关包的安装:https://blog.csdn.net/qq_46556714/article/details/121379749
1)可以更新一下pip版本(也可以不更新,建议更新)
python -m pip install --upgrade pip
2)requests包
pip3 install requests
等待安装成功
3)beautifulsoup4安装
pip3 install beautifulsoup4

因为我安装过了,没法呈现安装过程中的截图,这里展示一下如何输入命令行。
4)安装lxml
pip3 install lxml
4.验证是否安装成功
python
import requests
import bs4
import lxml

没有提示错误就成功了。
其他的包也是类似的安装过程,pip install 包名,因为我只需要上述三个包,所以就没演示剩余两个包的安装,接下来就可以编写爬取数据的代码了。
5.最后附上一些pip命令参数:
Usage:
pip <command> [options]
Commands:
install Install packages.
uninstall Uninstall packages.
freeze Output installed packages in requirements format.
list List installed packages.
show Show information about installed packages.
search Search PyPI for packages.
wheel Build wheels from your requirements.
help Show help for commands.
General Options:
-h, --help Show help.
--isolated Run pip in an isolated mode, ignoring
environment variables and user configuration.
-v, --verbose Give more output. Option is additive, and can be
used up to 3 times.
-V, --version Show version and exit.
-q, --quiet Give less output.
--log <path> Path to a verbose appending log.
--proxy <proxy> Specify a proxy in the form
[user:passwd@]proxy.server:port.
--retries <retries> Maximum number of retries each connection should
attempt (default 5 times).
--timeout <sec> Set the socket timeout (default 15 seconds).
--exists-action <action> Default action when a path already exists:
(s)witch, (i)gnore, (w)ipe, (b)ackup.
--trusted-host <hostname> Mark this host as trusted, even though it does
not have valid or any HTTPS.
--cert <path> Path to alternate CA bundle.
--client-cert <path> Path to SSL client certificate, a single file
containing the private key and the certificate
in PEM format.
--cache-dir <dir> Store the cache data in <dir>.
--no-cache-dir Disable the cache.
--disable-pip-version-check
Don't periodically check PyPI to determine
whether a new version of pip is available for
download. Implied with --no-index.
希望对初学者有所帮助,如有错误请多指教。

9636

被折叠的 条评论
为什么被折叠?



