1. Download Python 3.8.4 from the official site
https://www.python.org/downloads/release/python-384/
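2. After installing, set the environment variables
Press Win+R, type cmd, and press Enter
Type python and check the version shown in the interpreter banner.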
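The version can also be read from inside the interpreter itself. A minimal sketch using only the standard library; the helper name `version_string` is hypothetical:

```python
import sys

def version_string(info=sys.version_info):
    """Return 'major.minor.micro' for a version tuple (defaults to the running interpreter)."""
    return f"{info[0]}.{info[1]}.{info[2]}"

# Print the version of the interpreter that is actually on PATH
print(version_string())
```

If the printed version is not the one just installed, another Python installation probably comes first on PATH.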
3. Download pip 21.1.1.
https://pypi.org/project/pip/
4. Change into the pip directory.
In cmd, run python setup.py install
If it fails with the error: No module named setuptools
follow https://www.cnblogs.com/kaishirenshi/p/9951396.html to resolve it
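Whether the missing module is now available can be checked without rerunning the installer. A minimal sketch, assuming the standard-library `importlib.util.find_spec` is an acceptable way to test importability; the helper name `is_installed` is hypothetical:

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None

# Report the two modules this step depends on
for name in ("setuptools", "pip"):
    print(name, "available" if is_installed(name) else "missing")
```

If setuptools still shows as missing, the fix from the linked page has not taken effect for this interpreter.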
5. Once that is resolved, follow the page below to install the libraries (the example in step 7 needs requests)
https://www.cnblogs.com/copywang/p/7832527.html
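To confirm that a library was really installed, its version can be queried from Python. A minimal sketch, assuming Python 3.8 or later (where `importlib.metadata` is part of the standard library); the helper name `installed_version` is hypothetical:

```python
from importlib import metadata  # standard library since Python 3.8

def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# requests is the only third-party package the example script imports
print("requests:", installed_version("requests") or "not installed")
```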
6. Screenshot of the installation result
7. Example
Before running the code, create the folder e:\python_result on drive E:
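The folder can also be created from Python instead of by hand. A minimal sketch using `pathlib`; the helper name `ensure_dir` is hypothetical:

```python
from pathlib import Path

def ensure_dir(path):
    """Create the directory (and any missing parents) if it does not exist yet."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory

# On the Windows machine used in these notes, the call would be:
# ensure_dir(r"e:\python_result")
```

Because of `exist_ok=True`, calling it again when the folder already exists does nothing. The crawler script itself follows.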
import os
import re
from urllib.parse import urljoin

import requests

start_url = "https://www.sohu.com"
save_dir = "e:\\python_result"  # must already exist, see the note above
url_list = []
crawl_urls = 0
save_page_num = 0

r = requests.get(start_url, timeout=10)
html = r.text
# Extract every href value found on the start page
urls = re.findall(r'href="(.*?)"', html)
for url in urls:
    url = url.strip()  # remove leading and trailing whitespace
    if not url or url == "/":
        continue
    if url.startswith(("mailto", "javascript")):
        continue
    if url.endswith(("ico", "png", "css", "jpg", "js")):
        continue
    # Resolve protocol-relative ("//...") and relative links against the start page
    url_list.append(urljoin(start_url, url))

for url in url_list:
    print(url)
    crawl_urls += 1
    try:
        r = requests.get(url, timeout=10)
    except requests.RequestException as e:
        # Skip links that cannot be fetched instead of aborting the whole run
        print("Skipped %s: %s" % (url, e))
        continue
    if "汽车" in r.text:  # keep only pages that mention "汽车" (cars)
        save_page_num += 1
        with open(os.path.join(save_dir, str(save_page_num) + ".html"), "w", encoding="utf-8") as fp:
            fp.write(r.text)

print("Crawled %s pages in total" % crawl_urls)
print("Saved %s pages in total" % save_page_num)
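The same link often appears several times on one page, so url_list can contain duplicates and the loop may fetch a page more than once. A minimal sketch of order-preserving deduplication that could be applied to url_list before the second loop; the helper name `unique_in_order` and the sample links are hypothetical:

```python
def unique_in_order(items):
    """Drop repeated items while keeping the first occurrence of each in place."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(items))

links = [
    "https://www.sohu.com/a",
    "https://www.sohu.com/b",
    "https://www.sohu.com/a",
]
print(unique_in_order(links))
```

A set would also remove duplicates, but it would lose the order in which the links appeared on the page.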