Web Scraping for Beginners (1): Scraping Static HTML Pages with Python

Latest recommended article published 2024-09-21 16:31:53

the-white

Views: 3.1k

Categories: Web Scraping, Python. Tags: web scraping, Python, HTML

Article link: https://blog.csdn.net/qq_41824206/article/details/80170161

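This is the first post in a beginner's series on web scraping. It shows newcomers how to fetch and parse a static HTML page with Python, and how to build up basic scraping skills by trying out the relevant functions and parameters hands-on.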

Summary generated automatically by CSDN.

System environment:

Operating system: Windows 10 Pro, 64-bit  
Python: Anaconda2, Python 2.7  
Python packages: requests, beautifulsoup (imported as bs4), os (standard library)

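Beginners usually start scraping with static HTML pages, since they are not hard to scrape and are easy to get going with. When you run into a function you have not seen before, look it up in a search engine, work out what it does and what each parameter is for, and use it a few times; after that you will have a rough feel for the pattern (that is how I got started as a complete novice). Enough preamble; on to the code.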
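As a warm-up before the full script, the parsing step can be tried offline on an inline HTML string, with no network access needed. This is only a sketch: the sample markup, the `category` class name, and the `extract_links` helper are made up for illustration and are not taken from the target site or from the script below.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the real site's markup will differ.
SAMPLE_HTML = """
<html><body>
  <div class="category">
    <a href="/detail/1/">Banking</a>
    <a href="/detail/2/">Software</a>
  </div>
  <div class="footer"><a href="/about/">About</a></div>
</body></html>
"""


def extract_links(html, css_class):
    """Return (text, href) pairs for every <a> inside <div class=css_class>."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    # find_all returns every matching tag; attrs filters on attribute values
    for div in soup.find_all('div', attrs={'class': css_class}):
        for a in div.find_all('a'):
            results.append((a.get_text(strip=True), a.get('href')))
    return results


# Works the same under Python 2.7 and Python 3
for name, href in extract_links(SAMPLE_HTML, 'category'):
    print('%s -> %s' % (name, href))
```

Once this behaves as expected on a small string, the same `find_all` calls can be pointed at the content returned by `requests.get`.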

# -*- coding: utf-8 -*-
"""
Created on Thu Apr 26 18:09:20 2018

@author: zww
"""

import requests
from bs4 import BeautifulSoup
import os

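# Proxy settings: requests picks a proxy by URL scheme, so this entry
# applies only to https:// URLs
proxies = {'https': 'http://41.118.132.69:4433'}
# Send a browser-like User-Agent header with the request
hd = {'User-Agent': "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"}
url = 'http://q.10jqka.com.cn/thshy/'

# Fetch the page
req = requests.get(url, headers=hd, proxies=proxies)
# print req

# Parse the raw response bytes with Python's built-in HTML parser
bs = BeautifulSoup(req.content, 'html.parser')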

div_all=bs.find_all('div',attrs=