Web Scraping: Scraping Table Data from a Web Page (Part 1)

木玉曾有约

Original link: https://blog.csdn.net/qq_30500113/article/details/83783834

Scraping data with PyQuery and saving it to a CSV file

Target page: http://www.zuihaodaxue.cn/zuihaodaxuepaiming2018.html
Environment: Windows + Anaconda
The code is as follows:

import csv

import requests
from pyquery import PyQuery as pq


def get_page(url):
    """Send the request and return the page source."""
    r = requests.get(url)
    r.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
    r.encoding = 'utf8'
    return r.text


def parse(text):
    """Parse the ranking table and write its rows to college.csv."""
    doc = pq(text)
    # Select every <tr> row of the ranking table
    rows = doc('table.table tbody tr.alt').items()
    # Open the file once. Mode 'w' avoids duplicated rows on re-runs, and
    # newline='' lets the csv module control line endings (matters on Windows).
    with open('college.csv', 'w', encoding='utf8', newline='') as f:
        writer = csv.writer(f)
        for tr in rows:
            rank = tr.find('td:first-child').text()     # rank
            name = tr.find('div').text()                # university name
            city = tr.find('td:nth-child(3)').text()    # city
            score = tr.find('td:nth-child(4)').text()   # total score
            writer.writerow([rank, name, city, score])
    print("Finished writing")


if __name__ == "__main__":
    url = "http://www.zuihaodaxue.cn/zuihaodaxuepaiming2018.html"
    text = get_page(url)
    parse(text)
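
Because the goal is a CSV file, Python's standard csv module is worth a look: it takes care of commas and quoting inside fields. The sketch below is not part of the original script. SAMPLE_ROWS, save_rows, load_rows and round_trip are hypothetical names, and the rows are made-up placeholders rather than real ranking data. It writes comma-separated rows to a temporary file and reads them back, so it can be run without any network access.

```python
import csv
import os
import tempfile

# Made-up sample rows (rank, name, city, score); not real ranking data.
SAMPLE_ROWS = [
    ["1", "Example University A", "City A", "95.0"],
    ["2", "Example University, East Campus", "City B", "82.5"],
]


def save_rows(rows, path):
    """Write rows to a comma-separated file, one list per line."""
    with open(path, "w", encoding="utf8", newline="") as f:
        csv.writer(f).writerows(rows)


def load_rows(path):
    """Read the file back into a list of lists of strings."""
    with open(path, encoding="utf8", newline="") as f:
        return list(csv.reader(f))


def round_trip(rows):
    """Save rows to a temporary file, reload them, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)  # only the path is needed; save_rows reopens the file
    try:
        save_rows(rows, path)
        return load_rows(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # The second sample name contains a comma; the csv module quotes it on
    # write and restores it on read, so the comparison should hold.
    print(round_trip(SAMPLE_ROWS) == SAMPLE_ROWS)
```

The same load_rows helper could be pointed at a scraped output file to check that every row has four fields, provided that file was written with commas as the delimiter.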
