Python: scraping the Best Chinese Universities ranking fails with TypeError: unsupported format string passed to NoneType.__format__

月同学不写Bug

Published 2021-09-01 11:53:19

Category: python. Tags: python, web scraping

Article link: https://blog.csdn.net/qq_49783140/article/details/120035285

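Fixing the university ranking crawler from Professor Song Tian's Python web scraping course
The page used in Professor Song Tian's course can no longer be opened, so this article uses the following URL instead: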

http://gaokao.xdf.cn/201911/10991728.html

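1. Problem analysis and solution

The error is a TypeError: an unsupported format string was passed to NoneType.__format__. In other words, one of the values handed to format() is None, and None does not accept an alignment spec such as ^10.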
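
The error can be reproduced without any network access. The snippet below is a minimal sketch: the variable `value` is a hypothetical stand-in for a table cell whose `.string` came back as None, and the format spec is taken from the template used later in this article.

```python
# Reproduce the error from the title with no scraping involved.
value = None  # hypothetical stand-in for a cell whose .string was None

try:
    "{0:^10}".format(value)  # "^10" is a format spec; None does not accept one
except TypeError as exc:
    message = str(exc)
    print(message)

# With an empty spec None formats fine, so the failure only shows up
# once alignment or width is requested.
plain = "{0}".format(value)
print(plain)
```

This suggests the format template itself is fine and the fix belongs in the code that extracts the values.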

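1.1 strip()

Looking at the page source, we can see that the strings we extract have leading and trailing spaces and newlines (marked with a red box in the original screenshot). This is not the cause of the error; it only makes the output look untidy. So I added .strip() when reading each string, i.e. tds[0].text.strip().
strip() removes the specified characters from both ends of a string; with no argument it removes whitespace, including spaces and newlines.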
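
A short illustration of both forms of strip(). The sample strings are made up for this sketch and are not taken from the actual page.

```python
# Hypothetical cell text with the kind of padding seen in the page source.
raw = "\n        1\n    "
cleaned = raw.strip()            # no argument: trims whitespace at both ends
print(repr(cleaned))

# With an argument, strip() trims any of the listed characters instead.
custom = "--score--".strip("-")
print(repr(custom))
```

Only the ends are affected; whitespace in the middle of a string is left alone.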

1.2 string vs. text

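<tag>.text    # all text inside the tag, including text nested in its descendants, joined into one string
<tag>.string  # the tag's single string: set only when the tag contains exactly one string (directly, or through a chain of single children); otherwise None

Note that r.text on a requests response (the page content as a string) is a different attribute from .text on a BeautifulSoup tag; the tag attribute is the one that matters here.

Comparing the two, .string only returns a value when the tag holds a single string. In the page source, the university name is not a direct string of the td; it sits several levels down inside nested tags, surrounded by whitespace, which is most likely why tds[1].string evaluates to None. Passing that None to format() with the ^10 spec raises the TypeError.

So we should use .text instead, i.e. tds[1].text.strip():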

 ulist.append([tds[0].text.strip(), tds[1].text.strip(), tds[3].text.strip()])
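
The difference between .string and .text can be checked on a small fragment. This is a sketch under assumptions: the HTML below is invented to mimic a cell with nested tags and surrounding whitespace, and is not copied from the real page.

```python
from bs4 import BeautifulSoup

# Invented fragment: one flat cell and one cell with nested tags plus whitespace.
html = """
<table><tbody><tr>
<td>1</td>
<td>
  <div><a href="#">Example University</a></div>
</td>
</tr></tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
tds = soup.find("tr")("td")      # calling a tag is shorthand for find_all
flat, nested = tds[0], tds[1]

print(repr(flat.string))         # the cell holds exactly one string
print(repr(nested.string))       # several children (whitespace + div), so None
print(repr(nested.text.strip())) # .text gathers nested strings; strip() tidies them
```

Because .text always returns a str, the values passed to format() can no longer be None.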

2. Full source code

import requests
from bs4 import BeautifulSoup
import bs4

# Fetch the page content for url; return "" if the request fails
def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""

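# Extract rank, name and score from each table row into ulist
def fillUnivList(ulist, html):
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:       # empty page or no table: nothing to extract
        return
    for tr in tbody.children:
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')
            if len(tds) < 4:        # skip header or malformed rows
                continue
            # .text reads nested strings; .strip() removes surrounding spaces and newlines
            ulist.append([tds[0].text.strip(), tds[1].text.strip(), tds[3].text.strip()])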

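# Print the first num rows as an aligned table
def printUnivList(ulist, num):
    # {3} is the fill character; chr(12288) is the full-width space, so Chinese names align
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    # Header row: rank, university name, total score
    print(tplt.format("排名", "学校名称", "总分", chr(12288)))
    for u in ulist[:num]:       # slicing avoids IndexError if fewer than num rows were parsed
        print(tplt.format(u[0], u[1], u[2], chr(12288)))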

# Entry point
def main():
    uinfo = []
    url = 'http://gaokao.xdf.cn/201911/10991728.html'
    html = getHTMLText(url)
    fillUnivList(uinfo, html)
    printUnivList(uinfo, 20)  # print the top 20 universities


if __name__ == "__main__":
    main()
