Python Code for Scraping the Best Universities Ranking

Latest recommended article published on 2023-12-29 23:49:31

青枫不流花弋江

Tags: python, web scraping

Article link: https://blog.csdn.net/weixin_47434673/article/details/124161861

Python code for scraping the Best Universities Ranking:

    import bs4
    from urllib import request
    from bs4 import BeautifulSoup

    # (1) Fetch the web page
    def getHTMLText(url):
        try:
            resp = request.urlopen(url)
            html_data = resp.read().decode('utf-8')
            return html_data
        except Exception:
            # Any failure yields an empty string, so later steps see no HTML
            return ""

    # (2) Parse the page and extract the relevant fields
    def fillUnivList(ulist, html):
        soup = BeautifulSoup(html, "html.parser")
        for tr in soup.find('tbody').children:  # iterate over the child nodes of 'tbody'
            if isinstance(tr, bs4.element.Tag):  # skip text nodes between rows
                tds = tr('td')
                ulist.append([tds[0].text.strip(), tds[1].text.strip(), tds[3].text.strip()])

    # (3) Format and print the results
    def printUnivList(ulist, num):
        # chr(12288) is the fullwidth space, used as the fill character for the name column
        tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
        print(tplt.format("排名", "学校名称", "总分", chr(12288)))
        for i in range(num):
            u = ulist[i]
            print(tplt.format(u[0], u[1], u[2], chr(12288)))

    if __name__ == '__main__':
        uinfo = []
        url = 'https://www.shanghairanking.cn/rankings/bcur/2020'
        html = getHTMLText(url)
        fillUnivList(uinfo, html)
        printUnivList(uinfo, 10)
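
The output template in printUnivList is easier to follow in isolation. The sketch below assumes the centred form `{1:{3}^10}`, where the fourth argument is substituted into the format spec of the name column as its fill character. The helper name `format_row` and the sample row are hypothetical and are not real ranking data.

```python
# Fullwidth (CJK) space, U+3000, used as the fill character for Chinese text.
FULLWIDTH_SPACE = chr(12288)

# Field 3 is substituted into the format spec of field 1 as its fill character.
TEMPLATE = "{0:^10}\t{1:{3}^10}\t{2:^10}"


def format_row(rank, name, score):
    """Return one table row with each column centred in a width of 10."""
    return TEMPLATE.format(rank, name, score, FULLWIDTH_SPACE)


if __name__ == "__main__":
    # Header row, using the same labels as the scraper.
    print(format_row("排名", "学校名称", "总分"))
    # Hypothetical sample row, for illustration only.
    print(format_row("1", "示例大学", "99.9"))
```

Padding the name column with a fullwidth space instead of an ASCII space keeps Chinese names of different lengths visually aligned, since each padding character is as wide as a Chinese character.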
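Errors encountered while running the code:

TypeError: unsupported format string passed to NoneType.__format__

Change the line

    ulist.append([tds[0].string, tds[1].string, tds[2].string])

to

    ulist.append([tds[0].text.strip(), tds[1].text.strip(), tds[2].text.strip()])

The page layout has changed. .string returns None for a cell that contains more than one child node, whereas .text returns the cell's full text, and .strip() removes the extra surrounding whitespace.

AttributeError: 'NoneType' object has no attribute 'children'

This means that the object whose 'children' attribute was accessed is None. Here that object is the result of soup.find('tbody'), so the HTML passed to soup = BeautifulSoup(html, 'html.parser') contained no tbody element. In other words, the call to getHTMLText(url) did not return the page behind the URL. That points to getHTMLText(url), yet a careful review of the function found no mistake. Only one possibility remains: the URL itself is the problem. Update the URL.

IndexError: Replacement index 5 out of range for positional args tuple

Possible cause: 1. The format string uses six placeholders, 0 through 5, but format() receives only five arguments.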
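
The three errors above can be reproduced without any network access, which makes them easier to recognise. This is a minimal sketch using only the standard library: `None` stands in for the value returned by `.string` or by `soup.find('tbody')` when nothing is found, and the helper names `exception_name` and `safe_children` are hypothetical, not part of the original code.

```python
def exception_name(func):
    """Call func() and return the name of the exception it raises, or None."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None


def safe_children(node):
    """Return node.children as a list, or [] when the lookup returned None."""
    if node is None:
        return []
    return list(node.children)


# 1. Formatting None with an alignment spec raises TypeError.
type_error = exception_name(lambda: "{0:^10}".format(None))

# 2. None stands in for soup.find('tbody') when the page has no <tbody>.
missing_tbody = None
attr_error = exception_name(lambda: missing_tbody.children)

# 3. Placeholder index 5 needs six arguments, but only five are passed.
index_error = exception_name(lambda: "{0}{5}".format("a", "b", "c", "d", "e"))

if __name__ == "__main__":
    print(type_error, attr_error, index_error)
    # The guard returns an empty list instead of raising.
    print(safe_children(missing_tbody))
```

A guard such as `safe_children` does not fix the underlying cause (an empty or unexpected page), but it turns a confusing AttributeError into an empty result that can be checked and reported.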
