Python: extract every student's number, name, and score (easy to follow, code runs as-is, no regular expressions)

henu-于笨笨

Last modified 2022-10-20 13:05:14

Category: python. Tags: regular expressions, python, web crawler

First published 2021-09-26 11:36:59

Article link: https://blog.csdn.net/weixin_45527999/article/details/120485048

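Extracting information without regular expressions relies on string slicing. First, the split function cuts the text at each closing </tr> tag, so that every <tr>...</tr> row becomes its own piece; a loop then goes through the pieces one by one. For each field, set begin by using index to locate the text that comes immediately before the content you want and adding the length of that text, which lands exactly on the first character of the content. Then set end by using index to find where the content stops. Finally, on each pass of the loop, a string slice from begin to end pulls the data out.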
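To make the begin/end idea concrete, here is a minimal sketch on a single made-up string. The `row` value and the variable names are assumptions chosen for illustration only; they are not taken from the student table.

```python
# Hypothetical one-cell sample, used only to illustrate the slicing idea.
row = '<td class="score">92</td>'

marker = '<td class="score">'              # the text just before the value
begin = row.index(marker) + len(marker)    # position of the value's first character
end = row.index('</td>', begin)            # position where the value stops

value = row[begin:end]
print(value)  # 92
```

Passing `begin` as the second argument to `index` makes the search for the end marker start after the value begins, so an earlier occurrence of the same marker cannot be picked up by mistake.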
The code is as follows:

# Extract student information using common string methods
html = '''<tbody>
<tr><td><span><span class="c-index c-index-hot1 c-gap-icon-right-small">1</span>张婷婷</span></td><td class="opr-toplist-right">92<i class="opr-toplist-st c-icon c-icon-down"></i></td></tr>
<tr><td><span><span class="c-index c-index-hot1 c-gap-icon-right-small">2</span>王华</span></td><td class="opr-toplist-right">91<i class="opr-toplist-st c-icon c-icon-down"></i></td></tr>
<tr><td><span><span class="c-index c-index-hot1 c-gap-icon-right-small">3</span>张岚</span></td><td class="opr-toplist-right">90<i class="opr-toplist-st c-icon c-icon-down"></i></td></tr>
<tr><td><span><span class="c-index c-gap-icon-right-small">4</span>孙鸿峰</span></td><td class="opr-toplist-right">90<i class="opr-toplist-st c-icon c-icon-down"></i></td></tr>
<tr><td><span><span class="c-index c-gap-icon-right-small">5</span>周海栋</span></td><td class="opr-toplist-right">89<i class="opr-toplist-st c-icon c-icon-down"></i></td></tr>
<tr><td><span><span class="c-index c-gap-icon-right-small">6</span>武静</span></td><td class="opr-toplist-right">88<i class="opr-toplist-st c-icon c-icon-down"></i></td></tr>
</tbody>'''
n = 0  # counts the rows that were actually extracted
rows = html.split("</tr>")
for content in rows:
    # The piece after the last "</tr>" holds only "</tbody>" and no row data;
    # index() would raise ValueError on it, so skip it.
    if "<tr>" not in content:
        continue
    n += 1
    # print(content)
    order_marker = 'c-gap-icon-right-small">'
    order_begin = content.index(order_marker) + len(order_marker)
    order_end = content.index('</span>', order_begin)
    name_begin = order_end + len('</span>')
    name_end = content.index('</span>', name_begin)
    # Single-quoted strings can contain double quotes without escaping
    score_marker = '<td class="opr-toplist-right">'
    score_begin = content.index(score_marker) + len(score_marker)
    score_end = content.index('<i class="opr-toplist-st c-icon c-icon-down">', score_begin)
    order = content[order_begin:order_end]
    name = content[name_begin:name_end]
    score = content[score_begin:score_end]
    print(order, name, score)
print(n)
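The index-plus-length pattern repeats for every field, so it could be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original program: `slice_between` is a hypothetical name, and the `sample` rows are shortened, made-up data in the same shape as the table above.

```python
def slice_between(text, left, right, start=0):
    """Return (value, position where value ends) for the text between two markers.

    Hypothetical helper wrapping the index-plus-length pattern.
    Raises ValueError if either marker is missing, just like str.index.
    """
    begin = text.index(left, start) + len(left)
    end = text.index(right, begin)
    return text[begin:end], end


# Shortened, made-up rows (assumed data, not the real table).
sample = (
    '<tr><td><span><span class="c-index">1</span>Ann</span></td>'
    '<td class="right">92<i></i></td></tr>'
    '<tr><td><span><span class="c-index">2</span>Bob</span></td>'
    '<td class="right">91<i></i></td></tr>'
)

records = []
for piece in sample.split("</tr>"):
    if "<tr>" not in piece:  # skip the leftover piece after the last </tr>
        continue
    order, pos = slice_between(piece, 'class="c-index">', "</span>")
    # Searching from pos finds the </span> that closes the number,
    # so the name is whatever lies between it and the next </span>.
    name, pos = slice_between(piece, "</span>", "</span>", pos)
    score, _ = slice_between(piece, '<td class="right">', "<i>", pos)
    records.append((int(order), name, int(score)))

print(records)  # [(1, 'Ann', 92), (2, 'Bob', 91)]
```

Collecting tuples in a list instead of printing inside the loop keeps the extracted data available for sorting or writing to a file later.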

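The result of running it is as follows:
[Image: screenshot of the program output]
Without a check that skips pieces containing no row, this program reports an error after the data has been printed, although the extraction itself is not affected. The cause is that split("</tr>") leaves one last piece after the final </tr> that contains only </tbody>, and index() raises a ValueError when the text it looks for is missing. I ran into the same kind of not-found situation when scraping the Douban movie chart, but there I was using the regular-expression function re.search, which simply returns None. (I also wrote a re.search version of this student-information extraction, in the fifth assignment; take a look if you are interested.)
If you know a cleaner way to handle that last piece, please tell me in the comments. I would be very grateful.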
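A small sketch of where such an error can come from: split("</tr>") leaves a leftover piece after the last </tr>, and index() raises ValueError on it, where find() would return -1 instead. The `table` string is a shortened, made-up stand-in for the real data.

```python
# Shortened stand-in for the table (assumed data, for illustration only).
table = "<tbody>\n<tr>row 1</tr>\n<tr>row 2</tr>\n</tbody>"

pieces = table.split("</tr>")
print(len(pieces))        # 3: two rows plus the leftover after the last </tr>
print(repr(pieces[-1]))   # '\n</tbody>'

last = pieces[-1]
print(last.find("<tr>"))  # -1: find() reports "not found" with a return value

try:
    last.index("<tr>")    # index() reports "not found" by raising
except ValueError as err:
    print("index() raised ValueError:", err)
```

Either testing `"<tr>" in piece` before slicing, or using find() and checking for -1, avoids the exception on that last piece.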
