使用python爬取中国电影票房数据并写入csv文件

最新推荐文章于 2024-07-23 16:58:49 发布

eeeasyFan

最新推荐文章于 2024-07-23 16:58:49 发布

阅读量4.5k

点赞数 2

分类专栏： python爬虫文章标签： python

本文链接：https://blog.csdn.net/weixin_47700137/article/details/118676674

版权

python爬虫专栏收录该内容

3 篇文章 3 订阅

订阅专栏

环境

PyCharm 2021.1.2 x64
爬取的目标网页

一、代码

import requests
from bs4 import BeautifulSoup
url = "http://58921.com/alltime/wangpiao"#目标网页
response = requests.get(url)
#print(response.text)
response.encoding = "utf-8"
text = response.text
bs = BeautifulSoup(text,'lxml')
#print(bs)
table = bs.find('table',attrs={'class':'center_table table table-bordered table-condensed'})
#print(table)
thead = table.find('thead')
#print(thead)
tbody = table.find('tbody')
#print(tbody)
f = open('中国电影票房.csv',mode="w",encoding="UTF-8")
ths = thead.find_all('th')
#print(ths),
trs = tbody.find_all('tr')
for th in ths:
    if th==0:
        break
    f.write(th.text)
    f.write(",")
f.write("\n")#换行写
for tr in trs:
    if tr==0:
        break
    tds = tr.find_all("td")
    for td in tds:
        if td==0:#最后一个也被写后退出
            break
        f.write(td.text)
        f.write(",")#换列写
    f.write("\n")#换行写

二、结果

在这里插入图片描述

需要说明的问题

C3没有数据
原因是在网页原代码中这一数据是通过img标签（png格式图片）来显示的，不是网页文本显示的，我的想法是利用python文字识别技术来识别这张图片（识别中文需格外下载中文语言包），之后再写入csv文件。
目前还在努力实现中…

eeeasyFan

关注

2
点赞
踩
49

收藏

觉得还不错? 一键收藏
打赏
1
评论
使用python爬取中国电影票房数据并写入csv文件

环境PyCharm 2021.1.2 x64爬取的目标网页一、代码import requestsfrom bs4 import BeautifulSoupurl = "http://58921.com/alltime/wangpiao"#目标网页response = requests.get(url)#print(response.text)response.encoding = "utf-8"text = response.textbs = BeautifulSoup(text,'l
复制链接

扫一扫