Scraping GDP Data with Python

Fay!

Tags: python, web scraping

Article link: https://blog.csdn.net/acmfay/article/details/114837898

The data scraped here is the GDP data published on a website.

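from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import csv

driver=webdriver.Chrome()

url='https://www.kylc.com/stats/global/yearly/g_gdp/1960.html' # page to scrape
xpath='/html/body/div[2]/div[1]/div[5]/div[1]/div/div/div/table' # XPath of the table; right-click the element in the browser's developer tools to copy it
driver.get(url)
tablel=driver.find_element(By.XPATH, xpath).get_attribute('innerHTML') # get_attribute('innerHTML') returns the element's HTML source; find_element_by_xpath was removed in Selenium 4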

out=open('e:/gdpallyear.csv','w',newline='')
csv_write=csv.writer(out,dialect='excel')

soup=BeautifulSoup(tablel,"html.parser")
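table=soup.find_all('tr') # find all <tr> tags
for row in table:
    cols=[col.text for col in row.find_all('td')] # .text is the content with the tags stripped, i.e. without <td></td>
    if len(cols)==0 or not cols[0].isdigit(): # skip irregular rows, such as empty rows or ads
        continue
    csv_write.writerow(cols)
    print(cols)

out.close() # close the file opened above; without this call, buffered rows may not reach the disk, so the CSV can look empty and open read-only
driver.close() # close the browser window so it does not have to be closed by hand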

You also need to run this once more:

out.close()
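
One way to avoid forgetting `out.close()` is a `with` block, which closes the file automatically even if an error occurs partway through. The sketch below is only an illustration, not the original script: `is_data_row` and `save_rows` are hypothetical helper names, the sample rows are made up, and the file goes to a temporary directory instead of `e:/`.

```python
import csv
import os
import tempfile


def is_data_row(cols):
    """Same filter as in the script: keep rows whose first cell is a number."""
    return len(cols) > 0 and cols[0].isdigit()


def save_rows(path, rows):
    """Write the data rows to a CSV file and return how many were written."""
    written = 0
    # The with block closes the file on exit, so no explicit close() is needed.
    with open(path, 'w', newline='') as out:
        csv_write = csv.writer(out, dialect='excel')
        for cols in rows:
            if not is_data_row(cols):
                continue
            csv_write.writerow(cols)
            written += 1
    return written


# Made-up rows standing in for the scraped table cells (assumption).
sample_rows = [
    [],                           # a row with no <td> cells, e.g. a header row
    ['1', 'Country A', '100'],
    ['ad', 'not a data row'],     # first cell is not a number, so it is skipped
    ['2', 'Country B', '50'],
]

csv_path = os.path.join(tempfile.mkdtemp(), 'gdp_demo.csv')
count = save_rows(csv_path, sample_rows)
print(count)  # 2
```

Because the file is already closed when `save_rows` returns, it can be opened in Excel right away without a second `out.close()`.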

The scraped content, saved in CSV format, opens as a table.

A CSV file looks like an Excel file, but it is really just plain text.
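
To see that a CSV is plain text, the same `csv.writer` can write into an in-memory string instead of a file. This is a small sketch with made-up rows; `buffer` and `raw_text` are names chosen for the example.

```python
import csv
import io

# Write two made-up rows into an in-memory text buffer instead of a file.
buffer = io.StringIO()
csv_write = csv.writer(buffer, dialect='excel')
csv_write.writerow(['1', 'Country A', '100'])
csv_write.writerow(['2', 'Country B, with comma', '50'])

raw_text = buffer.getvalue()
print(repr(raw_text))
# '1,Country A,100\r\n2,"Country B, with comma",50\r\n'
```

Each row is one line of text with commas between the cells; a cell that itself contains a comma is wrapped in double quotes.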

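My teacher was not satisfied with a single year of GDP data, so the script was modified to scrape every year.
The modified version is shown in the figure below:
[Figure: image not available]
Why did only the last year's results remain in the end?
Because the file was reopened on every pass of the loop, each new write overwrote the previous one.
How can this be fixed?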

out=open('e:/gdpallyear.csv','w',newline='')
csv_write=csv.writer(out,dialect='excel')

Put these two statements outside the loop, so the file is opened only once.
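
A minimal sketch of that fix is below. It is not the original multi-year script, which is not shown in the text. The URL pattern is assumed from the 1960 URL, the year range is arbitrary, and `year_url` and `fetch_rows` are hypothetical stand-ins for the Selenium and BeautifulSoup step so that the example runs without a browser.

```python
import csv
import os
import tempfile

# URL pattern assumed from the single 1960 URL used earlier.
BASE_URL = 'https://www.kylc.com/stats/global/yearly/g_gdp/{year}.html'


def year_url(year):
    """Build the page URL for one year (assumed pattern)."""
    return BASE_URL.format(year=year)


def fetch_rows(year):
    """Hypothetical stand-in for scraping one year's table.

    The real script would call driver.get(year_url(year)) and parse the table;
    here made-up rows are returned so the example runs offline.
    """
    return [['1', 'Country A', str(year)], ['2', 'Country B', str(year)]]


csv_path = os.path.join(tempfile.mkdtemp(), 'gdpallyear_demo.csv')

# Opened once, before the loop, so later years do not overwrite earlier ones.
out = open(csv_path, 'w', newline='')
csv_write = csv.writer(out, dialect='excel')
for year in range(1960, 1963):  # demo range, not the site's full range
    for cols in fetch_rows(year):
        csv_write.writerow(cols)
out.close()

with open(csv_path, newline='') as f:
    all_rows = list(csv.reader(f))
print(len(all_rows))  # 6
```

Moving the `open(...)` call inside the `for year` loop would reproduce the problem: mode `'w'` truncates the file each time, leaving only the last year's rows.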
