How to use read_html: how do I save the URL read by pandas read_html() in a new column?

Latest recommended article published on 2024-01-15 16:04:59

徐梅栋

Article tags: how to use read_html

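I want to extract some tables from websites, and I have defined a list of the links where the tables live. Each link holds several tables with the same number of columns. So I extract all the tables from the list of links into a single table with the pandas read_html() function, as follows: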

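import multiprocessing

import pandas as pd

links = ['url1.com','url2.com',...,'urlN.com']

def process_url(link):
    # read_html() returns a list of DataFrames, one per table on the page
    # add in a new column the link where the table was extracted..
    return pd.concat(pd.read_html(link), ignore_index=False)

p = multiprocessing.Pool()

# map the same function that is defined above (process_url) over the links
df = pd.concat(p.map(process_url, links), ignore_index=True)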

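I realized it would be helpful to keep the provenance link of each table (that is, to save, in a new column, the link that each row of the final table came from). So my question is: how can I record the link passed to pandas read_html() in a new column?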
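One possible approach, sketched here under assumptions rather than taken from an accepted answer: tag each table with its link via `DataFrame.assign` before concatenating. The `read_tables` parameter and the `collect` helper are hypothetical names introduced for illustration; `read_tables` defaults to `pd.read_html` and exists only so the function can be tried without network access.

```python
import pandas as pd


def process_url(link, read_tables=pd.read_html):
    """Read every table at `link` and tag each row with that link.

    `read_tables` is a hypothetical hook (default: pd.read_html) that
    returns a list of DataFrames for a given link.
    """
    tables = read_tables(link)
    # assign() returns a copy with an extra constant column named "link"
    tagged = [table.assign(link=link) for table in tables]
    return pd.concat(tagged, ignore_index=True)


def collect(links, read_tables=pd.read_html):
    """Hypothetical helper: combine the tagged tables of all links."""
    frames = [process_url(link, read_tables) for link in links]
    return pd.concat(frames, ignore_index=True)
```

Because `process_url` is a module-level function that takes the link as its first argument, it should also work with `p.map(process_url, links)` from the original snippet in place of the sequential loop in `collect`.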

For example:

Tables 1 and 2 are at url1.com:

Table 1:

fruit, color, season, price
apple, red, winter, 2$
watermelon, green, winter, 3$
orange, orange, spring, 1$

Table 2:

fruit, color, season, price
peppermint, green, fall, 3$
pear, yellow, fall, 4$

Table 3 lives at url2.com:

fruit, color, season, price
tomato, red, fall, 3$
pumpkin, orange, fall, 1$

I would like to save, in a new column, where each table was extracted from, as a reference for the table:

fruit, color, season, price, link
0 apple, red, winter, 2$, url1.com
1 watermelon, green, winter, 3$, url1.com
2 orange, orange, spring, 1$, url1.com
3 peppermint, green, fall, 3$, url1.com
4 pear, yellow, fall, 4$, url1.com
5 tomato, red, fall, 3$, url2.com
6 pumpkin, orange, fall, 1$, url2.com
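As a worked illustration of the table above, here is a minimal sketch that labels rows through the `keys` mechanism of `pd.concat` instead of adding the column table by table. It is an alternative technique, not the asker's method. The sample rows are parsed from text and stand in for what `pd.read_html` would return for each link; the `parse` helper and the `tables_by_link` dictionary are hypothetical names used only for this example.

```python
from io import StringIO

import pandas as pd


def parse(text):
    """Hypothetical helper: build a DataFrame from the sample rows."""
    return pd.read_csv(StringIO(text), skipinitialspace=True)


# Stand-in for {link: pd.read_html(link)}; the data comes from the example.
tables_by_link = {
    "url1.com": [
        parse("fruit, color, season, price\n"
              "apple, red, winter, 2$\n"
              "watermelon, green, winter, 3$\n"
              "orange, orange, spring, 1$\n"),
        parse("fruit, color, season, price\n"
              "peppermint, green, fall, 3$\n"
              "pear, yellow, fall, 4$\n"),
    ],
    "url2.com": [
        parse("fruit, color, season, price\n"
              "tomato, red, fall, 3$\n"
              "pumpkin, orange, fall, 1$\n"),
    ],
}

# First merge the tables of each link into one frame per link.
per_link = {
    link: pd.concat(tables, ignore_index=True)
    for link, tables in tables_by_link.items()
}

# Concatenating a dict uses its keys as an outer index level named "link".
combined = pd.concat(per_link, names=["link", "row"])

# Turn that index level into an ordinary column and renumber the rows.
result = combined.reset_index(level="link").reset_index(drop=True)
result = result[["fruit", "color", "season", "price", "link"]]
print(result)
```

The `keys` route keeps the source label in the index until the very end, which may be convenient if per-link grouping is needed before flattening; the per-table `assign` route is simpler when only the final column matters.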

Or, as in this "diagram": note that Table 1 and Table 2 are at url1.com, while Table 3 is at url2.com. The function above builds a single table from tables located at different links; my goal is to add a column that records where each table was extracted from (just to keep the reference):

source: url1.com
fruit, color, season, price
apple, red, winter, 2$
watermelon, green, winter, 3$
orange, orange, spring, 1$

source: url1.com
fruit, color, season, price
peppermint, green, fall, 3$
pear, yellow, fall, 4$

source: url2.com
fruit, color, season, price
tomato, red, fall, 3$
pumpkin, orange, fall, 1$

---->

fruit, color, season, price, link
apple, red, winter, 2$, url1.com
watermelon, green, winter, 3$, url1.com
orange, orange, spring, 1$, url1.com
peppermint, green, fall, 3$, url1.com
pear, yellow, fall, 4$, url1.com
tomato, red, fall, 3$, url2.com
pumpkin, orange, fall, 1$, url2.com

Any ideas on how to do this?

Comment: Can you clarify your question? I'm not sure what you are asking. What do you mean by "keep the provenance link of each table"?
