Python Pandas: Merging Using the “how” Argument

In practice, data science projects often involve gathering information from a variety of sources, which may mean combining data from multiple tables. To conduct an analysis, those tables need to be joined. Merging with the Python pandas library is an effective way to carry out this operation.

This tutorial takes you step by step through the different methods of merging DataFrames with the pandas library in Python using the “how” argument.

First, import the datasets and convert them to tables (pandas DataFrames).

import pandas as pd

df = pd.read_csv('link.csv')
# convert the Year column to datetime, then keep only the year as an integer
df['Year'] = pd.to_datetime(df['Year'], format='%Y').dt.year
# group by country and year, summing the numeric columns
df = df.groupby(['Country', 'Year'], as_index=False).sum()
print(df)

# load the second dataset from an Excel file
df1 = pd.read_excel(r'path where file is stored.xlsx')
print(df1)

So, we have two tables: df and df1

df columns: Country, Year and Value

df1 columns: Country Name, Country Code, Year and value

To merge the two tables, they need a common key: a column present in both. Notice that the column holding the country has a different name in each table. Therefore, let’s rename the ‘Country Name’ column of df1 to ‘Country’ and group by that column.

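The rename-and-group step can be sketched as follows. This is a minimal sketch on a small hypothetical stand-in for df1, because the real spreadsheet is not available here. It assumes rows should be summed per country, country code and year, and it sums only the numeric value column so that the text column Country Code is not concatenated.

```python
import pandas as pd

# Hypothetical stand-in for df1 (made-up values): Kenya appears twice for 2010.
df1 = pd.DataFrame({
    "Country Name": ["Kenya", "Kenya", "Nigeria"],
    "Country Code": ["KEN", "KEN", "NGA"],
    "Year": [2010, 2010, 2010],
    "value": [5.0, 15.0, 30.0],
})

# Rename the key column so it matches the "Country" column of df.
df1 = df1.rename(columns={"Country Name": "Country"})

# Group by country (plus code and year) and sum only the numeric "value" column.
# as_index=False keeps the group keys as ordinary columns instead of an index.
df1 = df1.groupby(["Country", "Country Code", "Year"], as_index=False)["value"].sum()
print(df1)
```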

Now, let’s merge both DataFrames.

Notice that we used a left merge. This means the resulting table, df2, keeps every row of df but only those rows of df1 that match it; rows of df with no match in df1 get missing values (NaN) in the df1 columns. That is, any extra countries contained in df1 but not in df are not included in df2.

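The left merge can be sketched with pd.merge and how="left". This is a minimal sketch on hypothetical toy data (made-up countries and values, not the article’s dataset), and it assumes the two tables are joined on both Country and Year.

```python
import pandas as pd

# Hypothetical toy data: Ghana exists only in df, Uganda exists only in df1.
df = pd.DataFrame({
    "Country": ["Ghana", "Kenya", "Nigeria"],
    "Year": [2010, 2010, 2010],
    "Value": [1.0, 2.0, 3.0],
})
df1 = pd.DataFrame({
    "Country": ["Kenya", "Nigeria", "Uganda"],
    "Country Code": ["KEN", "NGA", "UGA"],
    "Year": [2010, 2010, 2010],
    "value": [20.0, 30.0, 40.0],
})

# Left merge: keep every row of df and attach df1's columns where the keys match.
# Assumption: the join keys are Country and Year.
df2 = pd.merge(df, df1, on=["Country", "Year"], how="left")
print(df2)
```

Ghana is kept with NaN in the df1 columns, while Uganda is dropped. The method form df.merge(df1, on=["Country", "Year"], how="left") gives the same result. Value and value stay separate columns because column names are case-sensitive.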

Now let’s use the right, inner and outer merges.

The right merge keeps every row of the table on the right-hand side (df1) and removes the rows of df that do not match it.

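A right merge only changes the how argument. The sketch below uses the same kind of hypothetical toy data (made-up values) and assumes the join keys are Country and Year.

```python
import pandas as pd

# Hypothetical toy data: Ghana exists only in df, Uganda exists only in df1.
df = pd.DataFrame({
    "Country": ["Ghana", "Kenya", "Nigeria"],
    "Year": [2010, 2010, 2010],
    "Value": [1.0, 2.0, 3.0],
})
df1 = pd.DataFrame({
    "Country": ["Kenya", "Nigeria", "Uganda"],
    "Country Code": ["KEN", "NGA", "UGA"],
    "Year": [2010, 2010, 2010],
    "value": [20.0, 30.0, 40.0],
})

# Right merge: keep every row of df1; rows of df without a match are dropped.
df_right = pd.merge(df, df1, on=["Country", "Year"], how="right")
print(df_right)
```

Here Uganda is kept with NaN in Value, and Ghana disappears.
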
The outer merge retains all rows from both tables, with missing values (NaN) wherever a row has no match in the other table.
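
An outer merge can be sketched the same way. This minimal sketch again uses hypothetical toy data and assumes the keys are Country and Year. It also passes pandas’ indicator=True option, which adds a _merge column recording whether each row came from the left table only, the right table only, or both.

```python
import pandas as pd

# Hypothetical toy data: Ghana exists only in df, Uganda exists only in df1.
df = pd.DataFrame({
    "Country": ["Ghana", "Kenya", "Nigeria"],
    "Year": [2010, 2010, 2010],
    "Value": [1.0, 2.0, 3.0],
})
df1 = pd.DataFrame({
    "Country": ["Kenya", "Nigeria", "Uganda"],
    "Country Code": ["KEN", "NGA", "UGA"],
    "Year": [2010, 2010, 2010],
    "value": [20.0, 30.0, 40.0],
})

# Outer merge: keep the union of keys from both tables.
# indicator=True adds a "_merge" column: "left_only", "right_only" or "both".
df_outer = pd.merge(df, df1, on=["Country", "Year"], how="outer", indicator=True)
print(df_outer)
```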

The inner merge removes rows that do not have a match in both DataFrames. This is the default merge in pandas if you do not specify the kind of merge you want with the how argument.

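The sketch below checks both claims on hypothetical toy data, assuming the keys are Country and Year: the inner merge keeps only countries present in both tables, and omitting how gives the same result as how="inner".

```python
import pandas as pd

# Hypothetical toy data: Ghana exists only in df, Uganda exists only in df1.
df = pd.DataFrame({
    "Country": ["Ghana", "Kenya", "Nigeria"],
    "Year": [2010, 2010, 2010],
    "Value": [1.0, 2.0, 3.0],
})
df1 = pd.DataFrame({
    "Country": ["Kenya", "Nigeria", "Uganda"],
    "Country Code": ["KEN", "NGA", "UGA"],
    "Year": [2010, 2010, 2010],
    "value": [20.0, 30.0, 40.0],
})

keys = ["Country", "Year"]

# Explicit inner merge: only keys present in both tables survive.
inner = pd.merge(df, df1, on=keys, how="inner")

# No "how" given: pandas falls back to its default, which is "inner".
default = pd.merge(df, df1, on=keys)

print(inner)
print(inner.equals(default))
```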

Kindly let me know of any comments, suggestions or questions you might have :)

Translated from: https://medium.com/swlh/python-pandas-merging-using-the-how-argument-6f2013667e08
