Python Pandas: Merging Using the "how" Argument

Latest recommended article published on 2024-09-27 10:11:28

weixin_26714375

Original article: https://medium.com/swlh/python-pandas-merging-using-the-how-argument-6f2013667e08

In practice, data science projects often involve gathering information from a variety of sources, which may mean working with data spread across multiple tables. To analyse that data, the tables need to be joined. Merging with pandas in Python is an effective way to do this.

This tutorial takes you step by step through the different ways of merging dataframes with the pandas library in Python using the "how" argument.

First, import the datasets and convert them to tables

import pandas as pd

df = pd.read_csv('link.csv')
# convert the Year column to a datetime and keep only the year
df['Year'] = pd.to_datetime(df['Year'], format='%Y').dt.year
# group by country and year, summing the values
df = df.groupby(['Country', 'Year'], as_index=False).sum()
print(df)

df1 = pd.read_excel(r'path where file is stored.xlsx')
print(df1)

So, we have two tables: df and df1

df columns: Country, Year and Value

df1 columns: Country Name, Country Code, Year and value

To merge the two tables, a common key column is needed. Notice that the column holding the country has a different name in each table. Therefore, let's rename the 'Country Name' column of table df1 to 'Country' and group by that column.
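The rename-and-group step is not shown in code here, so the following is only a minimal sketch of how it might look. The rows in df1 are made up for illustration, and grouping by both 'Country' and 'Year' is an assumption that mirrors what was done for df.

```python
import pandas as pd

# Made-up stand-in for df1; the real Excel file is not shown in this article.
df1 = pd.DataFrame({
    'Country Name': ['Ghana', 'Ghana', 'Kenya'],
    'Country Code': ['GHA', 'GHA', 'KEN'],
    'Year': [2000, 2000, 2000],
    'value': [1.0, 2.0, 5.0],
})

# Rename the key column so it matches the 'Country' column in df.
df1 = df1.rename(columns={'Country Name': 'Country'})

# Group by the key columns (an assumption) and sum only the numeric column,
# so that the text column 'Country Code' is not summed as well.
df1 = df1.groupby(['Country', 'Year'], as_index=False)['value'].sum()
print(df1)
```

Selecting the 'value' column before calling sum() keeps the string column out of the aggregation; whether the original tutorial did this is not known.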

Now, merge the two dataframes

Notice that we merged left. A left merge keeps every row of the left table, df, so the table above contains only the countries present in df. Any extra countries in table df1 that are not in table df are not included in the table df2 above.
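The merge call and its output are not reproduced in this text, so here is a small sketch of a left merge on made-up data. The rows and the choice of on=['Country', 'Year'] as the merge key are assumptions for illustration, not the original tutorial's data or call.

```python
import pandas as pd

# Made-up stand-ins for the two tables.
df = pd.DataFrame({
    'Country': ['Ghana', 'Kenya', 'Togo'],
    'Year': [2000, 2000, 2000],
    'Value': [10, 20, 30],
})
df1 = pd.DataFrame({
    'Country': ['Ghana', 'Kenya', 'Peru'],
    'Year': [2000, 2000, 2000],
    'value': [1.0, 2.0, 3.0],
})

# how='left' keeps every row of df and adds matching columns from df1.
df2 = pd.merge(df, df1, on=['Country', 'Year'], how='left')
print(df2)
```

In this sketch Togo stays in df2 with a missing 'value' because it has no match in df1, while Peru, which appears only in df1, is left out.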

Now let's use right, inner and outer merges

The right merge removes all rows that do not match the table on the right-hand side, i.e. df1.
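As a rough illustration of the right merge, the sketch below uses made-up rows and assumes the merge key is on=['Country', 'Year']; neither comes from the original tutorial.

```python
import pandas as pd

# Made-up stand-ins for the two tables.
df = pd.DataFrame({
    'Country': ['Ghana', 'Kenya', 'Togo'],
    'Year': [2000, 2000, 2000],
    'Value': [10, 20, 30],
})
df1 = pd.DataFrame({
    'Country': ['Ghana', 'Kenya', 'Peru'],
    'Year': [2000, 2000, 2000],
    'value': [1.0, 2.0, 3.0],
})

# how='right' keeps every row of df1 and adds matching columns from df.
right = pd.merge(df, df1, on=['Country', 'Year'], how='right')
print(right)
```

Here Peru is kept with a missing 'Value', and Togo, which appears only in df, is dropped.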

The inner merge removes rows that do not match in both dataframes. This is the default behaviour of the pandas merge if you do not specify the kind of merge you want.
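The sketch below contrasts the inner and outer merges on made-up data and checks that omitting "how" gives the same result as how='inner'. The outer merge, which keeps all rows from both dataframes and fills the gaps with missing values, is included for comparison. The rows and the on=['Country', 'Year'] key are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-ins for the two tables.
df = pd.DataFrame({
    'Country': ['Ghana', 'Kenya', 'Togo'],
    'Year': [2000, 2000, 2000],
    'Value': [10, 20, 30],
})
df1 = pd.DataFrame({
    'Country': ['Ghana', 'Kenya', 'Peru'],
    'Year': [2000, 2000, 2000],
    'value': [1.0, 2.0, 3.0],
})

keys = ['Country', 'Year']

# how='inner' keeps only the rows whose keys appear in both tables.
inner = pd.merge(df, df1, on=keys, how='inner')

# how='outer' keeps the rows from both tables, matched where possible.
outer = pd.merge(df, df1, on=keys, how='outer')

# Leaving out 'how' falls back to the default, which is an inner merge.
default = pd.merge(df, df1, on=keys)

print(inner)
print(outer)
print(default.equals(inner))
```

With this data the inner merge keeps only Ghana and Kenya, while the outer merge has four rows, one per country across both tables.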

Kindly let me know if you have any comments, suggestions or questions :)