Merging two columns of an Excel table with Python

I have a 20 x 4000 dataframe in Python using pandas. Two of these columns are named Year and quarter. I'd like to create a variable called period that turns Year = 2000 and quarter = q2 into 2000q2.

Can anyone help with that?


Reply #1

Reference: https://stackoom.com/question/1JJ5t/在pandas-python中的数据框中合并两列文本


Reply #2

dataframe["period"] = dataframe["Year"].map(str) + dataframe["quarter"]
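The .map(str) call matters when Year is stored as integers, because adding an integer column directly to a string column typically fails with a TypeError. A minimal sketch, assuming hypothetical sample data (the values below are not from the question):

```python
import pandas as pd

# Hypothetical sample data: Year holds integers, quarter holds strings.
dataframe = pd.DataFrame({"Year": [2000, 2001], "quarter": ["q2", "q3"]})

# Convert Year to strings first, then concatenate element-wise.
dataframe["period"] = dataframe["Year"].map(str) + dataframe["quarter"]

print(dataframe["period"].tolist())  # ['2000q2', '2001q3']
```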

Reply #3

df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
df['period'] = df[['Year', 'quarter']].apply(lambda x: ''.join(x), axis=1)

This yields the following dataframe:

   Year quarter  period
0  2014      q1  2014q1
1  2015      q2  2015q2

This method generalizes to an arbitrary number of string columns by replacing df[['Year', 'quarter']] with any column slice of your dataframe, e.g. df.iloc[:,0:2].apply(lambda x: ''.join(x), axis=1).

See the pandas documentation for more information about the apply() method.
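As a rough illustration of that generalization, the sketch below joins three string columns at once. The region column and the label column are hypothetical names added only for this example, and every joined value is assumed to already be a string:

```python
import pandas as pd

# Hypothetical sample data with three string columns.
df = pd.DataFrame({
    "Year": ["2014", "2015"],
    "quarter": ["q1", "q2"],
    "region": ["EU", "US"],
})

# Join the selected columns row by row; all values must already be strings.
cols = ["Year", "quarter", "region"]
df["label"] = df[cols].apply(lambda row: "".join(row), axis=1)

print(df["label"].tolist())  # ['2014q1EU', '2015q2US']
```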


Reply #4

Although the answer from @silvado is good, changing df.map(str) to df.astype(str) makes it faster:

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'quarter': ['q1', 'q2']})
 
In [131]: %timeit df["Year"].map(str)
10000 loops, best of 3: 132 us per loop
 
In [132]: %timeit df["Year"].astype(str)
10000 loops, best of 3: 82.2 us per loop
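
The timings above come from an IPython session on a two-row frame. To repeat the comparison as a plain script, something like the following sketch could be used; the frame size and repeat count are arbitrary assumptions, and the absolute numbers (and possibly which call wins) depend on the machine and the pandas version:

```python
import timeit

import pandas as pd

# Hypothetical larger frame so the difference is measurable.
df = pd.DataFrame({"Year": [2014, 2015] * 50_000})

via_map = df["Year"].map(str)
via_astype = df["Year"].astype(str)

# Both approaches should produce the same string values.
assert via_map.tolist() == via_astype.tolist()

# Time each conversion over a few repetitions.
t_map = timeit.timeit(lambda: df["Year"].map(str), number=10)
t_astype = timeit.timeit(lambda: df["Year"].astype(str), number=10)
print(f"map(str):    {t_map:.4f} s")
print(f"astype(str): {t_astype:.4f} s")
```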

Reply #5

The cat() method of the .str accessor works really well for this:

>>> import pandas as pd
>>> df = pd.DataFrame([["2014", "q1"], 
...                    ["2015", "q3"]],
...                   columns=('Year', 'Quarter'))
>>> print(df)
   Year Quarter
0  2014      q1
1  2015      q3
>>> df['Period'] = df.Year.str.cat(df.Quarter)
>>> print(df)
   Year Quarter  Period
0  2014      q1  2014q1
1  2015      q3  2015q3

cat() also lets you add a separator. For example, if year and quarter are stored as integers, you can do this:

>>> import pandas as pd
>>> df = pd.DataFrame([[2014, 1],
...                    [2015, 3]],
...                   columns=('Year', 'Quarter'))
>>> print(df)
   Year Quarter
0  2014       1
1  2015       3
>>> df['Period'] = df.Year.astype(str).str.cat(df.Quarter.astype(str), sep='q')
>>> print(df)
   Year Quarter  Period
0  2014       1  2014q1
1  2015       3  2015q3

Joining multiple columns is just a matter of passing either a list of Series or a dataframe containing all but the first column as an argument to str.cat(), invoked on the first column (a Series):

>>> df = pd.DataFrame(
...     [['USA', 'Nevada', 'Las Vegas'],
...      ['Brazil', 'Pernambuco', 'Recife']],
...     columns=['Country', 'State', 'City'],
... )
>>> df['AllTogether'] = df['Country'].str.cat(df[['State', 'City']], sep=' - ')
>>> print(df)
  Country       State       City                   AllTogether
0     USA      Nevada  Las Vegas      USA - Nevada - Las Vegas
1  Brazil  Pernambuco     Recife  Brazil - Pernambuco - Recife

Note that if your pandas dataframe or Series has null values, you need to pass the na_rep parameter to replace the NaN values with a string; otherwise every row that contains a null comes out as NaN in the combined column.
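
A small sketch of the na_rep behaviour, assuming hypothetical sample data with one missing quarter and an arbitrary placeholder string "??":

```python
import pandas as pd

# Hypothetical sample data with one missing quarter.
df = pd.DataFrame({"Year": ["2014", "2015"], "Quarter": ["q1", None]})

# Without na_rep, a missing value on either side makes that row's result missing.
without_rep = df["Year"].str.cat(df["Quarter"])

# With na_rep, the missing value is replaced by the given string before joining.
with_rep = df["Year"].str.cat(df["Quarter"], na_rep="??")

print(with_rep.tolist())  # ['2014q1', '2015??']
```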


Reply #6

This time, use a lambda function with str.format().

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'Quarter': ['q1', 'q2']})
print(df)
# Look up values by column label; positional x[0] on a labeled row is deprecated.
df['YearQuarter'] = df[['Year','Quarter']].apply(lambda x: '{}{}'.format(x['Year'], x['Quarter']), axis=1)
print(df)
 
   Year Quarter
0  2014      q1
1  2015      q2
   Year Quarter YearQuarter
0  2014      q1      2014q1
1  2015      q2      2015q2

This allows you to work with non-strings and reformat values as needed.

import pandas as pd
df = pd.DataFrame({'Year': ['2014', '2015'], 'Quarter': [1, 2]})
print(df.dtypes)
print(df)
 
# The integer Quarter is converted to text by format(), so no explicit cast is needed.
df['YearQuarter'] = df[['Year','Quarter']].apply(lambda x: '{}q{}'.format(x['Year'], x['Quarter']), axis=1)
print(df)
 
Year       object
Quarter     int64
dtype: object
   Year Quarter
0  2014       1
1  2015       2
   Year Quarter YearQuarter
0  2014       1      2014q1
1  2015       2      2015q2
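
To illustrate the "reformat values as needed" part, the sketch below applies a format specification while building the string. The sample data and the "-Q" label layout are assumptions made only for this example:

```python
import pandas as pd

# Hypothetical sample data: integer year, integer quarter.
df = pd.DataFrame({"Year": [2014, 2015], "Quarter": [1, 2]})

# Build the label from named columns; :02d zero-pads the quarter to two digits.
df["YearQuarter"] = df.apply(
    lambda row: "{}-Q{:02d}".format(int(row["Year"]), int(row["Quarter"])),
    axis=1,
)

print(df["YearQuarter"].tolist())  # ['2014-Q01', '2015-Q02']
```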

Source: https://blog.csdn.net/xfxf996/article/details/80930414

Note: this was taken from that blogger's post; I accidentally closed the page and could not find the article again.
