python iterrows, Python: creating columns with .iterrows()

I am trying to use a loop function to create a matrix of whether a product was seen in a particular week.

Each row in the df (representing a product) has a Close_date_wk (the week in which the product closed) and a week_diff (the number of weeks the product was listed).

    import pandas

    mydata = [{'subid': 'A', 'Close_date_wk': 25, 'week_diff': 3},
              {'subid': 'B', 'Close_date_wk': 26, 'week_diff': 2},
              {'subid': 'C', 'Close_date_wk': 27, 'week_diff': 2}]

    df = pandas.DataFrame(mydata)

My goal is to see how many alternative products were listed alongside each product in each week of its date range.

I have set up the following loop:

    for index, row in df.iterrows():
        i = 0
        max_range = row['Close_date_wk']
        min_range = int(row['Close_date_wk'] - row['week_diff'])
        for i in range(min_range,max_range):
            col_head = 'job_week_' + str(i)
            row[col_head] = 1

Can you please help explain why the "row[col_head] = 1" line neither adds a column nor sets a value in that column for that row?

For example, if:

row A has date range 1,2,3
row B has date range 2,3
row C has date range 3,4,5

then ideally I would like to end up with

row A has 0 alternative products in week 1
          1 alternative product in week 2
          2 alternative products in week 3
row B has 1 alternative product in week 2
          2 alternative products in week 3

etc.

Solution

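You can't add a new column to the df by assigning to row here: each row that iterrows yields is a separate Series, so a new label set on it never reaches the df. Assign through the original df instead, for example with .loc (the .ix indexer mentioned in older answers was removed in pandas 1.0). Example: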

In [29]:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(columns=list('abc'), data=np.random.randn(5, 3))
    df

Out[29]:
              a         b         c
    0 -1.525011  0.778190 -1.010391
    1  0.619824  0.790439 -0.692568
    2  1.272323  1.620728  0.192169
    3  0.193523  0.070921  1.067544
    4  0.057110 -1.007442  1.706704

In [30]:
    for index, row in df.iterrows():
        df.loc[index, 'd'] = np.random.randint(0, 10)
    df

Out[30]:
              a         b         c  d
    0 -1.525011  0.778190 -1.010391  9
    1  0.619824  0.790439 -0.692568  9
    2  1.272323  1.620728  0.192169  1
    3  0.193523  0.070921  1.067544  0
    4  0.057110 -1.007442  1.706704  9
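
Applied to the loop from the question, the same idea might look like the sketch below. It assumes, as the original loop does, that a product is listed from Close_date_wk - week_diff up to but not including Close_date_wk. The week variable, the week_cols helper list, and the final fillna step are illustrative additions, not part of the original answer.

```python
import pandas as pd

mydata = [
    {'subid': 'A', 'Close_date_wk': 25, 'week_diff': 3},
    {'subid': 'B', 'Close_date_wk': 26, 'week_diff': 2},
    {'subid': 'C', 'Close_date_wk': 27, 'week_diff': 2},
]
df = pd.DataFrame(mydata)

for index, row in df.iterrows():
    # Only read from row; every write goes to df itself.
    max_range = int(row['Close_date_wk'])
    min_range = max_range - int(row['week_diff'])
    for week in range(min_range, max_range):
        # The first write to a new label creates the column (other rows get NaN).
        df.loc[index, 'job_week_' + str(week)] = 1

# Weeks in which a product was not listed are still NaN; turn them into 0.
week_cols = [c for c in df.columns if c.startswith('job_week_')]
df[week_cols] = df[week_cols].fillna(0).astype(int)
```

After the loop, df has one job_week_N column for every week in which at least one product was listed.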

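Assigning to an existing column through row can appear to work, as it does below where every column has the same float dtype. Don't rely on it, though: the pandas documentation warns that, depending on the dtypes, iterrows returns a copy rather than a view, in which case the write has no effect.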

In [31]:
    # reset the df by slicing
    df = df[list('abc')]
    for index, row in df.iterrows():
        row['b'] = np.random.randint(0, 10)
    df

Out[31]:
              a  b         c
    0 -1.525011  8 -1.010391
    1  0.619824  2 -0.692568
    2  1.272323  8  0.192169
    3  0.193523  2  1.067544
    4  0.057110  3  1.706704

But adding a new column using row won't work:

In [35]:
    df = df[list('abc')]
    for index, row in df.iterrows():
        row['d'] = np.random.randint(0, 10)
    df

Out[35]:
              a  b         c
    0 -1.525011  8 -1.010391
    1  0.619824  2 -0.692568
    2  1.272323  8  0.192169
    3  0.193523  2  1.067544
    4  0.057110  3  1.706704
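
For the underlying goal in the question (counting alternative products per week), iterrows can be avoided altogether. The following is a minimal sketch, not part of the original answer. It assumes that an "alternative product" means any other product listed in the same week, and that a product is listed from Close_date_wk - week_diff up to but not including Close_date_wk. The names start, end, listed, per_week_total and alternatives are hypothetical.

```python
import pandas as pd

mydata = [
    {'subid': 'A', 'Close_date_wk': 25, 'week_diff': 3},
    {'subid': 'B', 'Close_date_wk': 26, 'week_diff': 2},
    {'subid': 'C', 'Close_date_wk': 27, 'week_diff': 2},
]
df = pd.DataFrame(mydata).set_index('subid')

start = df['Close_date_wk'] - df['week_diff']  # first listed week (inclusive)
end = df['Close_date_wk']                      # closing week (exclusive, as in range())

# One 0/1 indicator column per week: 1 if the product was listed in that week.
listed = pd.DataFrame({
    'job_week_' + str(week): ((start <= week) & (week < end)).astype(int)
    for week in range(int(start.min()), int(end.max()))
})

# Products listed per week, minus the product itself, kept only for the
# weeks in which that product was listed (NaN elsewhere).
per_week_total = listed.sum()
alternatives = (listed * (per_week_total - 1)).where(listed == 1)
```

Here listed is the indicator matrix the question's loop was trying to build, and alternatives holds the per-week counts of other listed products for each subid.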
