What is the Python pandas/NumPy equivalent of R's match()?

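Here is the complete code I ended up using:

import csv
from datetime import date, timedelta

from numpy import nan as NaN
from pandas import concat, notnull, read_csv

# Read in the DataFrame containing actions, in chunks
tp = read_csv('/data/logactions.csv',
              quoting=csv.QUOTE_NONNUMERIC,
              iterator=True, chunksize=1000,
              encoding='utf-8', skipinitialspace=True,
              on_bad_lines='skip')  # replaces error_bad_lines=False, removed in pandas 2.0
df = concat([chunk for chunk in tp], ignore_index=True)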

# Set classes to NaN and drop rows without a URL
df["klass"] = NaN
df = df[notnull(df['url'])]
df = df.reset_index(drop=True)

# Iterate over the daily text files, match on URL, and grab klass
startdate = date(2013, 1, 1)
enddate = date(2013, 1, 26)
d = startdate
while d <= enddate:
    dstring = d.isoformat()
    print(dstring)
    # Read in each file with classifications, in chunks
    tp = read_csv('/data/textContentClassified/content{dstring}classfied.tsv'.format(dstring=dstring),
                  sep=',', quoting=csv.QUOTE_NONNUMERIC,
                  iterator=True, chunksize=1000,
                  encoding='utf-8', skipinitialspace=True,
                  on_bad_lines='skip')  # replaces error_bad_lines=False, removed in pandas 2.0
    thisdatedf = concat([chunk for chunk in tp], ignore_index=True)
    # Drop rows without a URL first, then keep one row per URL
    thisdatedf = thisdatedf[notnull(thisdatedf['url'])]
    thisdatedf = thisdatedf.drop_duplicates(['url'])
    thisdatedf = thisdatedf.reset_index(drop=True)
    # Look up each action's URL in this day's classifications (the match() step).
    # reindex yields NaN for URLs that are not found, where plain [] indexing
    # raises KeyError in current pandas; combine_first keeps earlier matches.
    df["klass"] = df.klass.combine_first(
        thisdatedf.set_index('url').klass.reindex(df.url).reset_index(drop=True))
    # Move on to the next day
    d = d + timedelta(days=1)
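
The lookup step in the loop above is what plays the role of R's match(). As a minimal sketch on toy data (the `actions` and `classified` frames and their values are made up for illustration, not taken from the original files), the same idea can be written in two ways: a position-style match with `Index.get_indexer`, and a value-style lookup with `Series.map`. Both assume the URLs in the lookup table are unique, which is why the code above calls `drop_duplicates` first.

```python
import pandas as pd

# Hypothetical toy data standing in for the action log and one day's classifications
actions = pd.DataFrame({"url": ["a.com", "b.com", "c.com", "a.com"]})
classified = pd.DataFrame({"url": ["b.com", "a.com"], "klass": ["news", "sports"]})

# Position-style match, similar to R's match(actions$url, classified$url),
# but 0-based and returning -1 (rather than NA) for URLs that are not found.
# get_indexer requires the lookup values to be unique.
positions = pd.Index(classified["url"]).get_indexer(actions["url"])

# Value-style lookup: map each URL straight to its klass (NaN when not found)
lookup = classified.set_index("url")["klass"]
actions["klass"] = actions["url"].map(lookup)

print(positions)
print(actions)
```

If only the looked-up values are needed, the `map` form is usually enough; the positions are mainly useful when several columns have to be pulled from the matched rows.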
