Using Python's variable-length arguments (*args)

gt.csv

301     234  ['ad','bd','cd']
301     235     ['a','b','c']
301     237  ['af','bf','cf']
301     239  ['a2','b2','c2']
302     236  ['a1','b1','c1']
303     238  ['a3','b3','c3']
303    2323  ['a7','b7','c7']
304     230  ['a9','b9','c9']

Requirement:
Split gt.csv by the values in its first column, then pull out any chosen sets of those values together with their rows.

import pandas as pd

ground_truth = './gt.csv'
# sep=r'\s+' splits on runs of whitespace; delim_whitespace=True is deprecated since pandas 2.2
ground_truth_data = pd.read_csv(ground_truth,
                                names=['queryID', 'termID', 'Context'],
                                sep=r'\s+')

group = ground_truth_data.groupby('queryID')

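def get_partData(group, *args):
    # Assumes ground_truth_data has already been read and grouped by queryID.
    # Every positional argument after `group` is a list of queryID values;
    # *args collects them into a tuple.
    all_lt = [[] for _ in range(len(args))]
    # Iterating over a GroupBy object yields (key, sub-DataFrame) pairs.
    for queryID, rows in group:
        for i, wanted_ids in enumerate(args):
            if queryID in wanted_ids:
                all_lt[i].append(rows)
    # One concatenated DataFrame per argument, in argument order.
    return [pd.concat(x) for x in all_lt]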

train_lt = [301]
dev_lt = [302]
test_lt = [303, 304]

train_data = get_partData(group, train_lt)
print('First:')
print(train_data)
print('\n')

train_data, dev_data = get_partData(group, train_lt, dev_lt)
print('Second:')
print(train_data)
print(dev_data)
print('\n')

train_data, dev_data, test_data = get_partData(group, train_lt, dev_lt, test_lt)
print('Third:')
print(train_data)
print(dev_data)
print(test_data)
print('\n')

Output:
First:
[   queryID  termID           Context
0      301     234  ['ad','bd','cd']
1      301     235     ['a','b','c']
3      301     237  ['af','bf','cf']
5      301     239  ['a2','b2','c2']]


Second:
   queryID  termID           Context
0      301     234  ['ad','bd','cd']
1      301     235     ['a','b','c']
3      301     237  ['af','bf','cf']
5      301     239  ['a2','b2','c2']
   queryID  termID           Context
2      302     236  ['a1','b1','c1']


Third:
   queryID  termID           Context
0      301     234  ['ad','bd','cd']
1      301     235     ['a','b','c']
3      301     237  ['af','bf','cf']
5      301     239  ['a2','b2','c2']
   queryID  termID           Context
2      302     236  ['a1','b1','c1']
   queryID  termID           Context
4      303     238  ['a3','b3','c3']
6      303    2323  ['a7','b7','c7']
7      304     230  ['a9','b9','c9']
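
The three calls above differ only in how many lists are passed after group. As a minimal sketch of what *args does with them (describe_args is a hypothetical helper, not part of the original script), the extra positional arguments are always packed into a tuple, even when there is only one:

```python
def describe_args(*args):
    """Report how the positional arguments were packed (hypothetical helper)."""
    return type(args).__name__, len(args), args

print(describe_args([301]))                     # ('tuple', 1, ([301],))
print(describe_args([301], [302], [303, 304]))  # ('tuple', 3, ([301], [302], [303, 304]))

# A list of lists can be spread back into separate arguments with *.
id_lists = [[301], [302], [303, 304]]
print(describe_args(*id_lists) == describe_args([301], [302], [303, 304]))  # True
```

This is also why the first call prints a list: one argument gives a one-element tuple, so the function returns a one-element list.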

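Notes:
1. To create several independent empty lists, use

all_lt = [[] for _ in range(len(args))]

With three arguments this gives [[],[],[]]

and not the following

all_lt = [[] * len(args)]

This gives [[]], because [] * n is still an empty list.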
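
A small sketch comparing the two forms, plus a third variant that is not in the original but is an easy mistake to make: [[]] * n has the right length, yet every slot refers to the same list object. The variable names here are illustrative only.

```python
n = 3

independent = [[] for _ in range(n)]  # n separate list objects
collapsed = [[] * n]                  # [] * n is [], so this is just [[]]
aliased = [[]] * n                    # n references to ONE list object

independent[0].append('x')
aliased[0].append('x')

print(independent)  # [['x'], [], []]
print(collapsed)    # [[]]
print(aliased)      # [['x'], ['x'], ['x']]
```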

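2. In the return statement, if the caller always unpacks more than one value, a generator expression can be used instead:

return (pd.concat(x) for x in all_lt)

Unpacking into several names consumes the generator, so each name receives a DataFrame. When only one value is returned and assigned to a single name, however, the caller receives the generator object itself rather than a concrete value. Returning a list avoids this: the single-argument call then yields a one-element list, as in the first output above.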
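
The difference can be sketched without pandas. The two functions below are hypothetical stand-ins that use sum in place of pd.concat; only the return style matters.

```python
import types

def totals_as_generator(*args):
    # Returns a generator expression: nothing is computed until it is consumed.
    return (sum(x) for x in args)

def totals_as_list(*args):
    # Returns a fully built list.
    return [sum(x) for x in args]

# Unpacking into two names consumes the generator, so this works.
a, b = totals_as_generator([1, 2], [3, 4])
print(a, b)  # 3 7

# A single name receives the generator object itself, not the value.
single = totals_as_generator([1, 2])
print(isinstance(single, types.GeneratorType))  # True

# With a list, the single-argument result is a one-element list ...
print(totals_as_list([1, 2]))  # [3]

# ... and one-element unpacking extracts the value in either style.
(value,) = totals_as_generator([1, 2])
print(value)  # 3
```

The trailing comma in (value,) = ... is what makes this an unpacking rather than a plain assignment; the same pattern would turn the first call in the script into a DataFrame instead of a one-element list.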
