Filtering a pandas DataFrame: apply is the most flexible option

Filtering by the length of a specific string field:

import pandas as pd

df = pd.read_csv('filex.csv')

df['A'] = df['A'].astype('str')

df['B'] = df['B'].astype('str')

mask = (df['A'].str.len() == 10) & (df['B'].str.len() == 10)

df = df.loc[mask]

print(df)

Applied to filex.csv:

A,B

123,abc

1234,abcd

1234567890,abcdefghij

the code above prints

A B

2 1234567890 abcdefghij
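The same length filter can be tried without a file on disk. The sketch below is only an illustration: it assumes an in-memory CSV (`csv_text`, a stand-in for filex.csv) and reads every column as text with `dtype=str`, which replaces the two `astype('str')` calls.

```python
import io

import pandas as pd

# In-memory stand-in for filex.csv (same three rows as in the example above).
csv_text = "A,B\n123,abc\n1234,abcd\n1234567890,abcdefghij\n"

# dtype=str reads every column as text, so no astype('str') step is needed.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Keep rows where both fields are exactly 10 characters long.
mask = (df["A"].str.len() == 10) & (df["B"].str.len() == 10)
filtered = df.loc[mask]
print(filtered)
```

Only the row with index 2 passes, because it is the only one where both A and B have ten characters.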

Alternatively:

data={"names":["Alice","Zac","Anna","O"],"cars":["Civic","BMW","Mitsubishi","Benz"],

"age":["1","4","2","0"]}

df=pd.DataFrame(data)

"""

df:

age cars names

0 1 Civic Alice

1 4 BMW Zac

2 2 Mitsubishi Anna

3 0 Benz O

Then:

"""

df[

df['names'].apply(lambda x: len(x)>1) &

df['cars'].apply(lambda x: "i" in x) &

df['age'].apply(lambda x: int(x)<2)

]

"""

We will have :

age cars names

0 1 Civic Alice

"""
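For simple conditions like these three, the per-column `apply` calls can also be written with vectorized Series methods. The sketch below builds both masks on the same data so they can be compared; the names `apply_mask` and `vector_mask` are introduced here only for illustration.

```python
import pandas as pd

data = {
    "names": ["Alice", "Zac", "Anna", "O"],
    "cars": ["Civic", "BMW", "Mitsubishi", "Benz"],
    "age": ["1", "4", "2", "0"],
}
df = pd.DataFrame(data)

# Mask built with apply, as in the example above.
apply_mask = (
    df["names"].apply(lambda x: len(x) > 1)
    & df["cars"].apply(lambda x: "i" in x)
    & df["age"].apply(lambda x: int(x) < 2)
)

# The same three conditions with vectorized string and type methods.
vector_mask = (
    (df["names"].str.len() > 1)
    & df["cars"].str.contains("i", regex=False)
    & (df["age"].astype(int) < 2)
)

result = df[vector_mask]
print(result)
```

The vectorized form is usually shorter for plain comparisons, while `apply` stays useful once a condition needs arbitrary Python logic.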

The most flexible approach is apply with a row-wise filter function:

import os

import pandas as pd

# MetaIndex and is_l34_tcp_metadata are not shown in this post;
# they are assumed to be defined elsewhere in the project.


def load_metadata(dir_name):
    # Positions of the columns to read from each file.
    columns_index_list = [
        MetaIndex.M_METADATA_ID_INDEX,
        MetaIndex.M_SRC_IP_INDEX,
        MetaIndex.M_DST_IP_INDEX,
        MetaIndex.M_SRC_PORT_INDEX,
        MetaIndex.M_DST_PORT_INDEX,
        MetaIndex.M_PROTOCOL_INDEX,
        MetaIndex.M_HEADER_H,
        MetaIndex.M_PAYLOAD_H,
        MetaIndex.M_TCP_FLAG_H,
        MetaIndex.M_FLOW_FIRST_PKT_TIME,
        MetaIndex.M_FLOW_LAST_PKT_TIME,
        MetaIndex.M_OCTET_DELTA_COUNT_FROM_TOTAL_LEN,
    ]
    # Names assigned to those columns.
    columns_name_list = [
        "M_METADATA_ID_INDEX",
        "M_SRC_IP_INDEX",
        "M_DST_IP_INDEX",
        "M_SRC_PORT_INDEX",
        "M_DST_PORT_INDEX",
        "M_PROTOCOL_INDEX",
        "M_HEADER_H",
        "M_PAYLOAD_H",
        "M_TCP_FLAG_H",
        "M_FLOW_FIRST_PKT_TIME",
        "M_FLOW_LAST_PKT_TIME",
        "M_OCTET_DELTA_COUNT_FROM_TOTAL_LEN",
    ]

    def metadata_parse_filter(row):
        # Return True for rows to keep; any parsing error drops the row.
        try:
            # Keep TCP only (protocol number 6).
            if row['M_PROTOCOL_INDEX'] != 6:
                return False
            if len(row['M_HEADER_H']) < 2 or len(row['M_PAYLOAD_H']) < 2 or not is_l34_tcp_metadata(row['M_METADATA_ID_INDEX']):
                return False
            # Each time field holds two values joined by '-': forward and reverse.
            first_time = row['M_FLOW_FIRST_PKT_TIME'].split('-')
            last_time = row['M_FLOW_LAST_PKT_TIME'].split('-')
            flow_first_pkt_time = int(first_time[0])
            rev_flow_first_pkt_time = int(first_time[1])
            flow_last_pkt_time = int(last_time[0])
            rev_flow_last_pkt_time = int(last_time[1])
            # The first packet must not be later than the last packet.
            if flow_first_pkt_time > flow_last_pkt_time or rev_flow_first_pkt_time > rev_flow_last_pkt_time:
                return False
            return True
        except Exception:
            return False

    for root, dirs, files in os.walk(dir_name):
        for filename in files:
            file_path = os.path.join(root, filename)
            # on_bad_lines='warn' (pandas >= 1.3) replaces error_bad_lines=False,
            # warn_bad_lines=True, which were removed in pandas 2.0.
            df = pd.read_csv(file_path, delimiter='^', usecols=columns_index_list, names=columns_name_list, encoding='utf-8', on_bad_lines='warn', header=0, lineterminator="\n")
            # axis=1 passes each row to the filter; the boolean result selects rows.
            filter_df = df.loc[df.apply(metadata_parse_filter, axis=1)]
            yield filter_df
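Because load_metadata depends on project-specific names and files, here is a small self-contained sketch of the same pattern: a function that receives one row, returns True or False, and treats any parsing failure as "drop the row". The column names (`protocol`, `first_time`, `last_time`), the data, and the helper `keep_row` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "protocol": [6, 17, 6, 6],
    "first_time": ["10-12", "10-12", "30-12", "bad"],
    "last_time": ["20-25", "20-25", "20-25", "20-25"],
})


def keep_row(row):
    """Keep protocol-6 rows whose first timestamps do not exceed the last ones."""
    try:
        if row["protocol"] != 6:
            return False
        # Each field is "forward-reverse"; malformed values raise ValueError.
        first_fwd, first_rev = (int(t) for t in row["first_time"].split("-"))
        last_fwd, last_rev = (int(t) for t in row["last_time"].split("-"))
        return first_fwd <= last_fwd and first_rev <= last_rev
    except (ValueError, AttributeError):
        return False


# axis=1 calls keep_row once per row and returns a boolean Series.
mask = df.apply(keep_row, axis=1)
filtered = df.loc[mask]
print(filtered)
```

Only the first row survives: the second is not protocol 6, the third starts after it ends, and the fourth has an unparsable time field.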

This filters the DataFrame directly, row by row.

When you call the pandas DataFrame apply() method, you pass a function as the argument and pandas applies it to the data. On a DataFrame, apply() hands the function each column (axis=0, the default) or each row (axis=1); with an element-wise function such as a NumPy ufunc, the effect is the same as applying it to every element. For example, given a DataFrame named df, passing np.sqrt takes the square root of every element:

df.apply(np.sqrt)  # equivalent to np.sqrt(df)

The result is a new DataFrame in which each element is the square root of the corresponding element of df. By passing a suitable function to apply(), you can process the data in a DataFrame flexibly. [1][2][3]

References:

[1][3] pandas进阶--Dataframeapply方法, https://blog.csdn.net/qq_38727995/article/details/124459704

[2] 一文搞懂Pandas Dataframe中的apply方法, https://blog.csdn.net/weixin_39915649/article/details/126476752
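The np.sqrt call and the two axis settings of apply() can be tried on a small frame. This is a minimal sketch with made-up data; the names `roots`, `col_range` and `row_sums` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 4.0, 9.0], "b": [16.0, 25.0, 36.0]})

# Element-wise ufunc: same result as np.sqrt(df).
roots = df.apply(np.sqrt)

# axis=0 (default): the function receives each column as a Series.
col_range = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

print(roots)
print(col_range)
print(row_sums)
```

With axis=0 the result has one entry per column, and with axis=1 one entry per row, which is why axis=1 is the form used to build row filters.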