Computing data in a CSV file with Python: calculating durations and averages from CSV data

Latest recommended article published on 2024-01-14 17:21:44

机智啵啵鸡

Tags: computing data in a CSV file with Python

Article link: https://blog.csdn.net/weixin_33980343/article/details/114462318

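This article shows how to process a CSV file with pandas: read the file, filter the rows for a specific source and destination IP address, compute the time differences between their timestamps, and derive the mean and standard deviation of those intervals.

(Summary generated automatically by CSDN.)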

This problem is much easier to solve with a high-level library such as pandas. Let me demonstrate:

Suppose file.csv contains the following data:

2013-07-18 04:54:15.871 UDP 172.12.332.11:20547 172.12.332.11:20547 -> 172.56.213.80:53 CREATE Ignore 0
2013-07-18 04:54:15.841 UDP 192.33.230.81:37192 192.81.130.82:37192 -> 172.81.123.70:53 CREATE Ignore 0
2013-07-18 04:54:15.831 TCP 172.12.332.11:42547 172.12.332.11:42547 -> 172.56.213.80:53 CREATE Ignore 0
2013-07-18 04:54:15.821 UDP 192.33.230.81:37192 192.81.130.82:37192 -> 172.81.123.70:53 CREATE Ignore 0
2013-07-18 04:54:15.811 TCP 172.12.332.11:42547 172.12.332.11:42547 -> 172.56.213.80:53 CREATE Ignore 0

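First, we read it into a DataFrame. The code for this step did not survive in this copy of the article; judging by the 0_1 column name used below, it presumably merged columns 0 and 1 via parse_dates, roughly like this:

>>> import pandas as pd
>>> df = pd.read_table('file.csv', sep=r'\s+', header=None, parse_dates=[[0, 1]])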
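
On Python 3 with a current pandas release, the read step might look like the sketch below. It rests on assumptions: the helper name read_log and the column name timestamp are hypothetical, the sample rows are copied from file.csv above into an in-memory buffer, and the date and time columns are combined explicitly because the nested-list form of parse_dates (the one that yields a combined column such as 0_1) has been deprecated in recent pandas versions.

```python
import io

import pandas as pd

# Three rows copied from the article's file.csv, used as an in-memory stand-in.
SAMPLE = """\
2013-07-18 04:54:15.871 UDP 172.12.332.11:20547 172.12.332.11:20547 -> 172.56.213.80:53 CREATE Ignore 0
2013-07-18 04:54:15.841 UDP 192.33.230.81:37192 192.81.130.82:37192 -> 172.81.123.70:53 CREATE Ignore 0
2013-07-18 04:54:15.831 TCP 172.12.332.11:42547 172.12.332.11:42547 -> 172.56.213.80:53 CREATE Ignore 0
"""


def read_log(source):
    """Read a whitespace-separated log (hypothetical helper, not from the article)."""
    df = pd.read_csv(source, sep=r"\s+", header=None)
    # Column 0 holds the date and column 1 the time; join and parse them explicitly.
    df["timestamp"] = pd.to_datetime(df[0] + " " + df[1])
    return df


df = read_log(io.StringIO(SAMPLE))
print(df[["timestamp", 4, 6]].to_string())
```

Passing a file path such as 'file.csv' instead of the StringIO buffer should work the same way.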

We only need the combined timestamp column (0_1) and columns 4 and 6:

>>> df = df[['0_1', 4, 6]]
>>> print(df.to_string())
                         0_1                    4                 6
0 2013-07-18 04:54:15.871000  172.12.332.11:20547  172.56.213.80:53
1 2013-07-18 04:54:15.841000  192.81.130.82:37192  172.81.123.70:53
2 2013-07-18 04:54:15.831000  172.12.332.11:42547  172.56.213.80:53
3 2013-07-18 04:54:15.821000  192.81.130.82:37192  172.81.123.70:53
4 2013-07-18 04:54:15.811000  172.12.332.11:42547  172.56.213.80:53

Next we clean up the IP addresses by stripping the port:

>>> df[4] = df[4].str.split(':').str.get(0)
>>> df[6] = df[6].str.split(':').str.get(0)
>>> print(df.to_string())
                         0_1              4              6
0 2013-07-18 04:54:15.871000  172.12.332.11  172.56.213.80
1 2013-07-18 04:54:15.841000  192.81.130.82  172.81.123.70
2 2013-07-18 04:54:15.831000  172.12.332.11  172.56.213.80
3 2013-07-18 04:54:15.821000  192.81.130.82  172.81.123.70
4 2013-07-18 04:54:15.811000  172.12.332.11  172.56.213.80
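
The port-stripping step could also be written with str.rsplit, which splits only at the last colon and makes it easy to keep the port as a separate integer column. This is a minimal sketch: the variable names are illustrative, and the two sample values are taken from the data above.

```python
import pandas as pd

# Two "address:port" values from the article's sample data.
endpoints = pd.Series(["172.12.332.11:20547", "172.56.213.80:53"])

# Split once, from the right, into two columns: address and port.
parts = endpoints.str.rsplit(":", n=1, expand=True)
hosts = parts[0]
ports = parts[1].astype(int)

print(hosts.tolist())
print(ports.tolist())
```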

Suppose you are interested in source address 172.12.332.11 and destination address 172.56.213.80. We filter for those rows:

>>> filtered = df[(df[4] == '172.12.332.11') & (df[6] == '172.56.213.80')]
>>> print(filtered.to_string())
                         0_1              4              6
0 2013-07-18 04:54:15.871000  172.12.332.11  172.56.213.80
2 2013-07-18 04:54:15.831000  172.12.332.11  172.56.213.80
4 2013-07-18 04:54:15.811000  172.12.332.11  172.56.213.80

Now we compute the differences between consecutive timestamps:

>>> timestamps = filtered['0_1']
>>> diffs = (timestamps.shift() - timestamps).dropna()
>>> print(diffs.to_string())
2   00:00:00.040000
4   00:00:00.020000
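
Note that timestamps.shift() - timestamps is the previous row minus the current row, so the intervals come out positive only because this log lists the newest entry first. One possible alternative is to sort oldest-first and use Series.diff(). The sketch below uses the three filtered timestamps from above; the variable names are illustrative.

```python
import pandas as pd

# The three filtered timestamps from the article, newest first.
timestamps = pd.Series(pd.to_datetime([
    "2013-07-18 04:54:15.871",
    "2013-07-18 04:54:15.831",
    "2013-07-18 04:54:15.811",
]))

# The article's form: previous row minus current row.
article_diffs = (timestamps.shift() - timestamps).dropna()

# Alternative: sort oldest-first, then diff() yields current minus previous.
sorted_diffs = timestamps.sort_values().diff().dropna()

print(article_diffs.tolist())
print(sorted_diffs.tolist())
```

Both forms produce the same set of intervals here. The sorted version does not depend on the order of the rows in the file.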

We can now compute whatever statistics we want:

>>> diffs.mean()  # this is in nanoseconds
30000000.0
>>> diffs.std()
14142135.62373095
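
Newer pandas versions return a Timedelta from mean() rather than a plain nanosecond count, so one way to get unit-explicit numbers is to convert the intervals to floating-point milliseconds first. This is a minimal sketch that assumes the two intervals computed above (40 ms and 20 ms); the variable names are illustrative.

```python
import pandas as pd

# The two intervals from the article: 40 ms and 20 ms.
diffs = pd.Series(pd.to_timedelta([40, 20], unit="ms"))

# Convert to float milliseconds so the statistics carry an explicit unit.
diffs_ms = diffs.dt.total_seconds() * 1000.0
mean_ms = diffs_ms.mean()
std_ms = diffs_ms.std()  # sample standard deviation (ddof=1), the pandas default

print("mean = %.3f ms, std = %.3f ms" % (mean_ms, std_ms))
```

These values correspond to the 30000000.0 ns mean and 14142135.6 ns standard deviation shown above.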

Edit: for the data you sent me:

import io

import pandas as pd


def load_dataframe(filename):
    # Read the data as a regular CSV file and extract the _raw column values
    values = pd.read_csv(filename)['_raw'].values
    # Clean up the values: replace embedded newline characters with spaces
    values = [x.replace('\n', ' ') for x in values]
    # Put them into an in-memory text stream
    s = io.StringIO('\n'.join(values))
    # From here on everything is the same, just read from the stream
    df = pd.read_table(s, sep=r'\s+', header=None)
    # Combine date (column 0) and time (column 1) into one timestamp column,
    # keeping the name '0_1' used throughout this article
    df['0_1'] = pd.to_datetime(df[0] + ' ' + df[1])
    df = df[['0_1', 4, 6]].copy()
    df[4] = df[4].str.split(':').str.get(0)
    df[6] = df[6].str.split(':').str.get(0)
    return df


def get_diffs(df, source, destination):
    timestamps = df[(df[4] == source) & (df[6] == destination)]['0_1']
    # Previous row minus current row (positive when the log is newest first)
    return (timestamps.shift() - timestamps).dropna()


def main():
    filename = input('Enter filename: ')
    df = load_dataframe(filename)
    while True:
        source = input('Enter source IP: ').strip()
        destination = input('Enter destination IP: ').strip()
        diffs = get_diffs(df, source, destination)
        for i, row in enumerate(diffs):
            # Express each interval in milliseconds
            ms = pd.Timedelta(row) / pd.Timedelta(milliseconds=1)
            print('row %d - row %d = %.3f ms' % (i + 1, i + 2, ms))
        print('Mean: %s' % diffs.mean())
        print('Std: %s' % diffs.std())
        yn = input('Again? [y/n]: ').lower().strip()
        if yn != 'y':
            return


if __name__ == '__main__':
    main()
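
The clean-up step in load_dataframe can be seen in isolation on a tiny in-memory example. This is a sketch under assumptions: the export frame is a hypothetical stand-in for a CSV file with a _raw column, and the position of the embedded newline inside each record is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical export: each _raw cell holds one log record with an embedded newline.
export = pd.DataFrame({"_raw": [
    "2013-07-18 04:54:15.871 UDP 172.12.332.11:20547 172.12.332.11:20547\n"
    "-> 172.56.213.80:53 CREATE Ignore 0",
    "2013-07-18 04:54:15.831 TCP 172.12.332.11:42547 172.12.332.11:42547\n"
    "-> 172.56.213.80:53 CREATE Ignore 0",
]})

# Flatten each record onto a single line, then join the records into one text stream.
lines = [record.replace("\n", " ") for record in export["_raw"]]
stream = io.StringIO("\n".join(lines))

# The stream can now be parsed like the whitespace-separated file.csv.
records = pd.read_csv(stream, sep=r"\s+", header=None)
print(records.shape)
```

Without the newline replacement, each record would be split across two rows with different field counts, and the parse would not line up.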

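Usage example:

$ python test.py
Enter filename: Data.csv
Enter source IP: 172.16.122.21
Enter destination IP: 172.55.102.107
Mean: 3333333.33333
Std: 5773502.6919
Again? [y/n]: n

The figures above are in nanoseconds, as older pandas versions reported them; current versions print a Timedelta instead.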
