[Deep Learning] Prefetching and Replacement: Preprocessing the MSR Cambridge Data

1. Files

This distribution contains the following files:
     a) README.txt: this file
     b) *.csv.gz: I/O trace files, described below
     c) MD5.txt: output of md5sum for csv.gz files
     d) DISCLAIMER.txt: Disclaimer applying to all files in distribution

2. I/O trace files

There are 36 I/O traces from 36 different volumes on 13 servers.  More
information about these is available from the FAST 2008 paper (see
below).

Each trace file is named as <hostname>_<disknumber>.csv.gz. The hostnames
are "friendly" names corresponding to the names used in the paper. Disk
numbers are logical block device numbers.

Generally, disk 0 corresponds to the system or boot disk. However, for
the host "src1", disk 2 is the system disk.

3. I/O trace file format

The files are gzipped csv (comma-separated text) files. The fields in
the csv are:

Timestamp,Hostname,DiskNumber,Type,Offset,Size,ResponseTime

Timestamp is the time the I/O was issued in "Windows filetime"
Hostname is the hostname (should be the same as that in the trace file name)
DiskNumber is the disknumber (should be the same as in the trace file name)
Type is "Read" or "Write"
Offset is the starting offset of the I/O in bytes from the start of the logical
disk.
Size is the transfer size of the I/O request in bytes.
ResponseTime is the time taken by the I/O to complete, in Windows filetime
units.
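
The field list above maps directly onto a small reader. The following is a minimal sketch, not the dataset's official tooling: the names `IORecord`, `parse_rows`, `read_trace`, and `filetime_to_datetime` are made up for illustration, and it assumes that any row without exactly seven fields or with a non-numeric timestamp (such as a header line, if one exists) can be skipped. Windows filetime counts 100-nanosecond ticks since 1601-01-01 UTC, so ResponseTime is in 100 ns units as well.

```python
import csv
import gzip
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator, NamedTuple

# Windows filetime counts 100 ns ticks from this instant.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


class IORecord(NamedTuple):
    """One trace row (hypothetical container, named after the README fields)."""
    timestamp: int      # Windows filetime ticks (100 ns each)
    hostname: str
    disk_number: int
    io_type: str        # "Read" or "Write"
    offset: int         # bytes from the start of the logical disk
    size: int           # bytes
    response_time: int  # Windows filetime ticks (100 ns each)


def filetime_to_datetime(ticks: int) -> datetime:
    """Convert filetime ticks to a UTC datetime (truncated to microseconds)."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def parse_rows(lines: Iterable[str]) -> Iterator[IORecord]:
    """Parse csv text lines; skip rows that do not look like data (assumption)."""
    for row in csv.reader(lines):
        if len(row) != 7 or not row[0].strip().isdigit():
            continue
        ts, host, disk, io_type, offset, size, resp = (f.strip() for f in row)
        yield IORecord(int(ts), host, int(disk), io_type,
                       int(offset), int(size), int(resp))


def read_trace(path: str) -> Iterator[IORecord]:
    """Stream records from one <hostname>_<disknumber>.csv.gz file."""
    with gzip.open(path, "rt", newline="") as f:
        yield from parse_rows(f)
```

Streaming the rows with a generator avoids loading a whole trace into memory, which may matter for the larger volumes.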

4. Attribution.

Please cite the following publication as a reference in any published
work using these traces.

Write Off-Loading: Practical Power Management for Enterprise Storage
Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron
Microsoft Research Ltd.
Proc. 6th USENIX Conference on File and Storage Technologies (FAST '08)
http://www.usenix.org/event/fast08/tech/narayanan.html

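Column breakdown:
Timestamp, Hostname, DiskNumber, Type, Offset, Size, ResponseTime
As I understand it, DiskNumber should be the same on every row of a given file; of course, all the files could also be collected and processed together.
Note that both Size and Offset are in bytes.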
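
Because Offset and Size are both in bytes, mapping a request onto pages is plain integer arithmetic. Here is a minimal sketch, assuming a 4096-byte page; the page size and the name `page_span` are illustrative choices, not something the trace format prescribes.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; adjust to the cache model in use


def page_span(offset: int, size: int, page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """Return (first page number, number of pages touched) for one request."""
    first_page = offset // page_size
    if size <= 0:
        # A zero-length request touches no pages.
        return first_page, 0
    # The last byte of the request is at offset + size - 1.
    last_page = (offset + size - 1) // page_size
    return first_page, last_page - first_page + 1
```

A request that is not page-aligned can touch one more page than `size / page_size` suggests: `page_span(4095, 2)` covers pages 0 and 1.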

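The next step is to go through all the files once and extract

total_count, R/W ratio, LBA unique, LBA_frequency, LBA_reuse_distance, LBA_delta, size, page_no, page_count (the number of pages occupied), and so on,

to lay the groundwork for LBA_delta coverage, LBA coverage, the reuse_distance distribution, and similar analyses.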
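
Several of these statistics can be gathered in a single pass. The sketch below is one possible reading of the metric names, not a settled definition: it treats each request as an `(io_type, lba)` pair, takes LBA_delta as the difference between consecutive addresses, and takes reuse distance as the number of requests between two consecutive accesses to the same address (a stack-distance variant would count distinct addresses instead). The function name `summarize` and the result keys are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize(requests: Iterable[Tuple[str, int]]) -> dict:
    """Single-pass summary over (io_type, lba) pairs."""
    total = reads = writes = 0
    frequency: Counter = Counter()   # accesses per address
    deltas = []                      # lba[i] - lba[i-1]
    reuse_distances = []             # requests between repeats of an address
    last_seen = {}                   # address -> index of its latest access
    prev_lba = None

    for index, (io_type, lba) in enumerate(requests):
        total += 1
        if io_type == "Read":
            reads += 1
        elif io_type == "Write":
            writes += 1

        frequency[lba] += 1

        if prev_lba is not None:
            deltas.append(lba - prev_lba)
        prev_lba = lba

        if lba in last_seen:
            reuse_distances.append(index - last_seen[lba] - 1)
        last_seen[lba] = index

    return {
        "total_count": total,
        "reads": reads,
        "writes": writes,
        # Undefined when the trace contains no writes.
        "rw_ratio": reads / writes if writes else None,
        "lba_unique": len(frequency),
        "lba_frequency": frequency,
        "lba_delta": deltas,
        "lba_reuse_distance": reuse_distances,
    }
```

The address passed in can be the raw byte Offset or a page number derived from it; the counting logic is the same either way.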

-----------------------------------------------------------------------------------20240821---------to be continued

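The content below covers:

  1. Selecting the csv files to use, based on the metrics
  2. A walkthrough of the code that handles the csv data separately for regression and for classification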