[Deep Learning] Prefetching and Replacement: Preprocessing the msr-cambridge Data

weixin_40293999

Published 2024-08-21 18:09:39

Article link: https://blog.csdn.net/weixin_40293999/article/details/141396346

1. Files

This distribution contains the following files:
     a) README.txt: this file
     b) *.csv.gz: I/O trace files, described below
     c) MD5.txt: output of md5sum for csv.gz files
     d) DISCLAIMER.txt: Disclaimer applying to all files in distribution

2. I/O trace files

There are 36 I/O traces from 36 different volumes on 13 servers.  More
information about these is available from the FAST 2008 paper (see
below).

Each trace file is named as <hostname>_<disknumber>.csv.gz. The hostnames
are "friendly" names corresponding to the names used in the paper. Disk
numbers are logical block device numbers.

Generally, disk 0 corresponds to the system or boot disk. However, for
the host "src1", disk 2 is the system disk.
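
The naming convention above can be turned into a small helper for sorting out which file belongs to which volume. This is only a sketch: `parse_trace_name` and `is_system_disk` are hypothetical names, not part of the distribution, and the system-disk rule encodes the README's "generally disk 0, except src1" statement, so treat its result as a hint rather than a guarantee.

```python
import re
from pathlib import Path

# Assumed pattern: "<hostname>_<disknumber>.csv.gz", as described in the README.
_TRACE_NAME = re.compile(r"^(?P<host>.+)_(?P<disk>\d+)\.csv\.gz$")


def parse_trace_name(path):
    """Return (hostname, disk_number) extracted from a trace file name."""
    match = _TRACE_NAME.match(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected trace file name: {path}")
    return match.group("host"), int(match.group("disk"))


def is_system_disk(hostname, disk_number):
    """Apply the README rule: disk 0 is generally the system disk, src1 uses disk 2."""
    if hostname == "src1":
        return disk_number == 2
    return disk_number == 0
```

For example, `parse_trace_name("traces/src1_2.csv.gz")` yields `("src1", 2)`, which `is_system_disk` then reports as the system disk.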

3. I/O trace file format

The files are gzipped csv (comma-separated text) files. The fields in
the csv are:

Timestamp,Hostname,DiskNumber,Type,Offset,Size,ResponseTime

Timestamp is the time the I/O was issued in "Windows filetime"
Hostname is the hostname (should be the same as that in the trace file name)
DiskNumber is the disknumber (should be the same as in the trace file name)
Type is "Read" or "Write"
Offset is the starting offset of the I/O in bytes from the start of the logical
disk.
Size is the transfer size of the I/O request in bytes.
ResponseTime is the time taken by the I/O to complete, in Windows filetime
units.
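
The field list above maps directly onto a small reader. The following is a minimal sketch using only the Python standard library; it assumes the files carry no header row, and the names `read_trace`, `open_trace` and `filetime_to_datetime` are illustrative. Windows filetime counts 100-nanosecond ticks since 1601-01-01 UTC, so ResponseTime divided by 10,000 gives milliseconds.

```python
import csv
import gzip
from datetime import datetime, timedelta, timezone

FIELDS = ["Timestamp", "Hostname", "DiskNumber", "Type",
          "Offset", "Size", "ResponseTime"]
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
TICKS_PER_MS = 10_000  # one filetime tick is 100 ns


def filetime_to_datetime(ticks):
    """Convert Windows filetime ticks to a UTC datetime (microsecond precision)."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def read_trace(fileobj):
    """Yield one dict per I/O request from an open text file object.

    Assumes every non-empty row has exactly the seven README fields
    and that there is no header row.
    """
    for row in csv.reader(fileobj):
        if not row:
            continue
        rec = dict(zip(FIELDS, (cell.strip() for cell in row)))
        yield {
            "Timestamp": int(rec["Timestamp"]),
            "Hostname": rec["Hostname"],
            "DiskNumber": int(rec["DiskNumber"]),
            "Type": rec["Type"],                      # "Read" or "Write"
            "Offset": int(rec["Offset"]),             # bytes
            "Size": int(rec["Size"]),                 # bytes
            "ResponseTime": int(rec["ResponseTime"]), # filetime ticks
        }


def open_trace(path):
    """Open a gzipped trace file in text mode, ready for read_trace."""
    return gzip.open(path, "rt", newline="")
```

A typical use would be `with open_trace(path) as f: for rec in read_trace(f): ...`, where `path` points at one of the `*.csv.gz` files.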

4. Attribution.

Please cite the following publication as a reference in any published
work using these traces.

Write Off-Loading: Practical Power Management for Enterprise Storage
Dushyanth Narayanan, Austin Donnelly, and Antony Rowstron
Microsoft Research Ltd.
Proc. 6th USENIX Conference on File and Storage Technologies (FAST '08)
http://www.usenix.org/event/fast08/tech/narayanan.html

In short: the 36 I/O traces come from 36 different volumes on 13 servers.

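Column notes:
Timestamp, Hostname, DiskNumber, Type, Offset, Size, ResponseTime
As I understand it, DiskNumber should be the same for every row of a given file; of course, the files can also all be collected and processed together.
Note that both Size and Offset are in bytes.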
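
Because Offset and Size are byte values, any page-level view of the trace needs an explicit conversion. The sketch below shows one way to do it; the 4096-byte page size and the helper name `page_span` are assumptions for illustration, not something the trace or the README specifies.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; adjust to the cache being modelled


def page_span(offset, size, page_size=PAGE_SIZE):
    """Return (page_no, page_count) for a request of `size` bytes at byte `offset`.

    page_no is the first page touched; page_count is how many pages the
    request spans, counting partially covered pages at either end.
    """
    first_page = offset // page_size
    if size <= 0:
        return first_page, 0
    last_page = (offset + size - 1) // page_size
    return first_page, last_page - first_page + 1
```

An unaligned request can touch one more page than `size / page_size` suggests: a 2-byte request at offset 4095 spans two pages.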

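The next step is to go through all of the files once and extract, for each:

total_count, R/W ratio, LBA unique, LBA_frequency, LBA_reuse_distance, LBA_delta, size, page_no, page_count (the number of pages a request spans), and so on.
These lay the groundwork for the LBA_delta coverage, the LBA coverage, the reuse_distance distribution, and similar analyses.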
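
A single pass over one trace is enough to collect most of these metrics. The following is a rough sketch under several assumptions that the post itself does not pin down: the byte Offset is used directly as the LBA key, R/W ratio means reads divided by writes, LBA_delta is the difference between consecutive offsets, and reuse distance is the number of requests since the previous access to the same LBA (a stack distance over distinct addresses would be a different, costlier definition). The function name `summarize` is hypothetical.

```python
from collections import Counter


def summarize(records):
    """Collect per-trace statistics.

    `records` is an iterable of (io_type, offset) tuples in trace order,
    where io_type is "Read" or "Write" and offset is in bytes.
    """
    total = reads = writes = 0
    frequency = Counter()   # LBA -> number of accesses
    last_seen = {}          # LBA -> index of its most recent access
    reuse_distances = []    # index gap between repeated accesses to an LBA
    deltas = []             # offset difference between consecutive requests
    previous_offset = None

    for index, (io_type, offset) in enumerate(records):
        total += 1
        if io_type == "Read":
            reads += 1
        else:
            writes += 1

        frequency[offset] += 1
        if offset in last_seen:
            reuse_distances.append(index - last_seen[offset])
        last_seen[offset] = index

        if previous_offset is not None:
            deltas.append(offset - previous_offset)
        previous_offset = offset

    return {
        "total_count": total,
        "rw_ratio": reads / writes if writes else float("inf"),
        "lba_unique": len(frequency),
        "lba_frequency": frequency,
        "lba_reuse_distance": reuse_distances,
        "lba_delta": deltas,
    }
```

Keeping the raw lists of reuse distances and deltas, rather than only their averages, leaves the coverage and distribution analyses open for later.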

-----------------------------------------------------------------------------------20240821---------To be continued

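The upcoming content will cover:

Selecting which csv files to use according to the metrics above
A code walkthrough of the separate treatments applied to the csv data for regression and for classification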
