The Amazon Science team plans to release the Redset dataset at VLDB 2024 (August 26-30, 2024)

Dataset overview

        Redset is a dataset containing three months of metadata about user queries that ran on a selected sample of instances in the AWS Redshift fleet.

What the dataset is for

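        The Amazon Science team intends to open this data during VLDB 2024. The dataset is not available yet, but judging from its schema and from the decision to publish it during the conference, it is a reasonable guess that the Redshift development team will present an important paper at VLDB and release improvement figures measured on real users along with it.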

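        The schema is worth studying from another angle as well: it shows how a top-tier database vendor describes the core metrics of a database, and which dimensions it monitors. The dataset is likely to become a benchmark for database optimization work for some time to come, so it deserves attention. Dataset link: https://www.selectdataset.com/dataset/1dfe70fc50251057041a91e5a882eb57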

        Once the dataset is public, anyone interested in databases can take a look right away.

Dataset schema

Column Name                     Description
instance_id                     Uniquely identifies a Redshift cluster
cluster_size                    Size of the cluster (only available for provisioned)
user_id                         Identifies the user that issued the query
database_id                     Identifies the database that was queried
query_id                        Unique per instance
arrival_time                    Timestamp when the query arrived on the system
compile_duration_ms             Time the query spent compiling, in milliseconds
queue_duration_ms               Time the query spent queueing, in milliseconds
execution_duration_ms           Time the query spent executing, in milliseconds
feature_fingerprint             Hash value of the query fingerprint. A proxy for query-likeness, though not based on text. Will overestimate repetition.
was_aborted                     Whether the query was aborted during its lifetime
was_cached                      Whether the query was answered from the result cache
cache_source_query_id           If the query was answered from the result cache, the query id of the query that populated the cache
query_type                      Type of query, e.g., select, copy, ...
num_permanent_tables_accessed   Number of permanent tables (regular database tables) accessed by the query
num_external_tables_accessed    Number of external tables accessed by the query
num_system_tables_accessed      Number of system tables accessed by the query
read_table_ids                  Comma-separated list of unique permanent table ids read by the query
write_table_ids                 Comma-separated list of unique table ids written to by the query
mbytes_scanned                  Total number of megabytes scanned by the query
mbytes_spilled                  Total number of megabytes spilled by the query
num_joins                       Number of joins in the query plan
num_scans                       Number of scans in the query plan
num_aggregations                Number of aggregations in the query plan
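
To make the schema more concrete, here is a minimal Python sketch of the kind of summary these columns allow. Since the dataset is not public yet, everything about the file layout is an assumption: the CSV format, the 0/1 encoding of the boolean columns, and the sample rows (which are made up) are all hypothetical. The helper names `to_bool` and `summarize` are also just for illustration. Only the column names come from the schema above.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows. The values are invented, and the CSV layout is an
# assumption: the real file format has not been published yet.
SAMPLE_CSV = """instance_id,query_id,query_type,compile_duration_ms,queue_duration_ms,execution_duration_ms,was_aborted,was_cached,feature_fingerprint
1,100,select,20,0,480,0,0,abc
1,101,select,0,0,5,0,1,abc
1,102,copy,15,300,2000,0,0,def
2,200,select,30,10,960,1,0,ghi
"""


def to_bool(value):
    """Interpret a flag column; the 0/1 or true/false encoding is assumed."""
    return value.strip().lower() in {"1", "true", "t"}


def summarize(rows):
    """Compute a few fleet-level metrics from rows keyed by schema column name."""
    rows = list(rows)
    total = len(rows)
    if total == 0:
        return {"queries": 0}

    # End-to-end latency = compile + queue + execution time, per the schema.
    latencies = [
        int(r["compile_duration_ms"])
        + int(r["queue_duration_ms"])
        + int(r["execution_duration_ms"])
        for r in rows
    ]

    # feature_fingerprint is only a proxy for query-likeness, so this
    # overestimates true repetition, as the schema itself warns.
    fingerprints = Counter(r["feature_fingerprint"] for r in rows)
    repeated = sum(count for count in fingerprints.values() if count > 1)

    return {
        "queries": total,
        "mean_latency_ms": sum(latencies) / total,
        "cache_hit_rate": sum(to_bool(r["was_cached"]) for r in rows) / total,
        "abort_rate": sum(to_bool(r["was_aborted"]) for r in rows) / total,
        "repeated_fraction": repeated / total,
        "by_type": dict(Counter(r["query_type"] for r in rows)),
    }


summary = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for name, value in summary.items():
    print(f"{name}: {value}")
```

Once the real files are available, the same function could be pointed at them by swapping `io.StringIO(SAMPLE_CSV)` for an open file handle, provided the actual format and encodings match the assumptions above.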