Improving Symbolic Data Visualization for Pattern Recognition and Knowledge Discovery

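Symbolic data is usually aggregated from large data sets. It hides entry-specific details and turns huge amounts of data (such as big data) into analyzable quantities, and it offers an overview where general trends matter more than individual details. Symbolic data comes in many forms, such as intervals, histograms, categories, and modal multi-valued objects, and it can also be regarded as a distribution. Currently, the de facto visualization approach for symbolic data is zoomstars, which has many limitations. The biggest one is that the default distributions (histograms) are not supported in 2D, because an additional dimension is required.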
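The paper examines the visualization of symbolic data and analyzes the challenges arising from its complex structure. It proposes several improvements to zoomstars that enable histograms to be visualized in 2D by using a quantile or an equivalent interval approach. It also proposes several improvements for categorical and modal variables so that the presented categories are indicated more clearly.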
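To make the quantile idea more concrete, here is a minimal Python sketch of how one histogram-valued observation could be reduced to a handful of quantile values, which can then be read as nested intervals. It is only an illustration and not the authors' implementation: the function name `histogram_quantiles`, its parameters, and the assumption that values are spread uniformly inside each bin are all choices made for this example.

```python
from bisect import bisect_left
from itertools import accumulate


def histogram_quantiles(edges, probs, levels=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Read quantile values off a histogram-valued observation.

    edges  -- bin boundaries, length k + 1, increasing
    probs  -- weight of each of the k bins (normalized here to sum to 1)
    levels -- cumulative probabilities at which to evaluate the quantiles

    Assumption of this sketch: values are uniformly spread inside each bin,
    so the cumulative distribution is piecewise linear between bin edges.
    """
    if len(edges) != len(probs) + 1:
        raise ValueError("edges must have exactly one more entry than probs")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probs must have a positive sum")

    # Cumulative probability at every bin edge, starting at 0 and ending at 1.
    cdf = [0.0] + [c / total for c in accumulate(probs)]
    cdf[-1] = 1.0  # guard against floating-point drift

    values = []
    for p in levels:
        # Index of the first bin whose upper cumulative probability reaches p.
        i = min(bisect_left(cdf, p, lo=1), len(probs))
        lo_p, hi_p = cdf[i - 1], cdf[i]
        # Linear interpolation inside the bin (empty bins contribute no width).
        frac = 0.0 if hi_p == lo_p else (p - lo_p) / (hi_p - lo_p)
        values.append(edges[i - 1] + frac * (edges[i] - edges[i - 1]))
    return values


# Hypothetical feature with three bins: [0, 10), [10, 20), [20, 40]
print(histogram_quantiles([0, 10, 20, 40], [0.25, 0.5, 0.25]))
```

With the five default levels, each histogram collapses to a minimum, a maximum, and three inner quantiles. Pairs such as (minimum, maximum) and (first quartile, third quartile) are ordinary intervals, which is the kind of data a 2D plot axis can show without needing an extra dimension.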

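Depending on the data type and the desired goal, the paper offers users different zoomstars-based visualization options. It also proposes shape encoding, an approach that visualizes the whole data set in a comprehensive table-like graph. These visualizations and their usefulness are verified with three symbolic data sets, which are used in the exploratory data mining phase to identify trends, similar objects, and important features, and to detect outliers and discrepancies in the data.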
Figure: Example of zoomstars using SODAS software: (a) 2D (with opened distribution window for feature “Urbancity”) and (b) 3D zoomstars of environment data.

Keywords:
Data visualization, Symbolic data, Zoomstar, Shape encoding, Exploratory data analysis

Article information

Improving symbolic data visualization for pattern recognition and knowledge discovery

BY: Kadri Umbleja, Manabu Ichino, Hiroyuki Yaguchi

Abstract:
This paper examines the visualization of symbolic data and considers the challenges arising from its complex structure. Symbolic data is usually aggregated from large data sets and used to hide entry-specific details and to transform huge amounts of data (like big data) into analyzable quantities. It is also used to offer an overview in places where general trends are more important than individual details. Symbolic data comes in many forms like intervals, histograms, categories and modal multi-valued objects. Symbolic data can also be considered as a distribution. Currently, the de facto visualization approach for symbolic data is zoomstars, which has many limitations. The biggest limitation is that the default distributions (histograms) are not supported in 2D, as an additional dimension is required. This paper proposes several new improvements for zoomstars which would enable it to visualize histograms in 2D by using a quantile or an equivalent interval approach. In addition, several improvements for categorical and modal variables are proposed for a clearer indication of presented categories. Recommendations for different approaches to zoomstars are offered depending on the data type and the desired goal. Furthermore, an alternative approach that allows visualizing the whole data set in a comprehensive table-like graph, called shape encoding, is proposed. These visualizations and their usefulness are verified with three symbolic data sets in the exploratory data mining phase to identify trends, similar objects and important features, and to detect outliers and discrepancies in the data.

Keywords: Data visualization, Symbolic data, Zoomstar, Shape encoding, Exploratory data analysis

Link: https://www.sciencedirect.com/science/article/pii/S2468502X19300014

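Journal information:
Visual Informatics (Chinese title: 《可视信息学》) is an international, open-access academic journal sponsored by Zhejiang University, jointly published by Zhejiang University Press and Elsevier, and distributed online. The journal focuses on the modeling, analysis, synthesis, augmentation, and natural interaction of visual information oriented to human perception. The editors-in-chief are Prof. Kun Zhou (周昆) and Prof. Hans-Peter Seidel.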
Elsevier link (including First Online Articles): https://www.journals.elsevier.com/visual-informatics
Submit your paper:
https://www.editorialmanager.com/VISINF/default.aspx
Tel: (86-571) 88206681-519
E-mail: lujinzhi@cad.zju.edu.cn
LinkedIn: Visual Informatics
