PacificVAST 2019 Accepted Paper 1 (Visual Informatics Special Issue Series, Part 2)

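PacificVis 2019 will be held in Bangkok, Thailand, on April 23-26. This year's PacificVAST 2019 accepted six papers, which are published in the first 2019 issue (March) of Visual Informatics (VI), available on the journal's website.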

Interactive Labelling of a Multivariate Dataset for Supervised Machine Learning using Linked Visualisations, Clustering, and Active Learning
Mohammad Chegini, Jürgen Bernard, Philip Berger, Alexei Sourin, Keith Andrews, Tobias Schreck
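Summary: Supervised machine learning techniques require labelled multivariate training datasets. Many approaches tackle unlabelled datasets by combining machine learning algorithms with interactive visualisations. With suitable techniques, analysts can take an active part in a highly interactive, iterative machine learning process, labelling the dataset and building meaningful partitions. Although this idea has been implemented for unsupervised, semi-supervised, or supervised machine learning tasks individually, combining all three approaches remains challenging.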

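The paper presents a visual analytics approach that integrates a range of machine learning capabilities with four linked visualisation views in the mVis system. Using this palette of techniques, analysts can carry out exploratory data analysis on a multivariate dataset, divide it into meaningful labelled partitions, and then build a classifier. Along the way, they can label interesting patterns or outliers in a semi-supervised process supported by active learning. Once the dataset has been labelled interactively, the analyst can continue with supervised machine learning to assess how well the resulting classifier captures the concepts expressed in the labelled training data. A new technique, automatic dimension selection, uses the analyst's interactions with the dimensions of the multivariate dataset to steer the machine learning algorithms.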

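The paper uses a real-world football dataset to demonstrate the usefulness of mVis across a series of analysis and labelling tasks: from initial labelling, through iterations of data exploration, clustering, classification, and active learning to refine the named partitions, to a final high-quality labelled training dataset suitable for training a classifier. The tool gives analysts interactive visualisations including scatterplots, parallel coordinates, a similarity map for records, and a new similarity map for partitions.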
Figure: The results of clustering, classification, and active learning in mVis
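
As a rough illustration of the active-learning step in the workflow above, here is a minimal Python sketch. It is not the mVis implementation. It assumes a pool-based setting with margin-based uncertainty sampling and a toy nearest-centroid classifier, both chosen only for brevity. The function names (`fit_centroids`, `predict`, `margin`, `active_learning`, `demo`), the synthetic two-cluster data, and the `oracle` callback standing in for the analyst are all hypothetical and do not come from the paper.

```python
import math
import random


def fit_centroids(records, labels):
    """Toy nearest-centroid model: one mean vector per label seen so far."""
    groups = {}
    for idx, lab in labels.items():
        groups.setdefault(lab, []).append(records[idx])
    return {
        lab: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for lab, pts in groups.items()
    }


def predict(record, centroids):
    """Assign the label of the closest centroid."""
    return min(centroids, key=lambda lab: math.dist(record, centroids[lab]))


def margin(record, centroids):
    """Distance gap between the two closest centroids (small gap = uncertain)."""
    nearest, second = sorted(math.dist(record, c) for c in centroids.values())[:2]
    return second - nearest


def active_learning(records, oracle, seed_labels, budget):
    """Pool-based loop: repeatedly ask the oracle about the least certain record.

    seed_labels must already cover at least two classes.
    """
    labels = dict(seed_labels)
    for _ in range(budget):
        unlabelled = [i for i in range(len(records)) if i not in labels]
        if not unlabelled:
            break
        centroids = fit_centroids(records, labels)
        query = min(unlabelled, key=lambda i: margin(records[i], centroids))
        labels[query] = oracle(query)  # in an interactive tool, the analyst answers
    return labels, fit_centroids(records, labels)


def demo(budget=6):
    """Synthetic data: two well-separated clusters (an assumption, not the football data)."""
    rng = random.Random(0)
    truth, records = [], []
    for name, (cx, cy) in (("A", (0.0, 0.0)), ("B", (5.0, 5.0))):
        for _ in range(20):
            records.append((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)))
            truth.append(name)
    seed = {0: truth[0], 20: truth[20]}  # one labelled record per class
    labels, centroids = active_learning(records, truth.__getitem__, seed, budget)
    hits = sum(predict(r, centroids) == t for r, t in zip(records, truth))
    return labels, hits / len(records)


if __name__ == "__main__":
    labels, accuracy = demo()
    print(len(labels), "records labelled; accuracy on the pool:", accuracy)
```

The clusters here are deliberately well separated, so the sketch only shows the loop structure (fit, pick the least certain record, ask for its label, refit) rather than any realistic gain from active learning.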

Full paper information

Interactive labelling of a multivariate dataset for supervised machine learning using linked visualisations, clustering, and active learning
BY: Mohammad Chegini, Jürgen Bernard, Philip Berger, Alexei Sourin, Keith Andrews, Tobias Schreck

Abstract: Supervised machine learning techniques require labelled multivariate training datasets. Many approaches address the issue of unlabelled datasets by tightly coupling machine learning algorithms with interactive visualisations. Using appropriate techniques, analysts can play an active role in a highly interactive and iterative machine learning process to label the dataset and create meaningful partitions. While this principle has been implemented either for unsupervised, semi-supervised, or supervised machine learning tasks, the combination of all three methodologies remains challenging.

In this paper, a visual analytics approach is presented, combining a variety of machine learning capabilities with four linked visualisation views, all integrated within the mVis (multivariate Visualiser) system. The available palette of techniques allows an analyst to perform exploratory data analysis on a multivariate dataset and divide it into meaningful labelled partitions, from which a classifier can be built. In the workflow, the analyst can label interesting patterns or outliers in a semi-supervised process supported by active learning. Once a dataset has been interactively labelled, the analyst can continue the workflow with supervised machine learning to assess to what degree the subsequent classifier has effectively learned the concepts expressed in the labelled training dataset. Using a novel technique called automatic dimension selection, interactions the analyst had with dimensions of the multivariate dataset are used to steer the machine learning algorithms.

A real-world football dataset is used to show the utility of mVis for a series of analysis and labelling tasks, from initial labelling through iterations of data exploration, clustering, classification, and active learning to refine the named partitions, to finally producing a high-quality labelled training dataset suitable for training a classifier. The tool empowers the analyst with interactive visualisations including scatterplots, parallel coordinates, similarity maps for records, and a new similarity map for partitions.

Key words: Labelling, Clustering, Classification, Active learning, Multivariate data, Visualisation

Link: https://www.sciencedirect.com/science/article/pii/S2468502X19300178
